2012 | Craig Gentry, Shai Halevi, and Nigel P. Smart
The paper presents a working implementation of leveled homomorphic encryption (without bootstrapping) that can evaluate the AES-128 circuit in three different ways. The first variant takes under 36 hours to evaluate an entire AES encryption operation, using NTL (over GMP) as the underlying software platform and running on a large-memory machine. Using SIMD techniques, it can process over 54 blocks in each evaluation, yielding an amortized rate of just under 40 minutes per block. Another implementation takes just over two and a half days to evaluate the AES operation but can process 720 blocks in each evaluation, yielding an amortized rate of just over five minutes per block. The authors also detail a third implementation, which theoretically could yield even better amortized complexity but is less competitive in practice.
The implementation is based on a variant of the BGV cryptosystem, using techniques from Smart and Vercauteren and Gentry, Halevi, and Smart. The paper introduces several optimizations, including a new variant of key switching that does not require reducing the norm of the ciphertext vector and a method for implementing the Brakerski-Gentry-Vaikuntanathan modulus-switching transformation on ciphertexts in CRT representation.
The AES circuit is chosen as a benchmark due to its practical relevance and the regular, algebraic structure that makes it amenable to parallelism and optimizations. The paper discusses the benefits of using homomorphic AES decryption for data encrypted under AES, transforming it into FHE-encrypted data for further computation.
The implementation uses a machine with Intel Xeon CPUs running at 2.0 GHz, 18MB cache, and 256GB RAM. The main limiting factor in the implementation is memory, with the AES encryption taking just under two days to compute a single block using an implementation that minimizes memory usage. The computation was performed on ciphertexts that could hold 864 plaintext slots, allowing for 54 AES operations in parallel, giving an amortized time per block of roughly 40 minutes. A second implementation, requiring more memory, completed an AES operation in around five days, with ciphertexts capable of holding 720 different slots, yielding an amortized time per block of roughly five minutes.
The paper also discusses general-purpose optimizations, such as reducing the number of FFTs and CRTs by keeping polynomials in evaluation representation whenever possible, and techniques for dynamic noise management and randomized multiplication by constants to minimize noise increase.The paper presents a working implementation of leveled homomorphic encryption (without bootstrapping) that can evaluate the AES-128 circuit in three different ways. The first variant takes under 36 hours to evaluate an entire AES encryption operation, using NTL (over GMP) as the underlying software platform and running on a large-memory machine. Using SIMD techniques, it can process over 54 blocks in each evaluation, yielding an amortized rate of just under 40 minutes per block. Another implementation takes just over two and a half days to evaluate the AES operation but can process 720 blocks in each evaluation, yielding an amortized rate of just over five minutes per block. The authors also detail a third implementation, which theoretically could yield even better amortized complexity but is less competitive in practice.
The implementation is based on a variant of the BGV cryptosystem, using techniques from Smart and Vercauteren and Gentry, Halevi, and Smart. The paper introduces several optimizations, including a new variant of key switching that does not require reducing the norm of the ciphertext vector and a method for implementing the Brakerski-Gentry-Vaikuntanathan modulus-switching transformation on ciphertexts in CRT representation.
The AES circuit is chosen as a benchmark due to its practical relevance and the regular, algebraic structure that makes it amenable to parallelism and optimizations. The paper discusses the benefits of using homomorphic AES decryption for data encrypted under AES, transforming it into FHE-encrypted data for further computation.
The implementation uses a machine with Intel Xeon CPUs running at 2.0 GHz, 18MB cache, and 256GB RAM. The main limiting factor in the implementation is memory, with the AES encryption taking just under two days to compute a single block using an implementation that minimizes memory usage. The computation was performed on ciphertexts that could hold 864 plaintext slots, allowing for 54 AES operations in parallel, giving an amortized time per block of roughly 40 minutes. A second implementation, requiring more memory, completed an AES operation in around five days, with ciphertexts capable of holding 720 different slots, yielding an amortized time per block of roughly five minutes.
The paper also discusses general-purpose optimizations, such as reducing the number of FFTs and CRTs by keeping polynomials in evaluation representation whenever possible, and techniques for dynamic noise management and randomized multiplication by constants to minimize noise increase.