Homomorphic Evaluation of the AES Circuit

Homomorphic Evaluation of the AES Circuit

2012 | Craig Gentry, Shai Halevi, and Nigel P. Smart
This paper presents a working implementation of leveled homomorphic encryption (without bootstrapping) that can evaluate the AES-128 circuit in three different ways. One variant takes under 36 hours to evaluate an entire AES encryption operation, using NTL (over GMP) as the underlying software platform, and running on a large-memory machine. Using SIMD techniques, it processes over 54 blocks in each evaluation, yielding an amortized rate of just under 40 minutes per block. Another implementation takes just over two and a half days to evaluate the AES operation, but can process 720 blocks in each evaluation, yielding an amortized rate of just over five minutes per block. A third implementation, theoretically more efficient, is less competitive in practice. The authors develop both AES-specific optimizations and several "generic" tools for FHE evaluation. These include a variant of the Brakerski-Vaikuntanathan key-switching technique that does not require reducing the norm of the ciphertext vector, and a method of implementing the Brakerski-Gentry-Vaikuntanathan modulus-switching transformation on ciphertexts in CRT representation. The implementation is based on the BGV cryptosystem, which is one of three variants that seem the most likely to yield "somewhat practical" homomorphic encryption. The authors introduce many new optimizations, including techniques to reduce the number of FFTs and CRTs, and to reduce the number of modulus switching and key switching operations. They also describe variants of key switching and modulus switching that can be implemented while keeping almost all the polynomials in evaluation representation. The implementation uses the NTL C++ library running over GMP, on a machine with Intel Xeon CPUs running at 2.0 GHz and 256GB of RAM. The main limiting factor in the implementation is memory. The implementation can compute a single block AES encryption in just under two days, which is roughly two orders of magnitude faster than the Gentry-Halevi implementation. The computation is performed on ciphertexts that can hold 864 plaintext slots each, allowing for 54 AES operations in parallel, resulting in an amortized time per block of roughly forty minutes. The authors also describe a byte-sliced and bit-sliced implementation, which can consume less levels per round function. The byte-sliced implementation uses sixteen distinct ciphertexts to represent a single state matrix, while the bit-sliced implementation uses 128 distinct ciphertexts (one per bit of the state matrix). The main issue in both implementations is the SubBytes operation, which is implemented using a "depth-16" circuit of Boyar and Peralta, consuming four levels. The rest of the round function can be implemented with a shallow circuit in terms of number of multiplication gates.This paper presents a working implementation of leveled homomorphic encryption (without bootstrapping) that can evaluate the AES-128 circuit in three different ways. One variant takes under 36 hours to evaluate an entire AES encryption operation, using NTL (over GMP) as the underlying software platform, and running on a large-memory machine. Using SIMD techniques, it processes over 54 blocks in each evaluation, yielding an amortized rate of just under 40 minutes per block. Another implementation takes just over two and a half days to evaluate the AES operation, but can process 720 blocks in each evaluation, yielding an amortized rate of just over five minutes per block. A third implementation, theoretically more efficient, is less competitive in practice. The authors develop both AES-specific optimizations and several "generic" tools for FHE evaluation. These include a variant of the Brakerski-Vaikuntanathan key-switching technique that does not require reducing the norm of the ciphertext vector, and a method of implementing the Brakerski-Gentry-Vaikuntanathan modulus-switching transformation on ciphertexts in CRT representation. The implementation is based on the BGV cryptosystem, which is one of three variants that seem the most likely to yield "somewhat practical" homomorphic encryption. The authors introduce many new optimizations, including techniques to reduce the number of FFTs and CRTs, and to reduce the number of modulus switching and key switching operations. They also describe variants of key switching and modulus switching that can be implemented while keeping almost all the polynomials in evaluation representation. The implementation uses the NTL C++ library running over GMP, on a machine with Intel Xeon CPUs running at 2.0 GHz and 256GB of RAM. The main limiting factor in the implementation is memory. The implementation can compute a single block AES encryption in just under two days, which is roughly two orders of magnitude faster than the Gentry-Halevi implementation. The computation is performed on ciphertexts that can hold 864 plaintext slots each, allowing for 54 AES operations in parallel, resulting in an amortized time per block of roughly forty minutes. The authors also describe a byte-sliced and bit-sliced implementation, which can consume less levels per round function. The byte-sliced implementation uses sixteen distinct ciphertexts to represent a single state matrix, while the bit-sliced implementation uses 128 distinct ciphertexts (one per bit of the state matrix). The main issue in both implementations is the SubBytes operation, which is implemented using a "depth-16" circuit of Boyar and Peralta, consuming four levels. The rest of the round function can be implemented with a shallow circuit in terms of number of multiplication gates.
Reach us at info@futurestudyspace.com
Understanding Homomorphic Evaluation of the AES Circuit