Multimodal deep learning using on-chip diffractive optics with in situ training capability

23 July 2024 | Junwei Cheng, Chaoran Huang, Jialong Zhang, Bo Wu, Wenkai Zhang, Xinyu Liu, Jiahui Zhang, Yiyi Tang, Hailong Zhou, Qiming Zhang, Min Gu, Jianji Dong & Xinliang Zhang
This article presents a trainable diffractive optical neural network (TDONN) chip that enables multimodal deep learning with in situ training capability. The chip is based on on-chip diffractive optics with a large number of tunable elements, allowing it to process and classify data from multiple modalities, including vision, audio, and touch. It comprises one input layer, five hidden layers, and one output layer, and a single forward propagation is sufficient to obtain inference results without frequent optical-electrical conversion.

The TDONN chip uses a customized stochastic gradient descent algorithm together with a dropout mechanism to achieve in situ training and fast convergence in the optical domain. It reaches a potential throughput of 217.6 tera-operations per second (TOPS) with high computing density (447.7 TOPS/mm²), high system-level energy efficiency (7.28 TOPS/W), and low optical latency (30.2 ps), and it successfully implements four-class classification across the vision, audio, and touch modalities with 85.7% accuracy on multimodal test sets.

The chip has a compact footprint, is fabricated using standard complementary metal-oxide-semiconductor (CMOS) processes, and is suitable for low-cost production. Its design is highly scalable: the sizes of the input, hidden, and output layers, as well as the number of hidden layers, can be flexibly extended to match the requirements of different multimodal tasks. With performance comparable to digital computers, the TDONN chip offers a promising route toward low-power large AI models based on photonic technology, and the work opens up a new avenue for multimodal deep learning with integrated photonic processors.
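The article describes the training procedure only at a high level: a customized stochastic gradient descent algorithm combined with a dropout mechanism, with gradients obtained in situ in the optical domain rather than by digital backpropagation. The Python sketch below is one plausible reading of such a scheme, not the authors' implementation. It estimates gradients by perturbing the chip's tunable elements one at a time (a zeroth-order finite-difference estimate) and randomly freezes a fraction of units in each update step; the interface measure_loss, the perturbation size delta, and all hyperparameter values are assumptions made for illustration.

import numpy as np

# Hypothetical sketch of in situ training for a diffractive optical
# neural network. measure_loss stands in for one optical forward pass:
# it would drive the chip's tunable elements with the phase vector and
# read the resulting loss from the output photodetectors. This
# interface is an assumption, not the paper's actual API.
def in_situ_sgd(measure_loss, n_units, steps=200, lr=0.05,
                delta=0.01, dropout=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n_units)  # initial phase settings
    for _ in range(steps):
        base = measure_loss(theta)                  # loss at current settings
        # Dropout: skip a random subset of units this iteration, which
        # reduces the number of on-chip measurements per update step.
        active = rng.random(n_units) > dropout
        grad = np.zeros(n_units)
        for i in np.flatnonzero(active):
            probe = theta.copy()
            probe[i] += delta                       # perturb one element
            grad[i] = (measure_loss(probe) - base) / delta
        theta -= lr * grad                          # stochastic gradient step
        theta %= 2.0 * np.pi                        # wrap phases to [0, 2*pi)
    return theta

As a toy stand-in for the hardware, measure_loss can be any callable mapping a phase vector to a scalar loss, for example lambda th: float(np.mean(1 - np.cos(th - target))) for a fixed target vector. As a rough consistency check on the reported figures, the stated throughput and computing density imply a compute area of about 217.6 / 447.7 ≈ 0.49 mm², and the throughput and energy efficiency imply a system power of about 217.6 / 7.28 ≈ 29.9 W.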
Understanding Multimodal deep learning using on-chip diffractive optics with in situ training capability