Multimodal deep learning using on-chip diffractive optics with in situ training capability

23 July 2024 | Junwei Cheng, Chaoran Huang, Jialong Zhang, Bo Wu, Wenkai Zhang, Xinyu Liu, Jiahui Zhang, Yiyi Tang, Hailong Zhou, Qiming Zhang, Min Gu, Jianji Dong, Xinliang Zhang
This paper presents a trainable diffractive optical neural network (TDONN) chip designed for multimodal deep learning, addressing the limitation of existing photonic neuromorphic processors that can handle only a single data modality. The TDONN chip, fabricated on a silicon-on-insulator (SOI) platform, comprises one input layer, five hidden layers, and one output layer, enabling in situ training and fast convergence in the optical domain. The chip achieves a potential throughput of 217.6 tera-operations per second (TOPS), high computing density (447.7 TOPS/mm²), high energy efficiency (7.28 TOPS/W), and low optical latency (30.2 ps). The TDONN chip successfully performed four-class classification tasks across different modalities (vision, audio, and touch) with an average accuracy of 85.7%. This work opens new avenues for multimodal deep learning with integrated photonic processors and offers a potential route to low-power large AI models based on photonic technology.
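To make the architecture and the in situ training idea concrete, below is a minimal numerical sketch of a diffractive optical neural network with gradient-free on-chip training. The abstract does not state which training algorithm the TDONN uses; stochastic parallel gradient descent (SPGD), a common choice for in situ optical training, is assumed here purely for illustration. The layer count of five tunable phase layers mirrors the chip's five hidden layers, but the unit count, diffraction model, and toy task are hypothetical.

```python
# Hypothetical sketch of a diffractive ONN with SPGD-style in situ training.
# Not the authors' implementation; a toy model under stated assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, LAYERS = 16, 5  # assumed: 16 diffractive units per layer, 5 hidden layers

# Fixed diffraction between layers, modeled as a random complex coupling
# with column-normalized magnitudes (a crude stand-in for on-chip diffraction).
D = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
D /= np.linalg.norm(D, axis=0)

def forward(x, phases):
    """Propagate an input field through phase-modulating diffractive layers."""
    field = x.astype(complex)
    for phi in phases:            # each hidden layer applies tunable phase shifts
        field = D @ (field * np.exp(1j * phi))
    return np.abs(field) ** 2     # photodetectors measure output intensity

def loss(intensity, target):
    """Cross-entropy on detector intensities normalized to a distribution."""
    p = intensity / intensity.sum()
    return -np.log(p[target] + 1e-12)

def spgd_step(x, target, phases, lr=0.3, sigma=0.05):
    """One SPGD update: probe the loss with +/- random phase perturbations,
    then step along the estimated descent direction. No backpropagation
    through the optics is required, which is what enables in situ training."""
    delta = [sigma * rng.choice([-1.0, 1.0], size=N) for _ in phases]
    l_plus = loss(forward(x, [p + d for p, d in zip(phases, delta)]), target)
    l_minus = loss(forward(x, [p - d for p, d in zip(phases, delta)]), target)
    grad_est = (l_plus - l_minus) / (2 * sigma**2)
    return [p - lr * grad_est * d for p, d in zip(phases, delta)]

# Toy four-class task: route each of four input patterns to its own detector.
inputs = [np.eye(N)[i * 4] + 0.1 for i in range(4)]
phases = [rng.uniform(0, 2 * np.pi, N) for _ in range(LAYERS)]
for epoch in range(200):
    for cls, x in enumerate(inputs):
        phases = spgd_step(x, cls, phases)
print([int(np.argmax(forward(x, phases)[:4])) for x in inputs])  # ideally [0, 1, 2, 3]
```

The key point the sketch captures is that training only ever queries the forward (optical) pass: the loss is measured at the detectors for perturbed phase settings, so the physical chip itself serves as the compute element during training, which is what permits fast in-the-loop convergence without an external digital model of the optics.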