This article presents a bionic visual-audio photodetector (VAPD) that integrates visual and acoustic signal detection with in-sensor perception and preprocessing. The VAPD, based on a vertically stacked graphene-germanium (Gra-Ge) hybrid field-effect phototransistor, captures both light and sound waves, mimicking the human retina and auditory system. Tuning the gate voltage switches the device between positive, negative, and zero photoresponses, which enables functions such as visual feature extraction, object classification, and sound wave manipulation. The VAPD's bidirectional photocurrents (PPC and NPC) arise from the interplay of source-drain and gate leakage currents and mimic synaptic behaviors. The device demonstrates high responsivity, fast response, and stable performance, making it suitable for real-time signal processing and intelligent hardware systems. The article also explores the VAPD's use in constructing convolutional neural networks for target recognition, indicating its promise in practical applications.
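To make the link between gate-tunable photoresponse and visual feature extraction concrete, the following is a minimal sketch and not the device's actual readout scheme. It assumes that each pixel's responsivity can be set to a positive, negative, or zero value and that the pixel photocurrents add up, so that a small group of pixels behaves like a convolution kernel. The function name `in_sensor_convolve`, the kernel values, and the test image are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration: each kernel weight stands in for one pixel's
# gate-tuned responsivity (positive, negative, or zero). All values are
# made up and in arbitrary units; they are not measured device data.

def in_sensor_convolve(image, responsivity):
    """Slide a responsivity kernel over the image and sum the weighted
    light intensities, the way summed pixel photocurrents would add up."""
    kh, kw = len(responsivity), len(responsivity[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            current = 0.0
            for i in range(kh):
                for j in range(kw):
                    current += responsivity[i][j] * image[r + i][c + j]
            row.append(current)
        output.append(row)
    return output


# Vertical-edge kernel: negative response on the left column,
# zero response in the middle, positive response on the right.
EDGE_KERNEL = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# 4x5 test image: dark on the left, bright on the right.
IMAGE = [[0, 0, 1, 1, 1] for _ in range(4)]

feature_map = in_sensor_convolve(IMAGE, EDGE_KERNEL)
print(feature_map)
```

In this toy case the output is nonzero only where the window straddles the dark-to-bright boundary, which is the sense in which signed responsivities can extract an edge feature without a separate processing step.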