20 May 2024 | Abdulwahab Alazeb, Bisma Riaz Chughtai, Naif Al Mudawi, Yahya AlQahtani, Mohammed Alonazi, Hanan Aljuaid, Ahmad Jalal and Hui Liu
The paper presents an innovative remote intelligent perception system for multi-object detection and scene recognition. The system leverages advanced vision technology to address challenges such as semantic understanding, occlusion, and varying illumination. The methodology involves the following key steps (short illustrative code sketches for each step follow the list):
1. **Preprocessing**: Kernel convolution is applied to scene data to enhance image quality.
2. **Semantic Segmentation**: UNet is used to segment objects in the scene.
3. **Feature Extraction**: Discrete Wavelet Transform (DWT), Sobel and Laplacian operators, and Local Binary Pattern (LBP) are employed to extract features from the segmented data.
4. **Object Recognition**: A deep belief network (DBN) is used to recognize the objects based on the extracted features.
5. **Object-to-Object Relation Analysis**: The relationships between recognized objects are analyzed to understand their interactions.
6. **Scene Recognition**: An AlexNet neural network is used to assign labels to the scene based on the recognized objects.
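As a concrete illustration of step 1, the sketch below applies a fixed 3×3 sharpening kernel with SciPy. The summary does not specify the exact kernel, so both the kernel values and the library choice are assumptions for illustration.

```python
# Minimal sketch of kernel-convolution preprocessing on a grayscale image.
# The 3x3 sharpening kernel is an assumption; the paper does not fix a kernel here.
import numpy as np
from scipy.ndimage import convolve

def preprocess(image: np.ndarray) -> np.ndarray:
    """Convolve an 8-bit grayscale image with a sharpening kernel to enhance detail."""
    kernel = np.array([[ 0, -1,  0],
                       [-1,  5, -1],
                       [ 0, -1,  0]], dtype=np.float32)
    enhanced = convolve(image.astype(np.float32), kernel, mode="reflect")
    return np.clip(enhanced, 0, 255).astype(np.uint8)
```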
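For step 2, a UNet segmenter could be set up as below with the `segmentation_models_pytorch` package; the encoder backbone, pretrained weights, and class count are illustrative assumptions rather than the paper's configuration.

```python
# Illustrative UNet setup; hyperparameters are assumptions, not the paper's settings.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet34",      # assumed backbone
    encoder_weights="imagenet",   # pretrained encoder weights
    in_channels=3,
    classes=21,                   # e.g. PASCAL VOC's 20 object classes + background
)

with torch.no_grad():
    x = torch.randn(1, 3, 256, 256)   # dummy RGB image batch
    mask_logits = model(x)            # (1, 21, 256, 256) per-pixel class scores
    mask = mask_logits.argmax(dim=1)  # predicted segmentation mask
```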
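Step 3 combines wavelet, edge, and texture descriptors. The sketch below computes all three with PyWavelets, SciPy, and scikit-image; the `haar` wavelet, the LBP neighbourhood (P=8, R=1), and the way responses are pooled into statistics are assumptions made for illustration.

```python
# Sketch of the three feature families named in step 3 for one grayscale patch.
import numpy as np
import pywt
from scipy import ndimage
from skimage.feature import local_binary_pattern

def extract_features(patch: np.ndarray) -> np.ndarray:
    """Concatenate DWT, Sobel/Laplacian, and LBP descriptors for a segmented patch."""
    # Discrete Wavelet Transform: approximation + detail sub-bands
    cA, (cH, cV, cD) = pywt.dwt2(patch, "haar")
    dwt_feat = np.array([cA.mean(), cH.std(), cV.std(), cD.std()])

    # Edge responses
    sobel_feat = np.array([ndimage.sobel(patch, axis=0).std(),
                           ndimage.sobel(patch, axis=1).std()])
    laplace_feat = np.array([ndimage.laplace(patch.astype(float)).std()])

    # Local Binary Pattern histogram (uniform patterns, 8 neighbours, radius 1)
    lbp = local_binary_pattern(patch, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

    return np.concatenate([dwt_feat, sobel_feat, laplace_feat, lbp_hist])
```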
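A deep belief network (step 4) is commonly built as a stack of restricted Boltzmann machines feeding a supervised head. The sketch below approximates that idea with scikit-learn's `BernoulliRBM` in a pipeline; the layer sizes, learning rates, and logistic-regression classifier are placeholders, not the paper's DBN.

```python
# Rough DBN-style stand-in: two stacked RBMs followed by a classifier.
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

dbn_like = Pipeline([
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20)),
    ("rbm2", BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=20)),
    ("clf",  LogisticRegression(max_iter=1000)),
])

# X: (n_samples, n_features) feature vectors scaled to [0, 1]; y: object labels
# dbn_like.fit(X, y)
# predicted_objects = dbn_like.predict(X_test)
```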
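Step 5 is described only at a high level, so the sketch below shows one generic way to encode object-to-object relations: pairwise centroid distances and relative bearings computed from bounding boxes. The representation is purely illustrative, not the paper's relation model.

```python
# Generic pairwise spatial relations between recognized objects (illustrative only).
import itertools
import math

def pairwise_relations(objects):
    """objects: list of (label, (x_min, y_min, x_max, y_max)) tuples."""
    relations = []
    for (label_a, box_a), (label_b, box_b) in itertools.combinations(objects, 2):
        cx_a, cy_a = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
        cx_b, cy_b = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
        distance = math.hypot(cx_b - cx_a, cy_b - cy_a)               # separation
        angle = math.degrees(math.atan2(cy_b - cy_a, cx_b - cx_a))    # relative bearing
        relations.append((label_a, label_b, distance, angle))
    return relations

# Example with three hypothetical detections
print(pairwise_relations([("person", (10, 20, 60, 120)),
                          ("bicycle", (70, 40, 150, 130)),
                          ("car", (200, 30, 320, 140))]))
```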
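For step 6, an AlexNet classifier can be fine-tuned to output scene labels. The torchvision snippet below replaces the network's final layer; the number of scene classes and the use of ImageNet weights are assumptions for the sketch.

```python
# Illustrative AlexNet-based scene classifier; class count is an assumption.
import torch
import torch.nn as nn
from torchvision.models import alexnet, AlexNet_Weights

num_scene_classes = 10                                      # assumed label set size
model = alexnet(weights=AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, num_scene_classes)    # swap the final layer

with torch.no_grad():
    scene_input = torch.randn(1, 3, 224, 224)   # preprocessed scene image
    scene_logits = model(scene_input)
    scene_label = scene_logits.argmax(dim=1)    # predicted scene label index
```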
The system's performance was validated on three standard datasets: PASCAL VOC 2012, Cityscapes, and Caltech 101, where it achieved accuracies of over 96%, 95.90%, and 92.2%, respectively. The study discusses the contributions, limitations, and future research directions, highlighting the system's effectiveness and potential for real-world applications.