Multimodal fusion for multimedia analysis: a survey


Pradeep K. Atrey · M. Anwar Hossain · Abdulmotaleb El Saddik · Mohan S. Kankanhalli

Received: 8 January 2009 / Accepted: 9 March 2010 / Published online: 4 April 2010
This survey provides an overview of multimodal fusion strategies used in multimedia analysis, focusing on the integration of multiple modalities to enhance various analysis tasks. The paper discusses the existing literature on multimodal fusion, categorizing it into feature-level, decision-level, and hybrid fusion methods. It highlights the advantages and limitations of each approach, including how methods exploit correlation and independence among modalities, confidence levels, contextual information, and synchronization between the different streams. The paper also addresses open issues and future research directions in the field of multimodal fusion. Key topics include the benefits and costs of multimodal fusion, the challenges of asynchrony and varying processing times, and the importance of selecting the appropriate modality for specific tasks. The survey covers a range of multimedia analysis tasks, including semantic concept detection, audio-visual speaker detection, human tracking, and event detection, and provides a detailed analysis of different fusion methods, including rule-based, classification-based, and estimation-based approaches.
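As a concrete illustration of the two main fusion categories the survey covers, the following minimal sketch contrasts feature-level (early) fusion, which concatenates unimodal feature vectors before a single classifier, with decision-level (late) fusion, which combines per-modality classifier scores, here via a weighted sum, one of the simplest rule-based combination strategies. The feature dimensions, scores, and weights are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical unimodal features for one video segment (dimensions invented).
audio_feat = np.array([0.2, 0.7, 0.1])        # e.g., audio energy statistics
visual_feat = np.array([0.9, 0.3, 0.4, 0.8])  # e.g., color/motion descriptors

# Feature-level (early) fusion: concatenate the raw features into a single
# vector, which one downstream classifier would then consume.
fused_features = np.concatenate([audio_feat, visual_feat])

# Decision-level (late) fusion: each modality is classified separately and
# only the resulting confidence scores are combined.
audio_score, visual_score = 0.6, 0.8          # per-modality classifier outputs
weights = {"audio": 0.4, "visual": 0.6}       # illustrative modality weights
fused_score = weights["audio"] * audio_score + weights["visual"] * visual_score

print(fused_features.shape)  # (7,) -- single combined feature vector
print(fused_score)           # 0.72 -- combined decision confidence
```

Early fusion lets a classifier model cross-modal correlations directly but requires synchronized, comparable features; late fusion tolerates asynchrony and differing processing times at the cost of discarding inter-modality correlations, a trade-off the survey examines in detail.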