Understanding Visual Knowledge in the Big Model Era: Retrospect and Prospect

The article "Visual Knowledge in the Big Model Era: Retrospect and Prospect" explores the concept of visual knowledge as a new form of knowledge representation that encapsulates visual concepts and their relationships in a concise, comprehensive, and interpretable manner. Rooted in cognitive psychology, visual knowledge is seen as a critical component of human cognition and intelligence, and is poised to play a pivotal role in the development of machine intelligence. With the advancement of AI techniques, large AI models have emerged as powerful tools capable of extracting patterns from data and abstracting them into a vast number of numeric parameters. To prepare for the next wave of AI development, the article presents a timely review of the origins and development of visual knowledge in the pre-big model era, emphasizing the opportunities and unique role of visual knowledge in the big model era.
Visual knowledge is defined as stable mental representations of visual objects and the commonalities in the inherent rules among various tasks. It is constructed from four essential components: visual concept, visual relation, visual operation, and visual reasoning. Visual concepts are defined by prototype and scope, representing the typical features and variations of a category. Visual relations encompass geometric, temporal, semantic, functional, and causal relations, which are crucial for understanding the connections between visual elements. Visual operations include composition, decomposition, replacement, combination, deformation, motion, comparison, destruction, restoration, and prediction, enabling the manipulation and analysis of visual elements. Visual reasoning involves applying these concepts, relations, and operations to interpret visual data, solve problems, and make informed decisions.
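To make the "prototype and scope" idea and the five relation families more concrete, here is a minimal Python sketch. It is an illustration only, not the representation proposed in the article: the feature-vector encoding of a prototype, the use of a Euclidean distance threshold as the scope, and the names VisualConcept, VisualRelation, RelationKind, and classify are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple
import math


class RelationKind(Enum):
    # The five relation families listed in the summary.
    GEOMETRIC = "geometric"
    TEMPORAL = "temporal"
    SEMANTIC = "semantic"
    FUNCTIONAL = "functional"
    CAUSAL = "causal"


@dataclass(frozen=True)
class VisualConcept:
    """A category described by a typical instance and its allowed variation."""
    name: str
    prototype: Tuple[float, ...]  # assumed encoding: a small feature vector
    scope: float                  # assumed encoding: max distance from the prototype

    def contains(self, features: Sequence[float]) -> bool:
        # An observation belongs to the concept if it lies within the scope.
        return math.dist(self.prototype, features) <= self.scope


@dataclass(frozen=True)
class VisualRelation:
    """A typed link between two named concepts."""
    kind: RelationKind
    subject: str
    obj: str


def classify(features: Sequence[float],
             concepts: Sequence[VisualConcept]) -> Optional[str]:
    """Return the closest concept whose scope covers the features, else None."""
    candidates = [c for c in concepts if c.contains(features)]
    if not candidates:
        return None
    best = min(candidates, key=lambda c: math.dist(c.prototype, features))
    return best.name


if __name__ == "__main__":
    # Hypothetical two-dimensional features, chosen only for the demo.
    cup = VisualConcept("cup", (1.0, 0.5), scope=0.3)
    bowl = VisualConcept("bowl", (2.0, 0.4), scope=0.5)
    holds = VisualRelation(RelationKind.FUNCTIONAL, "cup", "liquid")
    print(classify((1.1, 0.6), [cup, bowl]))
    print(holds)
```

In this sketch a simple form of visual reasoning reduces to a lookup: an observation is matched to the nearest prototype whose scope covers it, and typed relations can then be followed from that concept. A distance threshold is only one of many possible ways to model scope.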
The article also discusses the challenges and limitations of large AI models, including their opacity, high data and computational demands, and susceptibility to generating nonsensical content. It argues that visual knowledge can help alleviate these issues by providing a more interpretable and expressive representation of visual concepts. The article highlights the importance of visual knowledge in enhancing the trust, interpretability, and accountability of AI systems, and suggests that future research should focus on developing techniques to build visual knowledge using large-scale statistical learning. The study also identifies promising directions for the development of more powerful AI systems that leverage the synergies between visual knowledge and big models to overcome their individual weaknesses. The article concludes that visual knowledge has the potential to play a pivotal role in the next generation of AI, providing a framework for understanding and interacting with the visual world in a more dynamic and meaningful way.

Visual Knowledge in the Big Model Era: Retrospect and Prospect

5 Apr 2024 | Wenguan WANG, Yi YANG, Yunhe PAN