[slides and audio] Pix2Code%3A Learning to Compose Neural Visual Concepts as Programs

Pix2Code is a neuro-symbolic framework that combines neural and program synthesis to learn generalizable, interpretable, and revisable visual concepts. The framework extracts symbolic object representations from images and synthesizes relational concepts as λ-calculus programs. It evaluates its ability to identify compositional visual concepts that generalize to novel data and concept configurations. Unlike neural approaches, Pix2Code's representations remain human interpretable and can be revised for improved performance. The framework uses both neural and program synthesis components to integrate the power of neural representations with the generalizability and readability of program representations. During inference, Pix2Code extracts symbolic object representations from raw image inputs and synthesizes λ-calculus programs that serve as concept classifiers. Pix2Code learns to abstract visual concepts by training both a generative program library and a program recognition model based on wake-sleep learning. The framework is evaluated on the Kandinsky Patterns and CURI datasets, showing its ability to generalize to unseen concept combinations and entity generalization. Pix2Code's concept representations are interpretable and can be revised via human guidance. The framework is also tested on real-world data, demonstrating its ability to abstract concepts from real-world images. Overall, Pix2Code provides a competitive alternative to purely neural approaches in terms of generalizability, interpretability, and revisability.Pix2Code is a neuro-symbolic framework that combines neural and program synthesis to learn generalizable, interpretable, and revisable visual concepts. The framework extracts symbolic object representations from images and synthesizes relational concepts as λ-calculus programs. It evaluates its ability to identify compositional visual concepts that generalize to novel data and concept configurations. Unlike neural approaches, Pix2Code's representations remain human interpretable and can be revised for improved performance. The framework uses both neural and program synthesis components to integrate the power of neural representations with the generalizability and readability of program representations. During inference, Pix2Code extracts symbolic object representations from raw image inputs and synthesizes λ-calculus programs that serve as concept classifiers. Pix2Code learns to abstract visual concepts by training both a generative program library and a program recognition model based on wake-sleep learning. The framework is evaluated on the Kandinsky Patterns and CURI datasets, showing its ability to generalize to unseen concept combinations and entity generalization. Pix2Code's concept representations are interpretable and can be revised via human guidance. The framework is also tested on real-world data, demonstrating its ability to abstract concepts from real-world images. Overall, Pix2Code provides a competitive alternative to purely neural approaches in terms of generalizability, interpretability, and revisability.

Pix2Code: Learning to Compose Neural Visual Concepts as Programs

2024 | Antonia Wüst, Wolfgang Stammer, Quentin Delfosse, Devendra Singh Dhami, Kristian Kersting