6 Jul 2024 | Antonia Wüst, Wolfgang Stammer, Quentin Delfosse, Devendra Singh Dhami, Kristian Kersting
The paper introduces Pix2Code, a neuro-symbolic framework designed to learn generalizable, inspectable, and revisable visual concepts from images. Pix2Code combines neural representations with program synthesis to integrate the strengths of both approaches. It extracts symbolic object representations from images and synthesizes λ-calculus programs to classify and interpret visual concepts. The framework is evaluated on challenging datasets such as Kandinsky Patterns and CURI, where it demonstrates generalization to novel concept combinations as well as entity generalization. Pix2Code's representations are human-interpretable and can be easily revised, addressing limitations in existing concept learning benchmarks.
The paper also discusses the interpretability and revisability of Pix2Code's concept representations, showing that they can be translated into natural language statements and revised to correct suboptimal behavior. Overall, Pix2Code offers a competitive alternative to purely neural approaches in visual concept learning, with advantages in generalizability, interpretability, and revisability.
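To make the idea of a program-based concept more concrete, the following is a minimal sketch in Python, not the authors' implementation or DSL. It assumes a scene has already been converted into a list of symbolic objects, and every name in it (`Obj`, `has`, `both`, `exists`, `forall`) is a hypothetical primitive chosen for illustration. A concept is a composition of small functions that maps a scene to true or false, and a revision is an edit to that composition.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Obj:
    """Hypothetical symbolic object, standing in for the output of an object extractor."""
    shape: str
    color: str
    size: str


Scene = List[Obj]


# Hypothetical primitives (assumptions for illustration, not the paper's DSL).
def has(attr: str, value: str) -> Callable[[Obj], bool]:
    """Predicate: the object's attribute equals the given value."""
    return lambda obj: getattr(obj, attr) == value


def both(p: Callable, q: Callable) -> Callable:
    """Conjunction of two predicates over the same argument."""
    return lambda x: p(x) and q(x)


def exists(pred: Callable[[Obj], bool]) -> Callable[[Scene], bool]:
    """True if at least one object in the scene satisfies the predicate."""
    return lambda scene: any(pred(obj) for obj in scene)


def forall(pred: Callable[[Obj], bool]) -> Callable[[Scene], bool]:
    """True if every object in the scene satisfies the predicate."""
    return lambda scene: all(pred(obj) for obj in scene)


# A concept as a program: "the scene contains a red triangle".
red_triangle = exists(both(has("color", "red"), has("shape", "triangle")))

# A revision: additionally require that every object is small.
revised = both(red_triangle, forall(has("size", "small")))

scene_a = [Obj("triangle", "red", "small"), Obj("square", "blue", "big")]
scene_b = [Obj("triangle", "red", "small"), Obj("circle", "blue", "small")]

print(red_triangle(scene_a), revised(scene_a), revised(scene_b))
```

Because the concept is an explicit composition of named primitives, it can be read off as a statement ("there exists an object that is red and a triangle") and changed by editing one term, which is the kind of inspectability and revisability the summary refers to.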