IsoBench is a benchmark dataset designed to evaluate multimodal foundation models on isomorphic representations of problems across four domains: math, science, algorithms, and games (including chess). Each problem is presented in multiple isomorphic forms, such as visual, textual, and mathematical representations, and the dataset's 1,887 samples provide fine-grained feedback for diagnosing performance gaps caused by the form of the input representation.

Across the models evaluated, a consistent preference for textual inputs emerges: Claude-3 Opus performs 28.7 points worse when given images instead of text, GPT-4 Turbo performs 18.7 points worse, and Gemini Pro performs 14.9 points worse. This bias toward text runs contrary to human cognition, where visual representations often aid problem solving, and it demonstrates that multimodal models can perform very differently on inputs that are semantically identical.

To narrow these gaps, two prompting techniques are introduced. IsoCombination (IsoCB) presents several isomorphic representations of the same problem together in a single query, while IsoScratchPad (IsoSP) first translates a visual input into a textual representation and then answers from the text. Both improve performance on isomorphic tasks: IsoCB improves graph-algorithm performance by up to 9.4 points, and IsoSP improves science performance by up to 14.4 points.

Overall, IsoBench provides a comprehensive evaluation of multimodal foundation models, identifies performance discrepancies between input modalities, suggests that multimodal fusion alone may not be sufficient for certain tasks, and highlights the importance of input representation in model performance.
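
To make the two prompting strategies concrete, here is a minimal Python sketch of how they might be wired up. Note that `query_model` is a hypothetical placeholder for any multimodal model API, and the prompt wording is illustrative rather than taken from the paper.

```python
def query_model(prompt: str, image: bytes | None = None) -> str:
    """Stub standing in for a call to a multimodal foundation model API."""
    raise NotImplementedError("Wire this to a model API of your choice.")


def iso_combination(question: str, image: bytes, text_repr: str) -> str:
    # IsoCombination: present several isomorphic representations of the
    # same problem in one query, so the model can cross-reference them.
    prompt = (
        f"{question}\n\n"
        "The problem is shown in the attached image and, equivalently, "
        f"as the following textual representation:\n{text_repr}"
    )
    return query_model(prompt, image=image)


def iso_scratchpad(question: str, image: bytes) -> str:
    # IsoScratchPad: first translate the visual input into a textual
    # representation (the "scratchpad"), then answer from text alone,
    # the modality the models are observed to prefer.
    translation = query_model(
        "Describe the problem in this image as a precise textual "
        "representation (e.g., an adjacency list for a graph).",
        image=image,
    )
    return query_model(f"{question}\n\nProblem description:\n{translation}")
```

Under this reading, IsoCB trades prompt length for redundancy across modalities, while IsoSP decomposes the task into a perception step followed by a text-only reasoning step.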