Generating Automatic Feedback on UI Mockups with Large Language Models


Peitong Duan, Jeremy Warner, Yang Li, Bjoern Hartmann
This paper presents a Figma plugin that uses GPT-4 to automatically generate feedback on UI mockups based on design guidelines. The plugin lets designers evaluate their UIs iteratively, receiving text-based feedback that helps them revise their designs. The feedback is generated by querying GPT-4 with a JSON representation of the UI and the text of the guidelines; the LLM returns a set of detected guideline violations, phrased as constructive suggestions for improving the UI. Designers can also rate each generated suggestion, and this feedback is incorporated into future evaluations.

A study evaluated GPT-4's performance at conducting heuristic evaluations on 51 UIs, comparing its feedback with that of human experts. The results showed that GPT-4 was generally accurate and helpful at identifying issues in poor UI designs, but its performance degraded over iterations of edits that improved the design. Performance also varied by guideline: GPT-4 did well on straightforward checks and worse when the JSON differed from what was visually or semantically depicted in the UI.

Despite some inaccuracies, most participants found the tool useful for their design practice, as it caught subtle errors, improved the UI's text, and reasoned about the UI's semantics. They noted that GPT-4's errors are not dangerous, since a human in the loop can catch them, and suggested various use cases for the tool.
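The query flow described above — serializing the UI to JSON and pairing it with the guideline text in a single prompt — could be sketched roughly as follows. This is a minimal illustration, not the paper's actual prompt or plugin code; the function name, prompt wording, and toy mockup structure are all assumptions.

```python
import json

def build_evaluation_prompt(ui_json: dict, guidelines: list[str]) -> list[dict]:
    """Assemble a chat prompt asking an LLM to report guideline violations
    in a UI mockup as constructive suggestions (illustrative sketch)."""
    guideline_text = "\n".join(f"{i + 1}. {g}" for i, g in enumerate(guidelines))
    system = (
        "You are a UI design expert. Evaluate the UI mockup below against "
        "each design guideline and report every violation as a constructive "
        "suggestion for improving the UI."
    )
    user = (
        f"Design guidelines:\n{guideline_text}\n\n"
        f"UI mockup (JSON):\n{json.dumps(ui_json, indent=2)}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

# Toy mockup: a frame with one small-text button label
mockup = {
    "type": "FRAME",
    "name": "Login",
    "children": [{"type": "TEXT", "characters": "Submit", "fontSize": 8}],
}
messages = build_evaluation_prompt(
    mockup, ["Text should be legible (font size at least 12px)."]
)
```

The resulting `messages` list would then be sent to the model (e.g. via a chat-completions API call), and the returned violations displayed in the plugin alongside the mockup.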
The contributions of this work include: a Figma plugin that uses GPT-4 to automate heuristic evaluation of UI mockups against arbitrary design guidelines; an investigation of GPT-4's capability to automate heuristic evaluations, through a study in which three human raters assessed the accuracy and helpfulness of LLM-generated design suggestions for 51 UIs; a comparison of the violations found by this tool with those identified by human experts; and an exploration of how such a tool can fit into existing design practice, via a study in which 12 design experts used the plugin to iteratively refine UIs, assessed the LLM-generated feedback, and discussed their experiences working with it.