June 5, 2024 | Akshay Badagabettu, Sai Sravan Yarlagadda, Amir Barati Farimani
Query2CAD is a novel framework that generates CAD models from natural language queries. It uses a large language model (LLM) to generate executable CAD macros and improves the resulting models through self-refinement loops. It requires no supervised data or additional training: the same LLM acts as both generator and refiner. The system is designed to be user-friendly, so that non-experts can describe the part they want in plain language.

The architecture contains two loops. The error-refinement loop handles macros that fail to execute. The model-refinement loop checks whether the generated model matches the query: the refiner uses BLIP2 to generate captions of the model and incorporates human-in-the-loop feedback to address false negatives.

The authors developed a dataset of CAD design operations, comprising 57 user queries, to evaluate the framework. With GPT-4 Turbo as the LLM, the success rate was 53.6% on the first attempt and rose to 76.7% after refinement, a gain of 23.1 percentage points. The first refinement improves the success rate significantly, while subsequent refinements have less impact. Accuracy by difficulty was 95.23% on easy queries, 70% on medium, and 41% on hard. Alignment between each generated model and its query is measured with the Visual Question Answering Score (VQAScore). These results indicate that the framework can generate accurate CAD models from natural language queries.

The framework is open source; the data, model, and code are publicly available.
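The Python sketch below shows one way the generate, error-refinement, and model-refinement steps described above could fit together. It is not the Query2CAD implementation. The callables `llm`, `execute`, `caption`, and `judge`, the prompt wording, and the retry limits are all hypothetical. In particular, `judge` collapses the refiner's check of the BLIP2 caption and the human reviewer into a single function, which is an assumption made only for illustration. Toy stubs stand in for GPT-4 Turbo, the macro runner, BLIP2, and the human so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RefinementResult:
    macro: str        # last macro produced
    success: bool     # True if the judge accepted the model
    llm_calls: int    # number of generator/refiner calls made
    history: List[str] = field(default_factory=list)  # errors and feedback seen


def query_to_cad(
    query: str,
    llm: Callable[[str], str],                      # prompt -> macro text (hypothetical)
    execute: Callable[[str], Tuple[bool, str]],     # macro -> (ok, render or error message)
    caption: Callable[[str], str],                  # render -> caption (the role BLIP2 plays)
    judge: Callable[[str, str], Tuple[bool, str]],  # (query, caption) -> (accepted, feedback)
    max_error_fixes: int = 3,
    max_model_refinements: int = 3,
) -> RefinementResult:
    """Hypothetical generate/refine loop; not the Query2CAD code."""
    history: List[str] = []
    macro = llm(f"Write a CAD macro for: {query}")
    calls = 1
    for round_idx in range(max_model_refinements + 1):
        # Error-refinement loop: feed execution errors back to the LLM.
        ok, output = execute(macro)
        fixes = 0
        while not ok and fixes < max_error_fixes:
            history.append(f"error: {output}")
            macro = llm(f"The macro failed with: {output}\nFix it:\n{macro}")
            calls += 1
            fixes += 1
            ok, output = execute(macro)
        if not ok:
            return RefinementResult(macro, False, calls, history)
        # Model-refinement loop: caption the render and ask the judge.
        description = caption(output)
        accepted, feedback = judge(query, description)
        if accepted:
            return RefinementResult(macro, True, calls, history)
        if round_idx == max_model_refinements:
            break  # refinement budget exhausted
        history.append(f"feedback: {feedback}")
        macro = llm(
            f"Query: {query}\nThe model currently looks like: {description}\n"
            f"Feedback: {feedback}\nRevise the macro:\n{macro}"
        )
        calls += 1
    return RefinementResult(macro, False, calls, history)


# Toy stand-ins (hypothetical) so the sketch runs without an LLM, CAD tool, or BLIP2.
def fake_llm(prompt: str) -> str:
    if "Feedback:" in prompt:
        return "make_box(10, 10, 10); cut_hole(r=2)"
    if "failed" in prompt:
        return "make_box(10, 10, 10)"
    return "make_box(10, 10"  # first draft has unbalanced parentheses


def fake_execute(macro: str) -> Tuple[bool, str]:
    if macro.count("(") != macro.count(")"):
        return False, "SyntaxError: unbalanced parentheses"
    return True, f"render of {macro}"


def fake_caption(render: str) -> str:
    return "a cube with a hole" if "cut_hole" in render else "a cube"


def fake_judge(query: str, description: str) -> Tuple[bool, str]:
    return "hole" in description, "add a hole through the center"


result = query_to_cad("a 10 mm cube with a 2 mm hole",
                      fake_llm, fake_execute, fake_caption, fake_judge)
print(result.success, result.llm_calls)  # True 3
```

In this run the first draft fails to execute and is repaired by the error loop. The repaired model is then captioned as "a cube" and rejected, and one model refinement produces an accepted result. Capping both loops keeps the number of LLM calls bounded; the default limits of 3 are arbitrary placeholders, not values from the source.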