21 Mar 2024 | JunLiang Ye1,2†, Fangfu Liu1†, Qixiu Li1, Zhengyi Wang1,2, Yikai Wang1, Xinzhou Wang1,2, Yueqi Duan1✉, and Jun Zhu1,2✉
The paper "DreamReward: Text-to-3D Generation with Human Preference" addresses the challenge of aligning 3D content generation with human preferences. The authors propose a comprehensive framework called DreamReward, which includes the construction of a labeled 3D dataset, the development of a human preference reward model (Reward3D), and an optimization algorithm (DreamFL) to enhance 3D generation results.
1. **Dataset Construction**: The authors collect 25k expert comparisons and build a diverse 3D dataset suitable for training and testing models aligned with human preferences. They use a clustering algorithm to extract 5k representative prompts, generate the corresponding 3D content, and filter the dataset by quality (a prompt-clustering sketch appears after this list).
2. **Reward3D Model**: Trained on the expert comparisons, this model scores generated 3D content according to human preference and can effectively distinguish better from worse 3D results for the same prompt (a minimal preference-loss sketch also follows this list).
3. **DreamFL Algorithm**: Building on Reward3D, DreamFL is a direct tuning algorithm that optimizes multi-view diffusion models using a redefined scorer. It incorporates the Reward3D model into the Score Distillation Sampling (SDS) loss so that the optimized 3D content better matches human preferences (a reward-guided SDS sketch also follows this list).
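To make the prompt-selection step in (1) concrete, here is a minimal sketch of choosing representative prompts by clustering prompt embeddings. The function name, the use of scikit-learn's KMeans, and the assumption that embeddings are precomputed are illustrative choices, not the paper's actual pipeline.

```python
# Illustrative sketch: pick representative prompts by clustering prompt embeddings.
# Assumes `prompts` (list of strings) and `embeddings` (N x D NumPy array) are
# already available; the paper's actual clustering setup may differ.
import numpy as np
from sklearn.cluster import KMeans

def select_representative_prompts(prompts, embeddings, k=5000):
    kmeans = KMeans(n_clusters=k, random_state=0).fit(embeddings)
    representatives = []
    for c in range(k):
        members = np.where(kmeans.labels_ == c)[0]
        if members.size == 0:
            continue
        # keep the prompt whose embedding is closest to the cluster centroid
        dists = np.linalg.norm(embeddings[members] - kmeans.cluster_centers_[c], axis=1)
        representatives.append(prompts[members[np.argmin(dists)]])
    return representatives
```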
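For (2), reward models of this kind are commonly trained with a pairwise, Bradley-Terry-style objective on preferred/rejected pairs. The sketch below assumes a hypothetical `reward_model` that maps rendered views and prompt tokens to one scalar score per sample; Reward3D's actual inputs and architecture may differ.

```python
# Minimal pairwise preference loss (Bradley-Terry style) for a reward model.
# `reward_model(views, prompt_tokens)` is a hypothetical scorer returning one
# scalar per sample; Reward3D's real inputs (e.g. multi-view renders) may differ.
import torch.nn.functional as F

def preference_loss(reward_model, prompt_tokens, preferred_views, rejected_views):
    r_pos = reward_model(preferred_views, prompt_tokens)  # shape: (batch,)
    r_neg = reward_model(rejected_views, prompt_tokens)   # shape: (batch,)
    # maximize the log-probability that the preferred sample scores higher
    return -F.logsigmoid(r_pos - r_neg).mean()
```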
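For (3), the sketch below shows one rough way a differentiable reward can be folded into an SDS-style update: compute the usual noise residual, then add a pull toward higher reward. All objects here (`renderer`, `diffusion`, `reward_model`) are hypothetical stand-ins, and the exact DreamFL weighting and timestep handling follow the paper's derivation rather than this sketch.

```python
# Rough, illustrative sketch of a reward-guided SDS-style update. The exact DreamFL
# formulation (weighting w(t), timestep schedule, reward term) is given in the paper.
import torch

def reward_guided_sds_step(renderer, diffusion, reward_model, prompt_emb, lambda_reward=1.0):
    x = renderer()                                    # differentiable render of the 3D scene
    t = torch.randint(20, 980, (1,), device=x.device)
    noise = torch.randn_like(x)
    x_t = diffusion.add_noise(x, noise, t)            # forward-diffuse the rendered view
    with torch.no_grad():
        eps_pred = diffusion.predict_noise(x_t, t, prompt_emb)
    # gradient of the reward with respect to the render, used to steer the update
    reward = reward_model(x, prompt_emb).sum()
    grad_reward = torch.autograd.grad(reward, x, retain_graph=True)[0]
    # standard SDS residual (eps_pred - noise) minus a pull toward higher reward
    grad = (eps_pred - noise) - lambda_reward * grad_reward
    x.backward(gradient=grad.detach())                # accumulates into the 3D parameters
```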
The paper demonstrates the effectiveness of DreamReward through extensive experiments, showing that it generates high-fidelity, 3D-consistent results with significantly better alignment to human intent. The authors also provide a detailed derivation of the DreamFL algorithm and discuss its implementation details; the standard SDS gradient that this derivation starts from is recalled below. Overall, the work highlights the potential of learning from human feedback to enhance text-to-3D models.
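For reference, the standard Score Distillation Sampling gradient that the DreamFL derivation builds on is written below; the reward-dependent correction DreamFL adds on top of it is given in the paper and is only gestured at in the sketch above.

```latex
% Standard SDS gradient for 3D parameters \theta, differentiable render x = g(\theta),
% text prompt y, and diffusion noise predictor \epsilon_\phi.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x_t = \sqrt{\bar\alpha_t}\,x + \sqrt{1-\bar\alpha_t}\,\epsilon .
```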