27 Mar 2024 | Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard
The paper introduces Real Acoustic Fields (RAF), a new dataset that captures real acoustic room data from multiple modalities, including high-quality and densely sampled room impulse response (RIR) data paired with multi-view images and precise 6DoF pose tracking data for sound emitters and listeners. The dataset is designed to evaluate existing methods for novel-view acoustic synthesis and impulse response generation, which have previously relied on synthetic data. The authors conducted a systematic evaluation of audio and audio-visual models using this dataset, assessing their performance against multiple criteria and proposing settings to enhance their performance on real-world data. They also investigated the impact of incorporating visual data into neural acoustic field models and demonstrated the effectiveness of a simple "sim2real" approach, where a model is pre-trained with simulated data and fine-tuned with sparse real-world data, leading to significant improvements in few-shot learning. RAF is the first dataset to provide densely captured room acoustic data, making it an ideal resource for researchers working on audio and audio-visual neural acoustic field modeling techniques. The dataset and benchmark are available on the project page.
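The sim2real recipe described above — pre-train on plentiful simulated data, then fine-tune on a handful of real measurements with a smaller learning rate — can be sketched as follows. This is a minimal illustration, not the paper's method: the MLP architecture, tensor shapes, hyperparameters, and random placeholder data are all assumptions made for the example.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a neural acoustic field: maps a 6-D
# emitter/listener pose vector to a fixed-length impulse-response
# embedding. Purely illustrative; not the models used in RAF.
class AcousticFieldMLP(nn.Module):
    def __init__(self, in_dim=6, hidden=64, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def train(model, x, y, lr, steps):
    """Run a simple MSE regression loop and return the final loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    loss = None
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

torch.manual_seed(0)
model = AcousticFieldMLP()

# Stage 1 ("sim"): pre-train on abundant simulated pose/RIR pairs
# (random tensors here stand in for simulator output).
sim_x, sim_y = torch.randn(1024, 6), torch.randn(1024, 128)
train(model, sim_x, sim_y, lr=1e-3, steps=200)

# Stage 2 ("real"): fine-tune on sparse real-world measurements
# (few-shot), with a lower learning rate so the fine-tuning
# adapts rather than overwrites the pre-trained weights.
real_x, real_y = torch.randn(16, 6), torch.randn(16, 128)
final_loss = train(model, real_x, real_y, lr=1e-4, steps=50)
print(f"fine-tune loss: {final_loss:.4f}")
```

The key design choice is the two-stage schedule: the simulated stage gives the model a broad prior over the pose-to-acoustics mapping, so the real stage can succeed with far fewer samples than training from scratch would require.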