May 11–16, 2024, Honolulu, HI, USA | Tzu-Sheng Kuo, Aaron Halfaker, Zirui Cheng, Jiwoo Kim, Meng-Hsin Wu, Tongshuang Wu, Kenneth Holstein, Haiyi Zhu
**Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia**
This paper introduces Wikibench, a system designed to enable community-driven data curation for AI evaluation on Wikipedia. The system aims to address the limitations of datasets created by developers and annotators outside the community, which can lead to misleading conclusions about AI performance. Wikibench allows community members to collaboratively select and label data points, resolve disagreements, and collectively decide on primary labels. A field study on Wikipedia demonstrates that datasets curated using Wikibench effectively capture community consensus, disagreement, and uncertainty. Participants used Wikibench to shape the overall data curation process, including refining label definitions, determining data inclusion criteria, and authoring data statements. The findings highlight the potential of community-driven data curation and provide future directions for HCI systems supporting this approach.
**Key Contributions:**
1. **System:** Wikibench is the first system designed to support community-driven curation of AI datasets.
2. **Field Study:** Findings from a field study on Wikipedia show how community members interact with Wikibench to collaboratively curate evaluation datasets.
3. **Future Directions:** The paper proposes future directions for HCI systems that support community-driven data curation within and beyond the context of Wikipedia.
**Design Requirements:**
1. **Community Leadership:** The data curation process should be led by the community and follow its established norms.
2. **Deliberation:** Encourage deliberation to surface disagreements, build consensus, and promote shared understanding.
3. **Embedding in Workflows:** The data curation process should be integrated into existing workflows to reduce duplication of effort.
4. **Transparency:** The data curation process should be public and transparent to community members.
**Wikibench Features:**
- **Plug-in:** Allows community members to select and label new data points during their regular activities on Wikipedia.
- **Entity Page:** Displays individual labels and facilitates discussions and (re-)labeling.
- **Campaign Page:** Shows the entire dataset and enables discussions about the overall data curation process.
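To make the relationship between the plug-in, entity pages, and the campaign page more concrete, here is a minimal sketch of a data model that could sit behind them. It is an illustration under stated assumptions, not Wikibench's actual schema. The class names `Entity` and `Campaign`, every field and method, and the label values `"damaging"` / `"not damaging"` are hypothetical. The sketch keeps every individual label instead of collapsing them, so disagreement stays visible. It also records the primary label only as an explicit decision, because the summary says the community decides that label collectively.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One data point, e.g. a single Wikipedia edit (hypothetical schema)."""
    entity_id: str
    individual_labels: dict[str, str] = field(default_factory=dict)  # user -> label
    discussion: list[str] = field(default_factory=list)  # comments on the entity page
    primary_label: str | None = None  # recorded only after community deliberation

    def add_label(self, user: str, label: str) -> None:
        # Re-labeling by the same user replaces that user's earlier label.
        self.individual_labels[user] = label

    def disagreement(self) -> float:
        """Fraction of labels that differ from the most common label."""
        counts = Counter(self.individual_labels.values())
        total = sum(counts.values())
        if total == 0:
            return 0.0
        return 1 - counts.most_common(1)[0][1] / total

    def set_primary_label(self, label: str) -> None:
        # Deliberately not a majority vote: the community decides the
        # primary label, and this method only records that decision.
        self.primary_label = label


@dataclass
class Campaign:
    """The whole dataset, as listed on a campaign page (hypothetical)."""
    name: str
    entities: dict[str, Entity] = field(default_factory=dict)

    def entity(self, entity_id: str) -> Entity:
        # Create the entity the first time anyone labels it (e.g. via the plug-in).
        return self.entities.setdefault(entity_id, Entity(entity_id))

    def contested(self) -> list[str]:
        """IDs of entities whose individual labels do not all agree."""
        return [eid for eid, e in self.entities.items() if e.disagreement() > 0]


campaign = Campaign("edit-quality")  # hypothetical campaign name
campaign.entity("rev-1").add_label("alice", "damaging")
campaign.entity("rev-1").add_label("bob", "not damaging")
campaign.entity("rev-2").add_label("alice", "not damaging")
print(campaign.contested())  # ['rev-1']
```

In this sketch, `contested()` gives a campaign-level view of where deliberation is still needed. The per-entity label distribution and discussion correspond to what an entity page would display.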
**Evaluation:**
- **Field Study:** Observed how Wikipedians use Wikibench in their regular activities.
- **Validation Study:** Compared labels generated through Wikibench with those generated through Wikipedia’s standard consensus-building process.
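The summary does not say how the validation study measured correspondence between the two labeling processes. The sketch below assumes simple percent agreement over entities labeled by both processes, which is one plausible choice; the paper's actual metric may differ. The function name `compare_labels` and the sample data are hypothetical.

```python
from __future__ import annotations


def compare_labels(
    wikibench: dict[str, str], reference: dict[str, str]
) -> tuple[float, list[str]]:
    """Return (agreement rate, IDs of mismatched entities) over shared entities."""
    # Only entities labeled by both processes can be compared.
    shared = wikibench.keys() & reference.keys()
    if not shared:
        raise ValueError("no entity is labeled by both processes")
    mismatches = sorted(e for e in shared if wikibench[e] != reference[e])
    return 1 - len(mismatches) / len(shared), mismatches


# Hypothetical primary labels from Wikibench and from the standard process.
wikibench_labels = {"rev-1": "damaging", "rev-2": "not damaging", "rev-3": "damaging"}
reference_labels = {"rev-1": "damaging", "rev-2": "damaging", "rev-3": "damaging",
                    "rev-4": "not damaging"}

rate, mismatches = compare_labels(wikibench_labels, reference_labels)
print(f"{rate:.2f}", mismatches)  # 0.67 ['rev-2']
```

Returning the mismatched IDs alongside the rate supports inspecting individual disagreements, not just reporting a single number.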
**Conclusion:**
The paper demonstrates the potential of community-driven data curation and provides insights into how Wikibench can support this process.