Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach

2024-05-14 | Irina Jurenka, Markus Kunesch, Kevin R. McKeel, Daniel Gillick, Shaojian Zhu, Sara Wilberger, Shubham Milind Phal, Katherine Hermann, Daniel Kasenberg, Avishkar Bhopchand, Ankit Anand, Miruna Pislar, Stephanie Chan, Lisa Wang, Jennifer She, Parsa Mahmoudieh, Aliya Rysbek, Wei-Jen Ko, Andrea Huber, Brett Wiltshire, Gal Elidan, Roni Rabin, Jasmin Rubinovitz, Amit Pitaru, Mac McAllister, Julia Wilkowski, David Choi, Roe Engelberg, Lidan Hackmon, Adva Levin, Rachel Griffin, Michael Sears, Filip Bar, Mia Mesar, Mana Jabbour, Arslan Chaudhry, James Cohan, Sridhar Thiagarajan, Nir Levine, Ben Brown, Dilan Gorur, Svetlana Grant, Rachel Hashimshoni, Laura Weidinger, Jieru Hu, Dawn Chen, Kuba Dolecki, Canfer Akbulut, Maxwell Bileschi, Laura Culp, Wen-Xin Dong, Nahema Marchal, Kelsie Van Deman, Hema Bajaj Misra, Michael Duah, Moran Ambar, Avi Caciularu, Sandra Lefdal, Chris Summerfield, James An, Pierre-Alexandre Kamienny, Abhinit Mohdi, Theofilos Strinopoulos, Annie Hale, Wayne Anderson, Luis C. Cobo, Niv Efron, Muktha Ananda, Shakir Mohamed, Maureen Heymans, Zoubin Ghahramani, Yossi Matias, Ben Gomes and Lila Ibrahim
This paper presents a responsible development approach for generative AI (gen AI) in education, focusing on creating a comprehensive evaluation framework to improve the pedagogical capabilities of AI tutors. The authors collaborated with learners and educators to translate learning science principles into seven diverse educational benchmarks, spanning quantitative, qualitative, automatic, and human evaluations. They developed LearnLM-Tutor, a fine-tuned version of Gemini 1.0, which was consistently preferred by educators and learners on multiple pedagogical dimensions. The paper highlights the challenges of translating pedagogical intuitions into gen AI prompts and the lack of good evaluation practices, which hinder the development of effective AI tutors. The authors propose a participatory approach, involving learners, educators, and researchers in the design and evaluation of AI tutors. They also discuss the limitations of current evaluation practices and the need for better metrics to measure pedagogical success. The paper emphasizes the importance of a shared framework across learning science, EdTech, and AI for education to enable progress. The authors also discuss the ethical and safety implications of their work, emphasizing the need for education-specific interventions. The paper concludes with a call for collaboration among stakeholders in research, EdTech, ethics, policy, and education to establish common guidelines, benchmarks, and working principles for the responsible development of AI for education.