30 May 2024 | Zekun Li, Zhiyu Zoey Chen, Mike Ross, Patrick Huber, Seungwhan Moon, Zhaojiang Lin, Luna Dong, Adithya Sagar, Xifeng Yan, and Paul A. Crook
This paper introduces a novel approach called FnCTOD, which integrates function calling into large language models (LLMs) to enhance zero-shot dialogue state tracking (DST) in task-oriented dialogues (TOD). The method treats each domain as a distinct function and the slot values within the domain as arguments, allowing LLMs to generate function calls along with responses. Experimental results on the MultiWOZ benchmark demonstrate that FnCTOD enables modestly sized open-source LLMs to achieve or surpass the performance of advanced proprietary models like ChatGPT, with improvements of up to 14% in average joint goal accuracy (JGA). The approach also shows that fine-tuning a 13B parameter LLaMA2-Chat model on a small collection of diverse task-oriented dialogues can equip it with function-calling capabilities and DST performance comparable to ChatGPT while maintaining its chat capabilities. The paper highlights the effectiveness of in-context prompting and the benefits of function call decomposition, providing a comprehensive evaluation of the approach's performance and limitations.
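The domain-as-function idea can be sketched as follows: a minimal, hypothetical example in which a MultiWOZ-style "hotel" domain is expressed as a function schema with slots as arguments, and a model-emitted function call is parsed back into domain-slot pairs for DST. The schema fields, slot names, and the `<fn_call>` tag format are illustrative assumptions, not the paper's exact specification.

```python
import json

# Hypothetical function schema for a MultiWOZ-style "hotel" domain:
# the domain becomes the function name, each slot an argument.
HOTEL_FN = {
    "name": "hotel",
    "description": "Track the user's hotel booking constraints.",
    "parameters": {
        "type": "object",
        "properties": {
            "area": {"type": "string", "description": "part of town, e.g. centre"},
            "pricerange": {"type": "string", "enum": ["cheap", "moderate", "expensive"]},
            "stars": {"type": "string", "description": "star rating of the hotel"},
        },
    },
}

def parse_function_call(generated: str) -> dict:
    """Extract the JSON payload between assumed <fn_call> ... </fn_call> tags
    and flatten it into domain-slot pairs, the usual DST state format."""
    payload = generated.split("<fn_call>")[1].split("</fn_call>")[0]
    call = json.loads(payload)
    domain = call["name"]
    return {f"{domain}-{slot}": value for slot, value in call["arguments"].items()}

# Example model output for a turn like "I need a cheap hotel in the centre."
output = ('<fn_call> {"name": "hotel", "arguments": '
          '{"area": "centre", "pricerange": "cheap"}} </fn_call>')
state = parse_function_call(output)
# state == {"hotel-area": "centre", "hotel-pricerange": "cheap"}
```

Under this framing, joint goal accuracy simply compares the accumulated predicted state dict against the gold annotation at each turn.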