Understanding (Chat)GPT v BERT: Dawn of Justice for Semantic Change Detection

The paper evaluates how well (Chat)GPT and BERT detect semantic change in text over time. It works with two diachronic extensions of the Word-in-Context (WiC) task, TempoWiC and HistoWiC. TempoWiC targets short-term semantic change in social media text, while HistoWiC targets long-term semantic change in historical text. On these tasks, (Chat)GPT performs significantly worse than BERT at detecting short-term change and only slightly worse at detecting long-term change. The results suggest that BERT is the more effective model for semantic change detection, with the larger margin on short-term change.
The study also explores different prompting strategies and temperature settings for (Chat)GPT, finding that lower temperatures improve performance on WiC tasks. It further compares (Chat)GPT accessed through the web interface with (Chat)GPT accessed through the OpenAI API, finding that the API gives better results. The paper highlights limitations of (Chat)GPT, including its non-deterministic output and the difficulty of evaluating it on historical text. Overall, the study concludes that although (Chat)GPT is a powerful model, it is not yet suitable for semantic change detection, and BERT remains the preferred choice for this task.
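
The link between temperature and non-deterministic output can be illustrated with a minimal sketch of temperature-scaled softmax, the standard way sampling temperature is usually defined. This is not the paper's code and says nothing about how the OpenAI API is implemented internally; the two answer labels and the toy scores below are hypothetical and chosen only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the two possible WiC answers (assumed values).
LABELS = ["same meaning", "different meaning"]
TOY_LOGITS = [1.0, 0.0]

if __name__ == "__main__":
    for temperature in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(TOY_LOGITS, temperature)
        summary = ", ".join(f"{label}: {p:.3f}" for label, p in zip(LABELS, probs))
        print(f"temperature={temperature}: {summary}")
```

Under these assumptions, a lower temperature concentrates probability on the highest-scoring answer, so repeated runs on the same sentence pair are more likely to return the same label, while a higher temperature spreads probability out and makes the sampled answer vary more between runs.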

2024 | Francesco Periti, Haim Dubossarsky, Nina Tahmasebi