12 Apr 2024 | Athanasios Karapantelakis, Mukesh Thakur, Alexandros Nikou, Farnaz Moradi, Christian Olrog, Fitsum Gaim, Henrik Holm, Doumitrou Daniil Nimara, Vincent Huang
This paper explores the use of Large Language Models (LLMs) as Question Answering (QA) assistants for 3GPP standards, which have become increasingly complex and voluminous. The authors evaluate the capabilities and limitations of state-of-the-art LLMs, introduce a new model called TeleRoBERTa, and provide guidelines for improving LLM performance. Key contributions include:
1. **Benchmark and Evaluation Methods**: Develop a benchmark and methods to evaluate LLMs' performance.
2. **Data Preprocessing and Fine-Tuning**: Preprocess 3GPP documents and fine-tune LLMs to enhance accuracy.
3. **TeleRoBERTa Model**: Introduce TeleRoBERTa, an extractive QA model that performs similarly to foundation LLMs but with fewer parameters.
The paper highlights that LLMs can effectively assist in accessing relevant information from 3GPP standards, making them suitable for various applications such as troubleshooting, maintenance, network operations, and software product development. The evaluation uses metrics like BERTScore and GPT-4 Ref to measure accuracy, showing that TeleRoBERTa performs well and can be further improved through context engineering and fine-tuning.
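To make the BERTScore metric mentioned above more concrete, the following is a minimal sketch of its greedy-matching step: each token in the candidate answer is paired with its most similar token in the reference answer (precision), and vice versa (recall), using cosine similarity over token embeddings. It is an illustration under stated assumptions, not the authors' evaluation code. Real BERTScore obtains the embeddings from a contextual language model and can add IDF weighting and baseline rescaling, all of which are omitted here. The function names and the hand-written 2-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_scores(candidate_vecs, reference_vecs):
    """Return (precision, recall, f1) from greedy token matching.

    Precision averages, over candidate tokens, the best similarity to any
    reference token. Recall does the same from the reference side.
    """
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Hypothetical toy embeddings: one vector per token (assumption, not real model output).
reference = [[1.0, 0.0], [0.0, 1.0]]   # two-token reference answer
candidate = [[1.0, 0.0]]               # one-token candidate answer

precision, recall, f1 = greedy_match_scores(candidate, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the single candidate token matches a reference token exactly, so precision is high, while the second reference token has no counterpart, which lowers recall. That asymmetry is why an extractive answer that is correct but incomplete scores below a full match.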