Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?

15 Apr 2024 | Xue-Yong Fu*, Md Tahmid Rahman Laskar*, Elena Khasanova, Cheng Chen, Shashi Bhushan TN
This paper investigates whether smaller, compact large language models (LLMs) can serve as a cost-effective alternative to larger LLMs for real-world meeting summarization. While larger LLMs have demonstrated strong zero-shot performance, their deployment is often limited by high computational costs. The study compares fine-tuned compact LLMs (e.g., FLAN-T5, TinyLLaMA, LiteLLaMA) against zero-shot larger LLMs (e.g., LLaMA-2, GPT-3.5, PaLM-2) on real-world meeting summarization datasets. Most smaller LLMs, even after fine-tuning, fail to outperform the larger zero-shot LLMs. However, FLAN-T5 (780M parameters) performs on par with or better than many larger zero-shot LLMs while being significantly smaller, making it a cost-efficient candidate for real-world industrial deployment.

The evaluation covers two datasets: an in-domain business meeting dataset and a filtered version of the QMSUM dataset (QMSUM-I). FLAN-T5-Large outperforms the other models on the in-domain dataset but does less well on QMSUM-I, which contains longer meetings; this gap is attributed to FLAN-T5's shorter context length relative to the longer QMSUM-I transcripts.

The study also examines cost and inference speed. FLAN-T5-Large is significantly cheaper to deploy than larger models, requiring only 6GB of VRAM, while larger models require more expensive GPUs; it is also faster, taking 4.2 seconds per transcript compared to 15 seconds for LLaMA-2-7B. The paper concludes that FLAN-T5-Large is a suitable model for real-world meeting summarization due to its cost-effectiveness and performance, though further research is needed to improve its performance on longer meetings.
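The context-length limitation noted above suggests an obvious engineering workaround that the paper does not evaluate: splitting long transcripts into chunks that fit the model's window and summarizing each chunk separately. A minimal, hypothetical sketch (word count is used as a crude proxy for token count, and `max_words` is an illustrative budget, not a figure from the paper):

```python
def chunk_transcript(transcript: str, max_words: int = 1500) -> list[str]:
    """Split a meeting transcript into word-based chunks.

    Word count is only a rough stand-in for the model's true token
    count; a real pipeline would measure length with the model's
    own tokenizer instead.
    """
    words = transcript.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk could then be summarized independently and the partial summaries merged, a common map-reduce style strategy for inputs that exceed a model's context window.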
The study also highlights the limitations of the work, including the use of GPT-4 generated summaries as reference summaries and the limited range of instructions used in the evaluation.
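To see why the reported 6GB VRAM figure makes FLAN-T5-Large deployable on commodity GPUs, a back-of-envelope estimate of weight memory helps. A hedged sketch (assumes fp16 weights at 2 bytes per parameter and ignores activations, KV caches, and framework overhead, which is why the paper's 6GB figure is higher than the weights-only number):

```python
def fp16_weight_vram_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate the GiB of VRAM needed for model weights alone in fp16."""
    return num_params * bytes_per_param / 1024**3

# FLAN-T5-Large (~780M params): weights alone need well under 2 GiB,
# so even with runtime overhead the model fits on a 6GB GPU.
# LLaMA-2-7B needs roughly 13 GiB for fp16 weights before any overhead,
# pushing it onto larger, more expensive GPUs.
```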