FreeVA is an empirical study that explores using existing image-based Multimodal Large Language Models (MLLMs) as training-free video assistants. The study demonstrates that off-the-shelf image-based MLLMs, combined with a suitable temporal aggregation strategy, can achieve state-of-the-art performance on zero-shot video question-answering, even surpassing methods that rely on video instruction tuning. It also finds that tuning on the widely adopted VideoInstruct-100K dataset does not necessarily yield better performance than doing no video training at all. Additionally, the GPT-assisted evaluation metrics commonly used for zero-shot video question-answering are significantly affected by changes in the GPT API version over time, which undermines fair and uniform comparison across methods. FreeVA provides a simple and effective baseline for extending image-based MLLMs to the video domain, encourages direct evaluation of existing MLLMs on video tasks, and prompts researchers to reconsider whether current video MLLM methods have truly acquired knowledge beyond what image MLLMs already provide. FreeVA is implemented as a plug-and-play tool, and the code is available at https://github.com/whwu95/FreeVA.
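The core idea, reusing a frozen image MLLM and aggregating per-frame features across time without any video training, can be illustrated with a minimal sketch. This is not FreeVA's actual implementation: `encode_frame`, `project`, and `llm_generate` are hypothetical stand-ins for an image MLLM's vision encoder, multimodal projector, and language model, and mean pooling is only one possible aggregation choice.

```python
# Hypothetical sketch of training-free temporal aggregation over an image MLLM.
# encode_frame / project / llm_generate are assumed callables, not FreeVA's API.
import torch


def temporal_aggregate(frame_tokens: torch.Tensor) -> torch.Tensor:
    """Collapse the time axis by parameter-free mean pooling.

    frame_tokens: (T, N, D) -- T sampled frames, N visual tokens per frame,
    D-dimensional embeddings. Returns (N, D), the same shape a single image
    would produce, so the frozen LLM receives familiar input.
    """
    return frame_tokens.mean(dim=0)


def answer_video_question(frames, question, encode_frame, project, llm_generate):
    # Encode each sampled frame independently with the frozen image encoder.
    tokens = torch.stack([encode_frame(f) for f in frames])  # (T, N, D)
    # Training-free aggregation: no new parameters, no video instruction tuning.
    video_tokens = temporal_aggregate(tokens)  # (N, D)
    # Map the pooled tokens into the LLM embedding space and generate an answer.
    return llm_generate(project(video_tokens), question)
```

Because the aggregated tokens match the shape of a single image's tokens, the image MLLM can be applied to video in a plug-and-play fashion, which is the sense in which the approach is "training-free."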