Turning manual web accessibility success criteria into automatic: an LLM-based approach


16 March 2024 | Juan-Miguel López-Gil, Juanan Pereira
This paper explores the potential of using large language models (LLMs) to automate the evaluation of web accessibility success criteria that currently require manual checks. The authors focus on three specific WCAG (Web Content Accessibility Guidelines) success criteria: 1.1.1 Non-text Content, 2.4.4 Link Purpose (In Context), and 3.1.2 Language of Parts. They develop LLM-based scripts to evaluate test cases and compare the results with current web accessibility evaluators. The study finds that while automated evaluators often fail to reliably test these criteria, the LLM-based scripts achieve an overall detection rate of 87.18%, successfully identifying issues that the tools missed. The results suggest that LLMs can complement automated testing by detecting web accessibility issues that specialized testing tools may overlook. The paper discusses the implications of these findings and outlines future research directions, emphasizing the need for further evaluation across more test cases and content types.
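The paper does not reproduce its scripts here, but the approach it describes, extracting the relevant HTML fragments and asking an LLM to judge them against a success criterion, can be illustrated with a minimal sketch. The example below targets WCAG 2.4.4 Link Purpose (In Context); the model name, prompt wording, and use of the OpenAI chat API are assumptions for illustration, not the authors' implementation.

```python
"""Minimal sketch of an LLM-based check for WCAG 2.4.4 Link Purpose (In Context).

Assumptions: OpenAI chat API, a placeholder model name, and an ad-hoc prompt;
the paper's actual scripts, model, and prompts may differ.
"""
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You are a web accessibility auditor. Given a link's text and its "
    "surrounding sentence, answer PASS if the purpose of the link can be "
    "determined from the link text together with its context (WCAG 2.4.4), "
    "otherwise answer FAIL. Reply with a single word."
)

def check_link_purpose(html: str) -> list[dict]:
    """Return a PASS/FAIL verdict for every <a> element in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for a in soup.find_all("a"):
        link_text = a.get_text(strip=True)
        # Use the parent element's text as the "in context" part of the check.
        context = a.parent.get_text(" ", strip=True) if a.parent else ""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; not necessarily the paper's model
            messages=[
                {"role": "system", "content": PROMPT},
                {"role": "user",
                 "content": f"Link text: {link_text!r}\nContext: {context!r}"},
            ],
            temperature=0,
        )
        verdict = response.choices[0].message.content.strip().upper()
        results.append({"link": link_text, "verdict": verdict})
    return results

if __name__ == "__main__":
    sample = '<p>Our report is ready. <a href="/r.pdf">Click here</a> to download it.</p>'
    for result in check_link_purpose(sample):
        print(result)  # e.g. {'link': 'Click here', 'verdict': 'FAIL'}
```

A script of this shape could run alongside a conventional automated evaluator, flagging context-dependent issues (such as "click here" links) that rule-based tools typically cannot judge.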