Turning manual web accessibility success criteria into automatic: an LLM-based approach

16 March 2024 | Juan-Miguel López-Gil, Juanan Pereira
This paper explores whether large language models (LLMs) can automate the evaluation of web accessibility success criteria (SC) that currently require manual checks. Three WCAG SC were tested: 1.1.1 Non-text Content, 2.4.4 Link Purpose (In Context), and 3.1.2 Language of Parts. LLM-based scripts were developed to evaluate these SC, and their results were compared against those of current web accessibility evaluators. While the automated tools struggled to test these SC reliably, often missing issues or merely warning about them, the LLM-based scripts identified accessibility issues the tools missed, achieving an overall detection rate of 87.18% across the test cases.

The results demonstrate that LLMs can augment automated accessibility testing to catch issues that purely software-based testing misses today, particularly for SC that are difficult to test automatically. LLMs can provide deeper insight into the complex datasets generated by automated tools, identify patterns or discrepancies, assist in formulating evaluation strategies for SC that cannot be assessed automatically, and generate human-like reasoning that approximates the judgment of a human expert. The study has limitations, including its focus on a small number of SC and its reliance on a specific test suite; further research is needed to extend the evaluation to more test cases and content types. Overall, the results suggest that LLMs can play a critical role in web accessibility evaluation by helping to interpret results and analyze SC that cannot be evaluated automatically.
The study contributes to a better understanding of how different tools perform under different conditions and highlights the importance of robust automatic detection to supplement manual evaluation.
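To illustrate the general approach described above, the sketch below shows one way an LLM-based script for SC 1.1.1 and 2.4.4 might gather candidate elements from a page and package them into prompts for an LLM to judge. This is a minimal illustration, not the authors' actual scripts: the `AccessibilityCandidateExtractor` class and the `build_prompt` helper (including its prompt wording) are hypothetical, and the LLM call itself is omitted.

```python
from html.parser import HTMLParser

class AccessibilityCandidateExtractor(HTMLParser):
    """Collects elements relevant to WCAG SC 1.1.1 (images and their
    alt text) and SC 2.4.4 (links and their visible text) so each can
    be packaged into a prompt for an LLM to judge."""
    def __init__(self):
        super().__init__()
        self.images = []       # (src, alt) pairs for SC 1.1.1
        self.links = []        # (href, text) pairs for SC 2.4.4
        self._in_link = False
        self._href = ""
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img":
            # alt may be missing (None) or empty ("") -- both matter for 1.1.1
            self.images.append((a.get("src", ""), a.get("alt")))
        elif tag == "a":
            self._in_link = True
            self._href = a.get("href", "")
            self._link_text = []

    def handle_data(self, data):
        if self._in_link:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._link_text).strip()))
            self._in_link = False

def build_prompt(src, alt):
    # Hypothetical prompt shape; the paper's actual prompts are not reproduced here.
    return (f"Image '{src}' has alt text {alt!r}. "
            "Does this alt text adequately convey the image's purpose "
            "per WCAG SC 1.1.1? Answer PASS or FAIL with a reason.")

# Example page fragment with an empty alt and a vague link text
html_doc = '<p><img src="chart.png" alt=""><a href="/r">click here</a></p>'
parser = AccessibilityCandidateExtractor()
parser.feed(html_doc)
prompts = [build_prompt(src, alt) for src, alt in parser.images]
```

In a full pipeline, each prompt would be sent to an LLM and the PASS/FAIL answer compared against the verdicts of conventional accessibility evaluators, which is the comparison the paper reports.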