12 SEPTEMBER 2008 | Luis von Ahn, Benjamin Maurer, Colin McMillen, David Abraham, Manuel Blum
The article describes the development and application of reCAPTCHA, a system that harnesses human effort to improve the accuracy of optical character recognition (OCR) when digitizing old printed material. CAPTCHAs are widely used web security measures that distinguish humans from automated programs. The authors propose reCAPTCHA, which shows users distorted words taken from scanned texts and asks them to type the correct answers. This method has been shown to achieve a word accuracy of over 99%, matching the performance of professional human transcribers. The system is deployed on over 40,000 websites and has transcribed over 440 million words. The article also highlights reCAPTCHA's efficiency and security; the system combines OCR output with the answers of multiple human users to correct the errors that OCR alone makes. It has digitized a large volume of historical texts, contributing to the preservation and accessibility of human knowledge.
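Combining an OCR reading with several human answers can be illustrated with a small weighted-voting sketch. Every specific below is an assumption for illustration, not the authors' published algorithm: the hypothetical function `resolve_word`, the vote weights (1.0 per human answer, 0.5 for the OCR guess), the acceptance threshold of 2.5, and the rule that a human answer counts only if the same user also correctly typed a companion word whose transcription is already known.

```python
from collections import defaultdict
from typing import List, Optional, Tuple

# Illustrative weights and threshold: assumptions for this sketch,
# not parameters taken from the article.
HUMAN_WEIGHT = 1.0
OCR_WEIGHT = 0.5
ACCEPT_THRESHOLD = 2.5


def normalize(answer: str) -> str:
    """Treat answers that differ only in case or surrounding spaces as the same vote."""
    return answer.strip().lower()


def resolve_word(ocr_guess: Optional[str],
                 human_answers: List[Tuple[bool, str]]) -> Optional[str]:
    """Return the accepted transcription of one scanned word, or None if undecided.

    ocr_guess: the OCR program's reading of the word, or None if it produced none.
    human_answers: (control_passed, answer) pairs. Assumption: an answer counts
    only if the user also correctly typed a companion word with a known answer.
    """
    votes = defaultdict(float)
    if ocr_guess:
        votes[normalize(ocr_guess)] += OCR_WEIGHT
    for control_passed, answer in human_answers:
        if control_passed:  # ignore answers from users who failed the known word
            votes[normalize(answer)] += HUMAN_WEIGHT
    if not votes:
        return None

    # Rank candidate transcriptions by total weighted votes.
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    best, score = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == score:
        return None  # conflicting answers are tied: keep collecting votes
    # Accept only once the leading answer has enough weighted agreement.
    return best if score >= ACCEPT_THRESHOLD else None


if __name__ == "__main__":
    answers = [(True, "morning"), (True, "Morning"), (False, "xyz"), (True, "mourning")]
    print(resolve_word("morning", answers))               # prints: morning
    print(resolve_word("rnorning", [(True, "morning")]))  # prints: None
```

Because the OCR guess weighs less than a single human answer, it can tip a close vote but cannot settle a word by itself; the tie check leaves conflicting answers unresolved until more users have seen the word.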