reCAPTCHA Inc.

Original author(s)	Luis von Ahn, Manuel Blum, David Abraham, Michael Crawford, Ben Maurer, Colin McMillen, Harshad Bhujbal, Edison Tan
Developer(s)	Google
Initial release	May 27, 2007
Type	Classic version: CAPTCHA; new version: behavioral analysis
Website	google.com/recaptcha

reCAPTCHA Inc.^[1] is a CAPTCHA system owned by Google. It enables web hosts to distinguish between human and automated access to websites. The original version asked users to decipher hard-to-read text or match images. Version 2 also asked users to decipher text or match images if the analysis of cookies and canvas rendering suggested the page was being downloaded automatically.^[2] Since version 3, reCAPTCHA will never interrupt users and is intended to run automatically when users load pages or click buttons.^[3]

The original iteration of the service was a mass collaboration platform designed for the digitization of books, particularly those that were too illegible to be scanned by computers. The verification prompts utilized pairs of words from scanned pages, with one known word used as a control for verification, and the second used to crowdsource the reading of an uncertain word.^[4] reCAPTCHA was originally developed by Luis von Ahn, David Abraham, Manuel Blum, Michael Crawford, Ben Maurer, Colin McMillen, and Edison Tan at Carnegie Mellon University's main Pittsburgh campus.^[5] It was acquired by Google in September 2009.^[6] The system helped to digitize the archives of The New York Times, and was subsequently used by Google Books for similar purposes.^[7]

The system was reported as displaying over 100 million CAPTCHAs every day,^[8] on sites such as Facebook, TicketMaster, Twitter, 4chan, CNN.com, StumbleUpon,^[9] Craigslist (since June 2008),^[10] and the U.S. National Telecommunications and Information Administration's digital TV converter box coupon program website (as part of the US DTV transition).^[11]

In 2014, Google pivoted the service away from its original concept, with a focus on reducing the amount of user interaction needed to verify a user, and only presenting human recognition challenges (such as identifying images in a set that satisfy a specific prompt) if behavioral analysis suspects that the user may be a bot.

In October 2023, it was found that OpenAI's GPT-4 chatbot could solve CAPTCHAs.^[12]

Origin

Distributed Proofreaders was the first project to volunteer its time to decipher scanned text that could not be read by optical character recognition (OCR) programs. It works with Project Gutenberg to digitize public domain material and uses methods quite different from reCAPTCHA.

The reCAPTCHA program originated with Guatemalan computer scientist Luis von Ahn,^[13] and was aided by a MacArthur Fellowship. An early CAPTCHA developer, he realized "he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles".^[14]

Operation

ReCAPTCHA v1 (human-assisted OCR)

An example of how a reCAPTCHA challenge looked in 2007,^[15] containing the words "following" and "finding". The waviness and horizontal stroke were added to increase the difficulty of breaking the CAPTCHA with a computer program.

Scanned text is subjected to analysis by two different OCRs. Any word that is deciphered differently by the two OCR programs or that is not in an English dictionary is marked as "suspicious" and converted into a CAPTCHA. The suspicious word is displayed, out of context, sometimes along with a control word already known. If the human types the control word correctly, then the response to the questionable word is accepted as probably valid. If enough users were to correctly type the control word, but incorrectly type the second word which OCR had failed to recognize, then the digital version of documents could end up containing the incorrect word. The identification performed by each OCR program is given a value of 0.5 points, and each interpretation by a human is given a full point. Once a given identification hits 2.5 points, the word is considered valid. Those words that are consistently given a single identity by human judges are later recycled as control words.^[16] If the first three guesses match each other but do not match either of the OCRs, they are considered a correct answer, and the word becomes a control word.^[17] When six users reject a word before any correct spelling is chosen, the word is discarded as unreadable.^[17]
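The point-based scheme described above can be sketched in a few lines of Python. This is only an illustration of the stated rules (0.5 points per OCR reading, 1 point per human reading, acceptance at 2.5 points, discard after six rejections); the class and method names are hypothetical, and the assumption that a human reading counts only when the control word was typed correctly is taken from the description above, not from any published reCAPTCHA code.

```python
from collections import defaultdict

# Weights and limits as stated in the text above.
OCR_WEIGHT = 0.5
HUMAN_WEIGHT = 1.0
ACCEPT_THRESHOLD = 2.5
MAX_REJECTIONS = 6


class SuspiciousWord:
    """Tracks votes for one word the OCR programs could not resolve (hypothetical helper)."""

    def __init__(self, ocr_readings):
        self.scores = defaultdict(float)
        self.rejections = 0
        # Each OCR program contributes half a point to its own reading.
        for reading in ocr_readings:
            self.scores[reading] += OCR_WEIGHT

    def add_human_reading(self, reading, control_ok):
        # A human reading counts only if the control word was typed correctly.
        if control_ok:
            self.scores[reading] += HUMAN_WEIGHT

    def add_rejection(self):
        # A user marked the word as unreadable.
        self.rejections += 1

    def status(self):
        best = max(self.scores, key=self.scores.get, default=None)
        if best is not None and self.scores[best] >= ACCEPT_THRESHOLD:
            return ("accepted", best)
        if self.rejections >= MAX_REJECTIONS:
            return ("unreadable", None)
        return ("pending", None)
```

Under these rules, one OCR reading plus two agreeing humans reaches exactly 2.5 points, and three agreeing humans reach 3.0 points even when neither OCR program agrees with them, which matches the three-guess rule described above.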

The original reCAPTCHA method was designed to show the questionable words separately, for out-of-context correction, rather than in use, such as within a phrase of five words from the original document.^[18] Also, the control word might supply misleading context for the second word: a request of "/metal/ /fife/" might be entered as "metal file", because the logical connection between filing and a metal tool is more common than one involving the musical instrument "fife".^{[citation needed]}

In 2012, reCAPTCHA began using photographs taken from the Google Street View project, in addition to scanned words.^[19] It asks the user to identify images of crosswalks, street lights, and other objects. It has been hypothesized that the data is used by Waymo (a Google subsidiary) to train autonomous vehicles, though an unnamed representative has denied this, claiming the data was only being used to improve Google Maps as of mid-2021.^[20]

Google charges for the use of reCAPTCHA on websites that make over a million reCAPTCHA queries a month.^[21]

No CAPTCHA reCAPTCHA (v2+)

The NoCAPTCHA reCAPTCHA

In 2013, reCAPTCHA began implementing behavioral analysis of the browser's interactions to predict whether the user was a human or a bot. The following year, Google began to deploy a new reCAPTCHA API, featuring the "no CAPTCHA reCAPTCHA"—where users deemed to be of low risk only need to click a single checkbox to verify their identity. A CAPTCHA may still be presented if the system is uncertain of the user's risk; Google also introduced a new type of CAPTCHA challenge designed to be more accessible to mobile users, where the user must select images matching a specific prompt from a grid.^[2]^[22]

In 2017, Google introduced a new "invisible" reCAPTCHA, where verification occurs in the background, and no challenges are displayed at all if the user is deemed to be of low risk.^[23]^[24]^[25] According to former Google "click fraud czar" Shuman Ghosemajumder, this capability "creates a new sort of challenge that very advanced bots can still get around, but introduces a lot less friction to the legitimate human."^[25]

reCAPTCHA v1 was declared end-of-life and shut down on March 31, 2018.^[26]

Implementation

The reCAPTCHA tests are displayed from the central site of the reCAPTCHA project, which supplies the words to be deciphered. This is done through a JavaScript API with the server making a callback to reCAPTCHA after the request has been submitted. The reCAPTCHA project provides libraries for various programming languages and applications to make this process easier. reCAPTCHA is a free-of-charge service provided to websites for assistance with the decipherment,^[27] but the reCAPTCHA software is not open-source.^[28]
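The server-side callback mentioned above might look roughly like the following Python sketch. The endpoint URL and the field names (`secret`, `response`, `remoteip`, `success`, `score`) follow Google's public documentation for the current API rather than anything stated in this article; the function names and the 0.5 score cut-off are assumptions made for illustration only.

```python
import json
import urllib.parse
import urllib.request

# Verification endpoint per Google's public documentation (not from this article).
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret, token, remote_ip=None):
    """Build the POST request the web server sends after a form is submitted."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional field
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def is_human(response_body, min_score=0.5):
    """Interpret the JSON reply; min_score is an illustrative threshold."""
    result = json.loads(response_body)
    if not result.get("success", False):
        return False
    score = result.get("score")  # present only for score-based versions
    return score is None or score >= min_score


def verify(secret, token, remote_ip=None, min_score=0.5):
    """Perform the network call; requires a real secret key and user token."""
    request = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=10) as reply:
        return is_human(reply.read().decode("utf-8"), min_score)
```

Splitting request construction and response interpretation from the network call keeps the two pure functions easy to test without contacting the service.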

Also, reCAPTCHA offers plugins for several web-application platforms including ASP.NET, Ruby, and PHP, to ease the implementation of the service.^[29]

Security

An example of how reCAPTCHA challenges were presented in 2010,^[30] containing the words "and chisels"

The main purpose of a CAPTCHA system is to block spambots while allowing human users. On December 14, 2009, Jonathan Wilkins released a paper describing weaknesses in reCAPTCHA that allowed bots to achieve a solve rate of 18%.^[31]^[32]^[33]

On August 1, 2010, Chad Houck gave a presentation to the DEF CON 18 Hacking Conference detailing a method to reverse the distortion added to images which allowed a computer program to determine a valid response 10% of the time.^[34]^[35] The reCAPTCHA system was modified on July 21, 2010, before Houck was to speak on his method. Houck modified his method to what he described as an "easier" CAPTCHA to determine a valid response 31.8% of the time. Houck also mentioned security defenses in the system, including a high-security lockout if an invalid response is given 32 times in a row.^[36]

On May 26, 2012, Adam, C-P, and Jeffball of DC949 gave a presentation at the LayerOne hacker conference detailing how they were able to achieve an automated solution with an accuracy rate of 99.1%.^[37] Their tactic was to use techniques from machine learning, a subfield of artificial intelligence, to analyze the audio version of reCAPTCHA, which is available for the visually impaired. Google released a new version of reCAPTCHA just hours before their talk, making major changes to both the audio and visual versions of the service. In this release, the audio version was increased in length from 8 seconds to 30 seconds and became much more difficult to understand, for humans as well as bots. In response to this update and the following one, the members of DC949 released two more versions of their tool, Stiltwalker, which beat reCAPTCHA with accuracies of 60.95% and 59.4% respectively. After each successive break, Google updated reCAPTCHA within a few days. According to DC949, Google often reverted to features that had been previously hacked.

On June 27, 2012, Claudia Cruz, Fernando Uceda, and Leobardo Reyes published a paper showing a system running on reCAPTCHA images with an accuracy of 82%.^[38] The authors have not said whether their system can solve recent reCAPTCHA images, although they describe their work as intelligent OCR that is robust to some, if not all, changes in the image database.

In an August 2012 presentation given at BsidesLV 2012, DC949 called the latest version "unfathomably impossible for humans"—they were not able to solve them manually either.^[37] The web accessibility organization WebAIM reported in May 2012, "Over 90% of respondents [screen reader users] find CAPTCHA to be very or somewhat difficult".^[39]

Criticism

The original iteration of reCAPTCHA was criticized as being a source of unpaid work to assist in transcribing efforts.^[40]

Critics have also argued that Google profits from reCAPTCHA users by treating them as unpaid workers whose input improves its AI research.^[41]

Privacy

The current iteration of the system has been criticized for its reliance on tracking cookies and promotion of vendor lock-in with Google services; administrators are encouraged to include reCAPTCHA tracking code on all pages of their website to analyze the behavior and "risk" of users, which determines the level of friction presented when a reCAPTCHA prompt is used.^[42] Google stated in its privacy policy that user data collected in this manner is not used for personalized advertising. It was also discovered that the system favors those who are logged in to a Google account, and assigns a higher risk to those using anonymizing proxies and VPN services.^[23]

Concerns were raised regarding privacy when Google announced reCAPTCHA v3.0, as it allows Google to track users on non-Google websites.^[23]

In April 2020, Cloudflare switched from reCAPTCHA to hCaptcha, citing privacy concerns over Google's potential use of the data it collects through reCAPTCHA for targeted advertising^[43] and a desire to cut operating costs, since a considerable portion of Cloudflare's customers are non-paying. In response, Google told PC Magazine that the data from reCAPTCHA is never used for personalized advertising purposes.^[21]

Accessibility

Google's help center states that reCAPTCHA is not supported for the deafblind community,^[44] effectively locking such users out of all pages that use the service. However, reCAPTCHA does currently have the longest list of accessibility considerations of any CAPTCHA service.^[45]

Interface

In one variant of the CAPTCHA challenges, images are not incrementally highlighted; instead, each image fades out when clicked and is replaced by a new image fading in, resembling whack-a-mole.

Criticism has been aimed at the long duration taken for the images to fade out and in.^[46]

Derivative projects

reCAPTCHA also created the Mailhide project, which protects email addresses on web pages from being harvested by spammers.^[47] By default, the email address was converted into a format that did not allow a crawler to see the full email address; for example, "mailme@example.com" would have been converted to "mai...@example.com". The visitor would then click on the "..." and solve the CAPTCHA to obtain the full email address. One could also edit the pop-up code so that none of the addresses were visible. Mailhide was discontinued in 2018 because it relied on reCAPTCHA v1.^[48]
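The masking step described above can be illustrated with a short Python sketch. It reproduces only the single example given ("mailme@example.com" becoming "mai...@example.com"); the function name, the three-character default, and the handling of very short addresses are assumptions for illustration, not Mailhide's actual rules.

```python
def mask_email(address, visible=3):
    """Hide most of the local part of an email address (hypothetical helper)."""
    local, sep, domain = address.partition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address: %r" % address)
    # Assumption: show at most `visible` characters, but try to keep at
    # least one character of the local part hidden when it is short.
    shown = min(visible, max(1, len(local) - 1))
    return local[:shown] + "..." + "@" + domain
```

In the real service, the full address was revealed only after the visitor solved a CAPTCHA; that step is outside the scope of this sketch.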

References
