Optical Character Recognition (OCR) has many uses today, especially as we continue moving our communications from paper to digital forms. One common request for this technology is to solve captchas. An appropriate use is automating the management of an account you own online. For example, a seller may want to automate the response to new customer orders, which appear in their vendor account on an online marketplace. If a captcha is active on the login page, it becomes a hurdle. We will walk through how to use the Google Vision API with Python to overcome it.
For sample captcha images, we will use the ones listed on the examples page at Captcha.com. The Google Vision API has several different endpoints. We will avoid using the facial recognition endpoints with public images because of recent controversies around that technology. Instead, we will focus on captcha images. The purpose of a captcha is to prevent scripts from interacting with a website, but since the term was coined in 2003, OCR technology has caught up.
When site owners do not provide APIs, developers are often asked to create scripts that interact with account data. This usually involves a login process, and captchas are sometimes used on login pages, which explains the need to solve them without manual intervention. Even Amazon used captchas on its Vendor Express login page before that service was retired. Amazon has many APIs, but order management within the Vendor Express platform was not covered.
The following list is a summary of the process we will follow:
A Five-Step Process
- Make sure you have Python installed on your server
- Get an API Key
- Subscribe to the Google Vision API
- Use the Google Vision API with Python
- Validate the results
Step 1. Make sure you have Python installed
For this article, we will be using a computer running Windows to run the Python code. I installed Python version 3 by following the Python installation instructions for Windows.
But to be sure, we will run this sample code in a command window:
#!/usr/bin/env python
import sys
sys.stdout.write("hello from Python %s\n" % (sys.version,))
When I save the code as hello.py and run this on my machine I see this:
>python hello.py
hello from Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)]
Python version 3.x is required to use the http.client library in the sample Python code for the Google Vision API.
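If the script might be launched on a machine where the python command still points at Python 2, a small guard can stop early with a clear message. The following is a minimal sketch; the helper name is_supported is made up for illustration and is not part of any library.

```python
import sys


def is_supported(version_info=sys.version_info):
    """Return True when the interpreter is Python 3 or newer.

    http.client only exists under that name in Python 3.
    """
    return version_info[0] >= 3


# Stop before any API call is attempted on an unsupported interpreter.
if not is_supported():
    sys.exit("Python 3 is required to use http.client")
```

Placing this check at the top of the script means the failure is reported before any import error or network request.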
Step 2. Get an API Key
Once we know Python is available, we need to get an API Key. The Google Vision API we will be using is hosted on the RapidAPI platform. Getting a key is a simple and free process. Go to the RapidAPI home page and use an email address or social media account to connect.
Step 3. Subscribe to the Google Vision API
Once you are registered on the RapidAPI platform, the next step is to subscribe to the Google Vision API. You can do that by clicking on the blue button on the endpoints page which says “Subscribe to Test”:
After subscribing, I used the online test interface to view the outputs and make sure they contained the information I expected. I manually updated the URL string to point at the first captcha example, botdetect3-captcha-ancientmosaic.jpg.
Then I clicked the test endpoint button and was pleased to see the correct result, W93BX.
Step 4. Use the Google Vision API with Python
Now that we are sure Python is available and we have picked an API, it’s time to use it. We will start with the sample code on the Endpoints page for Python http.client. This uses built-in Python libraries (when you have Python version 3+ installed).
import http.client
conn = http.client.HTTPSConnection("google-ai-vision.p.rapidapi.com")
payload = (
    "{\r\n"
    "    \"source\": \"https://captcha.com/images/captcha/botdetect3-captcha-ancientmosaic.jpg\",\r\n"
    "    \"sourceType\": \"url\"\r\n"
    "}"
)
headers = {
'content-type': "application/json",
'x-rapidapi-key': "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
'x-rapidapi-host': "google-ai-vision.p.rapidapi.com"
}
conn.request("POST", "/cloudVision/imageToText", payload, headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))
You’ll need to replace the rapidapi-key Xs with your actual key. You will see this on the endpoints page while logged in.
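Rather than pasting the key into the script, one option is to read it from an environment variable. The sketch below assumes a variable named RAPIDAPI_KEY and a helper named build_headers; both names are my own choices for illustration, not something RapidAPI prescribes. The header names and host are the ones from the sample above.

```python
import os


def build_headers(api_key=None):
    """Build the request headers used in the sample code.

    The key is taken from the argument if given, otherwise from the
    RAPIDAPI_KEY environment variable (an assumed name).
    """
    key = api_key or os.environ.get("RAPIDAPI_KEY", "")
    if not key:
        raise RuntimeError("Set the RAPIDAPI_KEY environment variable first")
    return {
        "content-type": "application/json",
        "x-rapidapi-key": key,
        "x-rapidapi-host": "google-ai-vision.p.rapidapi.com",
    }
```

The returned dictionary can be passed as the headers argument of conn.request, which keeps the key out of any file you might share.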
This sample code pulls results in JSON format. Next we will loop through the left column of 30 examples from captcha.com. Within the loop we will compare the results with what we see with our eyes:
#! python
"""
File: readSampleCaptchas.py
Description: This script uses the Google AI Vision API to read a list of
sample captcha images listed at https://captcha.com/captcha-examples.html.
The results are compared to what we can see with our eyes.
"""
# Import libraries used below
import http.client
import json

# This is a list of image paths to read
imagePaths = [
    'https://captcha.com/images/captcha/botdetect3-captcha-ancientmosaic.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-blackoverlap.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-bubbles.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-bullets.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-bullets2.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-caughtinthenet.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-caughtinthenet2.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-chalkboard.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-chess.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-chess3d.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-chipped.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-circles.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-collage.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-corrosion.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-crossshadow.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-crossshadow2.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-cut.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-darts.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-distortion.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-electric.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-fingerprints.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-flash.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-ghostly.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-graffiti.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-graffiti2.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-halo.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-inbandages.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-jail.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-lego.jpg',
    'https://captcha.com/images/captcha/botdetect3-captcha-mass.jpg']

# This is the associated text that should be read for each image
imageTexts = [
    'W93BX', 'RBSKW', 'TSMS9', 'R84CH', '3M56R', 'UXP4D',
    'RADTC', '3JYP4', 'URVTP', 'HAT8M', 'X8B9A', 'W9NB4',
    'TK58P', 'B4T9S', 'AWSKH', 'WB3CX', 'XKWDN', '4NV3A',
    'DWXM5', 'HK5B6', '9Y548', 'CEPT6', 'SREMD', 'W9H5K',
    'HY4NM', '9T4JW', 'KNYWV', 'AWRTB', 'VETRC', 'WKRH5']

# Set to 1 to show details along the way for debugging purposes
debug = 0

# Loop through each image path and text result
for imagePath, text in zip(imagePaths, imageTexts):
    # Connect to the API
    conn = http.client.HTTPSConnection("google-ai-vision.p.rapidapi.com")
    payload = "{\"source\": \"" + imagePath + "\",\"sourceType\": \"url\"}"
    if debug > 0:
        # In debug mode, always send the first sample image
        payload = "{\"source\": \"https://captcha.com/images/captcha/botdetect3-captcha-ancientmosaic.jpg\",\"sourceType\": \"url\"}"
        print(payload)
    headers = {
        'content-type': "application/json",
        'x-rapidapi-key': "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
        'x-rapidapi-host': "google-ai-vision.p.rapidapi.com"
    }
    conn.request("POST", "/cloudVision/imageToText", payload, headers)
    res = conn.getresponse()
    data = res.read()
    if debug > 0:
        print(data.decode("utf-8"))
    # Load API response into a Python dictionary object, encoded as utf-8 string
    json_dictionary = json.loads(data.decode("utf-8"))
    # Remove newlines at the end of the detected text
    detectedText = json_dictionary['text'].strip()
    # Compare the detected string with the actual string
    if detectedText == text:
        print(detectedText + " is correct\n")
    else:
        print(detectedText + " is not correct, the actual string is " + text + "\n")
    # In debug mode, stop after the first image
    if debug > 0:
        break
Again, to use the code above you’ll need to replace the x-rapidapi-key value with your own RapidAPI key. The code loops through a list of sample captcha images and calls the Google Vision API for each one to detect the string in the image. It then compares each result to what I see with my eyes.
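Two parts of the loop are easy to get wrong: building the JSON payload by string concatenation, and assuming the response always contains a text field. The sketch below shows one possible way to handle both with the json module. The helper names build_payload and extract_text are hypothetical, and returning an empty string for unexpected responses is my own assumption about how you might want failures treated; the source and sourceType fields and the text key are the ones used in the script above.

```python
import json


def build_payload(image_url):
    """Serialize the request body; json.dumps handles quoting and escaping."""
    return json.dumps({"source": image_url, "sourceType": "url"})


def extract_text(raw_body):
    """Return the stripped 'text' value from a response body given as bytes.

    Returns an empty string if the body is not valid JSON or has no
    usable 'text' field (an assumed fallback, not documented API behavior).
    """
    try:
        parsed = json.loads(raw_body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return ""
    if not isinstance(parsed, dict):
        return ""
    text = parsed.get("text", "")
    return text.strip() if isinstance(text, str) else ""
```

In the loop, payload = build_payload(imagePath) and detectedText = extract_text(data) would replace the concatenated string and the direct dictionary lookup, so a single bad response no longer stops the whole run.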
Step 5. Validate the Results
The results for the first column (30 different Captcha patterns) are below:
- W93BX is correct
- RBSKW is correct
- TSMS is not correct, the actual string is TSMS9
- R84CH is correct
- 3M56R is correct
- UXP4D is correct
- PADTC is not correct, the actual string is RADTC
- 3JYP4 is correct
- URVTP is correct
- BATSM\nDAI is not correct, the actual string is HAT8M
- X8B9A is correct
- WONBA is not correct, the actual string is W9NB4
- TKS8P is not correct, the actual string is TK58P
- B4T9S is correct
- AWSKH is correct
- WB3CX is correct
- XKWDN is correct
- ANV3A is not correct, the actual string is 4NV3A
- is not correct, the actual string is DWXM5
- HK 5B 6 is not correct, the actual string is HK5B6
- GYS48 is not correct, the actual string is 9Y548
- CEP TO is not correct, the actual string is CEPT6
- SREMD is correct
- WOHSK is not correct, the actual string is W9H5K
- HY4NM is correct
- 9T43W is not correct, the actual string is 9T4JW
- KNYW\nHVAA is not correct, the actual string is KNYWV
- RIB is not correct, the actual string is AWRTB
- VETRC is correct
- WKRH5 is correct
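The tally for the list above can be reproduced with a few lines of Python. This is only a sketch: the detected strings are transcribed from the results listed above (with the two multi-line results written using a newline character), and the helper name squash is made up for illustration.

```python
# Strings we expect, in the same order as the sample images
expected = [
    'W93BX', 'RBSKW', 'TSMS9', 'R84CH', '3M56R', 'UXP4D',
    'RADTC', '3JYP4', 'URVTP', 'HAT8M', 'X8B9A', 'W9NB4',
    'TK58P', 'B4T9S', 'AWSKH', 'WB3CX', 'XKWDN', '4NV3A',
    'DWXM5', 'HK5B6', '9Y548', 'CEPT6', 'SREMD', 'W9H5K',
    'HY4NM', '9T4JW', 'KNYWV', 'AWRTB', 'VETRC', 'WKRH5']

# Strings the API returned, transcribed from the results above
detected = [
    'W93BX', 'RBSKW', 'TSMS', 'R84CH', '3M56R', 'UXP4D',
    'PADTC', '3JYP4', 'URVTP', 'BATSM\nDAI', 'X8B9A', 'WONBA',
    'TKS8P', 'B4T9S', 'AWSKH', 'WB3CX', 'XKWDN', 'ANV3A',
    '', 'HK 5B 6', 'GYS48', 'CEP TO', 'SREMD', 'WOHSK',
    'HY4NM', '9T43W', 'KNYW\nHVAA', 'RIB', 'VETRC', 'WKRH5']


def squash(text):
    """Remove every whitespace character, including spaces and newlines."""
    return "".join(text.split())


# Count exact matches, then matches after whitespace is removed
exact = sum(d == e for d, e in zip(detected, expected))
normalized = sum(squash(d) == e for d, e in zip(detected, expected))

print(f"exact matches: {exact} of {len(expected)}")
print(f"after removing whitespace: {normalized} of {len(expected)}")
```

Only one result, HK 5B 6, is rescued by removing whitespace; CEP TO still fails because the final character was misread.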
Of the 30 examples tested, 16 were read correctly (about 53%). We get 17 correct when we remove all spaces. That doesn’t sound like great accuracy, but sometimes I don’t get captchas right the first time as a human! Fortunately, the logic for forms with captcha images usually triggers an image refresh, so you get to try again with a new image if you got it wrong the first time. That means the Google Vision API should usually get you past an image captcha within one to three attempts.
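To put a rough number on that claim, we can estimate the chance of at least one correct read in a few tries, and sketch what a retry loop might look like. This assumes each attempt is independent and succeeds at the 16-in-30 rate measured above, which is a simplification since a real site may use only one captcha style. The function names and the three callables passed to solve_with_retries are hypothetical placeholders for your own code.

```python
def chance_within(attempts, accuracy):
    """Probability of at least one correct read in `attempts` independent tries."""
    return 1 - (1 - accuracy) ** attempts


def solve_with_retries(get_image_url, read_captcha, submit, max_attempts=3):
    """Try up to max_attempts fresh captchas.

    get_image_url: returns the URL of the current captcha image
    read_captcha:  returns the detected text for an image URL
    submit:        returns True if the site accepted the guess
    Returns the attempt number that succeeded, or None.
    """
    for attempt in range(1, max_attempts + 1):
        guess = read_captcha(get_image_url())
        # Skip empty reads; failing to submit should trigger a new image
        if guess and submit(guess):
            return attempt
    return None


# Estimated success chance after 1, 2, and 3 attempts at 16/30 accuracy
for n in (1, 2, 3):
    print(n, round(chance_within(n, 16 / 30), 3))
```

Under these assumptions, the chance of success is about 53% after one attempt, 78% after two, and 90% after three.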
Conclusion
In this article, we have walked through an example of using the Google Vision API with Python. We started by getting set up with the API and then used Python to call the API and read a sample of 30 image captchas. Then we discussed the accuracy and how to handle failures. Other applications of this endpoint include reading images of paper documents and saving the text to create electronic copies. Similar OCR technology is what makes the full text of scanned books searchable in Google Books.
FAQ
How do I get started with the Google Vision API in Python?
Navigate to https://rapidapi.com/category/Visual%20Recognition and choose the Google AI Vision API. Test the endpoints to make sure you get the responses you are looking for. Click the code snippet dropdown and select Python.
What is the Google Vision API?
The Google Vision (or Google AI Vision) API is essentially a way for apps to talk to and interact with Google’s Vision AI. Check out how to use the API with Python in this article.
What can you do with the Google Vision API?
The API covers a wide range of image tasks. Some examples include reading text within document images, finding faces within images, detecting emotions of faces within images, finding landmarks and logos within images, finding similar images, and even getting an adult rating for images.