How to use Google Vision API with Python

Optical Character Recognition (OCR) has many uses today, especially as we continue to move our communications from paper to digital forms. One common request for this technology is solving captchas. An appropriate use is automating the management of your own online account. For example, a seller may want to automate responses to new customer orders that appear in their vendor account on an online marketplace. If the login page has an active captcha, it becomes a hurdle. We will walk through how to use the Google Vision API with Python to overcome it.

For sample captcha images, we will use the ones listed on the examples page at Captcha.com. The Google Vision API has several different endpoints. Because of recent controversies around facial recognition, we will avoid using its facial recognition endpoints on public images and will focus on captcha images instead. The purpose of a captcha is to prevent scripts from interacting with a website, but OCR technology has caught up since the term was coined in 2003.

When site owners do not provide APIs, developers are often asked to write scripts that interact with account data. This usually involves a login process, and login pages sometimes use captchas. That is why there is a need to solve them without manual intervention. Even Amazon used captchas on its Vendor Express login page before that service was retired. Amazon has many APIs, but none of them covered order management within the Vendor Express platform.

The following list is a summary of the process we will follow:

A Five-Step Process

Make sure you have Python installed
Get an API Key
Subscribe to the Google Vision API
Use the Google Vision API with Python
Validate the results

Step 1. Make sure you have Python installed

For this article, we will run the Python code on a computer running Windows. I installed Python 3 by following the Python installation instructions for Windows.

To confirm that it works, we will run this sample code in a command window:

#!/usr/bin/env python
import sys
sys.stdout.write("hello from Python %s\n" % (sys.version,))

When I save the code as hello.py and run this on my machine I see this:

>python hello.py
hello from Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)]

Python version 3.x is required to use the http.client library in the sample Python code for the Google Vision API.
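If you want a script to fail fast on an older interpreter, you can place a check like the following at the top of the file. This is a minimal sketch. `require_python3` is a hypothetical helper name and is not part of any library.

```python
import sys


def require_python3():
    """Exit with a clear message when run under Python 2 (hypothetical helper)."""
    if sys.version_info < (3, 0):
        sys.exit("Python 3 is required; found " + sys.version.split()[0])
    # Under Python 3 the module is named http.client (Python 2 called it httplib).
    import http.client  # noqa: F401
    return sys.version_info[:2]


if __name__ == "__main__":
    major, minor = require_python3()
    print("Python %d.%d is available" % (major, minor))
```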

Step 2. Get an API Key

Once we know Python is available, we need to get an API Key. The Google Vision API we will be using is hosted on the RapidAPI platform. Getting a key is a simple and free process. Go to the RapidAPI home page and use an email address or social media account to connect.

Step 3. Subscribe to the Google Vision API

Once you are registered on the RapidAPI platform, the next step is to subscribe to the Google Vision API. You can do that by clicking the blue “Subscribe to Test” button on the endpoints page.

After subscribing, I used the online test interface to check that the outputs contained the information I expected. I manually updated the URL string to point to the first captcha example, https://captcha.com/images/captcha/botdetect3-captcha-ancientmosaic.jpg.

Then I clicked the Test Endpoint button and was pleased to see the correct result, W93BX, returned.

Step 4. Use the Google Vision API with Python

Now that we know Python is available and we have picked an API, it’s time to use it. We will start with the Python http.client sample code from the Endpoints page. It uses only built-in Python libraries, provided you have Python 3 or later installed.

import http.client

conn = http.client.HTTPSConnection("google-ai-vision.p.rapidapi.com")

payload = "{\r\n    \"source\": \"https://captcha.com/images/captcha/botdetect3-captcha-ancientmosaic.jpg\",\r\n    \"sourceType\": \"url\"\r\n}"

headers = {
    'content-type': "application/json",
    'x-rapidapi-key': "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    'x-rapidapi-host': "google-ai-vision.p.rapidapi.com"
    }

conn.request("POST", "/cloudVision/imageToText", payload, headers)

res = conn.getresponse()
data = res.read()

print(data.decode("utf-8"))

You’ll need to replace the Xs in the x-rapidapi-key header with your actual key. The key is shown on the endpoints page while you are logged in.

This sample code returns results in JSON format. Next, we will loop through the 30 examples in the left column of the captcha.com examples page. Within the loop, we will compare each result with what we see with our eyes:
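Instead of pasting the key into the source file, you may prefer to read it from an environment variable so that it never ends up in a shared script. The sketch below assumes a variable named `RAPIDAPI_KEY`, which is our own choice and not something RapidAPI requires. `build_request` is a hypothetical helper. It also builds the JSON body with `json.dumps`, so you don't have to escape quotes by hand.

```python
import json
import os

API_HOST = "google-ai-vision.p.rapidapi.com"


def build_request(image_url, env_var="RAPIDAPI_KEY"):
    """Return (body, headers) for the /cloudVision/imageToText endpoint.

    Assumption: the RapidAPI key lives in an environment variable whose
    name (RAPIDAPI_KEY by default) is our own convention.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError("Set the %s environment variable to your RapidAPI key" % env_var)
    # json.dumps handles quoting and escaping of the image URL
    body = json.dumps({"source": image_url, "sourceType": "url"})
    headers = {
        "content-type": "application/json",
        "x-rapidapi-key": key,
        "x-rapidapi-host": API_HOST,
    }
    return body, headers
```

The result plugs straight into the existing call: `body, headers = build_request(url)` followed by `conn.request("POST", "/cloudVision/imageToText", body, headers)`.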

#! python
"""
File: readSampleCaptchas.py
Description: This script uses the Google AI Vision API to read a list of sample
  captcha images listed at https://captcha.com/captcha-examples.html.  The
  results are compared to what we can see with our eyes.
"""

#import libraries used below
import http.client
import json

#This is a list of image paths to read
imagePaths = [
'https://captcha.com/images/captcha/botdetect3-captcha-ancientmosaic.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-blackoverlap.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-bubbles.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-bullets.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-bullets2.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-caughtinthenet.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-caughtinthenet2.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-chalkboard.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-chess.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-chess3d.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-chipped.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-circles.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-collage.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-corrosion.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-crossshadow.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-crossshadow2.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-cut.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-darts.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-distortion.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-electric.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-fingerprints.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-flash.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-ghostly.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-graffiti.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-graffiti2.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-halo.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-inbandages.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-jail.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-lego.jpg',
'https://captcha.com/images/captcha/botdetect3-captcha-mass.jpg']

#This is the associated text that should be read for each image
imageTexts = [
'W93BX',
'RBSKW',
'TSMS9',
'R84CH',
'3M56R',
'UXP4D',
'RADTC',
'3JYP4',
'URVTP',
'HAT8M',
'X8B9A',
'W9NB4',
'TK58P',
'B4T9S',
'AWSKH',
'WB3CX',
'XKWDN',
'4NV3A',
'DWXM5',
'HK5B6',
'9Y548',
'CEPT6',
'SREMD',
'W9H5K',
'HY4NM',
'9T4JW',
'KNYWV',
'AWRTB',
'VETRC',
'WKRH5']

# Set to 1 to show details along the way for debugging purposes
debug=0

#Loop through each image path and text result
for imagePath,text in zip(imagePaths,imageTexts):

  #Connect to the API
  conn = http.client.HTTPSConnection("google-ai-vision.p.rapidapi.com")

  payload = json.dumps({"source": imagePath, "sourceType": "url"})

  if debug>0:
    payload = "{\"source\": \"https://captcha.com/images/captcha/botdetect3-captcha-ancientmosaic.jpg\",\"sourceType\": \"url\"}"
    print(payload)

  headers = {'content-type': "application/json",
    'x-rapidapi-key': "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    'x-rapidapi-host': "google-ai-vision.p.rapidapi.com"}

  conn.request("POST", "/cloudVision/imageToText", payload, headers)

  res = conn.getresponse()
  data = res.read()
  conn.close()

  if debug>0:
    print(data.decode("utf-8"))

  # Decode the UTF-8 response bytes and parse the JSON into a Python dictionary
  json_dictionary = json.loads(data.decode("utf-8"))

  #Remove newlines at the end of the detected text
  detectedText = json_dictionary['text'].strip()

  #Compare the detected string with the actual string
  if (detectedText == text):
    print(detectedText+" is correct\n")
  else:
    print(detectedText+" is not correct, the actual string is "+text+"\n")

  if debug>0:
    break

Again, to use the code above you’ll need to replace the Xs with your RapidAPI key. The code loops through a list of sample captcha images and calls the Google Vision API on each one to detect the string in the image. It then compares the result with what I see with my eyes.
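The script reads the detected string directly from `json_dictionary['text']`. That lookup raises a `KeyError` if a response comes back without that field. If you reuse this step, a small helper can return an empty string instead. This is a sketch. It assumes only the `text` field that the script already relies on, and treating a missing field as an empty result is our own choice. `extract_text` is a hypothetical name.

```python
import json


def extract_text(raw_bytes):
    """Decode a UTF-8 JSON response and return its "text" field, stripped.

    Assumption: a response without a "text" field (or with a null one)
    is treated as "nothing detected" and yields an empty string.
    """
    payload = json.loads(raw_bytes.decode("utf-8"))
    return (payload.get("text") or "").strip()
```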

Step 5. Validate the Results

The results for the first column (30 different captcha patterns) are below:

W93BX is correct
RBSKW is correct
TSMS is not correct, the actual string is TSMS9
R84CH is correct
3M56R is correct
UXP4D is correct
PADTC is not correct, the actual string is RADTC
3JYP4 is correct
URVTP is correct
BATSM\nDAI is not correct, the actual string is HAT8M
X8B9A is correct
WONBA is not correct, the actual string is W9NB4
TKS8P is not correct, the actual string is TK58P
B4T9S is correct
AWSKH is correct
WB3CX is correct
XKWDN is correct
ANV3A is not correct, the actual string is 4NV3A
is not correct, the actual string is DWXM5
HK 5B 6 is not correct, the actual string is HK5B6
GYS48 is not correct, the actual string is 9Y548
CEP TO is not correct, the actual string is CEPT6
SREMD is correct
WOHSK is not correct, the actual string is W9H5K
HY4NM is correct
9T43W is not correct, the actual string is 9T4JW
KNYW\nHVAA is not correct, the actual string is KNYWV
RIB is not correct, the actual string is AWRTB
VETRC is correct
WKRH5 is correct
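To double-check the tally, you can score the detected and actual strings from the output above in a few lines of Python. The pairs are copied from the results. We assume each `\n` stands for a line break the API returned inside a single result. The second count ignores all whitespace, so "HK 5B 6" matches "HK5B6". `tally` and `remove_whitespace` are illustrative helper names.

```python
# (detected, actual) pairs copied from the output above
results = [
    ("W93BX", "W93BX"), ("RBSKW", "RBSKW"), ("TSMS", "TSMS9"),
    ("R84CH", "R84CH"), ("3M56R", "3M56R"), ("UXP4D", "UXP4D"),
    ("PADTC", "RADTC"), ("3JYP4", "3JYP4"), ("URVTP", "URVTP"),
    ("BATSM\nDAI", "HAT8M"), ("X8B9A", "X8B9A"), ("WONBA", "W9NB4"),
    ("TKS8P", "TK58P"), ("B4T9S", "B4T9S"), ("AWSKH", "AWSKH"),
    ("WB3CX", "WB3CX"), ("XKWDN", "XKWDN"), ("ANV3A", "4NV3A"),
    ("", "DWXM5"), ("HK 5B 6", "HK5B6"), ("GYS48", "9Y548"),
    ("CEP TO", "CEPT6"), ("SREMD", "SREMD"), ("WOHSK", "W9H5K"),
    ("HY4NM", "HY4NM"), ("9T43W", "9T4JW"), ("KNYW\nHVAA", "KNYWV"),
    ("RIB", "AWRTB"), ("VETRC", "VETRC"), ("WKRH5", "WKRH5"),
]


def remove_whitespace(s):
    """Drop spaces, tabs and newlines so 'HK 5B 6' compares equal to 'HK5B6'."""
    return "".join(s.split())


def tally(pairs):
    """Return (exact matches, matches after removing whitespace)."""
    exact = sum(1 for detected, actual in pairs if detected == actual)
    loose = sum(1 for detected, actual in pairs
                if remove_whitespace(detected) == remove_whitespace(actual))
    return exact, loose


exact, loose = tally(results)
print("%d/%d exact, %d/%d ignoring whitespace" % (exact, len(results), loose, len(results)))
# prints "16/30 exact, 17/30 ignoring whitespace"
```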

Based on the 30 examples tested, 16 were correct (about 53%). We get 17 correct (about 57%) when we remove all spaces. That doesn’t sound like great accuracy, but as a human I don’t always get captchas right on the first try either! Fortunately, forms with captcha images usually load a new image after a failed attempt, so you get to try again. That means that within one to three attempts, the Google Vision API will often get you past an image captcha.
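A little arithmetic makes the estimate of a few attempts concrete. Suppose each attempt succeeds independently at the 16/30 rate measured above. Then the chance of at least one correct read in n attempts is 1 - (1 - p)^n. That works out to about 53%, 78% and 90% for one, two and three attempts. Independence is a strong assumption, since a refreshed image may use the same hard-to-read style, so treat these numbers as optimistic. `success_within` is an illustrative helper name.

```python
def success_within(p, attempts):
    """Chance of at least one correct read in `attempts` tries,
    assuming each try succeeds independently with probability p."""
    return 1 - (1 - p) ** attempts


p = 16 / 30  # exact-match rate from the 30 sample captchas
for n in (1, 2, 3):
    print(n, round(success_within(p, n), 2))
```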

Conclusion

In this article, we walked through an example of using the Google Vision API with Python. We started by getting set up with the API, then used Python to call it and read a sample of 30 image captchas. We then discussed the accuracy and how to handle failures. Other applications of this endpoint include reading images of paper documents, so that you can save their text as electronic copies. That’s how Google created Google Books, which lets you search the full text of books online.

FAQ

How do I get started with the Google Vision API in Python?

Navigate to https://rapidapi.com/category/Visual%20Recognition and choose the Google AI Vision API. Test the endpoints to make sure you get the responses you are looking for. Click the code snippet dropdown and select Python.

What is the Google Vision API?

The Google Vision (or Google AI Vision) API lets applications send images to Google’s Vision AI and receive its analysis, such as the text detected in an image, in the response. This article shows how to use the API with Python.

What can you do with the Google Vision API?

Examples include reading text within document images, finding faces within images, detecting the emotions of faces, finding landmarks and logos, finding similar images, and even getting an adult-content rating for images.


Kelly Arellano
