In this blog post, we’ll explain how to use the Microsoft Computer Vision API with Python.
What is Microsoft Computer Vision API?
The Microsoft Computer Vision API uses machine learning to classify images. It’s not specifically geared for a complex task like facial recognition. It’s more of a general-purpose API.
Amazon, Google, IBM, and other companies offer this kind of machine learning service in the cloud. This saves users from having to build their own image databases and neural networks, and from renting or buying the infrastructure to run all of that.
The Microsoft API draws on Microsoft's large infrastructure and machine learning models trained on millions of images. When the programmer posts an image, the service uses neural networks (deep learning) to classify it. To classify means to put an object into a category, such as boat, fish, or person. Depending on the complexity of the image and the quality of the photo, it can go further: fishing boat, barracuda, or soldier, for example.
In the example below, we send it something simple, a rose. Then it identifies that image as a rose and assigns a probability that it is correct.
The Microsoft API offers several endpoints depending on what the programmer wants to extract from the image:
| Endpoint | Description |
| --- | --- |
| Analyze Image | Identifies the image and describes it in terms that someone familiar with image processing would understand. |
| Describe Image | Identifies the image, then describes it in complete sentences with simpler labels for the layperson, such as "This is a rose." |
| Generate Thumbnail | As the name implies, creates a smaller image, such as a clickable thumbnail for a web page. |
| OCR (optical character recognition) | Reads the text in an image. |
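In code, each of these capabilities corresponds to a different path on the same RapidAPI host. Only the /analyze path is confirmed by the generated code later in this post; the other paths in the sketch below are assumptions based on the endpoint names, so check the RapidAPI console for the exact spellings.

```python
# Hypothetical mapping of endpoint names to URL paths on the RapidAPI host.
# Only "/analyze" appears in the generated code later in this article;
# the other paths are assumptions -- verify them in the RapidAPI console.
BASE_URL = "https://microsoft-azure-microsoft-computer-vision-v1.p.rapidapi.com"

ENDPOINTS = {
    "Analyze Image": f"{BASE_URL}/analyze",
    "Describe Image": f"{BASE_URL}/describe",                # assumed path
    "Generate Thumbnail": f"{BASE_URL}/generateThumbnail",   # assumed path
    "OCR": f"{BASE_URL}/ocr",                                # assumed path
}
```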
Pricing: How much does the Microsoft Computer Vision API Cost?
Fortunately, Microsoft has a free tier that you can use to try out the algorithm. A credit card is required in case you incur overages.
| Product Name | Monthly Price | Limits |
| --- | --- | --- |
| Basic | Free | 5,000 requests per month, then $0.005 each |
| Pro | $19.90 | 15,000 requests per month, then $0.0015 each |
| Ultra | $74.90 | 70,000 requests per month, then $0.0012 each |
| Mega | $199.90 | 200,000 requests per month, then $0.001 each |
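To make the overage pricing concrete, here is a quick back-of-the-envelope calculation in Python. The plan numbers come straight from the table above; the billing logic is simplified for illustration.

```python
def monthly_cost(requests_made, base_price, included, overage_rate):
    """Estimate a month's bill: base price plus a per-request overage fee."""
    overage = max(0, requests_made - included)
    return base_price + overage * overage_rate

# 12,000 requests on the free Basic plan: 7,000 over the 5,000 limit at $0.005 each.
print(monthly_cost(12_000, 0.00, 5_000, 0.005))     # 35.0
# The same volume on the Pro plan stays under its 15,000-request limit.
print(monthly_cost(12_000, 19.90, 15_000, 0.0015))  # 19.9
```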
Setup and Prerequisites
First, you need some basic knowledge of Python and REST APIs. A REST API exposes a program over HTTP so that external users can call it; without one, a user would have to be on Microsoft's internal network just to reach the application.
Next, you need to create a RapidAPI account. There is no fee for this. Then, go to the API Marketplace in RapidAPI and select Microsoft Computer Vision API.
RapidAPI will generate your API keys right away and present you with a screen where you can test the API.
Download any image and then upload it to the API.
To use the generated Python code you need Python 3. You will also need to install the requests library, which is billed as "HTTP for Humans" because it makes working with HTTP very simple.
In this example, we will be using Python, but RapidAPI can generate code snippets/SDKs for other languages as well, including:
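If you have never used requests before, the short snippet below shows how little code a basic HTTP call takes. It just fetches a public test URL; nothing here is specific to the Computer Vision API.

```python
import requests

# A simple GET request: requests handles the connection, headers, and decoding.
response = requests.get("https://httpbin.org/get", params={"hello": "world"})
print(response.status_code)     # e.g. 200
print(response.json()["args"])  # {'hello': 'world'}
```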
- C
- C#
- Go
- Java
- JavaScript
- Node.js
- Objective-C
- OCaml
- PHP
- PowerShell
- Python
- Ruby
- Shell
- Swift
- RapidQL
How To Use the Microsoft Computer Vision API
Below we walk the programmer through an example.
First, download any image. We picked a rose, taken from the Home Depot website.
1. Select the API from the RapidAPI Marketplace
From RapidAPI, navigate to the Microsoft Computer Vision API and subscribe with your credit card. (Hint: There’s a free Basic plan that allows up to 5000 requests/month).
2. Run the API
Upload an image into the API console and then press "Test Endpoint". As you can see, the RapidAPI keys are already filled in. You can optionally fill in some of the other parameters, such as visual features, details, or language.
3. Observe the Results and Generate a Python Code Snippet
Select Code/Python to generate code that calls the same API. Choose the Requests library, as it is the easiest to work with.
The generated code is shown in the window.
Python Code Snippet
Within the API console, there is a button to upload the image. In the Python code, however, we have to modify the code to add the image ourselves, because the generic generated code does not know which image the programmer will select.
Below we show how to modify the generated code.
```python
import requests

url = "https://microsoft-azure-microsoft-computer-vision-v1.p.rapidapi.com/analyze"

querystring = {"visualfeatures": "Categories,Tags,Color,Faces,Description"}

payload = ""

headers = {
    'x-rapidapi-host': "microsoft-azure-microsoft-computer-vision-v1.p.rapidapi.com",
    'x-rapidapi-key': "XXXXXXXXXXXXXXX",
    'content-type': "multipart/form-data"
}

response = requests.request("POST", url, data=payload, headers=headers, params=querystring)

print(response.text)
```
Modify the Code to Add the Image
Modify the code as shown below: open the file in binary mode (rb), and change the content type to application/octet-stream.
Below we hard-code the image name; you could also make it a command-line parameter (see the sketch after the code).
```python
import requests

# Read the image in binary mode so the raw bytes can be sent as the request body.
f = open("rose.jpeg", "rb")
payload = f.read()

url = "https://microsoft-azure-microsoft-computer-vision-v1.p.rapidapi.com/analyze"

querystring = {"visualfeatures": "Categories,Tags,Color,Faces,Description"}

headers = {
    'x-rapidapi-host': "microsoft-azure-microsoft-computer-vision-v1.p.rapidapi.com",
    'x-rapidapi-key': "xxxxxxxxxx",
    'content-type': "application/octet-stream"
}

response = requests.request("POST", url, data=payload, headers=headers, params=querystring)

print(response.text)
```
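If you would rather not hard-code the filename, here is a minimal variation of the same request that reads the image path from the command line via sys.argv. This is our own sketch, not part of the generated code.

```python
import sys
import requests

# Usage: python analyze_image.py rose.jpeg
image_path = sys.argv[1]

with open(image_path, "rb") as f:
    payload = f.read()

url = "https://microsoft-azure-microsoft-computer-vision-v1.p.rapidapi.com/analyze"
querystring = {"visualfeatures": "Categories,Tags,Color,Faces,Description"}
headers = {
    'x-rapidapi-host': "microsoft-azure-microsoft-computer-vision-v1.p.rapidapi.com",
    'x-rapidapi-key': "xxxxxxxxxx",
    'content-type': "application/octet-stream"
}

response = requests.request("POST", url, data=payload, headers=headers, params=querystring)
print(response.text)
```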
Here are the results of running the modified code against our rose image. As you can see, it tags the image with increasingly specific labels: plant, flower, and rose, along with lower-confidence tags such as bouquet and floral.
{ "categories": [{ "name": "sky_object", "score": 0.94921875 }], "color": { "dominantColorForeground": "Red", "dominantColorBackground": "White", "dominantColors": ["White", "Red"], "accentColor": "CA0109", "isBwImg": false, "isBWImg": false }, "tags": [{ "name": "plant", "confidence": 0.9926302433013916 }, { "name": "flower", "confidence": 0.98895359039306641 }, { "name": "rose", "confidence": 0.967316746711731 }, { "name": "bouquet", "confidence": 0.74222016334533691 }, { "name": "floral", "confidence": 0.54713988304138184 }], "description": { "tags": ["plant", "flower", "rose"], "captions": [{ "text": "a close up of a flower", "confidence": 0.93509495662483044 }] }, "faces": [], "requestId": "445d3193-3f6a-4ff9-869c-e8bdd42344a1", "metadata": { "width": 225, "height": 225, "format": "Jpeg" }
We also tried it with a bougainvillea, and the most specific tag it produced was something called "close", with a low confidence of 26%. But it did pick up that the plant was changing color because the photo was taken in autumn, which, of course, makes the plant harder to identify.
"tags": [{ "name": "flower", "confidence": 0.94000422954559326 }, { "name": "garden", "confidence": 0.894332766532898 }, { "name": "plant", "confidence": 0.67341375350952148 }, { "name": "autumn", "confidence": 0.66588139533996582 }, { "name": "close", "confidence": 0.26290088891983032 }],
Conclusion
As you can see, the Computer Vision API is, as the name suggests, good at recognizing objects. But it's not a plant-identification tool. Its main function is to pick objects out of a photo and classify them in general terms.
That has lots of applicable use cases, like letting police quickly scan photos to find, for example, a criminal suspect walking down an otherwise empty alley.
RapidAPI makes it simpler to set up the API, as the programmer can work with multiple vendors' APIs from one website. Plus, developers can write their own APIs and upload them to RapidAPI for others to use.