What is OCR?
OCR – Optical Character Recognition – is a useful machine vision capability. OCR lets you recognize and extract text from images, so that it can be further processed/stored. This is very useful for processing scans/pictures of text – for instance, when working with invoices, scanned forms and signage.
We’ve looked at several APIs for OCR, evaluating them based on:
- Accuracy – we tried them all with the picture below to make sure they recognize the text correctly.
- Price – we outline the price per call of the different APIs.
- Special capabilities – some of the APIs we’ve covered have special capabilities, making them better suited for specific tasks like scanning invoices / recognizing logos.
We used the following image to try out the APIs, as it contains a lot of text in different styles & sizes, as well as some graphics that could confuse them.
The Best OCR APIs
The Microsoft Computer Vision API is a comprehensive set of computer vision tools, spanning capabilities like generating smart image thumbnails, recognizing celebrities in images and describing the content of images using AI.
The text recognition works well, and returns the text divided into regions of text. Each region has lines, and each line has words, which contain the actual text. The division is convenient for understanding the structure of the content in the image, though if you just need the text as one large string and don’t care about positioning, it’ll require more code.
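If only the plain text is needed, the nested result can be flattened with a few lines. The snippet below is a minimal Python sketch, assuming the response has already been parsed into a dictionary with regions, lines, words and text keys as described above. The exact key names and the sample data are assumptions made for illustration and should be checked against the actual response.

```python
def flatten_ocr_result(result):
    """Join a regions -> lines -> words OCR result into one string.

    Assumes each region has a "lines" list, each line a "words" list,
    and each word a "text" field (key names are assumptions).
    """
    region_texts = []
    for region in result.get("regions", []):
        line_texts = []
        for line in region.get("lines", []):
            words = [word.get("text", "") for word in line.get("words", [])]
            line_texts.append(" ".join(words))
        region_texts.append("\n".join(line_texts))
    # Separate regions with a blank line so the layout stays readable.
    return "\n\n".join(region_texts)


# Hypothetical sample response, made up for illustration.
sample = {
    "regions": [
        {
            "lines": [
                {"words": [{"text": "Hello"}, {"text": "world"}]},
                {"words": [{"text": "Second"}, {"text": "line"}]},
            ]
        },
        {"lines": [{"words": [{"text": "Other"}, {"text": "region"}]}]},
    ]
}

print(flatten_ocr_result(sample))
```

Words are joined with spaces, lines with newlines, and regions with a blank line, so the rough layout of the image is still visible in the resulting string.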
The free tier for Microsoft’s API will give you 5,000 requests per month. The API has 3 paid plans:
- $19.90 -> 15,000 requests / month
- $74.90 -> 70,000 requests / month
- $199.90 -> 200,000 requests / month
The SemaMedia API also requires manually setting the language with each request (using the lang parameter). In scenarios where the language is known this should actually improve the accuracy, as it lets the API compare the recognized words with the dictionary (when using the df=True option).
The API handled the supplied image very well. It returns an array of results, each a region of text with a position in the image, as well as the text result.
The SemaMedia platform also supports video OCR with the Video OCR API. According to the docs, video OCR is an analysis cascade which includes video segmentation (hard-cut), video text detection/recognition, and named entity recognition from video text (NER is a free add-on feature). The analysis result of this method enables automatic video retrieval and indexing as well as content-based video search in video archives. A detailed example can be found on the SemaMedia demo website.
The free tier for SemaMedia’s API will give you 100 requests per month. The API has 3 paid plans:
- $50.00 -> 2,200 requests / month
- $200.00 -> 13,500 requests / month
- $500.00 -> 40,000 requests / month
The Taggun API is a unique OCR API, targeted directly at scanning invoices and receipts. This can be useful as the API not only recognizes the text in the image, it also recognizes the structure of the invoice and returns parsed data like totalAmount, taxAmount, merchantName, etc.
Calling the simple receipt processing endpoint, the API returns an accuracy score with each piece of information returned. Sometimes, that’d be 0 and the information would be missing. However, when the information is there, it is usually accurate.
The label by label accuracy can be used to ask users for fields that are not properly recognized in the scanned invoice.
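As a rough illustration of that idea, the Python sketch below picks out the fields a user should be asked to fill in by hand. It assumes each field in the parsed response is a dictionary holding the value under "data" and the accuracy score under "confidenceLevel". Both key names, the 0.5 threshold and the sample receipt are assumptions for illustration, not confirmed details of the API.

```python
def fields_needing_review(receipt, threshold=0.5):
    """Return the names of fields whose accuracy score is below threshold.

    Assumes each field is a dict holding the parsed value under "data" and
    its score under "confidenceLevel" (both key names are assumptions).
    """
    needs_review = []
    for name, field in receipt.items():
        if not isinstance(field, dict):
            continue  # skip anything that is not a scored field
        score = field.get("confidenceLevel", 0)
        # A missing value or a low score both mean the user should check it.
        if field.get("data") is None or score < threshold:
            needs_review.append(name)
    return needs_review


# Hypothetical response, made up for illustration.
sample_receipt = {
    "totalAmount": {"data": 42.5, "confidenceLevel": 0.93},
    "taxAmount": {"confidenceLevel": 0},
    "merchantName": {"data": "Corner Cafe", "confidenceLevel": 0.3},
}

print(fields_needing_review(sample_receipt))
```

The threshold is a design choice: a higher value sends more fields back to the user, a lower one trusts the API more.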
The Taggun API has a free plan that includes 50 requests per month, and a paid plan costing $90 that includes 1,000 monthly requests.
The Cloudmersive OCR API is a nifty tool for simple text extraction from images. It has only one endpoint – Image to Text, and returns all the text in the image as one string rather than by regions. This can be useful when transcribing a big blob of text (from a book / paper), and only the text itself is needed.
The API was pretty accurate, and successfully transcribed most words in the document.
The free tier for the Cloudmersive API will give you 50,000 requests per month. The API has 3 paid plans:
- $19.99 -> 100,000 requests / month
- $49.99 -> 250,000 requests / month
- $99.90 -> 500,000 requests / month
The Google Cloud Vision API is a comprehensive machine vision platform, with capabilities beyond OCR such as face recognition, image labeling and landmark detection (detecting natural/man-made landmarks in images).
Using the /detectText endpoint with the supplied image, the API identified the text well. The response contains a textAnnotation field which has the different word segments in the image, with their text and location. This can be very handy for highlighting specific words in the image (for instance highlighting brand names / words from a list).
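The highlighting use case can be sketched in a few lines of Python. The example below assumes each word segment is a dictionary with a "description" (the word) and a "boundingPoly" containing a list of "vertices" with "x" and "y" coordinates. These key names and the sample data are assumptions for illustration and should be checked against the actual response.

```python
def find_words_to_highlight(annotations, wanted):
    """Return (text, bounding_box) pairs for words that appear in `wanted`.

    Assumes each annotation has a "description" (the word) and a
    "boundingPoly" with a list of {"x", "y"} "vertices" (assumed key names).
    The bounding box is (left, top, right, bottom) in pixels.
    """
    wanted_lower = {w.lower() for w in wanted}
    matches = []
    for ann in annotations:
        text = ann.get("description", "")
        if text.lower() not in wanted_lower:
            continue
        vertices = ann.get("boundingPoly", {}).get("vertices", [])
        if not vertices:
            continue  # no location available, nothing to highlight
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        matches.append((text, (min(xs), min(ys), max(xs), max(ys))))
    return matches


# Hypothetical word segments, made up for illustration.
sample_annotations = [
    {
        "description": "Acme",
        "boundingPoly": {"vertices": [
            {"x": 10, "y": 20}, {"x": 60, "y": 20},
            {"x": 60, "y": 40}, {"x": 10, "y": 40},
        ]},
    },
    {
        "description": "invoice",
        "boundingPoly": {"vertices": [
            {"x": 70, "y": 20}, {"x": 130, "y": 20},
            {"x": 130, "y": 40}, {"x": 70, "y": 40},
        ]},
    },
]

print(find_words_to_highlight(sample_annotations, ["acme", "Globex"]))
```

The returned boxes can then be passed to any drawing library to outline the matched words on top of the original image.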
The API also returns a fullTextAnnotation field which contains the entire text in the image as a single string, as well as the detected language of the document.
The API includes 1,000 free API calls per month, and charges $1.50 for each subsequent 1,000 requests (as of April 2018).
The Google Cloud Vision API also has an OCR-related endpoint called /detectLogos. Given an image that contains brand logos, this endpoint can identify the brands they belong to. During our testing, this endpoint easily identified logos for top brands.
Summary: Best OCR APIs
Columns compared: Text by regions; Text annotation (all text as one string); Requests in Free Tier; Est. price per call
APIs compared: Google Cloud Vision, Sema Media Data, Microsoft Computer Vision
What is an OCR API?
OCR – Optical Character Recognition – is a useful machine vision capability. OCR lets you recognize and extract text from images so that it can be further processed/stored. This is very useful for processing scans/pictures of text – for instance, when working with invoices, scanned forms and signage.
What are some well known OCR APIs available as a web service?
Here are a few of the top OCR APIs: Google Cloud Vision, Sema Media Data, Taggun, Cloudmersive, and Microsoft Computer Vision.
How much does it cost to use an OCR API?
The cost of using these APIs can range from $0.0015 to $0.09 per API request depending on the API and subscription plan.
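The two ends of this range appear to match figures quoted earlier in the article: Google Cloud Vision's $1.50 per 1,000 requests and Taggun's $90 plan with 1,000 monthly requests. The Python sketch below shows the arithmetic. It assumes the plan's quota is fully used, and the function name is a hypothetical helper for illustration.

```python
def cost_per_request(plan_price, included_requests):
    """Price of a single request when a plan's quota is fully used."""
    if included_requests <= 0:
        raise ValueError("included_requests must be positive")
    return plan_price / included_requests


# Figures taken from the pricing paragraphs in this article.
google = cost_per_request(1.5, 1000)
taggun = cost_per_request(90, 1000)

print(f"Google Cloud Vision: ${google:.4f} per request")
print(f"Taggun: ${taggun:.2f} per request")
```

The same function can be applied to any of the monthly plans listed above. If fewer requests are made than the plan includes, the effective price per request is higher.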