What is OCR?
OCR – Optical Character Recognition – is a useful machine vision capability. OCR lets you recognize and extract text from images, so that it can be further processed/stored. This is very useful for processing scans/pictures of text – for instance, when working with invoices, scanned forms and signage.
We’ve looked at several APIs for OCR, evaluating them based on:
- Accuracy – we tried them all with the picture below to make sure they clearly recognize the text.
- Price – we outline the price per call of the different APIs.
- Special capabilities – some of the APIs we’ve covered have special capabilities that make them better suited to specific tasks such as scanning invoices or recognizing logos.
We used the following image to try out the APIs, as it contains a lot of text in different styles and sizes, as well as some graphics that could confuse an API.
The Best OCR APIs
The Microsoft Computer Vision API is a comprehensive set of computer vision tools, spanning capabilities like generating smart image thumbnails, recognizing celebrities in images and describing the content of images using AI.
The text recognition works well, and returns the text divided into regions of text. Each region has lines, and each line has words, which contain the actual text. The division is convenient for understanding the structure of the content in the image, though if you just need the text as one large string and don’t care about positioning, it’ll require more code.
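The "more code" needed to get one large string is fairly small. Here is a minimal sketch, assuming the response is already parsed into a Python dict with `regions`, `lines` and `words` keys, where each word carries a `text` field as described above. The function name and the sample data are made up for illustration.

```python
def flatten_ocr_text(response):
    """Join every word of a regions -> lines -> words response into one string.

    Each recognized line becomes one line of output; positions are ignored.
    """
    lines_out = []
    for region in response.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines_out.append(" ".join(words))
    return "\n".join(lines_out)


# Hypothetical sample shaped like the structure described in the article.
sample = {
    "regions": [
        {"lines": [
            {"words": [{"text": "Invoice"}, {"text": "#1042"}]},
            {"words": [{"text": "Total:"}, {"text": "$19.90"}]},
        ]},
        {"lines": [{"words": [{"text": "Thank"}, {"text": "you"}]}]},
    ]
}

print(flatten_ocr_text(sample))
```

Regions are emitted in the order the API returns them, so the reading order of the result depends on that order.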
The free tier for Microsoft’s API will give you 5,000 requests per month. The API has 3 paid plans:
- $19.90 -> 15,000 requests / month
- $74.90 -> 70,000 requests / month
- $199.90 -> 200,000 requests / month
The SemaMedia API requires manually setting the language with each request (using the lang parameter). In scenarios where the language is known, this should actually improve the accuracy, as it lets the API compare the recognized words with a dictionary (when using the df=True option).
The API handled the supplied image very well. It returns an array of results, each a region of text with a position in the image, as well as the text result.
The SemaMedia platform also supports video OCR with the Video OCR API. According to the docs, video OCR is an analysis cascade which includes video segmentation (hard-cut), video text detection/recognition, and named entity recognition from video text (NER is a free add-on feature). The analysis result of this method enables automatic video retrieval and indexing as well as content-based video search in video archives. A detailed example can be found on the SemaMedia demo website.
The free tier for SemaMedia’s API will give you 100 requests per month. The API has 3 paid plans:
- $50.00 -> 2,200 requests / month
- $200.00 -> 13,500 requests / month
- $500.00 -> 40,000 requests / month
The Taggun API is a unique OCR API, targeted directly at scanning invoices and receipts. This can be useful because the API not only recognizes the text in the image, it also recognizes the structure of the invoice and returns parsed data such as totalAmount, taxAmount and merchantName.
Calling the simple receipt processing endpoint, the API returns an accuracy score with each piece of information returned. Sometimes, that’d be 0 and the information would be missing. However, when the information is there, it is usually accurate.
The label by label accuracy can be used to ask users for fields that are not properly recognized in the scanned invoice.
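As a rough sketch of that idea, the snippet below picks out the fields a user should be asked to fill in. It assumes each parsed field is a dict with a `data` value and a `confidenceLevel` score between 0 and 1; those two key names, the threshold and the sample receipt are assumptions for illustration, so check them against the actual response you receive.

```python
def fields_needing_review(receipt, min_confidence=0.5):
    """Return the names of fields that are missing or scored below min_confidence."""
    needs_review = []
    for name, field in receipt.items():
        # Skip anything that is not a scored field (metadata, raw text, ...).
        if not isinstance(field, dict) or "confidenceLevel" not in field:
            continue
        if field.get("data") is None or field["confidenceLevel"] < min_confidence:
            needs_review.append(name)
    return needs_review


# Hypothetical parsed response; key names are assumed, not taken from the docs.
sample_receipt = {
    "totalAmount": {"data": 42.5, "confidenceLevel": 0.93},
    "taxAmount": {"data": None, "confidenceLevel": 0.0},
    "merchantName": {"data": "Corner Cafe", "confidenceLevel": 0.41},
}

print(fields_needing_review(sample_receipt))
```

With the sample above, `taxAmount` is flagged because it is missing and `merchantName` because its score falls under the threshold.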
The Taggun API has a free plan that includes 50 requests per month, and a paid plan costing $90 that includes 1,000 monthly requests.
The Cloudmersive OCR API is a nifty tool for simple text extraction from images. It has only one endpoint – Image to Text – and returns all the text in the image as one string rather than by regions. This can be useful when transcribing a big blob of text (from a book or paper), and only the text itself is needed.
The API was pretty accurate, and successfully transcribed most words in the document.
The free tier for the Cloudmersive API will give you 50,000 requests per month. The API has 3 paid plans:
- $19.99 -> 100,000 requests / month
- $49.99 -> 250,000 requests / month
- $99.90 -> 500,000 requests / month
The Google Cloud Vision API is a comprehensive machine vision platform, with capabilities beyond OCR such as face recognition, image labeling and landmark detection (detecting natural and man-made landmarks in images).
Using the /detectText endpoint with the supplied image, the API identified the text well. The response contains a textAnnotation field which has the different word segments in the image, with their text and location. This can be very handy for highlighting specific words in the image (for instance highlighting brand names / words from a list).
The API also returns a fullTextAnnotation field which contains the entire text in the image as a single string, as well as the detected language of the document.
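The word-highlighting use case mentioned above can be sketched in a few lines. This assumes each word entry is a dict with a `description` (the text) and a `boundingPoly` holding four `vertices` with `x` and `y` pixel coordinates; treat those key names and the sample data as assumptions to verify against a real response.

```python
def find_word_boxes(annotations, wanted_words):
    """Return (word, (left, top, right, bottom)) for every annotation whose
    text matches one of wanted_words, ignoring case."""
    wanted = {w.lower() for w in wanted_words}
    hits = []
    for annotation in annotations:
        word = annotation["description"]
        if word.lower() not in wanted:
            continue
        vertices = annotation["boundingPoly"]["vertices"]
        # A vertex may omit a coordinate; treat a missing one as 0.
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        hits.append((word, (min(xs), min(ys), max(xs), max(ys))))
    return hits


# Hypothetical word entries in the assumed shape.
sample_annotations = [
    {"description": "Fresh", "boundingPoly": {"vertices": [
        {"x": 10, "y": 10}, {"x": 60, "y": 10}, {"x": 60, "y": 30}, {"x": 10, "y": 30}]}},
    {"description": "Acme", "boundingPoly": {"vertices": [
        {"x": 70, "y": 12}, {"x": 120, "y": 12}, {"x": 120, "y": 32}, {"x": 70, "y": 32}]}},
]

print(find_word_boxes(sample_annotations, ["acme", "globex"]))
```

The returned rectangles can then be passed to whatever drawing library you use to overlay highlights on the image.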
The API includes 1,000 free API calls per month, and charges $1.50 for each subsequent 1,000 requests (as of April 2018).
The Google Cloud Vision API also has an OCR-related endpoint called /detectLogos. Given an image that contains brand logos, this endpoint can identify the brands they belong to. During our testing, this endpoint easily identified logos for top brands.
Summary: Best OCR APIs
Table: comparison of Google Cloud Vision, Sema Media Data and Microsoft Computer Vision by text by regions, text annotation (all text as one string), requests in free tier, and estimated price per call.
Let’s say you’ve been assigned the role of digitizing monthly invoices from suppliers. You could go the old-school way and type them in manually, correcting any spelling mistakes as you go. You could also use a scanner, or leverage Optical Character Recognition software to convert all the information on the invoices into digital files. While all the options mentioned above are doable, only Optical Character Recognition (OCR) combines efficiency, accuracy, and attention to detail. However, before we bombard you with more details, let’s cut to the chase and tell you what OCR is and where it is used.
What is OCR?
OCR is the abbreviation for Optical Character Recognition, a technology that electronically or mechanically converts text in printed, handwritten, typed, scanned, and image documents into a machine-readable digital format. The technology recognizes and extracts characters such as letters, numbers and punctuation marks from images of text, as well as from printed and written documents, and transforms them into an electronic format that software programs and computers can read easily.
Earlier versions of OCR were trained with images of each character, and they could only work on a single font at a time. Today’s advanced systems achieve a high degree of recognition accuracy, can handle many fonts at once, and can deliver results in a wide range of digital file formats.
However, OCR technology doesn’t take into account the nature of the document or item that holds the characters. It only scans the item for the text that needs to be converted. If you want to recognize both the nature of the item and its characters, you need to combine OCR with other technologies.
How OCR Works
Optical character recognition converts characters in three main steps: image preprocessing, character recognition, and postprocessing.
Image preprocessing involves a series of processes designed to improve image clarity so that recognition is more likely to succeed. The primary purpose of preprocessing is to suppress distortions and enhance the vital features in the document or image being scanned.
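To make preprocessing more concrete, here is a sketch of one common operation, binarization, which turns a grayscale image into pure black and white so that characters stand out from the background. The article does not name specific preprocessing operations, so this is only an illustrative choice. It uses a plain list of pixel rows (0 = black, 255 = white) and a simple mean-based threshold, where real systems typically use more robust methods.

```python
def binarize(gray, threshold=None):
    """Map each grayscale pixel to 0 (ink) or 255 (background).

    If no threshold is given, the mean brightness of the image is used.
    """
    if threshold is None:
        pixels = [p for row in gray for p in row]
        threshold = sum(pixels) / len(pixels)
    return [[255 if p > threshold else 0 for p in row] for row in gray]


# Tiny hypothetical "scan": bright paper with a darker, slightly noisy stroke.
page = [
    [250, 245, 40, 248],
    [247, 35, 30, 251],
    [249, 246, 45, 244],
]

for row in binarize(page):
    print(row)
```

After this step, the uneven grays of the paper and the ink collapse into two clean values, which is easier for a recognizer to work with.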
Character recognition involves two core OCR algorithms that let the system detect only the intended portions or shapes of a digitized image. If the input data is too large, only a small portion of it will be processed. This step ensures that crucial parts of a document or image are retained and the redundant parts are filtered out, which leads to better text recognition performance.
Postprocessing is a step that seeks to correct errors and improve the accuracy of the OCR. Accuracy can be improved through the use of a lexicon – a list of accepted words, numbers or codes. This way, the algorithm can constrain its output to the words, numbers and codes on that list. This step may also include other techniques aimed at improving accuracy, such as the use of standard colors and business rules.
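A lexicon-based correction pass might look like the sketch below. It uses `difflib.get_close_matches` from the Python standard library to replace each recognized word with its closest lexicon entry when one is similar enough. The lexicon, the cutoff value and the sample words are illustrative assumptions, not values taken from any particular OCR engine.

```python
import difflib


def correct_with_lexicon(words, lexicon, cutoff=0.75):
    """Replace each word with its closest lexicon entry, if one is close enough.

    Words already in the lexicon, or with no sufficiently similar entry,
    are returned unchanged.
    """
    corrected = []
    for word in words:
        if word in lexicon:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return corrected


# Hypothetical lexicon for invoices and some typical OCR confusions (l/i, 1/l).
lexicon = ["invoice", "total", "amount", "tax"]
ocr_words = ["lnvoice", "tota1", "amount", "xyz"]

print(correct_with_lexicon(ocr_words, lexicon))
```

A lower cutoff corrects more aggressively but risks replacing words that were recognized correctly and simply are not in the lexicon.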
What is OCR Used For?
Since its inception, Optical Character Recognition has been adopted in various fields, ranging from banking to history. And now that the technology has undergone tremendous advancement, you’ll find it in several areas today. These include:
- Automated data processing and data entry in firms that need to digitize printed data such as invoices, bank statements, and receipts
- Digitizing historical documents and newspapers to make them searchable
- Recognition of license plates by speed cameras and red-light camera software
- Supplying text to speech synthesizers, for example to read printed text aloud for blind and visually impaired users
- Creating automated workflows by digitizing PDF documents in various business units
- Identifying and registering people at borders and other checkpoints
- Easing cross-border transactions in payment processes
What is an OCR API?
As with many other technologies, most businesses are looking for ways to integrate OCR into their applications and systems, and one of the best ways to do this is through an API. Currently, there are several OCR APIs that individuals can leverage to recognize characters in a vast array of images and documents. Rather than spending a fortune on OCR devices, individuals and businesses can take advantage of OCR APIs, which can also help extract printed or handwritten text from images.
What are some well-known OCR APIs available as a web service?
Here are a few of the top OCR APIs: Google Cloud Vision, Sema Media Data, Taggun, Cloudmersive, and Microsoft Computer Vision.
How much does it cost to use an OCR API?
The cost of using these APIs can range from $0.0015 to $0.09 per API request depending on the API and subscription plan.
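The two ends of that range appear to match prices quoted earlier in this article: Google Cloud Vision’s $1.50 per 1,000 requests and Taggun’s $90 plan for 1,000 monthly requests. A quick sketch of the arithmetic (the function name is made up for illustration):

```python
def price_per_request(plan_price, requests_included):
    """Effective price of one request when a plan's full quota is used."""
    return plan_price / requests_included


# Figures quoted earlier in this article.
google_per_call = price_per_request(1.50, 1_000)
taggun_per_call = price_per_request(90.00, 1_000)

print(f"Google Cloud Vision: ${google_per_call:.4f} per request")
print(f"Taggun: ${taggun_per_call:.4f} per request")
```

Note that this is the price when the whole quota is used; if you use fewer requests than a plan includes, the effective price per request is higher.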