ID and name recognition on a driver's license with API4AI OCR

In this tutorial we will look at applying the OCR API to recognize the ID on a US driver’s license and the name of the license holder.
We will use the demo API address, which allows a limited number of queries; for our experiments this is quite enough. For demonstration purposes, we will limit ourselves to a sample driver’s license from Washington, D.C.

We will use this picture:

image

The OCR API can be used in two modes: “simple-text” (the default) and “simple-words”. The first mode returns the recognized text, with phrases separated by line breaks, together with its position. That is not what we need right now: we want the location of each individual word, so that we have something to anchor our logic to. But first we need to understand how the API works. As they say, one code example is better than 1024 words.

import math
import sys

import cv2
import requests

API_URL = 'https://demo.api4ai.cloud/ocr/v1/results?algo=simple-words'


# get path from the 1st argument
image_path = sys.argv[1]

# we use the HTTP API to get recognized words from the specified image
with open(image_path, 'rb') as f:
    response = requests.post(API_URL, files={'image': f})
json_obj = response.json()

for elem in json_obj['results'][0]['entities'][0]['objects']:
    box = elem['box']  # normalized x, y, width, height
    text = elem['entities'][0]['text']  # recognized text
    print(  # show every word with bounding box
        f'[{box[0]:.4f}, {box[1]:.4f}, {box[2]:.4f}, {box[3]:.4f}], {text}'
    )

In this short script, we access the API by sending the picture (whose path is passed as the first command-line argument) in a POST request. The program simply prints the normalized top-left coordinates, width and height of the area containing each recognized word, along with the word itself.
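The snippet above assumes the request succeeds; in real use it is worth failing loudly on HTTP errors. A minimal defensive sketch:

with open(image_path, 'rb') as f:
    response = requests.post(API_URL, files={'image': f})
response.raise_for_status()  # raise requests.HTTPError on 4xx/5xx responses
json_obj = response.json()

An output fragment for the picture above: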

...
[0.6279, 0.6925, 0.0206, 0.0200], All
[0.6529, 0.6800, 0.1118, 0.0300], 02/21/1984
[0.6162, 0.7175, 0.0309, 0.0200], BEURT
[0.6515, 0.7350, 0.0441, 0.0175], 4a.ISS
[0.6515, 0.7675, 0.1132, 0.0250], 02/17/2010
[0.7662, 0.1725, 0.0647, 0.1125], tomand
[0.6529, 0.8550, 0.0324, 0.0275], ♥♥
[0.6941, 0.8550, 0.0809, 0.0275], DONOR
[0.6529, 0.8950, 0.1074, 0.0300], VETERAN
[0.9000, 0.0125, 0.0691, 0.0375], USA

Let’s apply the obtained data to the image: draw bounding boxes with OpenCV. To do this, we need to convert the normalized values into absolute values expressed in integer pixels, and we specifically need the coordinates of the top-left and bottom-right corners in order to draw a rectangle. Let’s write a get_corner_coords function for this.

def get_corner_coords(height, width, box):
    x1 = int(box[0] * width)
    y1 = int(box[1] * height)
    obj_width = box[2] * width
    obj_height = box[3] * height
    x2 = int(x1 + obj_width)
    y2 = int(y1 + obj_height)
    return x1, y1, x2, y2
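
For instance, on a hypothetical 1000×1000-pixel image, the box of the word “A9999999” (it will appear in the output below) maps to pixel corners like this:

>>> get_corner_coords(1000, 1000, [0.3059, 0.2325, 0.1059, 0.0275])
(305, 232, 410, 259)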

The bounding box drawing function will be very simple:

def draw_bounding_box(image, box):
    x1, y1, x2, y2 = get_corner_coords(image.shape[0], image.shape[1], box)
    cv2.rectangle(image, (x1 - 2, y1 - 2), (x2 + 2, y2 + 2), (127, 0, 0), 2)

In this function, we widen the frame slightly (by two pixels) so that it is not pressed right up against the words. (127, 0, 0) is a navy blue color specified in BGR format, and the frame is two pixels thick.

Of course, to work with the image, we must first read it. Let’s modify the last part of our script a bit: read the image, remove the debug output with the box information, draw each bounding box on the image, and then save the modified image to the file “output.png”.

image = cv2.imread(image_path)
for elem in json_obj['results'][0]['entities'][0]['objects']:
    box = elem['box']  # normalized x, y, width, height
    text = elem['entities'][0]['text']  # recognized text
    draw_bounding_box(image, box)  # add boundaries to image
cv2.imwrite('output.png', image)
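
If everything written so far is saved as, say, draw_words.py (the name is arbitrary), running

python3 draw_words.py license.png

produces “output.png” with every recognized word framed: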

image

Great! But how do we get to the number and the name? These are the elements the API found in the area we are interested in:

[0.3059, 0.1975, 0.0500, 0.0175], 4d.DLN
[0.3059, 0.2325, 0.1059, 0.0275], A9999999
[0.3074, 0.2800, 0.0603, 0.0200], 1.FAMILY
[0.3735, 0.2800, 0.0412, 0.0175], NAME
[0.3059, 0.3150, 0.0794, 0.0300], JONES
[0.3059, 0.3675, 0.0574, 0.0225], 2.GIVEN
[0.3691, 0.3675, 0.0529, 0.0225], NAMES
[0.3074, 0.4025, 0.1191, 0.0275], ANGELINA
[0.3074, 0.4375, 0.1191, 0.0300], GABRIELA

In this case the POST request happened to return the results in a convenient order, but the order is not guaranteed, so we cannot rely on it. It is safer to assume that the recognized elements can arrive in any order.
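
If a deterministic order is ever needed, the objects can be sorted top-to-bottom and left-to-right by their normalized coordinates; a small side sketch (the approach below does not depend on order at all):

objects = json_obj['results'][0]['entities'][0]['objects']
ordered = sorted(objects, key=lambda obj: (obj['box'][1], obj['box'][0]))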

Let’s create a list named words, so that we can then easily search for words and word positions:

words = []
for elem in json_obj['results'][0]['entities'][0]['objects']:
    box = elem['box']
    text = elem['entities'][0]['text']
    words.append({'box': box, 'text': text})

Let’s call “4d.DLN”, “1.FAMILY”, and “2.GIVEN” field names, and what is printed below each of them in the picture field values.
The simplest approach is to search, starting from each field name’s position, for the nearest element lying below it. Since unrelated words may sit far to the right or to the left, we should reason not about positions along the axes but about the distance between text elements. So, let’s write some code. First, let’s find the positions of the field names:

ID_MARK = '4d.DLN'
FAMILY_MARK = '1.FAMILY'
NAME_MARK = '2.GIVEN'

id_mark_info = {}
fam_mark_info = {}
name_mark_info = {}

for elem in words:
    if elem['text'] == ID_MARK:
        id_mark_info = elem
    elif elem['text'] == FAMILY_MARK:
        fam_mark_info = elem
    elif elem['text'] == NAME_MARK:
        name_mark_info = elem
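
By the way, the same lookup can be written more compactly with a dictionary keyed by the recognized text (an equivalent sketch; if a text occurred twice, the last occurrence would win):

marks = {elem['text']: elem for elem in words}
id_mark_info = marks.get(ID_MARK, {})
fam_mark_info = marks.get(FAMILY_MARK, {})
name_mark_info = marks.get(NAME_MARK, {})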

Then let’s write a function that finds the nearest element below a given reference element:

def find_label_below(word_info):
    x = word_info['box'][0]
    y = word_info['box'][1]
    candidate = words[0]  # fallback in case nothing lies below
    candidate_dist = math.inf
    for elem in words:
        if elem is word_info:  # skip the reference element itself
            continue
        curr_box_x = elem['box'][0]
        curr_box_y = elem['box'][1]
        curr_vert_dist = curr_box_y - y
        curr_horiz_dist = x - curr_box_x
        if curr_vert_dist > 0:  # we are only looking for items below
            dist = math.hypot(curr_vert_dist, curr_horiz_dist)
            if dist > candidate_dist:
                continue
            candidate_dist = dist
            candidate = elem
    return candidate
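
To see the heuristic on real numbers: “1.FAMILY” sits at (0.3074, 0.2800) and “JONES” at (0.3059, 0.3150), so the distance between them is

>>> math.hypot(0.3150 - 0.2800, 0.3074 - 0.3059)
0.03503...

noticeably smaller than the distance from “1.FAMILY” to “2.GIVEN” (about 0.0875) or to “ANGELINA” (about 0.1225), which is why “JONES” is picked.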

Let’s try to apply this function and draw the boundaries of the found elements:

id_info = find_label_below(id_mark_info)
fam_info = find_label_below(fam_mark_info)
name_info = find_label_below(name_mark_info)
name2_info = find_label_below(name_info)
canvas = image.copy()
draw_bounding_box(canvas, id_info['box'])
draw_bounding_box(canvas, fam_info['box'])
draw_bounding_box(canvas, name_info['box'])
draw_bounding_box(canvas, name2_info['box'])
cv2.imwrite('result.png', canvas)

Result:

image

Based on everything above, let’s assemble a practically useful program that does not use OpenCV. It will take the path to the picture as an argument and print the ID number and full name to the terminal.

#!/usr/bin/env python3

import math
import sys

import requests

API_URL = 'https://demo.api4ai.cloud/ocr/v1/results?algo=simple-words'

ID_MARK = '4d.DLN'
FAMILY_MARK = '1.FAMILY'
NAME_MARK = '2.GIVEN'
ADDRESS_MARK = '8.ADDRESS'


def find_text_below(words, word_info):
    x = word_info['box'][0]
    y = word_info['box'][1]
    candidate = words[0]  # fallback in case nothing lies below
    candidate_dist = math.inf
    for elem in words:
        if elem is word_info:  # skip the reference element itself
            continue
        curr_box_x = elem['box'][0]
        curr_box_y = elem['box'][1]
        curr_vert_dist = curr_box_y - y
        curr_horiz_dist = x - curr_box_x
        if curr_vert_dist > 0:  # we are only looking for items below
            dist = math.hypot(curr_vert_dist, curr_horiz_dist)
            if dist > candidate_dist:
                continue
            candidate_dist = dist
            candidate = elem
    return candidate


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print('Expected one argument: path to image.')
        sys.exit(1)
    image_path = sys.argv[1]
    with open(image_path, 'rb') as f:
        response = requests.post(API_URL, files={'image': f})
    json_obj = response.json()
    words = []
    for elem in json_obj['results'][0]['entities'][0]['objects']:
        box = elem['box']
        text = elem['entities'][0]['text']
        words.append({'box': box, 'text': text})

    id_mark_info = {}
    fam_mark_info = {}
    name_mark_info = {}

    for elem in words:
        if elem['text'] == ID_MARK:
            id_mark_info = elem
        elif elem['text'] == FAMILY_MARK:
            fam_mark_info = elem
        elif elem['text'] == NAME_MARK:
            name_mark_info = elem

    license_number = find_text_below(words, id_mark_info)['text']
    family_name = find_text_below(words, fam_mark_info)['text']
    name1_info = find_text_below(words, name_mark_info)
    name1 = name1_info['text']
    name2 = find_text_below(words, name1_info)['text']

    if name2 == ADDRESS_MARK:  # no second name
        full_name = f'{name1} {family_name}'
    else:  # with second name
        full_name = f'{name1} {name2} {family_name}'

    print(f'Driver license: {license_number}')
    print(f'Full name:      {full_name}')
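
Assuming the program is saved as, say, read_license.py (any name will do), it is run like this:

python3 read_license.py license.png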

The output of the program for the familiar picture:

Driver license: A9999999
Full name:      ANGELINA GABRIELA JONES

The program can easily be extended to retrieve other data from the driver’s license. Of course, we did not consider every possible problematic situation, because the goal was to demonstrate the practical use of the API while leaving room for improvement to the reader. For example, to handle rotated images, we could determine the angle of rotation from the key fields and use that information when searching for the “underlying” elements with field values. Try it! Using the same general ideas, it is easy to implement similar logic for other types of documents and other images with text.
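
Here is a minimal sketch of the rotation idea, assuming both reference labels were recognized (estimate_rotation_degrees is a hypothetical helper, not part of the program above). It relies on “1.FAMILY” and “NAME” lying on one line in the license template:

def estimate_rotation_degrees(words, image_width, image_height):
    # Hypothetical helper: rough tilt estimate from two labels that share
    # a line in the template. Rescale normalized coordinates to pixels so
    # the angle is not distorted by the image aspect ratio.
    left = next(w for w in words if w['text'] == '1.FAMILY')
    right = next(w for w in words if w['text'] == 'NAME')
    dx = (right['box'][0] - left['box'][0]) * image_width
    dy = (right['box'][1] - left['box'][1]) * image_height
    return math.degrees(math.atan2(dy, dx))  # 0.0 for an upright image

For the sample picture both labels have y = 0.2800, so the estimate is 0.0 degrees; on a tilted photo the returned angle could steer the search direction in find_text_below.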

Read the OCR API documentation and the code examples in various programming languages to learn more. You can also use RapidAPI.