OCR

By API 4 AI | Updated a month ago | Visual Recognition

How to analyze a shopping receipt with OCR API

There are many reasons to keep an accurate record of how much you spend while shopping.

The task can be tricky, but an Optical Character Recognition (OCR) API offers a practical solution.

The OCR API recognizes text in an image, places bounding boxes around the recognized words, and returns a transcript of the text.

In this tutorial we write a Python script that finds the keyword ‘Total’ near the bottom of a shopping receipt and reads the amount printed next to it.
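
As a rough illustration, the sketch below shows the shape of the response that the script in this tutorial reads: a list of word objects, each with a `box` and a recognized `text`. The sample words and coordinates are invented, the `'ok'` status value is a placeholder, and reading `box` as `[x, y, width, height]` in fractions of the image size is an assumption rather than something confirmed here.

```python
# Hypothetical response fragment: only the keys the tutorial's script reads.
sample_response = {
    'results': [{
        'status': {'code': 'ok'},  # Placeholder; the script only checks for 'failure'.
        'entities': [{
            'objects': [
                # box is assumed to be [x, y, width, height], relative to image size.
                {'box': [0.10, 0.80, 0.15, 0.03], 'entities': [{'text': 'Total'}]},
                {'box': [0.70, 0.80, 0.12, 0.03], 'entities': [{'text': '23.50'}]},
            ],
        }],
    }],
}


def iter_words(response):
    """Yield (text, box) pairs for every recognized word."""
    for obj in response['results'][0]['entities'][0]['objects']:
        yield obj['entities'][0]['text'], obj['box']


words = list(iter_words(sample_response))
for text, box in words:
    print(text, box)
```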

Python code

Get total from a receipt

Send a photo of the receipt to the OCR API and retrieve all the words along with their positions in the photo.
Find the word “total” closest to the bottom of the image, then look for a number at the same vertical position, that is, on the same printed line.
That number is the total amount. The receipt in the photo may be crumpled or tilted, so the comparison allows a small difference between the vertical positions of the words.
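
The matching step can be tried in isolation before calling the API. The following is a minimal sketch on made-up data: `find_total` is a hypothetical helper, each word is reduced to a `(text, y)` pair, and `y` is assumed to grow toward the bottom of the image and to be expressed as a fraction of the image height.

```python
def find_total(words, epsilon=0.01):
    """Return the number on the same line as the lowest 'total', or None.

    `words` is a list of (text, y) pairs; y is the assumed vertical
    position of the word, larger values being lower in the image.
    """
    totals = [(text, y) for text, y in words if text.lower() == 'total']
    if not totals:
        return None
    # The lowest "total" is the one with the largest y.
    _, total_y = max(totals, key=lambda word: word[1])
    for text, y in words:
        # Accept words whose y differs from the keyword's by less than epsilon.
        if abs(y - total_y) < epsilon:
            try:
                return float(text)
            except ValueError:
                continue  # Not a number, e.g. the word 'Total' itself.
    return None


# Invented words from a slightly tilted receipt.
words = [
    ('Milk', 0.300), ('3.50', 0.302),
    ('Total', 0.800), ('23.50', 0.804),
]
print(find_total(words))
```

A larger `epsilon` tolerates more tilt but raises the chance of picking up a number from a neighboring line.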

def parse_receipt(photo_path: Path, api_key: str):
    # We strongly recommend you use exponential backoff.
    error_statuses = (408, 409, 429, 500, 502, 503, 504)
    s = requests.Session()
    retries = Retry(backoff_factor=1.5, status_forcelist=error_statuses)

    s.mount('https://', HTTPAdapter(max_retries=retries))

    url = f'{API_URL}/v1/results?algo=simple-words'
    with photo_path.open('rb') as f:
        api_res = s.post(url, files={'image': f},
                         headers={'X-RapidAPI-Key': api_key}, timeout=20)

    # Handle processing failure. Check the HTTP status before decoding the
    # body: an error response is not guaranteed to contain JSON.
    if api_res.status_code != 200:
        print('Image processing failed.')
        sys.exit(1)
    api_res_json = api_res.json()
    if api_res_json['results'][0]['status']['code'] == 'failure':
        print('Image processing failed.')
        sys.exit(1)

    # All recognized words with their boxes.
    objs = api_res_json['results'][0]['entities'][0]['objects']

    def close_equal(num1, num2, epsilon):
        """Check that two numbers differ by less than epsilon."""
        return abs(num1 - num2) < epsilon

    # Find the lowest word "total" on the receipt.
    total_objs = [obj for obj in objs
                  if obj['entities'][0]['text'].lower() == 'total']
    if not total_objs:
        # No "total" keyword was recognized.
        return None
    lowest_total_obj = max(total_objs, key=lambda obj: obj['box'][1])
    # Find all words on the same line as the lowest "total".
    possible_total_obj = [obj for obj in objs
                          if close_equal(obj['box'][1], lowest_total_obj['box'][1], 0.01)]

    # Return the first word on that line that parses as a number.
    for obj in possible_total_obj:
        try:
            return float(obj['entities'][0]['text'])
        except ValueError:
            continue
    return None
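
A receipt may print the total with a currency symbol or a comma as the decimal separator, for example `$23.50` or `23,50`, and `float()` rejects both. The helper below is a hypothetical addition, not part of the original script: a minimal sketch that strips everything except digits and separators and treats the last separator as the decimal point.

```python
import re
from typing import Optional


def parse_amount(text: str) -> Optional[float]:
    """Try to read a money amount from a single OCR word (hypothetical helper)."""
    # Keep digits and separators; drop currency symbols and other characters.
    cleaned = re.sub(r'[^0-9.,]', '', text)
    if not cleaned:
        return None
    # Treat the last '.' or ',' as the decimal separator, the rest as grouping.
    last_sep = max(cleaned.rfind('.'), cleaned.rfind(','))
    if last_sep == -1:
        integer, fraction = cleaned, ''
    else:
        integer, fraction = cleaned[:last_sep], cleaned[last_sep + 1:]
    integer = integer.replace('.', '').replace(',', '')
    if not integer and not fraction:
        return None
    return float(f'{integer or "0"}.{fraction or "0"}')


for word in ('$23.50', '23,50', '1,234.56', 'Total'):
    print(word, parse_amount(word))
```

This sketch has known limits: a grouped amount without decimals such as `1,234` is read as 1.234, and any word containing digits yields a number, so it would need further checks before replacing the `float()` call in `parse_receipt`.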

Parse command line arguments

The Rapid API token and the path to the receipt photo are passed as command-line arguments.
We use the built-in argparse package to parse them.

def parse_args():
    parser = argparse.ArgumentParser()
    # Get your token at https://rapidapi.com/api4ai-api4ai-default/api/ocr43/pricing.
    parser.add_argument('--api-key', help='Rapid API token.', required=True)
    parser.add_argument('photo', type=Path,
                        help='Path to a shopping receipt photo.')
    return parser.parse_args()
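
To check how the arguments are parsed without touching the real command line, `parse_args` accepts an explicit list of strings. The sketch below rebuilds the same parser in a hypothetical `build_parser` function and feeds it placeholder values; `YOUR_API_KEY` and `receipt.jpg` are not a real key or file.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build the same parser as in the tutorial (hypothetical refactoring)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--api-key', help='Rapid API token.', required=True)
    parser.add_argument('photo', type=Path,
                        help='Path to a shopping receipt photo.')
    return parser


# Placeholder values, passed explicitly instead of being read from sys.argv.
args = build_parser().parse_args(['--api-key', 'YOUR_API_KEY', 'receipt.jpg'])
print(args.api_key, args.photo)
```

Note that argparse exposes the `--api-key` option as the attribute `args.api_key`, and `type=Path` converts the positional argument to a `pathlib.Path`.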

Main function

Finally, call the parse_receipt function and handle its return value.

def main():
    args = parse_args()
    # Compare with None explicitly so that a total of 0.0 is still reported.
    if (total := parse_receipt(args.photo, args.api_key)) is not None:
        print(f'Total: {total}')
    else:
        print('Total not found.')

if __name__ == '__main__':
    main()

Full Python code

"""
Get total payment from a shopping receipt.
Run the script:
`python3 main.py --api-key <RAPID API TOKEN> <PATH TO A SHOPPING RECEIPT PHOTO>`
"""
import argparse
import sys
from pathlib import Path

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


API_URL = 'https://ocr43.p.rapidapi.com'


def parse_args():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser()
    # Get your token at https://rapidapi.com/api4ai-api4ai-default/api/ocr43/pricing.
    parser.add_argument('--api-key', help='Rapid API token.', required=True)
    parser.add_argument('photo', type=Path,
                        help='Path to a shopping receipt photo.')
    return parser.parse_args()


def parse_receipt(photo_path: Path, api_key: str):
    """Get total from shopping receipt."""
    # We strongly recommend you use exponential backoff.
    error_statuses = (408, 409, 429, 500, 502, 503, 504)
    s = requests.Session()
    retries = Retry(backoff_factor=1.5, status_forcelist=error_statuses)

    s.mount('https://', HTTPAdapter(max_retries=retries))

    url = f'{API_URL}/v1/results?algo=simple-words'
    with photo_path.open('rb') as f:
        api_res = s.post(url, files={'image': f},
                         headers={'X-RapidAPI-Key': api_key}, timeout=20)

    # Handle processing failure. Check the HTTP status before decoding the
    # body: an error response is not guaranteed to contain JSON.
    if api_res.status_code != 200:
        print('Image processing failed.')
        sys.exit(1)
    api_res_json = api_res.json()
    if api_res_json['results'][0]['status']['code'] == 'failure':
        print('Image processing failed.')
        sys.exit(1)

    # All recognized words with their boxes.
    objs = api_res_json['results'][0]['entities'][0]['objects']

    def close_equal(num1, num2, epsilon):
        """Check that two numbers differ by less than epsilon."""
        return abs(num1 - num2) < epsilon

    # Find the lowest word "total" on the receipt.
    total_objs = [obj for obj in objs
                  if obj['entities'][0]['text'].lower() == 'total']
    if not total_objs:
        # No "total" keyword was recognized.
        return None
    lowest_total_obj = max(total_objs, key=lambda obj: obj['box'][1])
    # Find all words on the same line as the lowest "total".
    possible_total_obj = [obj for obj in objs
                          if close_equal(obj['box'][1], lowest_total_obj['box'][1], 0.01)]

    # Return the first word on that line that parses as a number.
    for obj in possible_total_obj:
        try:
            return float(obj['entities'][0]['text'])
        except ValueError:
            continue
    return None


def main():
    """
    Script entry point.

    Print total payment from shopping receipt.
    """
    args = parse_args()
    # Compare with None explicitly so that a total of 0.0 is still reported.
    if (total := parse_receipt(args.photo, args.api_key)) is not None:
        print(f'Total: {total}')
    else:
        print('Total not found.')


if __name__ == '__main__':
    main()

Test the script

Let’s run the script: python3 main.py --api-key "YOUR_API_KEY" "PATH/TO/RECEIPT"

Conclusion

Parsing shopping receipts is a useful feature in many applications.

An OCR API is a valuable tool for this and for many other digitization tasks.

If you need more from receipt parsing, you can extend the code above yourself. Alternatively, you can reach out to us at https://api4.ai/get-started. We are ready to offer solutions tailored to your requirements.