This section describes how to structure your Document Parsing API requests to Extracta.ai. Follow the format below for successful data extraction:
{
  "extractionDetails": {
    "name": "Extraction Name", // required - Name your extraction process
    "language": "Supported Language", // required - Choose from the supported languages
    "fields": [
      {
        "key": "Field Key", // required - Define the key for data extraction
        "description": "Field Description", // optional - Describe the field
        "example": "Field Example" // optional - Provide an example value
      },
      ...
    ]
  },
  "file": "base64String or file URL" // required - Provide the document as a base64 string or as a URL
}
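As a minimal sketch, the request body above can be assembled in Python. The function name `build_request` and the sample values are hypothetical; only the JSON keys come from the format above. Sending a plain base64 string without a data-URI prefix is an assumption, and the endpoint and authentication are not covered in this section, so the sketch stops at building the payload.

```python
import base64
import json


def build_request(name, language, fields, file_bytes=None, file_url=None):
    """Assemble a request body in the format described above.

    Exactly one of file_bytes or file_url must be given.
    """
    if (file_bytes is None) == (file_url is None):
        raise ValueError("provide exactly one of file_bytes or file_url")
    for field in fields:
        if "key" not in field:
            raise ValueError("every field needs a 'key'")
    if file_bytes is not None:
        # Assumption: the API accepts plain base64 text (no data-URI prefix).
        file_value = base64.b64encode(file_bytes).decode("ascii")
    else:
        file_value = file_url
    return {
        "extractionDetails": {
            "name": name,
            "language": language,
            "fields": fields,
        },
        "file": file_value,
    }


# Hypothetical sample values, for illustration only.
payload = build_request(
    name="Invoice extraction",
    language="English",
    fields=[{"key": "invoice_number", "description": "Invoice number", "example": "INV-001"}],
    file_bytes=b"%PDF-1.4 sample bytes",
)
print(json.dumps(payload, indent=2))
```

Note that `json.dumps` produces strict JSON, so the `//` annotations shown in the format above are documentation only and do not appear in the serialized body.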
In addition to the basic format outlined in the previous sections, Extracta.ai also supports more complex data structures for specialized extraction needs. This advanced format allows the definition of nested objects and arrays, catering to a broader range of data representation.
object
The object type represents a structured object with multiple properties. Each property is defined as an object within an array, and can include its own key, description, type, and example.
{
  "key": "personal_info",
  "description": "Personal information of the person", // optional
  "type": "object",
  "properties": [
    {
      "key": "name",
      "description": "Name of the person", // optional
      "example": "Alex Smith", // optional
      "type": "string" // optional
    },
    {
      "key": "email",
      "description": "Email of the person",
      "example": "alex.smith@gmail.com",
      "type": "string"
    },
    ...
  ]
}
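A small helper can keep object definitions consistent. The sketch below is illustrative only: `string_property` and `object_field` are hypothetical names, not part of the Extracta.ai API. The keys they emit are the ones shown in the object example above.

```python
def string_property(key, description=None, example=None):
    """Build one inner property; description and example are optional."""
    prop = {"key": key, "type": "string"}
    if description is not None:
        prop["description"] = description
    if example is not None:
        prop["example"] = example
    return prop


def object_field(key, properties, description=None):
    """Build an object-type field whose properties are given as a list."""
    field = {"key": key, "type": "object", "properties": list(properties)}
    if description is not None:
        field["description"] = description
    return field


personal_info = object_field(
    "personal_info",
    [
        string_property("name", "Name of the person", "Alex Smith"),
        string_property("email", "Email of the person", "alex.smith@gmail.com"),
    ],
    description="Personal information of the person",
)
```

The resulting dictionary can be placed directly in the `fields` list of a request.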
array
The array type is used for lists of items, such as a collection of work experiences. The items key contains an object defining the structure of each item in the array.
{
  "key": "work_experience",
  "description": "Work experience of the person", // optional
  "type": "array",
  "items": {
    "type": "object",
    "properties": [
      {
        "key": "title",
        "description": "Title of the job", // optional
        "example": "Software Engineer", // optional
        "type": "string" // optional
      },
      {
        "key": "start_date",
        "description": "Start date of the job",
        "example": "2022",
        "type": "string"
      },
      ...
    ]
  }
}
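The same structure can be generated programmatically. This is a sketch under the assumption that array items are objects, as in the example above; `array_of_objects` is a hypothetical helper name, not an Extracta.ai function.

```python
def array_of_objects(key, item_properties, description=None):
    """Build an array-type field whose items are objects with the given properties."""
    field = {
        "key": key,
        "type": "array",
        "items": {"type": "object", "properties": list(item_properties)},
    }
    if description is not None:
        field["description"] = description
    return field


work_experience = array_of_objects(
    "work_experience",
    [
        {"key": "title", "description": "Title of the job",
         "example": "Software Engineer", "type": "string"},
        {"key": "start_date", "description": "Start date of the job",
         "example": "2022", "type": "string"},
    ],
    description="Work experience of the person",
)
```

Here the structure of a single item is described once under `items`, and it applies to every entry the extraction returns for that array.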
For object and array types, the example parameter is applicable only for their inner properties/items.
If no type is specified, it defaults to string.
For object and array types, the inner fields can only be of type string. This means that each property within an object or each item within an array should be a string type, ensuring consistency and simplicity in data representation.
Extracta.ai is capable of processing documents in image (JPG, PNG), PDF, and DOCX formats. This allows a wider range of document types to be submitted for extraction.
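The type rules for object and array fields can be checked on the client before a request is sent. The following is a rough sketch, not an official validator: `check_field` is a hypothetical name, and reading the string-only rule as applying to the entries of `properties` (including those under an array's `items`) is an interpretation of the rules stated here.

```python
def check_field(field):
    """Return a list of problems found in one top-level field definition."""
    problems = []
    field_type = field.get("type", "string")  # an omitted type defaults to string
    if field_type not in ("object", "array"):
        return problems
    if "example" in field:
        problems.append(f"{field['key']}: 'example' belongs on inner properties/items")
    if field_type == "object":
        inner = field.get("properties", [])
    else:
        inner = field.get("items", {}).get("properties", [])
    for prop in inner:
        if prop.get("type", "string") != "string":
            problems.append(f"{field['key']}.{prop['key']}: inner fields must be of type string")
    return problems


# Hypothetical field definitions, for illustration only.
good = {"key": "personal_info", "type": "object",
        "properties": [{"key": "name"}, {"key": "email", "type": "string"}]}
bad = {"key": "jobs", "type": "array", "example": "x",
       "items": {"type": "object",
                 "properties": [{"key": "history", "type": "array"}]}}

print(check_field(good))
print(check_field(bad))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a field definition at once.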
Extracta.ai currently supports document extraction in the following languages: Romanian, English, French, Spanish, Arabic, Portuguese, German, Italian. Additional support for 20 more languages is planned by January.
Note: If an unsupported language is specified, the API returns an error message indicating an invalid language choice. Check the API documentation for new language additions.
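To avoid a round trip that ends in an invalid-language error, the language can be checked locally first. This is a minimal sketch: `require_supported_language` is a hypothetical helper, and the exact strings the API expects (capitalized English names, as listed above) are an assumption.

```python
# Languages listed in this section; the exact spelling the API expects is an assumption.
SUPPORTED_LANGUAGES = frozenset({
    "Romanian", "English", "French", "Spanish",
    "Arabic", "Portuguese", "German", "Italian",
})


def require_supported_language(language):
    """Return the language unchanged, or raise ValueError if it is not listed."""
    if language not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {language!r}; expected one of: {supported}")
    return language


language = require_supported_language("French")
```

Because the supported list is expected to grow, a hard-coded set like this needs to be kept in sync with the API documentation.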
Who We Are
Extracta.ai is a technology company specializing in data extraction from a wide array of documents. We cater to various industries, helping businesses automate workflows, enhance efficiency, and reduce manual data handling. Our core expertise lies in parsing structured and unstructured data from diverse document formats.
What We Do
Our flagship offering is the Document Parsing API, a robust tool designed for seamless extraction of data from documents such as CVs, invoices, contracts, and more. This API supports multiple formats, including PDF, Word, TXT, as well as scanned documents in PNG and JPG formats, using OCR technology where needed.
Our Mission
At Extracta.ai, our mission is to streamline data extraction processes, making them more efficient and accessible for businesses of all sizes. We aim to continuously innovate and adapt our technology to meet the evolving needs of our clients.
Our Vision
We envision a future where data extraction is not a bottleneck in business operations but a catalyst for growth and efficiency. By focusing on technology that is both powerful and user-friendly, we strive to be at the forefront of the data extraction industry.
For more information about our Document Parsing API or if you want to contact us directly, visit www.extracta.ai.
Thank you for considering Extracta.ai for your data extraction needs.