AI Textraction

By TextractionAI | Updated 2 days ago | Text Analysis



Input Text

  • Text to extract entities from.
  • Up to 50,000 characters long.

example: “The quick brown fox jumps over the lazy dog.”

Input Entities:

  • An array of custom query entities to extract from the text, up to 12 entities per request.
  • Each entity entry is described by a JSON with 3-4 key-value pairs:
    • “description”: a free text description of the entity, up to 150 characters long.
    • “type”: the desired output format of the entity value: a primitive (“string”, “integer”, “float”, “boolean”) or an array of a primitive (example: “array[string]”).
    • “var_name”: a descriptive entity variable name to be used in the output results, up to 50 characters long. It must start with a letter, followed by letters, digits, or underscores.
    • (optional) “valid_values”: an array of valid extracted entity values. Use it to limit the extracted entity value to one of a set of predefined values. Up to 20 values, up to 50 characters each.

example: [{“description”: “number of animals mentioned in text”, “type”: “integer”, “var_name”: “num_of_animals”}]
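The limits listed above can be checked on the client before sending a request. The following is a minimal sketch, not part of the API itself: the function names are hypothetical, and it assumes that arrays are single-level (as in “array[string]”) and that var_name letters are ASCII.

```python
import re

ALLOWED_PRIMITIVES = {"string", "integer", "float", "boolean"}
# Assumption: "letters" means ASCII letters; the listing does not say.
VAR_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_type(type_name):
    """Accept a primitive or a single-level array of one, e.g. "array[string]"."""
    if not isinstance(type_name, str):
        return False
    if type_name in ALLOWED_PRIMITIVES:
        return True
    match = re.fullmatch(r"array\[(\w+)\]", type_name)
    return bool(match) and match.group(1) in ALLOWED_PRIMITIVES


def validate_request(text, entities):
    """Return a list of problems; an empty list means the documented limits are met."""
    problems = []
    if len(text) > 50_000:
        problems.append("text exceeds 50,000 characters")
    if len(entities) > 12:
        problems.append("more than 12 entities in one request")
    for i, entity in enumerate(entities):
        missing = {"description", "type", "var_name"} - entity.keys()
        if missing:
            problems.append(f"entity {i}: missing keys {sorted(missing)}")
            continue
        if len(entity["description"]) > 150:
            problems.append(f"entity {i}: description exceeds 150 characters")
        if not is_valid_type(entity["type"]):
            problems.append(f"entity {i}: unsupported type {entity['type']!r}")
        var_name = entity["var_name"]
        if len(var_name) > 50 or not VAR_NAME_RE.fullmatch(var_name):
            problems.append(f"entity {i}: invalid var_name {var_name!r}")
        valid_values = entity.get("valid_values")
        if valid_values is not None:
            if len(valid_values) > 20:
                problems.append(f"entity {i}: more than 20 valid_values")
            if any(len(str(value)) > 50 for value in valid_values):
                problems.append(f"entity {i}: a valid_values entry exceeds 50 characters")
    return problems
```

Calling validate_request with the example text and the example entity above returns an empty list.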


Output:

  • “results”: a JSON containing an entry for each input entity, mapping from var_name to the extracted value.
  • “stats”: a JSON with basic request statistics.

example: {“results”: {“num_of_animals”: 2}, “stats”: {“n_text_characters”: 44, “n_entities”: 1, “n_tokens_used”: 300}}
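A response shaped like the example above can be parsed and checked against the requested entity types. This is only a sketch under assumptions: the helper names are hypothetical, a missing or null value is treated as acceptable, and a JSON integer is accepted where “float” was requested.

```python
import json

# Type checks for the four primitives; bool is excluded from the numeric
# checks because Python treats bool as a subclass of int.
PRIMITIVE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def matches_type(value, type_name):
    """Return True if value is null or fits the requested entity type."""
    if value is None:
        return True
    if type_name.startswith("array[") and type_name.endswith("]"):
        inner = type_name[len("array["):-1]
        return isinstance(value, list) and all(matches_type(v, inner) for v in value)
    check = PRIMITIVE_CHECKS.get(type_name)
    return check is not None and check(value)


def check_response(raw_json, entities):
    """Parse a response body and list var_names whose value has an unexpected type."""
    response = json.loads(raw_json)
    results = response["results"]
    mismatches = [
        entity["var_name"]
        for entity in entities
        if not matches_type(results.get(entity["var_name"]), entity["type"])
    ]
    return results, response["stats"], mismatches
```

For the example response, results["num_of_animals"] is 2, stats["n_tokens_used"] is 300, and the mismatch list is empty.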


  • Extract custom entities from unstructured text.
  • Powered by a state-of-the-art (SOTA) AI model.
  • Multi-language support.
  • Supports long texts: up to 50,000 characters.


  • View our website for inspirational examples.
  • Input Text:
    • Remove any irrelevant parts of the text to focus the model on the relevant parts only (examples: HTML tags, irrelevant paragraphs, etc).
    • If relevant, add metadata and context for a better semantic understanding (example: “The following Curriculum Vitae was received from a candidate on 2023-04-23: …”).
  • Input Entities:
    • Description:
      • Be explicit and accurately describe the desired value (example: “number of rooms in the property, including only bedrooms and living rooms”).
      • If relevant, specify an output format for better standardization (examples: YYYY-mm-dd, ISO, etc).
      • If needed, add limitations (example: “product summary, 3-5 words”).
    • Variable name:
      • Should be descriptive.
      • Think of them as variable names in a programming language, JSON keys, or column names of a data table.
    • Type:
      • Should match the desired output value.
    • Valid values:
      • If needed, limit the extracted entity value to one of several expected values.
      • This is very useful when dealing with categorical values (example: automatically setting a value for a drop-down list or a radio button).
  • Output:
    • The model is trained to return “null” for missing or uncertain values. Handle these according to your product requirements (example: fill them with a default value).
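Two of the practices above, cleaning the input text and filling “null” results with defaults, can be sketched as follows. The function names are hypothetical, and the regex-based tag stripping is a rough assumption that suits simple markup only; a real HTML parser is safer for complex pages.

```python
import html
import re


def clean_text(raw, context=""):
    """Strip HTML tags, collapse whitespace, and prepend optional context."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)      # crude tag removal (assumption)
    unescaped = html.unescape(no_tags)          # e.g. "&amp;" becomes "&"
    collapsed = re.sub(r"\s+", " ", unescaped).strip()
    return f"{context} {collapsed}" if context else collapsed


def fill_defaults(results, defaults):
    """Replace null (None) values with a per-entity default where one is given."""
    return {
        name: (defaults.get(name) if value is None else value)
        for name, value in results.items()
    }
```

For example, fill_defaults({"num_rooms": None, "city": "Paris"}, {"num_rooms": 0}) keeps "city" as is and sets "num_rooms" to 0; entities without a configured default stay None.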

Common Use Cases

  • Parse any text:
    • Curriculum Vitae: candidate name, contact details, skills, education, etc.
    • Product listing: product name, specifications, price, etc.
    • Financial: revenues, number of sold items, earnings per share, stock ticker, etc.
    • Customer support: order id, customer’s request, etc.
  • Automatically fill/validate detailed user input fields (checkboxes, radio buttons, drop-down lists, text boxes, etc) based on a free text user input.
  • Convert multiple texts into data tables:
    • Add filters based on text entities.
    • Train Machine Learning (ML) models over the extracted entities.
  • Get answers to questions about a text in a structured format.
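The "convert multiple texts into data tables" use case can be illustrated with the standard library alone. In this sketch, each row is assumed to be the “results” object returned for one text; the function name and the sample column names are hypothetical.

```python
import csv
import io


def results_to_csv(rows, columns):
    """Write one CSV line per extracted "results" dict; None becomes an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow({column: row.get(column) for column in columns})
    return buffer.getvalue()


# Illustrative rows only, not real API output.
rows = [
    {"product_name": "Lamp", "price": 19.5},
    {"product_name": "Desk", "price": None},
]
table = results_to_csv(rows, ["product_name", "price"])

# Filtering on an extracted entity, skipping null values.
cheap = [r for r in rows if r["price"] is not None and r["price"] < 20]
```

Using var_name values as column names keeps the table header aligned with the entity definitions.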


Tokens are pieces of words containing 1 or more letters. They are closely related to the computational resources needed to process a request. Therefore, our token-based pricing plan provides a dynamic and fair pricing structure across a variety of use cases, from very short texts to very long ones.

We offer 50,000 free tokens per month, with a cost ranging from $0.01 to $0.03 for every 1,000 additional tokens. See exact details here.

The total number of tokens consists of the input text, entities, extracted values and internal system overhead. To make token usage easy to track, the number of tokens used is returned in the response under “stats” -> “n_tokens_used”.
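As a rough illustration of the pricing described above, monthly usage can be summed from “n_tokens_used” and turned into a cost range. This sketch assumes that the free allowance is applied first and that the remainder is billed linearly; the exact rate depends on the plan and is not stated here, so only the lower and upper bounds are computed. The function names are hypothetical.

```python
FREE_TOKENS_PER_MONTH = 50_000
MIN_USD_PER_1K = 0.01  # lower bound of the quoted price range
MAX_USD_PER_1K = 0.03  # upper bound of the quoted price range


def total_tokens(responses):
    """Sum n_tokens_used over a list of parsed response dicts."""
    return sum(response["stats"]["n_tokens_used"] for response in responses)


def monthly_cost_range(tokens_used):
    """Return (lowest, highest) estimated USD cost for one month of usage."""
    billable = max(0, tokens_used - FREE_TOKENS_PER_MONTH)
    return (billable / 1000 * MIN_USD_PER_1K, billable / 1000 * MAX_USD_PER_1K)
```

For 150,000 tokens in a month, 100,000 are billable, which gives an estimate between about $1 and $3.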