NLP Text Processor Categorizer and Analyzer

FREEMIUM
By Christer Fredrickson | Updated a month ago | Text Analysis

README

Fliporium’s Full Text Data Natural Language Processing and Categorization

The purpose of this API is to categorize text data; to do so, it first processes the text using a variety of methods. The app currently supports English only. After processing, the API returns the following data:

1.) Categorization
2.) RAKE keyword extraction
3.) Highest-occurring bigrams
4.) Highest-occurring trigrams
5.) Tokenization into sentences
6.) Tokenization into three types of part-of-speech tags: simple, detailed, and syntactic dependency
7.) Sentiment analysis
8.) Lemmatization
9.) Named entity recognition

The Categorizer
First off, this is a good categorizer, but of course it’s not perfect. I strive every day to improve it, and I welcome input from users to help make it better.

How it works:
When you post your text data to the API, the app processes it by extracting the necessary keywords and entities to check against a trained model. The trained model is spaCy’s large English model (en_core_web_lg), with 685k keys and 685k unique vectors (300 dimensions) - https://spacy.io/models/en#en_core_web_lg.

After all the necessary keywords and entities are extracted from the text data, they are checked against the model for word similarity. “Similarity is determined by comparing word vectors or ‘word embeddings’, multi-dimensional meaning representations of a word.” - https://spacy.io/usage/spacy-101#vectors-similarity.

The app checks the similarity of the extracted keywords and entities against 1,200+ general topics and keywords in the model. It gathers all the scores and returns the top three categories, ranked not only by similarity score but also by how often each category occurs. A keyword’s ‘rank’ is the number of times the categorizer matched it to a particular category.
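The score-and-vote step above can be sketched in plain Python. This is an illustrative reimplementation, not the API’s actual code: the topic names, toy 3-dimensional vectors, and the 0.5 similarity threshold are made-up stand-ins for the 1,200+ topics checked against the 300-dimensional spaCy model.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two word vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def categorize(keywords, topic_vectors, word_vectors, threshold=0.5):
    """Vote each extracted keyword into every topic it is similar to,
    then return the top three categories by match count (the 'rank')."""
    votes = Counter()
    for kw in keywords:
        vec = word_vectors.get(kw)
        if vec is None:
            continue  # keyword not in the vocabulary
        for topic, tvec in topic_vectors.items():
            if cosine(vec, tvec) >= threshold:
                votes[topic] += 1  # one more match for this category
    return votes.most_common(3)

# Toy vectors for illustration only (real model vectors have 300 dims).
word_vectors = {"goal": [0.9, 0.1, 0.0], "striker": [0.8, 0.2, 0.1],
                "ballot": [0.0, 0.9, 0.2]}
topic_vectors = {"Sports": [1.0, 0.0, 0.0], "Politics": [0.0, 1.0, 0.1]}

print(categorize(["goal", "striker", "ballot"], topic_vectors, word_vectors))
```

Counting matches rather than summing raw similarities is what makes the ‘rank’ reflect how many keywords pointed at a category, matching the description above.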

The categorizer can return false categories if the text contains a lot of slang, proverbs, etc. I’m actively working on tuning this.

Tokenization, Parts of Speech, Named Entity Recognition and Lemmatization
This app uses spaCy’s tokenizer, part-of-speech tags, and dependency labels - https://spacy.io/usage/spacy-101#annotations-pos-deps.
That page covers everything you need to interpret this API’s annotation output.

RAKE Keywords, Bigrams and Trigrams
This app uses the NLTK (Natural Language Toolkit) for collocation data - https://www.nltk.org/howto/collocations.html
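NLTK’s collocation finders do the heavy lifting in the app itself; the underlying idea of ranking the highest-occurring bigrams and trigrams can be illustrated in pure Python (a sketch, not the API’s code):

```python
from collections import Counter

def top_ngrams(tokens, n, k=3):
    """Return the k highest-occurring n-grams
    (bigrams when n=2, trigrams when n=3)."""
    grams = zip(*(tokens[i:] for i in range(n)))  # sliding windows of size n
    return Counter(grams).most_common(k)

tokens = "the quick fox saw the quick dog chase the quick fox".split()
print(top_ngrams(tokens, 2))  # highest-occurring bigrams
print(top_ngrams(tokens, 3))  # highest-occurring trigrams
```

Note that raw frequency is only one collocation measure; NLTK also offers association scores such as PMI that down-weight pairs that are frequent merely because both words are common.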

Sentiment Analysis
This app uses NLTK and VADER for sentiment analysis - https://github.com/cjhutto/vaderSentiment
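VADER is a rule-based, lexicon-driven scorer. As a rough illustration of the idea only: the tiny lexicon, negation flip, and normalization below are invented for this example, while real VADER uses a large validated lexicon plus rules for punctuation, capitalization, degree modifiers, and more.

```python
# Toy lexicon-based sentiment scorer in the spirit of VADER (not VADER itself).
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -3.4}
NEGATORS = {"not", "never", "no"}

def toy_sentiment(text):
    """Sum lexicon valences, flipping the sign after a negator,
    and squash the total into [-1, 1], loosely like VADER's 'compound'."""
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        val = LEXICON.get(tok.strip(".,!?"))
        if val is None:
            continue
        if i > 0 and tokens[i - 1] in NEGATORS:
            val = -val  # crude negation handling
        score += val
    return max(-1.0, min(1.0, score / 4.0))

print(toy_sentiment("This movie is great"))     # positive
print(toy_sentiment("This movie is not good"))  # negative
```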

There is a limit of 25,000 characters per request.
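If your text exceeds that limit, you can split it into compliant chunks before posting each one. A minimal sketch: the 25,000-character limit is the only fact taken from this page, and the whitespace-preferring split strategy is my own suggestion, not part of the API.

```python
LIMIT = 25_000  # per-request character limit stated above

def chunk_text(text, limit=LIMIT):
    """Split text into chunks of at most `limit` characters,
    preferring to break at the last whitespace before the limit."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind(" ", 0, limit)
        if cut <= 0:      # no whitespace found: hard cut at the limit
            cut = limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks

parts = chunk_text("word " * 10_000)  # ~50,000 characters of input
print(len(parts), max(len(p) for p in parts))
```

Splitting at sentence boundaries instead of whitespace would be gentler on the sentence tokenizer, but that requires knowing where sentences end before you call the API.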
