Text Similarity

FREEMIUM
By Twinword API | Updated 1 day ago | Text Analysis
Popularity: 9.4 / 10 | Latency: 280ms | Service Level: 100% | Health Check: N/A


Models or logic in determining the score

Hi there, I’m using this API as part of my university project. Could you share some background on how the text-similarity processing works on the backend? For example, what factors determine the score? Is it based on a particular model, or how does it work in general? I would appreciate it, as I hope to learn more about what is otherwise a black-box model.

Rapid account: Twinword
twinword commented 3 years ago

Although we cannot reveal our proprietary technology or algorithms, I can describe in principle how it works.

Our patented technology understands the relationships between words. This goes well beyond synonyms: there is no synonym for the English word “nail”, but related words such as “finger” do exist. We have mapped these relationships, and the map can be seen here: https://www.twinword.com/api/visual-context-graph.php

Given two words, we return a score of how semantically related they are, based on how close they are to one another on our word map / word graph. When given two paragraphs, our technology identifies the keywords and does the same. The higher the score, the more semantically related those keywords are.
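To make the "closeness on a word graph" idea concrete, here is a minimal sketch in Python. It is not Twinword's algorithm, which is proprietary. Everything in it is an assumption for illustration: the toy edge list `EDGES`, the breadth-first hop count as the notion of distance, the `1 / (1 + distance)` scoring, the naive frequency-based `keywords` helper, and the "best match per keyword, then average" rule in `text_similarity` are all hypothetical.

```python
from collections import Counter, deque

# Hypothetical toy word graph: each pair links two related words.
# Twinword's real graph is proprietary; these edges are invented for illustration.
EDGES = [
    ("nail", "finger"), ("finger", "hand"), ("hand", "arm"),
    ("nail", "hammer"), ("hammer", "tool"), ("tool", "wrench"),
]

STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "my", "with", "on"}


def build_graph(edges):
    """Build an undirected adjacency map from (word, word) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def word_distance(graph, start, goal):
    """Breadth-first search for the hop count between two words (None if unreachable)."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        if word == goal:
            return dist
        for nxt in graph[word]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def word_similarity(graph, a, b):
    """Map graph distance to a score in [0, 1]: closer words score higher."""
    dist = word_distance(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def keywords(text, k=3):
    """Naive keyword pick: the k most frequent non-stopword tokens (hypothetical)."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]


def text_similarity(graph, text1, text2, k=3):
    """Average, over text1's keywords, of the best-matching keyword score in text2."""
    kw1, kw2 = keywords(text1, k), keywords(text2, k)
    if not kw1 or not kw2:
        return 0.0
    return sum(max(word_similarity(graph, a, b) for b in kw2) for a in kw1) / len(kw1)
```

With `g = build_graph(EDGES)`, `word_similarity(g, "nail", "finger")` is 0.5 (one hop), while "nail" and "hand" are two hops apart and score lower. Note that this particular `text_similarity` is asymmetric (it averages over the first text's keywords); a real service may combine scores differently.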

As for the length limits, the maximum is approximately 75,000 characters or about 10,000 words. Anything after the first 75,000 characters is not considered, and neither is any word after the first 10,000 words. Please also be aware that our algorithm identifies from the text a certain number (X) of keywords that it deems meaningful. The API is heavily dependent on those keywords, among other indicators.
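If you want to see roughly which part of a long input falls within these limits, you could pre-trim it on the client side. The sketch below is hypothetical: the answer only says the limits are approximate, so the exact cut-off points, the order in which the two limits are applied (characters first here), and whitespace-based word counting are all assumptions about how the service behaves.

```python
# Limits quoted in the answer above; both are described as approximate.
MAX_CHARS = 75_000
MAX_WORDS = 10_000


def apply_limits(text, max_chars=MAX_CHARS, max_words=MAX_WORDS):
    """Return only the portion of text the answer says is considered.

    Assumptions: characters are cut first (which may split the last word),
    then words are counted by whitespace splitting. Re-joining with single
    spaces also collapses any original runs of whitespace or newlines.
    """
    clipped = text[:max_chars]
    words = clipped.split()
    return " ".join(words[:max_words])
```

For example, `apply_limits("one two three", max_words=2)` returns `"one two"`. Trimming locally does not change what the API scores; it only shows which text is likely to be ignored.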

I hope this explanation suffices.
