Text Similarity

FREEMIUM
By Twinword, Inc. | Updated 3 months ago | Text Analysis
Popularity: 9.4 / 10
Latency: 502 ms
Service level: 98%


Models or logic used to determine the score

contact-NB1Y8R5z7
8 months ago

Hi there, I'm using this API as part of my university project. Could you share some background on how the text-similarity score is computed on the backend? For example, what factors determine the score? Is it based on a particular model, and how does it work in general? I'd appreciate it, as I'd like to understand more about what is currently a black box to me.

twinword commented 8 months ago

Although we cannot reveal our proprietary technology or algorithms, I can describe in principle how it works.

Our patented technology understands the relationships between words. This goes well beyond synonyms: the English word "nail" has no synonym, yet related words such as "finger" exist. We have mapped these relationships, and the map can be explored here: https://www.twinword.com/api/visual-context-graph.php
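To make the idea of a word map concrete, here is a minimal sketch that stores word relationships as an undirected graph. The word pairs, `RELATED_PAIRS`, and `build_word_graph` are hypothetical illustrations; Twinword's actual map and its data structures are proprietary and not described in this thread.

```python
from collections import defaultdict

# Hypothetical relationship pairs for illustration only;
# Twinword's real word map is proprietary.
RELATED_PAIRS = [
    ("nail", "finger"),
    ("nail", "hammer"),
    ("finger", "hand"),
    ("hammer", "tool"),
]


def build_word_graph(pairs):
    """Build an undirected adjacency map: word -> set of related words."""
    graph = defaultdict(set)
    for a, b in pairs:
        # Relationships are treated as symmetric, so add both directions.
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


graph = build_word_graph(RELATED_PAIRS)
print(sorted(graph["nail"]))  # ['finger', 'hammer']
```

Here "nail" has no synonym in the graph, but it is still linked to "finger" and "hammer", which is the kind of relationship the answer describes.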

Given two words, we can return a score of how semantically related they are, based on how close they are to one another on our word map (word graph). When given two paragraphs, our technology identifies the keywords in each and does the same. The higher the score, the more semantically related those keywords are.
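The scoring described above can be illustrated with shortest-path distance on a word graph. This is a minimal toy under stated assumptions, not Twinword's algorithm: the graph `GRAPH`, the stopword list, the frequency-based `keywords` extractor, the `1 / (1 + distance)` mapping, and the best-match averaging in `text_score` are all hypothetical choices made for illustration.

```python
from collections import Counter, deque
from typing import Dict, List, Optional, Set

# Hypothetical toy word graph (undirected); the real Twinword map is proprietary.
GRAPH: Dict[str, Set[str]] = {
    "nail": {"finger", "hammer"},
    "finger": {"nail", "hand"},
    "hand": {"finger"},
    "hammer": {"nail", "tool"},
    "tool": {"hammer"},
}

# Hypothetical stopword list used by the toy keyword extractor.
STOPWORDS = {"a", "an", "the", "and", "with", "of", "is", "my", "on"}


def hop_distance(start: str, goal: str) -> Optional[int]:
    """Breadth-first search: fewest edges between two words, or None if unconnected."""
    if start not in GRAPH or goal not in GRAPH:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        if word == goal:
            return dist
        for nxt in GRAPH[word]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def word_score(a: str, b: str) -> float:
    """Map graph distance to [0, 1]: identical words score 1.0, unconnected words 0.0."""
    d = hop_distance(a, b)
    return 0.0 if d is None else 1.0 / (1 + d)


def keywords(text: str, k: int = 5) -> List[str]:
    """Toy keyword extraction: the k most frequent non-stopword tokens found in the graph."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS and t in GRAPH)
    return [w for w, _ in counts.most_common(k)]


def text_score(text1: str, text2: str) -> float:
    """Average, over keywords of text1, of the best word_score against keywords of text2."""
    kw1, kw2 = keywords(text1), keywords(text2)
    if not kw1 or not kw2:
        return 0.0
    return sum(max(word_score(a, b) for b in kw2) for a in kw1) / len(kw1)


print(word_score("nail", "finger"))  # 0.5
```

With this toy graph, `text_score("The hammer hit my nail.", "A hand with a tool.")` averages 0.5 (hammer vs. tool) and 1/3 (nail vs. hand or tool), giving about 0.417. The toy score is asymmetric because it iterates over the keywords of `text1`; a production service may symmetrise or weight keywords differently.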

As for input limits, the maximum length is approximately 75,000 characters and about 10,000 words. Any characters beyond 75,000 and any words beyond 10,000 are not considered. Please also be aware that our algorithm identifies a certain number (X) of keywords in the text that it considers meaningful. The API relies heavily on those keywords, among other indicators.
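If you send long documents, a client-side check can show roughly which part of the text falls within the stated limits. This is a sketch under two assumptions not confirmed in the answer: the character cap is applied before the word cap, and words are whitespace-separated. The function name `effective_text` is hypothetical, and it does not call the API.

```python
# Limits quoted in the answer; both are described as approximate.
MAX_CHARS = 75_000
MAX_WORDS = 10_000


def effective_text(text: str, max_chars: int = MAX_CHARS, max_words: int = MAX_WORDS) -> str:
    """Approximate the portion of `text` the API would consider.

    Assumption: characters are clipped first, then words (split on whitespace).
    Note that re-joining with single spaces also normalises whitespace.
    """
    clipped = text[:max_chars]
    words = clipped.split()
    return " ".join(words[:max_words])


long_doc = "word " * 20_000  # 100,000 characters, 20,000 words
print(len(effective_text(long_doc).split()))  # 10000
```

Because the API also selects only some keywords, text near the end of a long document may contribute little even when it is within these limits.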

I hope this explanation suffices.

Rating: 5 | Votes: 1