Text Similarity

By Twinword, Inc. | Updated 4 months ago | Text Analysis

9.5 / 10




Models or logic in determining the score

10 months ago

Hi there, I’m using this API as part of my university project. Could you share some background on how the text-similarity processing works on the backend? For example, what factors determine the score? Is it based on any particular models, or how does it work in general? I would appreciate it, as I’d like to learn more about what is otherwise a black-box model.

twinword commented 10 months ago

Although we cannot reveal our proprietary technology or algorithms, I can describe in principle how it works.

Our patented technology understands the relationships between words. This goes well beyond synonyms: there is no synonym for the English word “nail”, yet related words such as “finger” exist. We have mapped these relationships, and the map can be seen here: https://www.twinword.com/api/visual-context-graph.php
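Twinword does not disclose how its word map is stored, so the following is only a minimal sketch of the idea, assuming a simple undirected graph in which each edge links two related words with a made-up strength between 0 and 1. The edge list, the weights, and the helper names `build_graph` and `related_words` are hypothetical and are not Twinword's data or API.

```python
from collections import defaultdict

# Hypothetical toy relationship map. The weights in (0, 1] are invented
# for illustration and are NOT taken from Twinword's word graph.
EDGES = [
    ("nail", "finger", 0.8),
    ("nail", "hammer", 0.9),
    ("finger", "hand", 0.9),
    ("hammer", "tool", 0.7),
]


def build_graph(edges):
    """Build an undirected adjacency map: word -> {neighbor: weight}."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def related_words(graph, word):
    """Return the neighbors of `word`, strongest relationship first."""
    neighbors = graph.get(word, {})  # .get avoids inserting unknown words
    return sorted(neighbors, key=lambda n: neighbors[n], reverse=True)


graph = build_graph(EDGES)
print(related_words(graph, "nail"))  # ['hammer', 'finger']
```

Note that "nail" has no synonym in this toy map, yet it still has related words, which is the distinction drawn in the reply.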

Given two words, we return a score reflecting how semantically related they are, based on how close they are to one another on our word map / word graph. Given two paragraphs, our technology first identifies the keywords and then does the same for them. The higher the score, the more semantically related those keywords are.
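To make the word-pair and paragraph scoring more concrete, here is a hedged sketch that is not Twinword's proprietary method. It assumes that "closeness" is the number of hops between two words in a toy graph (found with breadth-first search) and maps that to a score with 1 / (1 + distance), so identical words score 1.0 and unconnected words score 0.0. Keyword extraction is replaced by naive stopword filtering, and paragraph similarity by averaging each keyword's best match in the other text. All names (`GRAPH`, `hop_distance`, `word_score`, `text_score`) and the scoring formula are illustrative assumptions.

```python
from collections import deque

# Hypothetical toy word graph (unweighted, undirected); not Twinword's data.
GRAPH = {
    "nail": {"finger", "hammer"},
    "finger": {"nail", "hand"},
    "hand": {"finger", "arm"},
    "arm": {"hand"},
    "hammer": {"nail", "tool"},
    "tool": {"hammer"},
}

STOPWORDS = {"the", "a", "an", "my", "and", "of", "on"}


def hop_distance(graph, start, goal):
    """Breadth-first search: number of edges between two words, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        for nxt in graph.get(word, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def word_score(graph, a, b):
    """Assumed rule: fewer hops means a higher score; 0.0 if unconnected."""
    d = hop_distance(graph, a, b)
    return 0.0 if d is None else 1.0 / (1 + d)


def keywords(text):
    """Naive stand-in for the real (undisclosed) keyword extraction."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]


def text_score(graph, text1, text2):
    """Average, over keywords of text1, of each keyword's best match in text2."""
    k1, k2 = keywords(text1), keywords(text2)
    if not k1 or not k2:
        return 0.0
    return sum(max(word_score(graph, a, b) for b in k2) for a in k1) / len(k1)


print(word_score(GRAPH, "nail", "finger"))         # 0.5
print(text_score(GRAPH, "the nail", "my finger"))  # 0.5
```

Averaging best matches is only one reasonable way to aggregate keyword scores. It is not symmetric (swapping the two texts can change the result), which a real system might address by averaging both directions.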

Regarding length limits, the maximum input is approximately 75,000 characters and about 10,000 words. Anything after the first 75,000 characters, and any words after the first 10,000, are not considered. Please also be aware that our algorithm identifies a certain number (X) of keywords in the text that it judges to be meaningful, and the API depends heavily on those keywords, among other indicators.
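The limits above can be illustrated with a small preprocessing sketch. It assumes the character cap is applied before the word cap, that words are whitespace-separated, and that "meaningful" keywords can be approximated by frequency after stopword removal. Twinword's actual keyword selection and the value of X are not disclosed, so `apply_limits`, `top_keywords`, and the parameter `x` are hypothetical.

```python
from collections import Counter

# Approximate limits stated in the reply; the exact values and the order
# in which they are applied are assumptions.
MAX_CHARS = 75_000
MAX_WORDS = 10_000

STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on"}


def apply_limits(text, max_chars=MAX_CHARS, max_words=MAX_WORDS):
    """Drop everything past max_chars, then keep at most max_words words."""
    clipped = text[:max_chars]
    return clipped.split()[:max_words]


def top_keywords(words, x=5):
    """Hypothetical stand-in: the x most frequent non-stopwords."""
    tokens = [w.lower().strip(".,;:!?\"'()") for w in words]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    # Ties keep first-seen order (Counter.most_common behavior).
    return [word for word, _ in counts.most_common(x)]


words = apply_limits("nail " * 20_000)
print(len(words))  # 10000
print(top_keywords("The nail and the finger; the nail.".split(), x=2))  # ['nail', 'finger']
```

Because the text is cut at a fixed character offset, the last word kept may be a fragment of a longer word.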

I hope this explanation suffices.
