Text Similarity

Freemium
By Twinword, Inc. | Updated 3 months ago | Text Analysis

9.4 / 10


Models or logic in determining the score

Hi there, I’m using this API as part of my university project. Could you share some background on the backend processing behind text similarity? For example, which factors determine the score? Is it based on any particular models, and how does it work in general? I’d appreciate any details, as I hope to learn more about this black-box model.

twinword commented 8 months ago

Although we cannot reveal our proprietary technology or algorithms, I can describe in principle how it works.

Our patented technology understands the relationships between words. This goes well beyond synonyms: the English word “nail” has no synonym, but related words like “finger” exist. We have mapped these relationships, and the result can be seen here: https://www.twinword.com/api/visual-context-graph.php

Given two words, we can return a score of how semantically related they are, based on how close they are to one another on our word map / word graph. When given two paragraphs, our technology identifies the keywords in each and does the same. The higher the score, the more semantically related those keywords are.
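To make the word-graph idea concrete, here is a minimal sketch in Python. It is not Twinword's algorithm, which is proprietary. The tiny graph, the `1 / (1 + distance)` scoring formula, the best-match averaging for keyword lists, and all function names are assumptions chosen purely for illustration.

```python
from collections import deque

# Toy, hand-made word graph for illustration only (NOT Twinword's data).
# Each word maps to the set of words it is directly related to.
GRAPH = {
    "nail": {"finger", "hammer"},
    "finger": {"nail", "hand"},
    "hand": {"finger", "arm"},
    "arm": {"hand"},
    "hammer": {"nail", "tool"},
    "tool": {"hammer"},
}


def graph_distance(a, b, graph=GRAPH):
    """Return the number of edges on the shortest path from a to b.

    Uses breadth-first search. Returns None if either word is missing
    from the graph or no path connects them.
    """
    if a not in graph or b not in graph:
        return None
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        word, dist = queue.popleft()
        for neighbor in graph[word]:
            if neighbor == b:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def word_relatedness(a, b, graph=GRAPH):
    """Assumed scoring rule: closer words on the graph score higher."""
    dist = graph_distance(a, b, graph)
    return 0.0 if dist is None else 1.0 / (1 + dist)


def text_relatedness(keywords_a, keywords_b, graph=GRAPH):
    """Average, over keywords in A, of the best-matching keyword in B.

    Keyword extraction itself is out of scope here; callers pass
    already-extracted keyword lists.
    """
    if not keywords_a or not keywords_b:
        return 0.0
    best = [
        max(word_relatedness(a, b, graph) for b in keywords_b)
        for a in keywords_a
    ]
    return sum(best) / len(best)
```

In this toy setup, `word_relatedness("nail", "finger")` scores higher than `word_relatedness("nail", "arm")` because "finger" is a direct neighbor of "nail" while "arm" is three edges away.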

As for input limits, the maximum is approximately 75,000 characters or about 10,000 words. Anything after 75,000 characters is not considered, and any words after the first 10,000 are not considered. Please also be aware that our algorithm identifies from the text a number (X) of keywords that it considers meaningful. The API's score depends heavily on those keywords, among other indicators.
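If you want to see locally which part of a long text would fall within those limits, a rough client-side sketch might look like the following. The limits in the reply are approximate, and how the service actually counts characters and words is not documented here, so the whitespace-based word split and the helper name `truncate_for_api` are assumptions.

```python
# Approximate limits quoted in the reply above; the exact server-side
# cutoffs and counting rules are not known.
MAX_CHARS = 75_000
MAX_WORDS = 10_000


def truncate_for_api(text, max_chars=MAX_CHARS, max_words=MAX_WORDS):
    """Return the prefix of text that fits within both limits.

    Assumption: words are separated by whitespace. When the word limit
    is exceeded, the kept words are re-joined with single spaces, so
    the original spacing is not preserved in that case.
    """
    clipped = text[:max_chars]
    words = clipped.split()
    if len(words) > max_words:
        clipped = " ".join(words[:max_words])
    return clipped
```

Comparing `len(text)` with `len(truncate_for_api(text))` before sending a request gives a quick indication of whether the tail of a document is likely to be ignored in the score.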

I hope this explanation suffices.

Rating: 5 - Votes: 1