Word Similarity

By DanceWithData | Updated 23 days ago | Text Analysis

Overview

Word2vec models typically produce vectors in high-dimensional spaces. Although these vectors retain a great deal of information about language, they are hard to analyze, which is why the output of current most-similar-words services remains unsophisticated. The Owl API applies advanced text clustering techniques to improve on these tools.
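To make the idea of grouping a flat most-similar list more concrete, here is a minimal sketch in Python. It is not the Owl API's actual algorithm, which this overview does not describe. It only illustrates one simple way to cluster neighbor words by cosine similarity. The function name `group_neighbors`, the threshold of 0.8, and the 3-dimensional toy vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def group_neighbors(vectors, threshold=0.8):
    """Greedy grouping (hypothetical helper): a word joins the first group
    whose seed word is similar enough, otherwise it starts a new group."""
    groups = []  # each group is a list of words; its first word is the seed
    for word, vec in vectors.items():
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups

# Made-up 3-d vectors for illustration only; real embeddings
# have hundreds of dimensions and come from a trained model.
toy_vectors = {
    "camry": [0.9, 0.1, 0.0],
    "prius": [0.8, 0.2, 0.1],
    "honda": [0.1, 0.9, 0.1],
    "nissan": [0.2, 0.8, 0.0],
    "automaker": [0.0, 0.1, 0.9],
}

groups = group_neighbors(toy_vectors)
print(groups)
```

With these toy vectors the words fall into three groups (car models, makers, and a general term). A production system would more likely use a proper clustering method on real embeddings instead of this greedy pass.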

Below are the most similar words for “Toyota” generated by the Owl API using the *glove-wiki-gigaword-300* model. The results are cleanly separated into models, makers, and general subgroups, a granularity that the original model does not provide.

"Owl (news)": {
0: ["camry", "prius", "lexus"],
1: ["honda", "nissan", "mazda", "motor", "ford"],
2: ["automaker", "automakers"]
}

Similarly, here are the most similar words for “apple” generated by the Owl API using the en-core-web-lg model. The results are cleanly separated into fruits, gadgets, and berries.

"Owl (general)": {
0: ["apples", "fruit", "pineapple", "pear", "cider"],
1: ["iphone", "ipad"],
2: ["blueberry", "strawberry", "blackberry"]
}
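Once a result like the one above is loaded into a program, a reverse lookup from word to subgroup is often handy. The sketch below assumes the clusters have already been parsed into a Python dict keyed by integer cluster index. How the API actually delivers the response is not specified here, and `word_to_cluster` is a hypothetical helper name.

```python
# Cluster layout copied from the "apple" example above; holding it in
# a dict with integer keys is an assumption about the parsed form.
owl_general = {
    0: ["apples", "fruit", "pineapple", "pear", "cider"],
    1: ["iphone", "ipad"],
    2: ["blueberry", "strawberry", "blackberry"],
}

def word_to_cluster(clusters):
    """Invert {cluster index: [words]} into {word: cluster index}."""
    return {word: idx for idx, words in clusters.items() for word in words}

lookup = word_to_cluster(owl_general)
print(lookup["ipad"])  # cluster index of the gadgets subgroup
```

A lookup like this makes it easy to check whether two neighbor words, for example "pear" and "ipad", belong to the same sense of the query word.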