Yarn

FREEMIUM
By weaverdigital | Updated 24 days ago | Media

README

The Yarn /vocabulate endpoint returns a JSON response with three top-level fields:

  • page - object containing the vocabulated page response
  • averages - object containing average data across all pages vocabulated by Yarn.
    Useful for analysing how a given page compares to the averages.
  • leaderboards - object containing two leaderboards, one all-time
    and one covering the last ‘x’ days, each listing the most and least difficult vocabulated pages
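
As a minimal sketch of handling this response shape, the snippet below parses a JSON body and pulls out the three top-level fields. The sample body is hand-written for illustration and is not real Yarn output, and the helper name `split_response` is hypothetical.

```python
import json

# Hand-written stand-in for a /vocabulate response body (not real Yarn output).
SAMPLE_BODY = '{"page": {"overAllScore": 42}, "averages": {}, "leaderboards": {}}'


def split_response(body: str) -> tuple:
    """Parse the JSON body and return (page, averages, leaderboards)."""
    data = json.loads(body)
    # Fail early if any of the three documented top-level fields is absent.
    missing = [key for key in ("page", "averages", "leaderboards") if key not in data]
    if missing:
        raise ValueError(f"response is missing top-level fields: {missing}")
    return data["page"], data["averages"], data["leaderboards"]


page, averages, leaderboards = split_response(SAMPLE_BODY)
print(page["overAllScore"])
```

Checking for the three fields up front keeps later lookups simple; a caller that prefers to tolerate partial responses could return empty dictionaries instead of raising.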

#### page

  • page.overAllScore - number within the 1-100 range indicating the difficulty of the given page.
    It is derived from the average difficulty scores of the words on the page (based on the Yarn 100k word difficulty index),
    excluding the most common words and taking into account the overall scores of other already indexed pages.
    Sample interpretation of overall score ranges:

    • 1-33 - easy page
    • 34-66 - medium-difficulty page
    • 67-100 - hard-to-read page
  • page.article - array containing each word from the main content of the processed page, in natural reading order.
    Useful for presenting page content with a difficulty level applied to each word.

    • page.article[].word - string - the word as it appears in the main page content
    • page.article[].difficultyScore - number within the 1-100 range, based on the Yarn 100k word difficulty index.
      Not available if the word is outside that index or is one of the most common words
    • page.article[].category - enum - one of ‘easy’, ‘medium’ or ‘hard’.
      Not available if the word is outside that index or is one of the most common words
  • page.wordsCategorized - object with 3 fields (easy, medium and hard), each of which is an array
    containing the page words that belong to that difficulty category, sorted by difficulty in descending order.
    Each array item is an object with the following fields:

    • page.wordsCategorized{category}[].word - string - the word as it appears in the main page content
    • page.wordsCategorized{category}[].difficultyScore - number within the 1-100 range, based on the Yarn 100k word difficulty index
    • page.wordsCategorized{category}[].occurrencesCount - number of times the word appears in the main page content
  • page.difficultyCategories - object with 3 fields (easy, medium and hard), each containing a percentage
    that indicates what share of the page content words belongs to that difficulty category.
    For example, page.difficultyCategories.easy.percentage = 12 means that 12% of the words on the page are easy

  • page.difficultyDistribution - array sorted by difficulty level in ascending order, showing what percentage of words belongs to each difficulty level.

    • page.difficultyDistribution[].difficultyScore - number in the 1-100 range
    • page.difficultyDistribution[].wordsPercentage - percentage of words in the page content that have the given page.difficultyDistribution[].difficultyScore
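
To make the `page` fields more concrete, here is a minimal sketch that maps `page.overAllScore` onto the sample ranges listed above and reads the hardest words from `page.wordsCategorized`. The function names and the sample dictionary are hypothetical; only the field names come from the description above.

```python
def interpret_overall_score(score: float) -> str:
    """Map an overall score (1-100) onto the sample ranges listed above."""
    if not 1 <= score <= 100:
        raise ValueError("overall score is expected to be within 1-100")
    if score <= 33:
        return "easy"
    if score <= 66:
        return "medium"
    return "hard"


def hardest_words(page: dict, limit: int = 3) -> list:
    """Return up to `limit` (word, difficultyScore, occurrencesCount) tuples.

    The 'hard' array is described as already sorted by difficulty in
    descending order, so taking the first items is enough.
    """
    hard = page.get("wordsCategorized", {}).get("hard", [])
    return [(w["word"], w["difficultyScore"], w["occurrencesCount"]) for w in hard[:limit]]


# Hand-written sample values, not real Yarn output.
sample_page = {
    "overAllScore": 58,
    "wordsCategorized": {
        "easy": [],
        "medium": [],
        "hard": [
            {"word": "perspicacious", "difficultyScore": 93, "occurrencesCount": 1},
            {"word": "obfuscate", "difficultyScore": 88, "occurrencesCount": 2},
        ],
    },
}

print(interpret_overall_score(sample_page["overAllScore"]))
print(hardest_words(sample_page, limit=1))
```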

#### averages

  • averages.difficultyCategories - same as page.difficultyCategories, but calculated across all pages already vocabulated by Yarn.
  • averages.difficultyDistribution - same as page.difficultyDistribution, but calculated across all pages already vocabulated by Yarn.
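
Because `averages.difficultyCategories` mirrors `page.difficultyCategories`, comparing a page with the averages can be a per-category subtraction. The sketch below assumes each category object exposes a `percentage` field, as in the `page.difficultyCategories.easy.percentage` example; the function name and the sample numbers are hypothetical.

```python
CATEGORIES = ("easy", "medium", "hard")


def category_deltas(page: dict, averages: dict) -> dict:
    """Return page percentage minus average percentage for each category.

    A positive value means the page has a larger share of words in that
    category than the average vocabulated page.
    """
    return {
        name: page["difficultyCategories"][name]["percentage"]
        - averages["difficultyCategories"][name]["percentage"]
        for name in CATEGORIES
    }


# Hand-written sample values, not real Yarn output.
sample_page = {
    "difficultyCategories": {
        "easy": {"percentage": 12},
        "medium": {"percentage": 60},
        "hard": {"percentage": 28},
    }
}
sample_averages = {
    "difficultyCategories": {
        "easy": {"percentage": 20},
        "medium": {"percentage": 55},
        "hard": {"percentage": 25},
    }
}

print(category_deltas(sample_page, sample_averages))
```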

#### leaderboards

  • leaderboards.daysBack - object representing the leaderboard for the last ‘x’ days (‘x’ can be sent as a request parameter: leaderboard[daysBack]).
    Contains the top ‘y’ most difficult and easiest pages for that period (‘y’ can be sent as a request parameter: leaderboard[count])
  • leaderboards.total - object representing the leaderboard of the top ‘y’ most difficult and easiest pages out of all pages vocabulated by Yarn (‘y’ can be sent as a request parameter: leaderboard[count])

    • leaderboards{daysBack/total}.difficult[] - array containing the top ‘y’ most difficult to read pages
    • leaderboards{daysBack/total}.easy[] - array containing the top ‘y’ easiest to read pages
    • leaderboards{daysBack/total}.current - object representing the current page (the page for the provided URL) in the context of the given leaderboard.
      If it is not present, the current page is within the difficult or easy array (marked with the isCurrent flag set to true)
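
The rule for finding the current page in a leaderboard can be sketched as follows: look for an entry flagged with `isCurrent` in the `difficult` and `easy` arrays, and fall back to the `current` object otherwise. The function name is hypothetical, and the `url` key in the sample entries is a placeholder, since the fields of a leaderboard entry other than `isCurrent` are not listed here.

```python
def locate_current_page(board: dict) -> tuple:
    """Return (where, entry) for the current page within one leaderboard.

    `where` is 'difficult', 'easy', 'current', or None if nothing was found.
    """
    for list_name in ("difficult", "easy"):
        for entry in board.get(list_name, []):
            if entry.get("isCurrent") is True:
                return list_name, entry
    if "current" in board:
        return "current", board["current"]
    return None, None


# Hand-written sample; the 'url' key is a placeholder, not a documented field.
sample_board = {
    "difficult": [{"url": "https://example.com/a"}],
    "easy": [{"url": "https://example.com/b", "isCurrent": True}],
}

where, entry = locate_current_page(sample_board)
print(where, entry)
```

The same function applies to both `leaderboards.daysBack` and `leaderboards.total`, since they are described as sharing one shape.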

Yarn measures and compares vocabulary difficulty on English-language web pages.
It’s handy if you’re a writer, editor or teacher hoping to reach readers with intermediate literacy levels.

Rare words are hard

The frequency with which a word occurs across a broad corpus of texts has been found to be a decent proxy
for word difficulty: the less often a word is used, the more difficult it’s likely to be.

From 850 million words…

We have built a 100k word difficulty index based on how often words occur across the 850 million words
of the Corpus of Contemporary American English, the Corpus of Historical American English,
the British National Corpus, and the Corpus of American Soap Operas (in other words, many and diverse texts).

…One number

Using a logarithmic mapping to smooth out the raw data a little, we give each word in the Yarn
index a difficulty score on a scale of 1 to 100. Then we use a weighted average of the difficulty scores
of the words in the main content of each web page we index to get to the score you see.
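
The exact mapping and weights are not given here, so the following is only a sketch of the general idea under stated assumptions: the score is taken to be linear in log-frequency between a minimum and a maximum corpus frequency, and each word's weight in the page average is assumed to be its occurrence count. The function names and both assumptions are illustrative and are not Yarn's published method.

```python
import math


def difficulty_score(frequency: float, min_freq: float, max_freq: float) -> float:
    """Map a corpus frequency onto 1-100: the rarer the word, the higher the score.

    Assumed formula (linear in log-frequency), not Yarn's actual mapping.
    """
    if not 0 < min_freq < max_freq:
        raise ValueError("expected 0 < min_freq < max_freq")
    # Clamp so that out-of-range frequencies still land inside 1-100.
    clamped = min(max(frequency, min_freq), max_freq)
    position = (math.log(max_freq) - math.log(clamped)) / (
        math.log(max_freq) - math.log(min_freq)
    )
    return 1 + 99 * position


def weighted_page_score(scored_words: list) -> float:
    """Weighted average of (score, weight) pairs.

    Weights are assumed to be occurrence counts; the real weighting is unknown.
    """
    total_weight = sum(weight for _, weight in scored_words)
    if total_weight == 0:
        raise ValueError("no weighted words to average")
    return sum(score * weight for score, weight in scored_words) / total_weight


# Illustrative frequencies only: most frequent word -> 1, rarest word -> 100.
print(difficulty_score(10_000, min_freq=1, max_freq=10_000))
print(difficulty_score(1, min_freq=1, max_freq=10_000))
print(weighted_page_score([(10, 1), (40, 3)]))
```

Working in log space compresses the long tail of very common words, which is one plausible reading of "smooth out the raw data a little".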

A lexical curio

Yarn is not meant to be taken too seriously. After all, the difficulty of a text is clearly about a lot more
than the difficulty of its individual constituent words. We hope you find it thought-provoking and fun nonetheless.
