Web Page Hasher

FREEMIUM
By jamiembrown | Updated 13 days ago | Tools

README

It should be a simple problem to work out when a web page changes, but it’s surprisingly tricky. Web Page Hasher makes this easy by doing the following:

  1. Fetching the page for you.
  2. Stripping out all scripts and HTML markup.
  3. Removing common components that change regularly, such as Twitter or Facebook widgets.
  4. Returning just the text to you.
  5. Also returning an MD5 hash of the content.
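
To make the steps above concrete, here is a minimal local sketch of steps 2, 4 and 5 only, using the Python standard library. It is not the service's actual implementation: the class name `TextExtractor`, the function `text_and_hash`, the whitespace normalisation, the UTF-8 encoding, and the decision to also skip `<style>` content are all assumptions made for illustration. Fetching the page (step 1) and removing social widgets (step 3) are left out.

```python
import hashlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}  # skipping <style> is an assumption

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def text_and_hash(html: str):
    """Return (visible_text, md5_hex) for an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so layout-only changes do not alter the hash.
    text = " ".join(" ".join(parser.parts).split())
    return text, hashlib.md5(text.encode("utf-8")).hexdigest()
```

With this sketch, two versions of a page that differ only in their inline script produce the same text and therefore the same hash, which is the property that makes the hash useful for change detection.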

By calling this API whenever you want to check a page, and comparing the hash you stored previously with the hash you get now, you can see whether or not the page has changed.
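
The comparison described above might look like the following sketch. The function name `check_page` and the in-memory `store` dictionary are hypothetical; in practice the stored hash would live in a database or file, and `current_hash` would be the MD5 value taken from the API response. Treating the first sighting of a URL as "not changed" is a design choice made here, not something the API dictates.

```python
def check_page(store: dict, url: str, current_hash: str) -> bool:
    """Record current_hash for url and report whether it differs from the last one.

    Returns False the first time a URL is seen, because there is no
    earlier hash to compare against.
    """
    previous = store.get(url)
    store[url] = current_hash
    return previous is not None and previous != current_hash
```

Calling `check_page` once per polling cycle keeps the stored hash up to date, so each call compares against the most recent check rather than the very first one.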

We don’t want our API to be used for malicious purposes, and we don’t want to upset webmasters, so we apply some caching and rate limiting:

  1. We cache results for a period, usually one minute. You can tell whether the result you get back is fresh or cached by looking at the “from_cache” variable in the response.

  2. We will not allow more than 10 requests in a period to the same domain. If you exceed this, you’ll get back error number 1004 (“Please wait before trying this domain again”). Note that this limit applies globally across all users of the API, so if you’re checking a very popular site, expect to handle this error in your code and hold off for a while before trying again.

We’ve already built up a comprehensive database of components that we strip out automatically, but if you come across one that you’re hitting frequently, let us know and we’ll add support for it too. Just file an API support ticket!
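
One way to "hold off for a bit" when error 1004 comes back is a simple retry loop with increasing delays, sketched below. Only the error number 1004 comes from this README; the response field name `"error"`, the function `fetch_with_retry`, the `call_api` callable (whatever function you use to call the API and return its parsed response as a dict), and the delay values are all assumptions for illustration.

```python
import time
from typing import Callable, Optional

RATE_LIMITED = 1004  # error number quoted in the README


def fetch_with_retry(
    call_api: Callable[[str], dict],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call the API, waiting and retrying while it reports error 1004.

    Returns the first response that is not rate limited, or None if
    every attempt was rate limited.
    """
    for attempt in range(max_attempts):
        response = call_api(url)
        # "error" is a hypothetical field name for the error number.
        if response.get("error") != RATE_LIMITED:
            return response
        if attempt < max_attempts - 1:
            # Double the wait after each rate-limited attempt.
            sleep(base_delay * 2 ** attempt)
    return None
```

Because the limit is shared across all users of the API, a `None` result here does not mean your own code sent too many requests; it may be worth rescheduling the check for later rather than retrying immediately.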
