Article Text Extraction and Data Mining

By Stephan Yazvinski | Updated 2 months ago | Data

Uncovering the Facts: Article API documentation

Endpoint

/extract_article_data (GET)

Parameters

  • url: The URL of the article to extract data from. This parameter is required.
  • js: A boolean value (True or False) that specifies whether to execute JavaScript on the article's page before extracting data. This parameter is optional and defaults to False.
  • js_timeout: The number of seconds to wait for JavaScript execution, relevant only when js is True. This parameter is optional and defaults to 20 seconds.
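As a rough illustration, the parameter list above can be modelled as a small Python data structure that carries the documented defaults and turns them into a query string. This is a sketch under assumptions: the class name ExtractParams and its to_query method are hypothetical, spelling booleans as "True"/"False" follows the request example, and leaving out js_timeout when js is False is a choice made here, not a documented rule.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class ExtractParams:
    """Hypothetical model of the /extract_article_data query parameters."""
    url: str               # required: the article URL
    js: bool = False       # optional: execute JavaScript, defaults to False
    js_timeout: int = 20   # optional: seconds to wait for JavaScript, defaults to 20

    def to_query(self) -> str:
        # Booleans are spelled "True"/"False", matching the request example.
        params = {"url": self.url, "js": str(self.js)}
        if self.js:
            # Assumption: the timeout is only sent when JavaScript execution is requested.
            params["js_timeout"] = str(self.js_timeout)
        # urlencode percent-encodes the article URL so it stays a single parameter.
        return urlencode(params)


print(ExtractParams("https://www.example.com", js=True, js_timeout=30).to_query())
```

Because the article URL is percent-encoded, characters such as "&" inside it cannot be misread as separators between the API's own parameters.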

Request

GET /extract_article_data?url=https://www.example.com&js=True&js_timeout=30
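The request above can be sent from Python using only the standard library. This is a minimal sketch, and several details are assumptions: the documentation does not name the API host, so BASE_URL is a placeholder; make_request and fetch_article are hypothetical helper names; and any authentication headers your API provider requires are left out, because this page does not describe them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder host: the real base URL is not given in this documentation.
BASE_URL = "https://api.example.com"


def make_request(article_url, js=False, js_timeout=20):
    """Build (but do not send) a GET request for /extract_article_data."""
    params = {"url": article_url, "js": "True" if js else "False"}
    if js:
        params["js_timeout"] = str(js_timeout)
    query = urlencode(params)
    return Request(f"{BASE_URL}/extract_article_data?{query}", method="GET")


def fetch_article(article_url, js=False, js_timeout=20):
    """Send the request and decode the JSON body of a 200 OK response."""
    req = make_request(article_url, js, js_timeout)
    # Socket timeout in seconds, with some headroom beyond js_timeout for server work.
    with urlopen(req, timeout=js_timeout + 10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only print the URL; calling fetch_article would need a real host.
    print(make_request("https://www.example.com", js=True, js_timeout=30).full_url)
```

Because building the request is separate from sending it, you can inspect or log the exact URL before any network traffic happens.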

Responses

  • 200 OK: The request was successful and the data was returned in JSON format.
  • 400 Bad Request: The request was malformed. Check the error message for more details.
  • 500 Internal Server Error: An error occurred while processing the request. Check the error message for more details.
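A client can map these three status codes onto ordinary Python control flow. In the sketch below, ArticleAPIError and handle_response are hypothetical names. The error body is passed through unchanged, because this page only says to "check the error message" and does not describe the error format.

```python
import json


class ArticleAPIError(Exception):
    """Hypothetical exception for non-200 responses; keeps status and server message."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def handle_response(status, body):
    """Return parsed JSON for 200 OK; raise ArticleAPIError for anything else."""
    if status == 200:
        return json.loads(body)
    reasons = {400: "Bad Request", 500: "Internal Server Error"}
    # The error body is passed through verbatim since its structure is not documented.
    raise ArticleAPIError(status, f"{reasons.get(status, 'Unexpected status')}: {body}")
```

A 400 means the caller should fix the request, while a 500 is a server-side failure that may be worth retrying. Keeping both in one exception type with a status attribute lets the caller choose which path to take.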

Example Response

{
  "article": {
    "text": "example text",
    "html": "example html",
    "media": [],
    "images": [],
    "author": "John Doe",
    "pub_date": "2022-10-01",
    "url": "https://www.example.com",
    "canonical_url": "https://www.example.com",
    "title": "Example Title",
    "language": "en",
    "image": "https://www.example.com/image.jpg",
    "summary": "Example summary",
    "modified_date": "2022-10-01",
    "site_name": "Example",
    "favicon": "https://www.example.com/favicon.ico",
    "encoding": "utf-8",
    "keywords": ["example", "keywords"]
  },
  "time": 0.01
}
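To show how a client might consume this response, the sketch below loads a trimmed copy of the example JSON and copies a few of its fields into a typed object. The Article dataclass and parse_response function are hypothetical, and the subset of fields chosen here is only illustrative. Missing keys fall back to empty values, because the documentation does not say which fields are always present.

```python
import json
from dataclasses import dataclass, field
from typing import List, Tuple

# Trimmed copy of the example response above.
EXAMPLE_RESPONSE = (
    '{"article": {"text": "example text", "author": "John Doe", '
    '"pub_date": "2022-10-01", "url": "https://www.example.com", '
    '"title": "Example Title", "language": "en", '
    '"keywords": ["example", "keywords"]}, "time": 0.01}'
)


@dataclass
class Article:
    """Hypothetical container for a subset of the "article" fields."""
    title: str
    author: str
    pub_date: str
    url: str
    language: str
    text: str
    keywords: List[str] = field(default_factory=list)


def parse_response(payload: str) -> Tuple[Article, float]:
    """Return the selected article fields and the top-level "time" value."""
    doc = json.loads(payload)
    data = doc["article"]
    article = Article(
        title=data.get("title", ""),
        author=data.get("author", ""),
        pub_date=data.get("pub_date", ""),
        url=data.get("url", ""),
        language=data.get("language", ""),
        text=data.get("text", ""),
        keywords=data.get("keywords", []),
    )
    return article, doc.get("time", 0.0)


article, elapsed = parse_response(EXAMPLE_RESPONSE)
print(article.title, article.keywords, elapsed)
```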

Notes

  • If the js parameter is set to True, the js_timeout parameter must be provided.
  • If the url value is malformed or unreachable, the API returns 400 Bad Request with the error message “Invalid URL”.
  • If js_timeout is not a valid integer, the API returns 400 Bad Request with the error message “js_timeout must be of type int”.
  • If js is not a valid boolean, the API returns 400 Bad Request with the error message “Options for JS parameter are ‘True’ and ‘False’”.
  • The API returns its data in JSON format; if jsonify() is not used, the response is delivered as a plain string instead.
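These notes can also be checked on the client before a request is sent, which avoids a round trip just to receive a 400. The sketch below is an approximation under stated assumptions. validate_params is a hypothetical helper. Its URL check only tests syntax, since only the server can tell whether a URL is reachable. Its error strings imitate the documented messages using straight quotes, and the message for a missing js_timeout is invented here, because the notes do not give one.

```python
from urllib.parse import urlparse


def validate_params(url, js="False", js_timeout=None):
    """Hypothetical client-side check that mirrors the documented 400 conditions."""
    parsed = urlparse(url)
    # Syntax check only: reachability can only be verified by the server.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("Invalid URL")
    if js not in ("True", "False"):
        raise ValueError("Options for JS parameter are 'True' and 'False'")
    if js == "True" and js_timeout is None:
        # Invented message: the notes require js_timeout here but quote no error text.
        raise ValueError("js_timeout must be provided when js is True")
    if js_timeout is not None:
        try:
            int(str(js_timeout))  # rejects values such as "abc" or "30.5"
        except ValueError:
            raise ValueError("js_timeout must be of type int") from None
```

Checking on the client does not replace the server's own validation, but it surfaces obvious mistakes earlier and with clearer context.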