/extract_article_data (GET)
Parameters

url
: The URL of the article to scrape data from. This parameter is required.

js
: A boolean value (True or False) that specifies whether to execute JavaScript on the article’s website. This parameter is optional and defaults to False.

js_timeout
: The time, in seconds, to wait for JavaScript execution. This parameter is optional and defaults to 20 seconds.

Example request

    GET /extract_article_data?url=https://www.example.com&js=True&js_timeout=30
Responses

200 OK
: The request was successful, and the data is returned in JSON format.

400 Bad Request
: The request was malformed. Check the error message for details.

500 Internal Server Error
: An error occurred while processing the request. Check the error message for details.

Example response

    {
      "article": {
        "text": "example text",
        "html": "example html",
        "media": [],
        "images": [],
        "author": "John Doe",
        "pub_date": "2022-10-01",
        "url": "https://www.example.com",
        "canonical_url": "https://www.example.com",
        "title": "Example Title",
        "language": "en",
        "image": "https://www.example.com/image.jpg",
        "summary": "Example summary",
        "modified_date": "2022-10-01",
        "site_name": "Example",
        "favicon": "https://www.example.com/favicon.ico",
        "encoding": "utf-8",
        "keywords": ["example", "keywords"]
      },
      "time": 0.01
    }
Notes

- If the js parameter is set to True, the js_timeout parameter must be provided.
- If the url is malformed or not reachable, the request returns 400 Bad Request with the error message “Invalid URL”.
- If js_timeout is not a valid integer, the request returns 400 Bad Request with the error message “js_timeout must be of type int”.
- If the js parameter is not a valid boolean, the request returns 400 Bad Request with the error message “Options for JS parameter are ‘True’ and ‘False’”.
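A client can mirror these rules to catch malformed requests before sending them. The following Python sketch rests on several assumptions:

- `validate_params` is a hypothetical helper.
- The URL check (an `http` or `https` scheme plus a host) is only a guess at what the server treats as malformed.
- Reachability is not tested.
- The `js` comparison is assumed to be case-sensitive.
- The message for a missing `js_timeout` is invented, because the documentation gives none.

```python
from typing import Dict, List
from urllib.parse import urlparse


def validate_params(params: Dict[str, str]) -> List[str]:
    """Return the 400-style error messages a request would likely trigger.

    Client-side pre-check only (hypothetical helper); the server remains
    the authority, and URL reachability cannot be checked here.
    """
    errors: List[str] = []

    # Assumed notion of "malformed": no http(s) scheme or no host.
    parsed = urlparse(params.get("url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("Invalid URL")

    # js defaults to False; only the literal strings True/False are accepted.
    js = params.get("js", "False")
    if js not in ("True", "False"):
        errors.append("Options for JS parameter are 'True' and 'False'")

    if "js_timeout" in params:
        try:
            int(params["js_timeout"])
        except ValueError:
            errors.append("js_timeout must be of type int")
    elif js == "True":
        # Invented message: the documentation states the rule but no error text.
        errors.append("js_timeout must be provided when js is True")

    return errors
```

A request such as `{"url": "https://www.example.com", "js": "True", "js_timeout": "30"}` passes with an empty list. `{"url": "not a url"}` yields `["Invalid URL"]`.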