Ujeebu Web Page Scraping

FREEMIUM
By Ujeebu | Updated 21 days ago | Data
API Creator: Ujeebu, lexper (Rapid account: Ujeebu)

README

Getting started

To use the API, subscribe to a plan on RapidAPI and send a GET or POST request to the scrape endpoint.

Example: the following request returns a screenshot of https://ujeebu.com/:

curl --location --request GET 'https://ujeebu-web-page-scraping.p.rapidapi.com/v1.1/scrape?response_type=screenshot&js=1&url=https://ujeebu.com/' \
--header 'X-RapidAPI-Key: YOUR_RAPIDAPI_KEY' \
--output ujeebu.png
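For readers calling the API from code, the curl request above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_scrape_request` and `fetch_to_file`, the placeholder key and the 90-second client timeout are illustrative assumptions. Only the endpoint, the query parameters and the `X-RapidAPI-Key` header come from the example above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint copied from the curl example; the helpers below are hypothetical.
ENDPOINT = "https://ujeebu-web-page-scraping.p.rapidapi.com/v1.1/scrape"


def build_scrape_request(target_url: str, api_key: str, **params) -> Request:
    """Build (but do not send) a GET request to the scrape endpoint."""
    query = {"url": target_url, **params}
    # urlencode percent-encodes the target URL, so a target URL with its own
    # query string cannot be mistaken for extra API parameters.
    full_url = f"{ENDPOINT}?{urlencode(query)}"
    return Request(full_url, headers={"X-RapidAPI-Key": api_key}, method="GET")


def fetch_to_file(request: Request, path: str, client_timeout: float = 90.0) -> int:
    """Send the request, write the raw response body to `path`, return the status.

    urlopen raises urllib.error.HTTPError for non-2xx statuses, so the
    documented error codes surface as exceptions.
    """
    with urlopen(request, timeout=client_timeout) as resp:
        with open(path, "wb") as fh:
            fh.write(resp.read())
        return resp.status
```

Usage: `fetch_to_file(build_scrape_request("https://ujeebu.com/", "YOUR_RAPIDAPI_KEY", response_type="screenshot", js=1), "ujeebu.png")`. The client timeout is set above the API's default 60-second `timeout` so that the client does not give up before the API does.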

Parameters

  • url (required): URL to render.
  • response_type: indicates what to return. Possible values are: ‘html’, ‘raw’, ‘pdf’ or ‘screenshot’. default=‘html’
  • json: when set to true, returns a JSON response instead of raw content as specified by response_type. default=false
  • useragent: overrides the default headless browser user agent. default=null
  • cookies: indicates custom cookies to send with the request. default=null
  • timeout: maximum number of seconds before the request times out. default=60
  • js: indicates whether to execute JavaScript or not. default=true
  • js_timeout: when js is enabled, indicates how many seconds the API should wait for the JS engine to render the supplied URL. default=60
  • custom_js: JavaScript code to execute in the page context when js is enabled. default=null
  • wait_for: indicates the number of milliseconds to wait before returning the response, a selector to wait for, or custom JavaScript to handle the wait. Requires js to be enabled. default=0
  • wait_for_timeout: indicates timeout (in milliseconds) for the wait_for param. default=null
  • screenshot_fullpage: when response_type is ‘screenshot’, indicates whether to take a screenshot of the full page or just the visible viewport. default=false
  • screenshot_partial: when response_type is ‘screenshot’, either a valid selector of the element to capture or a JSON string with the coordinates (x, y, width, height) of the rectangle to capture. default=null
  • scroll_down: indicates whether to scroll down the page or not; applies only when JavaScript is enabled. default=false
  • scroll_wait: when scroll_down is enabled, indicates the duration of the wait (in milliseconds) between two scrolls. default=100
  • progressive_scroll: indicates the type of scroll. If true, progressively scrolls down until the page height no longer increases or the URL changes. If false, scrolls to scroll_to_selector or to the end of the page. default=false
  • scroll_callback: defines a JavaScript function returning a boolean that determines whether to stop scrolling. default=null
  • scroll_to_selector: when scroll_down is enabled, indicates the element to scroll to in each scroll. If null, scrolling continues until the end of the page. default=null
  • device: indicates type of device to use to render page. Possible values: ‘desktop’, ‘mobile’. default=‘desktop’
  • window_width: indicates browser viewport width. default=null
  • window_height: indicates browser viewport height. default=null
  • block_ads: indicates whether to block ads or not. default=true
  • extract_rules: defines rules used to extract data from the supplied web page. default=null
  • strip_tags: indicates a comma-separated list of tags to remove from the page after rendering. default=null
  • http_method: indicates the HTTP method (GET, POST or PUT) used to request the target web page. default=‘GET’
  • post_data: data to forward to the target web page when http_method is POST or PUT. default=null
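Several parameters above only take effect in combination with others: wait_for, custom_js, js_timeout and scroll_down need js, the screenshot_* options need response_type ‘screenshot’, scroll_wait and scroll_to_selector need scroll_down, and post_data needs POST or PUT. The sketch below is a hypothetical client-side check that encodes those documented dependencies before a request is sent. The function names and the way flag values are interpreted (`0`, `"0"`, `False` or `"false"` meaning off) are assumptions, not part of the API.

```python
import json

RESPONSE_TYPES = {"html", "raw", "pdf", "screenshot"}
DEVICES = {"desktop", "mobile"}
HTTP_METHODS = {"GET", "POST", "PUT"}
JS_ONLY = ("js_timeout", "custom_js", "wait_for", "scroll_down")  # need js on
SCREENSHOT_ONLY = ("screenshot_fullpage", "screenshot_partial")
SCROLL_ONLY = ("scroll_wait", "scroll_to_selector")  # need scroll_down on


def is_on(value) -> bool:
    """Interpret a flag value; 0, '0', False and 'false' count as off (assumption)."""
    return str(value).lower() not in ("0", "false")


def check_params(params: dict) -> list:
    """Return a list of problems found in a scrape parameter set (empty = none)."""
    problems = []
    if "url" not in params:
        problems.append("url is required")
    response_type = params.get("response_type", "html")
    if response_type not in RESPONSE_TYPES:
        problems.append(f"unknown response_type {response_type!r}")
    if params.get("device", "desktop") not in DEVICES:
        problems.append("device must be 'desktop' or 'mobile'")
    method = str(params.get("http_method", "GET")).upper()
    if method not in HTTP_METHODS:
        problems.append(f"unsupported http_method {method!r}")
    if not is_on(params.get("js", True)):
        problems += [f"{name} has no effect unless js is enabled"
                     for name in JS_ONLY if name in params]
    if response_type != "screenshot":
        problems += [f"{name} only applies when response_type is 'screenshot'"
                     for name in SCREENSHOT_ONLY if name in params]
    if not is_on(params.get("scroll_down", False)):
        problems += [f"{name} has no effect unless scroll_down is enabled"
                     for name in SCROLL_ONLY if name in params]
    if "post_data" in params and method not in ("POST", "PUT"):
        problems.append("post_data is only forwarded with POST or PUT")
    return problems


def partial_rect(x: int, y: int, width: int, height: int) -> str:
    """Encode a rectangle as the JSON string form accepted by screenshot_partial."""
    return json.dumps({"x": x, "y": y, "width": width, "height": height})
```

Returning a list of messages rather than raising on the first problem lets a caller report every misconfiguration at once.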

Response

The response depends on the response_type and json parameters: binary data (a byte array) for ‘pdf’ and ‘screenshot’, text for ‘html’ and ‘raw’, or JSON when json=1. The possible response_type values are:

  • html: returns the HTML code of the page. If js=1, JavaScript is executed first.
  • raw: returns the source HTML (or the file content if the URL is not an HTML page) as received from the URL, without running JavaScript; js=1 is ignored.
  • pdf: converts the page to PDF and returns the PDF binary data.
    If the json parameter is set to ‘true’, a JSON response is returned with the base64-encoded PDF file, e.g.:
{
  "success": true,
  "screenshot": null,
  "html_source": null,
  "pdf": "JVBERi0xLjQKJeLjz9MKNCAwIG9iaiAKPDwKL1N1YnR5cGUgL0xpbms...",
  "html": null
}
  • screenshot: produces a screenshot of the URL in PNG format and returns the binary data.
    • If screenshot_fullpage is set to ‘true’, takes a screenshot of the full page. If set to ‘false’, takes a screenshot of the visible viewport only.
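    • If the json parameter is set to ‘true’, a JSON response is returned with the base64-encoded image in the screenshot field, following the same structure as the PDF example above.

When a request fails, the API returns a JSON error object of the following form:
{
  "url": "string",
  "message": "string",
  "error_code": "string",
  "errors": ["string"]
}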
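The response rules above (raw bytes for pdf and screenshot, text for html and raw, base64 inside JSON when json=1) can be combined into a single decoding step on the client. A minimal sketch, assuming text responses are UTF-8 and that the base64 image sits in the `screenshot` field, as the field names in the PDF example suggest; `decode_response` and `looks_like` are hypothetical helper names.

```python
import base64
import json

# Leading bytes ("magic numbers") of the binary formats the API can return.
MAGIC = {"pdf": b"%PDF-", "screenshot": b"\x89PNG\r\n\x1a\n"}


def decode_response(body: bytes, response_type: str = "html", json_mode: bool = False):
    """Return bytes for pdf/screenshot, str for html/raw, or the parsed JSON dict."""
    if not json_mode:
        if response_type in MAGIC:
            return body  # binary file content, returned as-is
        return body.decode("utf-8")  # assumption: text responses are UTF-8
    data = json.loads(body)
    if response_type in MAGIC:
        # With json=1 the file arrives base64-encoded in the field named after
        # the response type ('pdf' per the example; 'screenshot' assumed).
        return base64.b64decode(data[response_type])
    return data  # JSON layout for html/raw is not shown here; return it whole


def looks_like(payload: bytes, response_type: str) -> bool:
    """Cheap sanity check that a binary payload starts with the expected magic bytes."""
    return payload.startswith(MAGIC[response_type])
```

For instance, the truncated value in the PDF example begins with `JVBERi0xLjQK`, which decodes to `%PDF-1.4` followed by a newline, so `looks_like` accepts it as a PDF.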

Response Codes

  • 200 Successful request.
  • 400 A required parameter (url) is missing.
  • 401 Missing API key.
  • 404 Provided URL not found.
  • 408 Request timed out (try increasing the timeout parameter).
  • 429 Too many requests (upgrade your plan).
  • 500 Internal error (retry the request or contact us).
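The hints next to each status code translate naturally into a client-side retry policy. The sketch below is one possible policy, not behavior defined by the API: it retries 500 errors, doubles the timeout parameter after a 408 up to a hypothetical 300-second cap, and treats 429 as a plan limit rather than something to retry automatically. The `Action` type, `MAX_TIMEOUT` and `action_for_status` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TIMEOUT = 300  # hypothetical upper bound in seconds, not documented by the API


@dataclass
class Action:
    retry: bool                    # whether to resend the request
    timeout: Optional[int] = None  # timeout parameter to use on the retry
    reason: str = ""


def action_for_status(status: int, timeout: int = 60) -> Action:
    """Map a response code from the list above to a client-side next step."""
    if status == 200:
        return Action(False, reason="success")
    if status == 408:
        # Documented hint: increase the timeout parameter (doubling is our choice).
        bumped = min(timeout * 2, MAX_TIMEOUT)
        return Action(bumped > timeout, bumped, "timed out; retry with a larger timeout")
    if status == 500:
        return Action(True, timeout, "internal error; retry")
    if status == 429:
        return Action(False, reason="too many requests; plan limit reached")
    if status in (400, 401, 404):
        return Action(False, reason="client-side problem; fix the request")
    return Action(False, reason=f"unexpected status {status}")
```

Once the timeout reaches the cap, a further 408 yields `retry=False`, so the loop around this function cannot retry forever.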

Questions or Support Requests?

Please don’t hesitate to ask a question on our community site at https://ujeebu.com/community, or send us a message on https://ujeebu.com/contact.