The Article Scraper API extracts structured information from articles on various websites. The response is made up of data nodes, each representing one element of the article, such as an image, a block of text, or a heading.
The API response consists of the following key components:
‘data’: An array containing information about different elements within the article.
Each element in the array includes the following fields:
- 'group_heading': A heading associated with a group of nodes.
- 'group_number': An identifier for the group.
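- 'is_node_heading', 'is_node_image', 'is_node_list': Flags indicating whether the node is a heading, an image, or a list, respectively.
- 'node_alt': Alternative text for image nodes.
- 'node_content': Text content of text and heading nodes.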
- 'node_name': Name of the node.
- 'node_src': Source URL for image nodes.
- 'node_type': Type of the node (e.g., 'img', 'text', 'p', 'h1').
- 'position_info': Information about the position and dimensions of the node.
‘status’: Indicates the status of the API request (e.g., ‘success’ or ‘error’).
‘msg’: Additional information or error messages.
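As a rough illustration of the structure described above, the following Python sketch checks 'status' and then groups the entries of 'data' by 'group_number'. The field names come from the description; the sample values, the integer group numbers, and the helper name group_nodes are assumptions made for this example, and the real response may contain more fields than shown.

```python
import json
from collections import defaultdict

# Hypothetical sample response: field names follow the description above,
# but every value here is invented for illustration.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "msg": "",
  "data": [
    {"group_number": 1, "group_heading": "Introduction",
     "node_type": "h1", "node_content": "Introduction"},
    {"group_number": 1, "group_heading": "Introduction",
     "node_type": "p", "node_content": "First paragraph."},
    {"group_number": 2, "group_heading": "Details",
     "node_type": "img", "node_src": "https://example.com/a.png",
     "node_alt": "Diagram"}
  ]
}
"""


def group_nodes(response: dict) -> dict:
    """Group nodes by 'group_number'; raise if 'status' is not 'success'."""
    if response.get("status") != "success":
        raise RuntimeError(f"API error: {response.get('msg') or 'no message'}")
    groups = defaultdict(list)
    for node in response.get("data", []):
        groups[node.get("group_number")].append(node)
    return dict(groups)


groups = group_nodes(json.loads(SAMPLE_RESPONSE))
for number, nodes in groups.items():
    # Every node in a group carries the same 'group_heading' in this sample.
    heading = nodes[0].get("group_heading")
    print(number, heading, [n.get("node_type") for n in nodes])
```

Checking 'status' before touching 'data' keeps error handling in one place, since 'msg' is where the description says error details are reported.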
Image Node (node_type: ‘img’): Image nodes represent images within the article. They include information such as the source URL (node_src) and alternative text (node_alt).
Text Node (node_type: ‘text’ or ‘p’): Text nodes represent textual content within the article. The text itself is available in the node_content field.
Heading Node (node_type: ‘h1’): Heading nodes represent headings within the article. The heading text is available in the node_content field.
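The three node types above can be handled with a simple dispatch on node_type. The sketch below turns each node into a line of Markdown-style text; the function name render_node and the sample nodes are hypothetical, and node types not described here (lists, for example) are simply skipped.

```python
def render_node(node: dict) -> str:
    """Render one node as Markdown-style text, based on its 'node_type'."""
    node_type = node.get("node_type")
    if node_type == "img":
        # Image nodes carry their data in node_src and node_alt.
        alt = node.get("node_alt") or ""
        src = node.get("node_src") or ""
        return f"![{alt}]({src})"
    if node_type == "h1":
        return f"# {node.get('node_content') or ''}"
    if node_type in ("text", "p"):
        return node.get("node_content") or ""
    # Node types not covered in the description are skipped in this sketch.
    return ""


# Hypothetical nodes, invented for illustration.
nodes = [
    {"node_type": "h1", "node_content": "Title"},
    {"node_type": "p", "node_content": "Body text."},
    {"node_type": "img", "node_src": "https://example.com/a.png",
     "node_alt": "Diagram"},
]

rendered = [render_node(n) for n in nodes]
print("\n".join(line for line in rendered if line))
```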