Web2Meaning is an API that efficiently scrapes information from given web pages.
It lets you process any number of websites quickly and build datasets for further text mining.
Data extraction: extracts text, images, videos, links, files, and metadata;
Text cleaning: strips HTML tags and irrelevant content, such as ads, from the text;
Dynamic content extraction: retrieves data from web pages whose content is loaded and rendered dynamically through JavaScript;
Entity extraction: extracts key entities from a text (products, technologies, brands, etc.);
Text classification: classifies a page’s textual content to support understanding and categorization;
Domain classification: categorizes a website’s main page based on its overarching topic;
Article determination: determines whether a specific page qualifies as an article;
Hyperlink extraction: extracts hyperlinks from the page and tags them in the extracted text.
Explore additional capabilities in the documentation.
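
To make the text cleaning and hyperlink extraction features above more concrete, here is a minimal sketch of the general idea using only Python's standard library. It is not the Web2Meaning API: the class name `TextAndLinkExtractor`, the function `extract`, and the `[link: ...]` tag format are all hypothetical and chosen purely for illustration. The actual service may work quite differently.

```python
from html.parser import HTMLParser


class TextAndLinkExtractor(HTMLParser):
    """Illustrative only: collects visible text and hyperlinks from HTML."""

    SKIPPED_TAGS = {"script", "style"}  # content that is never visible text

    def __init__(self):
        super().__init__()
        self.parts = []         # text fragments and link tags, in document order
        self.links = []         # href values, in document order
        self._skip_depth = 0    # > 0 while inside a skipped tag
        self._href_stack = []   # hrefs of the currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            self._href_stack.append(href)
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "a" and self._href_stack:
            href = self._href_stack.pop()
            if href:
                # Hypothetical tag format marking the link in the text.
                self.parts.append(f"[link: {href}]")

    def handle_data(self, data):
        # Keep only non-empty text that is outside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract(html):
    """Return (cleaned_text, links) for an HTML string."""
    parser = TextAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts), parser.links


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    '<p>Read the <a href="https://example.com/docs">documentation</a> first.</p>'
    "<script>var x = 1;</script></body></html>"
)
text, links = extract(page)
print(text)
print(links)
```

This sketch only handles static HTML. Content rendered through JavaScript, ad removal, and the classification features would need considerably more machinery than is shown here.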