Automatic multilingual text classification according to pre-established categories defined in a model. The algorithm combines statistical classification with rule-based filtering, which yields a high degree of precision across very different environments. Three models are available: IPTC (International Press Telecommunications Council standard), EuroVoc and a Corporate Reputation model. The languages covered are Spanish, English, French, Italian, Portuguese and Catalan.
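To illustrate the general idea of pairing a statistical score with rule-based filtering, here is a minimal Python sketch. It is not the service's actual algorithm or API: the category names, term weights, rules, threshold and function names are all hypothetical, chosen only to show how a rule layer can veto or confirm categories proposed by a score.

```python
import re
from collections import Counter

# Hypothetical toy model (assumption, not the real service's models):
# per-category term weights stand in for the statistical classifier.
TERM_WEIGHTS = {
    "sport": {"match": 2.0, "team": 1.5, "goal": 2.0, "league": 1.5},
    "economy": {"market": 2.0, "bank": 1.5, "inflation": 2.5, "goal": 0.5},
}

# Hypothetical rules: a category is rejected if any forbidden term appears,
# or if it lists required terms and none of them appears.
RULES = {
    "sport": {"required": set(), "forbidden": {"inflation"}},
    "economy": {"required": {"market", "bank", "inflation"}, "forbidden": set()},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def statistical_scores(tokens):
    """Score each category as the weighted count of its known terms."""
    counts = Counter(tokens)
    return {
        category: sum(weight * counts[term] for term, weight in weights.items())
        for category, weights in TERM_WEIGHTS.items()
    }


def passes_rules(category, tokens):
    """Apply the rule-based filter for one category."""
    vocabulary = set(tokens)
    rule = RULES[category]
    if rule["forbidden"] & vocabulary:
        return False
    if rule["required"] and not (rule["required"] & vocabulary):
        return False
    return True


def classify(text, threshold=1.0):
    """Return (category, score) pairs that clear both the score and the rules."""
    tokens = tokenize(text)
    scores = statistical_scores(tokens)
    kept = [
        (category, score)
        for category, score in scores.items()
        if score >= threshold and passes_rules(category, tokens)
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

In this sketch, a text such as "A goal against inflation" scores high enough for the hypothetical "sport" category on term weights alone, but the forbidden-term rule removes it, leaving only "economy". That is the kind of precision gain a rule layer is meant to provide over scores by themselves.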
Topics Extraction tags locations, people, companies, dates and many other elements appearing in a text written in Spanish, English, French, Italian, Portuguese or Catalan. Detection combines a number of complex natural language processing techniques to produce morphological, syntactic and semantic analyses of the text, which are then used to identify different types of significant elements.