Guest Post by Robert Brauer from Interzoid:
Analysts estimate that data quality problems cost the average organization tens of millions of dollars, depending on the size of the company and the value of its data assets. For smaller and less data-driven organizations, the figure is lower, but for larger companies and those that view their data assets as a strategic advantage, the cost can be dramatically higher.
In the era of APIs, these kinds of data problems can be significantly easier and far more cost-effective to address than ever before. Interzoid has just launched a set of APIs on RapidAPI, the world’s largest API marketplace, focused on solving some of these data quality challenges. Because RapidAPI offers a rich set of tools for discovering, testing, managing, and consuming APIs, publishing on the RapidAPI Marketplace makes it easy for developers and information technology professionals to quickly incorporate this technology into a wide range of applications, websites, and business processes.
If an organization maintains customer, prospect, supplier, or other non-numerical data assets, then more than likely a sizable percentage of the records in its internal databases are redundant: duplicates, or multiple inconsistently represented versions of the same data. Because data underpins so many of an organization’s operational, marketing, sales, and other strategic initiatives, this can cause a long list of issues, including inaccurate reports used for decision making, the lack of a complete customer view, poor customer communication, and significant wasted spending on marketing campaigns.
What Causes Data Quality Problems?
The problem exists because data is inherently inconsistent. “Jim” or “James,” “Bob” or “Robert,” and “Jennifer” or “Jenny” are just a small sample of the first names that are used interchangeably when data is collected and stored in databases. With last names, spelling variations are very common, and “Johnson” or “Johnston,” “McGinn” or “MacGinn,” and “Smith” or “Smythe” are just the tip of the iceberg. In physical addresses, abbreviations and alternate terms are used inconsistently, such as “Street” or “St,” “East” or “E,” and “APT” or “Unit.” Company names can include abbreviations such as “B of A” versus “Bank of America,” and company legal terms like “Inc” or “Corp” may or may not be used. State names are sometimes stored as a two-letter abbreviation and sometimes written out. “SF,” “San Fran,” and “San Francisco” are of course all the same city. Add to this differences in capitalization, spacing, and punctuation, plus endless typos, and the problem becomes even more of a challenge to address.
The redundant data that results from these inconsistencies can cause serious problems for any kind of business intelligence analysis, such as trying to determine customer usage patterns. It can be difficult to determine who your best customers are or whether there are gaps in a customer’s product portfolio. Working with prospects can become embarrassing if multiple salespeople are calling on the same individual within a target company. The list of resulting difficulties for the organization is long.
How Can APIs Help?
One approach to solving this problem is to use an algorithmically generated “Similarity Key” that addresses the data inconsistency issue. The keys are generated for a given data value using similarity trees, smart hashing, heuristics, knowledge bases, and various forms of machine learning. These generated keys then help identify inconsistent permutations of data that represent the same piece of information.
Typically, a Similarity Key is generated for each record in a database and stored either as a new column in the table or in a new table with a reference back to the original record. The generated Similarity Key looks like a hash value (a string of letters and numbers). If multiple fields are being used for match identification, such as a combination of individual name, address, and company name, then each record will have multiple Similarity Keys, one generated for each data type (again, stored either as additional columns or in a new table or equivalent).
Example of Similarity Key generation:
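To make the idea concrete, here is a minimal sketch of a toy key generator for company names. It is not Interzoid's algorithm, which the post describes as combining similarity trees, smart hashing, heuristics, knowledge bases, and machine learning. The function name `toy_company_key` and the two small lookup tables are assumptions made up for illustration; the sketch only normalizes case and punctuation, drops legal terms, expands a couple of known abbreviations, and hashes the result.

```python
import hashlib
import re

# Hypothetical, deliberately tiny lookup tables. A real matching service
# would rely on much larger knowledge bases and on fuzzy techniques.
LEGAL_TERMS = {"inc", "corp", "co", "llc", "ltd", "incorporated", "corporation"}
KNOWN_ALIASES = {"b of a": "bank of america"}


def toy_company_key(name: str) -> str:
    """Return a short hash-like key for a company name (illustrative only)."""
    # Lowercase, treat punctuation as whitespace, and split into words.
    words = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    # Drop legal terms such as "Inc" or "Corp" that may or may not be present.
    words = [w for w in words if w not in LEGAL_TERMS]
    # Expand abbreviations that the toy knowledge base knows about.
    phrase = " ".join(words)
    phrase = KNOWN_ALIASES.get(phrase, phrase)
    # Remove spacing differences, then hash so the key looks like a hash value.
    canonical = phrase.replace(" ", "")
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:12].upper()


for variant in ["Bank of America, Inc.", "B of A", "bank of america corp"]:
    print(variant, "->", toy_company_key(variant))
```

All three variants produce the same key because they reduce to the same canonical string. Unlike the approach described above, this toy version cannot handle typos or spelling variations such as “Johnson” and “Johnston.”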
If the generated keys are then stored as columns within a table (or possibly as key-value pairs within NoSQL data stores), then sorting the table by Similarity Key will place similar records next to each other, whether for compiling a match report, for automated processing of redundant records, or for visually inspecting candidate duplicate records.
Example of using Similarity Key generation to identify duplicate company names:
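As a rough sketch of the sort-and-inspect step, assume each row already carries a key in a `sim_key` column. The column name, the helper `duplicate_groups`, and the key values below are made-up placeholders, not real API output. Sorting by key and grouping adjacent rows is enough to list the candidate duplicates:

```python
from itertools import groupby
from operator import itemgetter

# Rows as they might look after a key column has been added.
# The sim_key values are made-up placeholders, not real API output.
rows = [
    {"id": 1, "company": "Bank of America", "sim_key": "A1F3"},
    {"id": 2, "company": "Acme Tools", "sim_key": "7C20"},
    {"id": 3, "company": "B of A", "sim_key": "A1F3"},
    {"id": 4, "company": "Bank of America Corp.", "sim_key": "A1F3"},
    {"id": 5, "company": "Globex", "sim_key": "E9D4"},
]


def duplicate_groups(rows):
    """Sort rows by key and return (key, rows) for keys shared by 2+ rows."""
    # groupby only merges adjacent items, so the rows must be sorted first.
    ordered = sorted(rows, key=itemgetter("sim_key"))
    groups = []
    for key, members in groupby(ordered, key=itemgetter("sim_key")):
        members = list(members)
        if len(members) > 1:
            groups.append((key, members))
    return groups


for key, members in duplicate_groups(rows):
    print(key, [m["company"] for m in members])
```

In a relational database, the same report could come from an `ORDER BY` or `GROUP BY` on the key column. The in-memory version here just keeps the example self-contained.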
If the use case is matching data across multiple tables or data sources as part of a data merge, then using the Similarity Key as the basis of the match (or join) will surface many matches that an exact comparison of the raw values would miss.
Example of using Similarity Keys to identify duplicate customer candidates:
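A cross-source match can be sketched as an ordinary join on the key column. The example below uses Python's built-in `sqlite3` module with two in-memory tables. The table names, column names, and key values are all assumptions invented for illustration, and the keys are assumed to have been generated beforehand:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (id INTEGER, full_name TEXT, sim_key TEXT);
    CREATE TABLE billing_customers (id INTEGER, full_name TEXT, sim_key TEXT);
""")

# The sim_key values are made-up placeholders, not real API output.
conn.executemany("INSERT INTO crm_customers VALUES (?, ?, ?)", [
    (1, "James Johnston", "N-4410"),
    (2, "Jennifer Smith", "N-8F21"),
    (3, "Robert McGinn", "N-27AC"),
])
conn.executemany("INSERT INTO billing_customers VALUES (?, ?, ?)", [
    (101, "Jim Johnson", "N-4410"),
    (102, "Bob MacGinn", "N-27AC"),
    (103, "Alice Wong", "N-9B03"),
])


def candidate_matches(conn):
    """Join the two sources on the key column instead of on the raw names."""
    return conn.execute("""
        SELECT c.id, c.full_name, b.id, b.full_name
        FROM crm_customers AS c
        JOIN billing_customers AS b ON b.sim_key = c.sim_key
        ORDER BY c.id
    """).fetchall()


for row in candidate_matches(conn):
    print(row)
```

Joining on `full_name` directly would return nothing here, since no two names are spelled the same. The join on the key column pairs the records whose keys agree and leaves the unmatched rows out.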
Again, the only thing required to achieve this is to programmatically send records to the Matching APIs one at a time to generate Similarity Keys. This makes it easy to apply the same solution across multiple data sources, data types, and use cases, including data that is stored independently or accessible only within applications.
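The record-at-a-time loop might be sketched as follows. The endpoint URL, the query parameter names, and the helpers `build_request_url` and `add_similarity_keys` are all hypothetical, so consult the API listing for the real endpoint and parameters. The key lookup is passed in as a function so that the sketch runs offline with a stand-in; a real lookup would issue an HTTP request to the built URL and read the key from the response.

```python
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, used only for illustration.
BASE_URL = "https://api.example.com/companymatch"


def build_request_url(company: str, api_key: str) -> str:
    """Build a URL-encoded request for a single record."""
    return f"{BASE_URL}?{urlencode({'company': company, 'apikey': api_key})}"


def add_similarity_keys(
    records: Iterable[Dict[str, str]],
    fetch_key: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Send records one at a time and attach the returned key to each."""
    keyed = []
    for record in records:
        key = fetch_key(record["company"])
        # Copy the record so the caller's data is not modified in place.
        keyed.append({**record, "sim_key": key})
    return keyed


def fake_fetch_key(company: str) -> str:
    """Offline stand-in for the API call; returns a made-up key."""
    return "KEY-" + "".join(ch for ch in company.upper() if ch.isalnum())


records = [{"company": "Acme Tools"}, {"company": "ACME tools"}]
print(build_request_url("B of A", "demo"))
print(add_similarity_keys(records, fake_fetch_key))
```

Keeping the lookup separate from the loop means the same code can run against different data sources, and tests do not need network access.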
How Does RapidAPI Enable and Amplify the Solution?
The great news is that these Similarity Keys can be generated from a single API endpoint. This is why leveraging the RapidAPI Marketplace makes so much sense for Interzoid. Publishing the Matching APIs within RapidAPI enables the Similarity Key generation APIs to be used with all of RapidAPI’s various API analysis, testing, and integration tools. Also, the consistent interfaces that exist across the RapidAPI system enable the Interzoid matching algorithms to be used effortlessly in conjunction with other APIs on the platform for even more powerful solutions.
The required workflow within the RapidAPI publishing system was easy to understand and follow, making the decision to publish on RapidAPI an easy one for Interzoid. The RapidAPI team was quite helpful and responsive during the integration and publishing process, quickly answering any questions we had along the way.
If you are ready to check out these matching APIs and see whether they can help address data quality within your organization, try them for free here on the RapidAPI Marketplace: