Data DeDuplication

FREEMIUM
By Contactous | Updated 5 days ago | Data

Data DeDuplication (DeDupe) API

DeDupe is a SaaS platform for data deduplication, exposed through a standard RESTful API. It uses advanced AI and fuzzy-logic techniques to return high-quality pattern matches. The DeDupe API can match data in any custom or proprietary database, or in datasets of tens of millions of records, and returns results instantly.

How does it work? - Steps

  • Register your project using the API
  • Build an index of every data value (only the encrypted index value is stored by us, not the data)
  • Query the index for any exact or probable match

How does it work? - An Example
Say you have a website registration database with the fields Name, Date of Birth, Address and Passport Number. These 4 fields are of type name, date, address and text respectively. Let’s say the existing database holds 1 million records and 1,000 new ones arrive every day.

As a first step, you need to build an index of all the data values that must be checked. In this example, an index on text and date needs to be created, as the requirements for every new record are as follows:

if (new-passport-number has a match in database) then
    process-A
else
    process-B
    create new record
fi

In this example, you need to call the match API only once, for the passport number. If control reaches process-B, a create API call must then be made to update the index.

Data Types
The DeDupe API returns matches for the following data types:

  • name (e.g., “Dr. Albert Einstein”)
  • company name (e.g., “Kruger Brent Pvt Ltd.”)
  • address (e.g., “22/7, Pie Blvd.”)
  • phone (e.g., “+91 98336 90611”)
  • email (e.g., “janice.toh@contactous.com”)
  • URL (e.g., “www.contactous.com”)
  • text (e.g., “H-144234”)
  • date (e.g., “3rd august of 1968”)

Data Matches
The DeDupe API maintains two indexes: Exact and Probable. The Exact index returns success only on a 100% match; the comparison is case sensitive and includes any leading or trailing spaces. The Probable index uses hundreds of proprietary algorithms, AI approaches and fuzzy-logic approaches to create an intelligent index of the data value.

Let’s consider an example. Say a phone number is indexed whose original value is “9833690611/”. An Exact match returns success only if the queried string is identical to the original value, including the “/” at the end, which is probably a typo. The Probable match, however, returns success for many high-quality matches, such as:

  • “9833690611”
  • “98336 90611”
  • “98 33 69 06 11”
  • “+91-98336.90611”
  • “9 8 3 3 6 9 0 6 1 1”
  • “phone: 9 83 36 90611”
  • and of course, “9833690611/”

In 9 of 10 cases, the Probable match is used; it performs a high-quality pattern match based on algorithms that have been tested on millions of data points.
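
To make the difference between the two match types concrete, here is a minimal Python sketch. It is not DeDupe's algorithm, which is proprietary; it assumes a deliberately simple normalization (keep digits only, then keep the last 10 digits so a country-code prefix does not block the match). The function names and the 10-digit rule are illustrative assumptions.

```python
import re


def exact_match(indexed: str, query: str) -> bool:
    # Exact: character-for-character equality, so case and spaces matter.
    return indexed == query


def normalize_phone(value: str, digits_kept: int = 10) -> str:
    # Assumption for illustration: strip every non-digit, then keep the
    # trailing digits so a prefix such as +91 is ignored.
    digits = re.sub(r"\D", "", value)
    return digits[-digits_kept:]


def probable_match(indexed: str, query: str) -> bool:
    a, b = normalize_phone(indexed), normalize_phone(query)
    # An empty normalized value never counts as a match.
    return bool(a) and a == b


indexed = "9833690611/"
queries = [
    "9833690611",
    "98336 90611",
    "98 33 69 06 11",
    "+91-98336.90611",
    "9 8 3 3 6 9 0 6 1 1",
    "phone: 9 83 36 90611",
    "9833690611/",
]
for q in queries:
    print(repr(q), exact_match(indexed, q), probable_match(indexed, q))
```

Under this simplified rule, every query in the list passes the probable check, while only the last one, which repeats the trailing “/”, passes the exact check.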

Reference Key
A reference key (“reference_id” in the API documentation) is the identifier of a record in your system or database. It is the link between DeDupe and your environment. The key is a unique identifier for one record, which can hold multiple data values (each corresponding to its data type).

DeDupe APIs
There are 4 endpoints:

  • RegisterProject: Creates a new project for the DeDupe API.
  • CreateRecord: Takes the data values and the reference key (External ID) and creates the exact and probable indexes for the project within the DeDupe environment.
  • DeleteRecord: Deletes all indexes within DeDupe for a reference key. To update a record, you have to delete it and then create it again.
  • MatchRecord: Finds matches for the incoming string within the DeDupe database. The type of match (Exact or Probable) must be specified. It returns the set of reference keys that match the string.
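
How the four operations fit together, including the delete-then-create update, can be modelled locally. The class below is a toy in-memory stand-in, not a client for the real service: the method names, parameters and the `probable` flag are assumptions, and the "probable" comparison here is just a case- and whitespace-insensitive placeholder for the service's proprietary matching.

```python
from typing import Dict, List


class InMemoryDeDupe:
    """Toy local model of the four operations (hypothetical interface)."""

    def __init__(self) -> None:
        # project name -> reference_id -> indexed values
        self._projects: Dict[str, Dict[str, List[str]]] = {}

    def register_project(self, project: str) -> None:
        self._projects.setdefault(project, {})

    def create_record(self, project: str, reference_id: str,
                      values: List[str]) -> None:
        self._projects[project][reference_id] = list(values)

    def delete_record(self, project: str, reference_id: str) -> None:
        # Removes every indexed value for the reference key.
        self._projects[project].pop(reference_id, None)

    def match_record(self, project: str, value: str,
                     probable: bool = False) -> List[str]:
        def key(s: str) -> str:
            # Placeholder "probable" rule: ignore case and outer spaces.
            return s.strip().lower() if probable else s

        wanted = key(value)
        return [
            ref
            for ref, stored in self._projects[project].items()
            if any(key(v) == wanted for v in stored)
        ]

    def update_record(self, project: str, reference_id: str,
                      values: List[str]) -> None:
        # There is no update operation: delete, then create.
        self.delete_record(project, reference_id)
        self.create_record(project, reference_id, values)


svc = InMemoryDeDupe()
svc.register_project("signup")
svc.create_record("signup", "user-1", ["H-144234"])
exact_hits = svc.match_record("signup", "H-144234")
exact_miss = svc.match_record("signup", " h-144234 ")
probable_hits = svc.match_record("signup", " h-144234 ", probable=True)
svc.update_record("signup", "user-1", ["H-999999"])
after_update = svc.match_record("signup", "H-144234")
print(exact_hits, exact_miss, probable_hits, after_update)
```

After the update, the old passport number no longer matches, because the delete step removed every index entry held under that reference key.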

Data Security
While the DeDupe API maintains a customized index of your database, it does not store any data values. It creates a unique encrypted index entry for every data value and stores that entry to match query strings against. For example, the name field value “John Doe” could be stored in our index as “3d95bc532661d5e56f126b28f4634fc8”, which cannot be traced back to the original value. We would then match an exact value of “John Doe”, or probable matches such as “Dr. John Doe, PHD” or “Mr. Doe, John”, against the same index entry.
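
The idea of storing only a derived key, not the value itself, can be illustrated with a short Python sketch. DeDupe's real scheme is proprietary and not described here; this example assumes a simple name normalization (lowercase, drop a small hypothetical list of titles, sort the remaining words) followed by a SHA-256 digest from the standard `hashlib` module.

```python
import hashlib
import re

# Hypothetical list of titles and suffixes to ignore; illustration only.
TITLES = {"dr", "mr", "mrs", "ms", "phd"}


def normalize_name(value: str) -> str:
    # Lowercase, split into alphabetic words, drop titles, sort the rest
    # so that "Doe, John" and "John Doe" normalize identically.
    tokens = re.findall(r"[a-z]+", value.lower())
    return " ".join(sorted(t for t in tokens if t not in TITLES))


def index_key(value: str) -> str:
    # Only this digest would be stored, never the original string.
    normalized = normalize_name(value)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


names = ["John Doe", "Dr. John Doe, PHD", "Mr. Doe, John"]
keys = {index_key(n) for n in names}
print(len(keys))
```

All three spellings normalize to the same string and therefore produce one shared key, while a different name such as “Jane Doe” produces a different one.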

Audience
The DeDupe API is used by software developers and the IT departments of organizations. ISVs and partners of large software vendors create extensions that add deduplication functionality to standard software from vendors such as Salesforce, Microsoft and Zoho. Organizations use it to improve the quality of their own systems and to run real-time checks on incoming data. The API can be fully tested on RapidAPI itself. Note that if your database is not accessed for 6 months and has fewer than 1,000 records, it will be marked as inactive and archived.

Examples
https://www.contactous.com/dedupeapi.html
