When it comes to data quality, Microsoft Dynamics CRM (or Dynamics 365 for Sales as the latest name) provides some out-of-box tools that can help work with duplicate data. It allows you to set up Duplicate Detection Rules which can help reduce the opportunity of importing duplicate records, but this doesn’t guarantee eliminating the duplicates that are already in the system, and also your CRM user could always choose to ignore those duplicate warning by entering duplicate data through CRM UI. In addition to that, the out-of-box duplicate detection capabilities are limited to certain matching options which makes it very inflexible.
With that being said, we saw a big need for a better duplicate detection tool, so we recently introduced a Duplicate Detector component in our SSIS Productivity Pack, which is a collection of premium SSIS components that help achieve more when using SSIS as your integration platform.
First, let’s have a quick look at the Duplicate Detector component.
What is Duplicate Detector?
Duplicate Detector is an SSIS component which can be used to scan incoming rows from a data source and identify duplicate rows. It comes with 7 different matching types to meet your different duplicate detection needs. (Click here for the detailed matching algorithm of the component)
You would typically connect the Duplicate Detector component after an SSIS source component which reads data from the source system that you would like to analyze the duplicates. You will notice that the component comes with two outputs, which are Duplicate Rows and Unique Rows.
The Duplicate Rows output has four additional fields which can be used for data quality analysis:
- GroupID – A unique GroupID value is given to each group of duplicate records.
- Richness Score - Each row is given a richness score which represents data quality with the value = (number of fields are not NULL )/(total number of fields).
- Richness Rank - Each row in a duplicate group is given a rank based on the richness score.
- Similarity Score – The similarity percentage of the duplicate rows compared to the top-ranked duplicate row in the group.
Find and Merge CRM Duplicates
Finding CRM duplicates and merging them accordingly becomes a simplified process using KingswaySoft Duplicate Detector and CRM Destination Component.
By taking advantage of the data quality analysis feature of our Duplicate Detector component, the top-ranked duplicate record containing the richest information within the group, can be considered the target record that is being merged into.
- RichnessRank == 1: the target records that is being merged into.
- RichnessRank!= 1: the subordinate records of the merging.
Then we use Left Join on GroupCode to find the GUID of the target duplicate record and the GUID of the subordinate records.
The last step is to merge the duplicates in CRM by using CRM destination component, the configuration is extremely simple with just a few clicks to map the input columns to the destination fields.
The data flow would be something like this:
Sample SSIS Package
A sample SSIS package can be download here.
The sample package was prepared in an SQL Server 2008 R2 format, and it should automatically upgrade if you are using any newer version of SSIS (you might need to do some layout adjustment to make it look better).