[UPDATE] With our v9 release of SSIS Integration Toolkit for Microsoft Dynamics 365 on July 7, 2017, we added support of multi-threaded writing in our CRM destination component, the techniques illustrated in this blog post are no longer needed for Microsoft Dynamics integration, but they can still be used for other scenarios if needed - so we still keep this blog post for your reference. [/UPDATE]
I previously wrote a blog post on my personal blog site about using SSIS BDD (Balanced Data Distributor) component to help improve data load performance. The blog post has been well received in the community, and it is considered one of the easiest ways to improve CRM integration performance without having to do something really complicated. It is easy to setup and you can see the improvement almost immediately. However the BDD component does come with a couple of limitations which has made me wonder for a relatively long time if there is a better solution.
After our recent release of the SSIS Productivity Pack, we made an update to the pack yesterday, so it now includes a new feature called RowIndex function in the Premium Derived Column and Data Spawner components offered in the pack. The RowIndex function can be used to implement BDD functionality in a much more controllable fashion. Let me first briefly introduce you to the SSIS Productivity Pack.
What's SSIS Productivity Pack?
SSIS Productivity Pack is a new product that we just released to public last month, which includes a collection of premium SSIS components and tasks that help make SSIS more powerful. The SSIS Productivity Pack currently comes with 2 Connection Managers, 11 Data Flow Components and 5 Task Components, with many more to come. Out of the 18 components, we are using two of them in this blog post.
- Data Spawner - this is a SSIS component that can help generate random sample data using some predefined formula, it allows you to generate data such as FirstName, LastName, email address, along with many other capabilities. It is not a requirement for our demo, but we use it for simplicity reason so that we don't have to depend on an additional external data source.
- Premium Derived Column - this is a more advanced Derived Column component that offers nearly 200 functionalities featured with an advanced expression editor. Using the Premium Derived Column, you can achieve some advanced data manipulations that are not possible using the out-of-box Derived Column component. One of the nearly 200 functionalities in the component is the RowIndex function, which allows you to add a column to the SSIS buffer that produces an incremental index for the incoming rows.
How Does it work?
The basic idea is, we get the row index using the Premium Derived Column component that we mentioned above. We take it in the Conditional Split component which is right after the Premium Derived Column component and divide the index by 5 and use the remainder to create multiple outputs, which will be sent to different destination components respectively.
The following is the whole picture of the data flow.
The configuration of the Premium Derived Column component is relatively simple, we just use the RowIndex function to produce the index, which you can see from the following screenshot.
The Conditional Split component will then take the index and divide it by 5 so it creates 5 cases based on the remainder, which essentially generates 5 outputs.
After the Conditional Split component, you will attach each output to a destination component. The following is the final execution of the data flow task. As you can see, the Data Spawner component generates 100 records which are evenly distributed to the 5 different outputs by the Conditional Split component. They are then sent to 5 identical CRM destination components to write to CRM server in parallel.
Why is this better than BDD?
As I mentioned above, there are a couple of limitations with BDD component, which makes the approach demonstrated in this blog post a better solution.
- With BDD, you don't have much control over which rows go to which destination. This may not matter in most cases, but there might be situations that you want to be specific about which rows go to which branches.
- BDD distribution only happens if the first distribution has been filled by the default buffer size, which is 10,000 by default. What this means is, the second branch would not receive anything until the first branch (or the destination component in our example) has finished processing 10,000 records, which can take a while if the destination component is significantly slower than the source component - that is typically the case for a web service component like the CRM ones, it is the situation that BDD components are designed for. With our RowIndex approach, we don't have this particular issue, all rows are evenly distributed immediately without this type of delay.
Download of the sample package
For your convenience, I have made the package available for download. The package was created using SSIS 2008 R2, it should automatically upgrade if you are using a newer version of SSIS.
Note that the package requires the latest SSIS Productivity Pack that we just updated yesterday (or a more recent version if available).