Implement BDD functionality using RowIndex function in the SSIS Productivity Pack

02 September 2016
Daniel Cai

[UPDATE] With our v9 release of SSIS Integration Toolkit for Microsoft Dynamics 365 on July 7, 2017, we added support of multi-threaded writing in our CRM destination component, the techniques illustrated in this blog post are no longer needed for Microsoft Dynamics integration, but they can still be used for other scenarios if needed - so we still keep this blog post for your reference. [/UPDATE]

I previously wrote a blog post on my personal blog site about using SSIS BDD (Balanced Data Distributor) component to help improve data load performance. The blog post has been well received in the community, and it is considered one of the easiest ways to improve CRM integration performance without having to do something really complicated. It is easy to setup and you can see the improvement almost immediately. However the BDD component does come with a couple of limitations which has made me wonder for a relatively long time if there is a better solution.

After our recent release of the SSIS Productivity Pack, we made an update to the pack yesterday, so it now includes a new feature called RowIndex function in the Premium Derived Column and Data Spawner components offered in the pack. The RowIndex function can be used to implement BDD functionality in a much more controllable fashion. Let me first briefly introduce you to the SSIS Productivity Pack.

What's SSIS Productivity Pack?

SSIS Productivity Pack is a new product that we just released to public last month, which includes a collection of premium SSIS components and tasks that help make SSIS more powerful. The SSIS Productivity Pack currently comes with 2 Connection Managers, 11 Data Flow Components and 5 Task Components, with many more to come. Out of the 18 components, we are using two of them in this blog post.

  • Data Spawner - this is a SSIS component that can help generate random sample data using some predefined formula, it allows you to generate data such as FirstName, LastName, email address, along with many other capabilities. It is not a requirement for our demo, but we use it for simplicity reason so that we don't have to depend on an additional external data source.
  • Premium Derived Column - this is a more advanced Derived Column component that offers nearly 200 functionalities featured with an advanced expression editor. Using the Premium Derived Column, you can achieve some advanced data manipulations that are not possible using the out-of-box Derived Column component. One of the nearly 200 functionalities in the component is the RowIndex function, which allows you to add a column to the SSIS buffer that produces an incremental index for the incoming rows.

How Does it work?

The basic idea is, we get the row index using the Premium Derived Column component that we mentioned above. We take it in the Conditional Split component which is right after the Premium Derived Column component and divide the index by 5 and use the remainder to create multiple outputs, which will be sent to different destination components respectively.

The following is the whole picture of the data flow.

RowIndex Data Flow

The configuration of the Premium Derived Column component is relatively simple, we just use the RowIndex function to produce the index, which you can see from the following screenshot.

RowIndex Column

The Conditional Split component will then take the index and divide it by 5 so it creates 5 cases based on the remainder, which essentially generates 5 outputs.

RowIndex Conditional Split

After the Conditional Split component, you will attach each output to a destination component. The following is the final execution of the data flow task. As you can see, the Data Spawner component generates 100 records which are evenly distributed to the 5 different outputs by the Conditional Split component. They are then sent to 5 identical CRM destination components to write to CRM server in parallel.

RowIndex Execution Results

Why is this better than BDD?

As I mentioned above, there are a couple of limitations with BDD component, which makes the approach demonstrated in this blog post a better solution.

  • With BDD, you don't have much control over which rows go to which destination. This may not matter in most cases, but there might be situations that you want to be specific about which rows go to which branches.
  • BDD distribution only happens if the first distribution has been filled by the default buffer size, which is 10,000 by default. What this means is, the second branch would not receive anything until the first branch (or the destination component in our example) has finished processing 10,000 records, which can take a while if the destination component is significantly slower than the source component - that is typically the case for a web service component like the CRM ones, it is the situation that BDD components are designed for. With our RowIndex approach, we don't have this particular issue, all rows are evenly distributed immediately without this type of delay.

Download of the sample package

For your convenience, I have made the package available for download. The package was created using SSIS 2008 R2, it should automatically upgrade if you are using a newer version of SSIS. 

Note that the package requires the latest SSIS Productivity Pack that we just updated yesterday (or a more recent version if available).

The SSIS Productivity Pack currently includes 2 Connection Managers, 8 Data Flow Components and 3 Task Components, with many more to come.

Archive

December 2024 1 November 2024 3 October 2024 1 September 2024 1 August 2024 2 July 2024 1 June 2024 1 May 2024 1 April 2024 2 March 2024 2 February 2024 2 January 2024 2 December 2023 1 November 2023 1 October 2023 2 August 2023 1 July 2023 2 June 2023 1 May 2023 2 April 2023 1 March 2023 1 February 2023 1 January 2023 2 December 2022 1 November 2022 2 October 2022 2 September 2022 2 August 2022 2 July 2022 3 June 2022 2 May 2022 2 April 2022 3 March 2022 2 February 2022 1 January 2022 2 December 2021 1 October 2021 1 September 2021 2 August 2021 2 July 2021 2 June 2021 1 May 2021 1 April 2021 2 March 2021 2 February 2021 2 January 2021 2 December 2020 2 November 2020 4 October 2020 1 September 2020 3 August 2020 2 July 2020 1 June 2020 2 May 2020 1 April 2020 1 March 2020 1 February 2020 1 January 2020 1 December 2019 1 November 2019 1 October 2019 1 May 2019 1 February 2019 1 December 2018 2 November 2018 1 October 2018 4 September 2018 1 August 2018 1 July 2018 1 June 2018 3 April 2018 3 March 2018 3 February 2018 3 January 2018 2 December 2017 1 April 2017 1 March 2017 7 December 2016 1 November 2016 2 October 2016 1 September 2016 4 August 2016 1 June 2016 1 May 2016 3 April 2016 1 August 2015 1 April 2015 10 August 2014 1 July 2014 1 June 2014 2 May 2014 2 February 2014 1 January 2014 2 October 2013 1 September 2013 2 August 2013 2 June 2013 5 May 2013 2 March 2013 1 February 2013 1 January 2013 1 December 2012 2 November 2012 2 September 2012 2 July 2012 1 May 2012 3 April 2012 2 March 2012 2 January 2012 1

Tags