Bulk Load to Amazon Redshift using SSIS Productivity Pack

04 March 2021
KingswaySoft Team

Similar to how we had shown how Snowflake Bulk load feature can be used to load huge data sets in a previous blog post, we have our Amazon Redshift Destination components that supports bulk load of data to the Redshift tables. The performance improvement to a very significant level is the major advantage of this feature. In this blog post, we will be discussing about the bulk loading capabilities of the Amazon Redshift Destination component that is available in our SSIS Productivity Pack, utilizing the features provided by the Amazon Redshift API.

In here, the staging location would be Amazon S3 location. Therefore, you could utilize our Amazon S3 components that is also a part of our SSIS productivity pack.

Connecting to the Data Source

Consider the example below, in which we have a database table that consists of 100,000 records. For a regular load, this may take a while depending on the SSIS buffer settings and other performance factors, however, by using the bulk load feature in Amazon Redshift Destination component, we will now demonstrate how this can be done instantaneously. First, we use a Premium ADO.NET Source component in order to read from the SQL Server table. In the Source component, the ADO connection manager can point to your database, and the Data Source can be chosen to be the required table. If you wish to work with a particular set of records, you could even provide a Command to have a where clause, but in this case, we are considering the entire table.

Premium ADO.NET Source Component

Configuring the Amazon Redshift Destination component

Once we have the Source component setup, you would need to configure the Amazon Redshift Destination component. In order to work with the bulk load feature, as we had mentioned, you would need a cloud storage Amazon S3 location as staging. Therefore, you would first need to create an Amazon S3 Connection manager. By right clicking on the “Connection Manager” section in VS SSDT, you can choose the Amazon S3 connection manager and configure it as shown below.

Amazon S3 Connection Manager

Once you have the Amazon S3 connection manager ready, you could start configuring the Amazon RedShift Destination component. Choose the "Connection Manager" and select the action. You can select the rest of the Database, Schema, and the Destination table. If you do not have a Destination table yet created, then our Amazon RedShift Destination component has the “Create Table” button, which would help you create one instantly.

Amazon Redshift Destination Component General Page


Once you click on the button, the page below will open. In which the create table command is already populated, and the columns would be having the name and datatype depending on the metadata from the upstream Source component. Once you have modified the names, as per your requirement, you can click on “Execute Command” and you would see that the table is created and chosen in the “Destination Table” field in the Amazon RedShift Destination component.

Amazon Redshift Table Creator

After the Destination Table is selected, you can then choose the Bulk Upload Connection Manager. This would be the connection manager pointing to the staging location. Since we are using Amazon S3 location for this, you can use the same connection manager we created in the previous steps, and then navigate to the path in S3 location by clicking on the ellipsis (…) button. There is an option “Remove Uploaded Files Upon Completion”, when enabled, would remove the data files that are created in the staging location. Please note that for the sake of demonstrating how the files are populated in the storage location, we have disabled this option. Next, you can choose to provide the batch size. This would determine how the files are created and how many rows to be written to each data file getting created in the staging location. We have specified the batch size as zero, so that the SSIS buffer size would be automatically be used. In the columns page of the Destination component, you can map from the input Source columns to the Destination Table columns.

Once you have the above configured, you can start execution. You would see that the Bulk load feature is much more efficient than a regular load. Even though our example data set is for 100,000 records, millions of records could be processed like this.

Data Flow Execution

Once the task has run, since we have the option “Remove Uploaded Files Upon Completion” disabled, you would see the files populated in the Amazon S3 location that you had chosen.

Amazon S3 Files

Please note that if the option to remove the files upon completion is enabled, these files would have been removed upon execution was successfully completed.


Closing Notes

In conclusion, the bulk writing feature in KingswaySoft's Amazon Redshift Destination Component can be used to work with large data loads leveraging Amazon S3 as intermediate cloud storage. This should offer significantly better performance than trying to insert one row at a time or even inserting records per batch. Otherwise, achieving an acceptable performance for moving large datasets to Redshift in SSIS would be challenging to implement by just using the out-of-the-box components.

Archive

December 2024 1 November 2024 3 October 2024 1 September 2024 1 August 2024 2 July 2024 1 June 2024 1 May 2024 1 April 2024 2 March 2024 2 February 2024 2 January 2024 2 December 2023 1 November 2023 1 October 2023 2 August 2023 1 July 2023 2 June 2023 1 May 2023 2 April 2023 1 March 2023 1 February 2023 1 January 2023 2 December 2022 1 November 2022 2 October 2022 2 September 2022 2 August 2022 2 July 2022 3 June 2022 2 May 2022 2 April 2022 3 March 2022 2 February 2022 1 January 2022 2 December 2021 1 October 2021 1 September 2021 2 August 2021 2 July 2021 2 June 2021 1 May 2021 1 April 2021 2 March 2021 2 February 2021 2 January 2021 2 December 2020 2 November 2020 4 October 2020 1 September 2020 3 August 2020 2 July 2020 1 June 2020 2 May 2020 1 April 2020 1 March 2020 1 February 2020 1 January 2020 1 December 2019 1 November 2019 1 October 2019 1 May 2019 1 February 2019 1 December 2018 2 November 2018 1 October 2018 4 September 2018 1 August 2018 1 July 2018 1 June 2018 3 April 2018 3 March 2018 3 February 2018 3 January 2018 2 December 2017 1 April 2017 1 March 2017 7 December 2016 1 November 2016 2 October 2016 1 September 2016 4 August 2016 1 June 2016 1 May 2016 3 April 2016 1 August 2015 1 April 2015 10 August 2014 1 July 2014 1 June 2014 2 May 2014 2 February 2014 1 January 2014 2 October 2013 1 September 2013 2 August 2013 2 June 2013 5 May 2013 2 March 2013 1 February 2013 1 January 2013 1 December 2012 2 November 2012 2 September 2012 2 July 2012 1 May 2012 3 April 2012 2 March 2012 2 January 2012 1

Tags