Bulk Load to Snowflake using KingswaySoft

24 March 2021
KingswaySoft Team

When you have large data sets to be loaded into your instances, it is ideal to have a way to load them in bulk. This is particularly true when you work with a large number of records in data warehouse projects such as Snowflake. Utilizing bulk load features can significantly improve your data load performance, thus reducing your total load time. In this blog post, we will discuss the bulk loading capabilities that were recently introduced in the Snowflake Destination component, which is part of our SSIS Productivity Pack product offering, and show how such capabilities can be used to load data into Snowflake efficiently.

The Snowflake API uses the COPY command to load large volumes of data from a staging location into Snowflake tables (see the sketch after this list). This requires a staging storage location, which can be either of the following:

  • Microsoft Azure Blob Storage
  • Amazon S3 storage
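
To make the mechanics concrete, below is a minimal sketch, in Python using the snowflake-connector-python package, of the kind of COPY command that gets issued against Snowflake once the data files have been staged. The account, credential, and URL values are placeholders, not settings from our environment.

    # A minimal sketch (assumed setup, not our actual environment) of the kind
    # of COPY command issued once the data files have been staged, using the
    # snowflake-connector-python package.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="MY_USER", password="MY_PASSWORD", account="my_account",
        warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
    )
    # COPY INTO pulls the staged files from the Azure Blob Storage container
    # into the target table; the SAS token and URL below are placeholders.
    conn.cursor().execute("""
        COPY INTO DataforSnowflake
        FROM 'azure://mystorageaccount.blob.core.windows.net/snowflakefiles/'
        CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
    """)
    conn.close()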

In our SSIS Productivity Pack, you will find an option in the Snowflake Destination component that can be used to specify the staging storage container, which serves as the intermediate storage location that enables such bulk loads. Using the Snowflake Destination component, this is made incredibly easy and straightforward. Let's dive into the entire case, shall we?

Connecting to the Data Source

Consider the example below, in which we have a database table that consists of 100,000 records. This is by no means a large table, but we are simply using it as an example. For a regular load, this may take a while depending on network latency, network bandwidth, and many other performance factors. However, by using the bulk load feature in the Snowflake Destination component, this can be done almost instantaneously. First, we use a Premium ADO.NET Source component to read from the SQL Server table we just discussed. Within the data flow, we leverage the SSIS ETL engine to push the data to its final destination through a Snowflake Destination component that ships with the SSIS Productivity Pack product, which is the main focus of this blog post. In the Source component, the ADO connection manager points to your database, and the Data Source can be set to the required table. If you wish to work with a particular set of records, you could provide a custom SQL Command in order to extract data in whichever way you like. In this case, we are keeping it simple: we are extracting data from a database table called DataforSnowflake using a custom command, as you can see below.

Premium ADO.NET Source Component
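
For readers who want to reason about what this source step does outside of SSIS, here is a rough, hypothetical Python equivalent using pyodbc. The connection string is a placeholder for your own SQL Server instance, and the query mirrors the simple extraction described above.

    # A rough, hypothetical Python equivalent (using pyodbc) of the extraction
    # the Premium ADO.NET Source component performs; the connection string is
    # a placeholder for your own SQL Server instance.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=SourceDb;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()
    # The custom command from our example; any filtering or shaping could be
    # added here at extraction time.
    cursor.execute("SELECT * FROM DataforSnowflake")
    for row in cursor.fetchmany(10_000):
        pass  # in SSIS, these rows flow downstream in buffers instead
    conn.close()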

Configuring the Snowflake Destination component

Once we have the Source component set up, you can connect the source to a Snowflake Destination component, which can be dragged and dropped onto the data flow design surface in your SSIS Visual Studio development environment (SSDT). In order to work with the bulk load feature, as we mentioned, you need a cloud storage location. In this example, we utilize an Azure Blob Storage container for that. In order to have an Azure Blob Storage container act as staging, you first need to create an Azure Blob Storage Connection Manager. By right-clicking in the "Connection Managers" section in VS SSDT, you can choose the Azure Blob Storage Connection Manager and configure it as shown below. The Account Name and Access Key can be used to easily connect to your blob storage. The Account Name is the name of the Storage Account you've created in Azure Storage Services, and the Access Key can be found within the Azure Portal by navigating to the All Services -> Storage accounts -> {Your storage account name} -> Access Keys page.

Azure Blob Storage Connection Manager
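
If you would like to verify the Account Name and Access Key before configuring the connection manager, a quick sanity check can be done outside of SSIS with the azure-storage-blob Python package. Both values below are placeholders.

    # A quick sanity check of the Account Name and Access Key using the
    # azure-storage-blob package; both values below are placeholders.
    from azure.storage.blob import BlobServiceClient

    account_name = "mystorageaccount"  # the Storage Account name
    account_key = "<access-key>"       # from the Access Keys page

    service = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=account_key,
    )
    # Listing containers only succeeds when the credentials are valid.
    for container in service.list_containers():
        print(container.name)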

Now, you need to provide a specific container to stage the data files. For that, you can use either the Azure portal or Microsoft Azure Storage Explorer, which can be downloaded and used to connect to and view your containers and their contents.

Azure Storage Explorer

Once you have installed the application, you can right-click on Storage Accounts and then click on Connect to Azure Storage, which will open up the window below. Here, you can provide the same Account Name and Access Key in order to establish a connection to the storage account.

Azure Storage Explorer Connect to Container

Inside the storage account, you can create a blob container. Our container name is snowflakefiles, which you will see being used in the Snowflake Destination component in an upcoming section of this post.

Create Blob Container
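
As an alternative to Storage Explorer, the same container can also be created programmatically. The sketch below reuses the azure-storage-blob client setup from the earlier snippet, again with placeholder credentials.

    # Creating the same staging container with azure-storage-blob instead of
    # Storage Explorer; credentials are placeholders as before.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url="https://mystorageaccount.blob.core.windows.net",
        credential="<access-key>",
    )
    service.create_container("snowflakefiles")  # raises if it already exists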

Once you have the container ready, you can start configuring the Snowflake Destination component. Choose the connection manager and select the Insert action. It is important to note that bulk load is only supported by the Insert action. You can then select the Database, Schema, and Destination Table to complete most of the destination component's configuration. If you do not yet have a destination table created, our Snowflake Destination component provides a "Create Table" button, which can be used to create one while configuring the component. Once you click on the button, the page below opens up, in which the create table command is automatically populated based on the input column metadata. The columns have names and data types derived from the metadata of the upstream Source component. Once you have modified the names as per your requirements, you can click on "Execute Command" to create the table. In doing so, you will see the new table chosen in the "Destination Table" field of the Snowflake Destination component.

Snowflake Table Creator
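
For illustration, the command behind the "Create Table" button is an ordinary Snowflake CREATE TABLE statement along these lines. The column names and types below are hypothetical stand-ins; in the real component they come from your upstream metadata.

    # A hypothetical version of the generated command, executed via the
    # snowflake-connector-python package; in the real component the column
    # names and types come from the upstream metadata, not from this sketch.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="MY_USER", password="MY_PASSWORD", account="my_account",
        warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
    )
    conn.cursor().execute("""
        CREATE TABLE DataforSnowflake (
            ID        NUMBER(38, 0),
            NAME      VARCHAR(200),
            CREATEDON TIMESTAMP_NTZ
        )
    """)
    conn.close()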

After the Destination Table is selected, you can enable the "Use Bulk Copy" option and then choose the Bulk Copy Connection Manager. This is the connection manager pointing to the staging location. Since we are using an Azure Blob Storage container for this, you can use the same connection manager we created in the previous steps, and then navigate to the container you have created by clicking on the ellipsis (…) button. There is also an option called "Remove Uploaded Files Upon Completion" which, when enabled, removes the staged data files once the data load has completed. In our case, as you can see below, we have the option unselected. In doing so, the files uploaded to the blob storage are kept there after the entire write is finished, which gives us the ability to investigate the data load job in case there are any issues. For production purposes, we generally recommend having the option enabled, so that it automatically cleans up the intermediate blob storage files and they don't eat up your storage space.

Then, you can choose to provide the batch size. This determines how the data files are created and how many rows are written to each data file in the staging location. We have specified a batch size of zero, so that the SSIS buffer size is automatically used instead.

Snowflake Destination Component
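
As a quick back-of-the-envelope illustration of how the batch size relates to the staged files, assuming each batch is written out as one data file (the batch size of 10,000 below is a hypothetical non-zero setting):

    # Back-of-the-envelope math: assuming each batch becomes one staged data
    # file, a hypothetical batch size of 10,000 against our 100,000-row table
    # yields 10 files in the container.
    import math

    total_rows = 100_000
    batch_size = 10_000  # hypothetical non-zero setting; 0 defers to the SSIS buffer size
    print(math.ceil(total_rows / batch_size))  # -> 10 data files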

Next, in the columns page of the Snowflake Destination component, you can map from the input Source columns to the Destination Table columns.

Snowflake Destination Component Columns Page

Once you have the above configured, you can start the execution. You will see that the destination component finishes almost instantaneously when the bulk load feature is used; without it, the load would take much longer, since the records would be processed in a row-by-row fashion. Even though our example data set has only 100,000 records, you would see similar behavior when working with a larger data set of millions, or even billions, of records.

Data Flow Execution

Once the task has run, since we have the "Remove Uploaded Files Upon Completion" option disabled, you will see the files populated in the Azure Blob Storage location, under a temp_**** folder.

Azure Storage Container Temp Folder

Azure Storage Container Snowflake Files
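
If you would like to inspect those staged files programmatically rather than through Storage Explorer, a small sketch along these lines works. The temp folder name is generated at run time, so the code matches on its prefix; credentials are placeholders.

    # Inspecting the staged files programmatically; the temp folder name is
    # generated at run time, so we match on its prefix.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url="https://mystorageaccount.blob.core.windows.net",
        credential="<access-key>",
    )
    container = service.get_container_client("snowflakefiles")
    for blob in container.list_blobs(name_starts_with="temp_"):
        print(blob.name, blob.size)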

It is worth noting that we generally recommend having the "Remove Uploaded Files Upon Completion" option enabled for production purposes. The option should only be disabled when you need to debug and troubleshoot a particular data load process.

Closing Notes

In conclusion, the bulk write feature in the KingswaySoft Snowflake Destination component can be used to handle large data loads by leveraging Azure Blob Storage or Amazon S3 as intermediate cloud storage. This offers significantly better performance than inserting one row at a time, or even inserting records in batches. Without it, achieving acceptable performance when moving large datasets to Snowflake in SSIS would be challenging using just the out-of-the-box components.
