At the beginning of every month we pull data from Google Analytics to report on the previous month. In the early days this was a simple single export from Google Analytics to a flat file we could then bring into our reports. As time went on the number of reports we were exporting from Google Analytics started to grow to the point where we were spending a significant amount of time just getting the Google Analytics data into our database.
At the same time our development team had created an SSIS component for retrieving data from Google Analytics which is included with our SSIS Productivity Pack. Looking to reduce the amount of time our monthly web reports took to create, we got to work developing a package to pull all our reports from Google Analytics and load them directly into our SQL database, eliminating the need for flat files completely.
Using SSIS Productivity Pack we were able to completely automate our Google Analytics data exports, which offered significant time savings. Instead of taking time out to manually export reports from Google Analytics one at a time and upload them to SQL, we now automatically have all our data available in our SQL database each month so we can focus more on analyzing the data. In this blog post we will cover what we developed to extract our Google Analytics data using SSIS. For our purposes we are writing the data to a SQL database, though any other destination may be used as the target for your Google Analytics data.
Prerequisites
To follow this blog post you will need to have SSDT installed on your system, followed by SSIS Productivity Pack. Please see our Installation Guide if you require assistance with this setup.
Connecting to Google Analytics from SSIS
Connecting to Google Analytics from SSIS is very straightforward. We simply create a new Google Analytics Connection Manager, select Update Token, log in with the Google ID associated with the Google Analytics account and allow access.
For now you can use our default KingswaySoft app for the authorization for convenience. For ongoing use, however, you will need to create a Google Analytics application following the directions from Google.
Once we’ve successfully connected to the appropriate Google Analytics account we can begin building our package.
Retrieving Data from Google Analytics Using SSIS
In the Control Flow tab, create a Data Flow Task, label it with the name of your first Google Analytics report, and then double-click to begin developing the task. We will select and drag out the Google Analytics Source and double-click to launch the editor form.
First we will select the Google Analytics Connection Manager we already created. Then we’ll select the view containing the data we would like to retrieve. Here you will have all views available to the Google Account you signed in as in the connection manager.
With Metadata Mode we can select from some pre-defined reports, or leave it as is and create a custom report. Configuring the rest of the page is very similar to creating a Custom Report within Google Analytics as you define the Date Range, Metrics, and Dimensions for the data you wish to retrieve. For the Date Range we will select Last Month; that way we will not need to update this component before running the package each month. You also have the option of sorting the data should you need to. Lastly, we can define our segments. The dropdown will have all segments associated with the account you are retrieving from, so any custom segments created are available to be selected. You can also type in your own conditions if you do not have the desired segment already saved in Google Analytics.
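To make the Last Month setting concrete, here is a minimal Python sketch of the date window it implies. It assumes Last Month means the previous full calendar month relative to the run date; the function name last_month_range is hypothetical and is not part of the component.

```python
from datetime import date, timedelta


def last_month_range(today):
    """Return the first and last day of the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    # Stepping back one day from the 1st always lands on the last day
    # of the previous month, whatever its length.
    last_of_previous = first_of_this_month - timedelta(days=1)
    first_of_previous = last_of_previous.replace(day=1)
    return first_of_previous, last_of_previous


start, end = last_month_range(date(2024, 3, 5))
print(start.isoformat(), end.isoformat())  # 2024-02-01 2024-02-29
```

Because the window is derived from the run date, a package run at any point in March reports on all of February without any manual change.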
On the Filters page we can define any filtering just the same as when we create a report in Google Analytics. The Columns page shows us all the fields we will be retrieving. Here you have the option to modify any of the field properties should you need to.
Click OK to save the component.
Writing Google Analytics Data to SQL Server using SSIS
With the data defined, we can now write this to our SQL database. For this we will use the Premium ADO.NET Destination, also found in SSIS Productivity Pack. First we will set up our connection to our SQL Server by creating a new ADO.NET Connection Manager.
With the connection established we can now drag out our Premium ADO.NET Destination and connect our source to it, then double-click on the destination to configure the component.
Here we can select the connection manager we just established. For the write action, we know in our case that we are populating the table with new data every time the job runs, so we can use the Insert option. If your export from Google Analytics contains updated data as well as new data, Upsert would be your best choice: it checks whether a row already exists based on the matching criteria you specify. If it does, that row is updated; if no match is found, a new row is created. This is all handled within a single destination.
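The Upsert behaviour described above can be illustrated with a small Python sketch. This is not how the destination component is implemented; it only shows the update-or-insert logic, using an in-memory SQLite database as a stand-in for SQL Server. The table ga_sessions, its columns, and the choice of (report_date, segment) as matching criteria are all hypothetical.

```python
import sqlite3


def upsert_rows(conn, rows):
    """Update rows that match on (report_date, segment); insert the rest."""
    updated = inserted = 0
    for report_date, segment, sessions in rows:
        cur = conn.execute(
            "UPDATE ga_sessions SET sessions = ? "
            "WHERE report_date = ? AND segment = ?",
            (sessions, report_date, segment),
        )
        if cur.rowcount == 0:
            # No existing row matched the criteria, so create a new one.
            conn.execute(
                "INSERT INTO ga_sessions (report_date, segment, sessions) "
                "VALUES (?, ?, ?)",
                (report_date, segment, sessions),
            )
            inserted += 1
        else:
            updated += 1
    conn.commit()
    return updated, inserted


conn = sqlite3.connect(":memory:")  # stand-in for the real SQL Server database
conn.execute(
    "CREATE TABLE ga_sessions (report_date TEXT, segment TEXT, sessions INTEGER)"
)
first = upsert_rows(conn, [("2024-02-01", "Organic", 120), ("2024-02-01", "Paid", 45)])
second = upsert_rows(conn, [("2024-02-01", "Organic", 130), ("2024-02-02", "Organic", 98)])
print(first, second)  # (0, 2) (1, 1)
```

The second call updates the one row that already exists and inserts the one that does not, which is the case where Insert alone would produce a duplicate.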
Next, we’ll select the appropriate table the data from Google Analytics should be written to. A very useful feature for us is the ability to Create Table. Occasionally we need to update our package when we have a new segment or dataset we want to retrieve from Google Analytics. Since we have each segment in its own table, instead of having to build the table beforehand, we can quickly create it right within SSIS when we are updating the package. The Create Table option automatically generates the command for us based on our input data; we can then modify the command as needed before executing it.
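As a rough illustration of generating a table command from input columns, the Python sketch below builds a CREATE TABLE statement from a list of column names and SQL types. The command the component actually generates may look different; the helper build_create_table, the table name, and the column types are assumptions made for this example.

```python
def build_create_table(table, columns):
    """Build a CREATE TABLE statement from (name, sql_type) pairs."""
    column_lines = ",\n".join(
        f"    [{name}] {sql_type}" for name, sql_type in columns
    )
    return f"CREATE TABLE [{table}] (\n{column_lines}\n)"


# Hypothetical input columns for one segment's report.
ddl = build_create_table(
    "ga_organic_sessions",
    [("report_date", "DATE"), ("source", "NVARCHAR(255)"), ("sessions", "INT")],
)
print(ddl)
```

The generated text is only a starting point, in the same way the command in the editor can be adjusted (for example, to change a column length) before it is executed.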
On the Columns page we can use the dropdowns to map the fields we are pulling from Google Analytics to the appropriate columns in our SQL database. Once all fields have been mapped we can configure our Error Handling and select OK to save the component.
Completing Google Analytics Exports
Now that our first data flow is complete, creating the tasks for the rest of our reports is very straightforward. Back on the Control Flow tab we can copy and paste our first Data Flow Task, then rename it for our next report. We can then open the new task and edit the components as needed. Many of our reports pull the same Metrics and Dimensions, just from a different segment, so in those cases it is very easy to select the new segment in our Google Analytics Source. For other reports we can change the configuration as needed. In the Premium ADO.NET Destination we just need to select the correct destination table. On the Columns page any columns that have the same name will remain mapped, and we can quickly complete the mapping for any new fields that may exist.
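The idea that same-named columns stay mapped while new fields need attention can be sketched in a few lines of Python. This is only an illustration of name-based matching, assuming an exact name comparison; the function map_by_name and the field names are hypothetical.

```python
def map_by_name(source_fields, destination_columns):
    """Pair each source field with a destination column of the same name."""
    destination = set(destination_columns)
    mapped = {field: field for field in source_fields if field in destination}
    # Fields without a same-named column still need to be mapped by hand.
    unmapped = [field for field in source_fields if field not in destination]
    return mapped, unmapped


mapped, unmapped = map_by_name(
    ["report_date", "source", "sessions", "bounce_rate"],
    ["report_date", "source", "sessions"],
)
print(mapped)
print(unmapped)  # ['bounce_rate']
```

Here only bounce_rate is left over, which corresponds to the new fields whose mapping is completed manually after copying a task.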
We can follow the above pattern for each report we need. This same principle can also be used later on if we need to retrieve any new reports from Google Analytics. In the Control Flow we’ll connect each Data Flow Task to the next so that the tasks execute in sequence, one after another, when the package is run. Below is an example of what your Control Flow may look like.
With that complete we can now execute the package to make sure everything performs as expected. We can then schedule the package to run automatically as often as we need using a SQL Agent Job or any other scheduling tool that can execute SSIS packages.
For more information on the components used in this blog post and other tools available to help make you more productive see our SSIS Productivity Pack.