Parsing Advanced JSON Structures with Ease Using SSIS Productivity Pack

16 February 2021
KingswaySoft Team

Many applications these days offer a REST API to facilitate data extracting or processing. Many of those modern REST APIs use JavaScript Object Notation (JSON) format to serialize and deserialize data when exchanging data between the client and the server (essentially the service side). This provides some great interoperability between the server and clients on different platforms. In working with those REST APIs in an ETL process, it becomes imperatively important to have a proper strategy to work with such data structure in an efficient manner. Previously we have written a blog post that demonstrates How to Work with RESTful Endpoints in your SSIS ETL Process, in which we talked about a relatively simple JSON parsing scenario which utilizes the Import JSON (Web) function button in our JSON Source/Extract component to import the JSON document served by the API service endpoint lively, this method requires a very minimal effort and it can be used to work with most of the JSON structures with merely a few mouse clicks. Once the document structure has been imported, the component should be mostly ready for us. We have however come into some advanced JSON structure which requires some more tweaking in order to work for its perfect use case. In this blog post, we will look at a few more complex JSON structures that we have encountered in the past and show you how you can best process them using the KingswaySoft JSON Source/Extract components to extract the data in SSIS.

Example #1: Array values

In our first example, we will be dealing with JSON array values. Let’s take a look at the following sample JSON file:

{
	"id": 51,
	"name": "John Smith",
	"roles": [
		"admin",
		"contributor",
		"editor"
	]
}

The JSON structure consists of one object which has three properties: id, name and roles. The roles property is an array of string values.

For a JSON document like this, we can configure JSON Source/Extract component as the following:

As shown in the screenshot above, the roles property is set to Array node type with a Value type subnode. Note the Value type node is set to be a repeated node. For a JSON design like this, the array values will be outputted in delimited text format. 

The roles property can also be outputted as a separate output by setting the Output Type to Output on the roles Array node.

With a JSON design like the screenshot above, the JSON Source/Extract component generates two outputs. You may need to pay attention to two special columns - _RowIndex and _ParentKeyField:

  • _RowIndex: This column contains the current row index of this output, this field can be used as a key field when the node has no primary key field.
  • _ParentKeyField: This column contains the value of this records parent key field.

The _ParentKeyField column in the roles output indicates the relationship between parent and child rows.

Example #2: Multi-dimensional array

Multi-dimensional JSON array stores an array inside another JSON array. Given an example below is a sample JSON document with contacts multi-dimensional array.

{
	"name": "contacts",
	"contacts": [
		[
			"John",
			"Smith",
			"john.smith@example"
		],
		[
			"Joe",
			"Doe",
			"joe.doe@example"
		]
	]
}

 The JSON Document Designer in JSON Source/Extract component can be configured like something below:

The contacts property is configured to Array node type and set its output type as an Output. The contacts array output contains an Array subnode of string values. As you can see from the screenshot above, we have named the three Value nodes with friendly columns names of firstname, lastname and email. Let's set up a dummy destination component (in our case, we used a Premium Derived Column component) and enable the Data Viewer to check the contacts output result. Note that this is one of the outputs created by the JSON source component, it works as a child output.

Example #3: Key-Value pairs

Now we will be getting into something that is even more interesting. Below is a sample JSON structure that contains some key-value pairs. The properties array consists of 3 objects and each object has 2 properties: name and value.

{
	"id": 51,
	"properties": [
		{
			"name": "firstname",
			"value": "John"
		},
		{
			"name": "lastname",
			"value": "Smith"
		},
		{
			"name": "email",
			"value": "[email protected]"
		}
	]
}

 For a JSON structure like this, we would like to pivot the properties values and use name as column name in the output. Based on that, the JSON Source/Extract component can be configured as below:

The properties node is set to Array node type with an output type of Key Value Pivot. The subnode Object contains two Value nodes – name and value, the name Value node is configured as the key field. When Key Value Pivot mode is used, we can define the pivot key values in Output Settings to let the JSON Source/Extract component to create output columns for each value.

Let’s check the JSON Source/Extract output result:

As you can observe from the above screenshot, we can see that the JSON Source/Extract Component is working according to our expectation and converted our source data to our required format.

Previously we had a blog post of Extracting Key/Value Pairs JSON Structure into Tabular Format which demonstrates key-value pairs JSON structure in mode details.

Example #4: Property Name as Column Value

The sample JSON document below shows is a JSON structure that id value is used as an object name.

{
	"contacts": {
		"51": {
			"firstname": "John",
			"lastname": "Smith",
			"email": "[email protected]"
		},
		"101": {
			"firstname": "Joe",
			"lastname": "Doe",
			"email": "[email protected]"
		}
	}
}

For this JSON structure, we want to have the object name outputted as a property at the same level as firstname, lastname and email. The JSON Source/Extract component can be configured like this:

The contacts node is an Object node type and set to output and an Output. Note that the subnode Object is configured to Property Name As Column Value output type and given a friendly name of id. The following screeshot shows the output results where we can find the component works correctly to output the JObject name as the column value.

Closing Note

In this blog post, we went through some relatively complex JSON designs and demonstrated how to work with such document structures in an efficient manner using the KingswaySoft JSON Source component. The same strategies can be used to work with the JSON Extract component as well which share the same infrastructure of JSON data processing in our SSIS Productivity Pack product. With a few tweaks of the components, we can achieve a proper reading purpose of such complex documents. Ultimately, our objective is to provide an ETL solution that can work with virtually any complex JSON documents with ease without having to write a single line of code in SSIS.

We hope you find this blog helpful.

Archive

December 2024 1 November 2024 3 October 2024 1 September 2024 1 August 2024 2 July 2024 1 June 2024 1 May 2024 1 April 2024 2 March 2024 2 February 2024 2 January 2024 2 December 2023 1 November 2023 1 October 2023 2 August 2023 1 July 2023 2 June 2023 1 May 2023 2 April 2023 1 March 2023 1 February 2023 1 January 2023 2 December 2022 1 November 2022 2 October 2022 2 September 2022 2 August 2022 2 July 2022 3 June 2022 2 May 2022 2 April 2022 3 March 2022 2 February 2022 1 January 2022 2 December 2021 1 October 2021 1 September 2021 2 August 2021 2 July 2021 2 June 2021 1 May 2021 1 April 2021 2 March 2021 2 February 2021 2 January 2021 2 December 2020 2 November 2020 4 October 2020 1 September 2020 3 August 2020 2 July 2020 1 June 2020 2 May 2020 1 April 2020 1 March 2020 1 February 2020 1 January 2020 1 December 2019 1 November 2019 1 October 2019 1 May 2019 1 February 2019 1 December 2018 2 November 2018 1 October 2018 4 September 2018 1 August 2018 1 July 2018 1 June 2018 3 April 2018 3 March 2018 3 February 2018 3 January 2018 2 December 2017 1 April 2017 1 March 2017 7 December 2016 1 November 2016 2 October 2016 1 September 2016 4 August 2016 1 June 2016 1 May 2016 3 April 2016 1 August 2015 1 April 2015 10 August 2014 1 July 2014 1 June 2014 2 May 2014 2 February 2014 1 January 2014 2 October 2013 1 September 2013 2 August 2013 2 June 2013 5 May 2013 2 March 2013 1 February 2013 1 January 2013 1 December 2012 2 November 2012 2 September 2012 2 July 2012 1 May 2012 3 April 2012 2 March 2012 2 January 2012 1

Tags