Extracting Information with Ease Using KingswaySoft JSON/XML Extract Task

28 June 2024
KingswaySoft Team

In your ETL integration development, you might run into a situation where you need to obtain some information from a JSON or XML file served by an HTTP service endpoint via either SOAP or REST framework. Creating a solution to achieve this can be challenging without a proper tool. If you are a data flow specialist with heavy data flow engineering experience, particularly if you have worked with our SSIS Integration Toolkit product family, including SSIS Productivity Pack, you would most likely start a new data flow task and put in a dummy source component (such as the Data Spawner component). Then, you would send the dummy source component to an HTTP requestor component to download the file from the service endpoint before sending the data to a JSON Extract or XML Extract component, in order to extract the required pieces of information. Later on, you would write such extracted information to an SSIS variable using our Premium Derived Column component, so that the information stored in the SSIS variables can be used at a later stage of the integration process. As you can already see, this entire solution can be cumbersome to implement due to the heavy use of ETL data processes while the requirement is simply extracting a few pieces of information from a few XML or JSON nodes.

With the above in mind, we want to show you a much more simplified solution in this blog post. Our focus is to avoid the heavy use of data flow process whenever possible, particularly if our purpose is to simply extract a few XML or JSON node values from a known service endpoint. Having said that, we will be showing a solution that leverages the following three components - note that the third one is optional, and is only used for verification purposes.

The overall design of our flow would look as shown below. Note that, you would be replacing the JSON Extract Task with the XML Extract Task if your requirement is to read data from XML structures. We will be going through each of the tasks separately, on its design and configuration. It is worth mentioning that the last data flow task "Variable Value Validation" is just for data verification purposes, it is most likely not needed in your real-life business use case.

Control Flow.png

Now, let's dig deeper into the specific design aspects.

HTTP Requester Task for File Download

First, we use the HTTP Requester Task to download files or file contents from the service endpoint. Drag and drop the task from the control flow SSIS toolbox, open it, and select an already configured HTTP connection manager from the list of available connections (click here to learn more about the connection manager). Then, enter the other details such as relative path, HTTP method, query string parameters, request headers, etc., as required. Note that, in the screenshot below, we have provided a relative path for downloads and a couple of query string parameters to request precisely the file according to our business requirements. After this, in the Response Body Output section, choose Output to Variable. In doing so, the component would download the file when executed and save the file content to the specified SSIS variable (@User::JSONVar) that will be used later in the ETL process.

HTTP Requester task.png

Now, with the file content saved in the SSIS variable, we just need to properly read this variable and extract the information from the document accordingly. Below is part of the JSON file content we intend to download, with the highlighted portions indicating what we need to parse out. As you can see, the portion of the nodes we wish to read out is straightforward and contains only a few pieces of information. To emphasize the point mentioned in the earlier section, in such cases, it is easier to achieve this purpose using a control flow task if available, which makes the solution much simpler.

JSON Structure.png

Now, we move on to the next section.

JSON Extract to Read and Parse JSON Structure

We would now drag and drop a new JSON Extract Task to the ETL process, and connect it right after the HTTP Requester Task created earlier. To complete the configuration, we need to open the JSON Extract task. Here, we choose the connection manager as File Content in Variable and then select the variable @User::JSONVar. At runtime, this variable will have the file content downloaded in our previous task.

JSON Extract Task.png

Once done, switch to the Output page, and enter the JPath for the node (which can be identified easily using any online JPath finder tools). Then, for each document node required, you would assign the node content to a new SSIS variable. If you do not have a variable created yet, we provide an option to create one on the go at design time, by clicking on the "New Variable" option.

Add a new Variable.png

The Output page of the JSON Extract Task will look as shown below once you have assigned all the JPaths to their respective variables. Note that the JSON Extract Task also offers an option called Error On Not Found which can be enabled if required, to handle the paths/values that do not exist in the source file as an error-handling strategy. This can be accomplished on the Error Handling page of the task. Please check out our Online Help Manual for more details on this.

JSON Extract Output.png

Now, when you run the package or the container, the output variables will get assigned the respective values from the JPaths, and these variables can be used at a later stage in your integration.

At this point, our development around the file content extraction is very much done. We can technically wrap up the blog post. Before we do so, we will add one more section to show you how to validate the above process design. While doing so, you might be able to get some inspiration on how the variable can be properly used as well.

Variable Value Validation

As we just mentioned above, this section is entirely optional. We are just showing you how to quickly verify the values retrieved by the JSON Extract Task - for the purpose of demonstration and illustration.

For demonstration, let us assign the variables as column values using a Data Spawner component, as shown below. Then, we use a Dummy Destination component to terminate the data flow, and a data viewer enabled on the data flow path.

Data Spawner.png

Once you have it all set up, execute the package/container that holds the tasks, and you should get the output as shown below. You will notice that the node values from JSON would be parsed out to variables, which are then read as column values per our design.

Package execution.png

Conclusion

As seen above, you can easily work with our JSON Extract Task to read and parse node values within the control flow, without having to go into more complex data flow designs. This proves to be simpler in terms of design while giving you the capability to extract just the required set of values from your JSON structure. The JSON Extract Task also eliminates the need for a JSON document design to identify the nodes and instead relies on a JPath that you are familiar with. It is worth mentioning that our XML Extract Task works in a very similar fashion if your requirement is to extract from an XML file served by an HTTP service endpoint or a local file.

We hope this has helped!

Archive

December 2024 1 November 2024 3 October 2024 1 September 2024 1 August 2024 2 July 2024 1 June 2024 1 May 2024 1 April 2024 2 March 2024 2 February 2024 2 January 2024 2 December 2023 1 November 2023 1 October 2023 2 August 2023 1 July 2023 2 June 2023 1 May 2023 2 April 2023 1 March 2023 1 February 2023 1 January 2023 2 December 2022 1 November 2022 2 October 2022 2 September 2022 2 August 2022 2 July 2022 3 June 2022 2 May 2022 2 April 2022 3 March 2022 2 February 2022 1 January 2022 2 December 2021 1 October 2021 1 September 2021 2 August 2021 2 July 2021 2 June 2021 1 May 2021 1 April 2021 2 March 2021 2 February 2021 2 January 2021 2 December 2020 2 November 2020 4 October 2020 1 September 2020 3 August 2020 2 July 2020 1 June 2020 2 May 2020 1 April 2020 1 March 2020 1 February 2020 1 January 2020 1 December 2019 1 November 2019 1 October 2019 1 May 2019 1 February 2019 1 December 2018 2 November 2018 1 October 2018 4 September 2018 1 August 2018 1 July 2018 1 June 2018 3 April 2018 3 March 2018 3 February 2018 3 January 2018 2 December 2017 1 April 2017 1 March 2017 7 December 2016 1 November 2016 2 October 2016 1 September 2016 4 August 2016 1 June 2016 1 May 2016 3 April 2016 1 August 2015 1 April 2015 10 August 2014 1 July 2014 1 June 2014 2 May 2014 2 February 2014 1 January 2014 2 October 2013 1 September 2013 2 August 2013 2 June 2013 5 May 2013 2 March 2013 1 February 2013 1 January 2013 1 December 2012 2 November 2012 2 September 2012 2 July 2012 1 May 2012 3 April 2012 2 March 2012 2 January 2012 1

Tags