Paging REST requests in SSIS using KingswaySoft JSON Source Component

14 January 2021
Chen Huang

In today's world, many of the applications that we try to integrate with contain a lot of data. Because of that, it is impractical, most of times nearly impossible, to return all records with one single service call. Fortunately many of the REST services provide a way to page through their records in the system in order to properly read the records in their entirety, which is crucial to many large-scale integration projects. Previously we had a blog post of How to Work with RESTful Endpoints in your SSIS ETL Process which shows a relatively simple pagination scenario. In this blog post, we will go a little beyond, and show you some of the most commonly used API pagination strategies and demonstrate how you may configure the KingswaySoft JSON Source component to accomplish a proper pagination when working with a large data set in your SSIS ETL process.

Page-based Pagination

Page-based pagination is the simplest and most common form of pagination. The dataset is separated by pages.

Example #1

In our first example, the page-based pagination is achieved using a combination of limit and offset parameters.

The API request using limit and offset will look like something below:

GET /contacts?limit=20&offset=100

Once called, the service would return a “hasMore” property which indicates whether there are more records available in the system to be retrieved. The API response might look like something below:

{
	"data": [...],
	"hasMore": true
}

For an API like this, we can configure JSON Source component as the following:

Page-based Pagination Example 1

As shown in the screenshot above, the Query String Parameters are set as:

  • limit: 20
  • (Expression Mode) offset: @[RequestCount]*20

We use Expression Mode for the “offset” query string parameter to dynamically change the value based on the @[RequestCount] number. @[RequestCount] is a system variable which start from 0 and increment by one(1) each time when the subsequent paging service calls are made, it is essentially the page number. The key is to make sure the Request Again Rule is properly evaluated, it should be evaluated to continue the paging when it is evaluated to true, or otherwise, it should stop paging calls when it is evaluated to false. Note that we have used the GetValueFromJson function which can be used to evaluate a JSON response by extracting data from a particular JSON node or property. In our case, the expression extracts from the returned JSON body and check if the “hasMore” value returned from the response is true. Note that we use a dot(.) character before the JSON name since the GetValueFromJson function expects a JPath for the evaluation:

  • (Expression Mode): GetValueFromJson( @[ResponseBody],".hasMore" )

Example #2

The second page-based pagination example still uses the limit and offset parameters, but the API response returns the total record number instead of an indicator to tell if there are more records.

Here’s a sample response when the service returns the total number in the API response:

{
	"data": [...],
	"total": 137
}

 The JSON Source component can be configured like this:

Page-based Pagination Example 2

The Query String Parameters are the same as the previous example:

  • limit: 20
  • (Expression Mode) offset: @[RequestCount]*20

The Request Again Rule is changed to the expression below to determine if there is more data available within the API.

  • (Expression Mode): (DT_I4)GetValueFromJson( @[ResponseBody],".total" )-(@[RequestCount]*20)>0

In the case when the service returns the total number in an HTTP header field:

X-Total: 137
We can adjust the Request Again Rule to use the following expression:

  • (Expression Mode): (DT_I4)GetValueFromPreviousResponseHeaders( "X-Total")-(@[RequestCount]*20)>0

Keyset-based Pagination

Keyset pagination is often used when fetching a sorted result from the service, the service provides a key parameter that acts as a delimiter of the page.

Example #1

In the first keyset-pagination example, the API endpoint returns a "vid-offset" field which lets you know where you are in the list of contacts and use the “vid-offset” value in the next request to page through the contacts. The API request looks like:

GET /contacts?limit=20&vid-offset=118

Let’s take a look at the sample API response:

{
	"data": [...],
	"vid-offset": 118
}

When there are no more records in the system, the “vid-offset” field will not be present in the API response. Based on the API behavior, we can configure JSON Source component as:

Keyset-based Pagination Example 1

The query string parameters are set to:

  • limit: 20
  • (Expression Mode) vid-offset: @[RequestCount]==0 ? (DT_NTEXT)"" : GetValueFromJson( @[ResponseBody], ".vid-offset" )

We use an expression below as the Request Again Rule to check the existence of the “vid-offset” in the API response to determine when to stop paging.

  • (Expression Mode): ValueExistsInJson( @[ResponseBody], ".vid-offset" )

The JSON Source configuration above sends an empty “vid-offset” query string parameter in the first request, in the case the service doesn’t like having the empty “vid-offset” parameter in the request, the JSON Source component can be configured as below:

Keyset-based Pagination Example 2

Instead of specifying “vid-offset” as a dynamic query string parameter, we have specified the “vid-offset” parameter in the Relative Path:

  • (Expression Mode): @[RequestCount]==0?(DT_NTEXT)"contacts":("contacts?vid-offset="+GetValueFromJson( @[ResponseBody], ".vid-offset" ))

Cursor-based Pagination

Cursor-based pagination works by returning a paging cookie (or cursor) which is associated with a record in the system and use it for the subsequent pagination service call to retrieve next set of records right after the paging cookie (or cursor). In most cases, the cursor is opaque so users cannot manipulate.

Example #1

Here’s a typical cursor-based pagination response example:

{
	"data": [...],
	"next_cursor": "xxx",
	"before_cursor": "yyy"
}

When the pagination has reached the last page, the service would return an empty “next_cursor” which indicates that there are no more records in the system to be fetched. Such as something below:

{
	"data": [...],
	"next_cursor": "",
	"before_cursor": "yyy"
}

Based on the above behaviors, the screenshot below shows the JSON Source configuration for this cursor-based pagination request:

Cursor-based Pagination Example 1

The query string parameters are set as below:

  • limit: 20
  • (Expression Mode) next_cursor: @[RequestCount]==0 ? (DT_NTEXT)"" : GetValueFromJson( @[ResponseBody], ".next_cursor" )

Request Again Rule is set to the expression below to check if the returned “next_cursor” value is empty:

  • (Expression Mode): GetValueFromJson( @[ResponseBody], ".next_cursor" ) != ""

Example #2

Some cursor-based pagination endpoints return URL to the next page and the previous page of results:

{
	"data": […],
	"links": {
		"next": "https://baseurl/api/v2/tickets?next_cursor=xxx&limit=20",
		"prev": " https://baseurl/api/v2/tickets?before_cursor=yyy&limit=20"
	}
}

 The service will return a null value for the next url when there are no more records to be retrieved:

{
	"data": […],
	"links": {
		"next": null,
		"prev": " https://baseurl/api/v2/tickets?before_cursor=yyy&limit=20"
	}
}

Based on the API behavior, we can configure JSON Source component as below:

Cursor-based Pagination Example 2

The query string parameters are set to:

  • limit: 20
  • (Expression Mode) next_cursor: @[RequestCount]==0 ? (DT_NTEXT)"" : GetTokenAtPosition(GetTokenAtPosition( GetValueFromJson( @[ResponseBody], ".links.next" ), "next_cursor=", 2 ), "&" ,1)

Request Again Rule is set to check if the returned “next” url is null:

  • (Expression Mode): !IsNULL( GetValueFromJson( @[ResponseBody], ".links.next" ) )

Closing Note

In this blog, we went through some most commonly used REST API pagination strategies and demonstrated the configuration of working with different paging techniques using KingswaySoft JSON Source component which can be found in SSIS Productivity Pack. With the functionality and flexibility of the JSON Source component, you can virtually work with any REST API. We hope you find this blog helpful.

Archive

November 2024 3 October 2024 1 September 2024 1 August 2024 2 July 2024 1 June 2024 1 May 2024 1 April 2024 2 March 2024 2 February 2024 2 January 2024 2 December 2023 1 November 2023 1 October 2023 2 August 2023 1 July 2023 2 June 2023 1 May 2023 2 April 2023 1 March 2023 1 February 2023 1 January 2023 2 December 2022 1 November 2022 2 October 2022 2 September 2022 2 August 2022 2 July 2022 3 June 2022 2 May 2022 2 April 2022 3 March 2022 2 February 2022 1 January 2022 2 December 2021 1 October 2021 1 September 2021 2 August 2021 2 July 2021 2 June 2021 1 May 2021 1 April 2021 2 March 2021 2 February 2021 2 January 2021 2 December 2020 2 November 2020 4 October 2020 1 September 2020 3 August 2020 2 July 2020 1 June 2020 2 May 2020 1 April 2020 1 March 2020 1 February 2020 1 January 2020 1 December 2019 1 November 2019 1 October 2019 1 May 2019 1 February 2019 1 December 2018 2 November 2018 1 October 2018 4 September 2018 1 August 2018 1 July 2018 1 June 2018 3 April 2018 3 March 2018 3 February 2018 3 January 2018 2 December 2017 1 April 2017 1 March 2017 7 December 2016 1 November 2016 2 October 2016 1 September 2016 4 August 2016 1 June 2016 1 May 2016 3 April 2016 1 August 2015 1 April 2015 10 August 2014 1 July 2014 1 June 2014 2 May 2014 2 February 2014 1 January 2014 2 October 2013 1 September 2013 2 August 2013 2 June 2013 5 May 2013 2 March 2013 1 February 2013 1 January 2013 1 December 2012 2 November 2012 2 September 2012 2 July 2012 1 May 2012 3 April 2012 2 March 2012 2 January 2012 1

Tags