In today's world, many of the applications that we try to integrate with contain a lot of data. Because of that, it is impractical, most of times nearly impossible, to return all records with one single service call. Fortunately many of the REST services provide a way to page through their records in the system in order to properly read the records in their entirety, which is crucial to many large-scale integration projects. Previously we had a blog post of How to Work with RESTful Endpoints in your SSIS ETL Process which shows a relatively simple pagination scenario. In this blog post, we will go a little beyond, and show you some of the most commonly used API pagination strategies and demonstrate how you may configure the KingswaySoft JSON Source component to accomplish a proper pagination when working with a large data set in your SSIS ETL process.
Page-based Pagination
Page-based pagination is the simplest and most common form of pagination. The dataset is separated by pages.
Example #1
In our first example, the page-based pagination is achieved using a combination of limit and offset parameters.
The API request using limit and offset will look like something below:
GET /contacts?limit=20&offset=100
Once called, the service would return a “hasMore” property which indicates whether there are more records available in the system to be retrieved. The API response might look like something below:
{ "data": [...], "hasMore": true }
For an API like this, we can configure JSON Source component as the following:
As shown in the screenshot above, the Query String Parameters are set as:
- limit: 20
- (Expression Mode) offset: @[RequestCount]*20
We use Expression Mode for the “offset” query string parameter to dynamically change the value based on the @[RequestCount] number. @[RequestCount] is a system variable which start from 0 and increment by one(1) each time when the subsequent paging service calls are made, it is essentially the page number. The key is to make sure the Request Again Rule is properly evaluated, it should be evaluated to continue the paging when it is evaluated to true, or otherwise, it should stop paging calls when it is evaluated to false. Note that we have used the GetValueFromJson function which can be used to evaluate a JSON response by extracting data from a particular JSON node or property. In our case, the expression extracts from the returned JSON body and check if the “hasMore” value returned from the response is true. Note that we use a dot(.) character before the JSON name since the GetValueFromJson function expects a JPath for the evaluation:
- (Expression Mode): GetValueFromJson( @[ResponseBody],".hasMore" )
Example #2
The second page-based pagination example still uses the limit and offset parameters, but the API response returns the total record number instead of an indicator to tell if there are more records.
Here’s a sample response when the service returns the total number in the API response:
{ "data": [...], "total": 137 }
The JSON Source component can be configured like this:
The Query String Parameters are the same as the previous example:
- limit: 20
- (Expression Mode) offset: @[RequestCount]*20
The Request Again Rule is changed to the expression below to determine if there is more data available within the API.
- (Expression Mode): (DT_I4)GetValueFromJson( @[ResponseBody],".total" )-(@[RequestCount]*20)>0
In the case when the service returns the total number in an HTTP header field:
X-Total: 137 |
- (Expression Mode): (DT_I4)GetValueFromPreviousResponseHeaders( "X-Total")-(@[RequestCount]*20)>0
Keyset-based Pagination
Keyset pagination is often used when fetching a sorted result from the service, the service provides a key parameter that acts as a delimiter of the page.
Example #1
In the first keyset-pagination example, the API endpoint returns a "vid-offset" field which lets you know where you are in the list of contacts and use the “vid-offset” value in the next request to page through the contacts. The API request looks like:
GET /contacts?limit=20&vid-offset=118 |
Let’s take a look at the sample API response:
{ "data": [...], "vid-offset": 118 }
When there are no more records in the system, the “vid-offset” field will not be present in the API response. Based on the API behavior, we can configure JSON Source component as:
The query string parameters are set to:
- limit: 20
- (Expression Mode) vid-offset: @[RequestCount]==0 ? (DT_NTEXT)"" : GetValueFromJson( @[ResponseBody], ".vid-offset" )
We use an expression below as the Request Again Rule to check the existence of the “vid-offset” in the API response to determine when to stop paging.
- (Expression Mode): ValueExistsInJson( @[ResponseBody], ".vid-offset" )
The JSON Source configuration above sends an empty “vid-offset” query string parameter in the first request, in the case the service doesn’t like having the empty “vid-offset” parameter in the request, the JSON Source component can be configured as below:
Instead of specifying “vid-offset” as a dynamic query string parameter, we have specified the “vid-offset” parameter in the Relative Path:
- (Expression Mode): @[RequestCount]==0?(DT_NTEXT)"contacts":("contacts?vid-offset="+GetValueFromJson( @[ResponseBody], ".vid-offset" ))
Cursor-based Pagination
Cursor-based pagination works by returning a paging cookie (or cursor) which is associated with a record in the system and use it for the subsequent pagination service call to retrieve next set of records right after the paging cookie (or cursor). In most cases, the cursor is opaque so users cannot manipulate.
Example #1
Here’s a typical cursor-based pagination response example:
{ "data": [...], "next_cursor": "xxx", "before_cursor": "yyy" }
When the pagination has reached the last page, the service would return an empty “next_cursor” which indicates that there are no more records in the system to be fetched. Such as something below:
{ "data": [...], "next_cursor": "", "before_cursor": "yyy" }
Based on the above behaviors, the screenshot below shows the JSON Source configuration for this cursor-based pagination request:
The query string parameters are set as below:
- limit: 20
- (Expression Mode) next_cursor: @[RequestCount]==0 ? (DT_NTEXT)"" : GetValueFromJson( @[ResponseBody], ".next_cursor" )
Request Again Rule is set to the expression below to check if the returned “next_cursor” value is empty:
- (Expression Mode): GetValueFromJson( @[ResponseBody], ".next_cursor" ) != ""
Example #2
Some cursor-based pagination endpoints return URL to the next page and the previous page of results:
{ "data": […], "links": { "next": "https://baseurl/api/v2/tickets?next_cursor=xxx&limit=20", "prev": " https://baseurl/api/v2/tickets?before_cursor=yyy&limit=20" } }
The service will return a null value for the next url when there are no more records to be retrieved:
{ "data": […], "links": { "next": null, "prev": " https://baseurl/api/v2/tickets?before_cursor=yyy&limit=20" } }
Based on the API behavior, we can configure JSON Source component as below:
The query string parameters are set to:
- limit: 20
- (Expression Mode) next_cursor: @[RequestCount]==0 ? (DT_NTEXT)"" : GetTokenAtPosition(GetTokenAtPosition( GetValueFromJson( @[ResponseBody], ".links.next" ), "next_cursor=", 2 ), "&" ,1)
Request Again Rule is set to check if the returned “next” url is null:
- (Expression Mode): !IsNULL( GetValueFromJson( @[ResponseBody], ".links.next" ) )
Closing Note
In this blog, we went through some most commonly used REST API pagination strategies and demonstrated the configuration of working with different paging techniques using KingswaySoft JSON Source component which can be found in SSIS Productivity Pack. With the functionality and flexibility of the JSON Source component, you can virtually work with any REST API. We hope you find this blog helpful.