Using the CosmosDB Source Component
The CosmosDB Source Component is an SSIS data flow pipeline component that can be used to read data from CosmosDB.
The component includes the following four pages to configure how you want to read data:
- General
- Document Designer
- Columns
- Advanced
General Page
The General page of the CosmosDB Source Component allows you to specify the general settings of the component.
- Connection Manager
-
The CosmosDB Source Component requires a connection in order to connect to a CosmosDB instance. The Connection Manager drop-down will show a list of all CosmosDB connection managers that are available to your current SSIS package.
- Database
-
This option lists all the available Databases in the CosmosDB instance. After you select the Database you wish to read from, the Container drop-down will be populated with the available Containers in the selected Database.
- Container
-
This option lists all the Containers available in the selected Database.
- Partition Key
-
The Partition Key is required when reading Items in a partitioned container.
- Advanced Settings
-
This option navigates to the Advanced Page of the CosmosDB Source Component.
- Source Type
-
The Source Type option allows you to specify whether you want to read Items or use the Item Change Feeds option.
- Items: Retrieves Data from Items.
- Item Change Feeds: Retrieves the captured changes in data from CosmosDB based on a token or a certain time.
- Output As Raw Text
-
The Output as Raw Text option specifies whether the output should be one single output column that contains the values in Raw Text format for each row returned by CosmosDB.
- Input Variable Type
-
This option allows you to specify how you want to retrieve Item Change Feeds (Only available when working with Item Change Feeds Source Type).
- Start Time: Retrieves the captured changes in data from CosmosDB based on a date time.
- Continuation Token: Retrieves the captured changes in data from CosmosDB based on the token.
- Input Variable
-
This option lists all the available parameters or user variables in your package that will hold the input for the Item Change Feeds operation (Only available when working with Item Change Feeds Source Type).
- Output Variable
-
This option lists all the available parameters or user variables in your package that will hold the output of the Item Change Feeds operation (Only available when working with Item Change Feeds Source Type).
- Query
-
This textbox allows you to specify a query in order to retrieve and filter your data from CosmosDB. This option is only available for Items Source Types.
- Expression fx Icon
-
Click the blue fx icon to launch SSIS Expression Editor to enable dynamic updates of the property at run time.
- Generate Documentation Icon
-
Click the Generate Documentation icon to generate a Word document that describes the component's metadata including relevant mapping, and so on.
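The two Input Variable Type modes for Item Change Feeds can be pictured with a small in-memory simulation. The sketch below is not the component's implementation and does not call CosmosDB; `Change`, `read_changes`, the integer token format, and the inclusive start time are all assumptions made for illustration. It shows the general pattern: a run takes either a start time or a continuation token as input and hands back a token that the next run can resume from, which is the role the Output Variable plays.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Change:
    sequence: int        # hypothetical position of the change in the feed
    timestamp: datetime  # when the item was changed
    item_id: str


def read_changes(
    feed: List[Change],
    start_time: Optional[datetime] = None,
    continuation_token: Optional[int] = None,
) -> Tuple[List[Change], Optional[int]]:
    """Return the changes selected by the input, plus a token for the next run."""
    if continuation_token is not None:
        # Continuation Token mode: resume after the last change already read.
        changes = [c for c in feed if c.sequence > continuation_token]
    elif start_time is not None:
        # Start Time mode: changes at or after the date time (inclusive by assumption).
        changes = [c for c in feed if c.timestamp >= start_time]
    else:
        changes = list(feed)
    # The returned token stands in for the value written to the Output Variable.
    next_token = changes[-1].sequence if changes else continuation_token
    return changes, next_token


feed = [
    Change(1, datetime(2024, 1, 1), "a"),
    Change(2, datetime(2024, 1, 2), "b"),
    Change(3, datetime(2024, 1, 3), "c"),
]
first_run, token = read_changes(feed, start_time=datetime(2024, 1, 2))
second_run, token_after = read_changes(feed, continuation_token=token)
```

In this sketch the first run starts from a date time and the second run reuses the returned token, so it only sees changes made after the first run finished.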
Document Designer Page
The Document Designer page allows you to build the design of the document you are trying to read, or to import the design from an existing document. This page is only available for the Items and Item Change Feeds Source Types.
The Document Designer includes the following two tabs:
- Details View
- Additional Settings
In the Details View tab, the top part of the page is used to manually configure the nodes in the design:
- Add Node: This button will add a new node to your Document design.
- Remove Nodes: This button will remove a node from your Document design.
- Direction buttons: These buttons can be used to rearrange the position of the nodes.
- Rename Nodes: This option allows you to specify how the node name should be represented.
- Use Qualified Names: When this option is selected, the output/column name will be set to the fully qualified node name based on the node location in the document.
- Use Short Names: When this option is selected, the output/column name will be set to the given Node Name directly.
- Filter Columns: This option allows you to show or hide certain columns in the grid.
- Show Basic Columns: When this option is selected, only basic columns will be shown in the grid.
- Show All Columns: When this option is selected, all available columns will be shown in the grid.
- Filter Nodes: This option allows you to filter the list of nodes shown in the grid by typing a keyword in the textbox.
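The difference between qualified and short names can be sketched in a few lines of Python. This is only an illustration: the `.` separator, the helper names `walk_values` and `column_names`, and the sample document are assumptions, and the component may compose qualified names differently. Arrays are treated as plain values here to keep the sketch short.

```python
from typing import Any, Dict, Iterator, List, Tuple


def walk_values(node: Any, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (path, value) for every non-object value nested inside objects."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from walk_values(child, path + (name,))
    else:
        yield path, node


def column_names(document: Dict[str, Any], qualified: bool, separator: str = ".") -> List[str]:
    """Build column names from node paths; the separator is an assumption."""
    names = []
    for path, _ in walk_values(document):
        # Qualified: join the whole path. Short: keep only the node's own name.
        names.append(separator.join(path) if qualified else path[-1])
    return names


document = {"id": "1", "address": {"city": "Toronto", "zip": "M5V"}}
qualified_names = column_names(document, qualified=True)
short_names = column_names(document, qualified=False)
```

Short names are easier to read, but qualified names avoid clashes when the same node name appears under different parents.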
The Details View grid consists of:
- Node Type: This option allows you to specify the type of the Node in your document design. There are four options available:
- Array
- Object
- Value
- Raw: This type can be used when trying to retrieve data under a node exactly as it is in the document.
- Node Name: The Name of the Node in the document.
- Output/Column Name: The name which will be set for the output or the column of a node.
- Is Repeated: This option allows you to specify if a node is repeated within a document (Available when Show All Columns is selected).
- Output Type: The type of output for a node, such as a Column or a Secondary Output, depending on the Node Type.
- Output Settings: This option allows you to specify the settings of each output such as the datatype of Value Node Types.
In the Additional Settings tab, you will find the following options:
- Null Mode: This option allows you to specify the handling of Null values.
- 'Is Repeated' Text Qualifier: This option allows you to specify the Text Qualifier used in a document when the Is Repeated property is set to True for one or more nodes. There are four options available:
- Double-quote (")
- Single-quote (')
- Tick (`)
- None
- 'Is Repeated' Text Delimiter: This option allows you to specify the Text Delimiter used in a document when the Is Repeated property is set to True for one or more nodes. There are seven options available:
- Newline (\n)
- Carriage Return (\r)
- Semicolon (;)
- Colon (:)
- Comma (,)
- Tab (\t)
- Vertical Bar (|)
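One way to picture how a text qualifier and a text delimiter work together is a function that combines the values of a repeated node into a single string. The sketch below is an assumption about the general idea, not a description of the component's exact output format; `join_repeated` is a hypothetical helper, and it does not escape qualifier characters that occur inside a value.

```python
from typing import Iterable


def join_repeated(values: Iterable[object], delimiter: str = ",", qualifier: str = "") -> str:
    """Combine the values of a repeated node into one delimited string."""
    # An empty qualifier corresponds to the "None" option.
    return delimiter.join(f"{qualifier}{value}{qualifier}" for value in values)


tags = ["red", "green", "blue"]
plain = join_repeated(tags, delimiter=";")
quoted = join_repeated(tags, delimiter="|", qualifier='"')
```

A qualifier is mainly useful when a value may itself contain the delimiter character.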
- Import
-
This option allows you to import the design of your document from one of the following four sources:
- Designer Settings: Import the design from an existing .designer.settings file.
- Item (CosmosDB): Import the design based on the items retrieved through the connection manager.
- JSON (Local File): Import the design based on a JSON file on your local file system.
- JSON Schema (Local File): Import the design based on a JSON Schema file on your local file system.
- CosmosDB Item Importer
-
When you select the Item (CosmosDB) import option, the CosmosDB Item Importer window opens. It allows you to specify a query, and the design of the Source Component is set based on the retrieved items.
- Items to scan: This option allows you to specify the maximum number of retrieved items that will be used to set the design of the Source Component. Setting this option to 0 will read all the retrieved items.
- Export
-
Designer Settings: This option allows you to export the current document design to a .designer.settings file which can be used later to import the same design into a different component.
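The effect of the Items to scan setting in the Item Importer can be illustrated with a simplified design-inference routine. The sketch below is hypothetical: `node_type` and `infer_design` are illustrative names, only top-level nodes are examined, and the real importer is likely more thorough. It does follow the behavior described above in one respect: a value of 0 scans every retrieved item.

```python
from typing import Any, Dict, List


def node_type(value: Any) -> str:
    """Map a JSON value to one of the designer's node types."""
    if isinstance(value, list):
        return "Array"
    if isinstance(value, dict):
        return "Object"
    return "Value"


def infer_design(items: List[Dict[str, Any]], items_to_scan: int = 0) -> Dict[str, str]:
    """Collect top-level node names and types from the scanned items."""
    # 0 means scan every retrieved item, mirroring the Items to scan option.
    scanned = items if items_to_scan == 0 else items[:items_to_scan]
    design: Dict[str, str] = {}
    for item in scanned:
        for name, value in item.items():
            # Keep the first type seen for each node name.
            design.setdefault(name, node_type(value))
    return design


items = [
    {"id": "1", "tags": ["a", "b"]},
    {"id": "2", "owner": {"name": "Ann"}},
]
design_first_only = infer_design(items, items_to_scan=1)
design_all = infer_design(items, items_to_scan=0)
```

Scanning only the first item misses the `owner` node that appears in the second one, which is why a low scan count can produce an incomplete design when items do not all share the same shape.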
Columns Page
The Columns page of the CosmosDB Source Component shows you the available columns based on the settings on the Document Designer page.
On the top left of the grid, there is a checkbox that toggles the selection of all available fields. This is a quick way to check or uncheck all available fields. The Columns Page grid consists of:
- Include Field Checkbox: A checkbox that determines if the field will be available as an output column.
- Column Name: Column that will be retrieved from the document.
- Data Type: The data type of this field.
- Hide Unselected Fields
-
When the Hide Unselected Fields checkbox is checked, unselected output columns will be hidden.
- Hide Selected Fields
-
When the Hide Selected Fields checkbox is checked, selected columns will be hidden.
- Filter
-
The visible output columns can be filtered by entering text in the Filter text box.
Note: As a general best practice, you should only select the fields that are needed for the downstream pipeline components. Do this on the Columns page using the checkboxes, or on the General page by removing the field from the query entirely.
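For the Query option on the General page, limiting the fields usually means projecting them explicitly rather than selecting everything. The helper below is a minimal sketch that builds such a query string; `build_projection_query` and the field names are hypothetical, the `c` alias is only a common convention, and the helper does not handle field names that need bracket notation (for example, reserved words).

```python
from typing import Sequence


def build_projection_query(fields: Sequence[str], alias: str = "c") -> str:
    """Return a query that projects only the listed top-level fields."""
    if not fields:
        raise ValueError("at least one field is required")
    projection = ", ".join(f"{alias}.{field}" for field in fields)
    return f"SELECT {projection} FROM {alias}"


query = build_projection_query(["id", "name", "status"])
```

The resulting text can be pasted into the Query textbox in place of a query that returns every field.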
Advanced Page
The Advanced page of the CosmosDB Source Component shows you additional options when retrieving data from CosmosDB.
- Consistency Level
-
Use this drop-down to choose the consistency level required for the query or read feed operation. Available options are:
- Null (Default)
- Strong
- Bounded Staleness
- Session
- Eventual
- Consistent Prefix
- Enable Low Precision Order By
-
This option can be used to enable low-precision ORDER BY in the CosmosDB service.
- Enable Scan In Query
-
This option can be used to allow scans on queries that could not be served by the index because indexing was opted out on the requested paths.
- Max Buffered Item Count
-
The maximum number of items that can be buffered on the client side during parallel query execution in CosmosDB service.
- Max Degree Of Parallelism
-
The number of concurrent operations run on the client side during parallel query execution in CosmosDB service.
- Max Item Count
-
The maximum number of items to be returned in the enumeration operation in the CosmosDB service.
- Session Token
-
The session token for use with session consistency in the CosmosDB service.
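As a rough mental model for Max Item Count, results can be thought of as arriving in pages whose size is capped by this setting. The sketch below only illustrates that paging idea on a local list; `pages` is a hypothetical helper, it assumes a positive value, and it says nothing about how the CosmosDB service or the component actually schedules requests.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def pages(items: Sequence[T], max_item_count: int) -> Iterator[List[T]]:
    """Yield consecutive pages holding at most max_item_count items each."""
    if max_item_count <= 0:
        raise ValueError("this sketch assumes a positive max_item_count")
    for start in range(0, len(items), max_item_count):
        yield list(items[start:start + max_item_count])


results = list(range(7))
page_sizes = [len(page) for page in pages(results, max_item_count=3)]
```

A smaller value means more, smaller pages; a larger value means fewer round trips at the cost of larger responses.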