Using the Salesforce Source Component
The Salesforce Source Component is an SSIS data flow pipeline component that can be used to read/retrieve data from your Salesforce instance. The Salesforce Source Component supports using either a Salesforce object or an object query statement as the data source.
The Salesforce Source Component includes the following two pages to configure how you want to read data from Salesforce.
- General
- Columns
General Page
The General page of the Salesforce Source Component allows you to specify the general settings of the component.
- Salesforce Connection Manager
-
The Salesforce source component requires a Salesforce connection in order to connect to your Salesforce.com instance. The Salesforce Connection Manager option will show all Salesforce connection managers that have been created in the current SSIS package.
- Batch Size
-
The Batch Size option allows you to specify how many records you want to retrieve each time. The default value is 1000. The maximum allowed value is 2000. This parameter should not be too small for performance reasons. Note that the component will automatically page through all records based on the object or object query you have specified below, regardless of the value of the Batch Size option.
- Include Deleted / Archived
-
The Include Deleted / Archived option specifies whether those deleted or archived Salesforce records should be returned.
- Use Bulk API
-
The Use Bulk API option specifies whether you want to use Salesforce Bulk API to read data from your Salesforce organization. This option can be used when you have a significantly large number of records in the Salesforce system which requires an extraordinarily long time if they are read synchronously.
Note that when this option is enabled, the Include Deleted / Archived option above will have no effect, as Salesforce Bulk API does not support reading deleted or archived records.
- Source Type
-
The Source Type option allows you to specify whether you want to read data from a Salesforce object, or use a snippet of a Salesforce object query. Using the Object option, you have the flexibility of visually picking which fields you want to read from the Salesforce object. When ObjectQuery is used, you can leverage the capabilities and flexibility of SOQL (Salesforce Object Query Language), such as JOIN, filtering criteria, etc. Your business requirements should influence which option to choose.
- Source Object
-
The Source Object option is only available when the Source Type option has been chosen as Object. When you click the drop-down button of this option, the toolkit will automatically retrieve the list of all available Salesforce objects, so that you can pick one.
- Output Timezone
-
The Output Timezone option specifies how Salesforce datetime values are retrieved. There are 3 options available:
- Timezone of Connection User (Default)
- Adjust to UTC
- Adjust to Local System Timezone
- Use Timezone Aware Data Types
-
The Use Timezone Aware Data Types option allows you to specify whether the source component should return datetime values with timezone information attached. When it is set to false (which is the default), all datetime fields will have a DT_DBDATE data type. Otherwise, when it is true, they will have a DT_DBTIMESTAMPOFFSET data type which is essentially a database timestamp field that can contain timezone information.
- PK Chunking Header (since v6.0)
-
PK Chunking Header allows you to use automatic primary key (PK) chunking for a bulk query job when working with SOAP service endpoint, which can be used to handle large data set extracts.
The supported field values are:
- TRUE: Enables PK chunking with the default chunk size (100,000), starting from the first record ID in the queried table.
- FALSE: Disables PK chunking. If no value is provided to PK Chunking Header property, the default is FALSE.
- chunkSize: Specifies the number of records within the ID boundaries for each chunk. The maximum size is 250,000.
- parent: Specifies the parent object when you’re enabling PK chunking for queries on sharing objects.
- startRow: Specifies the 15-character or 18-character record ID to be used as the lower boundary for the first chunk.
For example, the PK Chunking Header can be as simple as:
TRUE
or
chunkSize=50000
The following is an example with more options specified.
chunkSize=50000; startRow=00130000000xEftMGH; parent=Account
Note this option will only be available when the Use Bulk API option is enabled.
- Object Query
-
When the above Source Type option has been chosen as ObjectQuery, you will be presented with a text editor. You can use this to enter a snippet of the object query to retrieve data from your Salesforce.com instance.
There are several advantages of using ObjectQuery. First, you can use SOQL (Salesforce Object Query Language) to build a complex query that retrieves data from Salesforce. You may include fields from the parent object(s) in your query to read data from the parent object(s) directly. Secondly, using the ObjectQuery option, you can apply the WHERE clause to filter the data that is returned from the Salesforce instance, so that you only work with the data that you are interested in.
SSIS Integration Toolkit for Salesforce has the most sophisticated support of SOQL. You can write very complex SOQL queries, including using child-to-parent relationships, aggregation functions, and datetime functions in the queries. The following is a supported SOQL query that includes the usage of a child-to-parent relationship.
SELECT c.Id, c.FirstName, c.LastName, c.Account.Name, c.Account.NumberOfEmployees FROM Contact c
The following query with aggregation functions is also supported.
SELECT CALENDAR_YEAR(CreatedDate) CreatedYear, SUM(Amount) TotalAmount FROM Opportunity GROUP BY CALENDAR_YEAR(CreatedDate)
Due to the constraints of SSIS, we do not support reading data from child objects in the object query statement. For example, the following query is not supported.
SELECT Id, Name, (SELECT Id, Name FROM Contacts) FROM Account
The object query editor comes with a toolbar, which provides some common functionality useful when editing the query statement. One of the buttons is called Insert Variables, which can be used to insert SSIS variables in the query. With this capability, you can parameterize your SOQL query so it only returns the records that you are interested in.
You can use SSIS variables in whichever way you would like to use them, and you are not limited to using SSIS variables in conditions only.
It should be noted, when using SOQL aggregation functions and datetime functions in the SELECT clause, you would typically provide an alias for each function, otherwise, your output field name will be in exprN format, where N is the sequence number of such function.
- Refresh Salesforce Metadata Button
-
By clicking the Refresh Salesforce Metadata button, the component will retrieve the latest metadata from Salesforce and update each field. To streamline this process, three options are available to configure how updates should be handled for existing output columns:
- Update all: This would update all the fields regardless of whether there are metadata changes.
- Update if there are incompatible metadata: This option would refresh any incompatible metadata to the discovered metadata.
- Update non: No metadata updates would happen to existing fields.
The Refresh Salesforce Metadata feature has been designed to minimize any additional rework. Particularly, the component will only update all existing fields to the latest Salesforce metadata without actually deleting and re-creating them. This would otherwise cause rework if you have a downstream pipeline component mapped to such fields.
- Expression fx Icon
-
Click the blue fx icon to launch SSIS Expression Editor to enable dynamic updates of the property at run time.
- Generate Documentation Icon
-
Click the Generate Documentation icon to generate a Word document that describes the component's metadata including relevant mapping, and so on.
Columns Page
The Columns page of the Salesforce Source Component shows you all available columns from the object or object query that you have defined in the General page.
When the Source Type option has been chosen as Object on the General page, you will notice that the grid on the Columns page has a checkbox column on the left. This allows you to specify what fields you want to read from the chosen Salesforce object.
On the top left of the grid, the checkbox is used to toggle the selection of all available Salesforce fields. This is a productive way to select or unselect all available Salesforce fields.
Note: As a general best practice, only select the Salesforce fields that are needed for the downstream pipeline components. This would not only speed up Salesforce queries but also save resources for the SSIS engine.
When the Source Type option has been chosen as ObjectQuery on the General page, you will not see any checkboxes in the grid. The reason is, the object query itself defines what to read from Salesforce.