Using the Premium Slowly Changing Dimension Component
The Premium Slowly Changing Dimension Component is an SSIS data flow pipeline component that can be used to monitor changes in the value of dimension fields that change slowly. Slowly changing dimensions are dimensions with values that are relatively static and change infrequently and unpredictably. The component can process three types of slowly changing dimensions: Fixed Attributes which are never supposed to change, Changing Attributes which can change but don't have a history, and Historical Attributes which can change, with a history of all previous values is kept.
The component requires two inputs: a Current Dimension Input with data that should be monitored for changes, and a New Source Input with potentially changed data.
The component includes the following four pages to configure how you want to process data.
- General
- Change Type Settings
- Advanced Settings
- Writing (since v7.1)
- Error Handling
General Page
The General page of the Premium Slowly Changing Dimension Component allows you to specify the general settings of the component.
- Fields to Compare Grid
-
The Fields To Compare grid will display all the available fields from the Current Dimension Input. Each of these fields can be paired with a field in the New Source Input. For every field that has been paired it is possible to designate these fields as a business key. A compound business key is created from all the key fields that are selected. The compound business key is used to determine which rows to compare in the Current Dimension Input and New Source Input. Selecting a Column Type for a mapped pair of fields will enable options on the Change Type Settings page.
The possible column types are:
- Fixed Attribute (Type 0) (Retain Original): Attributes that are not supposed to change. An error row will be directed if this attribute changes, and an unchanged row will be directed if it does not change.
- Changing Attribute (Type 1) (Overwrite): Attributes that are supposed to change. A changed row will be directed if this row changes, and an unchanged row will be directed if it does not change.
- Historical Attribute (Type 2) (Track History): Attributes that are supposed to change and a history is kept. If this attribute changes a changed row will be directed to expire the current row, and a new row will be directed to represent the new current row.
- Surrogate Key: A surrogate key column has a numeric value that is unique in the current dimension input. Having a surrogate key can be useful when performing inserts and updates later in the workflow. The component can automatically increment this value in new rows that it creates.
- Current Record: The current record column type is used to identify a record as current or expired. The values that represent current and expired are defined on the change type settings page.
- Start Date: The start date column type is used to identify the column in the record that represents the start date for the period in which it is current.
- End Date: The end date column type is used to identify the column in the record that represents the end date for the period in which it is current. The format of the end date is defined on the change type settings page.
Note: Fields that are different types can be compared, but a warning may be shown if the comparison combination is unsupported. The component supports the following comparisons of different data types:
Data Type 1 Data Type 2 Notes Any String Type Any String Type Any Numeric Type Any Numeric Type Internally numbers are converted to a common comparable type and compared. Any String Type Boolean The component attempts to parse the string to a boolean. True/False and Yes/No strings are supported. This is a case-insensitive comparison. Any Numeric Type Boolean The component treats 0 as False and 1 as True. - Filter
-
The current dimension input columns that are visible can be filtered by entering text in the Filter text box.
- Hide Unpaired Fields
-
When the Hide Unpaired Fields checkbox is checked only paired input columns will be shown.
- Hide Paired Fields
-
When the Hide Paired Fields checkbox is checked only unpaired input columns will be shown.
- Refresh Component Button
-
Clicking the Refresh Component button will reload input columns from the current dimension input and new source input.
- Map Unmapped Fields Button
-
Clicking the Map Unmapped Fields button will attempt to pair all the current dimension input fields to new source input fields with matching names.
- Clear All Mappings Button
-
Clicking the Clear All Mappings button will clear all pairings between current dimension input fields and new source input fields.
Change Type Setting Page
The Change Type Settings page has settings that are applicable to change types selected on the General page.
Changing Attribute Settings
-
If you have set one of the paired dimensions on the General page to change type "Changing Attribute" the changing attribute settings option will be enabled, the option is labeled "Change all the matching records, including outdated records, when changes are detected in a changing attribute". If this option is checked all records from the primary input with a matching business key in the secondary input will be updated with the new dimension value.
If this option is not checked, the component will attempt to identify the current record and only update that record. The current record for changing attributes can only be identified if dimensions on the General page have column types Start Date/End Date or Current Record.
- Historical Attribute Settings
-
If you have set the column type of one of the dimensions on the General page to Start Date/End Date or Current Record the historical attribute settings option will be enabled. There are two ways to identify the current and expired records, using a single column, or using start and end dates.
- Historical Attribute Settings - Single Column
-
Values used to detect current or expired columns can be used on string, numeric or boolean columns. The component will attempt to convert the strings that you specify to the type of column to identify or mark the record as current or expired.
- Value when current - The value of the record when current.
- Value when expired - The value of the record when expired.
- Historical Attribute Settings - Date Columns
-
- Variable to set date values - The variable that is used to get the current date for setting start date and end date values in new and updated rows.
- End Date Value for Current Records - The value to use when setting the end date for current records, options are Null Value, Dec 31, 9999, or a custom date.
- Expire Current Dimension Records not found in New Source
-
When this option is enabled, records that exist in the Current Dimension table but not in New Source will be set to expired.
Advanced Settings Page
The Advanced Settings page contains additional settings to control the behavior of the premium slowly changing dimension component.
- Culture Identifier
-
Specifies the culture identifier that should be used when comparing strings.
- Ignore Leading and Trailing White Space
-
When Ignore Leading and Trailing White Space is checked leading and trailing white space of string fields will be ignored during comparison.
- Ignore Case
-
When the Ignore Case is checked case will be ignored when comparing string fields.
- Surrogate Key Settings
-
If you have selected a surrogate key column two additional options must be specified. In both cases, the value can be specified, loaded from a variable, or auto-detected.
- Surrogate Key Start Value: The value to use for the surrogate key in the first new row. If this value is set to Auto Detect the component will look for the highest surrogate key value in the current dimension input and use this as the start value.
- Surrogate Key Increment: The value by which the surrogate key will increment for each new row. If this value is set to Auto Detect the component will look for the smallest increment between surrogate key values in the current dimension input and use this as the increment.
- Enable DetectedChangeType Column for Default Output
-
When the DetectedChangeType checkbox is checked the type of change, either 'Fixed', 'Updated', or 'New' will be included in an additional output column for each output.
Writing (since v7.1) Page
The Writing page allows you to specify an ADO.NET connection manager to write to the destination table directly. Note the writing is optional.
- Connection Manager
-
The Premium ADO.NET Destination Component requires an ADO.NET connection manager or a Snowflake Connection Manager. The component currently supports the following ADO.NET data providers:
- SQL Client Data Provider
- ODBC Data Provider
- OLE DB Data Providers
- Oracle Client Data Provider
- ODP.NET Managed and Unmanaged
- MySQL Client Data Provider
- Database (Available Only when working with a Snowflake Connection Manager)
-
The Database drop-down menu displays a list of available databases in the Snowflake instance defined in the Connection Manager. Selecting a database here will automatically populate the Schema drop-down list.
- Schema (Available Only when working with a Snowflake Connection Manager)
-
The Schema drop-down menu displays a list of available Schemas in the previously selected Snowflake database. Selecting a schema here will automatically populate the Destination Table drop-down list.
- Destination Table
-
The Destination Table drop-down displays a list of available tables for the database specified in the Connection Manager.
- Use Bulk
-
Enable bulk when supported to get an increase in performance. If you do not see the Use Bulk option, it simply means that the current ADO.NET Connection Manager does not support it.
- Column Mappings
-
- Key Column: The key checkbox column is used when performing an update where the selected fields represent key columns. These fields are used in the WHERE clause of the UPDATE statement.
- Input Column: Select an Input Column from an upstream component here.
- Destination Table Column: This is the field you are writing data to.
- Data Type: This column indicates the type of value for the current field.
- Unmap: This column can be used to unmap the field from the upstream input column, or otherwise it can be used to map the field to an upstream input column by matching its name if the field is not currently mapped.
Error Handling Page
The Error Handling page allows you to specify how errors should be handled when they happen.
There are three options available.
- Fail on error
- Redirect rows to error output
- Ignore error
When the Redirect rows to error output option is selected, rows that failed to write to the Premium Slowly Changing Dimension component will be redirected to the 'Error Output' output of the component. The blue output connection represents rows that were successfully written, and the red 'Error Output' connection represents erroneous rows. The 'ErrorMessage' output column found in the 'Error Output' may contain the error message that was reported by the server or the component itself.
Sample Scenario
The scenario used for this tutorial has one Supplier Dimension table and a source file. The Supplier Dimension table will be updated using a source file.
Source File
The source file contains 5 records that will apply changes to the dimension table when processed.
SupplierCode | SupplierName | Address | State |
---|---|---|---|
S0000001 | ABCD Company | 46 Lori Street | AL |
S0000002 | XYZ Corporation | 1156 Balist Road | MA |
S0000003 | Test New Company | 6540 Cole Alley | GA |
S0000004 | Blue Account | 216 Reinger Shores | NE |
S0000005 | M Organization | 153 Baff View | NJ |
Supplier Dimension
The Supplier Dimension is a table in the target database. Initially, this table contains records for four Suppliers.
SupplierSK | SupplierCode | SupplierName | State | IsCurrent | EffDate | ExpDate |
---|---|---|---|---|---|---|
1 | S0000001 | ABC Company | AL | 1 | 2017-01-13 | |
2 | S0000002 | XYZ Corporation | WA | 1 | 2017-01-01 | |
4 | S0000003 | Test Company | NY | 1 | 2016-10-26 | |
5 | S0000004 | Blue Account | NE | 1 | 2016-05-08 |
Building Slowing Changing Dimension SSIS Package
We are going to build an SSIS package that reads the data from Supplier Dimension and source file and loads data into the Premium Slowly Changing Dimension component to perform the comparison and configure what action we want to take when a change occurs.
The General page in Premium Slowly Changing Dimension is used to define what the various columns of the dimension table are used for. Set column types for the columns as shown below.
We have set the SupplierName as Historical Attribute so that it will output an expired record to the ‘Changed Rows’ output, and a new record to the ‘New Rows’ output which can be used to update the dimension table. The State attribute is set to Change Attribute thus existing row in the dimension table is updated to contain the new attribute values.
In the Change Type Settings page, we define the values written to the Current Record, Start Date, and End Date columns.
The Advanced Settings page is used to specify how surrogate keys are generated and additional string comparison rules. Also, we have the option to enable the DetectedChangeType column which can be used as a reference of what action has been applied to the record.
After finishing the configuration in the Premium Slowly Changing Dimension Component, then we need to write the output data into the Supplier Dimension table as shown below.
Below is a snapshot of the output data after processing.
SupplierSK | SupplierCode | SupplierName | State | IsCurrent | EffDate | ExpDate | DetectedChangeType |
---|---|---|---|---|---|---|---|
Unchanged Rows | |||||||
5 | S0000004 | Blue Account | NE | 1 | 2016-05-08 | No Change | |
Changed Rows | |||||||
1 | S0000001 | ABC Company | AL | 0 | 2017-01-13 | 2017-03-07 | Expired (Type2) |
2 | S0000002 | XYZ Corporation | MA | 1 | 2017-01-01 | Update | |
4 | S0000003 | Test Company | NY | 0 | 2016-10-26 | 2017-03-07 | Expired (Type2) |
New Rows | |||||||
6 | S0000001 | ABCD Company | AL | 1 | 2017-03-07 | New Record(Type2) | |
7 | S0000003 | Test New Company | GA | 1 | 2017-03-07 | New Record(Type2) | |
8 | S0000005 | M Organization | NJ | 1 | 2017-03-07 | New Record |