Naming Conventions
Many of the fields in the production database do not have descriptive names. To make maintenance easier and to ease future upgrades of the BI system, fields in the production database are renamed in views and UDFs using a consistent, descriptive naming convention. The following naming convention rules are used in views and UDFs on the production database:
- Use the prefix vw for the name of a view, for example vwReservations or vwGuests. Tables and views share the same namespace, so the vw prefix allows you to easily differentiate between tables and views when a table list is presented to the user in a drop-down list.
- Prefix user-defined functions with udf.
- Prefix stored procedures with usp.
- When naming the primary key field, use the singular form of the table name + ID. For example, if the table name is People, the primary key field name is PersonID; for the table named Guests, it is GuestID.
- Do not use just ID or Code for a primary key field name. Maintenance and troubleshooting become very difficult if the primary key field of every table is named ID or Code.
- Use Pascal case for closed compound field names (capitalize the first character of each word in the field name), for example PascalCaseField or WorkPhone.
- Do not use underscores in field names unless you are separating two parts of the name that contain abbreviations, for example CC_ID for credit card ID. For this field, however, the preferred method is to spell out the words credit and card, which gives the field name CreditCardID.
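The naming rules lend themselves to a mechanical check. The following is a minimal Python sketch, not part of the BI system itself; the function names and the object-kind labels ("view", "function", "procedure") are hypothetical and chosen only for illustration.

```python
import re

# Prefixes taken from the naming rules; the object-kind keys are hypothetical labels.
PREFIXES = {"view": "vw", "function": "udf", "procedure": "usp"}

# One or more words, each starting with an uppercase letter; no underscores.
PASCAL_CASE = re.compile(r"^(?:[A-Z][a-z0-9]*)+$")


def has_expected_prefix(kind: str, name: str) -> bool:
    """Return True if `name` starts with the prefix required for `kind`."""
    prefix = PREFIXES[kind]
    return name.startswith(prefix) and len(name) > len(prefix)


def is_pascal_case(field_name: str) -> bool:
    """Return True for names such as WorkPhone; False for CC_ID or workPhone."""
    return bool(PASCAL_CASE.match(field_name))


def primary_key_name(singular_table_name: str) -> str:
    """Build the key name from the singular table name, e.g. Person -> PersonID.

    The caller supplies the singular form, because plurals such as People
    cannot be singularized reliably by string manipulation.
    """
    return singular_table_name + "ID"


def is_acceptable_primary_key(field_name: str) -> bool:
    """Reject the bare names ID and Code, which the rules forbid."""
    return field_name not in ("ID", "Code") and field_name.endswith("ID")
```

For example, `primary_key_name("Guest")` yields the GuestID name used in the rules, and `is_pascal_case("CC_ID")` is false because of the underscore.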
Production Database
If a table is used in more than one UDF, a view is created to provide consistent naming for all fields. Otherwise, the UDF pulls directly from the table.
[Diagram labels: Table, View, UDF, Stored Procedure, SSIS Package]
The stored procedure is called from the SSIS package. The data and debug parameters are passed from SSIS to the stored procedure, and a table-valued result set is returned to the SSIS package.
[Diagram labels: Table(s), UDF, Stored Procedure, SSIS Package]
A stored procedure is put between the UDF and the SSIS package because there is a bug in UDFs that causes problems when parsing parameters that come from an SSIS package.
Once the bug that affects how a UDF parses parameters from an SSIS package is fixed, the stored procedure can be removed from the pipeline and the UDF can be called directly from the SSIS package.
In SSIS there are four levels of control. The first three levels provide modularity and determine the order in which ETL packages are executed.
Flow Control
Configuration Information
When a package is started, connection information and variables are set from the package configuration. For the ADE BI System, all configuration information is kept in the table admin.Configuration in the relational Staging database. Any information that one package passes to another is passed through this database.
The job of LoadGrouspFull_Daily.dtsx is to specify the order of operations: first, dimension data is loaded into the relational warehouse; then fact data is loaded; then dimensions are processed in the AS cube; and finally facts are processed.
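The ordering can be pictured as a fixed list of phases run one after another. The Python sketch below is only an illustration of that sequencing: the phase labels are descriptive strings rather than real package names, and stopping at the first failed phase is an assumption, since the text does not describe error handling.

```python
from typing import Callable, List

# The four phases in the order given in the text.
# These labels are hypothetical descriptions, not actual package names.
PHASES = [
    "load dimensions into relational warehouse",
    "load facts into relational warehouse",
    "process dimensions in AS cube",
    "process facts in AS cube",
]


def run_daily_load(run_phase: Callable[[str], bool]) -> List[str]:
    """Run each phase in order and return the phases that completed.

    `run_phase` is a caller-supplied function that executes one phase and
    returns True on success. Stopping on failure is an assumption.
    """
    completed = []
    for phase in PHASES:
        if not run_phase(phase):
            break  # later phases depend on earlier ones, so stop here
        completed.append(phase)
    return completed
```

Loading dimensions before facts matters because fact rows reference dimension keys; the same dependency holds when the cube is processed.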
Figure 2. LoadGrouspDimensions_Daily.dtsx Sequence container holds packages to load dimension data into staging db
Figure 3. Dim_Company.DTSX High level flow control to load dimCompany into staging
When you drill down into Data Flow Task (DFT) Load Company the following flow control is revealed:
SRC DimCompany
SRC DimCompany is an OLE DB flow component. This component executes the stored procedure etl.uspDimCompany to extract the data from udfDimCompany in the Source database. The stored procedure has two parameters, @logicalDate and @debug:

    exec etl.uspDimCompany @logicalDate = ?, @debug = 0

The create script for the stored procedure is shown below:
    CREATE procedure [etl].[uspDimCompany]
        @logicalDate datetime
       ,@debug bit = 0 --Debug mode?
    with execute as caller
    as
    /* This procedure is used to extract Company information into
     * the staging database
     *
     * exec etl.uspDimCompany '2007-05-23', 1
     */
    begin
        set nocount on

        if @debug = 1 begin
            select top (100) * from etl.udfDimCompany (@logicalDate)
        end else begin
            select * from etl.udfDimCompany (@logicalDate)
        end --if

        set nocount off
    end --proc

etl.udfDimCompany is a user-defined, table-valued function that returns the desired records from the Company and Address tables. The create script for etl.udfDimCompany is shown below:

    CREATE function [etl].[udfDimCompany](@logicalDate datetime)
    returns table
    as
    return (
        SELECT
            dbo.company_profile.account AS CompanyAccountID
          , dbo.company_profile.name AS CompanyName
          , dbo.company_profile.contact_name AS ContactName
          , dbo.company_profile.contact_title AS ContactTitle
          , dbo.address.address AS Address
          , dbo.address.Address_2 AS Address2
          , dbo.address.city
          , dbo.address.state
          , dbo.address.country
          , dbo.address.zip
          , dbo.address.phone
          , dbo.address.fax
          , dbo.address.email
          , dbo.company_profile.credit_limit
          , dbo.company_profile.status
          , dbo.company_profile.property AS PropertyID
          , dbo.company_profile.locale_id AS LocaleID
        FROM dbo.company_profile
        INNER JOIN dbo.address
            ON dbo.company_profile.property = dbo.address.propertyfrom
        WHERE Logical_Date > @logicalDate
    );

2007 STATRA. All rights reserved. Proprietary & Confidential. Page 7
STAT Source
STAT Source is a script component that executes a custom script. Its purpose is to count the number of source records and to record the count, together with timing statistics, by calling the stored procedure audit.uspEvent_Package_OnCount. The script defined in STAT Source is as follows:
    Imports System
    Imports System.Data
    Imports System.Data.OleDb
    Imports System.Collections

    Public Class ScriptMain
        Inherits UserComponent

        Private startTicks, totalTicks As Long
        Private rowCount, totalRows As Integer
        Private rps As New ArrayList() 'rps = rows per second

        Public Overrides Sub Input0_ProcessInput(ByVal Buffer As Input0Buffer)
            'Save the rate statistic for this buffer
            If startTicks <> 0 Then
                totalRows += rowCount
                Dim ticks As Long = CLng(DateTime.Now.Ticks - startTicks)
                If ticks > 0 Then
                    totalTicks += ticks
                    Dim rate As Integer = CInt(rowCount * (TimeSpan.TicksPerSecond / ticks))
                    rps.Add(rate)
                End If
            End If

            'Reinitialize the counters
            rowCount = 0
            startTicks = DateTime.Now.Ticks

            'Call the base method
            MyBase.Input0_ProcessInput(Buffer)
        End Sub

        Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
            rowCount += 1 'No exposed Buffer.RowCount property so have to count manually
        End Sub

        Public Overrides Sub PostExecute()
            MyBase.PostExecute()

            'Define the Stored Procedure object
            With New OleDbCommand("audit.uspEvent_Package_OnCount")
                .CommandType = CommandType.StoredProcedure

                'Define the common parameters
                .Parameters.Add("@logID", OleDbType.Integer).Value = Variables.LogID
                .Parameters.Add("@componentName", OleDbType.VarChar, 50).Value = Me.ComponentMetaData.Name
                .Parameters.Add("@rows", OleDbType.Integer).Value = totalRows
                .Parameters.Add("@timeMS", OleDbType.Integer).Value = CInt(totalTicks \ TimeSpan.TicksPerMillisecond)

                'Only write the extended stats if RowCount > 0
                If rps.Count > 0 Then
                    'Calculations depend on sorted array
                    rps.Sort()

                    'Remove boundary-case statistics
                    If rps.Count >= 3 Then rps.RemoveAt(0)

                    'Calculate min & max
                    Dim min As Integer = CInt(rps.Item(0))
                    Dim max As Integer = CInt(rps.Item(rps.Count - 1))

                    'Define the statistical parameters
                    .Parameters.Add("@minRowsPerSec", OleDbType.Integer).Value = min
                    .Parameters.Add("@maxRowsPerSec", OleDbType.Integer).Value = max
                End If

                'Define and open the database connection
                .Connection = New OleDbConnection(Connections.SQLRealWarehouse.ConnectionString)
                .Connection.Open()
                Try
                    .ExecuteNonQuery() 'Execute the procedure
                Finally 'Always finalize expensive objects
                    .Connection.Close()
                    .Connection.Dispose()
                End Try
            End With
        End Sub
    End Class
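To make the statistics in the script easier to follow, the Python sketch below reproduces only the arithmetic: the per-buffer rows-per-second rate, the removal of the lowest rate when at least three rates were recorded, and the minimum and maximum that remain. It is an illustration, not a replacement for the script component; the function name and the input shape (a list of row-count and elapsed-tick pairs) are assumptions.

```python
TICKS_PER_SECOND = 10_000_000      # value of .NET TimeSpan.TicksPerSecond
TICKS_PER_MILLISECOND = 10_000     # value of .NET TimeSpan.TicksPerMillisecond


def summarize_rates(buffers):
    """Summarize throughput for an iterable of (row_count, elapsed_ticks) pairs.

    Mirrors the arithmetic of the script component: one rate per buffer,
    the lowest rate dropped when at least three rates exist, then min and max.
    """
    total_rows = 0
    total_ticks = 0
    rates = []
    for row_count, ticks in buffers:
        total_rows += row_count
        if ticks > 0:  # skip buffers with no measurable elapsed time
            total_ticks += ticks
            rates.append(round(row_count * (TICKS_PER_SECOND / ticks)))

    stats = {"rows": total_rows, "time_ms": total_ticks // TICKS_PER_MILLISECOND}
    if rates:
        rates.sort()
        if len(rates) >= 3:
            rates.pop(0)  # discard the lowest rate as a boundary case
        stats["min_rows_per_sec"] = rates[0]
        stats["max_rows_per_sec"] = rates[-1]
    return stats
```

With three buffers of 1000, 500, and 2000 rows that each took one second, the lowest rate (500) is dropped, so the reported minimum is 1000 and the maximum is 2000.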
DER Coalesce
DER Coalesce is a data flow component that is used to replace any NULL values in the incoming source data with pre-defined unknown strings and numbers for the non-nullable target fields.
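The replacement rule can be sketched in a few lines of Python. The default values and column names below are hypothetical; the actual unknown strings and numbers are defined in the package and are not listed in this document.

```python
# Hypothetical defaults for non-nullable target fields (illustration only).
UNKNOWN_DEFAULTS = {"CompanyName": "Unknown", "CreditLimit": 0}


def coalesce_row(row, defaults=UNKNOWN_DEFAULTS):
    """Return a copy of `row` with None replaced by the column's default.

    Columns without a configured default are passed through unchanged,
    on the assumption that they map to nullable target fields.
    """
    return {
        column: defaults[column] if value is None and column in defaults else value
        for column, value in row.items()
    }
```

Only NULL (None) values are replaced; an empty string or a zero that arrives from the source is left as it is.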
Inferred Output
The first component in the Inferred Output branch, STAT Inferred, counts the number of records that include inferred members. Inferred members are added when a fact references, in this case, a Company that doesn't already exist in the Company table. The final step in the Inferred Output branch updates a record in the dimension table that is blank except for the business primary key that was retrieved from a fact record. The remaining data in this record is filled in with actual data in a future import.
New Output
The first component in the New Output branch, STAT New, counts the number of records that include new type 2 members. New members are added when a fact references, in this case, a Company that doesn't already exist in the Company table. The next component, a Union All Transform Component named All New SCD 2, unions updates made to type 2 columns with new records containing type 2 data. The next component, a Derived Column Transform Component named DER New SCD 2, sets the values for all the derived columns, such as the surrogate key, CurrentRow, StartDate, EndDate, InferredMember, and LastModifiedDate. The final component in the New Output branch, an OLE DB Destination Component named DTS New SCD-2, writes the data to the DimCompany table in the staging database.
SCD-2 Output
The first component in the SCD-2 Output branch, STAT SCD-2, counts the number of records that include type 2 members that have updates. Historical changes are saved with all type 2 data columns. The next component, a Derived Column Transform Component named DER SCD 2, sets the values for all the derived columns, such as the surrogate key, CurrentRow, StartDate, EndDate, InferredMember, and LastModifiedDate. The next component in the SCD-2 Output branch, an OLE DB Destination Component named DTS New SCD-2, updates the CurrentRow field for the given business primary key. The SCD-2 Output branch then merges with the New Output branch at the Union All Transform Component named All New SCD 2.
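The type 2 handling described above amounts to an expire-and-insert pattern: the existing current row for the business key is flagged as no longer current, and a new current row carrying the changed values is added. The Python sketch below illustrates the pattern on an in-memory list. The BusinessKey column name, the function name, and the choice to set EndDate when a row is expired are assumptions; the text states only that CurrentRow is updated.

```python
from datetime import date


def apply_type2_change(history, business_key, new_attrs, change_date, next_key):
    """Expire the current row for `business_key` and append a new current row.

    `history` is a list of dicts standing in for the dimension table.
    Returns the newly appended row.
    """
    for row in history:
        if row["BusinessKey"] == business_key and row["CurrentRow"]:
            row["CurrentRow"] = False
            row["EndDate"] = change_date  # assumption: expiry also closes the date range

    new_row = {
        "SurrogateKey": next_key,
        "BusinessKey": business_key,
        **new_attrs,
        "CurrentRow": True,
        "StartDate": change_date,
        "EndDate": None,
        "InferredMember": False,
    }
    history.append(new_row)
    return new_row


# Usage: a company is renamed, so the old version is kept and a new one is added.
dim_company = [{
    "SurrogateKey": 1, "BusinessKey": "C100", "CompanyName": "Acme",
    "CurrentRow": True, "StartDate": date(2007, 1, 1), "EndDate": None,
    "InferredMember": False,
}]
added = apply_type2_change(dim_company, "C100", {"CompanyName": "Acme Corp"},
                           date(2007, 5, 23), next_key=2)
```

After the call, the table holds two rows for the same business key, and only the newer one has CurrentRow set.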
SCD-1 Output
The first component in the SCD-1 Output branch, STAT SCD-1, counts the number of records that include type 1 members that have updates. Historical values are overwritten in all type 1 data columns. The second and final step in this branch updates all data for type 1 changes in the DimCompany table.
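Taken together, the four branches correspond to a routing decision made for each incoming row. The Python sketch below shows one way that decision could be expressed. It is an illustration under stated assumptions: the split of columns into type 1 and type 2 is hypothetical, and giving a type 2 change precedence when both kinds of column have changed is an assumption, not something the text specifies.

```python
# Hypothetical column classification (illustration only).
TYPE1_COLUMNS = ("ContactName", "Phone")
TYPE2_COLUMNS = ("CompanyName", "Status")


def route_row(incoming, current, type1_cols=TYPE1_COLUMNS, type2_cols=TYPE2_COLUMNS):
    """Return the name of the output branch for an incoming source row.

    `current` is the existing current dimension row for the same business
    key, or None when the key is not yet present in the dimension.
    """
    if current is None:
        return "New Output"
    if current.get("InferredMember"):
        # Placeholder row created earlier from a fact; fill it in now.
        return "Inferred Output"
    if any(incoming[c] != current[c] for c in type2_cols):
        return "SCD-2 Output"  # history is kept: expire the row, insert a new one
    if any(incoming[c] != current[c] for c in type1_cols):
        return "SCD-1 Output"  # history is overwritten in place
    return "Unchanged"
```

Rows that match the current dimension row on every tracked column fall through to "Unchanged" and need no write.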