
The following terms and definitions are used in this document and apply to data implementation within Sunrise processes.

Agile: A software development methodology that divides larger software products into smaller units of well tested pieces. Agile supporters claim that true Agile is NOT Cowboy Coding and NOT Waterfall Development, although at its smallest level individual developers do a lot of both.
Cowboy coding is software development where programmers have
autonomy over the development process. This includes control of the
project's schedule, languages, algorithms, tools, frameworks and
coding style.
Waterfall Development: The waterfall development model is a
sequential design process, used in software development processes,
in which progress is seen as flowing steadily downwards (like a
waterfall) through the phases of conception, initiation, analysis,
design, construction, testing, production/implementation and
maintenance.
Product Owner: A knowledgeable resource from the business community with a grand vision of the overall technical product/solution who acts as a liaison between the technical delivery team and the business users. As a liaison between the business community and technical teams, a product owner manages prioritization of requirements for the technical teams and manages delivery expectations to the business community.
Stakeholders: The members of the business community who access the
technical product and its features to deliver value to the organization.
Development Squad: The technical community that works on transforming
business requirements into technical products/solutions and features on an
ongoing basis. Development Squads are generally made up of business
analysts, technical architects, development and testing teams.
User Story: A requirement defined in business terms by the business community with a view of the end result in business terms. A user story is generally made up of two components:
A requirement for a feature or set of features
A business justification for the requirement
The user stories in Sunrise can be grouped into any of the following types
New Feature or Enhancement
Upstream Application Replacement
Source Data Changes
Infrastructure Changes
User story status: Each user story can be tracked based on its current status.
Draft: A user story that is in the process of being created or has just been drafted as a wish list.
Submitted: A user story that has been submitted by the business community for implementation by the IT team.
Reviewed: This status marks the review completion of a user story within the IT team, with feasibility checks and a tentative implementation plan.
Approved: IT management approves the user story and the user story gets added to the backlog and assigned to a resource.
Scheduled: The IT resource has picked up the user story for implementation within one of the planned iterations and marks the completion of the iteration as an estimated completion time.
Completed: Once the user story has been implemented and signed off on, the user story is marked complete.
Cancelled: The user story can be withdrawn by the stakeholders or business, whereupon the status is set to cancelled.
Backlog: A preliminary wishlist of requirements requested by business
stakeholders in business terms in context of a business value.
Backlog grooming:
What: A defined periodic review of the business wishlist
When: Often - weekly or twice a week
Who: Spearheaded by the Product Owner with participation from development teams or business users
Outcome:
1. Cancel or submit user stories
2. User story grading: based on the complexity and estimated history of user story delivery, the user stories are graded during the backlog grooming process.
3. Prioritization of user stories based on story grade and business and knowledge value.
T Shirt Sizing:
User story review: The process where development squads analyze user defined stories to understand the requirement and the feasibility of implementation. Various factors of a business defined user story, such as business rules, workflow steps, platforms, roles, operations, etc., are analyzed and reviewed within squads or in dialogue with business users.
Splitting User Stories: As a result of user story reviews, squads arrive at an agile implementation path by breaking down business defined user stories into well rounded, testable development user stories that fit within an iteration.
Epic: An epic is a requirement or set of requirements that is either too large, too complex, or both, and needs multiple iterations for delivery.
Backend Story: A well rounded requirement that can be developed, tested and delivered within an iteration. Backend stories are defined by the development squad in technical terms with a technical end goal. These technical goals can generally be traced back to one or more business defined user stories but may be independent stories defined by the squad.

Backend Story Points: A quantitative measure used to comparatively scale user stories based on multiple factors including the delivery, complexity, number of components involved, number of resources needed, etc.
Backend Story Point Estimation: The process of collaborative or individual analysis of a user or backend story to grade the delivery of a completed story.
Fibonacci Estimate: A numerical scale that gradually progresses through the series elements with an increasing gap, i.e. 1, 2, 3, 5, 8, 13. The Fibonacci series is used to estimate agile work items since the gradual progression of the series provides additional weight between series elements as they progress, thus accounting for the additional uncertainty with larger agile work elements. The Marketing Transformation group has adopted Fibonacci estimation as the standard for estimating story points across squads.
Iteration: A defined timeframe providing regularly spaced delivery of features to the business community. Functionally, a feedback loop needs to fit within an iteration.
Delivery/Deployment: An agile landmark that marks the completion and acceptance of a feature at the end of a feedback loop.
Retrospectives
Work in Progress (Limit): The
Acceptance Criteria
Splitting stories
Business risk, Social/Cultural risk, Technical risk, Cost and Schedule risk
Customer value vs time
Reactive vs Proactive
Feedback Loop
Story burnup chart: Delivered Stories vs Time
Outcome Chart/Deployment Chart:
Funnel
Splitting User stories:
User story grading: Following a user story review the story is assigned a story grade using Agile T-shirt sizing. Any user story can be graded as an S, M, L or XL.
Work Items: A work item is a manageable technical deliverable that can be traced back to a user story. Work items from various user stories may be grouped, from an agile delivery context, into two-week iterations.
User story implementation: Once reviewed, approved and graded, the user story will be broken down into work items.

There are three principal data environments within the Sunrise architecture. A graphic view is shown below along with information on the purpose served by each.

Warehouse
All source data is loaded to DB2 tables in the Sunrise warehouse database.
When all sources are transitioned to the Sunrise sourcing architecture, the
Sunrise warehouse will have replaced the existing collector database (CDB)
along with the load database (LPDB), the ETL database (PETL), and parts of
the raw database.
With the agile development approach Sunrise has taken, source data is identified to support a user story. This data is documented along with any rules required to manage the data as well as the source of the data. In many cases the data required to support a user story already exists in the Sunrise environment, but new data from existing or entirely new sources is also identified. When the content is new from an existing source, the source DataStage or LEI process is extended to include the new element. The element's history over time, where available, is also brought in. When the data is from a new source, the source is defined to the DataStage or LEI processes and a source specific data acquisition document is created. This document is provided to an ETL developer who will acquire the data from the source. The newly acquired data, and historical content if available, will be loaded into the warehouse.
The Data Design section covers details on the data warehouse processes.

Reporting Data Mart


The reporting data mart contains a business specific set of marketing data
based on business reporting requirements. This data has been transformed
into a dimensionally based star schema implementation. There are three
major data areas within the dimensional reporting data mart which are:
1. Classification dimensions (e.g., what country is the opportunity located
in),
2. Business measurement fact (e.g., validated leads created, total
response count), and
3. Summary aggregate (e.g., align multiple measures into a single record
for high performance reporting).
Each of these is detailed below.

Dimensions
The word dimension is another word for a categorization or classification
scheme to be applied in order to understand data. There are principally two
kinds of dimensions:
1. Reference data dimensions and
2. Business data dimensions.


Reference data dimensions are normally easy to understand when examples
such as country, brand, or even quad tier are used. These types of
dimensions have a fixed set of business agreed values which applications will
present and capture (e.g., the 248 countries in the country reference data
standard which may change once every five years).
Business data dimensions, on the other hand, have entries based on some
business process and are subject to constant change. An IBM employee
dimension would have the HR country code and serial number for every IBM
individual. Since people are constantly hired and leave this dimension
changes hourly. Another example of a business data dimension is marketing
campaign tactic. There is not a fixed list of tactics; they change every day.
But results are categorized or grouped based on the planned business data
captured for each defined tactic.
Whether the dimension is reference or business data, Sunrise manages the dimension content over time; in warehouse terminology this is what's known as a type 2 dimension. This means that every time a dimension row changes a new record is inserted into the dimension with the appropriate effective date, and the expiration date of the previous record is set. In addition to setting the expiration date on the old record, the replaced by key is also set to point to the newly inserted record. One additional update is made to all prior historical records to set the current key value to the key of the newly inserted dimension record. This provides an easy method to bring historical content to a current view. For example, a S/36 became an AS/400, which became an iSeries. The older S/36 and AS/400 records' current key would be the key of the current iSeries record.
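As a rough illustration of the type 2 mechanics described above, the following SQL sketch shows the three operations for the AS/400 to iSeries replacement; the table and column names (BLD_BRAND_DIM, EFF_DT, EXPR_DT, REPLACED_BY_KEY, CURR_KEY) are illustrative only and are not the actual Sunrise definitions.

    -- 1. Expire the superseded record and point it at its replacement
    UPDATE BLD_BRAND_DIM
       SET EXPR_DT = CURRENT DATE - 1 DAY,
           REPLACED_BY_KEY = 501              -- key of the new iSeries record
     WHERE BRAND_CD = 'AS/400'
       AND EXPR_DT  = '9999-12-31';

    -- 2. Insert the replacement value with a new effective date
    INSERT INTO BLD_BRAND_DIM
           (DIM_KEY, CURR_KEY, BRAND_CD, BRAND_SHORT_NM, BRAND_LONG_NM,
            EFF_DT, EXPR_DT, REPLACED_BY_KEY)
    VALUES (501, 501, 'ISERIES', 'iSeries', 'IBM iSeries',
            CURRENT DATE, '9999-12-31', NULL);

    -- 3. Realign all prior historical records to the current key
    UPDATE BLD_BRAND_DIM
       SET CURR_KEY = 501
     WHERE BRAND_CD IN ('S/36', 'AS/400');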
Below is a Sunrise dimension record that highlights the control information as well as the dimension values themselves. The dimension value includes the dimension code, short name, and long name values. Every other element is used by the dimension control processes to manage the dimension in and across time.

Facts
Based on business reporting requirements, content from the warehouse is processed into business metrics and summarized, where required, into aggregated information. For each of the measures, the business identified classification dimensions, along with the dimension depth, are used in creating each business metric per business metric rules. An example of a currently defined business metric fact and its classification dimensions is:
Validated Pipeline Revenue classified by
IBM Geography Hierarchy Dimension
Customer Set Dimension
Measurement Time Dimension
EDGE Brand Hierarchy Dimension
Below is a screen capture of a Sunrise metrics table containing various facts required for business reporting.

Summary Aggregates
Once metrics have been generated for a Sunrise story, performance of the Sunrise reporting or analytics is evaluated. If the amount of data cannot support a near-real time response to Cognos reports then use of summary aggregates is pursued. Basically the aggregates provide the means to roll up lower level record content. In addition to rolling up content, the number of dimensions required for high level reports is often less than for detail reports.
Given these two situations aggregates normally have measures from
one or more metrics tables with fewer dimensions than the detail
records have and at a higher level in the dimensions that are included
in the aggregate. For example global reports may only reflect IOT and
IMT geography content along with the first level of the customer set
dimension. By rolling up the country content to their IMT and the
second level customer set values to the first level the number of
records required may be only 10% of the detail metric input record
count.
An example of an aggregate, viewed physically, is depicted below.

Although the Cognos reporting framework, or analyst queries, could calculate the metric every time they execute, it is far more efficient to calculate the value once and store it for all queries to use, aligned to the required business classification dimensions.

Roll Up Summary Aggregates


When balancing the volume of data against the number of required
dimensions it is sometimes advantageous to roll up aggregate results
to a higher level within the same dimensions or drop dimensions based
on report usage. For example if 70% of the reports only use the first
two levels of the geographic hierarchy and the additional hierarchy
levels increase the number of aggregate rows by 400%, it may make
sense to create a roll up view. In the first release of Sunrise the implementation of such a table reduced report run times for executive management reports by 60%.
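The following SQL sketch illustrates the kind of roll up described above: detail metric records are summed to the IMT geography level and the first level of the customer set dimension. All table and column names here are placeholders, not the actual Sunrise objects.

    -- Roll detail facts up to IMT geography and first level customer set
    INSERT INTO BLD_MKO_PIPELINE_AGG
           (IMT_GEO_KEY, CUST_SET_L1_KEY, MEAS_TIME_KEY, BRAND_L1_KEY,
            VALIDATED_PIPELINE_REV)
    SELECT g.IMT_GEO_KEY,                 -- country rolled up to its IMT
           c.CUST_SET_L1_KEY,             -- second level rolled up to first level
           f.MEAS_TIME_KEY,
           b.BRAND_L1_KEY,
           SUM(f.VALIDATED_PIPELINE_REV)
      FROM BLD_MKO_PIPELINE_FACT f
      JOIN BLD_GEO_DIM      g ON g.GEO_KEY      = f.GEO_KEY
      JOIN BLD_CUST_SET_DIM c ON c.CUST_SET_KEY = f.CUST_SET_KEY
      JOIN BLD_BRAND_DIM    b ON b.BRAND_KEY    = f.BRAND_KEY
     GROUP BY g.IMT_GEO_KEY, c.CUST_SET_L1_KEY, f.MEAS_TIME_KEY, b.BRAND_L1_KEY;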

Analyst Data Repository


Although a great deal of research and analysis goes into content before it is ever included in the marketing management system, there are always outliers which require analysts to investigate. In addition there are always great ideas on what data could be used to better interpret results or predict future results. To support this Sunrise has an analyst data repository (ADR) which contains the data used in the management reports as well as additional detail identified for research or future analysis. The ADR contains, based on current requirements, a three year running set of data for analysts' use. The maintained timeframe can vary by data area, by source, or by criteria specific to the needs of business analysis. No matter what the maintained timeframe, once that time has elapsed, records older than the required timeframe are removed from the ADR. In the event that business requirements change and a longer value for the elapsed timeframe is required, the removal rule can be changed and the older data refreshed from the warehouse.
The ADR is enabled for ODBC as well as DB2 access to facilitate a wide range of tool access and support as many methods to analyze content as possible. For example, many marketing operations analysts use Hyperion Explorer, others use SPSS Statistics, and others use Cognos Insights. No matter what the tool, analysts have access to all the data used in reporting to research issues for explanations and/or investigate new ways to help better manage the business using marketing's business data assets.

Trusted Data Source


In order to minimize issues with sourced data it is always best to acquire the content from the original capture application. This eliminates issues that occur when acquiring source data from some downstream application or database where records may have been filtered, transformed, or otherwise masked. Depending on the downstream system's availability, retention, accessibility, performance, etc., the data may be close to or far different from the original capture and management source. To that end Sunrise will always seek to acquire data directly from the capture/management application based on an interface DOU.

Trusted Source Identification


When business data requirements are identified for marketing management reporting or analytics, the original source will be contacted to acquire data. In some cases, a downstream source may also be required in order to fulfill the business data requirement. For example, the marketing management system requires opportunity data which was managed in CRM and is now managed in Sales Connect. But, even though this is the master source and Sunrise does acquire directly from it, the business required that opportunities within the current sales pipeline reported by marketing must align with those reported by sales. Since sales reporting is performed by the SMS application using data from the EIW/ESA database, Sunrise also acquires data from EIW/ESA.

Source Construct Change Tracking and Alerts


In a perfect world every application would pre-announce any changes to its application data structures. Unfortunately, in today's agile and ever changing business data world, advance notice is seldom provided on changes to structures and, even when notice is provided, it often does not cover the entire extent of the changes. To that end Sunrise implemented a tracking process whereby every source is queried to identify new, changed or deleted constructs. Constructs retrieved include both tables/documents as well as columns/fields.


The construct data is acquired weekly and runs through the normal WDM
routines to determine change. If there is no change or the change involves
addition, the process does not raise an alert. If, on the other hand, existing
structures are modified (e.g., table has a column removed or the data type
for a column changes) the process issues a check source alert. The architect responsible for the source will review the changes to determine any impact to the Sunrise build along with any downstream consumers.
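For a DB2 source, a construct snapshot of this kind can be taken with a simple catalog query along the lines of the sketch below; the actual Sunrise process and its filters may differ, and Notes sources would instead enumerate forms and fields.

    -- Weekly construct snapshot of tables and columns for a DB2 source
    SELECT TABSCHEMA, TABNAME, COLNAME, COLNO, TYPENAME, LENGTH, SCALE, NULLS
      FROM SYSCAT.COLUMNS
     WHERE TABSCHEMA NOT LIKE 'SYS%'
     ORDER BY TABSCHEMA, TABNAME, COLNO;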


Trusted Source Data Acquisition


There are many sources of business data content for marketing operations in Sunrise. There are two primary database technologies in use by the source capture/management applications, which are (1) DB2 and (2) Lotus Notes. For each of these technologies there are many tools to acquire data into Sunrise. Short descriptions for each technology as well as appropriate use are outlined below.

InfoSphere DataStage
DataStage is a product which allows data from a multitude of sources and source technologies to be managed into a DB2 database, Netezza appliance, or many other data environments. It provides many built-in functions for transforming data, thereby reducing the skills required to establish a source to target data load.
Sunrise will primarily use DataStage as the means to acquire data into the Sunrise warehouse. Below is an example source to target data flow defined in DataStage.

For DataStage services, as with reporting, Sunrise uses the IBM Worldwide BACC
infrastructure.


DB2 Database Replication


DB2 provides built-in functionality to copy data from one DB2 database to another. This is managed by DB2 using transaction log entries which identify records that had changes. These changed records are then sent to the target DB2 database where they are inserted, updated, or deleted. Database replication is fast and provides the means to easily keep two database tables in sync.
Although replication is relatively easy to implement, for Sunrise it requires additional processing at the warehouse end to determine changes and manage the warehouse content over time. Additionally, most of the Sunrise source data is not taken in its entirety, so straightforward replication does not fit for many Sunrise sources.

DB2 Database Federation


Another means DB2 can use to make data available to other DB2 databases is via a federated connection. Federation, like replication, is easy to set up, but unlike replication no data is automatically sent to a target database. Instead the target database executes a select query which DB2 routes to the source database. The query executes on the source database and the results of the query are returned to the target database.
In addition to the target selecting data from the source and performing a local insert, the source database could federate the target. This allows the source to remotely insert records into the target database without any effort on the target database processes. The PDb makes extensive use of federation, as does the events DB2 database.
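For reference, a minimal federation setup on the consuming database looks roughly like the following; the server, schema, nickname, and credential values are placeholders and not the Sunrise definitions.

    CREATE WRAPPER DRDA;
    CREATE SERVER PDB_SRV TYPE DB2/UDB VERSION '10.5' WRAPPER DRDA
           AUTHORIZATION "feduser" PASSWORD "********" OPTIONS (DBNAME 'PDB');
    CREATE USER MAPPING FOR USER SERVER PDB_SRV
           OPTIONS (REMOTE_AUTHID 'feduser', REMOTE_PASSWORD '********');
    CREATE NICKNAME SRC_PDB.TACTIC FOR PDB_SRV.PDBADM.TACTIC;

    -- The nickname is then queried like a local table; DB2 routes the select
    -- to the source database and returns the results
    SELECT TACTIC_ID, TACTIC_NM
      FROM SRC_PDB.TACTIC
     WHERE UPDATE_TS > CURRENT TIMESTAMP - 7 DAYS;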

CastIron Appliance
CastIron is another company that IBM acquired. The company created a robust series of source to target functionality which operates across many different exchange technologies. Although the technology exists and is in use by Sales Connect to acquire tactic plan data from PDb/Sunrise, the infrastructure is not as pervasive as that of DataStage. Additionally, DataStage resources are relatively easy to acquire.


Lotus Enterprise Integrator (LEI)


IBM acquired the Lotus company many years ago, and Lotus Notes became a prevalent tool for creating anything from a simple entry application to a complex planning system. Notes stores data in documents (think database rows) which can be moved from a Notes container to a DB2 database via the Lotus Enterprise Integrator (LEI). LEI provides the means to have Notes control inserting, updating or deleting content in DB2 databases as changes occur in the Notes application database. The Event applications as well as the MRF application make extensive use of LEI to manage data from Notes into DB2.
In addition to LEI, Notes also provides an ODBC interface to the Notes container database. This enables SQL based tools to issue SQL queries to retrieve data. Although this technology has been used in previous PDb data sourcing work, it tends to be slow and was limited to Latin-1 character content. So, for example, an event name in Japanese would show corrupted characters in the returned data. For Sunrise, any Notes application source will use LEI to move data from Notes to the Sunrise warehouse.

Source Application Selection


The following matrix provides a visual guide for which technologies to use for Sunrise data acquisitions depending on the data requirements.

DB2 sources:
1. DB2 Federation: Local results limit acquired data from the source database (like tactic in EST with PDb).
2. DB2 Replication: When the source table and target table have the same structure, when near real time data is required, and when database logging is not circular.
3. CastIron: When the source or target application has CastIron appliance infrastructure.
4. DataStage: Net change based on a solid source update date process, or change data capture if the source is enabled for it.
5. LEI: Inserts into a stage table from the source application.

Notes sources:
1. LEI: Connects to the Notes database and inserts new or updated records along with deleted records.
2. DB2 Federation, DB2 Replication, CastIron, DataStage: N/A.

GSA File sources:
1. DataStage: When the GSA file is produced by the source application and the production process consistently outputs the same file structure.
2. DB2 Federation, DB2 Replication, CastIron, LEI: N/A.

Business Data Standard Reference Data


Common reference data is critical to support accurate data integration from multiple applications. IBM established the business data standards (BDS) process and repository to support and enforce the use of common reference data. Use of common reference data, such as the two character country code, saves millions of dollars in IT expense that would otherwise be spent attempting to integrate data from two different applications when they have different country standards.

Reference Standards Process


Each reference data value, or data element, which requires a standard is nominated and, upon acceptance, added to IBM's business data standards. The BDS can be located at the following URL. Once an element or reference value is defined it goes through a cross business unit, cross geography, and cross application review. Part of that process will determine when the standard can go into effect, and the process ensures everyone is clear on what the element is or what the reference value represents.
Once a standard is approved, all applications must implement the standard by the defined deployment date. Any application that cannot meet the standard's deployment date must request an exception, in writing, since data from that application will now be out of sync with other applications. Those out of sync conditions require all users and interfacing applications to implement transforms, as best they can, to adjust for the application which did not make the standard deployment date.
BDS reference data values change over time where older values are replaced or superseded by newer values. For example, the value AS/400 was replaced by the value iSeries. When the standard owner implements a replacement, all applications must either (1) map any existing old values to the new replacement standard or (2) provide the means to dynamically translate the old values to the new.

Sunrise Management Process for IBM Standard Data

Most IBM BDS reference data become dimensions in Sunrise. The reference data values are loaded and managed in the warehouse environment. Part of this process includes the mapping of any replaced by values as well as the expiration of values that are no longer to be used but which were not replaced. At each month's BDS review, the Sunrise standards delegate identifies any changes to values used by Sunrise. For each month where there is a change, a requirement is opened. From this the reference data will be instantiated and go effective when the BDS standard requires it. Any existing standard values which are expired or made obsolete have their expiration date set to the day the BDS documented.

Architecture Frequently Asked Questions/Answers


The following architecture questions and answers have surfaced over time and are
documented in context for reader understanding.
1. Why have data marts in addition to a warehouse?
2. Why is the ADR separate from the warehouse?
3. What happens if the business identifies new data elements to collect from a source?


Data Design
The following section covers the high level design approach for establishing and managing data throughout the Sunrise data environments.

Sunrise Data Fundamentals


Since there are volumes and volumes of material covering data warehouses for all kinds of different situations, this document will not attempt to explain what data warehouses are, how to build or operate them, or how to spin off business unit specific data marts. All of that kind of content can easily be found on the internet and in many books. Instead this section will focus on specific aspects of the Sunrise data warehouse implementation which provide specific functionality required in managing marketing data for operational analysis and reporting.

Data Classification
Control Elements
There is a class of fields which are used by data source applications, the Sunrise warehouse, and ETL processes to ensure that data is accurately managed. These elements serve no purpose for any business person other than to prove that data is being appropriately managed. This section will cover some specific control elements that are found in the end to end management of data for Sunrise along with the rationale for each.

Warehouse Control Elements


The Sunrise data warehouse creates and maintains a set of control fields which are used in managing data throughout the warehouse. A list of the elements along with the purpose each serves is outlined below.
1. Unique generated key for each record, assigned by Sunrise warehouse processes and stored in each record's ETL, stage, or temporary key column.
2. A Sunrise generated data key based on business data content which is used to determine what warehouse operations, if any, need to execute against that staged source record.
3. Record change process indicator (SRC_DELETED), determined by the Sunrise warehouse processes inspecting each staged source record for each data area against previously loaded records in the source's data area active table. If the source's primary key does not exist in the active warehouse table, the record is an A (addition). If the key exists but the data key contents are different, the record is a U (update). If a source record exists in the data area active table and was not received for a refresh data area operating in net change mode, a D (delete) is determined.
4. Deletion timestamp, assigned at the time the Sunrise warehouse identifies that a currently active source record was removed from the source application.
5. Effective and expiration dates are used by the warehouse to manage dimensions over time. This allows data which may have related to a dimension value at one point in time to realign to a current value for over time analysis or reporting.
6. Replaced by keys are also used by the warehouse to manage dimensions over time. This allows analysis and reporting to know what value replaced an existing value which, itself, could have subsequently been replaced by yet another value. This allows realignment to a current value for over time analysis or reporting.

Source Control Data Elements


The following are some of the control elements that can be used when data is
available based on how the data source manages its data content:
1. Record create date
2. Record last update date
3. Record primary key
4. Record interface or surrogate key
5. Record version identifier
Depending on the business process that an application supports, as well as the technology that the application operates with, there will be many more application source control elements. The important item to keep in mind is that the acquire requirements for the source must clearly delineate control elements from business data for proper Sunrise warehouse management.

ETL Control Elements


The following are some of the control elements that are used in controlling and
managing content from any source or delivery to a target application:
1. Extract timestamp for each source acquired record, used for content validation and reconciliation. This is included as part of every DataStage acquired record, or source data acquired with any tool, and records the exact time the acquire tool picked up the data from the source.
2. Source record count
3. Source hash total

Not all control elements identified above apply to all data sources or the data that
Sunrise manages in the warehouse.

Business Data Elements


Information that has been identified by business persons falls into the business data element classification. The data can be reference values (e.g., country) as well as any value keyed in by a person for the business (e.g., entering a respondent's city). The following is a short list of example elements which may help when determining if some field is a business element versus a control element.
City name
Tactic ID
In market date
Offer name
Job title
Purchase order number
Customer set

Warehouse Source Data Management Types


There are three modes that the Sunrise warehouse uses to manage each source's data area through warehouse processes. The first is known as type R, which stands for refresh mode. The second mode is type N for net change. The third mode is type S, which stands for synchronization mode. Each mode, along with how it is processed, is described below.

Complete Refresh Mode


When it is critical to understand how a record has changed over time that
data area is normally defined as refresh type. This mode causes the
warehouse to instantiate the complete set of records each time data is
acquired from the source. This mode provides the means to easily know and
measure a complete set of data at any and every point in time. For example,
if marketing needs to measure opportunities this year against the same point
in time last year, last quarter, etc. having the complete set of data at each
point in time is mandatory.

Net Change Mode


As data changes or evolves over time each change must be recorded along
with when it changed. This applies to business as well as reference data and
provides the only way to have a basis to judge the impact of activities over
time. To support this need the warehouse has the capabilities to determine
when the business data content, as defined by the business data
requirements, changes. If, for example, a source updates a field and thus
changes that source record's last update timestamp, the record will flow to the Sunrise warehouse staging table. Once successfully acquired, the load operation will determine, based on that data area's defined business data content, if the record's marketing required content actually changed. In the example where a non-marketing identified element changed but no element marketing needs changed, the net change process will ignore the record.
To determine when business content changes, the load process computes a business data key based on all the business data elements contained in the data area. The key generation addresses content issues such as nulls versus blanks versus empty strings so that an accurate change identification can occur. Business data as well as control data elements are identified in that data area's control table, and this is a common function used to determine change in source content or change in integrated marketing measurement content. Once the business data area key has been generated, the load processes then compare the acquired source record to the same record, by source key, to set the change indicator. Change indicator values are:
1. A new record will receive a change indicator of A (addition) and is identified by the load process when the source key for the data area exists in the stage table but not in the active table.
2. A change to an existing record will receive a change indicator of U (update) and is identified by the load process when the source key for the data area exists in the stage table and also exists in the active data area but the business data key values are different.
3. A deleted record can be determined in one of two methods. The first is when the data area acquires the complete source content every time and the load process finds there is a source key in the active table but not in the staged table. The second method is when there is another acquired data area which contains deleted record keys along with the acquired source deleted record date/time. In either case the load process inserts a record into the data area load table with a value of D (delete). Additionally, the warehouse preserves source records when the data source archives or removes older records which are not required by the business process the application supports. This is managed by the load process, which is aware of this condition and the archive timeframe defined as part of the data area definitional information.

Net change will record a record to the history table only when the business data content changed. This results in far fewer records being recorded to the data area history table, as compared to a refresh type data source. This mode, though, does not provide an easy means to determine and measure all data at all points in time, although it is possible to do. History from net change is primarily used for diagnostic purposes to explain why something reported results at one point in time but not at another.
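A simplified SQL view of the change indicator assignment described above, for a hypothetical MAT response data area, is sketched below; the stage, active, and load table names and the BUS_DATA_KEY and SRC_DELETED_TS columns are illustrative only and are not the actual Sunrise objects.

    -- Additions: staged source key not present in the active table
    UPDATE SRC_MAT_RESP_STG s
       SET CHG_IND = 'A'
     WHERE NOT EXISTS (SELECT 1 FROM SRC_MAT_RESP_ACT a
                        WHERE a.SRC_KEY = s.SRC_KEY);

    -- Updates: source key present in both, business data key differs
    UPDATE SRC_MAT_RESP_STG s
       SET CHG_IND = 'U'
     WHERE EXISTS (SELECT 1 FROM SRC_MAT_RESP_ACT a
                    WHERE a.SRC_KEY = s.SRC_KEY
                      AND a.BUS_DATA_KEY <> s.BUS_DATA_KEY);

    -- Deletes (complete content acquired): active record not staged this run
    INSERT INTO SRC_MAT_RESP_LOAD (SRC_KEY, CHG_IND, SRC_DELETED_TS)
    SELECT a.SRC_KEY, 'D', CURRENT TIMESTAMP
      FROM SRC_MAT_RESP_ACT a
     WHERE NOT EXISTS (SELECT 1 FROM SRC_MAT_RESP_STG s
                        WHERE s.SRC_KEY = a.SRC_KEY);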

Synchronization Mode
Over time source applications will apply manual updates to records due to some issue which was discovered with some application functionality, without updating the last update timestamp (if available). In addition, time based issues with GMT versus local time and daylight savings time will sometimes cause a net change process to miss some records which were changed, added or deleted. To accommodate all the issues that are otherwise not easily identified, any data area which operates as net change has the ability to acquire a sync set of data. The load process, at the delete step, will perform a synchronization check between the stage content and the active content. It knows the data area is in sync mode because all records have a change indicator of S. The S will be changed to a D when it is determined that a record is in active but not load and it is within the archive time limits of the data area. These records will then be used in the normal net change process to determine any missing records (additions), any changes (updates), along with any deletes.

Mode Changes Over Time


As business processes evolve and the information required to support them changes, there will be a need for a source's data area to change from complete refresh to net change or from net change to complete refresh. For Sunrise this is accomplished by changing that source data area definition record from one mode to the other. When this kind of change is made, the existing data area record is expired and a new record is added with the mode required going forward. This allows appropriate analysis techniques to be used on the historical content as well as on the content from the change point forward.

Critical Element Change Tracking


Both the complete refresh and net change modes provide the ability to establish critical element change indicators. These elements are identified in the data area element control table. The warehouse load operation will append the change indicators during the same operations which append common marketing reference data dimensions. Using the sales opportunity record as an example, critical change elements could include:
Revenue change (increased or decreased)
Client representative changed
Forecast date changed
Products/services involved changed
The above is a very short list of the more than 100 elements marketing currently uses when integrating marketing content with sales opportunities to understand how marketing activities support sales efforts.


Warehouse Data Management Areas


The Sunrise data warehouse compartmentalizes data and functions to optimize both
performance and recoverability. This section covers the various compartments and
the roles each provides to the overall end to end Sunrise warehouse management
process.

Control Area
In order to understand how well an acquire source process is executing, as well as to understand historical acquire sessions, two primary control tables are used. The first is used to store the status of each source data area being acquired, indicating success for each data area acquired by updating that area's value from a zero to a one. This table is also used to store acquire validation results which correlate content using prior record content as well as acquired data area content, comparing that content to another acquired data area's content. A diagram depicting the run control tables for MAT is below.

The run status table has only one row for each and every executing run. Once
an acquire process completes the run status record along with the data area
status records are inserted into their respective history tables.


The second control table stores start and completion timestamps as well as
counts for each data area acquired. The diagram for the current run and the
historical data area results is below.

The run date and run sequence columns are used to join the data area status
details to each run occurrence.

Control Pre-Acquire Valid Run Check


Before a new acquire process executes it must verify that the last run
successfully completed. This is done by the process checking that the current
record in the source run status table exists in the source run status history
table and there are no status columns where the value is a zero. If there is a
zero value present in the run status the process should have failed and never
written any record into the history tables since a recovery run is required. If
this is the first time a data source or data area is acquired this check must
not raise an error or alert.
So long as the prior run successfully completed, the next acquire will clear
the run status table and create a new record for this run. If the prior run failed
the DataStage process, when restarted, must determine which data areas still need to be acquired and only acquire those. The existing run status record is used to record the results of this recovery acquire run. For each data area which previously failed, a new detail record is created so that the number of attempts and time involved is easily seen, since the same data area name will exist multiple times for a single run data acquisition.
If and when a prior MAT acquire was cancelled, identified when there is no record in the run status table and the last history record contains status values of zero, the acquire process is to start as if the last run was successful. When this occurs, and if the prior detail area history records are used in validation, the last successful run's same mode detail area history records must be used.
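A sketch of the pre-acquire validity check is shown below; the control table and column names are placeholders, assuming a status value per acquired data area as described above.

    -- Prior run is valid only when its record exists in history with no zero status
    SELECT r.RUN_DT, r.RUN_SEQ
      FROM SRPEC_SRC_RUN_STATUS r
      JOIN SRPEC_SRC_RUN_STATUS_HIST h
        ON h.SRC_CD  = r.SRC_CD
       AND h.RUN_DT  = r.RUN_DT
       AND h.RUN_SEQ = r.RUN_SEQ
     WHERE r.SRC_CD = 'MAT'
       AND r.AREA1_STATUS = 1
       AND r.AREA2_STATUS = 1
       AND r.AREA3_STATUS = 1;
    -- A row returned: clear the run status table and start the new acquire.
    -- No row returned: a recovery run is required, unless this is the first
    -- acquire for the source, which must not raise an error or alert.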


Recording Error and Warning Detail


Sunrise requires data from many sources at differing points in time for differing marketing operations analysis and reporting. As with all systems, issues will occur during day to day operations. In Sunrise all issues will be recorded into a DB2 table (depicted below). Any entries recorded in this table for a process will be emailed to Sunrise operations for information and resolution as required.

Depending on the severity or the number of identified issues the workflow


execution may immediately send an alert to Sunrise operations to deal with
the issue or may wait until either (1) more issues occur or (2) the completion
of the process execution.

Staging Area
When source data is acquired to support Sunrise business data requirements, each source data area (e.g., a DB2 table from a source) will be inserted into a stage table. The acquire process will deliver all data from a source to that source's data area staging table. The following information provides context and rules for acquiring data into the warehouse.
1. Each stage table, no matter the source, will have a column with the source system timestamp when the acquire process started.
2. The source query may include one or more tables/documents but the target will always be a single data area warehouse stage table.
3. Warehouse data management processes will not start until after all data areas have been successfully acquired from the source.
4. When the acquire process starts it will record an entry for the data area into the data area detail control table along with the start timestamp from the Sunrise warehouse.
5. Once a data area has been successfully acquired, the completion timestamp from the Sunrise warehouse will be set in the data area detail control area.

Load Area
Once all the source data areas have been successfully acquired, the warehouse data management processes begin. The first of these processes involves loading the data from the acquired source. This load process can involve any of the following capabilities.
1. Appending of marketing or business data standards reference elements.
2. If the data area is a net change area, the load operation will determine if each record is a new record (A addition), a change to an existing record (U update), or a deleted record (D delete). There are two methods available for determining data area deletions depending on the data area content acquired from the source and how the source manages data over time. If, for example, the source removes older transaction records which are no longer needed for that application's business processes, the Sunrise warehouse must not remove those records deleted due to age. The only removals must be due to source application data issues. These are covered in the Net Change section.
3. If the data area is a complete refresh data area, every time the data area is acquired the complete snapshot of that data is managed into the warehouse. The EIW data source detail table is an example of a complete refresh data area.
All data areas for a source can be load processed in parallel through the remaining warehouse processes.

Active Area
Once a source's data area has completed the load process, that data area's content will be immediately applied to that data area's active table. Depending on the data area type, one of the following two sets of operations will occur. For net change data areas the following processes execute.
1. For record deletions the active area process will insert every deleted record from the active table into the delete table, including the deletion timestamp. Once the deleted records are in the deletion table they are deleted from the active table.
2. For record updates the current active area record will be inserted into that area's history table. Once the updated records are in the history table they are deleted from the active table, since the updated record in the load table will be inserted in its place.
3. For record additions the added record from the load table will be inserted into the active data area table.
For data areas which operate as complete refresh data areas the following process occurs. These steps preserve the record content at each point in time to easily support in depth over time analysis.
1. The data area's active table record set is cleared.
2. The complete set of data in that data area's load table is inserted into that data area's active table.
3. The complete set of data in that data area's active table is inserted into that data area's history table.
Type R data areas will always have the complete set of records in the history database area.
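The net change apply can be pictured with the following SQL sketch; the table names (SRC_MAT_RESP_LOAD, _ACT, _HIST, _DEL) and columns are examples only and assume the active, history, and delete tables share the same layout.

    -- Deletions: preserve the active image in the delete table, then remove it
    INSERT INTO SRC_MAT_RESP_DEL (SRC_KEY, BUS_DATA_KEY, SRC_DELETED_TS)
    SELECT a.SRC_KEY, a.BUS_DATA_KEY, CURRENT TIMESTAMP
      FROM SRC_MAT_RESP_ACT a
     WHERE EXISTS (SELECT 1 FROM SRC_MAT_RESP_LOAD l
                    WHERE l.SRC_KEY = a.SRC_KEY AND l.CHG_IND = 'D');
    DELETE FROM SRC_MAT_RESP_ACT a
     WHERE EXISTS (SELECT 1 FROM SRC_MAT_RESP_LOAD l
                    WHERE l.SRC_KEY = a.SRC_KEY AND l.CHG_IND = 'D');

    -- Updates: move the prior active image to history before replacing it
    INSERT INTO SRC_MAT_RESP_HIST (SRC_KEY, BUS_DATA_KEY)
    SELECT a.SRC_KEY, a.BUS_DATA_KEY
      FROM SRC_MAT_RESP_ACT a
     WHERE EXISTS (SELECT 1 FROM SRC_MAT_RESP_LOAD l
                    WHERE l.SRC_KEY = a.SRC_KEY AND l.CHG_IND = 'U');
    DELETE FROM SRC_MAT_RESP_ACT a
     WHERE EXISTS (SELECT 1 FROM SRC_MAT_RESP_LOAD l
                    WHERE l.SRC_KEY = a.SRC_KEY AND l.CHG_IND = 'U');

    -- Additions and replacement images come from the load table
    INSERT INTO SRC_MAT_RESP_ACT (SRC_KEY, BUS_DATA_KEY)
    SELECT l.SRC_KEY, l.BUS_DATA_KEY
      FROM SRC_MAT_RESP_LOAD l
     WHERE l.CHG_IND IN ('A', 'U');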

Historical Area
As described in the process outline in the Active Area section above, the historical area will contain older records. The older records could have been identified by the warehouse net change load process, or will be every record when the data area is a complete refresh area.

Delete Area
As described in the process outline in the Active Area section above, the delete area will contain records which were physically removed from the source due to an application or user error. Records that are archived or aged out in the source will be maintained in the active warehouse area.

Warehouse Data Area Name Prefix


In order to easily understand which tables support which sources, build processes, and delivery content, the warehouse uses a combination prefix structure. The first prefix for all data areas is one of:
1. SRC: Identifies that the data area is sourced from an external application.
2. BLD: Identifies that a data area is created within the warehouse using source content along with appropriate build content to derive data required for reporting and/or analytics.
3. TGT: Identifies that a data area contains data which will be delivered to one or more downstream applications which require Sunrise data.
The second prefix, following the one described above, identifies one of the following:
1. Source Designation: Each source is assigned a three character acronym which is used by DataStage and WDM routines to acquire data and manage acquired content in the warehouse.
2. Build Designation: Each build process is assigned a three character acronym which is used by execution control processes to integrate content for subsequent build, analyst or reporting use.
3. Target Designation: Each target application, like each source application, is assigned a three character code.
Examples of prefix usage include:
SRC_MAT: Source data area from the MAT application
BLD_MKO: Build process for marketing opportunity

Warehouse Data Area Flow


Using the warehouse areas described above, the diagram below depicts how acquired data staged into the warehouse flows from area to area each and every time that source's data is acquired.


The diagram above covers data areas that operate in net change mode as well as those which are refresh types. One rule to remember is that a data area operating in complete refresh mode will never have any records in the delete area, since all of that data area's records are point in time and stored in history.

Dimension
Although dimensions are sourced from business data standard repositories or marketing specific standard repositories, there is additional work performed on them by the warehouse. Since marketing requires two views of data to be available at all times, with a near real time presentation to the user, dimensions have an additional element appended by the warehouse. This element, known as the current dimension key for each dimension value, is the second column of each dimension and is maintained by warehouse processes.
To illustrate this element's use, let's take the example where the System/34 brand was replaced by the System/36 brand, which was replaced by the AS/400 brand, which was later replaced by the iSeries brand. While each dimension record maintains linkage to the value which replaced or superseded it, providing marketing a current view of all System/34 through iSeries brands would require having a normal dimension join to itself four times to get all records aligned to iSeries. With the warehouse maintaining the current key value there is only one join, and all the historical records are rolled up into the current brand's view; iSeries in this example.
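With the current key in place, the realignment requires only the single extra join sketched below; the dimension and fact table names are illustrative, not the actual Sunrise objects.

    -- One join brings System/34 through AS/400 history to the current iSeries view
    SELECT cur.BRAND_LONG_NM,
           SUM(f.VALIDATED_PIPELINE_REV) AS PIPELINE_REV
      FROM BLD_MKO_PIPELINE_FACT f
      JOIN BLD_BRAND_DIM hist ON hist.BRAND_KEY = f.BRAND_KEY
      JOIN BLD_BRAND_DIM cur  ON cur.BRAND_KEY  = hist.CURR_KEY
     GROUP BY cur.BRAND_LONG_NM;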
Below is a screen capture that depicts the EDGE brand hierarchy structure.

Sales opportunities can roll up from the product category level of the brand
hierarchy, level four, all the way up to the first level of that hierarchy (e.g., SWG,
IGS) allowing reporting to easily sum up value to a higher level in the brand
organization. Also shown in the dimension above are additional elements for
grouping dimension values. This is often referred to as creating an alternate
hierarchy. Marketing uses two of these grouping elements in the EDGE brand to
regroup content based on how marketing needs to either analyze and/or report
brand.


DB2 Data Implementation


All Sunrise data is staged and managed in the Sunrise warehouse. Depending on the ADR or data mart requirements, those environments may also be supported by DB2 or by Netezza. The following definition standards will be used for all DB2 based Sunrise data environments.
Tablespace Definitions
DB2 used to have a limit on the number of records it will place on a page. Page sizes can be defined from as small as 4K (4096 bytes) up to 32K. If the table being stored has only a few rows with a few narrow columns, using a 32K page used to waste upwards of 80% of the memory allocated to the page. To optimize the use of page memory, all 32K tablespaces are defined as large, removing the record per page limit. Sunrise standardized on defining both a 4K tablespace and a 32K tablespace in which data will be stored, aligning them to the source, build, or target delivery process the tablespace will support. In addition to data tablespaces, Sunrise has standardized on the creation of an index tablespace for each source or build data area. This will allow flexibility when tuning the database for performance by allowing indexes to be placed on a separate file system so index access will not be in competition with data being accessed from tables. Examples of tablespace names are below:
SD132KEIW: Source Data 1st Tablespace 32K Page Size for EIW
SI132KMPW: Source Index 1st Tablespace 32K Page Size for MPW
BD1132KMKO: Build Data 1st Tablespace 32K Page Size for Mktg Opportunity
In addition to source, build, and target delivery specific tablespaces, a 32K data and index tablespace is established for any and all persisted temporary tables. Temporary tables are separated out since their content is rebuilt each run and therefore they do not need to be part of the source, build, or target delivery backup (see the Tablespace Backup section within the SDC section).
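Illustrative DDL following these standards is shown below; the storage options and the sample table are placeholders, and a 32K bufferpool named BP32K is assumed (the name itself is a placeholder, see the Bufferpool Definition Standards section).

    CREATE LARGE TABLESPACE SD132KEIW
           PAGESIZE 32K MANAGED BY AUTOMATIC STORAGE BUFFERPOOL BP32K;
    CREATE LARGE TABLESPACE SI132KEIW
           PAGESIZE 32K MANAGED BY AUTOMATIC STORAGE BUFFERPOOL BP32K;

    -- A table is then placed in its data tablespace with indexes separated out
    CREATE TABLE SUNRISE.SRC_EIW_OPP_ACT
           (ETL_KEY     BIGINT       NOT NULL,
            OPP_NUM     VARCHAR(32)  NOT NULL,
            EXTRACT_TS  TIMESTAMP    NOT NULL)
           IN SD132KEIW INDEX IN SI132KEIW;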

Table Compression


Starting with DB2 version eight, column and table level compression has been available. This facility allows for the storage of more data in a smaller space. Performance with compressed tables tends to be as good as with uncompressed tables, and in some cases it is better because more data is brought in for processing with every I/O from the storage subsystem. Sunrise has standardized table definitions to be compressed by default.
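As a concrete example of enabling compression on a table that already exists, the statements below use a placeholder table name; the REORG with RESETDICTIONARY rebuilds the compression dictionary for the existing rows.

    ALTER TABLE SUNRISE.SRC_EIW_OPP_ACT COMPRESS YES;
    -- REORG is a command, run from the DB2 command line or via SYSPROC.ADMIN_CMD
    REORG TABLE SUNRISE.SRC_EIW_OPP_ACT RESETDICTIONARY;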

Index Creation
Indexes can provide high performance paths to access data in a DB2 database. The downside with indexes, depending on how they are defined, includes:
Excessive storage use when too many elements are included
Never being selected to access data when too few values exist for the index to be useful
To that end the Sunrise index standard requires the following:
1. A unique index must exist within or outside every defined table to ensure record uniqueness.
2. Indexes that contain only a single column must only be used to join to a huge table (a table with more than one million rows).
3. Composite indexes can be set up to support build process specific access to data to optimize the access path for that process. When an index that contains most of the required content already exists, that index must be altered to add the missing column rather than creating a new index which is 90% duplicative of another index.
4. Columns which do not have many unique values but provide improved access performance must be included as elements on a unique index for the table. For example, a column which contains the values Y and N will never be used as an index on its own, but when included on a unique index the DB2 optimizer will use the index rather than reading every row in the table.
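An example index following these rules is sketched below; the table, column, and index names are placeholders. Here the low cardinality indicator is carried as an INCLUDE column on the unique index, which is one way to satisfy the last rule without widening the unique key.

    CREATE UNIQUE INDEX SUNRISE.XEIWOPP01
        ON SUNRISE.SRC_EIW_OPP_ACT (OPP_NUM, SRC_SYS_CD)
           INCLUDE (SRC_DELETED);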

Tablespace Definition Standards

In Sunrise every source will have its own set of tablespaces. This will allow the source content, once acquired and processed through the warehouse, to be backed up by the SDC while still supporting select access to that source's warehouse tables. In addition to source specific tablespaces there are build specific spaces. These are defined to support data mart or analyst required content that will be copied from the warehouse to the appropriate target mart.
The following naming and size standards are applied to all Sunrise data tablespaces:
The first letter of the tablespace name indicates if the tablespace supports a S (source) or a B (build) process.
The second letter of the tablespace name indicates if the tablespace is used for D (data) or I (indexes).
The third letter of the tablespace name identifies the tablespace sequence. Multiple sequences may be required for extremely large data sources.
The fourth through sixth characters of the tablespace name identify the size of the pages defined for the tablespace.
The remaining characters of the tablespace name identify the source or the build process the tablespace supports.
The maximum length of a tablespace name is 18 characters.

Bufferpool Definition Standards


The Sunrise p770 infrastructure was sized with a large amount of real memory (1.5 terabytes) which allows for a great deal of in memory DB2 work. This provides a high performance environment to satisfy most report and even analyst requests. The initial Sunrise environment was configured with only two primary bufferpools for Sunrise data. These are:
A 32K page size buffer pool and
A 4K page size buffer pool.
Over time, as performance tuning needs dictate, additional bufferpools will be established for a source, set of sources, build process, or even an individual tablespace to ensure the shortest process execution and query run times for all Sunrise data.

Reorganizing Table and Index


As data changes over time with the additions of new records, updates to and
deletions of existing records, DB2 tables and indexes become fragmented.
Over time this fragmentation causes performance to degrade. DB2 provides a
utility to reorganize the data to recover the fragments and align data for
optimal performance. For Sunrise, unless otherwise required by a specific
source or build management requirement, the following reorganization will be
performed:
1. On a weekly basis an index reorganization will be run on all source and build
tables. This will occur every Tuesday beginning at midnight US eastern time
for all indexes. The workflow management will ensure that if a source acquire
or build process is running, the index reorganization for that source's or
build's tables will be deferred until successful completion of the acquire or
build process. Since most index reorganizations run in a matter of a few
moments there is little impact to production schedules.
2. On a monthly basis, on the first Tuesday of every month at midnight US
eastern time, all tables will have the reorganization utility run. As with the
index reorganization, the workflow management process will ensure that if a
source acquire or build process is running, the reorganization for that source's
or build's tables will be deferred until successful completion of the acquire or
build process.
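These reorganizations can be driven from SQL through the ADMIN_CMD procedure, which is convenient when the workflow manager issues database calls. The table name below is a hypothetical stand-in; the statements illustrate the weekly index reorganization and the monthly table reorganization described above.

    -- Weekly: reorganize all indexes on a table while it remains writable
    CALL SYSPROC.ADMIN_CMD('REORG INDEXES ALL FOR TABLE SRC.OPPORTUNITY ALLOW WRITE ACCESS');

    -- Monthly: reorganize the table itself
    CALL SYSPROC.ADMIN_CMD('REORG TABLE SRC.OPPORTUNITY');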


Workflow and Control


The following section covers how the Sunrise process will manage and control data
from acquisition, through build and ultimately to delivery.

Source to Delivery High Level Flow


The following diagram provides an overview of data being acquired, processed, and
made available for dashboard and reporting use. This area of the document will cover
various aspects of the processes involved in this end-to-end flow.

Process Area Review


Although this document previously mentioned the Sunrise processing areas, it is
important to have that definition in mind when reviewing this section. As of this
document version, the following are the defined process areas in Sunrise:
1. Source acquire (SRC)
2. Build (BLD)
3. Delivery (TGT)
4. Process Execution Control (PEC)
When reviewing any data within Sunrise, the process area which is primarily
responsible for the data content can be identified based on the data area's table
name (e.g., content from EIW will have a process prefix of SRC).

Metadata
One of the critical requirements for Sunrise was to be able to adapt to any
business requirement as business needs change, in minutes where possible and
always within two weeks. In order to achieve this requirement, standard processes and
objects were created which utilize metadata to define what is required to
operationally manage Sunrise data. Each process area has a corresponding set of
metadata that execution will use to manage data. The following identifies the
metadata for each of the process areas:

SRSRC Sunrise Source Acquire Metadata

SRBLD Sunrise Build Metadata

SRTGT Sunrise Target/Delivery Metadata

SRPEC Sunrise Execution Control

The metadata change process involves updates to the master set of metadata
which, when applied to the Sunrise environment, versions the existing metadata to
a history table. This enables recovery in the event of any issues, and also tracks
updates over time as business needs change. Below is an
ERD depicting the source metadata tables.
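A minimal sketch of the versioning step is shown below. The table names (SRSRC.SOURCE_META and its history counterpart) and the extra version timestamp column are assumptions for illustration; the actual Sunrise metadata tables are those shown in the ERD.

    -- Before applying updated master metadata, copy the current rows to history
    -- so the prior definition can be recovered and changes tracked over time.
    -- (Assumes the history table carries one additional version timestamp column.)
    INSERT INTO SRSRC.SOURCE_META_HIST
        SELECT m.*, CURRENT TIMESTAMP
        FROM SRSRC.SOURCE_META m;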


The diagram shows the elements involved in defining each source, the data areas from
the source, the alert contacts to notify when issues arise at the source or data area
level, and the elements in each data area along with partitioning information used
when data is acquired and the recording of errors/warnings when issues arise.

Acquired Source Workflow & Control


Source acquisition, if initiated by a Sunrise process, starts based on a time or event
based trigger. Sources which deliver data to Sunrise themselves are also identified in the
metadata even though Sunrise does nothing to acquire the data. This provides a
single place to understand source content. For data that is acquired by DataStage,
the process to get the data begins at the time identified in the metadata. So, for
example, EIW data is acquired weekly at 2am US eastern. At that time DataStage
reads source metadata from the Sunrise warehouse, attempts to connect to the
source, and, provided all is well, starts the data acquire after recording start times by
data area. Once each data area acquire completes, DataStage records the
timestamp of that completion and validates the content and/or record count based
on defined rules. If there are no issues the source acquire is set to a Success
status. Below is a more detailed flow DataStage follows in acquiring data.
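A sketch of the status recording is shown below; the status table and column names are hypothetical stand-ins for the actual acquire status tables described later in the audit section, and the row count shown is a placeholder.

    -- Record the start of a data area acquire
    INSERT INTO SRSRC.ACQ_DATA_AREA_STATUS
        (SRC_CD, DATA_AREA_CD, RUN_STATUS, START_TS)
    VALUES ('EIW', 'OPPORTUNITY_DTL', 'RUNNING', CURRENT TIMESTAMP);

    -- On completion, record the end time and row count and mark success
    UPDATE SRSRC.ACQ_DATA_AREA_STATUS
       SET RUN_STATUS = 'SUCCESS',
           END_TS     = CURRENT TIMESTAMP,
           ROW_COUNT  = 1250000          -- placeholder value for illustration
     WHERE SRC_CD = 'EIW'
       AND DATA_AREA_CD = 'OPPORTUNITY_DTL';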

Acquired Source Flow through the Warehouse


Once source data is acquired, either by DataStage or some other acquire process,
that source's status record is set to Success. With that, warehouse objects are
invoked to manage the newly acquired content into the warehouse. The following
are the primary objects used in managing data into the post-STG database areas
of the warehouse:
1. Data Area Copy (DAC) - Copies data from one database area to another.
2. Business Data Key (BDK) - Sets the key(s) used to determine if business data
content changed for any record.
3. Active History Delete (AHD) - Shifts data between load, active, delete and
history database areas depending on how that data is defined to be
managed.
4. Reference Data Key (RDK) - Resolves data elements to Sunrise business and
reference data dimensions.
5. Active History Reference (AHR) - Manages reference data dimensions
including versioning based on metadata definitions.
6. Rollback (RBK) - Used to reset warehouse data to some previous point in
time due to a Sunrise error or bad content being acquired from or delivered
by a source.
The diagram below shows how staged data moves through the warehouse with the
end result in the active, history and delete database areas.
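To illustrate the sequencing only: the procedure names above are as described in this document, but the schema and parameter lists in the sketch below are assumptions, not actual signatures. A typical post-acquire flow for one data area might be invoked roughly as follows.

    -- Hypothetical schema and parameters; the real procedures are metadata driven.
    CALL SRWH.DAC('EIW', 'OPPORTUNITY_DTL');  -- copy staged data to the load area
    CALL SRWH.BDK('EIW', 'OPPORTUNITY_DTL');  -- set business data keys for change detection
    CALL SRWH.AHD('EIW', 'OPPORTUNITY_DTL');  -- shift rows across load/active/delete/history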


Source Control Content


Earlier in this document was a view of the source metadata tables. Records in those
tables are used by acquire and warehouse processes to determine how to manage
data. So, for example, AHD inserts all the EIW detail weekly fact records into both the
active and the history table, whereas the same AHD inserts only add/update records
for MAT's contact response after it has moved the active version of the updated records
to the history table.


Build Workflow & Control


As with source definitions, build definitions also utilize metadata. As part of the
Sunrise evolution, over time the older, legacy-based build processes are being
migrated to build and process execution control metadata.

Build processes can be triggered based on time, an event, or manually. Most of the
older processes were manually initiated and, as more Sunrise stories and
functionality are enabled, are being transitioned to event or time based builds. A build
process, at its core, is a series of commands which are executed in some order
(parallelized if possible) to manage some data for end consumption. Each one tends
to be specific to the end goal of the build it is designed for. For example, the build
for the Sunrise MSM dashboard calculates metrics required for that marketing role
but not for the demand programs role.
Each Sunrise user story may require a change to an existing build or have an
entirely new build created to achieve that story's requirements. In all cases the
build is dependent on current source data content being available and, in many
cases, current data available from another build process. For example, the
GMMP/MARS fact generation build process requires that the EIW build process has
completed. That build in turn requires that the EIW acquire process successfully completed
acquisition of 11 EIW data areas. The chart below provides a high level view of the
weekly build process.


The design of each build has its own workflow to optimize the time required to
process the build. In many cases the build workflow is parallelized to help reduce
the time to have data available. Sunrise, as a general rule, has a requirement that
data for the weekly management system be available in the dashboard by noon US eastern.
Over time the build workflows will be integrated with process execution control to
facilitate better management across all Sunrise processes.

Build Flow through the Warehouse


To provide a useful scenario covering how build content flows through
the warehouse and on to consumption, the marketing activity build is depicted as
an example. Marketing activity requires data from multiple sources (e.g., MAT, EIW,
CRM, etc.) in order to create an integrated picture of marketing activity and its
effect on IBM's sales pipeline. A high level view of the build process is shown below:


The build begins after the EIW, MAT, and CRM data has been acquired, processed
into the warehouse, and had each of their respective builds completed. The build
then:
1. Integrates the MAT and CRM response data, as well as contact addressed data, so
that no duplication of response or contact address exists.
2. Aligns the response tactics with the master set of tactics generated by the tactic
simplification integration build, including identifying when auto-deploy
tactics are required for responses from countries which were not planned as
part of the tactic.
3. Aligns the response contact with opportunity contacts using creation and
deletion dates to determine the type of marketing influence for an
opportunity.
4. Based on the type of source which created an opportunity record, along with
associated marketing tactics, creates a response if one does not exist (e.g., an
LDR created an opportunity but no response exists for the tactic the LDR
associated with the opportunity).


Target Delivery Workflow & Control


This section describes how Sunrise warehouse content is provided to the required
consumption point. That consumption point may be a Sunrise data mart, the ADR,
or a downstream system.

Delivery Flow within Sunrise Infrastructure


When the necessary warehouse processes have completed their work and the results of
that work are required in another Sunrise database, the content is copied across to
the target environment via DB2 federation. The following diagram shows the high
level steps used at the target environment.

Once the data has been applied to the target environment the completion is recorded.
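A minimal sketch of the federated copy follows; the server, nickname, and table names are illustrative assumptions. The warehouse table is exposed at the target database as a nickname and the content is then copied locally with plain SQL.

    -- At the target (data mart) database: define a nickname over the warehouse table
    -- (assumes a federated server named SRWHSE_SVR has already been catalogued)
    CREATE NICKNAME SRDM.WH_FACT_MKTG FOR SRWHSE_SVR.BLD.FACT_MKTG;

    -- Copy the build output across the federation link into the local mart table
    INSERT INTO SRDM.FACT_MKTG
        SELECT * FROM SRDM.WH_FACT_MKTG;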

Delivery Flow outside the Sunrise Infrastructure


When Sunrise delivers data to a downstream application the standard delivery
mechanism is DataStage, provided the end application uses some version of DB2
(e.g., DB2 on pSeries, iSeries, etc.). Once data is in the form to deliver to the
downstream application, DataStage is invoked and, using the target delivery
metadata, determines what data areas to deliver and whether that delivery is a
net change, refresh or sync delivery. This process is very similar to the process used
by DataStage when acquiring data for Sunrise.
To help with understanding the process for data delivery, the high level flow for MDb
delivery is included below.

The content required for MDb users includes output from two build processes. Once
those processes have completed the DataStage delivery process is invoked.

Delivery Control Content


Using metadata which, like the source acquire side, identifies what data areas and
content are to be delivered, DataStage connects and inserts the appropriate records
into each of the three MDb environments. As with source acquire, the processes
record the number of records and the time required to deliver content. This is used to
understand any delivery timeframes which exceed business requirements, as well as
to validate that all content which was to be delivered actually was delivered.


Sunrise Audit Areas


This section of the document covers how Sunrise processes record the tasks they
perform, how long the tasks took, and the final results from the task, including
beginning and ending record counts where appropriate.

Acquire, Build and Delivery Audit


Each area in Sunrise, along with each database procedure, records activity into status
and transaction log tables. Earlier in this document there was a discussion of how
DataStage records what it has acquired and validated using acquire status tables
and how, once the acquire is completed, that information is moved to the
corresponding history table.
To illustrate the point, the acquire data area status tables are shown below. The
status table will record each data area which is to be acquired, the status of that
area, the time the acquire started and the time it completed. Additionally the
process will determine the last successful acquire's record count for the data area, as
well as record this acquire's record count, for determining deviations from norms.

After an acquire completes all records are shifted to the history table shown on the
right for over-time operations and data source analysis.
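A sketch of what such a status table might look like follows; the exact Sunrise table and column names differ, so treat this as illustrative only.

    CREATE TABLE SRSRC.ACQ_DATA_AREA_STATUS (
        SRC_CD            VARCHAR(10) NOT NULL,   -- source acronym (e.g., EIW)
        DATA_AREA_CD      VARCHAR(30) NOT NULL,   -- data area being acquired
        RUN_STATUS        VARCHAR(10) NOT NULL,   -- RUNNING / SUCCESS / FAILED
        START_TS          TIMESTAMP,              -- acquire start
        END_TS            TIMESTAMP,              -- acquire completion
        ROW_COUNT         BIGINT,                 -- rows acquired this run
        LAST_GOOD_ROW_CNT BIGINT                  -- prior successful count, for deviation checks
    );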

Warehouse Procedure Audit


Every warehouse procedure has a corresponding log table where it records the
activity it has performed. The information contained in the table can be seen in the
table layout shown below.
This particular process status table records activity performed by the DAC
procedure. Since a procedure can be called in parallel by multiple processes, each
invocation records the calling process, its processing sequence, the from and to
data areas, the beginning and ending timestamps, and the record counts.
Additionally any errors or warnings that occur will set the status and
populate the error description field, supporting operations diagnostics.
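An illustrative diagnostic query against such a log table is shown below; the table and column names are assumptions chosen to match the fields described above, not the actual Sunrise definitions.

    -- Find DAC invocations that ended in error or warning during the last day
    SELECT CALLING_PROCESS, PROCESS_SEQ, FROM_DATA_AREA, TO_DATA_AREA,
           BEGIN_TS, END_TS, ROW_COUNT, RUN_STATUS, ERROR_DESC
    FROM SRPEC.DAC_PROCESS_LOG
    WHERE RUN_STATUS IN ('ERROR', 'WARNING')
      AND BEGIN_TS > CURRENT TIMESTAMP - 1 DAY
    ORDER BY BEGIN_TS DESC;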


Sunrise Infrastructure Support and Setup -- Database


DB2 Database Environment
This section covers the hardware and software environments which the Sunrise
project is deployed with. In addition there are references to the Netezza warehouse
appliance which will be used in conjunction with the Sunrise data warehouse to
support high speed query access for reporting and analysis.

System Environment
Hardware

p770 machine with 84 processors and 1.5 terabytes of physical memory, of which the marketing reporting database infrastructure is allocated 36 virtual processors and 192GB of physical memory. Sunrise can consume 9 real processors and 192GB of real memory.
DS8300 storage with 25 terabytes of disk available for marketing data.

Software

AIX v6.1

DB2 v10.1

Storage
In order to enable increased utilization of CPU and memory, given the
software environment Sunrise operates within, the original storage approach
was reworked in 2013 to allow multiple concurrent parallel I/O operations
on multiple adapters, with multiple buffers, for the multiple file systems which
contain Sunrise databases. The diagram below shows the end result of the
storage alteration, which enabled Sunrise to meet its by-noon business
requirement.

Database Instances and Database Name


1. Sunrise Warehouse (Inst5 - SRWHSE)
2. Sunrise Reporting DataMart (Inst6 - SRDM)
3. Sunrise Analyst Data Repository (Inst5 - MARSRAW)
4. GMMP/MARS Reporting DataMart (Inst1 - UPDB)
5. GMMP/MARS Load Data Repository (Inst3 - LPDB) - Sunset April 2015
6. GMMP/MARS Extract Transform Load (Inst2 - PETL)

Security Enablement
Data Access Groups
The following table defines the groups established for controlling access to
each of the database environments as well as content within each database
environment.
Sunrise Warehouse


The user ID which operations uses to manage the warehouse content
will have select, update, delete, and other privileges necessary to
operate data processes in support of Sunrise.
Any interface user ID, such as the DataStage ID used to acquire data
for Sunrise (e.g., srds), will have connect privilege to the warehouse
along with select privilege on all of its source tables in the warehouse.
It will also have insert, update and delete on source staging tables. The
ID will also have insert and update authority on source and process
control tables (an illustrative set of GRANT statements is sketched below).
For each source application there is a group created for subject matter
experts (SMEs) from that application to validate acquired content. For
each source at least one SME is identified and that individual's Sunrise
user ID will be added to that source's group. The group allows connect
to the warehouse database and select access to the source tables.
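The sketch below illustrates the privilege pattern for the interface ID; the table names are placeholders, while srds is the DataStage interface ID mentioned above.

    -- Connect to the warehouse and read an acquired source table
    GRANT CONNECT ON DATABASE TO USER SRDS;
    GRANT SELECT ON TABLE SRC.OPPORTUNITY TO USER SRDS;

    -- Full DML on a source staging table
    GRANT INSERT, UPDATE, DELETE ON TABLE STG.OPPORTUNITY TO USER SRDS;

    -- Insert/update on a source and process control table
    GRANT INSERT, UPDATE ON TABLE SRPEC.ACQ_RUN_STATUS TO USER SRDS;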

Sunrise Reporting DataMart


The user ID which operations uses to manage warehouse and
subsequently built measurement, aggregate or derived
content from the warehouse to the data mart will have select, update,
delete, and other privileges necessary to operate data mart processes
in support of Sunrise.
The Cognos reporting user IDs (e.g., baccprod) will have connect
privilege to the data mart along with select privilege on all of its
dimension, fact/metric, and summary aggregate tables.
The Analyst group will give level two analysts connect privilege to
the data mart along with select privilege on all of its dimension,
fact/metric, and summary aggregate tables in order to support
research requests on report content.

Sunrise Analyst Data Repository


The user ID which operations uses to manage warehouse and
subsequently built measurement, aggregate or derived
content from the warehouse to the analyst data repository will have select, update,
delete, and other privileges necessary to operate analyst data
repository processes in support of Sunrise.
The Cognos business intelligence user ID (e.g., baccbi) will have
connect privilege to the ADR along with select privilege on all of its
dimension, fact/metric, summary aggregate tables and raw table
content.
The Analyst group will give level two analysts connect privilege to
the analyst data repository along with select privilege on all of its raw
source content, dimension, fact/metric, and summary aggregate tables
in order to support research requests as well as determining insight on
marketing performance.

Existing Analyst/Data Quality Groups


1. analyst - Marketing Operations Level 2 Analyst
2. srdq - Data Quality Operations

Existing Source Groups


1. srccrm - CRM Siebel Source Group
2. srceiw - Sales Operations/EIW/EDGE Source Group
3. srcevt - Event Source Group
4. srciwm - IBM Web Membership Source Group
5. srcmat - Marketing Automation Tool Source Group
6. srcmdb - MI Marketing Database Source Group
7. srcmii - Marketing Inbound Interface Source Group
8. srcmp - Marketing Planning Source Group
9. srcnis - NetInsight Source Group
10. srcpdb - Presentation Database Source Group
11. srcscn - Sales Connect Source Group

Existing Operational Groups and Operational User IDs

1. srops - Sunrise Operational Group containing any user ID which can execute production processes.
2. srwhbld - Sunrise warehouse operational user ID
3. srdmbld - Sunrise data mart operational user ID

Forced Group/User Uncommitted Read


One of the more pervasive problems in a DB2 database implementation arises
when users lock tables while they are running queries. DB2,
after a short period of time, will start killing processes to eliminate the locks
so that held-up queries can execute. Since the Sunrise environment is managed
by integrated workflow processes, all connections to the database, other than
interface and operations user connections, are forced to be read only. This
ensures that the queries will not create a lock on one or more tables which
could impact a production data process.
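A minimal illustration of the uncommitted-read behaviour follows; whether UR is enforced at the connection, group, or statement level is an operational detail, and the table name below is a placeholder, so treat this as a sketch only.

    -- Set uncommitted read as the isolation level for the current connection
    SET CURRENT ISOLATION = UR;

    -- Or request it per statement, so the query takes no row locks that
    -- could block a production data process
    SELECT COUNT(*) FROM SRC.OPPORTUNITY WITH UR;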

Database Content Flow and Access


The following diagram provides a high level view of the databases involved in
Sunrise supporting the management system reporting and analyst data
requirements.


Business Data Dictionary


Given the multitude of applications from which Sunrise sources content, IT and
business analysts require an understanding of what content comes from what
sources and the purpose the content serves in supporting the business processes in
the source application. In addition, build processes transform source content for
integration and create new elements necessary for marketing analysis and
reporting. In both cases there are dictionary constructs within the warehouse to
record the following information for each data area:
Business name
Business definition
Business purpose
Business rules
Usage classification
Additional information


In addition to the source and IT maintained data area information, the
source change tracking process also updates the dictionary records every time
there is a change to a data area, recording when the change was captured. For each
data area element the following is to be entered:
Business data name
Business definition
Business purpose
Content example(s)
Business rules
Associated content standard
Additional information
An example of dictionary content is depicted below:
DTL_SSM_STEP_NO:
Business name: Sales cycle code
Definition: Contains a text value to identify various measurement
points for an opportunity's lifecycle.
Purpose: Used to understand where an opportunity is, in the sales
lifecycle, as well as what stages it previously passed through along
with how much time it spent in each stage.
Example(s): 3, 6
Standard: BDS

When a standard element exists in support of Sunrise processes, a single dictionary
master entry can be created for use across data areas. For example, the extract
timestamp element used in the acquire process (EXTR_TS) is defined only once
even though the same element exists for every source acquired data area.
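A sketch of how the dictionary example above could be stored follows; the dictionary table name and columns are assumptions made for illustration, not the actual Sunrise dictionary constructs.

    INSERT INTO SRWH.BUS_DATA_DICTIONARY
        (ELEMENT_NM, BUSINESS_NM, DEFINITION, PURPOSE, EXAMPLE_TXT, STANDARD_CD)
    VALUES
        ('DTL_SSM_STEP_NO',
         'Sales cycle code',
         'Contains a text value to identify various measurement points for an opportunity''s lifecycle.',
         'Used to understand where an opportunity is in the sales lifecycle.',
         '3, 6',
         'BDS');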

SDC Integration
For performance as well as total recoverability it is critical that the DB2 database
environment be managed in concert with Service Delivery Center (SDC)
operations.

Establish Mechanism for Invoking Tablespace Backup After Successful Source Acquire

Once all of a data source's content has been successfully acquired, the
tablespace which contains that source's data is to be immediately backed up
via an online tablespace backup. The workflow process calls the necessary
DB2 function to have the tablespace backed up. While the backup is in
progress the data in the tablespace will remain accessible for reconciliations
as well as subsequent build processes. After the backup the exported
tablespace is compressed.
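A sketch of the backup invocation follows; the tablespace name and backup path are hypothetical, and it assumes the warehouse database is configured to permit online backups.

    -- Online backup of the EIW source data tablespace; the data stays readable
    -- for reconciliation and build processes while the backup runs.
    CALL SYSPROC.ADMIN_CMD(
      'BACKUP DATABASE SRWHSE TABLESPACE (SDA32KEIW) ONLINE TO /backup/srwhse');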

Eliminate Logging on Build Tablespace Tables

The build processes operate by selecting data from acquired source tables and
creating, updating or deleting rows. Since this data is completely recoverable by
simply re-running any affected build process, there is no need for forward
recovery of content. To that end, and with the source tablespace backup
covered earlier in this section, the Sunrise databases operate with circular
logging. As part of individual build steps logging can be turned completely off
to eliminate that performance overhead when the data is 100% recoverable
by re-running that step.
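Within an individual build step the logging overhead can be avoided as sketched below; the table and column names are illustrative. The sketch assumes the build table was created with the NOT LOGGED INITIALLY attribute and that the statements run in a single unit of work (autocommit off); if the step fails, the table is simply rebuilt by re-running the step.

    -- Turn off logging for this unit of work only
    ALTER TABLE BLD.FACT_MKTG ACTIVATE NOT LOGGED INITIALLY;
    -- Hypothetical build transformation populating the fact table
    INSERT INTO BLD.FACT_MKTG (OPP_ID, ACTIVE_IND, LOAD_DT)
        SELECT OPP_ID, ACTIVE_IND, CURRENT DATE
        FROM SRC.OPPORTUNITY;
    COMMIT;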

Netezza Environment
Although DB2 is a highly scalable environment, performance does become an
issue when a table has a huge number of records. IBM acquired a
company, Netezza, with high scale, high volume data management
capabilities delivered through a hybrid hardware and software
implementation. The BACC, as the intelligence and analysis center of
competency, has installed a Netezza appliance to support analytics and reporting.
Sunrise will utilize this environment, in concert with the DB2 environment,
when size and scale dictate its use.


Sunrise Infrastructure Support and Setup -- InfoSphere DataStage

This section covers the BACC DataStage environment which is the IBM worldwide
implementation for extracting, transforming, and loading data.

System Environment
Hardware

zSeries model 10 with:
a. Services/metadata with eight processors and 16GB memory
b. DataStage engine with 18 processors and 36GB memory
c. Local DB2 database environment with eight processors and 8GB memory

Software

SUSE Linux Enterprise Server

IBM HTTP Server

IBM WebSphere Application Server

InfoSphere Server v9.1

MQ Client

DB2 v9.7


The following diagram provides an overall architectural view for BACC DataStage
environments.

The following link will navigate to the BACC home wiki page. The following screen
capture depicts the current DataStage environment.


Sunrise Infrastructure Support and Setup -- Sunrise Reporting

This section provides an overview of the BACC Cognos reporting environment which
supports the MARS management reports along with the Sunrise dashboard.

Overview

System Environment
Hardware

IBM zSeries z10

Software

zLinux


Cognos v10.2.1.5


Sunrise Infrastructure Support and Setup -- Statistics

Environment
Statistical models are required for marketing to understand how well campaign
investments in the recent past and short term future should realize future pipeline
opportunity content for sales. Before Sunrise, prediction was done based on
historical reports and a best guess at what results should occur. After IBM acquired
SPSS, marketing was handed a powerful statistical tool suite capable of
determining, based on varying data elements, how campaign investments will
create pipeline. Initially the marketing operations team started using the SPSS
statistics client tool operating on analysts' desktop machines. As the amount of data
increased and the need for direct integration with management reports grew,
Sunrise integrated with the BACC data services environment to provide SPSS server
based model execution.

SPSS Integration with Sunrise


As with the BACC Cognos and DataStage environments, Sunrise is positioned to
utilize the BACC SPSS server environment to run statistical models for campaign
contribution based on both historical content and analyst provided variables. As of
the time of this writing the marketing operations analysts were using the BACC SPSS
statistics offering for analysis.

System Environment
Hardware

IBM zSeries z10

Software


SPSS v19 fixpak 2

zLinux


Sunrise Infrastructure Support and Setup -- Marketing Targets

For the last six years marketing has established targets which marketing campaign
investment should obtain in support of IBM sales business objectives. Originally the
targets were created in an Excel environment which was loaded to a Hyperion cube
in RTMApex. After that process the marketing business went to only using Excel,
with hundreds of manual effort hours required to consolidate and manage. After IBM
acquired Cognos, which had recently acquired the TM1 application, marketing chose to
set up a TM1 environment where marketing targets would be interactively
managed.

Target Setting Process


The process to create targets for core marketing metrics requires a great deal of
data and analysis as input to the target collection, review, and approval process.
Below is a high level flow that depicts the process steps which directly affect the target
input files provided to TM1 for management there.


Each of the high level process steps above is defined and maintained in the annual marketing targets
implementation guide.

Marketing Target Data Flow


The following diagram depicts the interrelationship between the core cubes and dimensions in capturing
and managing targets. As one can see, the dimensions are critical to all data since the same dimensions, or
at least a level in a hierarchy, must exist across all data cubes. The dark blue arrows signify TM1 data
load processes that initialize the target setting process.


The red line at the bottom signifies two data relationships: first, the initial IMT sector targets are
calculated from the IMT sector spread; second, sub-brand and sector targets for the same
marketing program in the same IMT must have the same value.

Week by Week Skew Ratios


Targets are set for a calendar year based on marketing contribution to the sales
pipeline. Having just an end goal does not provide the means to understand whether
marketing is on track to deliver that goal. Given the variations which occur
throughout a year with marketing contributed opportunities, a set of skews or
factors is created. The analysts setting the marketing targets, using historical
norms as well as their knowledge of products/offerings to occur during the planning
year, will review, adjust and approve week by week skews. The GMMP/MARS reports
will use the skew values to calculate the amount of marketing contribution by
brand, IMT, etc. that needs to be realized in order to meet the annual marketing
target (an illustrative calculation is sketched below).
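As an illustrative calculation only (the table names, column names, and the week key are hypothetical), the contribution expected by a given week is the annual target multiplied by the cumulative skew for that week:

    -- Expected marketing contribution to date = annual target x cumulative weekly skew
    SELECT t.BRAND_CD,
           t.IMT_CD,
           t.ANNUAL_TARGET_AMT * s.CUM_SKEW_PCT AS EXPECTED_CONTRIB_TO_DATE
    FROM TGT.ANNUAL_TARGET t
    JOIN TGT.WEEKLY_SKEW  s
      ON s.BRAND_CD = t.BRAND_CD
     AND s.IMT_CD   = t.IMT_CD
    WHERE s.YEAR_WEEK = '2015W26';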

TM1 Example Cube


The screen capture below provides the reviewer some idea of the content that is
interactively managed in TM1 for establishing the annual marketing targets.

Once the marketing target content is set in TM1 it is exported and loaded into the
Sunrise warehouse for use in the management measurement system.


Sunrise Operational Management


Given the number of data sources, data areas, build processes, and consumption
points, a great deal of operational control is required. To facilitate this,
Sunrise has been enabled with exception and validation processes which inform
operations when something is not correct. This is a shift from the legacy approach
where each and every process wrote to log files which production operations
meticulously reviewed to determine if some error was logged somewhere.

Operational Alerts for Source Issues


Whenever there is a failure in acquiring source content an alert needs to be
emailed to operations. The email includes the data area which failed to
acquire or, if DataStage was unable to connect to the source, the source
acronym with the appropriate error message. As time is of the essence with
Sunrise, operations will immediately address any issues that the operational
processes identify.
Once Sunrise operations have worked with the appropriate groups to resolve the
issue(s), the failed source data areas must be acquired. This should be
achieved by operations starting that source's acquire job; only data areas
not acquired or failed will be extracted. Any data areas that completed
successfully will be skipped in a restart/recovery execution.

Operational Alerts for Sunrise Processing Issues


When Sunrise based build processes are running, as with data sources, any
issues which occur will cause an email alert to go to the production
operations group. There the team will take the information from the email to
query the Sunrise audit areas involved with the issue.
If Sunrise operations can resolve the issue, e.g. a database log filled up, they
will restart the affected process. If not, they will work with the appropriate
groups to resolve the issue(s) and then restart the failed process.


Sunrise Restart/Recovery Operational Process


Any Sunrise process must be enabled to start automatically based on
successful completion of dependent process(es), start based on time and day,
and be manually started when performing restart/recovery. In all cases the
command(s) to execute must be the same. In a recovery or restart the
process will skip all steps which were recorded as successful (completion
indicator of 1) and only execute those which were not successful
(completion indicator of 0) or not completed due to a prior dependent task
failure (blank/null).
If the failure, for instance, was with a source acquire and it is determined that
the failing source data acquisition will have limited impact on Sunrise build
operations, the operations team can manually set a failing data area or
validation value to 1 on the strength of an in-writing confirmation received
from a data source subject matter expert. After this manual update,
restarting the DataStage acquire process will see the failing area as not
failing and continue. Operations will log the issue to be sure the root cause is
addressed before the next source acquire is run (a sketch of this override follows).
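A sketch of the manual override and the restart selection is shown below; the run detail table, columns, and run identifier are hypothetical stand-ins for the actual status data areas.

    -- Operations accepts a failed data area (in-writing SME confirmation received)
    UPDATE SRPEC.RUN_DETAIL_STATUS
       SET COMPLETION_IND = 1
     WHERE RUN_ID = 4711
       AND DATA_AREA_CD = 'CONTACT_RESP';

    -- On restart, the process executes only steps that did not complete successfully
    SELECT STEP_NM, DATA_AREA_CD
    FROM SRPEC.RUN_DETAIL_STATUS
    WHERE RUN_ID = 4711
      AND (COMPLETION_IND = 0 OR COMPLETION_IND IS NULL);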

Entire Run Failure Operational Process


In the event that a run was started and, due to the nature of the failure, no
restart/recovery is possible, the process must allow operations personnel to
move the current run status and run detail data area records to history even though
the process never completed. This will leave no records in the run status or
run detail data area status tables. The next time the acquire source process
executes it will recognize that the last run was a failure and create a new run
status record.

Integrated SDC Backup Process


Once Sunrise has completed the acquire and validate processes for a weekly data
source, a tablespace backup is initiated for that source. In cases where the
data source operates on a different frequency, sources acquired more often than
weekly are backed up weekly and those acquired less often than weekly are backed up
once that source acquire completes. The SDC will maintain four backup
versions in case a restore is required.

Performance and Monitoring Tools


In a system as large and integrated as Sunrise, performance is a
constant concern. To that end the Sunrise environment has enabled the
operations team with the following tools:
DB2TOP

NMON

The first tool is used to track what is executing in any Sunrise database at
any time. It provides not just what is executing but how much database
resource is being consumed and how efficiently the database process is
executing. With this tool operations can easily tell if some process seems to
be running or not. If the process is executing, but not performing reads and/or
writes at tens of thousands to a million plus records per second, ops will
watch that process. If it continues to run without seemingly executing, ops
will contact development for a review.
The second tool provides the ability to monitor the AIX environment that the
database is executing within. This allows operations to see how much system
resource each Sunrise database is consuming, including CPU, memory and I/O.
This is where operations has found issues with the I/O subsystem not performing
well given the Sunrise database load at that time.


Sunrise Development Environment and Management


For each data source an initial acquired start time will be designated. At that time
the process will attempt to connect and confirm that the data available is as
required. So, for example, the AP EIW acquire will have its first attempt

Infrastructure Environment
Systems
The Sunrise development environment is supported on a pSeries platform in
the Lexington DST environment. The pSeries configuration for development
is:
Thirty-two processors

132GB memory

Two vSCSI adapters for storage

Each quarter the development environment loads a DB2 production backup
to enable development and performance testing against a full set of data.

Software
The development environment maintains the same software levels as the
AHE production environment, although it may have additional environments
at the next level of software for testing.
AIX v6.1

DB2 v10.1

DB2 v10.5


CMVC for version control

Databases
At the time of this update the following databases are defined on
development:
SRWHSE

SRDM

LPDB

UPDB

MARSRaw/ADR

SRWHSE10

Security and Access


As this is a DST managed environment, all requests for users and groups are made via
CQ tickets initiated by either the project manager or the application architect.

