
Clariteq

ADM
extract

Extract from Clariteq’s workshop:

Advanced Data Modeling – Communication, Consistency, and Complexity

Alec Sharp
Senior Consultant
Clariteq Systems Consulting Ltd.
West Vancouver, BC, Canada
Mobile – 604 418-3352
asharp@clariteq.com

www.clariteq.com

Proprietary material, please do not distribute! Thanks, Alec

© 2010 Clariteq
contact asharp@clariteq.com

Advanced Data Modeling extract 1 © Clariteq Systems Consulting Ltd.


“Thanks!” from Alec for participating!
• Me: asharp@clariteq.com
• My company: www.clariteq.com
• My book: Workflow Modeling, Second Edition
(A complete rewrite of the first edition, not just a minor refresh)
• Microblog: www.twitter.com/alecsharp
• Data Modeling blog: www.erwin.com/expert_blogs/authors/22/

Alec’s bio:
Alec Sharp, a senior consultant with Clariteq Systems Consulting, has deep expertise in a
rare combination of fields – business process analysis and redesign, application requirements
specification, and data modeling. With almost 30 years of hands-on consulting experience, his
practical approaches and global reputation in model-driven methods have made him a sought-
after resource in locations as diverse as Ireland, Illinois, and India.
He is also a popular conference speaker, mixing content and insight with irreverence and
humour. Among his many top-rated presentations are “The Lost Art of Conceptual Modeling,”
“The Human Side of Data Modeling,” “Crossing the Chasm - From Process Model to IT
Requirements,” and “Getting Traction for Process – What the Experts Forget.”
Alec literally wrote the book on business process modeling – he is the principal author of
“Workflow Modeling: Tools for Process Improvement and Application Development, Second
Edition.” The first edition was published in 2001, and the second edition was published in 2009.
It has consistently been the top-selling title on business process modeling, and is widely used
as a consulting guide and as an MBA textbook.
Alec’s popular workshops on Workflow Process Modeling, Data Modeling (introductory and
advanced), and Requirements Modeling (with Use Cases and Services) are conducted at
many of the world’s best-known organizations. His classes are practical, energetic, and fun,
with the most common participant comments being “best course (or best instructor) I’ve ever
had.”

Clariteq courses for analysts
Workflow Process Modeling – Defining, Mapping, and Analyzing Business Processes 2 days
Business processes matter, because business processes are how value is delivered. Understanding how to work with business processes is now a
core skill for business analysts, process and application architects, functional area managers, and even corporate executives. But too often, material
on the topic either floats around in generalities and familiar case studies, or descends rapidly into technical details and incomprehensible models.
This workshop is different – in a practical way, it shows how to discover and scope a business process, clarify its context, model its workflow with
progressive detail, assess it, and design a new process. Everything is backed up with real-world examples, and clear, repeatable guidelines.

Data Modeling – A Business-Oriented Approach to Entity-Relationship Modeling 2 days


Data modeling is critical to the design of quality databases, but is also essential to other requirements techniques such as workflow modeling and
requirements modeling (use cases and services) because it ensures a common understanding of the things – the entities – that processes and
applications deal with. This workshop introduces entity-relationship modeling from a non-technical perspective, provides tips and guidelines for the
analyst, and explores contextual, conceptual, and detailed modeling techniques that maximize user involvement.

Requirements Modeling – Proven Techniques for Use Cases and Service Specifications 2 days
Use cases have offered great promise as a requirements definition technique, but many analysts get disappointing results. That’s because published
methods are often inconsistent, complex, or focused on internal design. This unique workshop clears up the confusion. It shows how to employ use
cases to discover external requirements – how users wish to interact with an application – and how to use service specifications to define internal
requirements – the validation, rules, and data manipulation performed behind the scenes. Better yet, it shows in concrete terms how the two
perspectives interact, and demonstrates synergies with data modeling and business process workflow modeling.

Advanced Data Modeling – Communication, Consistency, and Complexity 2 or 3 days


After gaining some practical experience, data modelers encounter situations such as the enforcement of complex business rules, handling recurring
patterns, satisfying regulatory requirements to capture complex changes and corrections, dealing with existing databases or packaged applications,
integrating with dimensional modeling, and other issues not covered in introductory data modeling classes. This highly participative workshop
provides approaches for many advanced data modeling situations, as well as techniques for improving communication between data modelers and
subject matter experts.

Facilitation & Presentation – Session Techniques for Business Analysts 2 days


The primary approach for discovering and validating business requirements has shifted from one-on-one interviews to facilitated workshops. This
began with JAD or “joint application development” sessions, and has now become the norm. Just as important as gathering information in a facilitated
session are skills in presenting that information for validation and to inform a wider audience. While there are many general-purpose courses
available on these topics, there is very little available that is specifically designed for the needs of the business analyst. This unique workshop will
provide specific methods and techniques in both skills – facilitation and presentation.

Now available! Business Analysis Overview – Model-Driven Techniques for Processes, Applications, and Data 2 days
Essential content from Clariteq’s Process, Requirements, and Data Modeling workshops.
Seven typical problems

1. Missing the point altogether
   Why it’s a problem: We’re designing businesses, not databases
2. Starting with a data modeling lecture
   Why it’s a problem: You’ll turn potential participants into actual non-participants
3. Not investigating the “as-is” model
   Why it’s a problem: You need it to show how much better life will be with the “to-be”
4. Fear of asking “dumb” questions
   Why it’s a problem: You need to show that they’re the experts, and someone will be glad that you asked
5. Not applying graphic principles
   Why it’s a problem: An ERD is a graphic – otherwise, why bother?
6. Getting stuck in a data modeling rut
   Why it’s a problem: You won’t get full participation, understanding, and buy-in
7. Generalizing too much, too soon
   Why it’s a problem: You’re really just showing off – give us mere mortals a break!

1 - think about it – do architects bring hammers and saws to their first meeting with a client?
2 - by putting them to sleep
3 - and know what has to be left in place, and what has to be converted or integrated
4 - and besides, you never really know the business as well as you think you do
5 - maybe you could just give them the DDL
6 - because for some of the folks, you aren’t using the right language
7 - besides, your “elegant” model is probably wrong if no one can validate it

Seven positive behaviors

1. Accessibility
   What it means: Data modeling can be challenging enough to participate in - make it easy for everyone to get involved
   For example: “Just do it!” - don’t start with a lecture on data modeling
2. Directionality
   What it means: Like process models and org charts, data models are easiest to understand if they have a direction.
   For example: Draw models so that dependency is visually obvious.
3. Simplicity
   What it means: The forces of complexity are everywhere – resist them! Use simple techniques and frameworks, at least at the beginning.
   For example: Use methods that let you start simple, and add detail in layers
4. Consistency
   What it means: Like children, adults learn from repetition – always do the same things the same way, & they’ll learn modeling by osmosis.
   For example: Follow the same “script” whenever adding a new entity
5. Visibility
   What it means: It’s best if your clients spot the need for things like generalization – be patient, and give them every chance.
   For example: Draw models so that generalization, etc. are visually obvious.
6. Relevance
   What it means: Data models can be quite abstract to many people, so “attach” concrete, relevant artifacts and issues to them
   For example: Use familiar “props” like forms or reports to illuminate models
7. Plurality
   What it means: Data modeling, and data model diagrams, appeal to some, but not all – use other techniques to involve everyone
   For example: Use scenarios and narratives in addition to E-R diagrams.

And maybe…
8. Patience (is a virtue)
9. Humility (Don’t be afraid to ask! Spend more time saying “tell me more.”)
10. Empathy (Feel their pain! Put yourself in their shoes!)

What is a data model?

• A description of a business in terms of the things it needs to know about
• Things (Entities) and Facts about Things (Attributes & Relationships), plus “Assertions” (rules):
  - Each Order must contain one or more Order Lines (i.e., at least one Order Line)
  - Each Order Line is contained in exactly one Order
  - Each Order can contain at most one Order Line per Product
• “Real world”, not technical implementation
• Graham Witt – “A narrative supported by a graphic”

Customer definition:
A Customer is a person or organization that is a past, present, or potential user of our products or services. Excludes the company itself when we use our own products or services, but includes cases where the Customer doesn’t have to pay (e.g., a charity).

Entities and attributes shown on the diagram:
• Customer: Customer ID, Name, Billing Address, Shipping Address, etc.
• Order: Order ID, Placed Date, Delivery Date, Status, etc.
• Product: Product ID, Description, Unit Price, etc.
• Order Line: Order ID, Product ID, Quantity, etc.
Relationship names shown on the diagram: places / placed by (between Customer and Order), contained in, specifies

Diagram callouts:
• Entity: a distinct thing of interest about which the business must maintain information
• Identifier: One or more attributes that can be used to uniquely specify a single instance (only in detailed data models)
• Attribute: A property of an entity that can be expressed as a piece of data
• Relationship: A named association between two entities
• Key Point: Not the same as database design

There are many ways to describe a business...


• How it works - Process Model
• How it’s organized - Organization Chart
• Where it operates - Location Map
and…
• What it needs to maintain records about - Data Model

Data modeling symbols will vary slightly among the different “dialects”, but the meaning is constant.
The symbols are much more standardized than they used to be.

Data Modeling involves:


Gathering knowledge from Subject Matter Experts (the hard part!)
Representing that knowledge using a set of standard symbols and conventions (the easier part!)

Not just for Database Design anymore!
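The three Order assertions on this slide can be read as checks against sample instances. The following is a minimal Python sketch, not a database design and not part of the workshop material; the class and function names (`Order`, `OrderLine`, `check_order`) are hypothetical. "Each Order Line is contained in exactly one Order" is only approximated here by nesting the lines inside their Order.

```python
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    product_id: str
    quantity: int


@dataclass
class Order:
    order_id: str
    customer_id: str
    # Nesting the lines inside one Order approximates
    # "each Order Line is contained in exactly one Order".
    lines: list = field(default_factory=list)


def check_order(order):
    """Return a list of the slide's assertions that this Order violates."""
    problems = []
    # Each Order must contain one or more Order Lines.
    if not order.lines:
        problems.append("Order has no Order Lines")
    # Each Order can contain at most one Order Line per Product.
    seen = set()
    for line in order.lines:
        if line.product_id in seen:
            problems.append(
                f"More than one Order Line for Product {line.product_id}"
            )
        seen.add(line.product_id)
    return problems
```

Reviewing a handful of instances this way, including deliberately bad ones, is one way to confirm a rule with subject matter experts before it is drawn on a diagram.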

Entity types and conventions

Kernel (Independent)
A fundamental thing of interest to the enterprise whose existence does not depend on any other entity – it can “stand alone”
Drawn at the top of its area

Reference or Type (Independent)
Classifies or categorizes other entities and/or allows the recording of allowable values for a descriptive attribute
Drawn diagonally out from or beside the classified entity

Characteristic (Dependent on one parent)
Records multi-valued facts about a parent entity that have been “cast out” from that entity
Drawn below parent

Associative (Dependent on two or more parents)
Records facts about a relationship (association) between two or more parent entities – is often the resolution of a M:M relationship between the parents
Drawn between and below parents

Supertype
Contains facts (attributes and relationships) that are common to all instances of the entity. Any kind of entity can be a supertype.

Subtype
Contains facts that are specific to a particular subset of instances of the entity.

Recursive relationship
A relationship between instances of the same entity. Can be 1:1, 1:M, or M:M
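The dependency rule behind these entity types (no parents, one parent, two or more parents) can be written as a small classifier. This is only a sketch under stated assumptions: `classify_entity` and its parameters are hypothetical names, and `parent_count` is assumed to count only the entities this entity depends on for its existence, not every relationship it takes part in.

```python
def classify_entity(parent_count, is_reference=False):
    """Classify an entity by how many parents it depends on.

    parent_count: number of entities this entity cannot exist without.
    is_reference: True if an independent entity only classifies others.
    """
    if parent_count < 0:
        raise ValueError("parent_count cannot be negative")
    if parent_count == 0:
        # Independent entities: kernel, or reference/type.
        return "reference" if is_reference else "kernel"
    if parent_count == 1:
        # Dependent on one parent.
        return "characteristic"
    # Dependent on two or more parents.
    return "associative"
```

For example, with the entities from the earlier slide, Customer would come out as a kernel and Order Line (dependent on Order and Product) as an associative.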
Three types of data models
Different levels of detail support different perspectives

Type of Data Model and the need:
1. Contextual (Scope): Agreement on “big picture” and vocabulary for process or subject
2. Conceptual (Overview): Agreements on basic concepts, more vocabulary, and rules
3. Logical (Detail): Excruciating detail for physical design

Upper levels often lost because…
• Didn’t know they were important
• Tool provided no support
• Started at too low a level

Remember…
• Maintain SME involvement
• Get maximum value from the technique

Summary – data model types

1. Contextual (Scope)
• Agreement on “big picture”, main terms and definitions
• May be a simple block diagram, or primarily textual – a list
• Optional – not necessary on smaller projects
• Later in this course, we’ll look at some important techniques for dealing with contextual models

2. Conceptual (Overview)
• Agreements on basic concepts and rules
• Ensures that everyone is on the same wavelength before diving into the details
• Overview: main entities, attributes, and relationships
• Lots of M:M relationships
• Relationships show multiplicity
• No keys
• No reference entities except where they are “structural”
• Many attributes will be non-atomic and multi-valued
• Verified by direct inspection
• A “one-pager”
• 20% of the modeling effort

3. Logical (Detail)
• Excruciating detail for physical design
• Provides all detail for first-cut physical database design and requirements specification
• Detailed: ~ 5 times as many entities as the conceptual model
• M:M relationships resolved
• Relationship optionality added
• Primary, foreign, alternate keys
• Lots of reference entities
• Fully normalized – no multi-valued, redundant, or non-atomic attributes. All attributes defined and “propertized”
• May be verified by other means: sample data, report mockups, …
• May be partitioned
• 80% of the modeling effort

Note that across the industry, there is a lack of consistency in defining these types of models. In the
“Zachman Framework” these would be the planner’s, owner’s, and designer’s views.
Analogies:
- The contextual model is like the site plan with a definition of what will be built. The focus is scope or
“footprint.”
- The conceptual model is like a floor plan and sketches for a building. The focus is the essential terms,
definitions, and facts / rules.
- The logical data model is like the detailed blueprints for a building. The focus is on the individual
data items the enterprise needs, and the rules that govern them.
A basic message for conceptual modeling – “Resist the urge to normalize or generalize until it
matters!!!”
The logical model is not necessarily the “as built” model – the physical database design. The database
designer or DBA will make changes in the interest of performance, recoverability, distribution, etc.
Everyone who currently supports an application should:
- draw the application’s logical data model following strict top-down drawing conventions.
- abstract the model “up” to a conceptual data model
- at least consider reviewing the conceptual model with analysts, developers, and subject matter experts
to ensure that it reflects the intentions of the business.

Contextual data model

• A list of the main topics (“subject areas”) in scope, and an associated vocabulary or glossary
• Glossary may include items other than Entities
  E.g., processes, transactions, industry terminology, Key Performance Indicators [KPIs], etc.
• Primarily textual; optionally, a diagram showing the topics and their interrelationships, e.g. [diagram]

Main use: “Do we understand the scope and the main terms?”

Conceptual data model

• Shows main or core entities, relationships, and attributes
• Gets the “concept” across
• Great for communication, but not for database design
• Best done before any significant process modeling or application requirements (use cases and service specifications)

Let's see what happens when we take these three entities to the "Logical" level...

The conceptual model is the “crossroads” at which both business and IT can communicate – both
parties have “shared accountability” to ensure that there is a common understanding of the basics.
As you add detail, your conceptual model will evolve into a logical data model, but don’t lose the
conceptual view!!! It is an absolutely vital tool for presentations, training, and so on.
After Logical Data Modeling, the next stage in the progression would be to turn your logical data
model into a Physical Database Design for your particular implementation environment (MS Access,
SQL Server, Oracle, DB2, etc.)

Logical data model

• All necessary detail – it’s the data specifications
• Input to first-cut physical database design
• Completed after use cases and service specifications are finalized

This could be made even more detailed:
• we haven’t shown entities like “Semester”, “Building”, or “Room”
• we haven’t shown reference entities like “Course Method” or “Degree Level”

From conceptual to initial logical
The progression from conceptual to logical is largely based on identifying and dealing with three attribute characteristics:
• Multi-valued - the attribute can have multiple different values for one instance of the entity, either “at a time” or “over time”
  E.g., “Employee Name” if aliases or previous names are tracked
  - move it down to the “many” end of a 1:M relationship into a characteristic entity
  - if it’s a fact about a M:M relationship between entities, move it down to the “many” end of a 1:M relationship into an associative entity
  - both move data structure into 1st Normal Form – 1NF

• Redundant - the same attribute value is recorded multiple times, in different entity instances, possibly inconsistently
  E.g., “Company Name” in a “Department” entity
  - move it up to the “one” end of a M:1 relationship to one of the parent (or higher) entities (2nd Normal Form – 2NF)
  - you might have to create a new parent entity where none existed before

• Constrained - a descriptive attribute needs to be restricted to a set of standardized values to improve integrity and reporting
  E.g., “Employee Type”
  - move it out to the “one” end of a M:1 relationship to a reference or other related entity (3rd Normal Form - 3NF)

For multi-valued attributes, ask “On what basis does the attribute repeat?” The answer should be in the
form “It occurs once per …” This will provide a clue as to what entity the multi-valued attribute should
be moved to.
Two variations of the same example:
- If a Resource has multiple Chargeout Rates over time, then the Chargeout Rate doesn’t vary in
relation to some other entity. We could say that the Chargeout Rate attribute repeats “within” the
Resource entity, so we’ll simply move it down (“cast it out”) into a characteristic entity called
Resource Chargeout Rate. It will need the attributes Effective Date and End Date in addition to
Amount.
- If a Resource has multiple Chargeout Rates, one per Project that the Resource is contracted to, then
we could say that the Chargeout Rate attribute repeats “in relation to” the Project entity. In other
words, we know that Chargeout Rate is a fact about the relationship between Resource and Project, and
belongs in an associative between them. That associative may depict a contract or agreement, and
might have the word “Contract” in its name.
Another example:
- If the attribute Expected Duration is in the Project entity, and it is multi-valued, with one value per
project phase, then Expected Duration should be moved down into a Characteristic (of Project) entity
called Project Phase. The Task entity would likely be a characteristic of Project Phase.
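The first Chargeout Rate variation (rates that repeat "within" Resource over time) can be sketched as a characteristic entity carrying Effective Date, End Date, and Amount. This Python sketch rests on assumptions that are not in the notes: the names `ResourceChargeoutRate` and `rate_on` are hypothetical, End Date is treated as inclusive, and an empty End Date means the rate is still open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ResourceChargeoutRate:
    """Characteristic entity: one row per Resource per rate period."""
    resource_id: str
    effective_date: date
    end_date: Optional[date]  # assumption: None means "still in effect"
    amount: float


def rate_on(rates, resource_id, day):
    """Return the amount in effect for a resource on a given day, or None."""
    for rate in rates:
        if rate.resource_id != resource_id:
            continue
        started = rate.effective_date <= day
        # Assumption: End Date is inclusive ("through", not "until").
        not_ended = rate.end_date is None or day <= rate.end_date
        if started and not_ended:
            return rate.amount
    return None
```

The second variation, one rate per Project, would instead key the rate on both the Resource and the Project, which is what makes it an associative rather than a characteristic.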

Migrating multi-valued attributes

Attributes can’t repeat within an entity – “repeating” or “multi-valued” attributes are moved into a characteristic entity

For each Section, there can be one or more Lecture times. Depending on the type of Course, there may be none.

For each Section, there can be one or more Tutorial times. There will always be at least one.

We must move each "repeating group" into a child entity.

Note – Later, we’ll discuss the inclusion of primary keys and the added relationship symbols

This is one of the rules for normalization - entities are in First Normal Form once all the repeating
attributes or groups of attributes have been sent (“cast out”) to their own entities.
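As a rough illustration of "casting out" a repeating attribute, the Python sketch below splits a multi-valued attribute out of each parent row into child rows that carry the parent's identifier. The function name `cast_out`, the dictionary layout, and the sample attributes (`section_id`, `room`, `lecture_time`) are assumptions made for the example, not part of the course material.

```python
def cast_out(parent_rows, key, repeating):
    """Move a repeating attribute into child rows (toward 1NF).

    parent_rows: list of dicts, where row[repeating] is a list of values.
    key: name of the attribute that identifies the parent.
    repeating: name of the multi-valued attribute to cast out.
    Returns (parents_without_the_attribute, child_rows).
    """
    parents, children = [], []
    for row in parent_rows:
        # The parent keeps everything except the repeating attribute.
        parents.append({k: v for k, v in row.items() if k != repeating})
        # Each repeated value becomes one child row pointing at its parent.
        for value in row.get(repeating, []):
            children.append({key: row[key], repeating: value})
    return parents, children
```

A Section with no Lecture times simply produces no child rows, which matches the "there may be none" case on the slide.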

Migrating attributes of relationships

When the multi-valued attribute is actually a fact about a relationship, we create an associative entity:

"When did John Smith enroll in Math 100?"
"What grade did John get at midterm?"
"What was his final grade?"
"What is the average grade for Math 100 Section 3?"

These required facts are not about Student, or Section, but the relationship between a Student and a Section

We need to create a new associative entity

“Many to many” relationships will almost always get a “promotion” to an entity, as in the example
above, because there are usually attributes about the relationship that must be recorded.

This is a variation on putting data into First Normal Form.
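The questions on this slide can all be answered from an associative entity that sits between Student and Section. The sketch below is one hedged way to picture it in Python; the names `Enrollment`, `enroll`, and `average_final_grade`, the sample identifiers, and the grades are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Enrollment:
    """Associative entity: facts about one Student in one Section."""
    student_id: str
    section_id: str
    enrolled_on: str
    midterm_grade: float
    final_grade: float


# Keyed by (student, section): at most one Enrollment per pair.
enrollments = {}


def enroll(e):
    key = (e.student_id, e.section_id)
    if key in enrollments:
        raise ValueError(f"Student {e.student_id} already in {e.section_id}")
    enrollments[key] = e


def average_final_grade(section_id):
    """Average final grade across all Enrollments in one Section."""
    grades = [e.final_grade for e in enrollments.values()
              if e.section_id == section_id]
    return mean(grades)


enroll(Enrollment("S1", "MATH100-3", "2010-01-05", 70.0, 80.0))
enroll(Enrollment("S2", "MATH100-3", "2010-01-06", 90.0, 60.0))
```

Enrollment date and grades have nowhere sensible to live on Student or Section alone, which is the point of the slide.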

Migrating redundant attributes

We eliminate redundancy by ensuring that every attribute is in the entity that it describes, so that the attribute value is recorded only once.

• Before migration, attribute values about a Department would be recorded redundantly with every Course offered by that Department, so it is moved up to a parent entity.

• Before migration, values of the Delivery Method Description attribute would be carried redundantly in many instances of Course, so it is moved out to a “type” (or “reference” or “lookup” or “classification”) entity.

Eliminating redundancy puts entities into Second Normal Form if the redundant attributes move “up”
the parentage hierarchy, and into Third Normal Form if the attributes move “out” to a related entity
(often a “type” entity.)
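Moving a redundant attribute "up" or "out" amounts to recording it once per parent (or type) instance and checking that the copies agreed. A minimal Python sketch, assuming a hypothetical helper `move_up` and invented attribute names such as `department_id` and `department_name`:

```python
def move_up(child_rows, parent_key, attrs):
    """Record redundant attributes once per parent instead of per child.

    child_rows: list of dicts that repeat the parent's facts.
    parent_key: attribute in each child that identifies its parent.
    attrs: names of the attributes that are really facts about the parent.
    Returns (parents_by_key, children_without_those_attributes).
    """
    parents = {}
    children = []
    for row in child_rows:
        facts = {a: row[a] for a in attrs}
        # First sighting records the facts; later sightings must agree.
        known = parents.setdefault(row[parent_key], facts)
        if known != facts:
            raise ValueError(
                f"Inconsistent values for {parent_key}={row[parent_key]!r}"
            )
        children.append({k: v for k, v in row.items() if k not in attrs})
    return parents, children
```

The same helper could serve the "out" case, for example moving a Delivery Method Description to a type entity keyed by a delivery method code. The consistency check is where the "possibly inconsistently" risk of redundant data tends to surface.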

World’s shortest course on normalization
• Unnormalized (UNF or 0NF)
  - Contains a “repeating group”
• First Normal Form (1NF)
  - Repeating attributes moved down to Characteristic or Associative entities
• Second Normal Form (2NF)
  - Only applies to dependent entities
  - No attributes in a child entity are really facts about a parent (or grandparent or…)
  - That is, no Characteristic or Associative entity redundantly contains facts from its parent(s) – if it does, move the fact(s) up (create a new parent entity if necessary)
• Third Normal Form (3NF)
  - If any entity redundantly contains facts from a related (non-parent) entity, move the fact(s) out to the other entity (create a new entity if necessary)
• BCNF (Boyce-Codd NF)
  - Not an issue if you keep your wits about you
• Fourth and Fifth Normal Form (4NF, 5NF)
  - “Large” (3-way or more) associatives need to be broken down into more granular entities

Other normal forms – forget about it!

The reason we’re covering this? You have to be able to make it simpler for the data “layperson.”

Script – adding a dependent entity

An “orderly script” –
adding a new characteristic or associative entity to a logical model
1. Place the entity (and relationships) on the diagram
according to dependency
2. Ask “What is one of these things?” then
name and define the entity accordingly
3. Add relationship names, and add multiplicity
(or confirm, if it was already specified)
4. Add attributes
5. Perform further attribute migration, dealing with
multi-valued attributes first, and reference data last
(1NF, 2NF, 3NF in sequence)
… and only then worry about…
6. Relationship optionality
7. Primary keys or uniqueness constraints
8. Additional constraints (e.g., rules on date ranges)
Whenever you add a new entity
• check to see if attributes or relationships from nearby entities
should be moved to the new entity
• check that you haven’t introduced transitivity (clue: “loops”)

Consistency is very important to engaging your clients in the data modeling process. Have a method,
or have scripts – do the same things the same way, and draw the same things the same way. If you do
this, participants will learn modeling “by osmosis” and will learn what to expect. (E.g., that a M:M
relationship will eventually get resolved.)

Seven questions for date ranges and dates
For records dependent on the same parent…
1. Can there be gaps between date ranges of adjacent (in time) records?
2. Must the date ranges be contiguous (no gaps)?
3. Can the date ranges overlap?

For any date range…
4. Can a date range begin in the future?
5. Is a date range inclusive or exclusive of the End Date? (“until” or “through?”)
6. Must a date range fit within the date range of a parent entity?
7. Will the dates have to handle global time zones?

Note that in this example, we could ask the questions for both date ranges:
- Effective / End Date
- Recorded / Corrected Date

To clear up confusion around question 5, some organizations have standardized on “Last Valid Date”
instead of “End Date.”
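Once the answers to questions 1, 2, 3, and 6 are known, they can be turned into simple checks. The Python sketch below assumes date ranges are `(start, end)` tuples with an inclusive End Date (the "Last Valid Date" reading), whole-day granularity, and no time zones; the function names are hypothetical.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def overlaps(a, b):
    """True if two inclusive (start, end) ranges share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


def has_overlaps(ranges):
    """Question 3: do any ranges under the same parent overlap?"""
    ordered = sorted(ranges)
    return any(overlaps(p, n) for p, n in zip(ordered, ordered[1:]))


def has_gaps(ranges):
    """Questions 1 and 2: is any day between the ranges left uncovered?"""
    ordered = sorted(ranges)
    if not ordered:
        return False
    latest_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start > latest_end + ONE_DAY:
            return True
        latest_end = max(latest_end, end)
    return False


def fits_within(child, parent):
    """Question 6: does the child range sit inside the parent's range?"""
    return parent[0] <= child[0] and child[1] <= parent[1]
```

If the business answers "exclusive" to question 5, the comparisons against the end date would need to change accordingly, which is one reason the question is worth settling early.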

Script – meeting a new requirement…
Confirm and extend the model:
• discover new requirements, using a variety of techniques

Philosophy
• don’t dive in – start simple, add detail in layers
• start out in “natural language”

1) State the new requirement as an assertion
• Start out using the client’s language
• Then, ensure that the assertion uses terms from the data model (entity names, relationship names, etc.) This “leads” you to the solution.
• Confirm it!

2) Develop a conceptual solution
• Look for the simplest option first: no change needed, a new reference attribute, a multi-valued attribute(s), M:M relationship, new entity
• Explore rules, like “what is the basis for multi-valued?”
• Confirm it!

3) Develop a logical solution
• Fully normalized, fully attributed
• Follow an “orderly script” – don’t get ahead of yourself or the client
• Confirm it!, possibly using other easy-to-follow formats such as screen or report mock-ups.

Issues in meeting new requirements:
- Original modeler moves on, often without properly documenting the model, and subsequent modelers
don’t really understand the conceptual underpinnings of the model
- Failure to confirm the requirement with the subject matter expert, often by not using techniques like
narrative assertions or concrete examples, and instead jumping too quickly into the details (keys,
normalization, detailed attributes, reference data, etc.)
- When dealing with new requirements, modeler/DBA works at the physical level, instead of at the
conceptual level. The result – a tendency to “bolt on” new tables (entities) rather than properly
“building in” the new requirement. This results in more complexity than is really needed.

Refining the logical
As the model nears completion, the entities have been made as granular (normalized) as necessary.
Once the model meets known requirements, we’ll also “granularize” the attributes by finding and resolving the following:
• Non-atomic attributes:
  The attribute has “internal structure” - it could be decomposed into more granular (“atomic”) attributes. E.g.,
  “Employee Address” is non-atomic,
  “Employee Address Street Name” is atomic – it is at the finest level of granularity that will ever be manipulated or displayed

• Semantically overloaded attributes:
  The attribute is “overworked” - it contains multiple different attributes, typically encoded into a single attribute
  - in the earlier days of systems, this was done deliberately by designers to save space (think of the Y2K problem…)
  - now, it will more likely be done inadvertently by business people who don’t know the negative consequences of overloaded coding schemes

Finally, name and define attributes, and document attribute properties

The distinction between non-atomic and semantic overload can be confusing:


A non-atomic attribute needs to be broken down into finer attributes, each of which is a “smaller” part
of the same overall attribute. See page 36 for more information and examples.
A semantically overloaded attribute also needs to be broken down, but into distinctly different
attributes as opposed to smaller pieces of the same attribute. See page 57 for more information and
examples.

Note – we don’t typically do this until after we’ve searched for, discovered, and satisfied outstanding
requirements using the techniques that we’ll look at shortly.
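The difference between the two cases can be shown with two small decompositions in Python. Both input formats are invented purely for illustration: a comma-separated address for the non-atomic case, and a made-up account code (one character of account type, two of region, then a sequence number) for the semantically overloaded case. Neither format comes from the workshop material.

```python
def split_address(address):
    """Non-atomic: break one attribute into smaller parts of the same thing.

    Assumes a hypothetical "street, city, province" layout.
    """
    street, city, province = [part.strip() for part in address.split(",")]
    return {"street": street, "city": city, "province": province}


def split_overloaded_code(code):
    """Semantically overloaded: one code hiding several different attributes.

    Assumes a hypothetical scheme: 1 character of account type,
    2 characters of region, then a numeric sequence number.
    """
    return {
        "account_type": code[0],
        "region": code[1:3],
        "sequence_number": int(code[3:]),
    }
```

In the first function every output is still part of the address; in the second, the outputs are unrelated facts that happened to share one field.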

A natural progression

Contextual – Focus: scope
  context and boundaries, glossary of main terms and definitions
Conceptual – Focus: overview
  business perspective, all terms and definitions, overall structure, major facts and rules
Logical – Focus: detail
  all facts, detailed rules, input to 1st cut physical design
Physical DB Design – The “Danger Zone”
  Analysts shouldn’t worry about physical design issues while data modeling.

Arrow labels on the diagram: “traditional modeling and development” and “reverse engineering”

Get into the high “value-added” space!
• Contextual – helpful for large models
• Conceptual – a great way to add value
• Improve communication among all players
• Highlight disconnects – terms, rules, scope, …

Three phases in data modeling

1) Establish initial Conceptual Data Model
• Focus is on developing a core set of entities:
  - named
  - defined
  - minimally attributed
  - bound by basic rules and relationships
  - placed on an ERD
• Might start bottom-up: brainstorm details then synthesizing “up”
• Might start top-down: build a contextual model, then flesh out required details analyzing “down”
• Experiment w. alternatives
• Refine the contextual model, if you had one.

2) Develop initial Logical Data Model
• Focus shifts to attribute rigor and structure when going to the logical level
• First check attributes for:
  - completeness
  - necessity
  - name and definition
  - placement
• Resolve attributes that are:
  - multi-valued
  - redundant
  - constrained
• Continue experimenting with alternate structures
• Refine conceptual model

3) Refine & extend Logical Data Model
• Focus is on refinement, and validation via new requirements using…
  - …an event-based approach: fast and easy…
  - …or full business analysis:
    - process workflow model
    - use cases (external)
    - service specs (internal)
  - Profiling existing data
  - informational needs
• Resolve attributes that are semantically overloaded, non-atomic, or derived
• Document attribute properties and validation
• Specify identifiers
• Refine conceptual model

Of course, step 0) is to establish Project Scope and Objectives

We covered all of the previous stuff so you’ll be able to simplify some of the techniques for others.

Reminder – the four Ds of data modeling

Definition
• “What is one of these things?”
• List common and unusual instances
• “Are there any known anomalies?”
• “What are the potential differences of opinion?”

Dependency
• “What type of entity is this?”
• “What other entity does it depend on?”
• Essentially, is it a free-standing thing, a type of things, or repeating detail about some other thing?

Detail
• Keep it in its place!
• GEFN! HPDL!

Demonstration
• Sample instances
• Schematics
• Props

Entity definition basics

Definitions must focus on what a single instance is:
• Not “how they’re used” or “how they’re created” or “why we care” or “how the process works” or “interesting problems and tidbits” etc.
• Ask “What is one of these things?”

Key Point: “What is one of these things?”

The most useful questions:

“Can anyone think of examples that might surprise someone else – that is, anomalies or potential sources of confusion?”
e.g., to define “Customer:”
• “In our area, other divisions are treated as customers”
• “We record recipients of charitable donations as customers.”
“Could we list some examples?”
• Rita Smith, Acme Auto, Ministry of Finance, homeowners… (aha!)
“Does this deal with ‘kinds of things’ or ‘specific things’?”
• “kind” – Customer Category vs. “specific” – an individual Customer
• if it’s a specific thing, still ask if there are recognized types (e.g., Personal, Corporate, Government; Lead, Prospect, Active)

The entity definition tells which things in the real world are included within our understanding of that
entity. For instance:
• The world has hundreds of millions of people who are “students”
• Which ones would we expect to find in a specific university’s Student database?
• Which ones would be excluded?
Two other useful questions:
• Are there life cycle issues to consider? For instance, Applicant to Candidate to Employee to Retiree
– does “Employee” include “Applicant” and “Retiree?”
• Does the same real-world thing appear as multiple entities? E.g., one person could be a
“Driver,” a “Registered Vehicle Owner,” and a “Legal Vehicle Owner.” If this is of interest, you
might need to “generalize” by creating a “Person” entity.
A common error in entity definition - describing the current implementation instead of the “essence” of
what the entity is. E.g., “This entity is the ASF-72 created by Emily down in Personnel.”
Another common error - using the entity name to define itself. E.g., “A Contract is a contract between
the corporation and …”
Finally, note that the last example on the slide indicates two separate “type” classifications –
Customer Legal Entity Type and Customer Status Type

Entity definition format and example

Customer (version 1)
We have a variety of Customers that operate in multiple geographies, and these must be tracked in order to consolidate our purchasing statistics and enable our rating process to identify our best Customers.

Customer (version 2)
A Customer is a person or organization that is a past, present, or potential user of our products or services.
Current examples include Solectron (contract manufacturer), Cisco Systems (OEM), Arrow Electronics (distributor), Best Buy (retailer), M&P PCs (assembler), and individual consumers.
Excludes the company itself when we use our own products or services, but includes cases where the Customer doesn’t have to pay (e.g., a charity).

Entity definition format:
1. A description of which real-world things will be included in scope.
   This might be developed from a list of standard “thing types” – person, organization, request, transfer, item, location, activity, etc.
   Be sure to identify specific inclusions or exclusions.
2. Illustrate with examples:
   • 5 – 10 sample instances
   • diagrams
   • current “props” like reports or forms
3. Interesting points – anomalies, synonyms, common points of confusion, etc.

Guidelines for working with assertions
1. Focus on the appropriate case –
most assertions begin with the word “Each”
2. Exclusively use terms from the data model –
entity, relationship, and attribute names
• If there’s a concept that can’t be described with existing
terms, you’ll need to add to the data model
3. If the assertion describes a relationship, you must
state it in both directions
4. If the assertion describes a relationship, be clear on
whether cardinality is “one” or “one or more”

Each Instructor teaches one or more Sections


(Sounds good…)

Each Section is taught by one Instructor


(Really…?)

Entity definitions and uniqueness constraints are also assertions.
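
A minimal sketch of guidelines 3 and 4, assuming a hypothetical `Relationship` record (not part of the workshop material) that forces both directions and both cardinalities to be stated. Pluralizing by appending "s" is a simplification for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Relationship:
    entity_a: str
    verb_a_to_b: str
    entity_b: str
    verb_b_to_a: str
    a_to_b_many: bool  # True means "one or more", False means "one"
    b_to_a_many: bool


def _target(entity: str, many: bool) -> str:
    """Render the cardinality phrase and the (naively pluralized) entity name."""
    return f"one or more {entity}s" if many else f"one {entity}"


def assertions(rel: Relationship) -> Tuple[str, str]:
    """State the relationship in both directions, each starting with 'Each'."""
    forward = f"Each {rel.entity_a} {rel.verb_a_to_b} {_target(rel.entity_b, rel.a_to_b_many)}"
    reverse = f"Each {rel.entity_b} {rel.verb_b_to_a} {_target(rel.entity_a, rel.b_to_a_many)}"
    return forward, reverse


teaches = Relationship("Instructor", "teaches", "Section", "is taught by", True, False)
forward, reverse = assertions(teaches)
```

Printing both sentences side by side is what prompts the “Really…?” reaction: the reverse direction is where a hidden assumption about cardinality usually surfaces.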

Two important time concepts

Logical Time
• Effective date/time, Start date/time, Begin date/time, etc.
• Time that data reflects the intent of the business at the time of update
• Reality
• Remember: Can be updated
  – Wrong – with developments like Sarbanes-Oxley, we don’t change stored data, we add new records.

Physical Time
• Recorded date/time, Transaction date/time, Update date/time, etc.
• Time when a record was written to the database
• Representation
• Remember: Cannot be updated

A third type of time is “User Time” - any other date/time of interest to the business
(e.g., Reservation Arrival Date)

Time dependent data – key points

• Facts that change independently should be recorded independently
• Never name the entity “History” – it probably includes present and future values
• Distinguish between
  – business Effective Date
  – database Recorded Date
• It’s tempting to put “Effective Date” in the key, but it might change
• Be sure to define what End / Expiry date means
• Capture the need (the “reality”) first in the model, then factor in performance considerations
• You might need to consider time zones
  – GMT / UTC
  – Local offset

Plus –
• don’t change stored values, add new records
• check for “one at a time, many over time” vs. “many at a time, many over time”
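
A minimal sketch of “don’t change stored values, add new records”, assuming a hypothetical `ProductPrice` entity with both a business Effective Date and a database Recorded Date. The entity, attribute, and method names are illustrative, not taken from the workshop.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional


@dataclass(frozen=True)
class ProductPrice:  # deliberately not named "ProductPriceHistory"
    product_id: str
    price_amount: float
    effective_date: date   # logical time: when the business intends it to apply
    recorded_at: datetime  # physical time: when the row was written


class ProductPriceLog:
    """Append-only store: corrections are new rows, never updates."""

    def __init__(self) -> None:
        self._rows: List[ProductPrice] = []

    def record(self, row: ProductPrice) -> None:
        self._rows.append(row)

    def price_as_of(self, product_id: str, business_date: date,
                    known_at: datetime) -> Optional[float]:
        """Price in effect on business_date, using only rows recorded by known_at."""
        candidates = [
            r for r in self._rows
            if r.product_id == product_id
            and r.effective_date <= business_date
            and r.recorded_at <= known_at
        ]
        if not candidates:
            return None
        latest = max(candidates, key=lambda r: (r.effective_date, r.recorded_at))
        return latest.price_amount


log = ProductPriceLog()
log.record(ProductPrice("P1", 10.0, date(2010, 1, 1), datetime(2009, 12, 15, 9, 0)))
# A correction to the same effective date is a new row, not an update.
log.record(ProductPrice("P1", 12.0, date(2010, 1, 1), datetime(2010, 1, 5, 9, 0)))
# A future-dated value shows why "History" would be a misleading entity name.
log.record(ProductPrice("P1", 15.0, date(2010, 7, 1), datetime(2010, 1, 5, 9, 5)))
```

Because nothing is overwritten, the same question can be answered “as the business meant it” and “as the database knew it at the time”.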

Four key points about complex associations
1. You can’t tell whether a model is correct or not simply by inspecting it
– you must have business involvement

This gives rise to the other three points…

2. You must draw the model in a top-down fashion (or other systematic
approach) so you can actually see dependencies
3. You must state your assumptions or understanding in narrative form
as assertions, using terms (entity names, relationship names, and
attribute names) from the data model
4. You must illuminate the data model by using sample data, schematic
diagrams, scenarios, or some other understandable form


A quick exercise…
1. The company decides which items will be carried at which stockrooms.
2. The company qualifies suppliers to provide specific items.
(A supplier can be qualified to provide multiple items, and an item may
be provided by multiple suppliers)
3. The company enters into a contract with qualified suppliers for each
item they will provide to a specific stockroom.

Will this model satisfy the business constraints?


If not, identify specific problems and develop a better model


A 5NF violation occurs if independent relationships between pairs of entities have been lumped
together with other independent relationships.

4th Normal Form

• 4NF - “Primary Key cannot contain 2 or more independent, multivalued attributes of another entity”
• The classic example: Employees may have Skills and/or Languages

[Diagram: two versions of the model. One version is incorrect, because Skill and Language are independent; the other version is correct.]

Again, the rule is: if only certain combinations of entities are valid, create an associative entity to record
those combinations.
The associative entity should be as “small” as possible. That is, two entities each having a two-part key are
preferable to one entity with a three-part key, if each “small” entity with a two-part key could exist
independently of the other.
If Language and Skill weren’t independent, then the original model is okay. (For example, if each
Skill could only be practiced in certain Languages)
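
A minimal sketch of why the three-part key is a problem when Skill and Language are independent. The employee, skill, and language values are made up for illustration.

```python
# Two "small" associative entities, each with a two-part key.
employee_skill = {("Ann", "Welding"), ("Ann", "Wiring")}
employee_language = {("Ann", "French"), ("Ann", "Hindi")}


def combine(skills, languages):
    """Build the single three-part-key version by pairing every skill with every language."""
    return {
        (emp, skill, lang)
        for emp, skill in skills
        for emp2, lang in languages
        if emp == emp2
    }


def project_skills(rows):
    return {(emp, skill) for emp, skill, _ in rows}


def project_languages(rows):
    return {(emp, lang) for emp, _, lang in rows}


three_part = combine(employee_skill, employee_language)

# Adding one skill to the small entity is one row; in the three-part
# version it takes one row per language the employee has.
with_new_skill = combine(employee_skill | {("Ann", "Brazing")}, employee_language)
```

The three-part version carries no information beyond the two small entities (the projections give them back exactly), but it needs extra rows and invites inconsistent updates.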

4NF is pretty obvious. Things get trickier when we look at 5NF

5th Normal Form

• How we model three or more related entities depends on the rules
• Agents represent Manufacturers in Regions - if any combination is valid, the model below is fine

  Agent (Agent ID)
  Manufacturer (Manufacturer ID)
  Region (Region ID)
  Representation (Agent ID, Manufacturer ID, Region ID)

• What if there are additional constraints?
  – “business rules”
  – only certain combinations are valid

Fifth Normal Form deals with associations between three (or more) entities when there are independent
relationships between two (or more) of those entities.

5th Normal Form

• Assume the following constraints:


– Agents only represent certain Manufacturers
– Manufacturers only distribute in certain Regions
– Regions are only covered by certain Agents
• Now we have a “cyclic dependency” within the key of
Representation
– violates 5NF

“Cyclic dependency”:
Agents are related to Manufacturers,
Manufacturers are related to Regions,
and Regions are related to Agents


What are the problems with the form shown above?

“Independent multi-valued relationships” and “cyclic dependency” are the usual normalization
bafflegab that hides the real issue – a 5NF violation occurs if independent relationships between pairs
of entities have been lumped together with other independent relationships.
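
A minimal sketch of the alternative, assuming the three constraints are the only rules: record each pairwise relationship independently and derive the valid Representation combinations from them. The agent, manufacturer, and region values are made up for illustration.

```python
# Three independent pairwise relationships, each recorded on its own.
agent_manufacturer = {("A1", "M1"), ("A1", "M2"), ("A2", "M1")}
manufacturer_region = {("M1", "West"), ("M2", "West"), ("M1", "East")}
region_agent = {("West", "A1"), ("East", "A2")}


def valid_representations(am, mr, ra):
    """Join the three pairs: a combination is valid only if all three pairs exist."""
    return {
        (agent, manufacturer, region)
        for agent, manufacturer in am
        for manufacturer2, region in mr
        if manufacturer2 == manufacturer and (region, agent) in ra
    }


representations = valid_representations(
    agent_manufacturer, manufacturer_region, region_agent
)
```

If the business says a valid Representation is exactly one where all three pairs hold, the three-part entity is redundant: storing it separately means every change to a pair has to be repeated in it.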

Two sides of the house

We’ve looked at techniques that are appropriate for this side of things… but other techniques are appropriate for the information delivery environment.

[Diagram: both sides support the corporate mission, strategy, goals, and objectives.]

Operational side (Entity-Relationship Model):
• Operational Business Processes
• Operational Applications (support the processes)
• Operational Data (supports the applications)

Information delivery side (Star Schema or Dimensional Model):
• Executive Functions and Processes
• DSS, EIS, BI, reporting, etc. facilities (support the functions and processes)
• Atomic Data Warehouse, Data Mart, ODS, … (supports the facilities)

ETML* connects Operational Data to the Atomic Data Warehouse.
* extract, transform, move, load

Oh-oh…

A detailed data model might be too complex to present to business folks for query, OLAP, BI, etc.

Dimensional models

• Used to model and implement data structures for various types of business intelligence tools.
• One or more dimensional models per warehouse model
• We’ll use the terms dimensional model and star schema interchangeably
• Any combination of dimensions can be used in a query
  – the same dimension will appear in many dimensional models
  – should be managed as “shared dimensions”

[Diagram: a central Fact surrounded by four Dimensions.]

Dimensional model concepts

“Facts”
• the central thing you want to count or measure
• has a count, usually “1”
• often details of a transaction or other core Associative entity (e.g., Sale, Shipment, Crime, Claim, …)
• can have attributes, but when they apply to a Fact they are called measures (e.g., Sale has Total Amount, Time, Payment Method)

“Dimensions”
• how you want to organize or summarize the facts
• often a Type or Kernel entity (e.g., Region, Time Period, Product, Customer, …)
• can have attributes (e.g., Product has Category, Price, and Color)

Dimensional model – example

• The fact is usually an associative entity from somewhere quite “low” in the ERD
• The fact will usually include a “count” of something, even if the value is implicitly “1”
  – E.g., “dollars” or “hours” or “units”
• The dimensions are “clusters” of the fact’s parents, grandparents, etc. entities
• Any combination of dimensions can be used in a query
  – the same dimension will appear in many dimensional models
  – should be managed as “shared dimensions”

[Diagram: Crime is the fact; the dimensions are Police Force – Location, Calendar, Court, and Statute.]

The classic methodology

Step 1 – Identify questions: What sorts of relationships among the data are of interest? E.g., want to study sales by product color and customer, or by region and employee seniority.
Step 2 – Identify facts: What is the central thing (or things) of interest? Often a transaction or event entity with multiple parents and classifications. E.g., a Sale
Step 3 – Identify dimensions: How will facts be organized? Usually an entity related to the fact entity (a foreign key.) E.g., Employee, Customer, … May be hierarchic, e.g. Country, Region, “State”, …
Step 4 – Add attributes: What additional detail is needed? Facts have “measures” and dimensions have “attributes”. E.g., Sale units, total price, time of day, …
Step 5 – Add calculations: Identify calculations such as totals, average, or projection that should be pre-defined. E.g., average sale price, total sales per month, …

You may end up producing more than one star schema. Each will get collapsed into a single table
(named for the “fact”). Tables will then have to be joined (but these will be far simpler than what
would otherwise be necessary)
A few guidelines:
• Don’t try to get all your operational data perfect first, or you’ll never get anywhere
• Accept that after the data structure is in use, the questions will change. Embrace iteration.
• Manage the volume. Combining two “facts” (star schemas) into one table may cause exponential
volume increase. Focus initially on the critical measures and attributes.
• Start with a good, normalized data model that clearly shows dependency, as we’ll demonstrate in a
minute…
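
A rough illustration of steps 2 to 5, assuming a hypothetical Sale fact with two dimensions (Product and Region). All names and values are invented for the example; a real star schema would live in database tables rather than Python dictionaries.

```python
from collections import defaultdict

# Dimensions: keyed by identifier, each with descriptive attributes.
products = {
    1: {"name": "Widget", "color": "Red"},
    2: {"name": "Gadget", "color": "Blue"},
}
regions = {
    10: {"name": "West"},
    20: {"name": "East"},
}

# Fact: one row per Sale, with foreign keys to the dimensions plus measures.
sales = [
    {"product_id": 1, "region_id": 10, "units": 3, "total_price": 30.0},
    {"product_id": 1, "region_id": 20, "units": 2, "total_price": 20.0},
    {"product_id": 2, "region_id": 10, "units": 1, "total_price": 15.0},
]


def total_by(facts, dimension, key, attribute, measure):
    """Sum a measure of the fact, grouped by one attribute of one dimension."""
    totals = defaultdict(float)
    for fact in facts:
        group = dimension[fact[key]][attribute]
        totals[group] += fact[measure]
    return dict(totals)


sales_by_color = total_by(sales, products, "product_id", "color", "total_price")
units_by_region = total_by(sales, regions, "region_id", "name", "units")
```

Each question from step 1 (“sales by product color”, “units by region”) becomes one call that names a dimension, one of its attributes, and a measure.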

But it’s easier with an ERD

[Diagram: an ERD for a library, with each entity marked as the fact, a dimension, or not a dimension.]

• Publisher (Publisher ID, Name) – Not a dimension
• Title (Title ID, Name, Author) – Dimension
• Cardholder (Cardholder ID, Name, Number, Member Since Date) – Dimension
• Format Type (Format Type Code, Name) – Dimension
• Copy (Title ID, Copy SID, Purchase Price Amt, Acquisition Date, Status Code, Format Type Code (fk)) – Dimension
• Loan (Loan ID, Date, Cardholder ID (fk,nn)) – Dimension
• Loan Item (Loan ID, Title ID, Copy SID, Due Date, Return Date, Status Code) – Fact

Relationship labels on the diagram: available from, is an instance of, is classified by, is taken by, takes, is part of.

From E-R to dimensional

• Any parent (or grandparent or…) entities that are encountered following M:1 relationships from the fact are possible dimensions
• Any entities that are 1:M or M:M from the fact cannot be dimensions without “faking” the data
• Additional dimensions not in the original structure (e.g., Time Period) can be added
• Essentially, a basic dimensional model (no snowflakes) collapses an ER model to a two-level structure with a 1:M relationship between each dimension and the fact

[Diagram: Loan is the fact; the dimensions are Calendar, Cardholder, Author, and Title.]
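
A minimal sketch of the first rule, assuming the library ERD’s M:1 relationships are as listed in the `MANY_TO_ONE` table below. That table is one reading of the diagram, not a transcription of it, and the function name is hypothetical.

```python
from collections import deque

# child entity -> parent entities reached by an M:1 relationship (assumed)
MANY_TO_ONE = {
    "Loan Item": ["Loan", "Copy"],
    "Loan": ["Cardholder"],
    "Copy": ["Title", "Format Type"],
}
# Title to Publisher is treated here as M:M, so it is not in the table above.


def candidate_dimensions(fact, many_to_one):
    """Walk M:1 relationships upward from the fact, collecting every ancestor."""
    found = []
    queue = deque(many_to_one.get(fact, []))
    while queue:
        entity = queue.popleft()
        if entity in found:
            continue
        found.append(entity)
        queue.extend(many_to_one.get(entity, []))
    return found


dimensions = candidate_dimensions("Loan Item", MANY_TO_ONE)
```

Anything the walk cannot reach, such as Publisher under this reading, is not a possible dimension without “faking” the data.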

Exercise: dimensional modeling

Jim’s sister-in-law June has just returned from a BI conference, and she has Jim all wound up about building a query database so he can analyze sales (purchases by customers.)

Construct a dimensional model for Jim, using the following E-R model as a starting point. At this point, don’t worry about individual attributes – just which entities would collapse into which fact or dimension. A few notes:
- Jim’s has grown to a nationwide chain, with stores in many regions. Most regions cover one or more states, although some regions only cover part of a state (e.g., Northern California and Southern California). Each store is in a single city, though, and each city is in only one region.
- The layout of stores (Sections, Aisles, Store Categories, etc.) varies widely across the stores.
- The “Store Category” indicates if the store is a mall location, streetfront, “captive” (contained within another retail outlet,) etc. Web sales are not a factor.

Jim is especially interested in how the same Title sells depending on where in the Store it is displayed, because the same Title might end up in different Sections. He also wants to look at Sales by Store, Region, Artist, Publisher, Supplier, Category, … well, just about everything! You’ll have to decide what’s possible, and then be prepared to explain it to Jim!

Dimensional modeling exercise

[Diagram: the E-R model for the exercise.]

Solution: dimensional model

[Diagram: the dimensional model for the exercise.]

As it turns out, having an E-R model is invaluable in producing a valid star schema, although many
data warehouse experts will argue the point…

Handling “vectors” of attributes

• fixed number of repeating attributes
• may be an “array”, e.g., for each Quarter, also record:
  – Target Sales Amount
  – Sales Per Employee Amount
  – …?

Divisional Sales (in 1,000,000s)
Year   Q1     Q2     Q3     Q4
2005   1.45   1.37   1.40   1.67
2006   1.46   1.40   1.63   1.91
2007   2.11   2.32   …      …

Each row is a vector

Alternatives for modeling vectors

“Row-wise” table
• one row per vector; attributes go in separate columns
Advantages
• familiar layout
• from “row to screen” is easier
• fewer tables and joins
• more suitable in DW/DSS environment

“Column-wise” table
• multiple rows per vector; attributes go in a single column
Advantages
• same handling as for other multi-valued attributes
• easier SQL queries (e.g., average sales)
• more efficient for sparse data
• flexible:
  – change vector length
  – add additional attributes (like Top Sales Rep for each Quarter)

Has anyone had experience with this situation?

The point – don’t be too quick to translate reporting layouts into operational data structures
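
A minimal sketch of the two layouts, using the Divisional Sales figures from the slide. The 2007 quarters shown as “…” are represented as missing values; the function names are hypothetical.

```python
# Row-wise: one row per vector (year), one column per quarter.
row_wise = {
    2005: (1.45, 1.37, 1.40, 1.67),
    2006: (1.46, 1.40, 1.63, 1.91),
    2007: (2.11, 2.32, None, None),  # Q3 and Q4 not yet known
}


def to_column_wise(rows):
    """Column-wise: one row per (year, quarter); missing quarters get no row."""
    return [
        (year, quarter, amount)
        for year, quarters in rows.items()
        for quarter, amount in enumerate(quarters, start=1)
        if amount is not None
    ]


def average_sales(column_rows):
    """With one amount column, an average is a single pass over the rows."""
    amounts = [amount for _, _, amount in column_rows]
    return sum(amounts) / len(amounts)


column_wise = to_column_wise(row_wise)
overall_average = average_sales(column_wise)
```

The column-wise form needs no placeholder for the unknown 2007 quarters and would not change shape if the business moved from quarters to months, which is the flexibility the slide refers to.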

Recursion

• When one entity occurrence can be related to another occurrence of the same entity type
• Three variations – 1:1, 1:M, M:M
• Recursion and generalization often go together

[Diagram: Division, Department, and Section are generalized into a single Organization Unit entity.]

Recursion - recognizing the data structure

The name on the M:M (network) relationship could be more descriptive:


• contains / contained in
• precedes / follows
• substitute with / substitute for

Drawing out examples (the fourth “D” in data modeling) will always help
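
A minimal sketch of the 1:M (hierarchy) variation, assuming a generalized Organization Unit entity whose occurrences each point to at most one parent occurrence. The identifiers and unit names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class OrganizationUnit:
    unit_id: str
    name: str
    unit_type: str                   # e.g., Division, Department, Section
    parent_id: Optional[str] = None  # recursive relationship to the same entity


def path_to_root(units: Dict[str, OrganizationUnit], unit_id: str) -> List[str]:
    """Follow the recursive relationship upward and return the names passed through."""
    path = []
    current = units[unit_id]
    while True:
        path.append(current.name)
        if current.parent_id is None:
            break
        current = units[current.parent_id]
    return path


units = {
    u.unit_id: u
    for u in [
        OrganizationUnit("D1", "Operations", "Division"),
        OrganizationUnit("P1", "Logistics", "Department", parent_id="D1"),
        OrganizationUnit("S1", "Receiving", "Section", parent_id="P1"),
    ]
}
receiving_path = path_to_root(units, "S1")
```

An M:M (network) variation would replace the single `parent_id` with a separate associative entity holding pairs of unit identifiers.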

Supertypes and subtypes

[Diagram: Job is the supertype (all jobs); Management Job and Bargaining Unit Job are its subtypes.]

• Supertype: Job (Job Title, Creation Date, Job Type Code)
  – Job describes duties of Employee; Employee performs Job
• Subtype: Management Job (Salary Amount)
• Subtype: Bargaining Unit Job (Hourly Wage Amt, Confidential Flag)
  – Bargaining Unit Job requires Certification; Certification is required for Bargaining Unit Job (only B.U. jobs)

• Breaks an entity down into two or more 'subtypes', or generalizes two or more into a single 'supertype'
  – common relationships and attributes go into supertype
  – unique relationships and attributes go into subtype
• subtypes are mutually exclusive and mandatory – there is exactly one subtype instance for each supertype
• a.k.a. generalization-specialization, or gen-spec
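
A minimal sketch of the Job example using Python classes, where a class hierarchy stands in for the supertype and subtypes. The type code values ("MGMT", "BU") and the check in `__post_init__` are assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Job:
    """Supertype: attributes common to all jobs."""
    job_title: str
    creation_date: date

    def __post_init__(self) -> None:
        # Mandatory subtyping: a Job must be exactly one of the subtypes.
        if type(self) is Job:
            raise TypeError("Create a ManagementJob or a BargainingUnitJob")


@dataclass
class ManagementJob(Job):
    """Subtype: attributes unique to management jobs."""
    salary_amount: float
    job_type_code: str = field(default="MGMT", init=False)


@dataclass
class BargainingUnitJob(Job):
    """Subtype: attributes and relationships unique to B.U. jobs."""
    hourly_wage_amt: float
    confidential_flag: bool
    job_type_code: str = field(default="BU", init=False)
    required_certifications: List[str] = field(default_factory=list)


manager = ManagementJob("Plant Manager", date(2010, 1, 1), 90000.0)
welder = BargainingUnitJob(
    "Welder", date(2010, 1, 1), 32.5, False,
    required_certifications=["Welding Ticket"],
)
```

Because each object has exactly one class, the subtypes are mutually exclusive by construction, and the Certification relationship exists only where the model says it should.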

Generalization vs. subtyping

• “Generalization” is the usual bottom-up O-O term; “subtyping” is the usual top-down E-R term
• Generalize whenever two or more entities, each with their own distinct attributes and relationships, also share other attributes and relationships
• Automobile, Aircraft, and Vessel have common attributes that could be generalized into Vehicle…
• …or, Vehicle could be sub-typed into Automobile, Aircraft, and Vessel, with the same outcome
• Note that it’s common for a subtyped entity to also be classified by a type or reference entity

Facilitation – models are built in “sessions.” Why?

1 - The plan: orderly one-on-one interviews

2 - The reality: "the analyst as messenger"
Disadvantages:
• longer elapsed time
• incompleteness
• encourages parochialism
• no real communication or consensus

3 - The response: facilitated sessions
Advantages:
• speed and quality
• commitment
• communication, team building
• business understanding

Should I always use facilitated sessions?

Conceptual Data Models
• up to 8 or 10 content experts
  – cross-functional
  – mid to senior level
• up to 3 or 4 analysts
  – facilitator, analyst, …
• up to 3 or 4 technical experts - architect, DBA, developer, ...
• Focus is agreeing on concepts, terminology, rules
• Sessions are essential!

Logical Data Models
• multiple, smaller groups of content experts, or individuals
  – specialists
  – managers or supervisors
  – “front line” contributors
• small number of IT specialists (or just one) – analyst, DBA, developer, …
• Focus might be on Process or Application Requirements
• Sessions are less suitable!

Key point! - Conceptual and Logical data modeling require substantially different skill sets.

Conceptual model to support “Fill Order” process will involve cross-functional reps

May separate into multiple logical modeling sessions for


• Customer Relationship piece
• Sales
• Manufacturing Planning and Manufacturing
• Logistics
• Accounts Receivable

Facilities requirements

The facilities really do influence session results...
• comfortable, roomy, and away from work area
• wide U-shaped layout
• lots of whiteboard space and “plain” wall space

[Diagram: room layout showing a whiteboard with a flipchart on each side, the facilitator’s supplies, participant seating, and refreshments, etc.]

• Room for everyone to work on the wall
• No empty seats – “energy holes”
• Don’t forget flipchart pens, whiteboard pens, “wall safe” masking tape, flipchart stands & paper, rolls of plotter paper or butcher paper, Post-its, rubber bands, note paper, …

As an alternative to the U shape, you might have “rounds” of 4 or 5 people each

Attitude – “I’m here to do a job, not work a miracle”

Everyone has a job to do - don’t try to be Atlas!

Sponsor
• Confirm scope and objectives
• Determine and “invite” participants
• Arrange other resources
• Resolve difficult decisions

Facilitator
DO -
• Help develop objectives and plan
• Enforce rules & plan
• Maintain focus on topic
• Press for completion and quality
• Help everyone participate
• Ensure recording
DON’T -
• Develop content
• Push a point of view

Participant
• Participate!
• Provide information
• Suggest ideas
• Make decisions

The world’s shortest course on facilitation

What You Do                                      What They'll Do
Write something up                               Tell you if it's wrong
Watch facial expressions, and ask                Appreciate the opportunity
Find areas of agreement                          Take care of the disagreement
Use alternate forms of information               Build a better product
Take time to think, and use the group            Use the time too, and generate the way forward
Remember your role – facilitate, not participate Do their job – you stick to yours
Acknowledge what is                              Deal with it

a) - Getting started bottom-up

Don’t begin with a lecture on data modeling

“Before we begin our data modeling session, let’s go over some key points about data modeling. First, an Entity is any uniquely identifiable person, place, thing, event, concept, or organization of interest to the enterprise about which facts may be recorded. Any questions? I didn’t think so…”

“Before I begin my speech, let’s cover a few of the basic rules of grammar. A noun is any... ”

Avoid starting with the theory and practice…
• Data modeling sessions go better
• Allows use of data modeling in non-typical situations
• If you can get away with it, don’t even call it “data modeling”

Why not?
• “Purple monkey water wrench” – a phrase I saw in an article making the point that our IT terms
(foreign key, referential integrity, cardinality, …) aren’t any clearer to the client
• May lead to boredom and mental shutdown
• May lead to resentment and non-participation
• It’s unnecessary! Some things are easier to just do. Coaching basketball - initially, by example.

Non-typical situations
• Goal Setting and Planning
• BPx
• Package Evaluation and Selection

Do begin with a brainstorm

When in doubt - make a list!

CoRSE: The Facilitator’s Friend
1. Collect (Brainstorm) – turns a problem or question into lots of suggestions
2. Reduce (eliminate, cluster…) – leaves a selected set of answers or points
3. Sequence (dependency, priority, …) – gives an organized set of points or topics
4. Expand – produces a useful result

Not always, but it’s a good default
Gets everyone involved easily, and level-sets (“role induction”)
If your data model isn’t going to start with brainstorming, maybe do a “venting” brainstorm.

“CoRSE”: the specifics

Collect (Brainstorm)
• State problem or question
• Going clockwise (fast) everyone makes one suggestion
  – “pass” if nothing to add
  – “pure” brainstorming is random, not “in turn”
• Stop when everyone 'passing', or agreement to stop, or time’s up
• Record without editorializing
  – might ask for short phrase
  – might paraphrase for confirmation
• Keep it moving, enforce rules
• No discussion
  – quick clarification or positive comments okay
  – absolutely no negative commentary

Reduce
• Eliminate: redundant, out of scope, …
• Cluster
• Select

Sequence
• Goal: workable sequence
• By dependency, chronology, priority, …
• Not permanent – just to organize the session

Expand
• Collect more info: define, alternatives, pro/con, …
• Apply CoRSE on each item

Applying CoRSE to starting a model

1. Collect – Brainstorm…
   • For “anything related to data or information in any way, shape, or form” (e.g., things of interest, information needs, facts, queries, calculations, reports, etc.) Or, simply gather nouns.
2. Reduce – For each item, ask “Is this a thing, a fact about a thing, or other stuff?”
   • Circle things
   • Cluster facts around the appropriate thing
   • Other stuff will include reports, forms, systems, departments, processes, etc. – use these as clues for more things and facts about things
3. Sequence – Choose the fundamental terms
   • Kernels, then their dependents
4. Expand – Entity definitions and major attributes
   • Focus on anomalies and “likely sources of confusion”
   • Don’t worry about normalization, generalization, keys, …

Accessibility – no jargon! Again – this is “role induction”

“Fact about a thing” – attributes or relationships. Don’t worry about keys!!! (or normalization or
atomic attributes or generalization or ANY of that stuff)

b) - Getting started top-down

“Draw five boxes. Any five boxes.”

Stockroom Item Supplier

Inventory Availability
&
Agreements

Intake Assessment Diagnosis Treatment Service


Planning

Quotation Booking Confirmation Amendment Flight


& Ticketing


At this point, these could be subject areas, activities, states, … - it doesn’t matter!

Working with the “big picture”
Sources:
 Review “artifacts” such as
• input formats (screens, web pages, forms…)
• output formats (reports, queries…)
• training materials or periodicals on the topic
• other written documentation
• again, search for nouns and verbs

What to do with the five boxes:


 Have clients describe what they need to know about each “box,” or
what they do, or what the problems are… Just keep listening for
and noting:
• nouns – possible entities
• verbs – possible relationships and processes
• rules – constraints
• issues (problems) and opportunities

Presentations – it’s a story, so storyboard it
Example storyboard (five bubbles):
1. Business issues have changed
2. Therefore systems have changed
3. So methods for building have changed
4. Making these changes will be difficult
5. But it is vital to our survival
Details (allocated to bubbles): Operational to informational; Distributed, component-based; Cross-functional

 Building the storyboard
1. Draw 5 "bubbles"
2. Fill in the last (your "closer" - the purpose)
3. Fill in the first (your "hook")
4. Fill in the middle ones (the "body") – add or subtract bubbles as needed
5. Allocate details to bubbles
6. Iterate until it flows and builds properly
Only include detail that matters!

Used to evaluate merit and sequence

Presentation should flow like a story


• does it make sense?
• does it build to the conclusion?

How not to present a data model

Our models should aid understanding by:
 Using visual cues consistently
 Having a starting point and direction
 Abstracting
 Masking unnecessary detail
 Highlighting what matters

“Let’s start here with Special Tax Rate Variation Comment Type…”

Presenting data models
Start simple, and add details in layers…
• begin with two or three fundamental things
• work “across” the model, not a “deep dive” in one area
• draw the model on a whiteboard as you speak to it
• save detail like optionality until later, and primary/foreign keys until much later

Speak exclusively in the language of the business
• don’t use terms like “entity”, “optionality”, etc.
• point to the relevant entity while addressing a concept

Back it up with sample data, queries, and scenarios

Identify specific business issues or opportunities, and show how the data model helps

We’ll now walk through a successful data model presentation, followed by discussion of key points

Presenting – some specifics
 Draw it on a whiteboard while you present it,
even if you have a laptop presentation.
“If it’s too complicated to draw,
it’s too complicated to present.”
 Draw it top down, adding a few entities at a time.
 Constantly illustrate the model with sample instances,
definitions, schematics, etc.
 Regularly highlight features and constraints of the model,
in business terms.
E.g.,
Currently we can allocate a Product to one Product
Category, but this model enables us to allocate a Product to
multiple Product Categories at a time, and to record
changes in categorization over time.
 Encourage participation –
the more questions and comments, the better!

The five techniques that really matter

Technique / Why? / How?

1 Organize their minds to receive the presentation
Why?
• Otherwise, you're just "noise"
• "Why is this person telling me these things?"
How?
• "Here's the point I want to make."
• "This is why you care, and how I know." (even if it's obvious)
• "These are the caveats and limitations."
• "This is how I'll make my point." (storyboard!)

2 Big picture first
Why?
• Provides context and perspective
• Makes subsequent detail understandable
How?
• Show contextual data model first, build up detailed models later
• Process context first, process flow later
• Describe 5 problem areas first, specifics of each area later

3 Do it live
Why?
• Focuses, demands that they watch
• Involves them / you
• It means 'attending has value'
How?
• Use memory triggers, not a script
• Build up content progressively on white board, flip chart, or screen
• Add brainstorming, discussion, or questions
• Have them physically “do stuff”

4 Present information in various forms
Why?
• Adds interest
• Different forms have different strengths
How?
• Supplement PowerPoint slides with flip charts, white boards, Post-Its, handouts, etc.
• Use props – the thing itself, not a description
• Use visual, auditory, and kinesthetic approaches

5 Show, then tell
Why?
• Point is more meaningful if experienced firsthand
• Saves time, simplifies
How?
• Scenario / example first, then concept / abstraction
• Problem first, solution second
• Thing first, description / discussion second

Data Modeling in context with other BA techniques

Framework Layer / What it covers… / The Technique

Goals: Business Objectives
What it covers: The mission, strategies (customers / markets, products / services, differentiators), goals, objectives, and measures (e.g., Key Performance Indicators) for the organisation. (MSGO – Mission, Strategies, Goals, Objectives)
The Technique: Project Charter

Process: Business Process
What it covers: The activities the business carries out in order to meet its objectives. Includes the actors involved, the sequence of steps they carry out (workflow), and the result(s) produced. Provides context - a framework for developing Use Cases and Service Specifications.
The Technique: Workflow modelling

Application: Presentation Services
What it covers: A mechanism through which an actor in a business process interacts with a system. Usually a GUI (graphical user interface) and reports, but could involve scanners, IVR (telephone) systems, etc.
The Technique: Use Cases

Application: Business Services
What it covers: A “service” offered by a system – a specific function. Includes the business rules and data updates it is responsible for. Requires Event Analysis, State Transition Analysis, etc.
The Technique: Service Specification

Data: Data Management Services
What it covers: Files and databases that provide a system’s record-keeping functions. Determines the things a system “knows” about, and the data that is maintained about those things. Provides a platform - language and structure for developing Use Cases and Services.
The Technique: Data modelling

THIS IS NOT A SEQUENCE!!! There should always be an initial emphasis on defining objectives (the
“top” layer) and also a “scope level” statement of the business processes, application functions, and data
topics / subject areas that are in scope. Also, we always do some “guerrilla” data modelling during
which we at least clarify the primary terms and definitions, and ideally develop at least an initial
conceptual model. After that, you could choose to go through the layers in whatever order makes sense
given the situation.

The benefits:
• Divide and conquer
• Everything in its place
• Cross-validation

Other terms:
• Presentation Services = User Interface
• Business Services = Application Logic or Business Logic
• Data Management Services = Persistence Services

Linkages – top-down and bottom-up
[Figure: the framework layers illustrated top-down with a course enrollment example]

Goals / Business Objectives

Process / Business Process
Workflow with swimlanes for Student, Advisor, Department, and Registrar’s Office. Steps shown: Check Reg Form for data changes; Attach Reg Form and forward; Enroll Student; Print Summary Report.

Application / Presentation Services: Use Case
actor – verb – noun: Advisor Enrolls Student
When advisor enters five characters of Last Name
Then System lists matching Students
When advisor selects list item
Then System displays expanded Student view
When advisor etc.

Application / Business Services: Service
verb – noun: Enroll Student
Steps shown: Verify Student Status; Check Student pre-reqs; Check Section availability; Create Enrollment
Input Message and Output Message fields shown: Student Number, Course ID, Section ID, Result Code

Data / Data Management Services: Entity
noun: Student
Entities shown: Department, Course, Section, Instructor, Student
Relationships shown: offers, teaches, enrolls in
Attributes shown include ID, Number, Name, Rating Code, GPA, Dates, Times, Locations

Each layer interacts with its neighbor.


Not all methodologies address each perspective equally well.
• Information Engineering was weak to non-existent in addressing the business process (workflow)
and presentation (use cases) layers
• Most O-O and RAD/JAD techniques don’t address business process well, if at all

Noun - A thing of interest


• “Customer”
Verb – Noun
• An activity that must be performed (process, sub-process, service, …)
• “Register Customer”
Actor – Verb – Noun
• A Use Case or a step within a workflow model
• The intersection!
• “Sales Rep Registers Customer”

Progressive detail for all analysis techniques
Clariteq business analysis framework

Three levels of detail: Scope (Plan) / Concept (Understand) / Detail (Specify)

Goals: Business Objectives
Project Charter: Starts at “Scope” level, may evolve

Process: Business Process (Workflow modelling)
Scope: Overall Process Map showing target and related processes. Process “framed,” and initial assessment and goals stated.
Concept: As-is (and later, to-be) Workflow Models for the process’ main variations (cases) to the Handoff level.
Detail: As-is Workflow Models to the appropriate detail, and to the Service level for to-be. Optionally, document procedures for manual to-be steps.

Application: Presentation Services (Use Cases)
Scope: List of the main Use Cases in the form: Actor + Service + (optionally) Technology / Platform
Concept: Initial Use Case description (goal, stakeholder interests, and use case abstract) for each Use Case.
Detail: Use Case dialogues at the “clause” (“when-then”) level of detail including alternate sequences. Optionally, Use Case Scenarios.

Application: Business Services (Service Specification)
Scope: List of main Events and corresponding Services.
Concept: Initial Service description - result, main actions, cross-referenced to Conceptual Data Model
Detail: Each service fully documented, including input/output messages, validation, business rules, and data updates to the attribute level.

Data: Data Management Services (Data modelling)
Scope: Contextual Data Model (optional) and a glossary defining the main entities and other important terms.
Concept: Conceptual Data Model showing main entities, relationships, attributes, and constraints
Detail: Fully normalised Logical Data Model with all attributes fully defined and documented.

Three levels of detail for ALL modelling

The reason that the “concept” level is important, and that we don’t dive right into the “detail” level is
that…
the level of precision, rigor, and detail that you need in order to build something
is far greater and different in nature than that which is necessary for the business person to know if
they’re going to like what you build!

Different roles for different perspectives
Note – this is just one possibility for roles.

[Figure: the framework table from the previous slide (Scope / Concept / Detail for each layer), overlaid with roles]
Scope (Plan): Planners, Enterprise Architects, and Business Analysts
Concept (Understand): Business Analysts
Detail (Specify): a Specialist for each technique (Workflow modelling, Use Cases, Service Specification, Data modelling)

On a smaller project, the same person might work on all perspectives at all levels of detail; the larger
the project, the more likely it is that different, specialized roles will be involved.

Other perspectives improve data modeling
Mission, Strategy, Goals, Objectives
 reporting requirements
 EIS, BI, OLAP, etc. needs. Is the data there?
Business Process Workflow
 similar to use of events or services
 inspect each step in the workflow, discuss data needs
 is the necessary data in the data model?
Presentation Services
 develop use cases, describe reports & queries
 is the necessary data in the data model?
Business Services
 describe rules for an event (service)
 is the necessary data in the data model?
Data Management Services
 get some real data, conduct data profiling
 does the data have a home, did profiling uncover “hidden” needs?

Techniques and methodologies
The same techniques are used in different sequences,
with different emphasis, in different methodologies


Always start with


• scope and objectives (Project Charter)
• agreement on a fundamental vocabulary (a little Data Modeling)
Small projects are often best handled “inside-out” and are more suitable for “Agile” techniques
• start by identifying the main objects the system will deal with (Data Modeling)
• then identify the events and services that act on the main objects (Events, Service Specifications,
State Transitions)
• then identify how these Services will be invoked (Use Cases, then overall Process Workflow)
Large projects are best handled “outside in” and aren’t suitable for all Agile techniques
• start with an understanding of the overall workflow and the jobs or departments that are involved
(Process Workflow)

State diagrams

Depicts the allowable states for an entity, the transitions between them, and the rules governing those transitions

The concept
 Events happen
 Whether or not that event is legitimate depends on the current entity state
 If the event is legitimate, one or more entities will be updated and their state may change - a state transition

No other style of diagram depicts so many important aspects of a system without getting unreadable.
A State Diagram encompasses:
• an entity
• events
• entity states
• allowable state transitions (business rules!)
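
One way to make the concept concrete is a lookup keyed by (current state, event). The following is a minimal sketch in Python, not part of the workshop material: the state and event names are borrowed from the workshop's Section example, but the specific from/to pairings, the `TRANSITIONS` table, and the `apply_event` helper are illustrative assumptions.

```python
from enum import Enum


class SectionState(Enum):
    NULL = "null"            # the occurrence does not exist yet
    SCHEDULED = "Scheduled"
    AVAILABLE = "Available"
    CANCELLED = "Cancelled"


# (current state, event) -> resulting state.
# ASSUMPTION: these pairings are illustrative; any pair not listed is not legitimate.
TRANSITIONS = {
    (SectionState.NULL, "Section is scheduled"): SectionState.SCHEDULED,
    (SectionState.SCHEDULED, "Time to open enrollment"): SectionState.AVAILABLE,
    (SectionState.SCHEDULED, "Section is canceled"): SectionState.CANCELLED,
    (SectionState.AVAILABLE, "Section is canceled"): SectionState.CANCELLED,
}


class IllegitimateEvent(Exception):
    """Raised when an event arrives that the current state does not allow."""


def apply_event(state: SectionState, event: str) -> SectionState:
    """Return the new state, or raise if the event is not legitimate in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise IllegitimateEvent(
            f"'{event}' is not allowed in state {state.value}"
        ) from None
```

Because an entity occurrence holds exactly one `SectionState` value at a time, mutual exclusivity of states comes for free; each entry in the table is one business rule.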

The basic pattern

[State diagram for Section]
States, in mainstream order: null (Section) → Scheduled → Available → Filled → Closed → Completed, plus Cancelled
Events: Section is scheduled; Time to open enrollment; Student enrolls; Student drops/transfers; Time to finalize rosters; Time to end term; Section is canceled

Starts with an entity occurrence in the null state. Leaves when the occurrence is created
States are entered and left in response to events. All states "matter", and are mutually exclusive
Eventually, states are entered where no further update is possible

Key Point
• The diagram is linear or circular

All entity state diagrams begin with the entity in the null state, and the first event is always something
that causes the creation of the entity occurrence.
An entity can be in one and only one state at a time - states are mutually exclusive. The most common
error when people are learning this technique is to come up with “overlapping” states.
It’s common to return to the null state if the entity occurrence is deleted, although this example doesn’t
show it (the Registrar saves everything!).
All states “matter” in the sense that the only reason for a state to exist is to enforce a business rule. For
instance, it appears that Students can’t drop or transfer once the Class is “Closed”, and the Class can’t
be cancelled. If these rules weren’t in place, we wouldn’t need the state “Closed”.

Note that this example is different from the one on the previous page, even though they’re for the same
entity – the reason: different business rules.

Why bother?

- Business Perspective -
 Understandable – participate in important systems decisions
 “See” and assess rules for the first time
 Identify inconsistent or undefined rules

- Systems Perspective -
 Get up-front agreement on the rules that must be enforced at UI (use cases) and Business Services (service specifications)
 Integrates events, services, and data modeling

Key Point
• Clients get started with almost no explanation

… this may seem like extra work, BUT… “pay me now, or pay me later”

The state diagramming technique, in practice, is quite intuitive for clients to pick up. We’ve been at
many sessions where the facilitator drew a simple state diagram on the whiteboard and clients
immediately started discussing and correcting it with no explanation whatsoever of the technique.
It never fails to amaze (and amuse) us how many different versions of “the rules” there are in the
average organization. Naturally, everyone thinks their set of rules is correct, and they are usually
surprised at the alternatives.

Four basic structures

[State diagram for Employment Term, annotated with the four basic structures]
1. null state
2. state
3. state transition
4. event

States: null (Employment Term), Probation, Active, Disability, Inactive
Events: Employee is hired; Probation term is extended; Employee is put on Probation; Employee passes probation; Employee goes on disability; Employee returns from disability; Employment is terminated; Employment Term is Purged

This example is circular, which is less common now – it gets quite awkward.

Can you spot the error?

Components: 1 - The Null State

The entity in a state of non-existence (hasn’t been created yet)
 An occurrence which hasn’t been created

Indicates which entity’s life cycle is depicted
 For a single instance of the entity

The simplest life cycle
[Diagram: entity in null state → Create (birth) → entity exists; Update (pay taxes) leaves it in “entity exists”; Delete (death) returns it to the null state]

In the UML, the state diagram begins with a solid (filled in) circle.

Components: 2 - States
A distinct stage in the life of an entity
 A status or condition
 Events are only valid against particular states
 The only reason a state is created is to enforce a business rule
 States are mutually exclusive

Order: taken → shipped
 An order can’t be cancelled once it has been shipped, so we only need the states “Taken” and “Shipped”

Order: taken → picked → loaded → shipped
 An order can be cancelled without penalty if picked, with penalty if loaded, and not cancelable if shipped
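
The second Order example can be read as a rule table keyed by state. This is a small sketch under stated assumptions: the `CANCEL_PENALTY` table and `cancel_order` function are hypothetical names, and the rule for the “taken” state is an assumption, since the slide only gives the rule for picked, loaded, and shipped.

```python
from enum import Enum


class OrderState(Enum):
    TAKEN = "taken"
    PICKED = "picked"
    LOADED = "loaded"
    SHIPPED = "shipped"


# Cancellation rule per state: False = no penalty, True = penalty applies,
# None = cancellation is not allowed.
# ASSUMPTION: "taken" cancels without penalty; the slide does not say.
CANCEL_PENALTY = {
    OrderState.TAKEN: False,
    OrderState.PICKED: False,
    OrderState.LOADED: True,
    OrderState.SHIPPED: None,
}


def cancel_order(state: OrderState) -> bool:
    """Return whether a penalty applies; raise if the order cannot be cancelled."""
    penalty = CANCEL_PENALTY[state]
    if penalty is None:
        raise ValueError(f"an order in state '{state.value}' cannot be cancelled")
    return penalty
```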

State
 May be determined by inspecting relationships or attribute values
 Usually summarized in a “Status” or “State” attribute

Components: 3 - State Transitions

A change of an entity instance from one state to another

[Diagram: “from” state → event → “to” state]

Key Point
• Visual business rules

Depicts dependencies of entity states
 Shows pre-conditions - which state(s) an event is valid against
 Shows post-conditions - which state(s) result from an event

State transitions - special cases
[State diagram for Class, labelling four kinds of transition: “simple”, “recursive”, bifurcation, and conjunction]
States: null (Class), Enrolling, Filled, Cancelled, Completed
Events: Schedule Class; Enroll Student; Cancel Class; Complete Class; Purge Class

- Bifurcation -
 From a given state an event can have different outcomes (A → B or A → C)

- Conjunction -
 An event may be valid from multiple states with the same resultant state (A → C and B → C)

Bifurcation often occurs at “boundary condition” of repetitive operation.


e.g., Enrollment is completed until class is full.
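
The boundary condition can be sketched in code as a guard on the event. This is an illustrative sketch only: the `ClassOffering` type and its `capacity` and `enrolled` attributes are assumptions introduced for the example, not part of the workshop's model.

```python
from dataclasses import dataclass


@dataclass
class ClassOffering:
    capacity: int          # hypothetical attribute: maximum number of students
    enrolled: int = 0
    state: str = "Enrolling"


def enroll_student(offering: ClassOffering) -> str:
    """Apply the 'Enroll Student' event and return the resulting state."""
    if offering.state != "Enrolling":
        raise ValueError(f"cannot enroll while class is {offering.state}")
    offering.enrolled += 1
    # Bifurcation: the same event from the same state has two possible outcomes.
    if offering.enrolled >= offering.capacity:
        offering.state = "Filled"       # boundary condition reached
    # Otherwise the class stays in "Enrolling" (the "recursive" transition).
    return offering.state
```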

Components: 4 - events / services

Events or services can be shown as the cause of the state transition

[Two versions of the Class state diagram, with states Class (null), Enrolling, Filled, Cancelled, Completed]
Labelled with events: Class is scheduled; Enrollment is completed; Class is canceled; Class is completed
Labelled with services: Schedule Class; Complete Enrollment; Cancel Class; Complete Class

Key Point
• You can show events or services or both

State analysis is an ideal “bottom-up” means of discovering additional services

Guidelines – a diagram for each entity?

 Perform state analysis for all Kernel and major Associative entities
 Subtypes may all be covered by the Supertype’s life cycle
 Subtypes may each have their own unique life cycle
 Type and minor Characteristic entities with a simple “Create-Update-Delete” life cycle may not warrant a diagram

[Diagram entities: Client (subtypes Group, Individual), Policy (subtypes Home, Auto, Marine), Policy Type, Prior Claim, Address]

Guidelines – an event can affect multiple entities
[Diagram: Class, Student, and Enrollment entities, with the related life cycles of Class (Schedule Class, Enrolling, Complete Enrollment, Filled) and Enrollment (Complete Enrollment, Active)]
“Enrollment is completed” is constrained by the state of a parent entity… and also causes a state change in its parent’s life cycle

Key Point
• Start ST analysis at the “bottom” – with entities that have no dependents

 An event affecting a characteristic or associative entity is often constrained by a parent’s state (and vice versa, less often)
 An event changing the state of an entity may also cause a state change in parent or child entity

Class and Enrollment each have their own life cycles, but they are related
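
The relationship between the two life cycles can be sketched as one event handler that reads the parent's state and may update it. This is a minimal sketch with assumed names: `SchoolClass`, `Enrollment`, the `capacity` attribute, and `complete_enrollment` are hypothetical, and the rule that a class becomes Filled at capacity is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SchoolClass:
    capacity: int                  # hypothetical attribute
    active_enrollments: int = 0
    state: str = "Enrolling"


@dataclass
class Enrollment:
    state: str = "null"            # not yet created


def complete_enrollment(parent: SchoolClass, enrollment: Enrollment) -> None:
    """Apply 'Enrollment is completed' to a child Enrollment and its parent Class."""
    # The event on the child is constrained by the parent's state...
    if parent.state != "Enrolling":
        raise ValueError(f"class is {parent.state}; enrollment cannot be completed")
    if enrollment.state != "null":
        raise ValueError("enrollment already exists")
    enrollment.state = "Active"
    parent.active_enrollments += 1
    # ...and may also cause a state change in the parent's life cycle.
    if parent.active_enrollments >= parent.capacity:
        parent.state = "Filled"
```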

Building a state diagram

1 - First Cut -
 Get event list for entity
 Brainstorm for valid states
 Select “mainstream” states.
 Start at null state, then select initial state from list
 Ask “What typically happens next?”, and select next state
 Continue until initial State Diagram is done
Key Point
• Mainstream first, exceptions later
• “Bottom up” - dependents first, parents later

2 - Refine -
 Ensure that states are mutually exclusive
 Identify the event for each state transition
 Ask “Can it cause transitions to or from other states?” (e.g., conjunction or bifurcation)
 Check each event to see if it is constrained by or affects the state of parent or child entities
Key Point
• Extremely iterative within and between state diagrams

3 - Complete -
 Add remaining “non-mainstream” states or events
 If sub-types are involved, check whether the state diagram works for all sub-types
 Check each event against each state
 Eliminate unused states & events as appropriate
Key Point
• Lots of detailed cross checking

A checklist for state analysis

1 Every state must matter
 Recognizable to business people
 Restricts operations in some unique way

2 All states must be mutually exclusive

3 Each event is “essential”
 e.g., “Enrollment is completed” (what) not “Student enrolls via web” (who and how)

4 Start with the “most dependent” entity (bottom of the data model) to guard against “overloading” life cycles

5 All states (including parent and child entities) checked against each event

6 Mainstream first… exceptions later!
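
Item 5 lends itself to a mechanical cross-check. The sketch below lists every (state, event) pair that has neither a recorded transition nor an explicit decision that the pair is invalid. The function name, the sample states and events, and the idea of a separate `rejected` set are all assumptions made for illustration.

```python
def unchecked_pairs(states, events, transitions, rejected):
    """List (state, event) pairs with neither a transition nor an explicit rejection."""
    return [
        (s, e)
        for s in states
        for e in events
        if (s, e) not in transitions and (s, e) not in rejected
    ]


# Hypothetical sample data for a Class life cycle.
states = ["null", "Enrolling", "Filled"]
events = ["Schedule Class", "Enroll Student"]
transitions = {
    ("null", "Schedule Class"): "Enrolling",
    ("Enrolling", "Enroll Student"): "Enrolling",
}
# Pairs that have been reviewed and judged invalid.
rejected = {("Enrolling", "Schedule Class"), ("null", "Enroll Student")}

missing = unchecked_pairs(states, events, transitions, rejected)
```

Here `missing` holds the two pairs for the “Filled” state, which nobody has reviewed yet.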

Update service specs

1 Create new services for any newly-discovered events.

2 For each service, build a “state table” summarizing “from” and “to” states for each entity impacted by the event.

3 Refine validation, calculations, and updates in service documentation. Optionally, describe logic with a UML Activity Diagram or other format.

Entity                 State Before          State After
Student                Registered            Registered
Enrollment (“from”)    Active                Ended
Class (“from”)         Filled, Available     Available
Enrollment (“to”)      Null                  Active
Class (“to”)           Available             Available, Filled
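
A state table like this one can double as a precondition check inside the service. The sketch below encodes the rows as data; the `STATE_TABLE` structure and the `precondition_failures` helper are hypothetical, and the service the table belongs to is not named in the slide.

```python
# Entity role -> (allowed "before" states, possible "after" states),
# transcribed from the state table on the slide.
STATE_TABLE = {
    "Student":           ({"Registered"},          {"Registered"}),
    "Enrollment (from)": ({"Active"},              {"Ended"}),
    "Class (from)":      ({"Filled", "Available"}, {"Available"}),
    "Enrollment (to)":   ({"Null"},                {"Active"}),
    "Class (to)":        ({"Available"},           {"Available", "Filled"}),
}


def precondition_failures(current_states: dict) -> list:
    """Return the entity roles whose current state is not an allowed 'before' state."""
    return [
        role
        for role, (before, _after) in STATE_TABLE.items()
        if current_states.get(role) not in before
    ]
```

An empty result means the event is legitimate for every entity it touches; a non-empty result names the entities that block it.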

Exercise: Handling state transitions

1) Design a generalized data model to record valid state transitions. If a particular response is required (such as an error message) when an invalid event arrives, be sure to handle that as well.

2) (Optional) It can provide useful analytic information to maintain a history of state changes for the instances of important entities. For example, in the actual project that the stock exchange exercise earlier in the course was based on, it was useful to have a history of state changes for the “Listing” and “Trade Order” entities. Develop a data model to record a history of state changes for an entity like “Listing”

Solution - Valid state transitions
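
The following is one possible sketch of such a model, expressed as SQLite tables created from Python. It is not necessarily the workshop's solution: every table and column name is an assumption, the “Listing” states and events are invented sample data, and a surrogate `rule_id` is used so that a bifurcating event can have more than one rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE entity_state (
    entity_name TEXT NOT NULL,
    state_name  TEXT NOT NULL,
    PRIMARY KEY (entity_name, state_name)
);
CREATE TABLE event (
    event_name TEXT PRIMARY KEY
);
-- A valid rule names a to_state; an invalid one names the response to give.
CREATE TABLE state_transition_rule (
    rule_id       INTEGER PRIMARY KEY,
    entity_name   TEXT NOT NULL,
    from_state    TEXT NOT NULL,
    event_name    TEXT NOT NULL REFERENCES event(event_name),
    to_state      TEXT,
    error_message TEXT,
    FOREIGN KEY (entity_name, from_state)
        REFERENCES entity_state(entity_name, state_name),
    FOREIGN KEY (entity_name, to_state)
        REFERENCES entity_state(entity_name, state_name),
    CHECK ((to_state IS NULL) <> (error_message IS NULL))
);
-- Part 2: history of state changes for one instance of an entity.
CREATE TABLE state_history (
    history_id  INTEGER PRIMARY KEY,
    entity_name TEXT NOT NULL,
    instance_id INTEGER NOT NULL,
    changed_at  TEXT NOT NULL,
    event_name  TEXT NOT NULL REFERENCES event(event_name),
    from_state  TEXT NOT NULL,
    to_state    TEXT NOT NULL
);
""")

# Invented sample data for a hypothetical "Listing" life cycle.
conn.executemany(
    "INSERT INTO entity_state VALUES (?, ?)",
    [("Listing", "Null"), ("Listing", "Active"), ("Listing", "Suspended")],
)
conn.executemany(
    "INSERT INTO event VALUES (?)",
    [("Approve Listing",), ("Suspend Listing",)],
)
conn.executemany(
    "INSERT INTO state_transition_rule "
    "(entity_name, from_state, event_name, to_state, error_message) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Listing", "Null", "Approve Listing", "Active", None),
        ("Listing", "Active", "Suspend Listing", "Suspended", None),
        ("Listing", "Suspended", "Suspend Listing", None,
         "Listing is already suspended"),
    ],
)


def lookup(entity, state, event):
    """Return (to_state, error_message) for an event arriving in a given state."""
    row = conn.execute(
        "SELECT to_state, error_message FROM state_transition_rule "
        "WHERE entity_name = ? AND from_state = ? AND event_name = ?",
        (entity, state, event),
    ).fetchone()
    return row if row else (None, "No rule recorded for this state and event")
```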

A few additional slides

I’ve added a few slides from our introductory Data Modeling workshop covering:
- Attribute naming with classwords
- Some conventions for assigning meaningless (surrogate) primary keys
- Checking for transitivity
These are some of the topics that often require clarification during the Advanced Data Modeling workshop.

Apply attribute naming conventions
Naming format: entity name (implied) + optional qualifiers + classword
Class Word (Abbrev.): Description
Amount (AMT): Dollars and cents, or other currency (e.g., Penalty Assessed Amount)
Code (CDE): Decodes into a name and/or description via lookup (e.g., Vehicle Type Code)
Constant (CNS): A fixed value, usually numeric (e.g., Pi Constant – 3.1415…)
Count (CNT): Like Quantity, but specifically for a quantity of items (e.g., Requested Count or On Hand Count)
Description (DSC): Multi-line descriptive text (e.g., Incident Description)
Date (DTE): YYYY/MM/DD (e.g., Incident Date)
Identifier (ID or IDN): Attribute that uniquely identifies an entity occurrence, usually system-generated (e.g., Customer ID)
Indicator or Flag (IND or FLG): Yes/No (True/False) attribute (e.g., Time Period Available Flag)
Name (NME): Single line of name text (e.g., First Name or Last Name)
Number (NMB): A unique identifier assigned by an organization (e.g., Driver License Number)
Secondary ID (SID): Forms a unique identifier when combined with identifiers inherited from the parent (e.g., Dependent SID)
Percent (PCT): Integer or number percentage (e.g., Penalty Percent)
Quantity (QTY): A count of anything – either items (like Count) or of a unit of measure like gallons or feet. (e.g., Maximum Width Feet Quantity) Variations are Volume (VOL), Length (LNG), or Area (ARE)
Rate (RTE): A ratio using defined numerator and denominator (Percent is a Rate attribute with a numerator of 100) (e.g., ???)
Text (TXT): Multi-line alphanumeric data other than Name or Description (e.g., Standard Disclaimer Text)
Time (TME): HHMMSSNN… to the needed fraction of a second (e.g., Incident Time)
Timestamp (TMS): Date and time in a single attribute (e.g., Record Creation Timestamp)
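
A naming convention like this is easy to check mechanically. The sketch below tests whether an attribute name ends in an approved classword; the `CLASSWORDS` dictionary holds a subset of the classword table, and the `classword_of` helper is a hypothetical name introduced for the example.

```python
# Classword -> abbreviation, a subset of the classword table.
CLASSWORDS = {
    "Amount": "AMT", "Code": "CDE", "Count": "CNT", "Date": "DTE",
    "Description": "DSC", "Flag": "FLG", "ID": "ID", "Name": "NME",
    "Number": "NMB", "Percent": "PCT", "Quantity": "QTY", "Text": "TXT",
    "Time": "TME", "Timestamp": "TMS",
}


def classword_of(attribute_name: str):
    """Return the classword that ends the attribute name, or None if there is none."""
    words = attribute_name.split()
    if not words:
        return None
    last_word = words[-1]
    return last_word if last_word in CLASSWORDS else None
```

A name such as “Customer Surname” returns `None`, which flags it for review against the convention.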

A variety of naming formats are in general use; mixed case with words separated by blanks (e.g.,
“Effective Date”) is the most readable.
There are certain date-related attributes that will occur many times in all models, such as “Effective
Date”, “End Date”, “Create Date”, “Superseded Date”. Agree on standard names (e.g., choose
“Effective Date”, “Start Date”, or “Begin Date”) and then use them consistently.
Attribute definition should explain the meaning and purpose of the attribute - in other words, how to
interpret attribute values. Not:
• … a restatement of the attribute name. For instance, for “Person Social Security Number”, the
definition “The Social Security Number of a Person” tells us nothing new. A better definition
would be “ A number issued to wage earners by the Social Security Administration for the purpose
of crediting employees with contributions to future retirement pay as stipulated in the Federal
Insurance Contributions Act.”
• … a description of how the attribute is handled by current systems. For instance, “Budget Center
Code is an 11 character code captured in the GL system and assigned to a Department.”

Primary keys – essential concepts

What they are…
 One or more attributes with a unique value for each instance of an entity
 There might be many identifiers - one is chosen as the primary identifier, the rest are alternate
 A way to reference an instance of an entity (e.g., a row of a table)
 Used to establish relationships between entities (or tables)

What they’re not…
 The only access or search path
 The fundamental way the business distinguishes:
• one instance from another
• a new instance from existing (e.g., Customer applying for credit)

In short, how we relate entities is not necessarily how the client distinguishes or accesses them

Customer: Possible keys:
• Customer Name + Postal Code
• Sales Region + Customer Number
• Account Number
Part: Possible keys:
• Part Category + Manufacturer Prod #
Employee: Possible keys:
• SIN or SSN
• Name + Address
• Name + Birthdate
• Portrait + Voice
Reservation: Possible key:
• Room Number + Start Date

Assigning primary and foreign keys is really part of physical database design, but the concepts are
important so we’ll cover them here.
As modelers, we should focus initially on how the client establishes the uniqueness of
entities, and how they search for particular instances.
What’s wrong with the possible primary keys shown above?
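One way to start probing the candidate keys above is to test the first requirement, uniqueness, against sample data. The following is a minimal sketch only: the helper name `is_unique_key` and the customer rows are invented for illustration and do not come from the workshop material.

```python
from collections import Counter


def is_unique_key(rows, key_attributes):
    """Return True if the combination of key_attributes has a distinct value in every row."""
    counts = Counter(tuple(row[attr] for attr in key_attributes) for row in rows)
    return all(count == 1 for count in counts.values())


# Hypothetical sample data: two different customers share a name and a postal code.
customers = [
    {"customer_id": 1, "name": "J. Smith", "postal_code": "V7V 1A1"},
    {"customer_id": 2, "name": "J. Smith", "postal_code": "V7V 1A1"},
    {"customer_id": 3, "name": "A. Jones", "postal_code": "V6B 2K4"},
]

name_plus_postal_ok = is_unique_key(customers, ["name", "postal_code"])
customer_id_ok = is_unique_key(customers, ["customer_id"])
```

Passing this check on sample data does not make a key suitable. It only rules out candidates that already fail; stability and availability still have to be judged with the client.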

Meaningless primary keys

Essential characteristics:
• stable (unchanging)
  – under your control
  – contains no meaningful data, because it will eventually change
    (and no “special values” like Customer Number 9999999)
  – 'key hierarchy' is unchanging when an inherited key is used as part of identifier
• available
  – known, or can be assigned, at instance creation

Almost invariably, this eliminates any choice except keys made up from
meaningless, system-generated ID or Secondary ID (SID) components.

Customer: Customer ID is better than…
• Customer Name + Postal Code
• Sales Region + Customer Number
• Account Number

Part: Part ID is better than…
• Part Category + Manufacturer Prod #

Employee: Employee ID is better than…
• SIN
• Name + Address
• Name + Birthdate
• Portrait + Voice

Reservation: Reservation ID is better than…
• Room Number + Start Date

Key problems:
• embedded meaning
– Customer 99999
– Customer ID with Head Office Region Code built in
• insufficient for expansion
– 1 digit code field
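The stability argument can be illustrated with a small experiment. This is a sketch under stated assumptions, not part of the workshop material: it uses Python's built-in `sqlite3` module, and the table names (`customer_nk`, `order_nk`, `customer_id_based`, `order_id_based`) and sample values are hypothetical. One design keys Customer on Customer Name + Postal Code, the other on a meaningless Customer ID.

```python
import sqlite3


def build_db():
    """Create both designs in an in-memory database with one customer and one order each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.executescript("""
        -- Design 1: meaningful key (Customer Name + Postal Code)
        CREATE TABLE customer_nk (
            name        TEXT NOT NULL,
            postal_code TEXT NOT NULL,
            PRIMARY KEY (name, postal_code)
        );
        CREATE TABLE order_nk (
            order_id    INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            postal_code TEXT NOT NULL,
            FOREIGN KEY (name, postal_code) REFERENCES customer_nk (name, postal_code)
        );
        -- Design 2: meaningless, system-generated Customer ID
        CREATE TABLE customer_id_based (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            postal_code TEXT NOT NULL
        );
        CREATE TABLE order_id_based (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer_id_based (customer_id)
        );
        INSERT INTO customer_nk VALUES ('J. Smith', 'V7V 1A1');
        INSERT INTO order_nk VALUES (1, 'J. Smith', 'V7V 1A1');
        INSERT INTO customer_id_based VALUES (1, 'J. Smith', 'V7V 1A1');
        INSERT INTO order_id_based VALUES (1, 1);
    """)
    return conn


def customer_moves(conn, customer_table):
    """Change the customer's postal code; return True if the database accepts the change."""
    try:
        conn.execute(
            f"UPDATE {customer_table} SET postal_code = 'V6B 2K4' WHERE name = 'J. Smith'"
        )
        return True
    except sqlite3.IntegrityError:
        # The old key value is still referenced by an order row.
        return False
```

With the meaningful key, an ordinary change of address is rejected because an order still references the old key value; the design would need cascading updates to every referencing table. With the meaningless ID, the same change touches one row and no relationships.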

Keys - summary
[Diagram: example data model annotated to show how keys are displayed]

Entities (key attributes listed first):
• Organization Unit - key: Org. Unit ID
• Job - key: Job Code (PK); other attributes: Title, Description
• Building - key: Building ID; other attributes: Name, Address
• Position - key: Org. Unit ID + Position SID; other attributes: Building ID (FK), Job Code (FK)
• Employee - key: Employee ID; other attributes: Name, Address, Birth Date, Gov’t ID Number
• Employee Dependent - key: Employee ID + Emp. Dep. SID; other attributes: Name,
  Relationship Code, Birth Date

Relationships:
• Organization Unit contains Position / Position is contained in Organization Unit
• Building is the location of Position / Position is located at Building
• Job classifies Position / Position is classified by Job
• Position is filled by Employee / Employee is assigned to Position

Callouts:
• The Primary Key is shown above the dashed line.
• “Job Code (PK)” is an alternate method of showing that the identifier of Job is Job Code.
• Building ID is a foreign key that implements the relationship to Building.
• Gov’t ID Number is an Alternate Key.
• Employee ID is an inherited key that forms part of the primary key of Employee Dependent in
  combination with the SID (Secondary ID). It also acts as a foreign key.

A primary key is:
• A means of specifying a particular instance of an entity
• Typically:
  – Kernel - a system assigned ID
  – Characteristic - the key of the parent plus an SID
  – Associative - the key of all parents, plus an SID if necessary
    (if the same parent instances can be associated multiple times).
    Important associatives are often given their own ID (e.g., Order ID)
  – Reference or Type - a recognizable Code or a meaningless ID

There can be many “candidate” or “alternate” keys, also referred to as “business identifiers” or “natural
keys”:
• for instance, Employee may have a unique Government ID Number, Employee Number, and
System Logon ID
• one of these could be chosen as the Primary Key, if it meets the criteria; otherwise (normally)
assign a system-generated identifier
• the rest are called Alternate Keys or something similar, and must also be unique (put a unique index
on them)
Some methods use a “shorthand” technique for showing inherited keys in associative or characteristic
entities - the relationship via which parent keys are inherited is marked as an “identifying” relationship.
In one technique, an “I” is put across the relationship line, and in another, identifying relationships are
drawn with a solid line, while others (“non-identifying”) are drawn with a dashed line. Normally, we
show the complete, inherited primary key.
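The key roles in these notes can be sketched in a small physical schema. This is an illustrative assumption rather than the workshop's own implementation: it uses Python's built-in `sqlite3` module, the column names are adapted from the Employee and Employee Dependent entities, and the index name `employee_ak1` and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE employee (
        employee_id    INTEGER PRIMARY KEY,   -- kernel entity: system-assigned ID
        name           TEXT NOT NULL,
        govt_id_number TEXT NOT NULL          -- business identifier, kept as an alternate key
    );
    -- Alternate keys must also be unique, so put a unique index on them.
    CREATE UNIQUE INDEX employee_ak1 ON employee (govt_id_number);

    CREATE TABLE employee_dependent (
        employee_id INTEGER NOT NULL REFERENCES employee (employee_id),  -- inherited key, also FK
        emp_dep_sid INTEGER NOT NULL,                                    -- Secondary ID
        name        TEXT NOT NULL,
        PRIMARY KEY (employee_id, emp_dep_sid)
    );
""")


def try_insert(sql, params):
    """Run an INSERT and report whether the key constraints accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


EMP = "INSERT INTO employee VALUES (?, ?, ?)"
DEP = "INSERT INTO employee_dependent VALUES (?, ?, ?)"

first_employee = try_insert(EMP, (1, "Ann", "111-111"))
duplicate_govt_id = try_insert(EMP, (2, "Bob", "111-111"))   # violates the alternate key
second_employee = try_insert(EMP, (2, "Bob", "222-222"))
dependent_a = try_insert(DEP, (1, 1, "Kim"))
same_sid_other_parent = try_insert(DEP, (2, 1, "Lee"))       # SID only needs to be unique per parent
duplicate_dependent_key = try_insert(DEP, (1, 1, "Pat"))     # violates the composite primary key
orphan_dependent = try_insert(DEP, (99, 1, "Sam"))           # no such parent Employee
```

The SID values repeat across parents because the inherited Employee ID supplies the rest of the identifier.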

Key propagation rules


An exception - dependent entities (associative or characteristic) are assigned a meaningless ID if they
can be “transferred” to another parent, or if they are very deep in the hierarchy.
Also, if an associative entity only has one parent (e.g., “Order”, where the connection to the other
parent is via another dependent associative entity) it may get its own meaningless ID. This is often true
of associatives that represent an important transaction and are therefore almost like Kernels, e.g. Order,
Sale, Contract, Shipment, etc.
Note - keys always propagate to the “many” end of the relationship. How would you decide where to
place the foreign key in a fully optional 1:1 relationship?

Whether you show the propagated foreign keys on your diagram, or instead flag relationships as
“identifying” is a matter of personal preference or organizational standards. In this workshop, we’ll
always show the propagated foreign keys.
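The typical key patterns (kernel, characteristic, associative) and the exception noted above can be sketched as a recursive derivation. This is a rough illustration, not the workshop's propagation rules: the `Entity` class, its field names, and the Skill and Employee Skill entities are hypothetical, and the generated attribute names are simplified to "<entity name> ID" and "<entity name> SID".

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    parents: list = field(default_factory=list)  # entities whose keys this entity inherits
    needs_sid: bool = False   # add a Secondary ID to the inherited key
    own_id: bool = False      # exception: dependent entity gets its own meaningless ID


def primary_key(entity):
    """Return the list of attribute names that make up the entity's primary key."""
    if not entity.parents or entity.own_id:
        return [f"{entity.name} ID"]          # kernel, or a dependent given its own ID
    key = []
    for parent in entity.parents:
        for attribute in primary_key(parent):  # inherit each parent's complete key
            if attribute not in key:
                key.append(attribute)
    if entity.needs_sid:
        key.append(f"{entity.name} SID")
    return key


employee = Entity("Employee")
customer = Entity("Customer")
skill = Entity("Skill")                                            # hypothetical kernel
dependent = Entity("Employee Dependent", [employee], needs_sid=True)
employee_skill = Entity("Employee Skill", [employee, skill])       # hypothetical associative
order = Entity("Order", [customer], own_id=True)                   # important associative
```

Here `primary_key(dependent)` yields the parent's key plus an SID, `primary_key(employee_skill)` yields the keys of both parents, and `primary_key(order)` yields only the entity's own ID.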

How far to go?


Both of the alternatives above employ the concept of “meaningless identifiers”, but in different ways:
• the one on the left assigns an ID to kernel entities, while associative and characteristic entities
inherit the ID of their parent(s)
• the one on the right assigns all entities a unique ID

In teams, discuss the relative strengths and weaknesses of the two approaches. Which would you
choose?

Transitivity

• A “loop” (two or more paths between a pair of entities) might indicate a problem:
  – if the two paths record the same information, one of the relationships is redundant
    (a.k.a. “transitivity” or “a transitive relationship”)
  – like redundant attributes, redundant relationships introduce data integrity problems
• Are the two paths between “Order” and “Customer” transitive?
  We can’t tell just by looking…
• The presence of a “loop” (a “cyclic relationship”) is only a clue that there is a problem.
  For proof, we must perform Information Loss Analysis (fancy name, simple method)
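Spotting loops can be mechanised on a larger model. The sketch below assumes the model is available as a list of entity pairs, one per relationship. The function name `find_paths` and the Account entity are hypothetical, since the original diagram is not reproduced here.

```python
def find_paths(relationships, start, end):
    """Return every simple path (a list of entity names) from start to end."""
    neighbours = {}
    for a, b in relationships:
        # Relationships can be traversed in either direction.
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    paths = []

    def walk(node, path):
        if node == end:
            paths.append(path)
            return
        for nxt in neighbours.get(node, []):
            if nxt not in path:          # never revisit an entity within one path
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


# Hypothetical model: Order relates to Customer directly and also via Account.
model = [
    ("Customer", "Order"),
    ("Customer", "Account"),
    ("Account", "Order"),
]
order_to_customer = find_paths(model, "Order", "Customer")
has_loop = len(order_to_customer) >= 2
```

Two or more paths only flag the pair for attention. Whether a relationship is actually redundant is a question about meaning, which the structure alone cannot answer.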
Checking for transitivity

• Check for transitivity using Information Loss Analysis:
  – One at a time, check each relationship in the loop
  – Ask: “Could this relationship be eliminated without losing necessary information?”
  – If “Yes”: the relationship is redundant, and can be removed from the data model
  – If “No”: the relationship is necessary, and remains in the data model
• If the two paths have clearly different meanings, there is probably no redundancy,
  and therefore no need to apply Information Loss Analysis
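Information Loss Analysis is a question put to the business, but sample data can help frame it. The sketch below is an assumption-laden aid, not the method itself: it checks whether each instance of the direct relationship could be recovered by following the other path. The function name `lost_by_removing`, the Account entity, and all sample values are hypothetical.

```python
def lost_by_removing(direct, first_leg, second_leg):
    """Return the instances whose direct link cannot be recovered via the two-step path.

    direct:     e.g. order -> customer (the relationship being questioned)
    first_leg:  e.g. order -> account
    second_leg: e.g. account -> customer
    """
    lost = {}
    for instance, target in direct.items():
        middle = first_leg.get(instance)
        derived = second_leg.get(middle)
        if derived != target:
            lost[instance] = (target, derived)   # (recorded value, value implied by the path)
    return lost


# Hypothetical data: order O2 was placed by customer C2 against an account owned by C1.
order_placed_by = {"O1": "C1", "O2": "C2"}
order_charged_to = {"O1": "A1", "O2": "A1"}
account_owned_by = {"A1": "C1"}

lost = lost_by_removing(order_placed_by, order_charged_to, account_owned_by)
```

A non-empty result is a counterexample: removing the direct relationship would lose who placed order O2, so the answer to the question is “No”. An empty result proves nothing on its own, because the data may simply not yet contain a case where the two paths differ.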

Transitivity - examples
