ADM extract
Alec Sharp
Senior Consultant
Clariteq Systems Consulting Ltd.
West Vancouver, BC, Canada
Mobile – 604 418-3352
asharp@clariteq.com
www.clariteq.com
Proprietary material – please do not distribute! Thanks, Alec
© 2010 Clariteq
contact asharp@clariteq.com
Microblog: www.twitter.com/alecsharp
Data Modeling blog: www.erwin.com/expert_blogs/authors/22/
Alec’s bio:
Alec Sharp, a senior consultant with Clariteq Systems Consulting, has deep expertise in a
rare combination of fields – business process analysis and redesign, application requirements
specification, and data modeling. With almost 30 years of hands-on consulting experience, his
practical approaches and global reputation in model-driven methods have made him a sought-
after resource in locations as diverse as Ireland, Illinois, and India.
He is also a popular conference speaker, mixing content and insight with irreverence and
humour. Among his many top-rated presentations are “The Lost Art of Conceptual Modeling,”
“The Human Side of Data Modeling,” “Crossing the Chasm - From Process Model to IT
Requirements,” and “Getting Traction for Process – What the Experts Forget.”
Alec literally wrote the book on business process modeling – he is the principal author of
“Workflow Modeling: Tools for Process Improvement and Application Development, Second
Edition.” The first edition was published in 2001, and the second edition was published in 2009.
It has consistently been the top-selling title on business process modeling, and is widely used
as a consulting guide and as an MBA textbook.
Alec’s popular workshops on Workflow Process Modeling, Data Modeling (introductory and
advanced), and Requirements Modeling (with Use Cases and Services) are conducted at
many of the world’s best-known organizations. His classes are practical, energetic, and fun,
with the most common participant comments being “best course (or best instructor) I’ve ever
had.”
Requirements Modeling – Proven Techniques for Use Cases and Service Specifications (2 days)
Use cases have offered great promise as a requirements definition technique, but many analysts get disappointing results. That’s because published
methods are often inconsistent, complex, or focused on internal design. This unique workshop clears up the confusion. It shows how to employ use
cases to discover external requirements – how users wish to interact with an application – and how to use service specifications to define internal
requirements – the validation, rules, and data manipulation performed behind the scenes. Better yet, it shows in concrete terms how the two
perspectives interact, and demonstrates synergies with data modeling and business process workflow modeling.
Now available! Business Analysis Overview – Model-Driven Techniques for Processes, Applications, and Data (2 days)
Essential content from Clariteq’s Process, Requirements, and Data Modeling workshops.
Seven typical problems
1 - think about it – do architects bring hammers and saws to their first meeting with a client?
2 - by putting them to sleep
3 - and know what has to be left in place, and what has to be converted or integrated
4 - and besides, you never really know the business as well as you think you do
5 - maybe you could just give them the DDL
6 - because for some of the folks, you aren’t using the right language
7 - besides, your “elegant” model is probably wrong if no one can validate it
And maybe…
8. Patience (is a virtue)
9. Humility (Don’t be afraid to ask! Spend more time saying “tell me more.”)
10. Empathy (Feel their pain! Put yourself in their shoes!)
Data modeling symbols will vary slightly among the different “dialects”, but the meaning is constant.
The symbols are much more standardized than they used to be.
Reference or Type
• Independent
• Classifies or categorizes other entities and/or allows the recording of allowable values for a descriptive attribute
• Drawn diagonally out from or beside the classified entity
Subtype
• Contains facts that are specific to a particular subset of instances of the entity
Characteristic
• Dependent on one parent
• Records multi-valued facts about a parent entity that have been “cast out” from that entity
• Drawn below parent
Associative
• Dependent on two or more parents
• Records facts about a relationship (association) between two or more parent entities – is often the resolution of a M:M relationship between the parents
• Drawn between and below parents
Recursive relationship
• A relationship between instances of the same entity. Can be 1:1, 1:M, or M:M
Note that across the industry, there is a lack of consistency in defining these types of models. In the
“Zachman Framework” these would be the planner’s, owner’s, and designer’s views.
Analogies:
- The contextual model is like the site plan with a definition of what will be built. The focus is scope or
“footprint.”
- The conceptual model is like a floor plan and sketches for a building. The focus is the essential terms,
definitions, and facts / rules.
- The logical data model is like the detailed blueprints for a building. The focus is on the individual
data items the enterprise needs, and the rules that govern them.
A basic message for conceptual modeling – “Resist the urge to normalize or generalize until it
matters!!!”
The logical model is not necessarily the “as built” model – the physical database design. The database
designer or DBA will make changes in the interest of performance, recoverability, distribution, etc.
Everyone who currently supports an application should:
- draw the application’s logical data model following strict top-down drawing conventions.
- abstract the model “up” to a conceptual data model
- at least consider reviewing the conceptual model with analysts, developers, and subject matter experts
to ensure that it reflects the intentions of the business.
Main use: “Do we understand the scope and the main terms?”
The conceptual model is the “crossroads” at which both business and IT can communicate – both
parties have “shared accountability” to ensure that there is a common understanding of the basics.
As you add detail, your conceptual model will evolve into a logical data model, but don’t lose the
conceptual view!!! It is an absolutely vital tool for presentations, training, and so on.
After Logical Data Modeling, the next stage in the progression would be to turn your logical data
model into a Physical Database Design for your particular implementation environment (MS Access,
SQL Server, Oracle, DB2, etc.)
For multi-valued attributes, ask “On what basis does the attribute repeat?” The answer should be in the
form “It occurs once per …” This will provide a clue as to what entity the multi-valued attribute should
be moved to.
Two variations of the same example:
- If a Resource has multiple Chargeout Rates over time, then the Chargeout Rate doesn’t vary in
relation to some other entity. We could say that the Chargeout Rate attribute repeats “within” the
Resource entity, so we’ll simply move it down (“cast it out”) into a characteristic entity called
Resource Chargeout Rate. It will need the attributes Effective Date and End Date in addition to
Amount.
- If a Resource has multiple Chargeout Rates, one per Project that the Resource is contracted to, then
we could say that the Chargeout Rate attribute repeats “in relation to” the Project entity. In other
words, we know that Chargeout Rate is a fact about the relationship between Resource and Project, and
belongs in an associative between them. That associative may depict a contract or agreement, and
might have the word “Contract” in its name.
Another example:
- If the attribute Expected Duration is in the Project entity, and it is multi-valued, with one value per
project phase, then Expected Duration should be moved down into a Characteristic (of Project) entity
called Project Phase. The Task entity would likely be a characteristic of Project Phase.
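As a rough illustration of the two Chargeout Rate variations, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (resource_chargeout_rate, resource_project_contract, and so on), the keys, and the sample values are assumptions made for this example; they do not come from the workshop material.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource (resource_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project  (project_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Variation 1: the rate repeats "within" Resource (once per time period),
-- so it is cast out into a characteristic entity with a single parent.
CREATE TABLE resource_chargeout_rate (
    resource_id    INTEGER NOT NULL REFERENCES resource(resource_id),
    effective_date TEXT    NOT NULL,
    end_date       TEXT,
    amount         REAL    NOT NULL,
    PRIMARY KEY (resource_id, effective_date)
);

-- Variation 2: the rate repeats "in relation to" Project (once per
-- Resource/Project pair), so it belongs in an associative entity.
CREATE TABLE resource_project_contract (
    resource_id           INTEGER NOT NULL REFERENCES resource(resource_id),
    project_id            INTEGER NOT NULL REFERENCES project(project_id),
    chargeout_rate_amount REAL    NOT NULL,
    PRIMARY KEY (resource_id, project_id)
);
""")

conn.execute("INSERT INTO resource VALUES (1, 'Pat')")
conn.executemany("INSERT INTO project VALUES (?, ?)", [(10, "Alpha"), (20, "Beta")])
conn.executemany(
    "INSERT INTO resource_chargeout_rate VALUES (?, ?, ?, ?)",
    [(1, "2009-01-01", "2009-12-31", 100.0), (1, "2010-01-01", None, 110.0)],
)
conn.executemany(
    "INSERT INTO resource_project_contract VALUES (?, ?, ?)",
    [(1, 10, 105.0), (1, 20, 120.0)],
)

# "It occurs once per time period" versus "it occurs once per Project".
rates_over_time = conn.execute(
    "SELECT effective_date, amount FROM resource_chargeout_rate "
    "WHERE resource_id = 1 ORDER BY effective_date"
).fetchall()
rates_per_project = conn.execute(
    "SELECT project_id, chargeout_rate_amount FROM resource_project_contract "
    "WHERE resource_id = 1 ORDER BY project_id"
).fetchall()
```

The answer to “on what basis does the attribute repeat?” shows up directly in each table's key: Resource plus a date in the first case, Resource plus Project in the second.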
This is one of the rules for normalization - entities are in First Normal Form once all the repeating
attributes or groups of attributes have been sent (“cast out”) to their own entities.
“Many to many” relationships will almost always get a “promotion” to an entity, as in the example
above, because there are usually attributes about the relationship that must be recorded.
• Before migration, attribute values about a Department would be recorded redundantly with
every Course offered by that Department, so it is moved up to a parent entity.
• Before migration, values of the Delivery Method Description attribute would be carried
redundantly in many instances of Course, so it is moved out to a “type” (or “reference” or
“lookup” or “classification”) entity.
Eliminating redundancy puts entities into Second Normal Form if the redundant attributes move “up”
the parentage hierarchy, and into Third Normal Form if the attributes move “out” to a related entity
(often a “type” entity.)
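The Department and Delivery Method migrations described above can be sketched in a few lines of Python. The flat Course records, the field names, and the migrate helper are all hypothetical; this is only one way to picture attributes moving “up” to a parent and “out” to a type entity.

```python
# Hypothetical flat Course records; every name and value here is made up.
flat_courses = [
    {"course": "DM101", "dept_code": "IS", "dept_name": "Information Systems",
     "method_code": "CL", "method_desc": "Classroom"},
    {"course": "DM201", "dept_code": "IS", "dept_name": "Information Systems",
     "method_code": "WB", "method_desc": "Web-based"},
    {"course": "AC101", "dept_code": "AC", "dept_name": "Accounting",
     "method_code": "CL", "method_desc": "Classroom"},
]


def migrate(rows, key, attrs):
    """Move attrs, which depend only on key, out of rows into their own entity.

    Returns (new_entity, remaining_rows). Raises if the redundant copies
    disagree, which is exactly the anomaly that redundancy invites.
    """
    new_entity = {}
    for row in rows:
        values = {a: row[a] for a in attrs}
        if new_entity.setdefault(row[key], values) != values:
            raise ValueError(f"conflicting values for {key}={row[key]!r}")
    remaining = [{k: v for k, v in row.items() if k not in attrs} for row in rows]
    return new_entity, remaining


# Department facts move "up" to the parent; the description moves "out" to a type entity.
departments, courses = migrate(flat_courses, "dept_code", ["dept_name"])
delivery_method_types, courses = migrate(courses, "method_code", ["method_desc"])
```

After both migrations each Department name and each Delivery Method description is recorded once, and Course keeps only the codes that relate it to them.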
The reason we’re covering this? You have to be able to make it simpler for the data “layperson.”
An “orderly script” –
adding a new characteristic or associative entity to a logical model
1. Place the entity (and relationships) on the diagram
according to dependency
2. Ask “What is one of these things?” then
name and define the entity accordingly
3. Add relationship names, and add multiplicity
(or confirm, if it was already specified)
4. Add attributes
5. Perform further attribute migration, dealing with
multi-valued attributes first, and reference data last
(1NF, 2NF, 3NF in sequence)
… and only then worry about…
6. Relationship optionality
7. Primary keys or uniqueness constraints
8. Additional constraints (e.g., rules on date ranges)
Whenever you add a new entity
• check to see if attributes or relationships from nearby entities
should be moved to the new entity
• check that you haven’t introduced transitivity (clue: “loops”)
Consistency is very important to engaging your clients in the data modeling process. Have a method,
or have scripts – do the same things the same way, and draw the same things the same way. If you do
this, participants will learn modeling “by osmosis” and will learn what to expect. (E.g., that a M:M
relationship will eventually get resolved.)
Note that in this example, we could ask the questions for both date ranges:
- Effective / End Date
- Recorded / Corrected Date
To clear up confusion around question 5, some organizations have standardized on “Last Valid Date”
instead of “End Date.”
Note – we don’t typically do this until after we’ve searched for, discovered, and satisfied outstanding
requirements using the techniques that we’ll look at shortly.
We covered all of the previous stuff so you’ll be able to simplify some of the techniques for others.
Definition
• “What is one of these things?”
• List common and unusual instances
• “Are there any known anomalies?”
• “What are the potential differences of opinion?”
Dependency
• “What type of entity is this?”
• “What other entity does it depend on?”
• Essentially, is it a free-standing thing, a type of things, or repeating detail about some other thing?
Detail
• Keep it in its place!
• GEFN! HPDL!
Demonstration
• Sample instances
• Schematics
• Props
The entity definition tells which things in the real world are included within our understanding of that
entity. For instance:
• The world has hundreds of millions of people who are “students”
• Which ones would we expect to find in a specific university’s Student database?
• Which ones would be excluded?
Two other useful questions:
• Are there life cycle issues to consider? For instance, Applicant to Candidate to Employee to Retiree
– does “Employee” include “Applicant” and “Retiree?”
• Does the same real-world thing appear as multiple entities? E.g., one person could be a
“Driver,” a “Registered Vehicle Owner,” and a “Legal Vehicle Owner.” If this is of interest, you
might need to generalize by creating a “Person” entity.
A common error in entity definition - describing the current implementation instead of the “essence” of
what the entity is. E.g., “This entity is the ASF-72 created by Emily down in Personnel.”
Another common error - using the entity name to define itself. E.g., “A Contract is a contract between
the corporation and …”
Finally, note that the last example on the slide indicates two separate “type” classifications –
Customer Legal Entity Type and Customer Status Type
Remember Remember
• Can be updated • Cannot be updated
A third type of time is “User Time” - any other date/time of interest to the business
(e.g., Reservation Arrival Date)
Plus –
• don’t change stored values, add new records
• check for “one at a time, many over time” vs. “many at a time, many over time”
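A small sketch of “don’t change stored values, add new records,” under assumed names: the history list, record_rate, and rate_as_of are hypothetical, and the rates are made-up values. It treats the attribute as “one at a time, many over time,” so exactly one value applies on any given date.

```python
from datetime import date


def record_rate(history, effective_date, amount):
    """Add a new record rather than overwriting the amount already stored."""
    return history + [(effective_date, amount)]


def rate_as_of(history, as_of):
    """Return the single rate in effect on as_of, or None if none applies yet."""
    applicable = [record for record in history if record[0] <= as_of]
    # max() compares the effective dates first, so it picks the latest record.
    return max(applicable)[1] if applicable else None


history = record_rate([], date(2009, 1, 1), 100)
history = record_rate(history, date(2010, 1, 1), 110)
```

Because nothing is overwritten, a question about the past (for example, the rate on 30 June 2009) still gets the answer that was true then.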
2. You must draw the model in a top-down fashion (or other systematic
approach) so you can actually see dependencies
3. You must state your assumptions or understanding in narrative form
as assertions, using terms (entity names, relationship names, and
attribute names) from the data model
4. You must illuminate the data model by using sample data, schematic
diagrams, scenarios, or some other understandable form
A 5NF violation occurs if independent relationships between pairs of entities have been lumped
together with other independent relationships.
Fifth Normal Form deals with associations between three (or more) entities when there are independent
relationships between two (or more) of those entities.
“Cyclic dependency”:
Agents are related to Manufacturers,
Manufacturers are related to Regions,
and Regions are related to Agents
“Independent multi-valued relationships” and “cyclic dependency” are the usual normalization
bafflegab that hides the real issue – a 5NF violation occurs if independent relationships between pairs
of entities have been lumped together with other independent relationships.
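To make the “lumped together” idea concrete, here is a hedged sketch using plain Python sets. The agent, manufacturer, and region names are invented, and join_pairs is a hypothetical helper. It assumes the three pairwise relationships really are independent, which is the situation the 5NF rule is about.

```python
# Hypothetical, independent pairwise facts (all names are made up).
agent_manufacturer = {("Ann", "Acme"), ("Ann", "Bolt"), ("Bob", "Acme")}
manufacturer_region = {("Acme", "East"), ("Bolt", "East"), ("Acme", "West")}
region_agent = {("East", "Ann"), ("West", "Bob")}


def join_pairs(am, mr, ra):
    """Rebuild every three-way row implied by the three pairwise relationships."""
    return {
        (agent, manufacturer, region)
        for (agent, manufacturer) in am
        for (manufacturer_2, region) in mr
        if manufacturer_2 == manufacturer and (region, agent) in ra
    }


# The "lumped" Agent / Manufacturer / Region table.
lumped = join_pairs(agent_manufacturer, manufacturer_region, region_agent)

# Projecting the lumped table back onto each pair recovers the original facts.
am_again = {(a, m) for (a, m, r) in lumped}
mr_again = {(m, r) for (a, m, r) in lumped}
ra_again = {(r, a) for (a, m, r) in lumped}
```

In this data the three-way table holds nothing that the three pair relationships do not already say, so modeling it as one associative entity only repeats each pairwise fact across several rows.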
[Diagram labels: Operational Applications; DSS, EIS, BI, reporting, etc. facilities; connecting lines labeled “support” / “supports”]
“Facts” “Dimensions”
Step 1: Identify questions
Notes: What sorts of relationships among the data are of interest? E.g., want to study sales by product
color and customer, or by region and employee seniority.
You may end up producing more than one star schema. Each will get collapsed into a single table
(named for the “fact”). Tables will then have to be joined, but these joins will be far simpler than what
would otherwise be necessary.
A few guidelines:
• Don’t try to get all your operational data perfect first, or you’ll never get anywhere
• Accept that after the data structure is in use, the questions will change. Embrace iteration.
• Manage the volume. Combining two “facts” (star schemas) into one table may cause exponential
volume increase. Focus initially on the critical measures and attributes.
• Start with a good, normalized data model that clearly shows dependency, as we’ll demonstrate in a
minute…
[Diagram: library E-R model, with each entity labeled “Fact”, “Dimension”, or “Not a dimension”]
Entities and attributes:
- Publisher: Publisher ID, Name
- Title: Title ID, Name, Author
- Cardholder: Cardholder ID, Name, Number, Member Since Date
- Format Type: Format Type Code, Name
- Copy: Title ID, Copy SID, Purchase Price Amt, Acquisition Date, Status Code, Format Type Code (fk)
- Loan: Loan ID, Date, Cardholder ID (fk,nn)
- Loan Item (Fact): Loan ID, Title ID, Copy SID, Due Date, Return Date, Status Code
Relationship names: available from, is an instance of, is classified by, is taken by, takes, is part of
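As a hedged illustration of collapsing a normalized model into a single table named for the fact, the sketch below uses Python's sqlite3 module on a cut-down version of the library model. It assumes Loan Item is the fact and folds in Loan, Cardholder, Title, and Publisher; Copy and Format Type are left out for brevity. The table names, column names, and sample rows are assumptions, not the workshop's solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized starting point (simplified; names are illustrative assumptions).
CREATE TABLE publisher  (publisher_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE title      (title_id INTEGER PRIMARY KEY, name TEXT,
                         publisher_id INTEGER REFERENCES publisher(publisher_id));
CREATE TABLE cardholder (cardholder_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE loan       (loan_id INTEGER PRIMARY KEY, loan_date TEXT,
                         cardholder_id INTEGER NOT NULL
                             REFERENCES cardholder(cardholder_id));
CREATE TABLE loan_item  (loan_id INTEGER REFERENCES loan(loan_id),
                         title_id INTEGER REFERENCES title(title_id),
                         copy_sid INTEGER, due_date TEXT,
                         PRIMARY KEY (loan_id, title_id, copy_sid));

INSERT INTO publisher  VALUES (1, 'Northwind'), (2, 'Harbor');
INSERT INTO title      VALUES (1, 'Data Basics', 1), (2, 'Process Notes', 2);
INSERT INTO cardholder VALUES (1, 'Kim'), (2, 'Lee');
INSERT INTO loan       VALUES (1, '2010-03-01', 1), (2, '2010-03-02', 2);
INSERT INTO loan_item  VALUES (1, 1, 1, '2010-03-15'),
                              (1, 2, 1, '2010-03-15'),
                              (2, 1, 2, '2010-03-16');

-- Collapse the fact and the entities it depends on into one wide table.
CREATE TABLE loan_item_fact AS
SELECT li.loan_id, li.title_id, li.copy_sid, li.due_date,
       l.loan_date,
       c.name AS cardholder_name,
       t.name AS title_name,
       p.name AS publisher_name
FROM loan_item li
JOIN loan l       ON l.loan_id = li.loan_id
JOIN cardholder c ON c.cardholder_id = l.cardholder_id
JOIN title t      ON t.title_id = li.title_id
JOIN publisher p  ON p.publisher_id = t.publisher_id;
""")

# A typical question can now be answered without any joins.
loans_by_publisher = conn.execute(
    "SELECT publisher_name, COUNT(*) FROM loan_item_fact "
    "GROUP BY publisher_name ORDER BY publisher_name"
).fetchall()
```

The joins in the CREATE TABLE statement simply follow the dependency lines of the normalized model upward from the fact, which is why a model that clearly shows dependency makes this step mechanical.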
Jim’s sister-in-law June has just returned from a BI conference, and she has Jim all wound up about
building a query database so he can analyze sales (purchases by customers).
Construct a dimensional model for Jim, using the following E-R model as a starting point. At this point,
don’t worry about individual attributes – just which entities would collapse into which fact or
dimension. A few notes:
- Jim’s has grown to a nationwide chain, with stores in many regions. Most regions cover one or more
states, although some regions only cover part of a state (e.g., Northern California and Southern
California). Each store is in a single city, though, and each city is in only one region.
- The layout of stores (Sections, Aisles, Store Categories, etc.) varies widely across the stores.
- The “Store Category” indicates if the store is a mall location, streetfront, “captive” (contained within
another retail outlet), etc. Web sales are not a factor.
Jim is especially interested in how the same Title sells depending on where in the Store it is displayed,
because the same Title might end up in different Sections. He also wants to look at Sales by Store,
Region, Artist, Publisher, Supplier, Category, … well, just about everything! You’ll have to decide
what’s possible, and then be prepared to explain it to Jim!
As it turns out, having an E-R model is invaluable in producing a valid star schema, although many
data warehouse experts will argue the point…
A fixed number of repeating attributes may be an “array”

Divisional Sales (in 1,000,000s)
Year Q1 Q2 Q3 Q4

e.g., for each Quarter, also record:
• Target Sales Amount
• Sales Per Employee Amount
• …?
Advantages (keeping the array in one row)
• familiar layout
• from “row to screen” is easier
• fewer tables and joins
• more suitable in DW/DSS environment
Advantages (moving the repeating attribute to its own entity)
• same handling as for other multi-valued attributes
• easier SQL queries (e.g., average sales)
• more efficient for sparse data
• flexible:
  – change vector length
  – add additional attributes (like Top Sales Rep for each Quarter)
The point – don’t be too quick to translate reporting layouts into operational data structures.
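The trade-off can be seen in a small sqlite3 sketch that stores the same Divisional Sales figures both ways. The table names, column names, and amounts are assumptions for illustration; the third quarter is deliberately missing to show the sparse-data case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Layout 1: the report layout stored as-is, one column per quarter.
CREATE TABLE divisional_sales_wide (
    year INTEGER PRIMARY KEY, q1 REAL, q2 REAL, q3 REAL, q4 REAL);

-- Layout 2: the repeating attribute cast out, one row per year and quarter.
CREATE TABLE divisional_quarter_sales (
    year INTEGER NOT NULL, quarter INTEGER NOT NULL, sales_amount REAL NOT NULL,
    PRIMARY KEY (year, quarter));

-- Q3 has no figure yet: a NULL column in layout 1, simply no row in layout 2.
INSERT INTO divisional_sales_wide VALUES (2009, 10.0, 12.0, NULL, 14.0);
INSERT INTO divisional_quarter_sales VALUES
    (2009, 1, 10.0), (2009, 2, 12.0), (2009, 4, 14.0);
""")

# Averaging across columns needs hand-written arithmetic, and one NULL
# makes the whole expression NULL.
wide_avg = conn.execute(
    "SELECT (q1 + q2 + q3 + q4) / 4.0 FROM divisional_sales_wide WHERE year = 2009"
).fetchone()[0]

# Averaging across rows is a plain aggregate over whatever quarters exist.
long_avg = conn.execute(
    "SELECT AVG(sales_amount) FROM divisional_quarter_sales WHERE year = 2009"
).fetchone()[0]
```

Adding Target Sales Amount per quarter would mean four more columns in the first layout but a single new column in the second.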
Drawing out examples (the fourth “D” in data modeling) will always help
[Diagram labels: Certification; Bargaining Unit; Job; Hourly Wage Amt; Confidential Flag; relationship “requires” / “required for”; annotation “only B.U. jobs”]
1 - The plan:
orderly one-on-one interviews
2 - The reality:
"the analyst as messenger"
Disadvantages:
• longer elapsed time
• incompleteness
• encourages parochialism
• no real communication or consensus
3 - The response:
facilitated sessions
Advantages:
• speed and quality
• commitment
• communication, team building
• business understanding
Conceptual model to support “Fill Order” process will involve cross-functional reps
Don’t forget
• facilitator’s supplies: flipchart pens, whiteboard pens, “wall safe” masking tape, flipchart stands & paper, rolls of plotter paper or butcher paper, Post-its, rubber bands, …
• participant seating, note paper, refreshments, etc.
No empty seats – “energy holes”
Sponsor
Facilitator
DO -
• Help develop objectives and plan
• Enforce rules & plan
• Maintain focus on topic
• Press for completion and quality
• Help everyone participate
• Ensure recording
DON’T -
• Develop content
• Push a point of view
Participant
• Participate!
• Provide information
• Suggest ideas
• Make decisions
Why not?
• “Purple monkey water wrench” – a phrase I saw in an article making the point that our IT terms
(foreign key, referential integrity, cardinality, …) aren’t any clearer to the client
• May lead to boredom and mental shutdown
• May lead to resentment and non-participation
• It’s unnecessary! Some things are easier to just do. Coaching basketball - initially, by example.
Non-typical situations
• Goal Setting and Planning
• BPx
• Package Evaluation and Selection
CoRSE: The Facilitator’s Friend
Problem or question
• Collect (Brainstorm): lots of suggestions
• Reduce (eliminate, cluster…): selected set of answers or points
• Sequence (dependency, priority, …): organized set of points or topics
• Expand: useful result
When in doubt - make a list!
Brainstorm… Collect
“Fact about a thing” – attributes or relationships. Don’t worry about keys!!! (or normalization or
atomic attributes or generalization or ANY of that stuff)
Inventory Availability
&
Agreements
At this point, these could be subject areas, activities, states, … - it doesn’t matter!
Building the storyboard
1. Draw 5 "bubbles"
2. Fill in the last (your "closer" - the purpose)
3. Fill in the first (your "hook")
4. Fill in the middle ones (the "body") – add or subtract bubbles as needed
5. Allocate details to bubbles
6. Iterate until it flows and builds properly
Only include detail that matters!
Example bubbles: “So methods for building have changed” / “Making these changes will be difficult” / “But it is vital to our survival”
THIS IS NOT A SEQUENCE!!! There should always be an initial emphasis on defining objectives (the
“top” layer) and also a “scope level” statement of the business processes, application functions, and data
topics / subject areas that are in scope. Also, we always do some “guerrilla” data modelling during
which we at least clarify the primary terms and definitions, and ideally develop at least an initial
conceptual model. After that, you could choose to go through the layers in whatever order makes sense
given the situation.
The benefits:
• Divide and conquer
• Everything in its place
• Cross-validation
Other terms:
• Presentation Services = User Interface
• Business Services = Application Logic or Business Logic
• Data Management Services = Persistence Services
[Diagram: the framework layers illustrated with a student enrollment example. Labels include Business Objectives; Business Process (Registrar’s Office, Department, Advisor, Student; “Check Reg Form for data changes”, “Attach Reg Form and forward”, “Print Student Summary Report”, “Enroll Student”); Business Services (“Enroll Student”); and Data Management Services (a data model including Student and Section, with “enrolls in”, “offers”, and “teaches” relationships)]
The reason that the “concept” level is important, and that we don’t dive right into the “detail” level is
that…
the level of precision, rigor, and detail that you need in order to build something
is far greater and different in nature than that which is necessary for the business person to know if
they’re going to like what you build!
Business Services, at increasing levels of detail:
• List of main Events and corresponding Business Services.
• Initial Service description - result, main actions, cross-referenced to Conceptual Data Model.
• Each service fully documented, including input/output messages, validation, business rules, and data updates to the attribute level.
Roles shown: Business Analysts and Service Specification Specialist
On a smaller project, the same person might work on all perspectives at all levels of detail; the larger
the project, the more likely it is that different, specialized roles will be involved.
The concept
Events happen
Whether or not that event is legitimate depends on the current
entity state
If the event is legitimate, one or more entities will be updated and
their state may change - a state transition
No other style of diagram depicts so many important aspects of a system without getting unreadable.
A State Diagram encompasses:
• an entity
• events
• entity states
• allowable state transitions (business rules!)
[State diagram for Section. Events: Section is scheduled; Time to open enrollment; Student enrolls; Student drops/transfers; Time to finalize rosters; Time to end term; Section is canceled. States shown include Cancelled.]
Key Point
• The diagram is linear or circular
All entity state diagrams begin with the entity in the null state, and the first event is always something
that causes the creation of the entity occurrence.
An entity can be in one and only one state at a time - states are mutually exclusive. The most common
error when people are learning this technique is to come up with “overlapping” states.
It’s common to return to the null state if the entity occurrence is deleted, although this example doesn’t
show it (the Registrar saves everything!).
All states “matter” in the sense that the only reason for a state to exist is to enforce a business rule. For
instance, it appears that Students can’t drop or transfer once the Class is “Closed”, and the Class can’t
be cancelled. If these rules weren’t in place, we wouldn’t need the state “Closed”.
Note that this example is different from the one on the previous page, even though they’re for the same
entity – the reason: different business rules.
Key Point
• Clients get started with almost no
explanation
The state diagramming technique, in practice, is quite intuitive for clients to pick up. We’ve been at
many sessions where the facilitator drew a simple state diagram on the whiteboard and clients
immediately started discussing and correcting it with no explanation whatsoever of the technique.
It never fails to amaze (and amuse) us how many different versions of “the rules” there are in the
average organization. Naturally, everyone thinks their set of rules is correct, and they are usually
surprised at the alternatives.
[State diagram for Employment Term, with callouts: 1. null state, 2. state, 3. state transition, 4. event]
States: Probation, Active, Disability, Inactive
Events: Employee is hired; Probation term is extended; Employee is put on Probation; Employee passes probation; Employee goes on disability; Employee returns from disability; Employment is terminated; Employment Term is Purged
This example is circular, which is less common now – it gets quite awkward.
The simplest life cycle
• Create (birth): entity in null state becomes entity exists
• Update (pay taxes): entity exists, and continues to exist
• Delete (death): entity exists returns to null state
In the UML, the state diagram begins with a solid (filled in) circle.
An order can’t be cancelled once it has been shipped, so we only need the
states “Taken” and “Shipped”
An order can be cancelled without penalty if picked, with penalty if loaded, and
not cancelable if shipped
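The second set of order rules can be written down as a transition table, which is one way to see a state diagram as executable business rules. This is a minimal sketch: the state names beyond “Taken” and “Shipped”, the event names, and the assumption that an order in the “Taken” state can be cancelled without penalty are all illustrative choices, not part of the workshop material.

```python
# (current state, event) -> resulting state. Anything not listed is not legitimate.
# State and event names are hypothetical.
TRANSITIONS = {
    ("Null", "take order"): "Taken",
    ("Taken", "pick order"): "Picked",
    ("Picked", "load order"): "Loaded",
    ("Loaded", "ship order"): "Shipped",
    ("Taken", "cancel order"): "Cancelled",   # assumption: no penalty this early
    ("Picked", "cancel order"): "Cancelled",  # without penalty
    ("Loaded", "cancel order"): "Cancelled",  # with penalty
}
PENALTY_STATES = {"Loaded"}


def apply_event(state, event):
    """Return (new_state, penalty_applies), or raise if the event is not allowed."""
    try:
        new_state = TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed when the order is {state}") from None
    penalty_applies = event == "cancel order" and state in PENALTY_STATES
    return new_state, penalty_applies
```

For example, apply_event("Loaded", "cancel order") yields a cancellation with a penalty, while apply_event("Shipped", "cancel order") raises an error because no such transition exists. If the penalty rule were dropped, “Picked” and “Loaded” would no longer need to be separate states.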
State
• May be determined by inspecting relationships or attribute values
• Usually summarized in a “Status” or “State” attribute
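A brief sketch of determining state by inspecting attribute values, assuming an order record that carries hypothetical date attributes (picked_date, loaded_date, shipped_date, cancelled_date). In practice the result would usually be stored in a “Status” attribute rather than recomputed each time.

```python
def derive_order_state(order):
    """Infer the order's state from which (hypothetical) date attributes are filled in."""
    # Check the latest milestones first, so the furthest one reached wins.
    milestones = [
        ("cancelled_date", "Cancelled"),
        ("shipped_date", "Shipped"),
        ("loaded_date", "Loaded"),
        ("picked_date", "Picked"),
    ]
    for attribute, state in milestones:
        if order.get(attribute) is not None:
            return state
    return "Taken"
```

Because the checks run in a fixed order and return on the first match, the function can only ever report one state, which matches the rule that states are mutually exclusive.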
[Diagram: “from” state, event, “to” state]
Key Point
• Visual business rules
[Diagram example, labeled “conjunction”: “Complete Class” leads to the Completed state, including from Filled]
- Bifurcation -
From a given state an event can have different outcomes (A to B, or A to C)
- Conjunction -
An event may be valid from multiple states with the same resultant state (A to C, or B to C)
[Diagram: the same transition from Filled to Completed drawn twice, once labeled with the event “Class is Completed” and once with the service “Complete Class”]
Key Point
• You can show events or services or
both
Group Auto
Individual Marine
Prior Claim
Address
Schedule Class
Class Enrollment
Complete Enrollment
Key Point
• Start ST analysis at the “bottom” – with entities
that have no dependents
Class and Enrollment each have their own life cycles, but they are related
There are a variety of naming formats in general use - mixed case with words separated by blanks (e.g.,
“Effective Date”) is the most readable.
There are certain date-related attributes that will occur many times in all models, such as “Effective
Date”, “End Date”, “Create Date”, “Superseded Date”. Agree on standard names (e.g., choose
“Effective Date”, “Start Date”, or “Begin Date”) and then use them consistently.
Attribute definition should explain the meaning and purpose of the attribute - in other words, how to
interpret attribute values. Not:
• … a restatement of the attribute name. For instance, for “Person Social Security Number”, the
definition “The Social Security Number of a Person” tells us nothing new. A better definition
would be “A number issued to wage earners by the Social Security Administration for the purpose
of crediting employees with contributions to future retirement pay as stipulated in the Federal
Insurance Contributions Act.”
• … a description of how the attribute is handled by current systems. For instance, “Budget Center
Code is an 11 character code captured in the GL system and assigned to a Department.”
A key is:
• One or more attributes with a unique value for each instance of an entity
• There might be many identifiers - one is chosen as the primary identifier, the rest are alternate
• A way to reference an instance of an entity (e.g., a row of a table)
• Used to establish relationships between entities (or tables)
A key is not necessarily:
• The only access or search path
• The fundamental way the business distinguishes:
  – one instance from another
  – a new instance from existing (e.g., Customer applying for credit)
In short, how we relate entities is not necessarily how the client distinguishes
or accesses them
Customer: Possible keys:
• Customer Name + Postal Code
• Sales Region + Customer Number
• Account Number
Part: Possible keys:
• Part Category + Manufacturer Prod #
Employee: Possible keys:
• SIN or SSN
• Name + Address
• Name + Birthdate
• Portrait + Voice
Reservation: Possible key:
• Room Number + Start Date
Assigning primary and foreign keys is really part of physical database design, but the concepts are
important so we’ll cover them here.
As modelers, we should focus initially on determining how the client determines the uniqueness of
entities, and how they search for particular instances.
What’s wrong with the possible primary keys shown above?
Essential characteristics
Key problems:
• embedded meaning
– Customer 99999
– Customer ID with Head Office Region Code built in
• insufficient for expansion
– 1 digit code field
There can be many “candidate” or “alternate” keys, also referred to as “business identifiers” or “natural
keys”
• for instance, Employee may have a unique Government ID Number, Employee Number, and
System Logon ID
• one of these could be chosen as the Primary Key, if they meet the criteria; otherwise (normally)
assign a system-generated identifier
• the rest are called Alternate Keys or something similar, and must also be unique (put a unique index
on them)
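The Employee example might look like this in a physical design. It is a sketch using Python's sqlite3 module; the table name, column names, and sample values are assumptions. The system-generated identifier is the primary key, and each business identifier is kept unique as an alternate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    employee_id          INTEGER PRIMARY KEY,   -- system-generated, no embedded meaning
    government_id_number TEXT NOT NULL UNIQUE,  -- alternate key
    employee_number      TEXT NOT NULL UNIQUE,  -- alternate key
    system_logon_id      TEXT NOT NULL UNIQUE,  -- alternate key
    name                 TEXT NOT NULL
)""")

insert_sql = (
    "INSERT INTO employee "
    "(government_id_number, employee_number, system_logon_id, name) "
    "VALUES (?, ?, ?, ?)"
)
conn.execute(insert_sql, ("111-11-111", "E001", "jdoe", "J. Doe"))

# A second employee reusing Employee Number E001 violates an alternate key.
try:
    conn.execute(insert_sql, ("222-22-222", "E001", "asmith", "A. Smith"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The UNIQUE constraints are what keep the business identifiers trustworthy for searching even though relationships are built on employee_id.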
Some methods use a “shorthand” technique for showing inherited keys in associative or characteristic
entities - the relationship via which parent keys are inherited is marked as an “identifying” relationship.
In one technique, an “I” is put across the relationship line, and in another, identifying relationships are
drawn with a solid line, while others (“non-identifying”) are drawn with a dashed line. Normally, we
show the complete, inherited primary key.
Whether you show the propagated foreign keys on your diagram, or instead flag relationships as
“identifying” is a matter of personal preference or organizational standards. In this workshop, we’ll
always show the propagated foreign keys.
Each of the above alternatives employs the concept of “meaningless identifiers”, but differently
• the one on the left assigns an ID to kernel entities, while associative and characteristic entities
inherit the ID of their parent(s)
• the one on the right assigns all entities a unique ID
In teams, discuss the relative strengths and weaknesses of the two approaches. Which would you
choose?
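As input to the discussion, here is a hedged sketch of both approaches in sqlite3, using a hypothetical Course, Offering, and Enrollment chain rather than the entities on the slide. It shows one trade-off only and is not meant as a verdict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (course_id INTEGER PRIMARY KEY, name TEXT);

-- Approach A: dependent entities inherit the ID of their parent(s).
CREATE TABLE offering_a (
    course_id   INTEGER REFERENCES course(course_id),
    offering_no INTEGER,
    PRIMARY KEY (course_id, offering_no));
CREATE TABLE enrollment_a (
    course_id INTEGER, offering_no INTEGER, student_no INTEGER,
    PRIMARY KEY (course_id, offering_no, student_no),
    FOREIGN KEY (course_id, offering_no)
        REFERENCES offering_a(course_id, offering_no));

-- Approach B: every entity gets its own unique ID.
CREATE TABLE offering_b (
    offering_id INTEGER PRIMARY KEY,
    course_id   INTEGER NOT NULL REFERENCES course(course_id),
    offering_no INTEGER NOT NULL,
    UNIQUE (course_id, offering_no));
CREATE TABLE enrollment_b (
    enrollment_id INTEGER PRIMARY KEY,
    offering_id   INTEGER NOT NULL REFERENCES offering_b(offering_id),
    student_no    INTEGER NOT NULL,
    UNIQUE (offering_id, student_no));

INSERT INTO course VALUES (1, 'Data Modeling'), (2, 'Workflow');
INSERT INTO offering_a VALUES (1, 1), (2, 1);
INSERT INTO enrollment_a VALUES (1, 1, 501), (1, 1, 502), (2, 1, 501);
INSERT INTO offering_b VALUES (10, 1, 1), (20, 2, 1);
INSERT INTO enrollment_b VALUES (100, 10, 501), (101, 10, 502), (102, 20, 501);
""")

# Approach A: the grandchild carries course_id, so no join is needed.
count_a = conn.execute(
    "SELECT COUNT(*) FROM enrollment_a WHERE course_id = 1"
).fetchone()[0]

# Approach B: keys stay short, but reaching the Course takes a join.
count_b = conn.execute(
    "SELECT COUNT(*) FROM enrollment_b e "
    "JOIN offering_b o ON o.offering_id = e.offering_id "
    "WHERE o.course_id = 1"
).fetchone()[0]
```

Both queries return the same count. What differs is key width and the joins needed to get there. Approach B also needs the extra UNIQUE constraints to keep the uniqueness that Approach A gets from its inherited key.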