Accumulating snapshot fact tables always involve multiple date stamps. Our
example, which is typical, has seven foreign keys pointing to the date dimension.
This is a good place to reiterate several important points:
- The foreign keys in the fact table cannot be actual date stamps because
  they have to handle the not applicable case. The foreign keys should
  be simple integers serving as surrogate keys.
- The surrogate keys assigned in the date dimension should be assigned
  consecutively in order of date. This is the only dimension where the
  surrogate keys have any relationship to the underlying semantics of the
  dimension. We do this so that physical partitioning of a fact table can be
  accomplished by using one of the date-based foreign keys. In our
  example we recommend that the treatment date key be used as the
  basis for physically partitioning the fact table.
- Surrogate keys corresponding to special conditions such as Not
  applicable, Corrupted, or Hasn't happened yet should be assigned to
  the top end of the numeric range so that these rows are physically
  partitioned together in the hot partition with the most recent data. We do
  this if these rows are ones that are expected to change.
- We do not join the seven date-based foreign keys to a single instance of
  the date dimension table. Such a join would demand that all seven dates
  were the same date. Instead, we create seven views on the single
  underlying date dimension table, and we join the fact table separately to
  these seven views, just as if they were seven independent date dimension
  tables. We refer to these seven views as roles played by the date
  dimension table.
- The seven view definitions using the date dimension table should
  cosmetically relabel the column names of each view to be
  distinguishable so that query tools directly accessing the views will
  present the column names through the user interface in a way that is
  understandable to the end user.
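The points above can be illustrated with a minimal sketch, written here in Python against an in-memory SQLite database. Everything named in it is an assumption made for illustration: the table and column names, the three special key values, and the use of only two roles (treatment and payment) in place of the seven in our example. It shows date keys assigned consecutively in date order, special-condition keys placed at the top of the key range, and one relabeled view per role.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical special-condition keys, placed at the top end of the key range.
NOT_APPLICABLE_KEY = 99999997
CORRUPTED_KEY = 99999998
NOT_YET_KEY = 99999999


def build_date_dimension(conn, start, days):
    """Create a date dimension whose surrogate keys increase with calendar date."""
    conn.execute(
        "CREATE TABLE date_dim ("
        "date_key INTEGER PRIMARY KEY, full_date TEXT, description TEXT)"
    )
    rows = [
        (i + 1, (start + timedelta(days=i)).isoformat(), "calendar date")
        for i in range(days)
    ]
    # Special conditions get rows of their own, so the fact table never
    # needs a null or a fake date stamp.
    rows += [
        (NOT_APPLICABLE_KEY, None, "Not applicable"),
        (CORRUPTED_KEY, None, "Corrupted"),
        (NOT_YET_KEY, None, "Hasn't happened yet"),
    ]
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)", rows)


def create_role_view(conn, role):
    """Create one view per role, relabeling every column with the role name."""
    conn.execute(
        f"CREATE VIEW {role}_date_dim AS "
        f"SELECT date_key AS {role}_date_key, "
        f"full_date AS {role}_date, "
        f"description AS {role}_date_description "
        f"FROM date_dim"
    )


conn = sqlite3.connect(":memory:")
build_date_dimension(conn, date(2024, 1, 1), 10)
for role in ("treatment", "payment"):
    create_role_view(conn, role)

conn.execute(
    "CREATE TABLE billing_fact ("
    "treatment_date_key INTEGER, payment_date_key INTEGER, billed_amount REAL)"
)
# A treatment on the second calendar day that has not been paid yet.
conn.execute("INSERT INTO billing_fact VALUES (?, ?, ?)", (2, NOT_YET_KEY, 150.0))

# Each date-based foreign key joins to its own role view.
row = conn.execute(
    "SELECT t.treatment_date, p.payment_date_description, f.billed_amount "
    "FROM billing_fact f "
    "JOIN treatment_date_dim t ON f.treatment_date_key = t.treatment_date_key "
    "JOIN payment_date_dim p ON f.payment_date_key = p.payment_date_key"
).fetchone()
```

Because each view exposes its own column names, a query tool would show treatment_date and payment_date as distinct attributes even though both come from the same underlying table.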
If we choose not to apply the weighting factors in a given query, we can still
summarize billed amounts by diagnosis, but in this case we get what is
called an impact report. A question such as "What is the total billed amount
across all possible treatments in any way involving the diagnosis of XYZ?"
would be an example of an impact report.
In Figure 13.3, an SQL view could be defined combining the fact table and
the diagnosis group bridge table so that these two tables, when combined,
would appear to data access tools as a standard fact table with a normal
diagnosis foreign key. Two views could be defined, one using the weighting
factors and one not using the weighting factors.
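As a rough sketch of the two views, the following Python example uses an in-memory SQLite database. The table names, column names, and sample rows are assumptions made for illustration and are not taken from Figure 13.3. One view multiplies each billed amount by the weighting factor; the other omits the factor and so produces impact totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE billing_fact (diagnosis_group_key INTEGER, billed_amount REAL);
    CREATE TABLE diagnosis_group_bridge (
        diagnosis_group_key INTEGER,
        diagnosis_key INTEGER,
        weighting_factor REAL
    );

    -- Weighted view: each billed amount is split across the diagnoses in its group.
    CREATE VIEW weighted_billing AS
    SELECT b.diagnosis_key,
           f.billed_amount * b.weighting_factor AS billed_amount
    FROM billing_fact f
    JOIN diagnosis_group_bridge b
      ON f.diagnosis_group_key = b.diagnosis_group_key;

    -- Impact view: the full billed amount is repeated for every diagnosis in the group.
    CREATE VIEW impact_billing AS
    SELECT b.diagnosis_key,
           f.billed_amount
    FROM billing_fact f
    JOIN diagnosis_group_bridge b
      ON f.diagnosis_group_key = b.diagnosis_group_key;
    """
)

# Hypothetical sample data: group 1 holds two diagnoses, group 2 holds one.
conn.executemany(
    "INSERT INTO diagnosis_group_bridge VALUES (?, ?, ?)",
    [(1, 10, 0.75), (1, 20, 0.25), (2, 10, 1.0)],
)
conn.executemany(
    "INSERT INTO billing_fact VALUES (?, ?)",
    [(1, 100.0), (2, 40.0)],
)


def totals_by_diagnosis(view_name):
    """Sum billed amounts by diagnosis through the named view."""
    rows = conn.execute(
        f"SELECT diagnosis_key, SUM(billed_amount) FROM {view_name} "
        "GROUP BY diagnosis_key"
    )
    return dict(rows.fetchall())


weighted = totals_by_diagnosis("weighted_billing")
impact = totals_by_diagnosis("impact_billing")
```

With this sample data the weighted totals add up to the 140.0 actually billed, while the impact totals add up to more, because a bill tied to several diagnoses is counted once for each of them.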
Finally, if the many-to-many join in Figure 13.3 causes problems for your
modeling tool that insists on proper foreign-key-to-primary-key
relationships, the equivalent design of Figure 13.4 can be used. In this case
an extra table whose primary key is diagnosis group is inserted between the
fact table and the bridge table. Now both the fact table and the bridge table
have conventional many-to-one joins in all directions. There is no new
information in this extra table.
In the real world, a bill-paying organization would decide how to administer
the diagnosis groups. If a unique diagnosis group were created for every
out-patient treatment, the number of rows could become astronomical and
unworkable. Probably the best approach is to have a standard portfolio of
diagnosis groups that are used repeatedly. This requires that each set of
diagnoses be looked up in the master diagnosis group table. If the existing
group is found, it is used. If it is not found, then a new diagnosis group is
created.
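The lookup-or-create step might be sketched as follows in Python. The class name, the method name, and the choice to identify a group purely by its set of diagnosis codes are assumptions made for illustration; a real master diagnosis group table would also need to account for the weighting factors attached to each diagnosis.

```python
class DiagnosisGroupRegistry:
    """Hypothetical in-memory stand-in for the master diagnosis group table."""

    def __init__(self):
        self._groups = {}   # frozenset of diagnosis codes -> diagnosis group key
        self._next_key = 1

    def get_or_create(self, diagnoses):
        """Return the key of the existing group for this set of diagnoses,
        creating a new group only if no matching group is found."""
        signature = frozenset(diagnoses)  # order and duplicates do not matter
        key = self._groups.get(signature)
        if key is None:
            key = self._next_key
            self._groups[signature] = key
            self._next_key += 1
        return key

    def __len__(self):
        return len(self._groups)


registry = DiagnosisGroupRegistry()
first = registry.get_or_create(["D1", "D2"])
again = registry.get_or_create(["D2", "D1"])   # same set, so the group is reused
other = registry.get_or_create(["D1"])         # different set, so a new group
```

Reusing groups in this way keeps the number of rows proportional to the number of distinct diagnosis combinations rather than to the number of treatments.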
In a hospital stay situation, however, the diagnosis group probably should
be unique to the patient because it is going to evolve over time as a type 2
slowly changing dimension (SCD). In this case we would supplement the
bridge table with two date stamps to capture begin and end dates. While
the twin date stamps complicate the update administration of the diagnosis
group bridge table, they are very useful for querying and change tracking.
They also allow us to perform time-span queries, such as identifying all
patients who presented a given diagnosis at any time between two dates.
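A time-span query of this kind can be sketched in Python with an in-memory SQLite table. The column names, the sample rows, and the convention of storing a far-future end date for diagnoses that are still open are all assumptions made for illustration. A patient_key column is placed directly on the bridge here only because each diagnosis group is assumed to belong to one patient.

```python
import sqlite3

# Hypothetical convention: rows that are still current carry a far-future end date.
OPEN_END_DATE = "9999-12-31"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE diagnosis_group_bridge ("
    "patient_key INTEGER, diagnosis TEXT, begin_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO diagnosis_group_bridge VALUES (?, ?, ?, ?)",
    [
        (1, "XYZ", "2024-01-05", "2024-01-10"),
        (2, "XYZ", "2024-02-01", OPEN_END_DATE),
        (3, "ABC", "2024-01-01", OPEN_END_DATE),
    ],
)


def patients_with_diagnosis(diagnosis, span_start, span_end):
    """Return patients whose diagnosis row overlaps the span at any point.

    Two intervals overlap when each one begins on or before the day the
    other ends. ISO-formatted date strings compare correctly as text.
    """
    rows = conn.execute(
        "SELECT DISTINCT patient_key FROM diagnosis_group_bridge "
        "WHERE diagnosis = ? AND begin_date <= ? AND end_date >= ? "
        "ORDER BY patient_key",
        (diagnosis, span_end, span_start),
    )
    return [r[0] for r in rows]


mid_january = patients_with_diagnosis("XYZ", "2024-01-08", "2024-01-20")
first_quarter = patients_with_diagnosis("XYZ", "2024-01-01", "2024-03-01")
```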
To summarize this discussion of multivalued dimensions, we can list the
issues surrounding a multivalued dimension design:
-
Figure 13.5 shows an extended set of facts that might be added to the basic
billing schema of Figure 13.2. These include the consumables cost, provider
cost, assistant cost, equipment cost, location cost, and net profit before general
and administrative (G&A) expenses, which is a calculated fact. If these
additional facts can be added to the billing schema, the power of the fact table
grows enormously. It now becomes a full-fledged profit-and-loss (P&L) view of
the health care business.
These costs are not part of the billing process and normally would not be
collected at the same time as the billing data. Each of these costs potentially
arises from a separate source system. In order to bring this data into the billing
fact table, the separately sourced data would have to be allocated down to the
billing line item. For activity-based costs such as the ones we have included in
the list, it may be worth the effort to do this allocation. All allocations are
controversial and to an extent arbitrary, but if agreement can be reached on
the set of allocations, the P&L database that results is incredibly powerful. Now
the health care organization can analyze profitability by all the dimensions!
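As one illustration of what allocating a separately sourced cost down to the billing line item could look like, the Python sketch below spreads a cost pool across line items in proportion to their billed amounts. The proportional rule, the function name, and the sample figures are assumptions chosen for illustration only; the actual allocation rules would have to be agreed on by the organization. Amounts are held as integer cents so that the allocated shares add up exactly to the original cost.

```python
def allocate(total_cents, weights):
    """Split total_cents across items in proportion to integer weights.

    Each item first receives the floor of its exact share. Any cents left
    over are handed out one at a time to the items with the largest
    fractional remainders, so the shares always sum to total_cents.
    """
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    shares = [total_cents * w // weight_sum for w in weights]
    leftover = total_cents - sum(shares)
    by_remainder = sorted(
        range(len(weights)),
        key=lambda i: (total_cents * weights[i]) % weight_sum,
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


# Hypothetical billing line items (billed amounts in cents) and one cost pool.
billed = [5000, 3000, 2000]
equipment_cost = allocate(1000, billed)

# Net profit per line item, counting only this one allocated cost.
net_profit = [b - c for b, c in zip(billed, equipment_cost)]
```

In a full P&L fact table the same step would be repeated for each cost column, and the net profit fact would subtract all of the allocated costs from the billed amount.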
the patient is discharged and is applied retroactively to all the rows that have
been entered as part of the hospital stay.