
Understanding Birst

Issue #1 September 12, 2013

Copyright 2013, Birst Inc. All rights reserved. This document contains confidential material proprietary to Birst Inc. Your access to and use of this confidential material is subject to the terms and conditions of a Birst Inc. nondisclosure agreement, which has been executed and with which you agree to comply. This document and information and ideas herein may not be disclosed, copied, reproduced or distributed to anyone outside Birst Inc. without prior written consent of Birst Inc.

Birst Inc.
153 Kearny St., 3rd Floor
San Francisco, CA 94108
www.birst.com


Contents
List of Figures ... iv
Overview ... 1
  Audience ... 1
  Scope ... 1
  Introduction to Birst ... 1
Dimensional Analysis ... 1
  Overview ... 1
  Dimensional Analysis in Birst ... 3
Birst Logical Model ... 3
  Logical to Physical Mapping ... 4
    Measure and Dimension Tables ... 5
    Joins ... 8
What Happens When You Run a Report? ... 10
How Birst Satisfies Logical Queries (Query Navigation) ... 11
Creating a Data Model ... 17
  Application Building (BI Automation) ... 17
  Birst Spaces ... 18
  Discovery Tables ... 19
    Creating a Data Model for Discovery Spaces ... 19
    Application Building for Discovery Tables ... 21
    Time Dimensions ... 24
    Processing Data Using the Data Model ... 25
  Warehouse Tables ... 27
    Creating a Warehouse Data Model ... 28
    Dimensions/Hierarchies ... 30
    Levels ... 30
    Surrogate Keys ... 38
    Grains ... 38
    Working with Grains: Determining Whether a Measure Table is at a Lower Level than Another ... 42
    Application Building for Data Warehouse Models ... 54
    Update Warehouse Model ... 54
  Automated Data Modeling (Automatic Mode) ... 56
Development and Deployment ... 57
  The Repository ... 57
  Birst Data and Data Model Processing ... 58

List of Figures
Figure 1 - Dimensional modeling ... 2
Figure 2 - Interaction between logical and physical data models ... 5
Figure 3 - Examples of logical dimension column definitions ... 5
Figure 4 - "Customers Customers" Dimension Table definition ... 6
Figure 5 - Example definitions of measures and their aggregation rules ... 7
Figure 6 - Example of a measure table definition on the Order Details table ... 8
Figure 7 - Example of a set of auto-generated joins ... 9
Figure 8 - Example snowflake join ... 9
Figure 9 - Conceptual Architecture ... 10
Figure 10 - Example Birst Warehouse Data Model ... 11
Figure 11 - Query navigation example ... 12
Figure 12 - Logical and physical data model structures for Discovery tables ... 19
Figure 13 - Setting a primary key ... 19
Figure 14 - Sample Discovery data model ... 19
Figure 15 - Creating a new join ... 20
Figure 16 - Setting column attributes for a Discovery data source ... 20
Figure 17 - Example of an auto-generated dimension table for a Discovery source ... 21
Figure 18 - Example of an auto-generated measure table for a Discovery source ... 22
Figure 19 - Joins created in a Discovery data model ... 23
Figure 20 - Example measure across two tables ... 24
Figure 21 - Example "combo-fact" table definition ... 24
Figure 22 - Example subject area illustrating time transformations ... 25
Figure 23 - Preparing to process data ... 26
Figure 24 - Processing group selection ... 26
Figure 25 - Processing results ... 27
Figure 26 - Processing details ... 27
Figure 27 - Logical and physical data model structures for warehouse tables ... 28
Figure 28 - Source data model for building a warehouse data model ... 29
Figure 29 - Example data warehouse model ... 29
Figure 30 - Product related columns in a source data set ... 30
Figure 31 - Example set of hierarchies and levels ... 31
Figure 32 - Example of setting a level key on the Hierarchies page ... 32
Figure 33 - Example showing how columns are targeted to hierarchies and levels on Manage Sources page ... 33
Figure 34 - Dimension table definition of Products dimension table ... 34
Figure 35 - Hierarchy load detail for the Products source ... 34
Figure 36 - Queries used to load the Products dimension table ... 35
Figure 37 - View Processed Data page ... 35
Figure 38 - View processed data for the Products dimension table ... 36
Figure 39 - Categories data source setup ... 37
Figure 40 - Dimension table definition for the Categories level in the Products dimension ... 37
Figure 41 - View data for the Categories dimension table ... 38
Figure 42 - Specifying the grain for the Categories data source in the Manage Sources page ... 39
Figure 43 - Measure table definition for table at the Products.Categories grain ... 40
Figure 44 - Load SQL used to load the Categories, Day measure table ... 41
Figure 45 - Data loaded into the Categories, Day measure table ... 41
Figure 46 - Data in the Orders data source on the Manage Sources page ... 42
Figure 47 - Setup of Orders data source on the Manage Sources page ... 43
Figure 48 - Grain for the Orders data source ... 44
Figure 49 - Measure table definition at the Orders, Employees grain ... 45
Figure 50 - Order Details data source ... 46
Figure 51 - Column attributes of the Order Details data source ... 46
Figure 52 - Grain for Order Details data source ... 47
Figure 53 - Measure table definition for Order Details, Employees, Products, Day grain ... 48
Figure 54 - Processing detail for loading the measure table at the Order Details, Employees, Products, Day grain ... 49
Figure 55 - Simple report leveraging keys carried downhill ... 50
Figure 56 - Definition of Average Freight for a set of warehouse tables ... 51
Figure 57 - Combo-fact measure table definition example ... 52
Figure 58 - Report in Birst Designer illustrating combo facts ... 53
Figure 59 - Input data model for Update Warehouse Model ... 55
Figure 60 - Example auto-created dimensional data model ... 55


Overview
Audience
This document is intended for people involved in planning, designing, and implementing Birst solutions, including Birst partners. The audience for this document is database administrators, developers, analysts, and IT professionals who need to understand how Birst works and are familiar with data warehouse and business intelligence (BI) concepts.

Scope
This document presents the technical details of Birst's architecture in order to provide an understanding of how Birst works. This document describes Birst's logical data model and how the logical data model maps to physical database tables, as well as how Birst satisfies queries and how a data model is created in Birst. This document also describes the purpose of the Birst repository and the tasks Birst performs during data processing.

Introduction to Birst
Birst is the only end-to-end BI suite built for the cloud with fully integrated ETL, data warehouse automation, enterprise reporting, ad hoc query and analysis, and dashboards. Birst is delivered in an on-demand model but has the unique ability to report both on data that stays on premise and on data that is loaded into the cloud. Birst is the industry's first and only end-to-end BI suite that provides the power of on-premise BI tools with the economics of SaaS.

Birst is composed of three modules: Admin, Designer, and Dashboards. The Admin module is used by Business Intelligence (BI) developers and administrators to create a space to hold data and reports, upload data, and assign relationships to create a logical data model. Once the data model is created and your space has been customized for your needs, you can perform data processing. Once data is processed, your data is ready for use in the Designer module. The Designer module is used by analysts and report writers to create customized reports and charts that can be scheduled for distribution throughout your organization. Finally, these reports can be published to dashboards in the Dashboards module for end users to consume.

Dimensional Analysis
Overview
Useful business analysis of data requires the ability to look at it at multiple levels. Businesses are after all hierarchical in nature. Analysis is also comparative. It requires the ability to look at various measures for different parts of a business to compare them with each other and over time. This process of hierarchical measurement and comparison naturally leads to a dimensional model. The dimensional metaphor provides a convenient and natural way of describing various hierarchical measurements of a business.

In the dimensional model there are two basic types of entities that come into play: measures and dimensions. Measures are measurements of business data, such as Revenue, Sales, Assets, or Number of Orders. Measures, in turn, can be analyzed across dimensions (known as attributes in the Birst user interface). Dimensions (or attributes) are the natural groupings for measurements, such as Years, Months, Product Categories, or Sales Regions. For example, you might want to know the monthly sales by product. This is very naturally expressed in a dimensional model.

Below is an example of how a dimensional model might be visualized. Consider two measurements of shipments: Quantity and Date Last Shipped. These measures can apply to the entire business, a particular region, a time period, or a transportation medium. We can therefore think of the data as a cube split along three dimensions: time, route, and source:

Figure 1 - Dimensional modeling.

Inherent in dimensions is the notion of hierarchy. Consider Time for example. There are many units of time that can be used to measure something (like sales). You could look at yearly sales, quarterly sales, monthly sales, weekly sales, daily sales or even hourly sales. It is convenient, therefore, to think of all of these units as levels in a hierarchy of Time. We could easily imagine a useful hierarchy with several levels:

All Time
Years
Quarters
Months
Weeks
Days

Measuring sales on a daily basis produces far more results than a summarized analysis at the yearly level. A more useful approach is to start at a higher level, say product sales by year, and then examine each year that appears more interesting to see how it breaks down by quarter, then by month, and so on. This peel-the-onion approach is quite natural and resembles the way business people normally think about problems. The type of software that allows us to analyze data in such a way is also known as Online Analytical Processing (OLAP).
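The drill-down described above can be sketched in a few lines of Python. This is only an illustration of rolling a measure up to different levels of a Time hierarchy, not Birst code: the sample rows, the level names, and the `roll_up` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: (year, quarter, month, amount).
SALES = [
    (2012, "Q1", "Jan", 100.0),
    (2012, "Q1", "Feb", 150.0),
    (2012, "Q2", "Apr", 200.0),
    (2013, "Q1", "Jan", 120.0),
    (2013, "Q1", "Mar", 80.0),
]

# Levels of the (assumed) Time hierarchy, from most to least summarized.
LEVELS = ["year", "quarter", "month"]


def roll_up(rows, level):
    """Sum amounts at the given level; the key keeps all ancestor levels."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(float)
    for row in rows:
        totals[row[:depth]] += row[3]
    return dict(totals)


# Start at the summarized level...
by_year = roll_up(SALES, "year")

# ...then drill into the year that looks interesting.
rows_2012 = [row for row in SALES if row[0] == 2012]
by_quarter_2012 = roll_up(rows_2012, "quarter")

print(by_year)
print(by_quarter_2012)
```

Each drill-down step narrows the rows to one member of the current level and re-aggregates one level lower, which keeps every intermediate result small.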

Dimensional Analysis in Birst


Birst provides a formal way of analyzing hierarchical data. Other approaches restrict dimensional access to physical cubes of data, which limits the richness, scope, and depth of information that can be analyzed. Birst instead constructs a dynamic, logical cube of all data that it is mapped to at any given time. As long as a dimensional relationship between various data elements has been defined, the data can be analyzed dimensionally. Birst therefore provides a convenient view into all of an enterprise's data, avoiding the cumbersome approaches of prior generations of technology.

Birst supports analysis of this dimensional structure at several different levels. At the lowest level, there is a rich query language that allows for arbitrary selections of data and calculations to be defined. Birst's Designer module allows for a more visual method of interacting with data. Birst's Dashboards module provides an even higher level of abstraction, where pre-designed analyses can be given to end users, who can click on predefined prompts and links to navigate information.

Birst Logical Model


End users of Birst do not interact directly with objects as they are stored in the underlying database. Instead, Birst puts a layer of abstraction, called the logical model, between the user and the data. Users interact with elements of this logical model, and Birst translates those requests into queries that access physical data. A logical model is important for two reasons:

1. It provides an abstraction layer for business users. Users do not need to understand how different tables should be joined together to produce various results; a logical model significantly simplifies the experience of end users. Business calculations can be pre-built for easy ad hoc analysis, and best practices and other calculations can be reused and shared easily. Aggregation logic can also be hidden from end users, who do not need to know precisely how data is aggregated in order to ask questions.
2. It provides a secure layer that allows for end-user development. If end users had direct access to raw data, organizations would be forced to limit who could build a report or a dashboard (this is true for Discovery Tool products like Qlikview). By including deep role-based and individual security, Birst lets end users work with ad hoc tools to build reports without the worry that they will pull the wrong information. Security also allows a single report to be created for large numbers of people that shows each person's data, without having to build a report for each person.

Logical to Physical Mapping


When users of Birst create a report or a query, they do so using logical data elements: elements that do not physically exist, but are mapped internally to physical objects. There are two types of logical objects that can be used to compose a query or a report: dimension columns and measure columns.

Dimension columns are columns that are used to categorize or group data. In Figure 1 on page 2, examples of dimension columns would be [Time.Half Year], [Time.Quarter], [Route.Hemisphere], and [Source.Category]. Dimension columns are specified by the logical dimension (for example, Time, Route, and Source) and the individual attribute within that dimension (for example, Quarter, Hemisphere, and Category). Similarly, logical measures are aggregations that calculate values based on the attributes in a query. In this same example, [# Packages] could be a measure, which could be grouped by [Time.Quarter], [Source.Category], or both.

When users select from the available dimension columns (referred to as attributes in the Subject Area) and measure columns in order to create a report, it is Birst's job to figure out how to bring that data together. Sometimes the data that satisfies a logical query exists in one physical table in the Birst database. Much more often, it exists in many tables, and Birst must create physical queries that are sent to the underlying Birst database to perform the required aggregations and calculations. It does this by using special metadata that maintains mappings between logical columns and physical database objects. Logical dimension columns and logical measure columns are mapped using internal metadata objects called logical tables.

Note: Birst hides the details of this mapping from data modelers and end users, as these mappings are generated automatically during the process of application building (except in the case of Live Access using Manage Local Sources). However, it is helpful to understand how this process works.

[Figure 2: diagram. Top panel, Logical Data Model: Product, Orders, Region, Customer, and Time, made up of logical columns, logical dimension tables, logical measure tables, and hierarchies. Middle: Mapping and Security Metadata. Bottom panel, Physical Database Tables: PRODUCT TABLE, CUSTOMER TABLE, ORDER TABLE, REGION TABLE, and TIME TABLE, made up of physical tables, physical columns, and joins.]

Figure 2 - Interaction between logical and physical data models.

Measure and Dimension Tables


Internally, there are two core object types that make up Birst's logical model: dimension table definitions and measure table definitions, both of which map logical columns onto representations of physical tables. Let us consider first a Customers dimension in a logical model. Internally, each logical dimension column exists with a few attributes, like description, data type, and width.

Figure 3 - Examples of logical dimension column definitions.

In order for Birst to satisfy a query that may use [Customers.CompanyName], for example, Birst must examine all the mappings of that logical column onto physical table structures. In this example, there is a mapping called Customers Customers, an auto-generated mapping for the Customers level within the Customers dimension.

Note: This table is created in a process called application building. See the Application Building (BI Automation) section on page 17 for more information. Naming conventions are specific to the method that created each logical table and are described later in this document.


Figure 4 - "Customers Customers" Dimension Table definition. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

It is at the table/column mapping level where Birst builds in complex logic for how objects should be accessed. Figure 4 shows how the dimension table mapping Customers Customers is defined:

- It is a table defined at the Customers level, so only queries at or above that level apply.
- It is mapped to a physical table in the database, DW_DM_CUSTOMERS_CUSTOMERS.
- Logical columns are mapped onto physical columns (e.g., [Customers.CompanyName] is mapped to CompanyName$ on the table DW_DM_CUSTOMERS_CUSTOMERS).
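As a rough mental model, a dimension table mapping can be pictured as a small record that ties a logical table to a physical table and carries a logical-to-physical column map. The sketch below is an assumption-laden illustration, not Birst's internal representation: the `DimensionTableMapping` class, its field names, and the `resolve` helper are invented here, and only the table and column names come from the example above.

```python
from dataclasses import dataclass


@dataclass
class DimensionTableMapping:
    """Hypothetical stand-in for a dimension table definition."""
    name: str            # logical table name
    level: str           # level the table is defined at
    physical_table: str  # physical table in the database
    columns: dict        # logical column -> physical column


CUSTOMERS = DimensionTableMapping(
    name="Customers Customers",
    level="Customers",
    physical_table="DW_DM_CUSTOMERS_CUSTOMERS",
    columns={"Customers.CompanyName": "CompanyName$"},
)


def resolve(logical_column, mappings):
    """Return every (physical table, physical column) that can supply the column."""
    return [
        (m.physical_table, m.columns[logical_column])
        for m in mappings
        if logical_column in m.columns
    ]


print(resolve("Customers.CompanyName", [CUSTOMERS]))
```

A logical column may appear in several mappings, which is why `resolve` returns a list; choosing among the candidates is the job of query navigation.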

Similarly consider a series of measures defined in this example:


Figure 5 - Example definitions of measures and their aggregation rules. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

In this case, each logical measure is defined with a data type and an aggregation rule. This way, depending on what attributes are selected in a query, Birst knows how it should aggregate that measure. As with dimensions, measures are also mapped onto physical structures using measure table mappings. Figure 6 below shows an example of a mapping to the automatically generated fact table Customers Day Employees Order Details Products Fact, which is mapped to the physical table DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS (the fact table at the grain that includes levels Customers.Customers, Time.Day, Employees.Employees, OrderDetails.OrderDetails, and Products.Products). Note: For information about grains, see the Grains section on page 38.


Figure 6 - Example of a measure table definition on the Order Details table. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

Looking at this figure, we can see the following:

- The mapping is to table DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS.
- The cardinality of the measure table mapping is defined. (This is important for query navigation, which is discussed in How Birst Satisfies Logical Queries (Query Navigation) on page 11.)
- Individual measures are mapped to individual columns. For example, both [Sum: Unit Price] and [Avg: UnitPrice] are mapped to the same physical column, UnitPrice$. Because these two logical columns have different aggregation rules (SUM vs. AVG), each returns a different result for the end user.
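The point about two logical measures sharing one physical column can be made concrete with a short sketch. It assumes a toy in-memory table and a hypothetical `evaluate` helper; the measure names and the UnitPrice$ column are taken from the example above, while the row values are made up.

```python
from statistics import mean

# Aggregation rules keyed by name (only the two used in this example).
AGGREGATIONS = {"SUM": sum, "AVG": mean}

# Logical measure -> (physical column, aggregation rule).
MEASURES = {
    "Sum: Unit Price": ("UnitPrice$", "SUM"),
    "Avg: UnitPrice": ("UnitPrice$", "AVG"),
}

# Made-up rows standing in for the physical fact table.
ROWS = [{"UnitPrice$": 10.0}, {"UnitPrice$": 20.0}, {"UnitPrice$": 30.0}]


def evaluate(measure, rows):
    """Apply the measure's aggregation rule to its physical column."""
    column, rule = MEASURES[measure]
    return AGGREGATIONS[rule]([row[column] for row in rows])


print(evaluate("Sum: Unit Price", ROWS))
print(evaluate("Avg: UnitPrice", ROWS))
```

Both measures read exactly the same values; only the aggregation rule attached to the logical column differs.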

Joins
Once Birst has a mapping for all measures and dimension columns, the next question is how Birst combines these elements to service queries. Birst's logical model also specifies all legal joins between measure and dimension tables and between dimension tables. Using these join patterns, Birst can figure out what possible combinations of physical tables can service a logical query. For example, the Customers Day Employees Order Details Products Fact measure table shown above joins to the Products Products dimension table shown below:


Figure 7 - Example of a set of auto-generated joins. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

In this example you can see the auto-generated join between these two tables based on the surrogate key column that is present in both of these tables. You can also see that the join is an inner join. Additionally, Birst records joins between different dimension tables, allowing them to snowflake. For example, the following join is also generated:

Figure 8 - Example snowflake join.

This join shows the relationship between the Products Products dimension table and the Products Categories dimension table, and is what allows you to group order detail information on the fact table (like Quantity or Unit Price) by product categories. It does so by joining through the Product table to the Category table. Note: These examples show internal structures that Birst builds and maintains automatically. They are shown here for instructional purposes; users are not expected to directly edit or maintain these structures. (The exception to this is the lower-level administration view for Live Access, Manage Local Sources, that allows direct manipulation of these objects).
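To see why the snowflake join matters, the following sketch builds three tiny tables in an in-memory SQLite database and groups a fact-table measure by category by joining through the product table. The table names, key columns, and data are simplified assumptions for illustration; they are not the physical names or surrogate keys that Birst generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified stand-ins for the dimension and fact tables.
    CREATE TABLE categories (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE products (product_key INTEGER PRIMARY KEY, product_name TEXT,
                           category_key INTEGER);
    CREATE TABLE order_details_fact (product_key INTEGER, quantity INTEGER);

    INSERT INTO categories VALUES (1, 'Beverages'), (2, 'Produce');
    INSERT INTO products VALUES (10, 'Chai', 1), (11, 'Chang', 1), (20, 'Tofu', 2);
    INSERT INTO order_details_fact VALUES (10, 5), (11, 3), (20, 7), (10, 2);
    """
)

# Fact -> Products (inner join on the product key), then
# Products -> Categories (the snowflake join between dimension tables).
rows = conn.execute(
    """
    SELECT c.category_name, SUM(f.quantity)
    FROM order_details_fact f
    INNER JOIN products p ON p.product_key = f.product_key
    INNER JOIN categories c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()

print(rows)
```

The fact table never stores a category key in this sketch; the category is reachable only through the product row, which is the relationship the snowflake join records.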


What Happens When You Run a Report?


Now, let's consider what happens when a user runs a report and how that report leverages Birst's logical layer. Figure 9 is a conceptual architecture of Birst.

[Figure 9: architecture diagram. Components shown: Designer, Dashboards, Rendered Report Cache, Logical Query Engine, Memory Cache, File Cache, Physical Query Engine, Data Warehouse, Other Data Sources, Live Access, Aggregates, Base Data Tables, ETL.]

Figure 9 - Conceptual Architecture.

When a user loads a report in Designer, or views a report in Dashboards, the following occurs:

1. Birst checks the report cache to see if that report has already been rendered with the same session and prompt parameters. If so, it returns that report from the report cache.
2. If not, Birst creates a logical query for the report and submits it to the logical query engine.
3. The logical query engine checks whether the query can be satisfied by the query cache and, if so, uses the cache to create a result set.
4. If the query misses the cache, the query navigator determines the least-cost set of tables that can satisfy the query and creates one or more physical queries. The results from those queries are assembled into a result set in the physical query engine.
5. Birst renders the report by combining the result set with the formatting and layout from the report design.
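The flow above can be summarized as two cache lookups in front of the physical query path. The sketch below is a simplified model under stated assumptions: the `ReportPipeline` class, its method names, and the cache keys are hypothetical, the navigator and physical query engine are collapsed into one injected function, and writing results back into the caches on a miss is assumed rather than described in this document.

```python
class ReportPipeline:
    """Hypothetical model of the report cache and query cache lookups."""

    def __init__(self, run_physical_queries):
        self.report_cache = {}
        self.query_cache = {}
        # Stand-in for the query navigator plus the physical query engine.
        self.run_physical_queries = run_physical_queries

    def run_report(self, report_id, session, prompts, logical_query):
        # Check the rendered-report cache (same session and prompt parameters).
        report_key = (report_id, session, tuple(sorted(prompts.items())))
        if report_key in self.report_cache:
            return self.report_cache[report_key]

        # Try the query cache, then fall back to physical queries.
        if logical_query in self.query_cache:
            result_set = self.query_cache[logical_query]
        else:
            result_set = self.run_physical_queries(logical_query)
            self.query_cache[logical_query] = result_set  # assumed write-back

        # "Render" by combining the result set with the report design.
        rendered = f"report {report_id}: {result_set}"
        self.report_cache[report_key] = rendered  # assumed write-back
        return rendered


physical_calls = []


def fake_engine(logical_query):
    """Records each physical execution and returns a made-up result set."""
    physical_calls.append(logical_query)
    return [("Beverages",), ("Produce",)]


pipeline = ReportPipeline(fake_engine)
query = "SELECT [Products.CategoryName] FROM [ALL]"
first = pipeline.run_report("r1", "s1", {"year": 2013}, query)
second = pipeline.run_report("r1", "s1", {"year": 2013}, query)  # report cache hit
third = pipeline.run_report("r2", "s1", {}, query)               # query cache hit
print(len(physical_calls))
```

In this model, a different report that issues the same logical query still avoids the database, because the query cache is keyed by the logical query rather than by the report.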


The most important part of this chain to understand is how Birst translates logical queries against its logical layer into physical queries against an underlying database.

How Birst Satisfies Logical Queries (Query Navigation)


Behind every report is a logical query. To understand how Birst turns logical queries into physical ones, we can use the example order-entry data model below.

Figure 10 - Example Birst Warehouse Data Model.

Let's consider the following simple query:


SELECT [Products.CategoryName] FROM [ALL]

To see how Birst processes this query, we can use the Navigate Query button on the Query Admin page under the Define Sources tab in the Admin module as shown below.


Figure 11 - Query navigation example.

Navigate Query shows, for each column in a query, all the possible logical mappings (logical measure tables and dimension tables) that could satisfy the query, and ultimately which one was chosen.

At run time, Birst only joins logical measure tables to logical dimension tables. Birst pre-generates a wide variety of combinations of facts and dimensions to create compound logical tables. For example, if two dimension tables are joined, Birst will create a logical table for each, and a third logical table that represents the combination. Similarly, if one fact table is at a lower level than another, Birst will create a logical measure table for each fact as well as a logical measure table that joins both of those facts together (a combo fact).

In the example above, you can see that the column [Products.CategoryName] exists in both the base logical dimension table (the one mapped directly to the physical Categories dimension table) and the Products-Categories snowflake dimension table (the one that joins the Products table and the Categories table together to satisfy queries with attributes from both). An asterisk shows the one that was picked: in this case, the base Categories dimension table in the Products dimension, as it has a lower cardinality than the Products-Categories combination table. Once Birst knows what physical tables it is going to use to satisfy the query, it can generate a physical query to the database:
SELECT DW_DM_PRODUCTS_CATEGORIES0_.CategoryName$ AS 'CategoryName'
FROM S_ab41d31e_72a9_4145_b8d0_2379e181d3c1.DW_DM_PRODUCTS_CATEGORIES DW_DM_PRODUCTS_CATEGORIES0_
GROUP BY DW_DM_PRODUCTS_CATEGORIES0_.CategoryName$

Consider the following slightly more complex logical query that calculates the average freight and quantity sold for each customer region:

SELECT [Customers.Region],[Avg: Freight],[Sum: Quantity] FROM [ALL]

Birst begins by breaking up the query, with one query per measure:
SELECT [Customers.Region],[Avg: Freight] FROM [ALL]
SELECT [Customers.Region],[Sum: Quantity] FROM [ALL]
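The per-measure split shown above can be sketched in a few lines of Python. This is only an illustration, not Birst's implementation: the function name split_by_measure is hypothetical, and the sketch assumes that measure columns are the bracketed names containing an aggregation prefix such as "Avg: " or "Sum: ", while everything else is a dimension attribute.

```python
import re


def split_by_measure(logical_query):
    """Split 'SELECT ... FROM [ALL]' into one sub-query per measure column.

    Assumption (not from the Birst documentation): a column is a measure
    when its bracketed name contains ': ', e.g. [Avg: Freight].
    """
    match = re.fullmatch(
        r"\s*SELECT\s+(.*?)\s+FROM\s+\[ALL\]\s*",
        logical_query,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        raise ValueError("unsupported query shape")
    columns = re.findall(r"\[[^\]]+\]", match.group(1))
    measures = [c for c in columns if ": " in c]
    dimensions = [c for c in columns if ": " not in c]
    if not measures:
        # Attribute-only query: nothing to split.
        return [logical_query.strip()]
    # Every sub-query keeps all dimension attributes plus exactly one measure.
    return ["SELECT " + ",".join(dimensions + [m]) + " FROM [ALL]" for m in measures]


sub_queries = split_by_measure(
    "SELECT [Customers.Region],[Avg: Freight],[Sum: Quantity] FROM [ALL]"
)
for q in sub_queries:
    print(q)
```

Each sub-query can then be navigated on its own, which is what the following paragraphs walk through.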

Birst navigation works by finding the best query path for each measure, attempting to recombine measures that hit the same tables with the same joins, and then executing each query in parallel. Consider the navigation path for the first sub-query (Avg: Freight):
Measure Column: Avg: Freight
Measure Tables (cardinality):
* Customers Day Employees Orders Fact (679770.0) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]
  Customers Day Employees Orders Fact_Customers Day Fact (747747.0000000001) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS,DW_SF_CUSTOMERS_DAY]
  Customers Day Employees Orders Fact_Day Employees Fact (747747.0000000001) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS,DW_SF_DAY_EMPLOYEES]
  Customers Day Employees Orders Fact_Customers Day Fact_Day Employees Fact (815724.0) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS,DW_SF_CUSTOMERS_DAY,DW_SF_DAY_EMPLOYEES]
  Customers Day Employees Order Details Products Fact_Customers Day Employees Orders Fact (6.590991405398401E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]
  Customers Day Employees Order Details Products Fact_Categories Day Fact_Customers Day Employees Orders Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CATEGORIES_DAY,DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]
  Customers Day Employees Order Details Products Fact_Day Products Fact_Customers Day Employees Orders Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_DAY_PRODUCTS,DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]
  Customers Day Employees Order Details Products Fact_Customers Day Fact_Customers Day Employees Orders Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CUSTOMERS_DAY,DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]
  Customers Day Employees Order Details Products Fact_Day Employees Fact_Customers Day Employees Orders Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_DAY_EMPLOYEES,DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]
Dimension Tables Selected:
  Customers Customers [DW_DM_CUSTOMERS_CUSTOMERS]

Dimension Column: Customers.Region
Dimension Tables (level):
* Customers Customers (Customers) [DW_DM_CUSTOMERS_CUSTOMERS] [XXXXXXXXX]
  Order Details Orders_Customers Customers_ (Orders) [DW_DM_ORDER_DETAILS_ORDERS,DW_DM_CUSTOMERS_CUSTOMERS] [XXXXXXXXX]
  Order Details Order Details_Order Details Orders_Customers Customers_ (Order Details) [DW_DM_ORDER_DETAILS_ORDER_DETAILS,DW_DM_ORDER_DETAILS_ORDERS,DW_DM_CUSTOMERS_CUSTOMERS] [XXXXXXXXX]
  Order Details Orders_Employees Employees_Customers Customers_ (Orders) [DW_DM_ORDER_DETAILS_ORDERS,DW_DM_EMPLOYEES_EMPLOYEES,DW_DM_CUSTOMERS_CUSTOMERS] [XXXXXXXXX]

Here we can see that the measure column Avg: Freight exists on 9 different logical measure tables. The first is the base Orders fact, which is named DW_SF followed by the names of the levels in the grain, in this case Customers, Day, Employees and Orders ([DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]). Every other logical measure table combines two or more physical measure tables (these are referred to as Combo Facts). Each of these logical entities has a reference to Freight, but also has columns from the other measure tables it combines, in case they are needed together to create a formula using columns from both, or to join through another fact table to a dimension that may be needed.

We can also see that the column [Customers.Region] exists in 4 logical dimension tables: the base Customers dimension, the snowflake that joins the Orders dimension table to the Customers dimension table, the snowflake that joins the Customers dimension table to the Orders dimension table to the Order Details dimension table, and the joint snowflake of both the Employees dimension table and the Customers dimension table to the Orders dimension table.

To determine which measure table definition and which dimension table definition are selected, Birst prioritizes the measure table with the lowest cardinality. Cardinality is indicated next to each measure table definition in parentheses. (Note that this is not necessarily the actual number of rows in the table.) For Live Access measure tables, you manually enter the cardinality of a fact table. For Discovery tables, the cardinality is determined based on the row count of the source. For warehouse fact tables, cardinality is determined by multiplying the cardinality of each of the levels associated with the grain (which explains the very high numbers in the example above).

Once a measure table is chosen, Birst must choose which dimension table to join to for each attribute. First, Birst prunes all dimension tables that do not join to the chosen fact. (Birst only chooses measure tables that have joins to dimension tables for every dimension column.) Then Birst chooses the dimension table with the lowest cardinality (as defined by the cardinality of the dimension level to which it is assigned). We can see which dimension tables join to each fact table by noting the Xs in brackets next to each dimension table name. Between the brackets Birst shows either an X or a blank for each measure table, where an X indicates a join to the measure table in that position and a blank denotes the lack of one.
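The selection rules described above can be sketched as follows. This is a simplified illustration for a single dimension column, not Birst's actual navigation code: the class and function names are hypothetical, the measure cardinalities are copied from the navigation output above, and the two dimension level cardinalities (91 and 830) are made-up values used only to make the example run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasureTable:
    name: str
    cardinality: float


@dataclass(frozen=True)
class DimensionTable:
    name: str
    level_cardinality: float  # cardinality of the level the table is assigned to
    joins: str  # one character per measure table: "X" = join exists, " " = none


def choose_tables(measure_tables, dimension_tables):
    """Pick the lowest-cardinality measure table that joins to at least one
    candidate dimension table, then the lowest-cardinality joining dimension."""
    by_cardinality = sorted(
        range(len(measure_tables)), key=lambda i: measure_tables[i].cardinality
    )
    for i in by_cardinality:
        # Prune dimension tables that do not join to measure table i.
        joinable = [d for d in dimension_tables if d.joins[i] == "X"]
        if joinable:
            best = min(joinable, key=lambda d: d.level_cardinality)
            return measure_tables[i], best
    raise LookupError("no measure table joins to a candidate dimension table")


measures = [
    MeasureTable("Customers Day Employees Orders Fact", 679770.0),
    MeasureTable("Customers Day Employees Orders Fact_Customers Day Fact", 747747.0000000001),
    MeasureTable("Customers Day Employees Orders Fact_Day Employees Fact", 747747.0000000001),
]
dimensions = [
    # Level cardinalities below are illustrative assumptions.
    DimensionTable("Customers Customers", 91, "XXX"),
    DimensionTable("Order Details Orders_Customers Customers_", 830, "XXX"),
]
chosen_measure, chosen_dimension = choose_tables(measures, dimensions)
print(chosen_measure.name, "|", chosen_dimension.name)
```

A real query with several dimension columns would need every column to find a joining dimension table before a measure table is accepted; the sketch omits that loop for brevity.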

Given those rules, we can see that the two most obvious tables are chosen in this case as they are the simplest and lowest cardinality options. Once those tables are chosen, Birst can generate the following physical query for this measure:
SELECT DW_DM_CUSTOMERS_CUSTOMERS1_.Region$ AS 'Region',
       AVG(DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS0_.Freight$) AS 'Avg: Freight'
FROM S_ab41d31e_72a9_4145_b8d0_2379e181d3c1.DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS0_
INNER JOIN S_ab41d31e_72a9_4145_b8d0_2379e181d3c1.DW_DM_CUSTOMERS_CUSTOMERS DW_DM_CUSTOMERS_CUSTOMERS1_
  ON DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS0_.Customers$Customers120094747$ = DW_DM_CUSTOMERS_CUSTOMERS1_.Customers120094747$
GROUP BY DW_DM_CUSTOMERS_CUSTOMERS1_.Region$

The navigation for the second subquery proceeds the same way, but as the measure does not exist on the main Orders fact, but rather the Order Details fact, a different measure table is chosen:
Measure Column: Sum: Quantity
Measure Tables (cardinality):
* Customers Day Employees Order Details Products Fact (5.991810368544E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS]
  Customers Day Employees Order Details Products Fact_Categories Day Fact (6.590991405398401E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CATEGORIES_DAY]
  Customers Day Employees Order Details Products Fact_Day Products Fact (6.590991405398401E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_DAY_PRODUCTS]
  Customers Day Employees Order Details Products Fact_Customers Day Fact (6.590991405398401E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CUSTOMERS_DAY]
  Customers Day Employees Order Details Products Fact_Day Employees Fact (6.590991405398401E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_DAY_EMPLOYEES]
  Customers Day Employees Order Details Products Fact_Customers Day Employees Orders Fact (6.590991405398401E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]
  Customers Day Employees Order Details Products Fact_Categories Day Fact_Day Products Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CATEGORIES_DAY,DW_SF_DAY_PRODUCTS]
  Customers Day Employees Order Details Products Fact_Categories Day Fact_Customers Day Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CATEGORIES_DAY,DW_SF_CUSTOMERS_DAY]
  Customers Day Employees Order Details Products Fact_Categories Day Fact_Day Employees Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CATEGORIES_DAY,DW_SF_DAY_EMPLOYEES]
  Customers Day Employees Order Details Products Fact_Categories Day Fact_Customers Day Employees Orders Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CATEGORIES_DAY,DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]
  Customers Day Employees Order Details Products Fact_Day Products Fact_Customers Day Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_DAY_PRODUCTS,DW_SF_CUSTOMERS_DAY]
  Customers Day Employees Order Details Products Fact_Day Products Fact_Day Employees Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_DAY_PRODUCTS,DW_SF_DAY_EMPLOYEES]
  Customers Day Employees Order Details Products Fact_Day Products Fact_Customers Day Employees Orders Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_DAY_PRODUCTS,DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]
  Customers Day Employees Order Details Products Fact_Customers Day Fact_Day Employees Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CUSTOMERS_DAY,DW_SF_DAY_EMPLOYEES]
  Customers Day Employees Order Details Products Fact_Customers Day Fact_Customers Day Employees Orders Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_CUSTOMERS_DAY,DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]
  Customers Day Employees Order Details Products Fact_Day Employees Fact_Customers Day Employees Orders Fact (7.1901724422528E15) [DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS,DW_SF_DAY_EMPLOYEES,DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS]
Dimension Tables Selected:
  Customers Customers [DW_DM_CUSTOMERS_CUSTOMERS]

Dimension Column: Customers.Region
Dimension Tables (level):
* Customers Customers (Customers) [DW_DM_CUSTOMERS_CUSTOMERS] [XXXXXXXXXXXXXXXX]
  Order Details Orders_Customers Customers_ (Orders) [DW_DM_ORDER_DETAILS_ORDERS,DW_DM_CUSTOMERS_CUSTOMERS] [XXXXXXXXXXXXXXXX]
  Order Details Order Details_Order Details Orders_Customers Customers_ (Order Details) [DW_DM_ORDER_DETAILS_ORDER_DETAILS,DW_DM_ORDER_DETAILS_ORDERS,DW_DM_CUSTOMERS_CUSTOMERS] [XXXXXXXXXXXXXXXX]
  Order Details Orders_Employees Employees_Customers Customers_ (Orders) [DW_DM_ORDER_DETAILS_ORDERS,DW_DM_EMPLOYEES_EMPLOYEES,DW_DM_CUSTOMERS_CUSTOMERS] [XXXXXXXXXXXXXXXX]

Note that in this case, since Quantity is on the Order Details fact which is at a lower level, a few more fact table combinations are possible and hence more options for Birst. Since this query is fairly simple, Birst picks the simplest and lowest cardinality tables and generates the following physical query:
SELECT DW_DM_CUSTOMERS_CUSTOMERS1_.Region$ AS 'Region',
       SUM(CAST(DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_.Quantity$ AS BIGINT)) AS 'Sum: Quantity'
FROM S_ab41d31e_72a9_4145_b8d0_2379e181d3c1.DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_
INNER JOIN S_ab41d31e_72a9_4145_b8d0_2379e181d3c1.DW_DM_CUSTOMERS_CUSTOMERS DW_DM_CUSTOMERS_CUSTOMERS1_
  ON DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_.Customers$Customers120094747$ = DW_DM_CUSTOMERS_CUSTOMERS1_.Customers120094747$
GROUP BY DW_DM_CUSTOMERS_CUSTOMERS1_.Region$

Birst issues these two queries to the database and combines the result sets based on the dimension attributes in the query (in this case Region). Birst will issue a query for each set of physical tables that it needs to hit to satisfy a query. If multiple measures navigate to the same tables, the queries will be consolidated.
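The final combination step can be pictured as a merge of the per-measure result sets on the shared dimension attribute. The sketch below is an assumption-laden illustration rather than Birst's implementation: the function name combine_results is hypothetical, the rows are invented sample values, and the sketch keeps a region even if only one of the result sets contains it (the document does not say how Birst treats that case).

```python
def combine_results(result_sets, key):
    """Merge per-measure result sets into one row per value of `key`."""
    combined = {}
    for rows in result_sets:
        for row in rows:
            # Start a new output row for an unseen key, then add this
            # result set's measure columns to it.
            combined.setdefault(row[key], {key: row[key]}).update(row)
    return list(combined.values())


# Invented sample rows, standing in for the results of the two physical queries.
avg_freight = [
    {"Region": "WA", "Avg: Freight": 50.0},
    {"Region": "OR", "Avg: Freight": 30.0},
]
sum_quantity = [
    {"Region": "OR", "Sum: Quantity": 120},
    {"Region": "WA", "Sum: Quantity": 300},
]
report_rows = combine_results([avg_freight, sum_quantity], key="Region")
for row in report_rows:
    print(row)
```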

Creating a Data Model


When data is loaded into Birst, it is un-modeled, hence business logic needs to be applied to understand how it should be aggregated or displayed. Additionally, when new data is loaded, there needs to be some mechanism for determining what is new versus changed and how to incrementally accumulate this data. Historically, with legacy business intelligence tools, the process of building and managing the rules for displaying data was very much a manual one: every column, and every variant of it, had to be mapped by hand. Business intelligence does not address how to load and store data, and the discipline of data warehousing, which does address this topic, is notorious for being cumbersome and slow. Birst supplies a design metaphor that provides a mechanism for automating a large portion of the manual configuration that has historically been necessary.

Application Building (BI Automation)


A concept fundamental to Birst is that of application building. Application building results in large amounts of BI and warehouse construction metadata generated based on high level declarative rules. The metadata for logical measure tables and dimension tables (as shown above) is automatically generated based on higher level structures. This automation allows for significant amounts of metadata to be built with a fraction of the effort that would be needed for a legacy BI tool like SAP Business Objects, Oracle BI, or Microstrategy.

How does this work? You begin by decorating raw data sources with bits of dimensional information. Based on this information, Birst derives what the dimensional data model should be, creating many variants automatically to handle a diverse set of scenarios. Data sources can generate two different types of BI metadata. In the case of Discovery sources (which are sources that are not included in the data warehouse), they simply generate run-time dimensions and measure definitions that run against the existing tables. For warehouse tables, Birst actually creates physical tables in the database that match the logical structure of the dimensional model, building in increased flexibility, capability and performance.

Application building happens automatically every time a change occurs that would result in a different target data model. For example, if you upload a new source, add a new column to it, and target that column to a particular use (e.g., target it to a hierarchy or level, or target it as a measure or Analyze by Date), Birst will rebuild all measure and dimension table definitions to ensure that they conform.

Birst Spaces
You work with spaces in Birst. A space is a container for your data and the reports and dashboards that you and others with access to your space build. You can create separate spaces for different sets of data sources and various analytical purposes. When creating a new space, you have the choice between three types of spaces (also referred to as modes):
- Automatic
- Advanced
- Discovery

In the case of an Automatic space, Birst will automatically analyze your uploaded data and derive a star schema dimensional model that is ready for analysis and reporting. You are not required to define hierarchies, source grains and source column properties. We recommend that you use this type of space when you want to get going quickly, are unfamiliar with the structure of the data, or lack experience with BI and a basic understanding of dimensional modeling.

An Advanced space in Birst provides you with full flexibility in defining a dimensional model that best describes your data and meets your specific analytics needs. You are able to set up custom hierarchies, specify the grain of your data sources, and determine which source columns serve as facts and how source columns relate to hierarchies and hierarchy levels. Using an Advanced space you are also able to write your own ETL scripts using Birst ETL Services to transform and manipulate data. Discovery sources (which are sources not included in the data warehouse) can also be added to Advanced spaces.

The third space type is Discovery. This space type offers dashboard, discovery and visualization capabilities without the need for modeling hierarchies or specifying grains. It does not require the creation of a data warehouse.


Discovery Tables
Discovery tables are sources loaded directly with raw data by the user in Advanced spaces and Discovery spaces. In this case, the physical data in the database is in the same structure as the original data and the dimensional model is strictly a logical data modeling concept, as illustrated below.
[Diagram: the Logical Data Model (Product, Orders, Region, Time, Customer; made up of logical columns, logical dimension tables, logical measure tables and hierarchies) is connected through Mapping and Security Metadata to the Physical Database Tables (PRODUCT, CUSTOMER, ORDER, REGION and TIME tables; made up of physical tables, physical columns and joins).]

Figure 12 - Logical and physical data model structures for Discovery tables.

Creating a Data Model for Discovery Spaces


Consider an example data model with 6 tables and a series of key/foreign key relationships. When these raw tables are initially uploaded, Birst scans them to discover their primary keys, and based on the presence of columns with the same name in other tables, establishes foreign key joins between many tables. For example, the primary key of the Orders table is OrderID (accessed by right-clicking on a source in the Data Flow page in the Birst Admin module and choosing Set Primary Key):

Figure 13 - Setting a primary key.

And as a result of setting these key/foreign key relationships, we get the following data flow diagram on the Data Flow page:

Figure 14 - Sample Discovery data model.

Birst attempts to automate as much of the data modeling process as possible, and as such when you load data into Discovery sources in a Discovery space, Birst will attempt to create all the joins for you. You can also manually create joins by right-clicking a source in the Data Flow page and choosing Join to a Related Source as shown below.

Figure 15 - Creating a new join.

When you select this option, only sources that could be related by key/foreign key relationships are visible (the others are hidden) and you can drag a join line to another source. Standard joins are only possible where the foreign key column in one source has the same name as the primary key in another. If this is not the case with raw sources, you can simply rename the columns to make the names the same.

After table-level relationships are determined, columns can be designated as attributes, measures or both by right-clicking a source in Data Flow and selecting Manage Sources. In the example below, for the main Orders table, most columns are selected as attributes (columns that can be used to report on and group by). A few columns, such as Freight, are selected as measures (which allows them to be summed, counted, averaged or used with another aggregation rule).

Figure 16 - Setting column attributes for a Discovery data source.
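The same-name rule for standard joins described above lends itself to a short sketch. This is an illustration only: the function name infer_joins and the column lists are hypothetical (the column names follow the order-entry example but are not a complete listing of the sources), and Birst's own key discovery may apply further checks.

```python
def infer_joins(sources, primary_keys):
    """Propose a join wherever a source contains a column whose name equals
    another source's primary key.

    sources: dict of source name -> list of column names
    primary_keys: dict of source name -> primary key column name
    Returns a sorted list of (child source, parent source, join column).
    """
    joins = []
    for child, columns in sources.items():
        for parent, key in primary_keys.items():
            if parent != child and key in columns:
                joins.append((child, parent, key))
    return sorted(joins)


# Illustrative column lists, not the full example schema.
sources = {
    "Orders": ["OrderID", "CustomerID", "Freight"],
    "Order Details": ["OrderID", "ProductID", "Quantity"],
    "Products": ["ProductID", "ProductName", "CategoryID", "UnitPrice"],
    "Categories": ["CategoryID", "CategoryName"],
}
primary_keys = {
    "Orders": "OrderID",
    "Products": "ProductID",
    "Categories": "CategoryID",
}
proposed = infer_joins(sources, primary_keys)
for child, parent, key in proposed:
    print(f"{child} -> {parent} on {key}")
```

Renaming a foreign key column so that it matches the parent's primary key name, as the text suggests, is what makes a source show up in this kind of matching.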


Application Building for Discovery Tables


As a result of these choices, Birst builds one logical dimension table and one logical fact table for each source. This process is entirely automatic. Consider the selections above. All the columns that are attributes are grouped into a logical dimension table. A new dimension is created using the name of the source. That dimension will have a single level (also the name of the source). A new dimension table definition is also created with the name of the source (and designated to apply at the newly created dimension and level) as shown below.

Figure 17 - Example of an auto-generated dimension table for a Discovery source. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

The table is mapped to the physical database table containing the Orders data and each logical column is mapped to the physical column name that is actually used to store the data in the database. (As noted previously, the figure above shows internal metadata structures for the purposes of illustration only.) Birst uses the following naming conventions when creating physical structures to hold data:
- Discovery sources begin with the prefix ST_.
- All spaces and other characters not suitable for database naming are converted to underscores.
- Column names end with a $.
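As a rough illustration of these naming conventions, the following sketch derives physical names from logical ones. The function names are hypothetical, and two details are assumptions not stated in the text: that only letters, digits and underscores count as suitable for database naming, and that the original letter case is preserved.

```python
import re


def physical_table_name(source_name):
    """Prefix a Discovery source with ST_ and replace unsuitable characters."""
    # Assumption: anything other than letters, digits and underscores is unsuitable.
    return "ST_" + re.sub(r"[^A-Za-z0-9_]", "_", source_name)


def physical_column_name(column_name):
    """Replace unsuitable characters and append the trailing $."""
    return re.sub(r"[^A-Za-z0-9_]", "_", column_name) + "$"


print(physical_table_name("Order Details"))
print(physical_column_name("Unit Price"))
```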

As with the dimension table definition above, Birst builds a measure table definition as well (with the name of the source and the word Fact appended). Birst also maps each measure (with the appropriate aggregation rules) to the associated physical columns in the Birst database.


Figure 18 - Example of an auto-generated measure table for a Discovery source. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

Notice in the above example that the cardinality of the logical measure table is set to 830, the number of rows in the physical table. In addition to building a logical measure table and a logical dimension table for each source, Birst uses the key/foreign key relationships that have been defined to create a wide variety of join conditions as shown below.


Figure 19 - Joins created in a Discovery data model. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

In the figure above you can see the join relationships created between each measure table definition (those tables with Fact in their name) and each dimension table definition. There are also joins between dimension table definitions (for example, between Products and Categories). These dimension-to-dimension joins create snowflake metadata and allow you to join through a dimension table to a fact. Notice that there are also numerous joins between fact tables and time dimension tables. For any source that has columns designated as Analyze by Date, Birst will join using these columns to the time dimension, allowing date analysis on these columns.

All joins in Birst to measure table definitions are to dimension tables. So how does Birst deal with the situation where you want to create a calculation across measure tables? An example of this would be a calculation of Average Freight defined as a custom measure in Birst as the quantity sold of a product divided by the freight for an order as shown below.


Figure 20 - Example measure across two tables.

This is more complex because the column [Quantity] exists on the Order Details table and the column [Freight] exists on the Orders table. Birst accomplishes this by creating a special measure table definition called a combination fact as illustrated below.

Figure 21 - Example "combo-fact" table definition. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

Notice that this measure table definition is mapped to both the Orders physical table and the Order Details physical table (and has the cardinality of the highest table in the set). Birst uses the join defined to create this combination table. Any column which is used in a logical query from this table would cause both tables to be used in the underlying physical query using that join. Birst also has an internal concept called Inheritance where a measure table or dimension table can inherit joins and columns from other definitions. This allows combo facts to also have definitions for each column in their parent measure tables, and to leverage all joins defined for those tables as well.

Time Dimensions
Birst has considerable machinery for dealing with time. Birst comes pre-loaded with a series of physical tables that enable a variety of time shifting and aggregation. Birst provides 32 different time dimension tables out of the box enabling such things as quarter-ago, year-ago, trailing twelve months, etc. Each date column, when loaded into the Birst database, gets converted into an integer DateID. This allows for efficient joining to date dimension tables. Birst can join the same measure table to many different date dimension tables, enabling time-transformed versions of each metric. As a result, for this example, in addition to [Sum: Quantity], Birst also creates [YAGO Sum: Quantity], [TTM Sum: Quantity] and 30 other variants automatically as illustrated below.


Figure 22 - Example subject area illustrating time transformations.

All these variants are available in the Birst subject area (the set of columns available for reporting) and can be used together in the same report. Note that if you select multiple Analyze by Date columns on a source, the logical measure table will be created for each of these dates (each with its own join to the time dimension). This allows you to analyze a fact multiple ways against time, something which is very difficult to do in other BI tools.

Processing Data Using the Data Model


Once a data model is created, you need to process the data to ensure that the raw data is loaded into the Birst database. This is accomplished by going to the Process Data tab in the Admin module and clicking the Process Now button on the Process New Data page as shown below.


Figure 23 - Preparing to process data

Not all sources need to be loaded into the Birst database at the same time. You can assign a processing group to a source and then choose which processing groups to load each time. In this simple example, no processing groups have been created so you would simply select All then click the Continue button.

Figure 24 - Processing group selection.

After processing, Birst will display a summary of database loads as shown below.


Figure 25 - Processing results

You can then click on the Details link to see how many rows were loaded into each table, whether there were any errors, and how long each table took to process (as shown below).

Figure 26 - Processing details.

Once data is processed into the Birst database, you can build reports and dashboards against its logical model.

Warehouse Tables
Warehouse tables are created in Advanced and Automatic spaces. While Discovery tables take the auto-created dimensional model and map it onto the same physical data structure that was uploaded into Birst, warehouse tables take a very different approach. They create physical tables that match the newly created logical model and load them with data from the original tables.

Why would Birst go through all the trouble to change the structure of the data to match a dimensional model? The answer becomes clear when you consider what happens when a data model is loaded more than once or when data is loaded from more than one source or application. Discovery tools (as defined by Gartner and other analyst firms) were primarily designed to load a single set of data into memory and report on it. If new data presents itself, the dataset is reloaded. (There are some very cumbersome means to handle incremental updates, but they are limited and can require extensive scripting/coding.) A data warehouse, on the other hand, is designed fundamentally to represent data that has been accumulating over time. It is also designed to create common business dimensions that can be used across all types of data. This conformed dimensionality allows business users to pick elements from common dimensions (like customer, product, time, etc.) and analyze any data from any source where there may be a connection.

[Diagram: the Logical Data Model (Product, Orders, Region, Customer, Time; made up of logical columns, logical dimension tables, logical measure tables and hierarchies) is connected through Mapping and Security Metadata to the Physical Database Tables (physical tables, physical columns and joins), which consist of Staged Data Sources (PRODUCT, CUSTOMER, ORDER, REGION and TIME tables) and Warehouse Tables (Product, Orders, Region, Customer, Time).]

Figure 27 - Logical and physical data model structures for warehouse tables.

Creating a Warehouse Data Model


Now, let's examine creating a warehouse data model using the same tables as above. In this case we will use Birst's built-in data model automation tools to get started quickly. We begin by doing the following steps:
- Upload the data
- Enable sources (if this is an Advanced space, they are disabled by default)
- Establish the key/foreign key relationships

Figure 28 - Source data model for building a warehouse data model.

Technically, setting the key/foreign key relationships isn't necessary when building a warehouse model. However, Birst can use these relationships to automate the creation of the metadata that is required for a warehouse model: hierarchies and grains. Before doing that, it is good to understand dimensions, hierarchies and grains. In order to fully illustrate these concepts, consider the dimensional warehouse model below, which is derived from the data model in Figure 28.

Figure 29 - Example data warehouse model.

In this example, there are three dimensions: Products, Employees and Order Details. There are 6 dimension tables (in green) and 6 measure tables (in red). How and why Birst created these specific dimensions and tables is discussed in the Application Building for Data Warehouse Models section on page 54, but for now let's use this model to explore how a dimensional model works in the first place.

Dimensions/Hierarchies
One of the most fundamental concepts in dimensional modeling is that of a dimension or hierarchy. While technically these are slightly different concepts, Birst uses the two terms interchangeably as hierarchical information is included in the definition of a dimension. A hierarchy is a set of related columns that can be used to categorize data. For example, you might have a Product hierarchy that contains all attributes for a product. By convention, a hierarchy is named for the unique items at its lowest level. Consider the following set of columns in the source data set:

Figure 30 - Product related columns in a source data set.

These columns provide information about each product and are called attributes. For every product, there is one and only one value for each of these attributes. In addition to grouping these related columns into a hierarchy, you can also define how those attributes are grouped hierarchically into levels.

Levels
Each hierarchy must define at least one level, but there may be multiple levels (each with only one parent). Beyond that, the use of levels is up to the designer of the data model. Figure 31 below shows an example set of levels and hierarchies that apply to our example data model. The top of each of these trees indicates the hierarchy name and the items below indicate specific levels. For example, the Employees hierarchy has one level, Employees whereas the Products hierarchy has two levels, Categories and Products.


Figure 31 - Example set of hierarchies and levels.

Levels tell Birst the following:
- How to store dimension data. For each level, a physical table is created (in this example, 6 levels means 6 tables). The physical name of a dimension table is, by convention, DW_DM_[DIMENSION NAME]_[LEVEL NAME]; for example, the dimension table at the Products level in the Products dimension is DW_DM_PRODUCTS_PRODUCTS.
- What items are unique in the data. Each level must have a unique key, the level key, which becomes the primary key of its physical dimension table.
- How to join dimension tables with other tables. All joins are done using level keys, ensuring that many-to-many joins cannot occur.
- Initial drill behavior.
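The physical naming convention can be expressed as a small helper. This is a sketch inferred from the table names that appear in this document (for example DW_DM_ORDER_DETAILS_ORDERS and DW_SF_CUSTOMERS_DAY_EMPLOYEES_ORDERS); the function names are hypothetical, and the normalization rule (upper case, with runs of non-alphanumeric characters replaced by a single underscore) is an assumption based on those examples.

```python
import re


def _normalize(name):
    # Assumed rule: upper-case, non-alphanumeric runs become one underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", name.strip()).upper()


def dimension_table_name(dimension, level):
    """DW_DM_[DIMENSION NAME]_[LEVEL NAME]"""
    return f"DW_DM_{_normalize(dimension)}_{_normalize(level)}"


def measure_table_name(grain_levels):
    """DW_SF followed by the names of the levels in the grain."""
    return "DW_SF_" + "_".join(_normalize(level) for level in grain_levels)


print(dimension_table_name("Products", "Products"))
print(dimension_table_name("Order Details", "Orders"))
print(measure_table_name(["Customers", "Day", "Employees", "Orders"]))
```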

Levels tell Birst the logical places to store and aggregate data. In fact, Birst creates a physical table for each level in a hierarchy. As a result of this, it is often most convenient to use levels that match the data sources being used to load a data warehouse. That way, each row of source data loads directly into a single row in a physical dimension table.

Internally, Birst guarantees that each level key is unique within the dimension table at that level. This prevents bad joins on non-specific keys. Why is this necessary? Consider a source table of 1 million records, where one of the columns in that table was defined as the level key and that column had identical values in every row. If that table was used to load both a fact table and a dimension table, and Birst joined those two tables on that key without making the column unique for the dimension table, the result would be a dangerous join condition called a Cartesian join (or many-to-many join). There would be 1 million records in the dimension table and 1 million records in the fact table, where every record in the dimension table joined to all records in the fact table. That creates 1 trillion combinations, a workload that would severely strain the system. When loading dimension tables, Birst actually groups by level keys, ensuring that each level key value is unique. So in this case, the dimension would only be loaded with 1 record: the unique value of the level key.

As Birst uses the level key to join between different tables, it is generally best to pick integer columns over string/varchar columns, and shorter columns over longer ones. So in the case of the Products dimension, [ProductID] seems like a better choice than [ProductName]. Figure 32 shows an example of setting the level key for the Products level of the Products hierarchy on the Hierarchies page under Define Sources in the Admin module.

Figure 32 - Example of setting a level key on the Hierarchies page.
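
To make the Cartesian join risk described above concrete, here is a minimal Python sketch. It is not Birst code: the row count (1,000 rather than 1 million), the column names, and both helper functions are invented for illustration. It counts the rows an equi-join would produce before and after the dimension rows are grouped by the level key.

```python
# Illustrative sketch only: models the join arithmetic, not Birst internals.
# The source rows and the "key" column are hypothetical.

def join_count(fact_rows, dim_rows, key):
    """Count the rows an equi-join on `key` would produce."""
    dim_counts = {}
    for row in dim_rows:
        dim_counts[row[key]] = dim_counts.get(row[key], 0) + 1
    return sum(dim_counts.get(row[key], 0) for row in fact_rows)

def group_by_level_key(rows, key):
    """Keep one row per level key value (the last one seen wins)."""
    unique = {}
    for row in rows:
        unique[row[key]] = row
    return list(unique.values())

# 1,000 source rows that all share the same key value.
source = [{"key": "A", "n": i} for i in range(1000)]

naive = join_count(source, source, "key")
deduped = join_count(source, group_by_level_key(source, "key"), "key")
print(naive)    # 1,000 x 1,000 combinations
print(deduped)  # one dimension row, so one match per fact row
```

With a unique level key in the dimension, the join can never return more rows than the fact table holds, which is the property the grouping step is meant to guarantee.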

Every column in a dimension is assigned to a level and will load into the corresponding physical dimension table at that level. By default, each column that is part of a level key is assigned to the level of that key. All other columns, unless specifically indicated, are assigned to the lowest level in the hierarchy, otherwise referred to as the dimension key level (because the level key for that level is the primary key for the entire dimension). To summarize, these are the rules for assigning columns to levels (and hence physical tables):
- Level key columns are assigned to their respective level.
- Columns which are specifically targeted to a level are assigned as such.
- All other columns are assigned to the lowest level in the hierarchy, the dimension key level.
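
The three assignment rules above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name, its parameters, and the dictionary layout are hypothetical and do not reflect how Birst stores its metadata.

```python
# Illustrative sketch of the three assignment rules; not Birst code.
# The function and parameter names are hypothetical.

def assign_levels(columns, level_keys, explicit_targets, lowest_level):
    """Return {column: level}, applying the three rules in order."""
    key_level = {col: level for level, cols in level_keys.items() for col in cols}
    assigned = {}
    for col in columns:
        if col in key_level:              # rule 1: level key columns
            assigned[col] = key_level[col]
        elif col in explicit_targets:     # rule 2: explicitly targeted columns
            assigned[col] = explicit_targets[col]
        else:                             # rule 3: everything else
            assigned[col] = lowest_level
    return assigned

result = assign_levels(
    columns=["ProductID", "ProductName", "CategoryID"],
    level_keys={"Products": ["ProductID"], "Categories": ["CategoryID"]},
    explicit_targets={},
    lowest_level="Products",
)
print(result)
```

The sketch covers only the logical assignment of columns to levels. Any extra key columns that a physical table carries for joining to other levels are outside its scope.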

How do you assign a column to a level or hierarchy? This is done on the original data sources via the Manage Sources page in the Birst Admin module. The data warehouse model (dimension and measure tables) is automatically generated via the application building process and cannot be directly edited. Instead, columns are targeted to hierarchies and levels on the sources themselves. This ensures that the warehouse data model is completely in sync and consistent with source tables and there is a clear path for moving data from sources to the warehouse tables. Figure 33 shows the example of the Products source.

Figure 33 - Example showing how columns are targeted to hierarchies and levels on Manage Sources page.

In this example, you can see that all columns except [UnitPrice] are targeted to the Products hierarchy. [UnitPrice] is clearly a measure and not likely to be something that users would like to group by. Since [ProductID] is the level key of the Products level, you can see that it is targeted explicitly to that level. Similarly, [CategoryID] is the level key of the Categories level and it is also targeted directly. All the other columns are not explicitly targeted to a specific level. Given the assignment rules above, all columns except [CategoryID] and [UnitPrice] are then assigned to the Products dimension table.

Figure 34 shows the dimension table mapping for the Products table. Here you can see how all of the logical columns map to physical columns in the Birst database. As expected, all the columns targeted to the Products level and all of the columns targeted only to the Products hierarchy/dimension but to no level are present. However, there are three other columns in this table: [CategoryID], [Products1249892458], and [Categories-1697024394]. These columns are the result of two things:
- Since Categories is a level above Products, Birst needs to be able to join between the Products table and the Categories table. It does so using the level key from the Categories table. Hence, [CategoryID] is present on this table.
- For each level key, Birst also creates and manages another column called a surrogate key column. Birst will use this column for joins whenever possible instead of columns for the original level key. For many applications, this results in a significant speedup in performance. (Surrogate keys are discussed on page 38.) In this example, [Products1249892458] and [Categories-1697024394] are surrogate key columns.

Figure 34 - Dimension table definition of Products dimension table. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

What happens when this space is processed? First, let's look at the processing history by clicking the Details link for a load on the Process New Data page (we are assuming that data has already been processed in this example). Figure 35 shows a high-level overview of what Birst does during a load. You can see that 77 product records were loaded into the Products level dimension table. You can also see which columns were loaded.

Figure 35 - Hierarchy load detail for the Products source.

What exactly happens when a dimension table is loaded from a data source? Since the data source has already been loaded into the Birst database and is now a table, Birst can issue a series of SQL statements to the Birst database to load the appropriate records in the table. Figure 36 shows the two queries Birst issued to the database to populate the dimension table. The first updates any records in the dimension where the natural key already exists (this is critical as Birst by definition must ensure that each natural key is unique in a dimension table). Birst then issues a second query to add records for new natural keys that occur during this data load.

Figure 36 - Queries used to load the Products dimension table
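
The update-then-insert pattern described above can be approximated with a short Python sketch. SQLite stands in for the Birst database here, and the table names, column names, and sample rows are all hypothetical. The actual statements Birst issues are the ones in Figure 36 and will differ in detail.

```python
import sqlite3

# Illustrative sketch only: table and column names are hypothetical, and
# SQLite stands in for the Birst database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_products (ProductID INTEGER, ProductName TEXT);
    CREATE TABLE dim_products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
    INSERT INTO dim_products VALUES (1, 'Old name');
    INSERT INTO staging_products VALUES (1, 'New name'), (2, 'Second product');
""")

# Step 1: update dimension rows whose natural key already exists.
conn.execute("""
    UPDATE dim_products
    SET ProductName = (SELECT s.ProductName FROM staging_products s
                       WHERE s.ProductID = dim_products.ProductID)
    WHERE ProductID IN (SELECT ProductID FROM staging_products)
""")

# Step 2: insert rows for natural keys not yet in the dimension.
conn.execute("""
    INSERT INTO dim_products (ProductID, ProductName)
    SELECT s.ProductID, s.ProductName FROM staging_products s
    WHERE s.ProductID NOT IN (SELECT ProductID FROM dim_products)
""")

rows = conn.execute(
    "SELECT ProductID, ProductName FROM dim_products ORDER BY ProductID"
).fetchall()
print(rows)
```

Running the update before the insert keeps each natural key to a single row: existing keys are refreshed in place, and only keys that are new in this load are added.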

The SQL issued by Birst ensures that the latest information is stored in the dimension for each natural key. In other words, this dimension contains the unique information for each product that can be used to categorize all measures. If you update information about a product, Birst will be able to show reports and analysis using that updated data without changing data in any of the measure tables in the data warehouse. In this way, the Products table contains master data, thus satisfying Birst's goal of providing a single version of the truth.

If we then move to the View Processed Data page (shown in Figure 37), we can select View Data for the Products dimension table to see the results of the load (shown in Figure 38).

Figure 37 - View Processed Data page.

Figure 38 - View processed data for the Products dimension table.

On the View Processed Data page, Birst shows the logical names of the columns in the table as headers. Figure 38 shows us a few things:
- You can see that the surrogate key for the Products level, [Products1249892458], has been populated. Birst does this during the load of the dimension table.
- The surrogate key for the Categories level, [Categories-1697024394], is also populated. Birst loads dimensions starting at the highest level first. This allows lower levels to look up the surrogate keys of higher levels during a load (which can be seen in the SQL above with the left outer join to the Categories table).
- A new column [Load ID] has been added. Birst tags every record with the load ID that was used to populate it. This allows Birst in some cases to back out data for a bad load, but also to be able to tie together records in measures and dimensions that were loaded at the same time.

In this data model, we have chosen to have one additional level in our Products hierarchy: Categories. This is not only convenient, but matches the levels of aggregation in the source data. In fact, as described in the Application Building for Data Warehouse Models section on page 54, Birst automatically creates a level for each source in the data. This can be seen in Figure 39 showing the setup of the Categories data source.

Figure 39 - Categories data source setup

Here, each column is targeted to the Products dimension. [CategoryID] is also specified as being at the Categories level. As [CategoryID] is the level key of the Categories level, this is required. Note that you do not need to specify this assignment on each source independently. When a column is named in a source, Birst uses that name as the logical column name. That means that any other column in another source targeted to the same dimension will be treated as the same column. In this way, we can have multiple physical data sources that load the same logical column. Birst will merge the instances of [CategoryID] that come from both the Products source and the Categories source. For the other columns in this table that do not specify a level directly, how does Birst know that they should also be assigned to the Categories level? This has to do with grains which are described on page 38. In the grain for the Categories source, the level of Categories in the Products dimension is chosen. That means that all columns in this source in the Products dimension are at the Categories level or higher. As a result, Birst generates the dimension table definition shown in Figure 40 below.

Figure 40 - Dimension table definition for the Categories level in the Products dimension. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

Figure 41 shows the data actually loaded into the Categories dimension table (note the presence of the surrogate key for the Categories level, [Categories-1697024394], and the Load ID column).

Figure 41 - View data for the Categories dimension table.

One final note about levels: they are also utilized by the Birst user interface to provide drilling behavior (you can drill from a level to a lower level). However, using levels for this purpose is not necessary, as Birst provides drill maps that can be used to define the drill behavior when clicking on a column in a report. Instead, it is recommended to think of levels as providing data modeling information. Best practice is to control drilling behavior using drill maps rather than levels.

Surrogate Keys
If level keys can be used to join between different tables, why does Birst go through the trouble of creating surrogate keys for them? Birst could just use the original columns from the level key (also referred to as the natural key because these columns naturally occur in the data source). However, surrogate keys serve a couple of important purposes:
- Performance. Surrogate keys are integers, which are the fastest data type for a database to index and join. Integers are much easier to use than other more complicated types. More importantly, if a level key contains two or more columns (a compound key), a surrogate key allows the database to join using just one column (and therefore just one index). Databases have to do much more work when joining by compound keys. In many cases this can have a dramatic effect on performance.
- Type II Slowly Changing Dimensions. For type II dimensions, the surrogate key records the join relationship at a specific moment in time. Every time there is a change to a dimension record and a record is retired, a new surrogate key is created. The natural key does not change. In this way you can join from the fact to the dimension using the surrogate key to see things "as was" (as they actually happened), and Birst can also join using the natural key to see historical facts "as is", against the current values in the dimension.
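
The "as was" versus "as is" distinction can be illustrated with a tiny in-memory example. This is a hedged sketch, not Birst's implementation: the customer, regions, amounts, and the `current` flag are hypothetical data chosen only to show how the two join paths give different answers.

```python
# Illustrative sketch only: a tiny type II dimension with hypothetical data.
# Each change to a customer retires the old row and adds a new surrogate key.
dimension = [
    {"surrogate": 1, "natural": "C1", "region": "West", "current": False},
    {"surrogate": 2, "natural": "C1", "region": "East", "current": True},
]
# Facts store both keys; the surrogate key is the one valid at load time.
facts = [
    {"natural": "C1", "surrogate": 1, "amount": 100},
    {"natural": "C1", "surrogate": 2, "amount": 50},
]

by_surrogate = {d["surrogate"]: d for d in dimension}
current_by_natural = {d["natural"]: d for d in dimension if d["current"]}

def totals(region_of):
    """Sum fact amounts by the region that `region_of` returns for each fact."""
    out = {}
    for f in facts:
        region = region_of(f)
        out[region] = out.get(region, 0) + f["amount"]
    return out

# "As was": join on the surrogate key captured when the fact was loaded.
as_was = totals(lambda f: by_surrogate[f["surrogate"]]["region"])
# "As is": join on the natural key to the current dimension row.
as_is = totals(lambda f: current_by_natural[f["natural"]]["region"])
print(as_was)
print(as_is)
```

In the "as was" view the first fact stays with the region the customer had at the time, while the "as is" view attributes both facts to the customer's current region.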

Grains
Grains are the concept with which those new to Birst often have the most difficulty. The concept is quite simple at its core, but as it is one of the concepts most particular to Birst, it can be easily misunderstood.

Fundamentally, a grain is simply a collection of levels. It is needed to describe the content of measure tables. By definition, dimension tables only contain content from one dimension, and therefore only need one level to describe them. A measure table, on the other hand, can be joined to many dimensions and Birst must understand what levels a measure table can join to. In the current example, let's consider the simplest grain: one level. Figure 39 on page 37 shows the [CategoryID] column as a measure column. That tells Birst that it must create a measure table to hold this value so that it can build the logical measures [# CategoryID] and [Count Distinct: CategoryID]. This measure table will be loaded from the Categories source and have one row per [CategoryID]. Also, this measure table will need to connect to the Categories dimension table in order to slice a query by columns such as [CategoryName]. There is no other dimension level that also applies to this source, and as a result its grain consists merely of the Categories level in the Products dimension.

Figure 42 - Specifying the grain for the Categories data source in the Manage Sources page.

Now, let's explore what happens physically. Birst builds a measure table that contains a record for each row in the Categories data source (for each data load). That measure table definition is shown in Figure 43 below.

Figure 43 - Measure table definition for table at the Products.Categories grain. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

From this definition you can see the following:
- The physical name of the table is DW_SF_CATEGORIES_DAY. By convention, a measure table is named DW_SF_[List of levels contained in the grain].
- Birst implicitly adds [Time.Day] to the grain of every measure table. This allows you to slice data not only by the dimensions chosen, but also by the date when the data was loaded.
- Both the natural and surrogate keys for the dimension levels in the grain are contained on the fact table.
- Birst includes the keys for higher levels than the ones specified in the grain if possible. This is referred to as carrying keys downhill. When Birst loads this fact table, if it has a source that it can use to look up these higher level keys, it will. Why? This allows Birst to join this measure table directly to a higher level dimension table without having to join through the larger, lower level table that is in the grain. This simplifies queries and improves their performance, in many cases significantly. As a result, you can see the keys to the time dimension at the Week, Month and Quarter level as well as the one at the Day level.
- The measure table is set to the cardinality of the data source, in this case 8 (as there are 8 categories in the source).
- The only column in the source designated as a measure, [CategoryID], is mapped once for every allowable aggregation rule. In this case only COUNT and COUNT DISTINCT are allowed. (Generally you can also SUM and AVG integers, but in this case [CategoryID] is a level key column, and Birst assumes that such math doesn't make much sense for level keys.)
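
The naming conventions for physical tables can be sketched as two small helper functions. These helpers are hypothetical, and the alphabetical ordering of level names in measure table names is an assumption inferred from the table names that appear in this document, not a documented rule.

```python
# Illustrative sketch only. The alphabetical ordering of level names is an
# assumption inferred from the table names that appear in this document.

def _part(name):
    """Upper-case a logical name and replace spaces with underscores."""
    return name.upper().replace(" ", "_")

def dimension_table_name(dimension, level):
    """DW_DM_[DIMENSION NAME]_[LEVEL NAME]"""
    return "DW_DM_{}_{}".format(_part(dimension), _part(level))

def measure_table_name(grain_levels):
    """grain_levels: level names in the grain; Day is added implicitly."""
    parts = sorted(_part(level) for level in set(grain_levels) | {"Day"})
    return "DW_SF_" + "_".join(parts)

print(dimension_table_name("Products", "Products"))
print(measure_table_name(["Categories"]))
print(measure_table_name(["Employees", "Orders"]))
```

Under these assumptions the helpers reproduce the names DW_DM_PRODUCTS_PRODUCTS and DW_SF_CATEGORIES_DAY used in this section.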

What happens during an actual load? Figure 44 shows the query used by Birst to load the measure table with data from the Categories data source. Here you can see that Birst is loading the measure table with any applicable columns from the Categories data source as well as looking up the surrogate keys from the already-loaded dimension table. (Birst loads dimensions before it loads measures.)

Figure 44 - Load SQL used to load the Categories, Day measure table.

Figure 45 shows the data ultimately loaded into the measure table. You can see that in addition to the [CategoryID] column, which was the only column specifically designated as a measure, Birst loaded the table with the natural and surrogate keys for dimension tables that belong to the grain of this measure table (and higher levels if possible). Birst also populates the Load ID column.

Figure 45 - Data loaded into the Categories, Day measure table.

Given that Birst populates the Load ID column, and that Birst only adds data each load (unlike a dimension table that can update data), it is possible to back out the measure data for a given load. Birst allows you to Delete Last Load Only which will delete all measure data associated with that load. While dimension records that may have been updated cannot be returned to their prior state, deleting measure data allows you to re-run a load. All dimension data will be appropriately updated and measure data will contain the correct results.

The most challenging aspect of grains is understanding that they are a logical construct and that a grain can include a level even if no columns in that dimension are in the measure table. By adding a level to a grain, you are essentially telling Birst the following:
- If it is possible to add a foreign key to the measure table that links it to a dimension table at a given level, do so.
- This measure table logically contains data at this level.

The first point is fairly obvious. With keys set up between tables, Birst can join them and this linkage allows you to slice and dice measures by columns present in dimension tables. But why is the second point important? It has to do with another data model automation feature of Birst: combination facts. (Note that the words fact and measure are used interchangeably in Birst).

Working with Grains: Determining Whether a Measure Table is at a Lower Level than Another
Each data warehouse data source in Birst loads a physical measure table (assuming at least one column is chosen as a measure on the data source). These measure tables include keys to dimension tables and therefore allow an end-user to slice and dice measure data by linked dimension data. But some useful calculations cannot be constructed using only a single measure table in conjunction with dimension tables. It may be a calculation that involves columns from two different levels of data. It may be that the keys to a particular dimension do not exist on a measure table and you need to join through another measure table to navigate that relationship. It may also be something more complex like a many-to-many relationship. Each of these cases involves the use of more than one measure table to create a calculation. As discussed in the Discovery Table section, Birst can create logical measure table definitions called combo-facts that join together two or more physical measure tables for this purpose. For warehouse tables, Birst uses grain information to figure out which combo-facts are possible.

In order to understand this, let's consider a more sophisticated case involving the Orders table and the Order Details table from our current example. In this data set, there is one record for every order made by a customer. Also, each order is taken and placed by one employee. Figure 46 shows the data in the Orders data source where each order has several dates (Order Date, Shipped Date and Required Date), some attributes of that order (ShipVia method and Freight), and links to other tables (CustomerID and EmployeeID).

Figure 46 - Data in the Orders data source on the Manage Sources page.

Figure 47 shows the setup of this data source (in the Manage Sources page under the Define Sources tab in the Admin module). Here most columns are targeted to the Order Details dimension (that is, ones that would be good attributes to categorize by). Also, some columns are indicated as measures and all three dates are selected for analysis. Also note that the columns which are level keys in the Order Details dimension are also targeted to their respective levels. EmployeeID is the level key of the Employees dimension and as a result it is also targeted to a level.

Figure 47 - Setup of Orders data source on the Manage Sources page.

So what should the grain be for this table? You can answer that question by asking whether this source logically contains data at specific levels. Since each row in this data source corresponds to an order, [Order Details.Orders] should clearly be part of the grain (and thus we can join this measure table to the Orders dimension table). Since in this dimensional model Orders is part of the same dimension as Customers, and in fact is at a lower level, our linkage to the Customer dimension table is handled as part of the Order Details dimension. Also, since each order has an associated employee, [Employees.Employees] should also be part of the grain. Figure 48 shows the setup of the grain for the Orders data source. Notice that no level under the Products dimension is selected. This is because there is no relationship between an order and any table in the Products dimension (the relationship with Products is defined at a lower level).

Figure 48 - Grain for the Orders data source.

Based on this grain (the combination of the two levels [Employees.Employees] and [Order Details.Orders], and implicitly [Time.Day]), Birst creates the measure table definition in Figure 49. The table is named DW_SF_DAY_EMPLOYEES_ORDERS, based on the grain naming convention. The cardinality of the table is estimated by Birst by multiplying the cardinality of the sources in the grain (this is not a perfect solution, as data sparsity means that far fewer rows will actually be in the table, but this method gives intuitively correct results when Birst is choosing which measure tables to use in a query). The columns designated as measures, [OrderID], [EmployeeID], [ShipVia] and [Freight], are mapped as measure columns. Also, Birst includes both the natural and surrogate keys for the Employee, Order and Time dimensions. Notice that Birst also includes the natural and surrogate keys for [CustomerID]. This is an example of carrying keys downhill, where Birst looks up the keys from the higher level Customer table.

Of particular note are the last three columns that are mapped. Recall that for the Orders data source we selected Analyze by Date for the [OrderDate], [ShippedDate], and [RequiredDate] columns. When date columns are marked this way, Birst creates a column in the corresponding measure table that joins to the Time dimension at the Day level. It does so by converting each date into a Day ID integer that is also the primary key of the [Time.Day] dimension table. This allows us to slice and dice this fact by each of these three dates.

Figure 49 - Measure table definition at the Orders, Employees grain. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

The Orders table contains useful information about each order. However, it is the Order Details table that contains a record of each actual purchase. Each order might be for multiple products. Figure 50 shows multiple detail rows for each OrderID, one for each product that is ordered, along with the corresponding price and quantity.

Figure 50 - Order Details data source.

Figure 51 shows the setup of each column in the Order Details source. In this case, since every column is numerical, they are all selected as measures. Columns that might be used for attributes to group data are also targeted to the Order Details dimension.

Figure 51 - Column attributes of the Order Details data source.

Where is all of this leading? Figure 52 shows how the grain is defined for this source. As expected, [Order Details.Order Details] and [Products.Products] are selected as this table contains columns at those levels as well as keys to the dimension tables at those levels.

Figure 52 - Grain for Order Details data source.

But why is [Employees.Employees] chosen? The Order Details table does not contain any key to the Employees level, let alone any column that is part of the Employees dimension. Understanding this one concept is the key to mastering Birst's data warehouse automation capabilities.

The central point is to realize that even though the Order Details table does not itself have a key to the Employees dimension table, every record in the Order Details table does have a matching Employees table record. In other words, if you know which Order Details record you are on, you should also know which employee you are dealing with. This is because each Order Details record is tied to an Order record, and that Order record is tied to one employee. So, logically, Order Details is also at the Employees level. You should, therefore, be able to slice and dice order detail information by employee last name, for example. As a result, the grain of this table should be [Employees.Employees], [Order Details.Order Details], [Products.Products] (and [Time.Day], which is implicit).

This is more than an academic observation. It is central to how Birst knows what tables it can join both at runtime and when loading the warehouse. To see this, Figure 53 shows the measure table created for this grain. As expected, there are columns created for the natural and surrogate keys for the levels in the grain. There are also measure columns created for each desired aggregation.

Figure 53 - Measure table definition for Order Details, Employees, Products, Day grain. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

But there are other columns on this measure table that did not originate on the Order Details source. These include the natural and surrogate keys to the Employees dimension table (e.g., [EmployeeID]) as well as the Day ID keys for the order associated with this order line item. These columns must come from data at the grain of the Orders table. In fact, as you can see in Figure 54, Birst looks up this information in the fact table already loaded from the Orders source. Recall that Birst loads dimensions in order from the highest level down to the lowest so that it can look up higher level keys as needed during a load. The same is true for measure tables. Birst loads measure tables in order from higher level to lower level in order to look up information for more detailed measure tables from measure tables at higher levels of aggregation.

Figure 54 - Processing detail for loading the measure table at the Order Details, Employees, Products, Day grain.

You can see in this insert statement LEFT OUTER JOINS to dimension tables in order to look up surrogate keys. But there is also a LEFT OUTER JOIN to DW_SF_DAY_EMPLOYEES_ORDERS, the measure table loaded from the Orders data source. Intuitively, this may seem obvious. However, we need to examine how Birst determined that the Order Details measure table was indeed at a lower level than the Orders measure table. Within a single dimension, it is clear how to determine if a single level is above or below another level. But for a grain, which is a set of levels, how does Birst determine this? It makes sense that you should always be able to roll-up data from a lower level grain to a higher level grain. Accordingly, a grain (the parent grain) is considered to be at a higher level than another grain (the child grain) if it satisfies the following rules:
- Every dimension in the parent grain also exists in the child grain.
- Every matching level in the child grain should be at or below the level in the same dimension in the parent grain.
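
The two rules above can be sketched as a comparison function over grains. This is a minimal illustration, not Birst's implementation: the function name, the dictionary representation of a grain, and the level ordering within each hierarchy (listed highest first) are assumptions based on the example in this section.

```python
# Illustrative sketch of the two rules above; not Birst code. The level order
# within each hierarchy (highest first) is an assumption based on this example.
HIERARCHIES = {
    "Order Details": ["Customers", "Orders", "Order Details"],
    "Employees": ["Employees"],
    "Products": ["Categories", "Products"],
    "Time": ["Quarter", "Month", "Day"],
}

def is_higher_grain(parent, child, hierarchies=HIERARCHIES):
    """parent and child map dimension name -> level name."""
    for dimension, parent_level in parent.items():
        if dimension not in child:            # rule 1: dimension must exist
            return False
        levels = hierarchies[dimension]
        # A larger index means a lower (more detailed) level.
        if levels.index(child[dimension]) < levels.index(parent_level):  # rule 2
            return False
    return True

orders = {"Employees": "Employees", "Order Details": "Orders", "Time": "Day"}
order_details = {
    "Employees": "Employees",
    "Order Details": "Order Details",
    "Products": "Products",
    "Time": "Day",
}
without_employees = {k: v for k, v in order_details.items() if k != "Employees"}

print(is_higher_grain(orders, order_details))
print(is_higher_grain(orders, without_employees))
```

The second call shows why the Employees level matters: once it is dropped from the Order Details grain, the Orders grain no longer qualifies as its parent.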

In the case of our Orders/Order Details tables, if we did not have [Employees.Employees] selected on the Order Details grain, the above rules would have been violated and Birst would have concluded that Order Detail data could not be sliced by employee. It would then not have been able to look up an Employee ID for each Order Detail record. As part of the application building process, Birst concluded that since there was a one-to-many relationship between Orders and Order Details, in order to make the above conditions true, it should set the grain of Order Details to include [Employees.Employees].

Put differently, if the above conditions exist, Birst can join two different fact tables together. It can join them together at load time (as illustrated above) to carry keys from higher level measure tables down to lower level ones. It can also join measure tables together at runtime with combo-facts as described below.

One important note: because Birst uses the combination of level keys in a grain as the combined key (the grain key) to join two measure tables, Birst requires that the grain key be unique. Why is this the case? Since the grain key is a combination of all of the columns that are present in the level keys of the levels that make up the grain, if it is not unique, using it to join two fact tables can create a Cartesian join condition, which must be avoided. When loading data, Birst tests to see if the grain key is unique. If not, it will not create joins between measure tables or look up values. Generally speaking, a non-unique grain key is the result of bad data modeling and is easily fixed by figuring out which additional columns are needed to make it unique, adding them to a new level, and then adding that level to the grain.

Let's look at an example of how carrying keys downhill from one measure table to another is useful. Consider the following very basic report of amount ordered each quarter.

SELECT [Time.Year/Quarter], [OrderDate: Sum: Quantity] FROM [ALL]

Figure 55 - Simple report leveraging keys carried downhill.

Since the Order Date is not present on the Order Details source, you must somehow get that information from the Orders measure table. In the case of carrying keys downhill, this is done at load time and Birst can therefore directly join the Order Details measure table to the time dimension as shown below in the physical query actually generated:
SELECT DW_DM_TIME_DAY1_.Year_Quarter$,
    SUM(CAST(DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_.Quantity$ AS BIGINT))
FROM S_N3b72eb7c_65e2_4383_a197_6a78fa568ba2.DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_
INNER JOIN dbo.DW_DM_TIME_DAY DW_DM_TIME_DAY1_
    ON DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_.Time$OrderDate_Day_ID$ = DW_DM_TIME_DAY1_.Day_ID$
GROUP BY DW_DM_TIME_DAY1_.Year_Quarter$, DW_DM_TIME_DAY1_.Quarter_ID$

Carrying keys downhill handles most common cases where information needs to be combined between measure tables. However, there are other cases where this must be done at run time. Because Birst knows that Order Details is at a lower grain than Orders, it also creates a combination measure table definition (combo-fact) that allows you to join these two measure tables in a query. Figure 56 shows the definition of the same combined measure calculation that we used in the Discovery Table section: [Average Freight]. Birst allows columns from both measure tables to be used in the formula since it understands that one is at a lower level than the other and can be combined at runtime.

Figure 56 - Definition of Average Freight for a set of warehouse tables.

Figure 57 shows the internal definition of the measure table for the combo fact. The example shows that this measure table definition inherits from both the measure table for the Orders source and the measure table for the Order Details source. The measure table is defined using a join between the two measure tables using the level keys of the grain.

Figure 57 - Combo-fact measure table definition example. (This view is not available from the Birst UI; it is provided for illustration purposes only.)

So, to understand the average freight by region you could create the report shown below in Figure 58:

Figure 58 - Report in Birst Designer illustrating combo facts.

That report generates the following logical query:


SELECT [Order Details.Region] 'F3',[Average Freight] 'F1' FROM [ALL]

This, in turn, generates the following physical query:


SELECT DW_DM_ORDER_DETAILS_CUSTOMERS2_.Region$ AS 'F3',
    AVG(CAST(DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_.Quantity$/DW_SF_DAY_EMPLOYEES_ORDERS1_.Freight$ AS FLOAT)) AS 'F1'
FROM S_N3b72eb7c_65e2_4383_a197_6a78fa568ba2.DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_
INNER JOIN S_N3b72eb7c_65e2_4383_a197_6a78fa568ba2.DW_SF_DAY_EMPLOYEES_ORDERS DW_SF_DAY_EMPLOYEES_ORDERS1_
    ON DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_.Order_Details$Orders831130011$ = DW_SF_DAY_EMPLOYEES_ORDERS1_.Order_Details$Orders831130011$
    AND DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_.Time$Day_ID$ = DW_SF_DAY_EMPLOYEES_ORDERS1_.Time$Day_ID$
    AND DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_.Employees$Employees200464383$ = DW_SF_DAY_EMPLOYEES_ORDERS1_.Employees$Employees200464383$
INNER JOIN S_N3b72eb7c_65e2_4383_a197_6a78fa568ba2.DW_DM_ORDER_DETAILS_CUSTOMERS DW_DM_ORDER_DETAILS_CUSTOMERS2_
    ON DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS0_.Order_Details$Customers120094747$ = DW_DM_ORDER_DETAILS_CUSTOMERS2_.Customers120094747$
GROUP BY DW_DM_ORDER_DETAILS_CUSTOMERS2_.Region$

The query joins both measure tables, and the calculation uses columns from each of them.
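The shape of that physical query can be reproduced on a small scale. The following is a minimal sketch, not Birst's generated SQL: it uses SQLite as a stand-in database, and the table names (`order_details_fact`, `orders_fact`, `customers_dim`), column names, and sample rows are all hypothetical simplifications. It joins two measure tables on their shared grain keys, joins one of them to a dimension table, and aggregates a calculation that uses one column from each measure table.

```python
import sqlite3


def average_ratio_by_region():
    """Join two measure tables on shared grain keys and aggregate by a dimension attribute."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical stand-ins for the generated measure and dimension tables.
        CREATE TABLE order_details_fact (
            order_id INTEGER, day_id INTEGER, employee_id INTEGER,
            customer_id INTEGER, quantity REAL
        );
        CREATE TABLE orders_fact (
            order_id INTEGER, day_id INTEGER, employee_id INTEGER, freight REAL
        );
        CREATE TABLE customers_dim (customer_id INTEGER, region TEXT);

        INSERT INTO order_details_fact VALUES
            (1, 20130101, 10, 100, 10.0),
            (1, 20130101, 10, 100, 20.0),
            (2, 20130102, 11, 200, 40.0);
        INSERT INTO orders_fact VALUES
            (1, 20130101, 10, 5.0),
            (2, 20130102, 11, 10.0);
        INSERT INTO customers_dim VALUES (100, 'West'), (200, 'East');
        """
    )
    rows = conn.execute(
        """
        SELECT c.region, AVG(od.quantity / o.freight)
        FROM order_details_fact od
        INNER JOIN orders_fact o
            ON od.order_id = o.order_id
            AND od.day_id = o.day_id
            AND od.employee_id = o.employee_id
        INNER JOIN customers_dim c
            ON od.customer_id = c.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(average_ratio_by_region())
```

The three-column join condition plays the role of the level keys of the grain: every key the two measure tables share appears in the ON clause.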

Application Building for Data Warehouse Models


Once hierarchies are defined, columns are targeted, and grains are set, Birst builds the application. As with Discovery tables, application building involves setting up logical measure and dimension tables that map onto physical structures, as outlined below:

• Create a logical dimension table for each level that has columns targeted to it. Generally each data source will have its own hierarchy level, but there are cases where multiple sources feed the same level.
• Create snowflake dimension tables to pre-join multiple levels within a dimension.
• Create a logical measure table for each grain. Generally each data source will have its own unique grain, but there are cases where multiple sources feed the same fact.
• Create logical joins between the logical measure and dimension tables.
• Create joins between measure tables and the time dimension. This includes creating different versions of each measure depending on how the table joins to time.
• Create combination measure tables that form 2- and 3-way combinations of measure tables.

Note that application building happens automatically. Every time a change is made to the dimensional model, the measure and dimension table definitions are rebuilt to match. This information is provided here so you can understand how it works.
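To give a feel for the last step in the list above, the following sketch enumerates 2- and 3-way combinations of measure tables. It is an illustration only: the function name and the sample table names are hypothetical, and it lists every possible combination, whereas the text does not say which combinations Birst actually materializes.

```python
from itertools import combinations


def combination_measure_tables(measure_tables):
    """Enumerate every 2-way and 3-way combination of the given measure table names."""
    names = sorted(measure_tables)
    combos = []
    for size in (2, 3):
        combos.extend(combinations(names, size))
    return combos


if __name__ == "__main__":
    for combo in combination_measure_tables(["Orders", "Order Details", "Products"]):
        print(" + ".join(combo))
```

The number of combinations grows quickly with the number of measure tables, which is one reason it helps that this step is automated.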

Update Warehouse Model


The previous section discussed how Birst generates a dimensional data warehouse when measure, level, and grain information is added to the sources. After each column is labeled as a measure, an attribute, or both, and the levels that make up the grain of each source are identified, Birst takes care of creating and loading the data warehouse. In other words, once a dimensional data model exists, Birst automatically builds the data warehouse from it. A designer can create their own data model, deciding which hierarchies to create and mapping them onto the various sources. Alternatively, Birst offers another powerful tool for data model automation called Update Warehouse Model. Using this tool, Birst derives a dimensional data model for your source data based solely on the key/foreign key relationships that were initially set up. Consider the same data model we looked at earlier:


Figure 59 - Input data model for Update Warehouse Model.

Once the key/foreign key relationships are defined for a set of data sources, you can click the Update button on the Data Flow page. Birst will leverage information in the data source data model and automatically create the dimensional data model shown in Figure 60 below.

Figure 60 - Example auto-created dimensional data model

So how does Birst determine which hierarchies to build and how to target them? Update Warehouse Model uses the following procedure:

1. Create a hierarchy for each data source. The hierarchy has the same name as the source and contains one level, also with the same name. The level key of this level is the primary key of the data source.
2. Target each potential attribute column (varchars and integers) to the hierarchy associated with its source.
3. Designate each potential metric (numeric values) as a measure.
4. Set each date column to Analyze by Date.
5. Set every source to a grain that includes only the level with its own name.
6. Fold hierarchies with dependent levels together into multi-level hierarchies. A dependent level is one that has a one-to-many relationship with another level. For example, the Customers level is initially in its own Customers hierarchy. But since there is a one-to-many relationship (as indicated by the data model) between Customers and Orders, Customers can be merged into the Orders dimension as a level higher than Orders. This folding continues until it is not possible to combine any more hierarchies.
7. Use key/foreign key relationships to set implied grains. In the Orders example above, after folding, the Orders table would have only one level in its grain, Orders. (Customers would simply be a higher level in the same dimension.) But Employees also has a one-to-many relationship to Orders. Levels such as Employees, which have this one-to-many relationship but cannot be folded into the hierarchy because another level was already chosen, are added to the grain of the source. Hence, after this process, Orders has two levels in its grain: Orders and Employees.

This process of creating levels, folding hierarchies together, and identifying implied grains from key/foreign key information creates a complete dimensional data model and maps it onto the source data. The algorithm is very effective and, except in very advanced cases, eliminates the need for designers to create and manage hierarchies and grains manually.

As new sources are added, Birst updates this data model. However, if data has already been processed, Birst locks the existing hierarchies and grains in place and only creates new ones for the new sources, preserving any structures that were created initially. This allows you to create a model and add to it over time.
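The hierarchy, grain, and folding steps (1, 5, 6, and 7) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Birst's implementation: the function name and data structures are hypothetical, column targeting (steps 2 to 4) is omitted, and the outcome is assumed to depend on the order in which the one-to-many relationships are listed.

```python
def update_warehouse_model(sources, one_to_many):
    """Sketch of hierarchy folding and implied grains.

    sources: list of source names.
    one_to_many: list of (parent, child) pairs, meaning one parent row
        relates to many child rows (for example, Customers -> Orders).
    Returns (hierarchies, grains); each hierarchy lists levels highest first.
    """
    # Steps 1 and 5: one single-level hierarchy and a one-level grain per source.
    hierarchy_of = {name: name for name in sources}
    hierarchies = {name: [name] for name in sources}
    grains = {name: [name] for name in sources}

    for parent, child in one_to_many:
        parent_h = hierarchy_of[parent]
        child_h = hierarchy_of[child]
        # Step 6: fold only if the parent is the lowest level of its hierarchy
        # and the child does not already have a level folded above it.
        can_fold = (
            parent_h != child_h
            and hierarchies[parent_h][-1] == parent
            and hierarchies[child_h][0] == child
        )
        if can_fold:
            merged = hierarchies.pop(parent_h) + hierarchies[child_h]
            hierarchies[child_h] = merged
            for level in merged:
                hierarchy_of[level] = child_h
        elif parent_h != child_h and parent not in grains[child]:
            # Step 7: a parent that cannot be folded becomes part of the child's grain.
            grains[child].append(parent)

    return hierarchies, grains


if __name__ == "__main__":
    model = update_warehouse_model(
        ["Customers", "Employees", "Orders"],
        [("Customers", "Orders"), ("Employees", "Orders")],
    )
    print(model)
```

With the Customers, Employees, and Orders example from the text, Customers is folded above Orders, Employees keeps its own hierarchy, and the grain of Orders becomes Orders plus Employees.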

Automated Data Modeling (Automatic Mode)


The goal of Automatic mode is to completely automate the creation and management of the data model. In Automatic mode (in other words, when creating an Automatic space), Birst completely controls the following:

• Dimension/hierarchy definitions
• Column targeting
• Grain definitions

It does this by blending the automated key/foreign-key detection in Discovery mode with the capabilities of Update Warehouse Model. In fact, Automatic mode is the equivalent of:

• Loading sources with Discovery mode and having it attempt to discern primary keys for each table and foreign key relationships based on common naming.
• Converting the space and sources to Advanced mode.
• Executing Update Warehouse Model to create hierarchies, fold them, and identify grains.

Since Automatic mode ties together the entire process of data model discovery and dimensional data modeling, many users choose to load a dataset into Birst in an Automatic space and let it discover a data model. Then, they will convert the space to an Advanced space to make small adjustments if necessary or in order to understand the process that has occurred.
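The key discovery that Automatic mode relies on can be pictured with a small sketch. The heuristics below are assumptions made for illustration, since the text only says that keys are discerned and that foreign keys are matched on common naming: here a primary key is guessed as the first column whose values are unique and non-null, and a foreign key is any column in another table with the same name as that primary key. The function names and sample data are hypothetical.

```python
def guess_primary_key(rows):
    """Return the first column whose values are unique and non-null across all rows."""
    if not rows:
        return None
    for column in rows[0]:
        values = [row[column] for row in rows]
        if None not in values and len(set(values)) == len(values):
            return column
    return None


def guess_relationships(tables):
    """Guess primary keys, then match them by name to columns in other tables.

    tables: dict mapping a table name to a list of row dicts.
    Returns (primary_keys, relationships), where each relationship is a
    (parent_table, child_table, key_column) tuple.
    """
    primary_keys = {name: guess_primary_key(rows) for name, rows in tables.items()}
    relationships = []
    for parent, key in primary_keys.items():
        if key is None:
            continue
        for child, rows in tables.items():
            if child != parent and rows and key in rows[0]:
                relationships.append((parent, child, key))
    return primary_keys, relationships


if __name__ == "__main__":
    sample = {
        "Customers": [
            {"CustomerID": 1, "Region": "West"},
            {"CustomerID": 2, "Region": "West"},
        ],
        "Orders": [
            {"OrderID": 10, "CustomerID": 1, "Freight": 5.0},
            {"OrderID": 11, "CustomerID": 1, "Freight": 7.5},
        ],
    }
    print(guess_relationships(sample))
```

A heuristic like this can guess wrong on small samples, which is consistent with the advice above to convert to an Advanced space when adjustments are needed.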


Development and Deployment


This section describes the purpose of the Birst repository and describes how a data model in Birst is deployed to the data warehouse.

The Repository
A central object for Birst is the repository. The repository contains all data model metadata associated with a space. It is composed of two broad types of metadata:

• Base metadata
• Generated metadata

Base metadata defines the following:

• Data sources
    o Source columns and their types and targeting
    o Grains
    o Scripts
    o Primary keys and foreign keys
• Hierarchies and Levels
• Custom measures and attributes
• Bucketed measures
• Session and Repository variables
• Aggregate definitions

Generated metadata is created by the application building process and applies to the following:

• Dimension table definitions, including the time dimension and snowflake dimension tables.
• Measure table definitions, including combination measure table definitions.
• Join definitions

In the Birst model, a designer edits higher-level base metadata, and the runtime metadata that the system uses to create queries or physical tables is generated from that base metadata during the application building process. This process occurs automatically at key points during development.

The metadata that exists at runtime needs to match exactly the data definitions as they exist in the Birst database. If it did not, Birst might issue a query against a table or column that didn't actually exist. As a result, Birst maintains two repositories at any one time: a development repository and a production repository.

A development repository is one that is currently being developed. It may be in an intermediate state, with sources partially defined or transformations not fully written. It may contain definitions of newly added sources. Once development is complete and you want to promote the development repository, Birst takes a snapshot of it, runs a final application build process, and promotes that repository to production.

In some cases, such as changing the definition of a custom measure, the changes do not require any alterations to the physical data or data model in the Birst database. Those changes are immediately promoted to production and become visible to report writers and dashboard users. However, if changes to the development repository require changes to the physical database, they are not promoted to production until data is processed. Processing data is critical because it allows Birst to make any necessary schema migrations to keep the underlying database in sync with the metadata. During processing, the database is locked because it will be in an inconsistent state (some parts may be fully updated while others are not) and queries could return erroneous results. If customers require continuous uptime, Birst accomplishes this by using a shadow space that can be swapped into and out of production after processing completes.

Birst Data and Data Model Processing


Processing data is an important step in Birst. It serves several functions:

• Updates the Birst database schema to reflect the latest data model (including adding columns, keys, etc.).
• Bulk loads raw data into the Birst database using the data model definitions supplied in data sources. For Discovery sources, these tables serve as the basis for queries.
• Loads dimension and fact tables with the appropriate data from the original raw sources.

Birst has three distinct phases that run during processing for each load group:

• GenerateSchema. This step compares the desired data model for measure and dimension tables with the one currently in the database. If the database is not current, Birst alters the database schema to match. Typically this includes creating tables and adding new columns.
• LoadStaging. This step takes all the raw data that has been uploaded to Birst and modeled in data sources and bulk loads it into the Birst database. This process is generally very fast. The question might be asked: why doesn't Birst upload data directly into the database? The reason is that source data is often dirty and data type standards are not always adhered to. So Birst runs all data through a preprocessing phase to ensure that numbers, dates, and other fields are converted to values of the correct type. Then, only the columns of interest are actually loaded into the database.
• LoadWarehouse. If warehouse tables are defined in a space (in other words, if there are data sources other than Discovery sources, which do not generate dimension and measure tables), Birst loads the dimension and measure tables from the sources. It does so in a very specific order to ensure hierarchical relationships are maintained:

1. Dimension tables are loaded first. Within a dimension, the highest levels are loaded first so that lower levels can be loaded with surrogate keys that point to higher levels in order to create snowflake relationships. Since dimension tables ensure that there is only one record for each natural key, re-running a load is safe: it will not create duplicate records. When loading a dimension, Birst first updates any existing dimension records with the new data (in the case of Type II, it retires old records and creates new ones). It then inserts any new records that correspond to natural keys that were not previously loaded in the dimension. This ensures that every natural key is unique.

2. Measure tables are loaded next. Measure tables are loaded from highest grain to lowest grain so that they can look up keys and columns in higher-level measure tables and thereby join directly to dimensions where the relationships may have been defined at a higher level. Measure tables are also loaded with the surrogate keys that point to dimension tables, ensuring fast, consistent joins across facts using common dimensions. Degenerate dimension tables are loaded with their associated measure table. For each load, Birst starts by deleting any records in the measure table that carry the current ID. This means that if a load fails for whatever reason, it can be re-run without any damage to the measure table. In the case of a measure table loaded by multiple sources, Birst loads the measure table from the first source and then uses the grain key to update any records that have additional columns coming from other sources. This allows one to merge data into a single fact when some columns come from one source and others from another source.
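The re-runnable loading behavior described in the two steps above can be illustrated with SQLite as a stand-in database. This is a minimal sketch under stated assumptions, not Birst's loader: the table and column names are hypothetical, the "current ID" is assumed to be a load identifier stored on each measure row, and Type II handling is left out.

```python
import sqlite3


def load_dimension(conn, rows):
    """Update existing dimension records first, then insert unseen natural keys."""
    conn.executemany(
        "UPDATE customers_dim SET region = ? WHERE customer_id = ?",
        [(region, customer_id) for customer_id, region in rows],
    )
    conn.executemany(
        "INSERT INTO customers_dim (customer_id, region) "
        "SELECT ?, ? WHERE NOT EXISTS "
        "(SELECT 1 FROM customers_dim WHERE customer_id = ?)",
        [(customer_id, region, customer_id) for customer_id, region in rows],
    )


def load_measures(conn, load_id, rows):
    """Delete rows left by an earlier attempt at this load, then insert with surrogate keys."""
    conn.execute("DELETE FROM orders_fact WHERE load_id = ?", (load_id,))
    conn.executemany(
        "INSERT INTO orders_fact (load_id, customer_key, freight) "
        "SELECT ?, customer_key, ? FROM customers_dim WHERE customer_id = ?",
        [(load_id, freight, customer_id) for customer_id, freight in rows],
    )


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers_dim (
            customer_key INTEGER PRIMARY KEY,  -- surrogate key
            customer_id TEXT UNIQUE,           -- natural key
            region TEXT
        );
        CREATE TABLE orders_fact (load_id INTEGER, customer_key INTEGER, freight REAL);
        """
    )
    source_customers = [("C1", "West"), ("C2", "East")]
    source_orders = [("C1", 5.0), ("C1", 7.5), ("C2", 10.0)]
    # Run the same load twice to simulate re-running after a failure.
    for _ in range(2):
        load_dimension(conn, source_customers)  # dimensions before measures
        load_measures(conn, 1, source_orders)
    dim_count = conn.execute("SELECT COUNT(*) FROM customers_dim").fetchone()[0]
    fact_count = conn.execute("SELECT COUNT(*) FROM orders_fact").fetchone()[0]
    conn.close()
    return dim_count, fact_count


if __name__ == "__main__":
    print(run_demo())
```

Running the load twice leaves the same row counts as running it once, which is the property that makes a failed load safe to repeat.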

Birst tracks the progress of data processing through a special table in each space. Every processing step is recorded so that successes and failures can be reviewed at the end of a load.
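Two of the ideas in this section, the schema comparison performed by GenerateSchema and the recording of each processing step, can be sketched together. This is an illustration only: the function names, the dictionary-based schema description, the statement strings, and the choice to stop at the first failed step are all assumptions rather than documented Birst behavior.

```python
def generate_schema_changes(desired, current):
    """Compare desired and current schemas and list the statements needed to catch up.

    Both arguments map a table name to a dict of column name -> column type.
    """
    statements = []
    for table, columns in desired.items():
        if table not in current:
            column_list = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
            statements.append(f"CREATE TABLE {table} ({column_list})")
            continue
        for name, ctype in columns.items():
            if name not in current[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {ctype}")
    return statements


def run_processing(steps):
    """Run named steps in order and record each outcome; stop at the first failure."""
    log = []
    for name, action in steps:
        try:
            action()
            log.append((name, "SUCCESS"))
        except Exception as error:
            log.append((name, f"FAILED: {error}"))
            break
    return log


if __name__ == "__main__":
    current = {"customers_dim": {"customer_id": "INTEGER"}}
    desired = {
        "customers_dim": {"customer_id": "INTEGER", "region": "VARCHAR(50)"},
        "orders_fact": {"freight": "FLOAT"},
    }
    for statement in generate_schema_changes(desired, current):
        print(statement)
```

In this sketch the log is an in-memory list; the text describes a table in each space serving that purpose.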
