
What is a Datawarehouse?

A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a form they can understand and use in a business context.

What is Datawarehousing?
A process of transforming data into information and making it available to users in a timely enough manner to make a difference.

A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organizational decision making.

Who are called end users? What do you mean by an end user?
For an OLTP application, the end user is whoever enters data into the system or reads a particular report from it. For example, a bank teller enters the account number, sees the balance, deposits the check, etc., and a customer representative must see the customer information to be more effective. So end users are the ones who actually use the application we develop.
What kind of information does management want to know?

What is DSS?
Decision Support System. It is not a product; it is an environment that blends several technologies. It is mainly used by businesses to take strategic decisions based on trends (comparing the current fiscal year with previous ones) and to project numbers based on history and some parameters. It is not used to run the business: OLTP systems take care of the day-to-day activities of a business. For example, SAP Order Management takes care of the orders the organization receives. In the DSS, we collect all the data to do the analysis.

Examples of strategic information
o Be number one in the South East Asia region in banking software solutions in the next 2 years.
o Increase the customer base by 234 every year.
o Maintain customer satisfaction at 534, as we have it today.

OLTP: Online Transaction Processing system
o Examples of OLTP systems are order management, payroll processing, etc.
o Typically follows 3rd normal form when designing the database.
o All DML types (insert, update, delete) are active.
o Deals with specific data (customer x, product z, etc.).

OLTP | DSS
More DML operations (update, delete, insert) | No change in the data (no updates or deletes)
Point queries: very specific when issuing queries | Queries based on a time period, a set of products, a set of customers, etc.
Less history (approximately 6 months to 1 year) | Maintains the history
Used for day-to-day activities (needed to run the business) | Used mainly for analytics (trend analysis, customer behavior, etc.)
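The difference between OLTP point queries and DSS-style queries over a time period can be seen in a small sketch. This is a minimal illustration using Python's built-in sqlite3 module; the `orders` table, its columns and its rows are hypothetical and not taken from any system described here. The first query fetches one specific record by key, as an OLTP application would; the second aggregates across all customers per month, as a DSS query would.

```python
import sqlite3

# Hypothetical order table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT, "
    "order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "C1", "2023-01-05", 100.0),
        (2, "C2", "2023-01-20", 250.0),
        (3, "C1", "2023-02-11", 75.0),
        (4, "C3", "2023-02-28", 300.0),
    ],
)

# OLTP-style point query: one specific record, located by its key.
point = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

# DSS-style analytical query: aggregate over time periods across all customers.
monthly = conn.execute(
    "SELECT substr(order_date, 1, 7) AS month, SUM(amount), "
    "COUNT(DISTINCT customer_id) "
    "FROM orders GROUP BY month ORDER BY month"
).fetchall()

print(point)    # (75.0,)
print(monthly)  # [('2023-01', 350.0, 2), ('2023-02', 375.0, 2)]
```

The point query touches a single row through the primary key index, while the analytical query has to scan and group every row in the period, which is one reason the two workloads are usually kept on separate systems.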

Datawarehouse vs Datamarts

Data Warehouse | Data Mart
Enterprise-wide | Departmental
Organized around an E-R model | Star schema based (facts and dimensions)
Structure for a corporate view of data | Quick turnaround (up and running quickly, as there are fewer stakeholders)

Building the Datawarehouse
Top-down approach
o Driven by the corporation, which actually values the analysis. This leads to the true single version of the truth, which is what every organization wants.
o Centralized control over the enterprise, mainly over the business processes.
o Long time to implement, as there are many stakeholders.
o Needs high-level commitment from management.
o Case Study – 3M

Bottom-up approach
o Build departmental data marts first.
o It is fast and easier to implement, as each data mart is owned by a single department.
o Narrow view; may not comply with corporate goals.
o Redundant information in multiple data marts.
o Islands of information.
o Once the number of data marts grows beyond a certain point, maintaining them becomes very difficult.

Different approaches in building a DW

Distributed Approach
Various departments can start creating different data marts. Each can work independently and see the ROI in a short span. In the long run, integrating this data adds complexity, and the cost will be higher as there are more systems to maintain.

Centralized Approach
A centralized data warehouse contains the data in one place, so it is easy to answer any business question. In the long run, this has a cost advantage over a non-centralized data warehouse. It is not very easy to implement, as it needs more time and resources, and the ROI won't be seen until the implementation is completed. So the recommended approach is to implement the centralized data warehouse by starting with one subject area and adding one subject area at a time; this way, the organization gets to see the ROI at various stages.

Data Granularity
Granularity is the level of detail we want to store in the data warehouse. For a retail store, Point of Sale (POS) data is the lowest granularity available. For banking, it is the account-level detail of daily transactions. As DSS leans towards analyzing the data as a whole, the data warehouse will not necessarily hold every detail down to the daily transactions. For example, sales could be stored at any of these levels:
o Daily sales by date, product and customer
o Weekly sales by product and customer
o Monthly sales by product and customer
o Quarterly sales by product and customer
o Yearly sales by product and customer
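Rolling daily-grain sales up to the coarser grains listed above can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical row layout of (sale_date, product, customer, amount); the `period_key` and `roll_up` helpers and the sample data are illustrative, and weeks use ISO week numbering, which is only one of several possible calendar conventions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily-grain rows: (sale_date, product, customer, amount).
daily_sales = [
    (date(2023, 1, 3), "P1", "C1", 10.0),
    (date(2023, 1, 17), "P1", "C1", 5.0),
    (date(2023, 2, 8), "P1", "C1", 20.0),
    (date(2023, 4, 2), "P2", "C2", 7.5),
]


def period_key(d: date, grain: str) -> str:
    """Map a calendar date to the period it belongs to at the given grain."""
    if grain == "day":
        return d.isoformat()
    if grain == "week":
        year, week, _ = d.isocalendar()  # ISO year and week number
        return f"{year}-W{week:02d}"
    if grain == "month":
        return f"{d.year}-{d.month:02d}"
    if grain == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if grain == "year":
        return str(d.year)
    raise ValueError(f"unknown grain: {grain}")


def roll_up(rows, grain: str) -> dict:
    """Sum amounts by (period, product, customer) at the requested grain."""
    totals = defaultdict(float)
    for sale_date, product, customer, amount in rows:
        totals[(period_key(sale_date, grain), product, customer)] += amount
    return dict(totals)


by_quarter = roll_up(daily_sales, "quarter")
print(by_quarter)  # {('2023-Q1', 'P1', 'C1'): 35.0, ('2023-Q2', 'P2', 'C2'): 7.5}
```

Each coarser grain produces fewer rows, but the finer detail cannot be recovered from the summarized rows, which is why the grain has to be decided before the warehouse is loaded.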

Difference between star schema and snowflake
A star schema makes queries run faster, as there are fewer tables to join. In a star schema, all the hierarchies defined for a dimension are stored in a single table, so data redundancy is high. In a snowflake schema, we can have an additional table for each hierarchy. That is the difference between the star schema and the snowflake schema.

Dimension Table
o Defines the business in terms already familiar to users
o Wide rows with lots of descriptive text
o Small tables (about a million rows)
o Joined to the fact table by a foreign key
o Heavily indexed
o Typical dimensions: time periods, geographic region (markets, cities), products, customers, salesperson, etc.

Fact Table
o Central table
o Mostly raw numeric items
o Narrow rows, a few columns at most
o Large number of rows (millions to a billion)
o Accessed via dimensions

Star schema
A star schema is optimized for queries. You will have some level of redundant data in a star-schema-based data model.

Snowflake
A snowflake schema won't have much redundant data, as most of the dimensions will have a lookup table. As a result, the number of joins between the tables increases. Both have advantages and disadvantages, so analyze the end users' requirements and the space constraints to pick the best one.

Denormalization
o Normalization in a data warehouse may lead to lots of small tables.
o This can lead to excessive I/Os, since many tables have to be accessed.
o De-normalization is the answer, especially since updates are rare.
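To make the join-count difference concrete, here is a small sketch with Python's sqlite3 module in which the same product hierarchy (product, sub-category, family) is stored both ways. All table names, columns and rows are hypothetical. Both queries return the same totals: the star version needs one join, the snowflake version needs three, and the star dimension repeats the sub-category and family names on every product row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the whole product hierarchy lives in one denormalized dimension table.
CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, prod_desc TEXT,
                          sub_cat TEXT, family TEXT);

-- Snowflake: each hierarchy level gets its own lookup table.
CREATE TABLE family (family_id INTEGER PRIMARY KEY, family TEXT);
CREATE TABLE sub_category (sub_cat_id INTEGER PRIMARY KEY, sub_cat TEXT,
                           family_id INTEGER REFERENCES family(family_id));
CREATE TABLE product (prod_id INTEGER PRIMARY KEY, prod_desc TEXT,
                      sub_cat_id INTEGER REFERENCES sub_category(sub_cat_id));

-- One fact table shared by both designs.
CREATE TABLE sales_fact (prod_id INTEGER, qty INTEGER, tot_amt REAL);

INSERT INTO product_dim VALUES
  (1, 'Cola 330ml', 'Soft drinks', 'Beverages'),
  (2, 'Lemonade 1l', 'Soft drinks', 'Beverages'),
  (3, 'Chips 150g', 'Crisps', 'Snacks');
INSERT INTO family VALUES (10, 'Beverages'), (20, 'Snacks');
INSERT INTO sub_category VALUES (100, 'Soft drinks', 10), (200, 'Crisps', 20);
INSERT INTO product VALUES (1, 'Cola 330ml', 100), (2, 'Lemonade 1l', 100),
                           (3, 'Chips 150g', 200);
INSERT INTO sales_fact VALUES (1, 2, 3.0), (2, 1, 2.5), (3, 4, 6.0);
""")

# Star: a single join from the fact table to the dimension.
star = conn.execute("""
    SELECT d.family, SUM(f.tot_amt) FROM sales_fact f
    JOIN product_dim d ON d.prod_id = f.prod_id
    GROUP BY d.family ORDER BY d.family
""").fetchall()

# Snowflake: three joins to walk product -> sub_category -> family.
snowflake = conn.execute("""
    SELECT fam.family, SUM(f.tot_amt) FROM sales_fact f
    JOIN product p ON p.prod_id = f.prod_id
    JOIN sub_category s ON s.sub_cat_id = p.sub_cat_id
    JOIN family fam ON fam.family_id = s.family_id
    GROUP BY fam.family ORDER BY fam.family
""").fetchall()

print(star)               # [('Beverages', 5.5), ('Snacks', 6.0)]
print(star == snowflake)  # True
```

With three products the redundancy is trivial, but in a dimension with millions of rows the repeated hierarchy text is the space cost the star schema pays for its simpler, faster joins.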

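Data Mart
Fact table: Org_id, Emp_id, Cust_id, Policy_id, Cal_date, Payment_amt, Claims_amt
Org dimension: Org_id, Name, Cat_desc, Manager_nm
Customer dimension: Cust_id, Name, Address, Phone, state
Employee dimension: Emp_id, Ename, Join date, sal
Calendar dimension: Cal_date, Cal_week, Cal_month, Cal_quarter, Cal_year
Policy dimension: Policy_id, Name, Start_date, End_date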
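A typical query against this data mart joins the fact table to its dimensions, groups by dimension attributes and sums the numeric measures. The sketch below uses Python's sqlite3 module and the column names listed above for the fact, Calendar and Policy tables; the table names (`fact`, `calendar`, `policy`) and all rows are hypothetical, and the Org, Customer and Employee dimensions are left out for brevity.

```python
import sqlite3

# Column names follow the data mart above; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (Cal_date TEXT PRIMARY KEY, Cal_week INTEGER,
                       Cal_month INTEGER, Cal_quarter INTEGER, Cal_year INTEGER);
CREATE TABLE policy (Policy_id INTEGER PRIMARY KEY, Name TEXT,
                     Start_date TEXT, End_date TEXT);
CREATE TABLE fact (Org_id INTEGER, Emp_id INTEGER, Cust_id INTEGER,
                   Policy_id INTEGER, Cal_date TEXT,
                   Payment_amt REAL, Claims_amt REAL);

INSERT INTO calendar VALUES ('2023-02-15', 7, 2, 1, 2023),
                            ('2023-05-10', 19, 5, 2, 2023),
                            ('2023-06-01', 22, 6, 2, 2023);
INSERT INTO policy VALUES (1, 'Motor', '2023-01-01', '2023-12-31'),
                          (2, 'Home', '2023-01-01', '2023-12-31');
INSERT INTO fact VALUES (1, 11, 101, 1, '2023-02-15', 500.0, 0.0),
                        (1, 11, 102, 2, '2023-05-10', 300.0, 1200.0),
                        (1, 12, 101, 1, '2023-06-01', 500.0, 800.0);
""")

# Star join: group by dimension attributes, aggregate the fact measures.
rows = conn.execute("""
    SELECT c.Cal_year, c.Cal_quarter, p.Name,
           SUM(f.Payment_amt), SUM(f.Claims_amt)
    FROM fact f
    JOIN calendar c ON c.Cal_date = f.Cal_date
    JOIN policy p ON p.Policy_id = f.Policy_id
    GROUP BY c.Cal_year, c.Cal_quarter, p.Name
    ORDER BY c.Cal_year, c.Cal_quarter, p.Name
""").fetchall()

for row in rows:
    print(row)
# (2023, 1, 'Motor', 500.0, 0.0)
# (2023, 2, 'Home', 300.0, 1200.0)
# (2023, 2, 'Motor', 500.0, 800.0)
```

Because the calendar attributes are precomputed in the dimension, grouping by quarter is a plain join and GROUP BY rather than date arithmetic inside every query.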

DW Data Model
Sales_fact: Cust_id, Emp_id, Dept_sur_id, Prod_id, Cal_date, Qty, cost, Price, margin, Tot_amt
Customer_dim (TYPE II): Cust_sur_id, Cust_id, Cust_nm, city, State, Country, Geo_cd, Geo_nm, Curr_rec_ind
Department_dim (TYPE II): Dept_sur_id, Dept_id, Dept_name, Mgr_nm, Curr_rec_ind
Employee_dim: Emp_id, Name, State, Country, Geo_cd, Geo_nm
Product_dim: Prod_id, Prod_desc, Product_sub_cat, Product_family
Calendar_dim: Cal_date, Cal_week, Cal_month, Cal_quarter, Cal_year
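Customer_dim above carries a surrogate key (Cust_sur_id) next to the business key (Cust_id) plus a current-record flag (Curr_rec_ind), which is the usual shape of a Type 2 slowly changing dimension: a change creates a new row instead of overwriting the old one. The sketch below is a minimal in-memory illustration, assuming a hypothetical `apply_type2_change` helper, only two tracked attributes (name and city), and "Y"/"N" as the flag values; none of these come from the model itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustomerDimRow:
    cust_sur_id: int   # surrogate key: a new value for every version of a customer
    cust_id: str       # business key from the source (OLTP) system
    cust_nm: str
    city: str
    curr_rec_ind: str  # assumed convention: "Y" = current version, "N" = history


def apply_type2_change(dim: List[CustomerDimRow], cust_id: str,
                       cust_nm: str, city: str) -> None:
    """Load one incoming customer record using Type II (keep-history) handling."""
    current: Optional[CustomerDimRow] = next(
        (r for r in dim if r.cust_id == cust_id and r.curr_rec_ind == "Y"), None
    )
    if current is not None and (current.cust_nm, current.city) == (cust_nm, city):
        return  # no tracked attribute changed: keep the existing current row
    if current is not None:
        current.curr_rec_ind = "N"  # expire the old version but keep it as history
    next_key = max((r.cust_sur_id for r in dim), default=0) + 1
    dim.append(CustomerDimRow(next_key, cust_id, cust_nm, city, "Y"))


dim: List[CustomerDimRow] = []
apply_type2_change(dim, "C100", "Asha", "Chennai")    # first load
apply_type2_change(dim, "C100", "Asha", "Bangalore")  # customer moved: new version
apply_type2_change(dim, "C100", "Asha", "Bangalore")  # unchanged: no new row
for row in dim:
    print(row)
```

Fact rows that store the surrogate key (as Sales_fact does with Dept_sur_id) keep pointing at the version that was current when they were loaded, so history is reported against the attribute values valid at the time. Implementations often add effective and expiry dates as well; they are left out here because the model only shows Curr_rec_ind.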

o Difference between a DW and data marts
o Define the granularity of a data warehouse
o List the differences between OLTP and DW
o Which model would you choose to create your DW? Why?
o Different components of a DW; define each of them
o What is a slowly changing dimension? Explain all three types with examples

1. Difference between analytical queries and point queries (give examples for each one).
2. Define the attributes of dimension and fact tables.
