Practical No. 1
Data Transformations in a Data Warehouse
Title :
Create a warehouse in MS SQL Server 2000 and import data from external
sources such as Excel, Access, and .txt files by using the DTS tool.
Theory :
Data from various sources and in various formats is stored in a data
warehouse through the process of ETL (Extraction, Transformation, Loading).
Transformations convert data arriving in different formats into a common
format that is compatible with the warehouse database.
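As a rough illustration of the kind of format transformation applied during such
a load, the following T-SQL sketch converts raw text fields imported from a .txt
file into typed warehouse columns (the staging and warehouse table names here
are hypothetical):
-- DTS fills Staging_SalesText from the .txt source; CAST/CONVERT
-- then transform the raw text fields into typed warehouse columns.
INSERT INTO Sales_Warehouse (OrderID, OrderDate, Quantity)
SELECT CAST(OrderID AS int),
CONVERT(datetime, OrderDate, 103), -- dd/mm/yyyy text to datetime
CAST(Quantity AS int)
FROM Staging_SalesText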
Screenshot:
6) Right click on one of the arrows of Transform Data Task and click on
Properties option. Select the Destination Tab option and click on OK in
Create Destination Table form.
Screenshot:
7) Having created the table with the same name as the data file, click on the
Transformations tab in the Transform Data Task Properties form.
Click on Delete All, then Select All, then New.
Next select Copy Column and press OK.
Press OK again in Transformation Options.
Screenshot:
12) Perform steps 5 to 8 for loading the transformed data in the database
dbase2 from database dbase1. Click on Execute Step in right click menu of
Transform Data Task between the two databases.
Practical No. 2
Querying the database
Title:
Create and schedule a DTS package using the Data Transformation Services
(DTS) tool. Fire at least 5 queries on the database.
Click OK.
In the SQL Query Analyzer window, fire the queries given below.
Explanation:
1) sysobjects
The SQL Server sysobjects table contains one row for each object created within
a database: a row for every constraint, default, log, rule, stored procedure,
and so on. This table can therefore be used to retrieve information about the
database.
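For example, a query along the following lines lists all user tables in the
current database ('U' is the sysobjects type code for user tables):
SELECT name, id, crdate
FROM sysobjects
WHERE type = 'U'
ORDER BY name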
2) OBJECT_ID
Returns the database object identification number.
Syntax : OBJECT_ID ( 'object' )
Arguments : 'object'
Is the object to be used. object is either char or nchar. If object is char, it is
implicitly converted to nchar.
Return Types : int
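A small example (the table name Sales_Fact is borrowed from the warehouse
built earlier):
SELECT OBJECT_ID('Sales_Fact')
-- Returns the id of Sales_Fact, or NULL if no such object exists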
3) UNICODE STRING
Unicode strings have a format similar to character strings but are preceded by an
N identifier (N stands for National Language in the SQL-92 standard). The N
prefix must be uppercase. For example, 'Michél' is a character constant while
N'Michél' is a Unicode constant. Unicode constants are interpreted as Unicode
data, and are not evaluated using a code page. Unicode constants do have a
collation, which primarily controls comparisons and case sensitivity. Unicode
constants are assigned the default collation of the current database, unless the
COLLATE clause is used to specify a collation. Unicode data is stored using two
bytes per character, as opposed to one byte per character for character data.
In Microsoft SQL Server, these data types support Unicode data:
nchar
nvarchar
ntext
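A short sketch showing the N prefix and the two-bytes-per-character storage:
DECLARE @name nvarchar(30)
SET @name = N'Michél' -- the N prefix marks a Unicode constant
SELECT @name, DATALENGTH(@name) -- 6 characters stored in 12 bytes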
The OBJECTPROPERTY function exposes object properties such as the following:
Property : IsUserTable
Object type : Table
Description : User-defined table. 1 = True, 0 = False.
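A minimal sketch combining sysobjects and this property (the table name is
borrowed from the earlier practicals):
SELECT OBJECTPROPERTY(OBJECT_ID('Product_Dim'), 'IsUserTable')
-- Returns 1 if Product_Dim is a user-defined table, 0 otherwise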
4) IDENTITY (Property)
Creates an identity column in a table. This property is used with the CREATE
TABLE and ALTER TABLE Transact-SQL statements.
Note: The IDENTITY property is not the same as the SQL-DMO Identity
property, which exposes the row identity property of a column.
Syntax: IDENTITY [ ( seed , increment ) ]
Arguments
seed
Is the value that is used for the very first row loaded into the table.
increment
Is the incremental value that is added to the identity value of the previous
row that was loaded.
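A small sketch (the table name is hypothetical) using a seed of 1 and an
increment of 1:
CREATE TABLE Orders_Demo
(
OrderID int IDENTITY(1, 1) NOT NULL, -- first row gets 1, the next 2, and so on
OrderDate datetime NOT NULL
)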
Screenshot:
Queries:
1. Show the total units sold for each product where the required date is earlier
than today.
SELECT Product_Dim.ProductName, Product_Dim.CategoryName,
Product_Dim.SupplierName,
SUM(Sales_Fact.LineItemQuantity) AS [Total Units Sold],
Sales_Fact.RequiredDate
FROM Sales_Fact
INNER JOIN Product_Dim ON
Sales_Fact.ProductKey = Product_Dim.ProductKey
WHERE Sales_Fact.RequiredDate < GETDATE()
GROUP BY Product_Dim.ProductName,
Product_Dim.CategoryName,
Product_Dim.SupplierName,
Sales_Fact.RequiredDate
2. Show the product name, category name, supplier name and total units sold
for each product where the total units sold is greater than 100.
SELECT Product_Dim.ProductName,
Product_Dim.CategoryName,
Product_Dim.SupplierName,
SUM(Sales_Fact.LineItemQuantity) AS [Total Units Sold],
Sales_Fact.RequiredDate
FROM Sales_Fact INNER JOIN
Product_Dim ON
Sales_Fact.ProductKey = Product_Dim.ProductKey
GROUP BY Product_Dim.ProductName,
Product_Dim.CategoryName,
Product_Dim.SupplierName,
Sales_Fact.RequiredDate
HAVING (SUM(Sales_Fact.LineItemQuantity) > 100)
Screenshot:
3. To view the total units sold of all the products under the category whose
average sale is greater than 50% (i.e. AVG(LineItemQuantity) > 0.5).
SELECT Product_Dim.ProductName,
Product_Dim.CategoryName,
SUM(Sales_Fact.LineItemQuantity) AS [Total Units Sold]
FROM Sales_Fact INNER JOIN
Product_Dim ON
Sales_Fact.ProductKey = Product_Dim.ProductKey
GROUP BY Product_Dim.ProductKey,
Product_Dim.ProductName,
Product_Dim.CategoryName
HAVING (AVG(Sales_Fact.LineItemQuantity) >0.5)
Screenshot:
4. To View Company Name and Total Quantity sold for all the Products
SELECT Customer_Dim.CompanyName, Sum(Sales_Fact.LineItemQuantity) AS
TotalQtySold
FROM Sales_Fact, Customer_Dim
WHERE Sales_Fact.CustomerKey=Customer_Dim.CustomerKey
GROUP BY Customer_Dim.CompanyName
ORDER BY Sum(Sales_Fact.LineItemQuantity) DESC
Screenshot:
Practical No. 3
Single Dimensional OLAP Cube
Title:
Create a database using Analysis Manager (a snap-in of the Microsoft
Management Console, MMC) and create a single dimensional cube by using the
star schema.
Theory:
A cube stores complex business data in a multidimensional structure.
Data sources, a fact table, dimensions and measures are selected for the cube.
The cube is then processed with the selected elements and used for analysis
(drill-down and drill-up techniques). Here the cube uses a single dimension
following the star schema.
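Conceptually, the cube precomputes aggregations that would otherwise require a
star join at query time. A rough T-SQL equivalent of one such aggregation (fact
and dimension table names are borrowed from Practical 2 for illustration):
SELECT d.CategoryName, SUM(f.LineItemQuantity) AS TotalUnits
FROM Sales_Fact f
INNER JOIN Product_Dim d ON f.ProductKey = d.ProductKey
GROUP BY d.CategoryName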
DBMS used: Microsoft SQL Server 2000 along with Analysis Manager.
Steps:
Creating a new database
1) Right click on server which is seen when expanding Analysis Server. Click on
New Database. Supply appropriate Database name.
Creating a new data source
2) Expand the newly created database; right click on Data Sources and select
New Data Source. Select the Provider as Microsoft.Jet.OLEDB.4.0 Provider.
Screenshot:
3) Click on Next >>; select or enter the database name; click on Test Connection
to test the connection with database.
Screenshot:
5) Select a fact table from the data source. A fact table holds the numeric
measurements (facts) of the business process, keyed to the dimension tables.
Screenshot:
6) Click on Next >; now select the appropriate numeric columns that define
necessary measures.
Screenshot:
8) Click Next >; select Star Schema. Select the dimension table.
(Please select the dimension table which has a relationship with the fact table)
Screenshot:
11) Click Next >; again click Next > in specify numeric key columns.
Screenshot:
14) Click Finish; again click Next >. Select Yes in Fact Table Row Count
message box.
Screenshot:
15) Specify appropriate Cube Name and click on Finish. The schema is now
shown.
Screenshot:
17) Click Next > in storage design wizard. Select MOLAP as type of data
storage.
19) Select Process now in storage design wizard to process the cube.
Screenshot:
Practical No. 4
Multidimensional OLAP Cube
Title:
Create a database by using Analysis Manager (a snap-in of the Microsoft
Management Console, MMC) and create a multidimensional OLAP cube by
using the snowflake schema.
Theory:
A cube stores complex business data in a multidimensional structure.
Data sources, a fact table, dimensions and measures are selected for the cube.
The cube is then processed with the selected elements and used for analysis
(drill-down and drill-up techniques). Here the cube uses multiple dimensions
following the snowflake schema.
DBMS used: Microsoft SQL Server 2000 along with Analysis Manager.
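In a snowflake schema a dimension is normalized into several related tables, so
each extra level costs one more join at query time. A hedged T-SQL sketch of
such a query (the table and column names follow the product/product_class
split used in Practical 6 and should be treated as assumptions):
SELECT pc.product_category, SUM(f.unit_sales) AS TotalUnitSales
FROM sales_fact f
INNER JOIN product p ON f.product_id = p.product_id
INNER JOIN product_class pc ON p.product_class_id = pc.product_class_id
GROUP BY pc.product_category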
Steps:
Creating a new database
1) Right click on server which is seen when expanding Analysis Server. Click on
New Database. Supply appropriate Database name.
Screenshot:
3) Click on Next >>; select or enter the database name; click on Test Connection
to test the connection with database.
6) Click on Next >; now select the appropriate numeric columns that define
necessary measures.
Screenshot:
8) Click Next >; select Snowflake Schema. Select the dimension table.
(Please select the dimension table which has a relationship with the fact table)
Screenshot:
10) Next, drag and drop columns to provide relationships between the
dimension tables; in other words, create a join.
Screenshot:
12) Click Next >; again click Next > in specify numeric key columns.
Screenshot:
15) Click Finish; again click Next >. Select Yes in Fact Table Row Count
message box.
Screenshot:
16) Specify appropriate Cube Name and click on Finish. The schema is now
shown.
Screenshot:
18) Click Next > in storage design wizard. Select MOLAP as type of data
storage.
Screenshot:
19) Select Process now in storage design wizard to process the cube.
Screenshot:
20) Select the data tab. Right click on a category of dimension. Select Drill
down.
Screenshot:
Practical No. 5
Mining Model using Relational Data
Title:
Create a mining model on relational data using the Microsoft Decision Trees
algorithm.
Theory:
A mining model is a data structure that represents discovered knowledge based
on analysis of OLAP or relational data. Mining models can be used to make
predictions.
Steps:
Creating a new database
1) Right click on server which is seen when expanding Analysis Server. Click on
New Database. Supply appropriate Database name.
Screenshot:
3) Click on Next >>; select or enter the database name; click on Test Connection
to test the connection with database.
Screenshot:
8) Click Next to create and edit joins. Joins are created automatically if there
is a key relationship between the relational tables.
Screenshot:
10) Click Next to select the input and predictable columns. An input column
contains the base information for analysis. A predictable column holds the
values the mining model learns to predict from the input columns.
Screenshot:
13) Select Content Tab to view the decision tree. Select a particular option of
Prediction Tree combo box to view its appropriate tree.
Screenshot:
Practical No. 6
Mining Model using OLAP data
Title:
Create a mining model using OLAP data.
Theory:
A mining model is a data structure that represents discovered knowledge based
on analysis of OLAP or relational data. In this practical the mining model uses
the OLAP cube built with the snowflake and star schemas.
DBMS used: Microsoft SQL Server 2000 along with MMC snap-in Analysis
Manager.
Steps:
Creating a new database
1) Right click on server which is seen when expanding Analysis Server. Click on
New Database. Supply appropriate Database name.
Screenshot:
3) Click on Next >>; select or enter the database name in Data Link Properties;
click on Test Connection to test the connection with database.
6) Click Next > to select the numeric columns that define the measures.
Select the measures as store cost, store sales, unit sales.
Screenshot:
7) Click Next > to create the dimensions for the cube. (Note: 4 dimensions will
now be created).
Click New Dimension to open the dimension wizard. Click Next > and select
star schema.
Screenshot:
8) Click Next > to select the dimension table. Select time_by_day as the
dimension table and click Next > to select the type of dimension.
10) In the create dimension levels step, select the default options and click
Next >.
15) Click Next > in specify member key column. Again click Next > in
select advanced options. Supply name as customer dimension and click
Finish.
17) Select the following dimension tables: product and product_class and
click Next >.
Screenshot:
18) Click Next > in create and edit joins if joins are already present, else drag
drop appropriate columns to create or edit joins.
Screenshot:
19) Click Next > and select the following levels of dimension:
20) Click Next > in select member key column. Click Next > in select
advanced options. Supply dimension name as product dimension and click
Finish.
22) Select the dimension table as store and click Next >.
Screenshot:
23) Select standard dimension in select dimension type. Click Next > and
select the following levels of dimension: store_country, store_state, store_city,
store_name.
Screenshot:
24) Click Next > in select member key column. Click Next > in select
advanced options. Supply dimension name as store dimension and click
Finish.
25) Click Next > in cube wizard. Click yes in fact table row count message
box. Supply cube name as sales cube and click Finish.
Screenshot:
Screenshot:
30) Click Next >. Select Process now and click Finish.
Performing Drill down of the cube
31) Click Data Tab in cube schema. Select appropriate dimension and perform
drill down by right clicking on + signs.
Screenshot:
Edit a Cube
You can make changes to your existing cube by using Cube Editor.
How to edit your cube in Cube Editor
You can use two methods to get to Cube Editor:
1. In the Analysis Manager tree pane, right-click an existing cube, and then
click Edit.
2. Create a new cube using Cube Editor directly. This method is not
recommended unless you are an advanced user.
In the schema pane of Cube Editor, the fact table (with yellow title bar) and the
joined dimension tables (blue title bars) are seen. In the Cube Editor tree pane,
you can preview the structure of your cube in a hierarchical tree. You can edit the
properties of the cube by clicking the Properties button at the bottom of the left
pane.
7. Under What do you want to do?, select Process now, and then click
Finish.
Note: Processing the aggregations may take some time.
8. In the window that appears, you can watch your cube while it is being
processed. When processing is complete, a message appears confirming
that the processing was completed successfully.
9. Click Close to return to the Analysis Manager tree pane.
Practical No. 7
Implementing the Decision Tree Algorithm
The decision tree approach is most useful in classification problems. With
this technique, a tree is constructed to model the classification process. Once
the tree is built, it is applied to each tuple in the database and yields a
classification for that tuple. There are two basic steps in the technique:
building the tree and applying the tree to the database.
#include <iostream.h>
#include <stdio.h>
#include <string.h>
#include <conio.h>

const int n=10;              // number of persons (taken from the sample output)

struct person
{
char name[20];
char gender;
float height;
char output1[10];            // classification ignoring gender
char output2[10];            // classification using gender-specific ranges
};

void main()
{
clrscr();
person p[n];
cout<<"Enter the data of the form \nName,gender,height\n";
for(int i=0;i<n;i++)
{
cout<<"For person "<<i+1<<". :";
fflush(stdin);               // clear the newline left by the previous cin>>
gets(p[i].name);
cin>>p[i].gender;
cin>>p[i].height;
//For classifying based on output1
if(p[i].height<=1.7)
strcpy(p[i].output1,"Short");
else if(p[i].height<2)
strcpy(p[i].output1,"Medium");
else if(p[i].height>=2)
strcpy(p[i].output1,"Tall");
//For classifying output2
if(p[i].gender=='m' || p[i].gender=='M')
{
if(p[i].height<1.7)
strcpy(p[i].output2,"Short");
else if(p[i].height<2.1)
strcpy(p[i].output2,"Medium");
else if(p[i].height>=2.1)
strcpy(p[i].output2,"Tall");
}
else if(p[i].gender=='f' || p[i].gender=='F')
{
if(p[i].height<1.5)
strcpy(p[i].output2,"Short");
else if(p[i].height<=1.8)
strcpy(p[i].output2,"Medium");
else if(p[i].height>1.8)
strcpy(p[i].output2,"Tall");
}
}
cout<<"\nOutput1\n";
for(i=0;i<n;i++)
cout<<p[i].output1<<"\n";
cout<<"\nOutput2\n";
for(i=0;i<n;i++)
cout<<p[i].output2<<"\n";
getch();
}
/*OUTPUT :
Enter the data of the form
Name,gender,height
For person 1. :Kris
F
1.6
For person 2. :Jim
M
2
For person 3. :Maggie
F
1.9
For person 4. :Martha
F
1.88
For person 5. :Stepy
F
1.7
For person 6. :Bob
M
1.85
For person 7. :Kathy
F
1.6
Output1
Short
Tall
Medium
Medium
Short
Medium
Short
Short
Tall
Tall
Output2
Medium
Medium
Tall
Tall
Medium
Medium
Medium
Medium
Tall
Tall
*/
scanf("%d",&gender[i]);
if(gender[i]==1)
{
male++;
}
else
{
female++;
}
}
}
probms=(float)shrtm/shrt;
probmm=(float)medm/med;
probml=(float)lngm/lng;
probfs=(float)shrtf/shrt;
probfm=(float)medf/med;
probfl=(float)lngf/lng;
probs=(float)shrt/ppl;
probm=(float)med/ppl;
probl=(float)lng/ppl;
printf("\n");
printf("\nProbability
printf("\nProbability
printf("\nProbability
printf("\nProbability
printf("\nProbability
printf("\nProbability
of
of
of
of
of
of
Practical No. 8
Implementing the K Nearest Neighbors Algorithm
K Nearest Neighbors (KNN) is a common classification scheme based on
the use of distance measures. The KNN technique assumes that the entire
training set includes not only the data in the set but also the desired classification
for each item. Thus the training data becomes the model. When a classification is
to be made for a new item, its distance to each item in the training set must be
determined. Only the K closest entries in the training set are considered further.
The new item is then placed in the class that contains the most items from this
set of K closest items.
#include <stdio.h>
#include <math.h>
#define MX 10
int mod_sub (int a, int b)
{
if (a<b)
{
return ((a-b)*(a-b));
}
else
{
return ((b-a)*(b-a));
}
}
int find_dist (int x1,int y1,int x2,int y2)
{
int dd;
dd=(int)(sqrt(mod_sub(y2,y1)+mod_sub(x2,x1)));
return dd;
}
int main()
{
int T[MX+1][2];
int tx,ty;
int k,x,y,i,j,temp;
int dist[MX+1][2];
printf("\nEnter training Data (x,y):-\n" );
for(i=0;i<MX;i++)
{
printf("Enter P%d (x,y) :",i);
scanf("%d %d", &T[i][0],&T[i][1]);
}
printf("\nEnter number of neighbours (k):");
scanf("%d",&k);
printf("\nEnter point t (x,y):");
scanf("%d %d",&tx,&ty);
for(i=0;i<MX;i++)
{
dist[i][0]=i;
dist[i][1]=find_dist(T[i][0],T[i][1],tx,ty);
}
printf("\nDistances of all points from 't' are:");
for(i=0;i<MX;i++)
{
printf("\nPoint %d Distance =%d",dist[i][0],dist[i][1]);
}
// Sort the points by distance (selection sort) and report the k nearest.
for(i=0;i<MX;i++)
{
for(j=i+1;j<MX;j++)
{
if(dist[j][1]<dist[i][1])
{
temp=dist[i][0]; dist[i][0]=dist[j][0]; dist[j][0]=temp;
temp=dist[i][1]; dist[i][1]=dist[j][1]; dist[j][1]=temp;
}
}
}
printf("\n\nThe %d nearest neighbours of 't' are:",k);
for(i=0;i<k;i++)
{
printf("\nPoint %d Distance =%d",dist[i][0],dist[i][1]);
}
return 0;
}
/* OUTPUT :
Enter training Data (x,y):-
Enter P0 (x,y) :20 50
Enter P1 (x,y) :52 44
Enter P2 (x,y) :64 55
Enter P3 (x,y) :75 59
Enter P4 (x,y) :45 76
Enter P5 (x,y) :87 94
Enter P6 (x,y) :65 99
Enter P7 (x,y) :57 94
Enter P8 (x,y) :20 90
Enter P9 (x,y) :94 64
Point 6 Distance =22
Point 7 Distance =17
Point 8 Distance =50
Point 9 Distance =21
*/
Practical No. 9
Implementing the K-Means Clustering Algorithm
K-Means is an iterative clustering algorithm in which items are moved
among sets of clusters until the desired set is reached. A high degree of similarity
among elements in clusters is obtained, while a high degree of dissimilarity
among elements in different clusters is achieved simultaneously. The cluster
mean of $K_i = \{t_{i1}, t_{i2}, \ldots, t_{im}\}$ is defined as
$$m_i = \frac{1}{m} \sum_{j=1}^{m} t_{ij}$$
This algorithm requires that some definition of a cluster mean exists, though it
does not have to be this particular one. The desired number of clusters, k, is
taken as input.
cout<<"K1=";
int i;
for(i=0;i<9;i++)
{
if(k1[i]!=0)
cout<<k1[i]<<"
";
}
cout<<endl<<"K2=";
for(i=0;i<9;i++)
{
if(k2[i]!=0)
cout<<k2[i]<<"
}
";
void main()
{
clrscr();
int num[9]={2,4,10,12,3,20,30,11,25};
int K1[9]={0,0,0,0,0,0,0,0,0};
int K2[9]={0,0,0,0,0,0,0,0,0};
int oldK1[9],oldK2[9];
int noK1=0,noK2=0,m;
double m1,m2,mean,sumK1=0,sumK2=0;
int i,same=0,sameCount;
cout<<"Considering number of clusters required 'k'=2 ";
cout<<endl<<"Set of numbers considered : ";
for(i=0;i<9;i++)
cout<<num[i]<<" ";
m1=num[0];
m2=num[1];
mean=(m1+m2)/2;
int c1=0,c2=0;
for(i=0;i<9;i++)
{
if(num[i]>=m1 && num[i]<=mean)
{
K1[c1]=num[i];
c1++;
}
else
{
K2[c2]=num[i];
c2++;
}
}
PrintClusters(K1,K2);
while(!same)
{
// remember the current clusters
for(i=0;i<9;i++)
{
oldK1[i]=K1[i];
oldK2[i]=K2[i];
}
// recompute the two cluster means
sumK1=0; sumK2=0; noK1=0; noK2=0;
for(i=0;i<9;i++)
{
if(K1[i]!=0) { sumK1+=K1[i]; noK1++; }
if(K2[i]!=0) { sumK2+=K2[i]; noK2++; }
}
m1=sumK1/noK1;
m2=sumK2/noK2;
cout<<"m1= "<<m1<<"  m2= "<<m2<<endl;
// reassign each element to the cluster with the nearer mean
for(i=0;i<9;i++)
{
if(K2[i]!=0)
{
if(abs(m1-K2[i])<abs(m2-K2[i]))
{
//shift into K1
for(int j=0;j<9;j++)
{
if(K1[j]==0)
{
K1[j]=K2[i];
K2[i]=0;
break;           // move the element once, then stop scanning
}
}
}
}
// symmetric shift: move any element of K1 that is closer to m2
if(K1[i]!=0)
{
if(abs(m2-K1[i])<abs(m1-K1[i]))
{
for(int j=0;j<9;j++)
{
if(K2[j]==0)
{
K2[j]=K1[i];
K1[i]=0;
break;
}
}
}
}
}
// count the positions at which the clusters did not change
sameCount=0;
for(i=0;i<9;i++)
{
if(oldK1[i]==K1[i] && oldK2[i]==K2[i])
{
sameCount++;
}
}
if(sameCount==9)
same=1;
PrintClusters(K1,K2);
}
getch();
}
/* OUTPUT :
Enter the number of clusters required : 2
K1=2,3,
K2=4, 10, 12, 20, 30, 11, 25,
m1= 2.5 m2= 16
K1=2,3,4,
K2=10, 12, 20, 30, 11, 25,
m1= 3 m2= 18
K1=2,3,4,10,
K2=12, 20, 30, 11, 25,
m1= 4.75 m2= 19.6
K1=2,3,4,10,11,
K2=12, 20, 30, 25,
m1= 6 m2= 21.75
K1=2,3,4,10,11,12,
K2=20, 30, 25,
m1= 7 m2= 25
K1=2,3,4,10,11,12,
K2=20, 30, 25,
m1= 7 m2= 25
K1=2,3,4,10,11,12,
K2=20, 30, 25,
*/
Practical No. 10
Implementing the Agglomerative Algorithm (Single Link)
Agglomerative algorithms, a type of clustering algorithm, start with each
individual item in its own cluster and iteratively merge clusters until all items
belong to one cluster. Different agglomerative algorithms differ in how the
clusters are merged at each level. The algorithm takes as input a set of
elements and the distances between them, given as an n x n vertex adjacency
matrix A, where A[i,j] = dis(ti,tj). The output of the algorithm is a
dendrogram, DE, which is represented as a set of ordered triples <d,k,K>
where d is the threshold distance, k is the number of clusters, and K is the set
of clusters.
import java.util.*;
import java.io.*;
class Agglomerative
{
static void printAdjacency(char c[],int Ad[][],int n)
{
int i,j;
System.out.print("
");
for(i=0;i<n;i++)
{
System.out.print(c[i]+" ");
}
System.out.println();
for(i=0;i<n;i++)
{
System.out.print(c[i]+" ");
for(j=0;j<n;j++)
{
System.out.print(Ad[i][j]+" ");
}
System.out.println();
}
}
static boolean printClusters(int d,ArrayList clus[],int n)
{
int i; int count=1;
boolean stop=false;
for(i=0;i<n;i++)
{
if(!clus[i].isEmpty())
{
System.out.println("Cluster "+count+" has :
"+clus[i]);
count++;
if(clus[i].size()==n)
stop=true;
}
}
System.out.print("Dendrogram triple entry : <"+d+", "+
(count-1)+", {");
count=0;
for(i=0;i<n;i++)
{
if(!clus[i].isEmpty())
{
System.out.print(clus[i]+",");
}
}
System.out.println("}>");
if(stop)
return true;
else
return false;
//array of ArrayLists
}
}
}
stop=printClusters(d,K,n);
if(stop)
break;
}
}
catch(Exception e)
{
System.out.println("An Exception Occured "+e);
}
}
}
/* OUTPUT :
Enter the number of vertices
5
Enter the names of the vertices :
A
B
C
D
E
Enter the elements of the Adjacency matrix :
Elements of row 1 :
0
1
2
2
3
Elements of row 2 :
1
0
2
4
3
Elements of row 3 :
2
2
0
1
5
Elements of row 4 :
2
4
1
0
3
Elements of row 5 :
3
3
5
3
0
Adjacency Matrix :
A B C D E
A 0 1 2 2 3
B 1 0 2 4 3
C 2 2 0 1 5
D 2 4 1 0 3
E 3 3 5 3 0
Cluster 1 has : [A]
*/