
Amit

1. WHAT IS THE DIFFERENCE BETWEEN OLTP AND OLAP? (3 PTS)


A=
OLTP==> SIMPLE TRANSACTIONS, READ/WRITE QUERIES, SMALLER DB
OLAP==> COMPLEX ANALYTICAL QUERIES, MOSTLY READ-ONLY, HUGE DB

2. WHAT ARE THE 3 TIERS IN DATA WAREHOUSE ARCHITECTURE?


A=
BOTTOM TIER--DATA MARTS, DATA WAREHOUSE, METADATA REPOSITORY
MIDDLE TIER--OLAP SERVER
TOP TIER--DATA MINING TOOLS, ANALYSIS TOOLS, ETC.

3. WHAT ARE THE ADVANTAGES OF A DATA WAREHOUSE?


A=
MARKETING WEAPON
CUSTOMER SUPPORT
BUSINESS INTELLIGENCE
DECISION SUPPORT

4. WHAT ARE THE APPLICATIONS OF A DATA WAREHOUSE?


A=
INFORMATION PROCESSING
ANALYTICAL PROCESSING
DATA MINING

5. WHAT IS THE USE OF THE METADATA REPOSITORY?


A=
STORES METADATA: DATA ABOUT THE DATA MARTS AND THE DEFINITIONS
OF THE DATA WAREHOUSE

6. TYPES OF OLAP SERVERS


A=
ROLAP--RELATIONAL OLAP, BUILT ON A RELATIONAL DBMS
MOLAP--MULTIDIMENSIONAL OLAP, ARRAY-BASED STORAGE OF MULTIDIMENSIONAL VIEWS
HOLAP--HYBRID OLAP, COMBINES ROLAP AND MOLAP

7. WHAT IS A DATA MART?


A=
CONTAINS A SUBSET OF CORPORATE-WIDE DATA THAT IS OF
VALUE TO A SPECIFIC GROUP OF USERS

8. WHAT ARE THE TYPES OF DATA WAREHOUSE DESIGN?


A=
TOP-DOWN--STARTS WITH OVERALL DESIGN AND PLANNING
BOTTOM-UP--STARTS WITH EXPERIMENTS AND PROTOTYPES
COMBINED--COMBINES BOTH OF THE ABOVE

9. USE OF DATA PREPROCESSING


A=
IMPROVE QUALITY OF DATA
IMPROVE QUALITY OF MINING RESULT
IMPROVE EFFICIENCY AND EASE OF MINING

10. DATA PREPROCESSING TECHNIQUES


A=
DATA CLEANING
DATA INTEGRATION
DATA TRANSFORMATION
DATA REDUCTION

Q1. What is a data warehouse? What are the features of a data warehouse?

Q2. What are the steps in the design and construction of a data warehouse? What are its components?

Q3. Explain the three-tier data warehouse architecture.

Q4. What are the applications of a data warehouse?

Q5. What are the differences between data warehouses and data marts?

1. What is Online Analytical Processing (OLAP)?

OLAP is an interactive system that permits an analyst to view different
summaries of multidimensional data. OLAP tools support interactive analysis
of summary information.

2. Explain OLAP implementation in brief.

OLAP is implemented on multidimensional data models. In MOLAP servers, data
warehouses directly store multidimensional data in special data
structures (e.g., arrays) and implement the OLAP operations over these special
data structures.
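
A minimal sketch of the MOLAP idea in Python, under stated assumptions: the cube is stored as one dense, flat array in row-major order, each cell's position is computed from the dimension indexes, and an OLAP-style total is computed by scanning that array. The class name `DenseCube` and the sample members are hypothetical; real MOLAP servers use their own storage formats.

```python
from itertools import product


class DenseCube:
    """A dense multidimensional array addressed by dimension member names."""

    def __init__(self, dims):
        # dims: dimension name -> ordered list of its member values
        self.names = list(dims)
        self.members = [list(dims[n]) for n in self.names]
        self.index = [{m: i for i, m in enumerate(ms)} for ms in self.members]
        size = 1
        for ms in self.members:
            size *= len(ms)
        self.cells = [0] * size  # one slot for every possible cell, even empty ones

    def _offset(self, coords):
        # Row-major position: ((i0 * n1) + i1) * n2 + i2 ...
        off = 0
        for idx, ms, c in zip(self.index, self.members, coords):
            off = off * len(ms) + idx[c]
        return off

    def set(self, coords, value):
        self.cells[self._offset(coords)] = value

    def get(self, coords):
        return self.cells[self._offset(coords)]

    def total(self, dim, member):
        # An OLAP-style aggregate over the array: sum all cells whose
        # coordinate on `dim` equals `member`.
        d = self.names.index(dim)
        return sum(self.get(c) for c in product(*self.members) if c[d] == member)


cube = DenseCube({"item": ["TV", "PC"], "city": ["Pune", "Delhi"], "quarter": ["Q1", "Q2"]})
cube.set(("TV", "Pune", "Q1"), 10)
cube.set(("TV", "Delhi", "Q2"), 7)
cube.set(("PC", "Delhi", "Q2"), 5)
print(cube.get(("TV", "Pune", "Q1")), cube.total("item", "TV"), cube.total("city", "Delhi"))  # 10 17 12
```

Every combination of members gets a slot whether or not it holds data, which is why a dense array gives direct access to any cell but can waste space when the cube is sparse.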

3. What is a Relational OLAP (ROLAP) system?

A ROLAP server keeps the data in a relational DBMS and relies on:
Special schema design: star, snowflake.
Special indexes: bitmap, multi-table join.
Special tuning: maximize query throughput.
Proven technology: tends to outperform specialized multidimensional DBMSs (MDDB), especially on large data sets.
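
The ROLAP points above can be sketched with Python's built-in `sqlite3` module: a small star schema (one fact table, two dimension tables) and an aggregate query that joins them. The table names and rows are hypothetical, and SQLite has no bitmap or join indexes, so ordinary indexes stand in for them here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, category TEXT);
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_sales (
    item_key INTEGER REFERENCES dim_item(item_key),
    time_key INTEGER REFERENCES dim_time(time_key),
    amount REAL
);
-- Ordinary B-tree indexes on the foreign keys (stand-ins for special indexes).
CREATE INDEX idx_sales_item ON fact_sales(item_key);
CREATE INDEX idx_sales_time ON fact_sales(time_key);
""")
conn.executemany("INSERT INTO dim_item VALUES (?, ?, ?)",
                 [(1, "TV", "Electronics"), (2, "PC", "Electronics"), (3, "Desk", "Furniture")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, 2023, "Q1"), (2, 2023, "Q2")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (2, 1, 250.0), (3, 2, 80.0), (1, 2, 120.0)])

# Star join: fact table joined to each dimension, measure aggregated by
# dimension attributes.
rows = conn.execute("""
    SELECT i.category, t.quarter, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_item i ON f.item_key = i.item_key
    JOIN dim_time t ON f.time_key = t.time_key
    GROUP BY i.category, t.quarter
    ORDER BY i.category, t.quarter
""").fetchall()
for row in rows:
    print(row)
```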
4. What is a Hybrid OLAP (HOLAP) system?

A system that stores some summaries in memory and stores the base data and
other summaries in a relational database is called HOLAP.
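
A rough sketch of the HOLAP split, assuming a single precomputed summary held in a Python dict (the in-memory part) and the base data in an SQLite table (the relational part). `memory_summaries` and `total_by` are hypothetical names; a real HOLAP server decides for itself which summaries to keep in memory.

```python
import sqlite3

# Relational part: base data in an SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("East", "TV", 100.0), ("East", "PC", 200.0),
    ("West", "TV", 50.0), ("West", "PC", 75.0),
])

# In-memory part: one precomputed summary (total per region).
memory_summaries = {
    ("region",): dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
}


def total_by(dim):
    """Return ({member: total}, source); `dim` must be a trusted column name."""
    key = (dim,)
    if key in memory_summaries:
        return memory_summaries[key], "memory"          # answered from memory
    query = f"SELECT {dim}, SUM(amount) FROM sales GROUP BY {dim}"
    return dict(conn.execute(query)), "relational"      # aggregated from base data


print(total_by("region"))
print(total_by("product"))
```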

5. Give the OLAP components of SQL.

Extended aggregation:

The SQL:1999 standard defines a rich set of aggregate functions. The new aggregate
functions on single attributes are standard deviation and variance. SQL:1999 also
supports a new class of binary aggregate functions, which compute statistical results on
pairs of attributes; they include correlation, covariance and regression curves, which
give a line approximating the relation between the values of the pair of attributes.

Q1. What is a data cube?


----- A data cube is used to represent data along some measure of interest.
Although called a “data cube”, it can be 2-dimensional, 3-dimensional,
or higher-dimensional.

Q2. What are the various operations on a data cube?


----- Summarization or roll-up
Drill-down
Iceberg cubes
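
The three items listed above can be illustrated on a tiny cube stored as a Python dict keyed by dimension values. The data and the city-to-state hierarchy are made up; the iceberg filter is applied to one aggregated view only, whereas a full iceberg cube applies the threshold to every group-by.

```python
from collections import defaultdict

# Hypothetical base cells: (city, state, quarter) -> sales count.
base = {
    ("Pune", "Maharashtra", "Q1"): 40,
    ("Mumbai", "Maharashtra", "Q1"): 60,
    ("Pune", "Maharashtra", "Q2"): 30,
    ("Chennai", "Tamil Nadu", "Q1"): 25,
}


def roll_up_to_state(cells):
    """Roll-up: climb the location hierarchy city -> state and re-aggregate."""
    out = defaultdict(int)
    for (city, state, quarter), v in cells.items():
        out[(state, quarter)] += v
    return dict(out)


def drill_down_to_city(cells, state, quarter):
    """Drill-down: expand one (state, quarter) cell back into its city-level cells."""
    return {city: v for (city, s, q), v in cells.items() if s == state and q == quarter}


def iceberg(cells, min_count):
    """Iceberg condition: keep only cells whose measure reaches the threshold."""
    return {k: v for k, v in cells.items() if v >= min_count}


by_state = roll_up_to_state(base)
print(by_state)
print(drill_down_to_city(base, "Maharashtra", "Q1"))
print(iceberg(by_state, 30))
```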

Q3. What is a cross tab?


----- A cross tab is a table where the values of one attribute (say A) form the row headers,
the values of another attribute (say B) form the column headers, and each cell is identified
by (ai, bj), where ai is a value of A and bj is a value of B.
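
A small sketch of building a cross tab from (a, b, measure) tuples in Python. Summing the measure per (ai, bj) cell and using the key "all" for the row and column totals are assumptions made for illustration; the sample tuples are hypothetical.

```python
from collections import defaultdict

# Hypothetical tuples: (A = item name, B = color, measure = quantity).
sales = [("shirt", "red", 5), ("shirt", "blue", 3), ("pant", "red", 2), ("shirt", "red", 1)]


def cross_tab(rows):
    """Return {a_i: {b_j: total}}; the key "all" holds row and column totals."""
    cells = defaultdict(lambda: defaultdict(int))
    for a, b, qty in rows:
        cells[a][b] += qty
        cells[a]["all"] += qty       # row total for a_i
        cells["all"][b] += qty       # column total for b_j
        cells["all"]["all"] += qty   # grand total
    return {a: dict(cols) for a, cols in cells.items()}


tab = cross_tab(sales)
cols = sorted({b for _, b, _ in sales}) + ["all"]
print("A/B".ljust(8) + "".join(c.ljust(6) for c in cols))
for a in sorted(k for k in tab if k != "all") + ["all"]:
    print(a.ljust(8) + "".join(str(tab[a].get(c, 0)).ljust(6) for c in cols))
```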

Q4. What are the different data preprocessing techniques?


----- Data Integration
Data Cleaning
Data normalization
Data reduction.

Q5. What are the problems with data?


----- Missing attributes and missing attribute values
Improper types

Q6. What is the need for data preprocessing?

---- Data quality is a key issue in data mining.


To increase the accuracy of the mining, we have to perform data preprocessing;
otherwise: garbage in => garbage out.

• What is semantic integration?


Ans:- A collection of views that gives a group of users a uniform presentation of
relevant data from multiple databases is called semantic integration.

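• What is data integration?

Ans:- Consolidating data from different sources into one coherent repository, usually a data
warehouse (schema integration). Problems are handled by:
a) using metadata (to match attributes across schemas).
b) correlation analysis (to detect redundant attributes).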
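
For nominal attributes, correlation analysis is commonly done with the chi-square (χ²) test. The sketch below computes the χ² statistic from a contingency table of observed counts; the counts are hypothetical, and the function assumes no row or column total is zero.

```python
def chi_square(table):
    """table[i][j] = observed count of tuples with A = a_i and B = b_j."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row_totals[i] * col_totals[j] / n  # count expected if A, B independent
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows = (male, female), columns = (fiction, non-fiction).
observed = [[250, 50], [200, 1000]]
stat = chi_square(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(stat, 2), dof)
```

Here the statistic is about 507.94 with 1 degree of freedom, far above the 0.001-level critical value of about 10.83, so the two attributes would be treated as strongly correlated and one of them may be redundant.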

• What are the different strategies of data reduction?

Ans:-
1) data cube aggregation.
2) attribute subset selection.
3) dimensionality reduction.
4) numerosity reduction.
5) concept hierarchy generation.

• What is data cleaning?

Ans:- Real-world data tend to be incomplete, noisy and inconsistent. Data cleaning attempts
to fill in missing values, smooth out noise and correct inconsistencies in the data.

• Which methods are used for data cleaning?

Ans:-
1) Look for missing values.
2) Ignore the tuples.
3) Fill in missing values manually.
4) Use a global constant to fill in missing values.
5) Use the most probable value to fill in missing values.
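
Several of the methods above can be sketched in a few lines of Python. The tuples are hypothetical and `None` marks a missing value. The attribute mean (numeric) and the most frequent value (nominal) are used here as simple stand-ins for the "most probable value"; predicting the value with a learning algorithm would be the fuller version of method 5. Manual filling (method 3) is not coded.

```python
from collections import Counter
from statistics import mean

# Hypothetical tuples; None marks a missing value.
tuples = [
    {"age": 25, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 35, "city": None},
    {"age": 30, "city": "Pune"},
]


def ignore_tuples(rows, attr):
    """Method 2: drop tuples whose `attr` is missing."""
    return [r for r in rows if r[attr] is not None]


def fill_constant(rows, attr, constant):
    """Method 4: replace missing values with a global constant (returns new dicts)."""
    return [{**r, attr: constant if r[attr] is None else r[attr]} for r in rows]


def fill_mean(rows, attr):
    """Replace missing numeric values with the mean of the known values."""
    return fill_constant(rows, attr, mean(r[attr] for r in rows if r[attr] is not None))


def fill_most_frequent(rows, attr):
    """Replace missing nominal values with the most frequent known value."""
    counts = Counter(r[attr] for r in rows if r[attr] is not None)
    return fill_constant(rows, attr, counts.most_common(1)[0][0])


print(fill_mean(tuples, "age"))
print(fill_most_frequent(tuples, "city"))
```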
Q: Which sub-processes are generally involved in data transformation?
Ans: Smoothing, aggregation, generalization, normalization, attribute
construction.

Q: Name the different strategies for data reduction.


Ans: Data cube aggregation, attribute subset selection, dimensionality
reduction, numerosity reduction, discretization and concept hierarchy generation.

Q: What is the use of data reduction?


Ans: To obtain a reduced representation of the data set that is much smaller in
volume, yet closely maintains the integrity of the original data.

Q: What is the aim behind data transformation?


Ans: To transform or consolidate the data into forms appropriate for mining.

Q: What is meant by smoothing?


Ans: Removing noise from the data is called smoothing.

Q: Compare ROLAP and MOLAP.

Q: Name any two operations on a data cube that you have performed in your practical.

Q: What is hybrid OLAP? Give its benefits.

Q: Explain data cleaning.

A: Real-world data tend to be incomplete, noisy and inconsistent. Data cleaning routines
attempt to fill in missing values, smooth out noise and correct inconsistencies in the data.

1. Fill in missing values (attribute or class value):


* Ignore the tuple: usually done when class label is missing.
* Use the attribute mean (or majority nominal value) to fill in the missing value.
* Use the attribute mean (or majority nominal value) for all samples belonging to
the same class.
* Predict the missing value by using a learning algorithm: consider the attribute
with the missing value as a dependent (class) variable and run a learning algorithm
(usually Bayes or decision tree) to predict the missing value.
2. Identify outliers and smooth out noisy data:
* Binning
o Sort the attribute values and partition them into bins (see "Unsupervised
discretization" below);
o Then smooth by bin means, bin median, or bin boundaries.
* Clustering: group values in clusters and then detect and remove outliers
(automatic or manual)
* Regression: smooth by fitting the data into regression functions.
3. Correct inconsistent data: use domain knowledge or expert decision.
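
The binning step can be sketched as follows, assuming equal-depth (equal-frequency) bins; the sample price list is illustrative. In smoothing by boundaries, ties go to the lower boundary, which is an arbitrary choice.

```python
from statistics import median


def equal_depth_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of (nearly) equal size."""
    if not 1 <= n_bins <= len(values):
        raise ValueError("n_bins must be between 1 and the number of values")
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)  # spread leftovers over first bins
        bins.append(data[start:end])
        start = end
    return bins


def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin by the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's min and max."""
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_depth_bins(prices, 3)
print(bins)
print(smooth_by_means(bins))
print(smooth_by_boundaries(bins))
```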

Q: Explain data transformation.

A: In data transformation, the data are transformed or consolidated into forms appropriate
for mining. Data transformation involves the following:
1. Normalization:
* Scaling attribute values to fall within a specified range.
o Example: to transform V in [Min, Max] to V' in [0, 1], apply V' = (V - Min) / (Max - Min)
* Scaling by using mean and standard deviation (useful when min and max are
unknown or when there are outliers): V'=(V-Mean)/StDev
2. Aggregation: moving up in the concept hierarchy on numeric attributes.
3. Generalization: moving up in the concept hierarchy on nominal attributes.
4. Attribute construction: replacing or adding new attributes inferred by existing
attributes.
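
Both normalization formulas translate directly into Python. This sketch assumes the population standard deviation (`statistics.pstdev`) for StDev and rejects inputs where Max = Min or StDev = 0, since the formulas divide by those quantities. The income values are made up.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """V' = (V - Min) / (Max - Min), then stretched to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; min-max scaling is undefined")
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def z_score(values):
    """V' = (V - Mean) / StDev, using the population standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("standard deviation is zero")
    return [(v - m) / s for v in values]


incomes = [200, 300, 400, 600, 1000]
print(min_max(incomes))  # [0.0, 0.125, 0.25, 0.5, 1.0]
print(z_score(incomes))
```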

Q: Explain data reduction.

A: Data reduction techniques can be applied to obtain a reduced representation of the data
set that is much smaller in volume, yet closely maintains the integrity of the original
data. That is, mining on the reduced data set should be more efficient yet produce the same
(or almost the same) analytical results.

1. Reducing the number of attributes


* Data cube aggregation: applying roll-up, slice or dice operations.
* Removing irrelevant attributes: attribute selection (filtering and wrapper
methods), searching the attribute space (see Lecture 5: Attribute-oriented analysis).
* Principal component analysis (numeric attributes only): searching for a lower
dimensional space that can best represent the data.
2. Reducing the number of attribute values
* Binning (histograms): reducing the number of attribute values by grouping them into
intervals (bins).
* Clustering: grouping values in clusters.
* Aggregation or generalization
3. Reducing the number of tuples
* Sampling
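
Sampling, the last item above, can be sketched with Python's `random` module. Simple random sampling without replacement, with replacement, and stratified sampling are shown as common variants; the function names, the fixed seed and the customer list are assumptions for illustration.

```python
import random


def srswor(rows, s, seed=None):
    """Simple random sample of s tuples without replacement."""
    return random.Random(seed).sample(rows, s)


def srswr(rows, s, seed=None):
    """Simple random sample of s tuples with replacement (a tuple may repeat)."""
    return random.Random(seed).choices(rows, k=s)


def stratified(rows, key, fraction, seed=None):
    """Draw the same fraction from every stratum, at least one tuple each."""
    rng = random.Random(seed)
    strata = {}
    for r in rows:
        strata.setdefault(key(r), []).append(r)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


customers = [("young", i) for i in range(90)] + [("senior", i) for i in range(10)]
s = stratified(customers, key=lambda r: r[0], fraction=0.1, seed=42)
print(len(s), sum(1 for r in s if r[0] == "senior"))
```

Stratified sampling keeps small groups (here the seniors) represented, which a plain random sample of the same size might miss.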

Q 1> What is the difference between an OLTP query and an OLAP query?


Ans=> OLTP query: 1. Used to modify data.
2. Requires a fully updated database.
OLAP query: 1. Doesn’t modify data.
2. Doesn’t require a fully updated database.
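
The two kinds of query can be contrasted with Python's `sqlite3` module. The `account` table and its rows are hypothetical. The OLTP part is a short transaction that updates rows and commits; the OLAP part only reads and aggregates, so it would still give a useful answer on a copy of the data that is slightly out of date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [(1, "Pune", 500.0), (2, "Pune", 300.0), (3, "Delhi", 700.0)])
conn.commit()

# OLTP-style query: a short transaction that modifies a few rows.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("UPDATE account SET balance = balance - 100 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 100 WHERE id = 3")

# OLAP-style query: read-only aggregation; it does not modify data.
summary = conn.execute(
    "SELECT branch, SUM(balance) FROM account GROUP BY branch ORDER BY branch"
).fetchall()
print(summary)
```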
Q 2> What is OLAP?
Ans=> It stands for online analytical processing.
Q 3> What is OLTP?
Ans=> It stands for online transaction processing. OLTP requires that the data are completely
up to date.

Q 4> What operations does an OLAP tool support?


Ans=> It supports: 1. Slice operation
2. Dice operation
3. Roll-up operation
4. Drill-down operation
5. Visualization operation
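
Slice and dice, the first two operations above, can be sketched on a cube stored as a dict keyed by (item, city, quarter). The dimension names, data and function names are hypothetical (`slice_` has a trailing underscore to avoid shadowing Python's built-in `slice`).

```python
# Hypothetical cube cells: (item, city, quarter) -> sales.
cube = {
    ("TV", "Pune", "Q1"): 10, ("TV", "Delhi", "Q1"): 7,
    ("PC", "Pune", "Q1"): 4, ("PC", "Pune", "Q2"): 6,
    ("TV", "Pune", "Q2"): 3,
}
DIMS = ("item", "city", "quarter")


def slice_(cells, dim, value):
    """Slice: fix one dimension to a single value; that dimension drops out of the key."""
    d = DIMS.index(dim)
    return {k[:d] + k[d + 1:]: v for k, v in cells.items() if k[d] == value}


def dice(cells, **allowed):
    """Dice: keep cells whose coordinates fall in the given value sets."""
    idx = {DIMS.index(dim): set(vals) for dim, vals in allowed.items()}
    return {k: v for k, v in cells.items() if all(k[d] in s for d, s in idx.items())}


print(slice_(cube, "quarter", "Q1"))
print(dice(cube, item=["TV"], city=["Pune"]))
```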

Q 5> What are the different kinds of OLAP tools used?


Ans=> ROLAP, MOLAP, HOLAP

Q1: What is noisy data?


Ans: Noise is random error or variance in a measured variable, so it is necessary to smooth
out the data to remove the noise.

Q2: What is data integration?


Ans: Data mining often requires data integration, which combines data from multiple
sources into a coherent data store.
Q3: How is data transformed?
Ans: Data are transformed or consolidated into forms appropriate for mining. The methods are:
Smoothing
Aggregation
Generalization
Normalization
Attribute construction
Q4: What are the back-end tools of a data warehouse?
Ans: Data extraction
Data cleaning
Data transformation
Load
Refresh
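
The back-end steps can be chained into a toy pipeline. This sketch assumes the source is a CSV extract held in a string and the warehouse is an in-memory SQLite table; all names and the cleaning rules (drop non-numeric amounts, normalize city spelling) are hypothetical. `refresh` simply re-runs the whole pipeline: it overwrites cities already loaded but does not delete cities that have disappeared from the source.

```python
import csv
import io
import sqlite3

# Hypothetical operational extract; "n/a" is a dirty value.
SOURCE = """id,city,amount
1, pune ,120
2,Delhi,n/a
3,PUNE,80
"""


def extract(text):
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Drop rows whose amount is not numeric; trim and normalize city names."""
    out = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        out.append({"id": int(r["id"]), "city": r["city"].strip().title(), "amount": amount})
    return out


def transform(rows):
    """Aggregate to the granularity stored in the warehouse: total per city."""
    totals = {}
    for r in rows:
        totals[r["city"]] = totals.get(r["city"], 0.0) + r["amount"]
    return totals


def load(conn, totals):
    conn.execute("CREATE TABLE IF NOT EXISTS city_sales (city TEXT PRIMARY KEY, total REAL)")
    with conn:
        conn.executemany("INSERT OR REPLACE INTO city_sales VALUES (?, ?)", totals.items())


def refresh(conn, text):
    """Re-run the pipeline so the warehouse reflects the current source."""
    load(conn, transform(clean(extract(text))))


conn = sqlite3.connect(":memory:")
refresh(conn, SOURCE)
print(conn.execute("SELECT city, total FROM city_sales").fetchall())
```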
Q5: What are data cube measures?
Ans: A data cube measure is a numerical function that can be evaluated at each point in the data
cube space.