
TEST I
Answer all of the following questions.
1. Define KDD.
2. What is pattern evaluation?
3. Differentiate a data warehouse from a database.
4. What is data mining?
5. What is the binning method?
6. What is a data mart?
7. What is meant by a frequent pattern?
8. What is a strong association rule?
9. Differentiate data characterization and data discrimination.
10. Differentiate OLAP and OLTP.
11. What is data reduction?
12. Define concept hierarchy.
13. What is a fact table?
14. Mention the schemas used for designing a multi-dimensional data model.
15. What is a star-net query model?

TEST II
Answer all of the following questions.
1. What is meant by an outlier?
2. Define classifier.
3. What is meant by a test attribute?
4. What is back-propagation?
5. Define cluster.
6. What is a spatial database?
7. What is a decision tree?
8. Differentiate classification and prediction.
9. What are supervised and unsupervised learning techniques?
10. Mention the metrics used for measuring dissimilarity between two interval-scaled attributes.

Home Assignment I
1. Define each of the following data mining functionalities: characterization, discrimination, association, classification, prediction, clustering, and evolution and deviation analysis. Give examples of each data mining functionality, using a real-life database that you are familiar with.
2. Suppose your task as a software engineer at Big-University is to design a data mining system to examine their university course database, which contains the following information: the name, address, and status (e.g., undergraduate or graduate) of each student, and their cumulative grade point average (GPA). Describe the architecture you would choose. What is the purpose of each component of this architecture?
3. Data warehouse design:
   (a) Enumerate three classes of schemas that are popularly used for modeling data warehouses.
   (b) Draw a schema diagram for a data warehouse which consists of three dimensions: time, doctor, and patient, and two measures: count and charge, where charge is the fee that a doctor charges a patient for a visit.
   (c) Starting with the base cuboid (day; doctor; patient), what specific OLAP operations should be performed in order to list the total fee collected by each doctor in VGH (Vancouver General Hospital) in 1997?
   (d) To obtain the same list, write an SQL query, assuming the data is stored in a relational database with the schema
       fee(day; month; year; doctor; hospital; patient; count; charge)
4. In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem.
5. Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order): 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70. Using the data given above:
   (a) Plot an equi-width histogram of width 10.
   (b) Sketch examples of each of the following sampling techniques: SRSWOR, SRSWR, cluster sampling, stratified sampling.
6. Consider suitable transactional data. Assume that the minimum support and minimum confidence thresholds are 20% and 60%, respectively.
   (a) Find the set of frequent itemsets using the Apriori algorithm. Show the derivation of Ck and Lk for each iteration k.
   (b) Generate strong association rules from the frequent itemsets found above.
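
As a starting point for the age data in question 5 of Home Assignment I, the following is a minimal Python sketch, not a model answer. It assumes that the width-10 bins start at multiples of 10 (10-19, 20-29, and so on); the question does not fix the bin boundaries, so other choices are equally valid. The function name `equi_width_histogram`, the sample size of 5, and the random seed are illustrative assumptions. The two simple random samples use the standard `random.sample` (without replacement) and `random.choices` (with replacement).

```python
import random
from collections import Counter

# Age values exactly as listed in question 5.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def equi_width_histogram(values, width):
    """Count values per bin of equal width.

    Assumption: bins are aligned to multiples of `width`,
    e.g. 10-19, 20-29, ... for width 10.
    """
    counts = Counter((v // width) * width for v in values)
    first, last = min(counts), max(counts)
    # Include empty bins between the first and last occupied bin.
    return {(start, start + width - 1): counts.get(start, 0)
            for start in range(first, last + width, width)}


histogram = equi_width_histogram(ages, 10)
for (low, high), n in histogram.items():
    print(f"{low:2d}-{high:2d} | {'*' * n} ({n})")

# Simple random samples of size 5 (size and seed are arbitrary choices).
rng = random.Random(0)
srswor = rng.sample(ages, 5)     # SRSWOR: each tuple drawn at most once
srswr = rng.choices(ages, k=5)   # SRSWR: a tuple may be drawn repeatedly
print("SRSWOR:", srswor)
print("SRSWR: ", srswr)
```

Cluster and stratified sampling need an extra grouping step (for example, strata such as age ranges), which is left out here because the question does not define the groups.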

Home Assignment II
1. The following table consists of training data from an employee database. The data have been generalized. For a given row entry, count represents the number of data tuples having the values for department, status, age, and salary given in that row.

   Let salary be the class label attribute. Given a data sample with the values "systems", "junior", and "20-24" for the attributes department, status, and age, respectively, what would a naive Bayesian classification of the salary for the sample be?
2. Write an algorithm for k-nearest neighbor classification given k, and n, the number of attributes describing each sample.
3. By considering a relevant example, explain how to use a regression model for predicting the value of a numeric attribute.
4. Mention the steps of the k-means clustering technique. Consider example data points and explain your steps.
5. With the aid of suitable examples, write short notes on the following databases.
   a) Temporal Database
   b) Multi-media Database
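
The steps asked for in question 4 of Home Assignment II (k-means clustering) can be sketched in Python as follows. This is one possible illustration under stated assumptions, not a reference answer: the six 2-D data points are hypothetical, the initial centroids are simply the first k points (other initialisations are common), and Euclidean distance is used via the standard `math.dist`. The function name `kmeans` and the `max_iter` parameter are illustrative.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means on tuples of numbers; assumes k <= len(points)."""
    # Step 1: pick initial centroids (assumption: the first k points).
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for j, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical example data: two visibly separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

With these points the loop settles after a few passes, grouping the three points near (1, 1) together and the three points near (8, 8) together. A written answer would trace the assignment and update steps by hand for each pass.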
