
Answer all questions (30 Marks)

1. Naive Bayesian classification is based on Bayes' theorem of posterior probability. Why is naive Bayesian classification called naive? Briefly outline the major ideas of naive Bayesian classification. [5 marks]

2. Suppose that the shock-trauma center of a major hospital in a large Midwestern city has contracted with your KDD startup company to develop a prototype system for emergency room triage (the sorting and prioritization of incoming patients). The data that you will work with consists of unsupervised learning instances with the following discrete variables, in alphabetical order:

Blood-Loss: amount of blood loss upon arrival (NB: not a rate)
Instrument: type of object or weapon causing injury
Internal-Bleeding: level of internal bleeding of patient
Location-of-Injury: general body area of injury
Priority: symbolic attribute indicating how critical treatment is for this patient
Risk-of-Shock: level of risk of patient going into shock
Trauma-Type: general type of trauma (e.g., puncture, concussion, laceration)
Treatment-Type: symbolic attribute indicating the general type of treatment

For this problem, suppose that values of observable variables are known.

(a) Generate at least 20 synthetic patient cases using the above 8 variables. [5 marks]

(b) Using the above cases as training instances, construct a tree-structured Bayesian belief network (BBN) from the data. Show your work! You need not learn conditional probability tables (CPTs), but assign a meaningful causal flow. [10 marks]

3. The following table consists of training data from an employee database. The data have been generalized. For example, 31...35 for age represents the age range of 31 to 35. For a given row entry, count represents the number of data tuples having the values for department, status, age, and salary given in that row.

department   status   age       salary      count
sales        senior   31...35   46K...50K   30
sales        junior   26...30   26K...30K   40
sales        junior   31...35   31K...35K   40
systems      junior   21...25   46K...50K   20
systems      senior   31...35   66K...70K   5
systems      junior   26...30   46K...50K   3
systems      senior   41...45   66K...70K   3
marketing    senior   36...40   46K...50K   10
marketing    junior   31...35   41K...45K   4
secretary    senior   46...50   36K...40K   4
secretary    junior   26...30   26K...30K   6

Let status be the class label attribute.

(a) Construct a decision tree from the given data. [8 marks]

(b) Given a data tuple having the values systems, 26...30, and 46K...50K for the attributes department, age, and salary, respectively, what would a naive Bayesian classification of the status for the tuple be? [2 marks]

4. The data tuples below are sorted by decreasing probability value, as returned by a classifier. For each tuple, compute the values for the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Compute the true positive rate (TPR) and false positive rate (FPR). Plot the ROC curve for the data.

Tuple #   Class   Probability
1         P       0.95
2         N       0.85
3         P       0.78
4         P       0.66
5         N       0.60
6         P       0.55
7         N       0.53
8         N       0.52
9         N       0.51
10        P       0.40

[10 marks]
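
A hand computation for question 4 could be checked with a short Python sketch along the following lines. It is not part of the paper. It assumes the usual reading of the question, in which the threshold sits at each tuple's probability in turn, so that tuple i and every tuple ranked above it are predicted positive. The function name roc_rows is hypothetical.

```python
# Class labels and probabilities copied from the question 4 table,
# already sorted by decreasing probability.
classes = ["P", "N", "P", "P", "N", "P", "N", "N", "N", "P"]
probs = [0.95, 0.85, 0.78, 0.66, 0.60, 0.55, 0.53, 0.52, 0.51, 0.40]


def roc_rows(labels):
    """Return one (TP, FP, TN, FN, TPR, FPR) tuple per ranked item.

    Assumption: at row i, items 1..i are predicted positive and the
    remaining items are predicted negative.
    """
    total_pos = labels.count("P")
    total_neg = labels.count("N")
    rows = []
    tp = fp = 0
    for label in labels:
        if label == "P":
            tp += 1  # a true positive joins the predicted-positive set
        else:
            fp += 1  # a false positive joins the predicted-positive set
        fn = total_pos - tp
        tn = total_neg - fp
        rows.append((tp, fp, tn, fn, tp / total_pos, fp / total_neg))
    return rows


rows = roc_rows(classes)
print("tuple  prob  TP FP TN FN  TPR  FPR")
for i, (prob, (tp, fp, tn, fn, tpr, fpr)) in enumerate(zip(probs, rows), start=1):
    print(f"{i:>5}  {prob:.2f}  {tp:>2} {fp:>2} {tn:>2} {fn:>2}  {tpr:.1f}  {fpr:.1f}")
```

The ROC curve is then the sequence of (FPR, TPR) pairs from these rows, starting from the origin (0, 0), with FPR on the horizontal axis and TPR on the vertical axis.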
