
Course Outline

MATH6200 - Data Analysis (Data Analytics)

Texts & Learning Materials

There is no required textbook; all required class materials will be available on our Blackboard website.

However, the books below are very useful if you want to learn more about data analytics and its applications. The best way to learn is by doing (especially for R programming).

Optional Textbook 1 (highly recommended; easy to follow, with many examples and data sets):

Data Mining and Business Analytics with R, by Johannes Ledolter;

Publisher: Wiley (2013); ISBN-13: 978-1118447147;

Available in Johns Hopkins online library: https://catalyst.library.jhu.edu/catalog/bib_4637122

Optional Textbook 2:

An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani;

Publisher: Springer (2013); ISBN-13: 978-1461471370;

Available in Johns Hopkins online library: https://catalyst.library.jhu.edu/catalog/bib_6591386

Optional Textbook 3 (advanced):

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. It requires some mathematical sophistication and goes beyond the material we will be covering. The book is free at https://web.stanford.edu/~hastie/Papers/ESLII.pdf

Software:

We require the R Statistical Software, which is powerful and free. R can be downloaded at the link below: http://www.cran.r-project.org/

RStudio is a free environment for both writing and running R code, available at www.rstudio.org. Some students find it friendlier than base R (especially on Windows).

The learning curve is steep, but students can become proficient in a few weeks. Some manuals are very helpful for learning R, e.g., http://cran.r-project.org/manuals.html

I provide limited software instruction, in-class demonstrations, and code to accompany lectures and assignments. We do not assume that you have used R in a previous class. However, this is not a class on R. Like any language, R is only learned by doing. You should install R as soon as possible and familiarize yourself with basic operations.

Additional resources: (a) The tutorials at data.princeton.edu/R are fantastic (and there are many others out there). (b) YouTube introductions to R, e.g., the series from Google Developers.

Course Description

This course prepares students to gather, describe, and analyze data, and to use advanced statistical tools to make decisions in operations, risk management, finance, marketing, and related areas. The analysis targets economic and financial decisions in complex systems that involve multiple partners. Topics include probability, statistics, hypothesis testing, regression, clustering, decision trees, and forecasting.

Learning Objectives

By the end of this course, students will be able to:

1. Gather sufficient relevant data, conduct data analytics using scientific methods, and make appropriate and powerful connections between quantitative analysis and real-world problems.

2. Demonstrate a sophisticated understanding of the concepts and methods; know the exact scope and possible limitations of each method; and show the capability to use data analytics skills to provide constructive guidance in decision making.

3. Use advanced techniques to conduct thorough and insightful analysis, and interpret the results correctly with detailed and useful information.

4. Show substantial understanding of real-world problems; conduct deep data analytics using correct methods; and draw reasonable conclusions with sufficient explanation and elaboration.

5. Write an insightful and well-organized report for a real-world case study, including thoughtful and convincing details.

6. Make better business decisions by using advanced techniques in data analytics.

Attendance

Attendance and class participation are part of each student’s course grade. Students are expected to attend all scheduled class sessions; missing class makes it difficult to achieve the objectives of the course. Excessive absence will result in a loss of participation points. Regular attendance and active participation are required to complete the course successfully.

Class participation is an important part of learning. If you have a question, it’s likely that others do as well. I encourage active participation, and course grades will take particularly strong contributions into account.

Many students learn better and faster when working in a group, so I encourage collaborative learning. You can work together in a study group of 2–4 students to discuss class materials, homework assignments, and projects on a weekly basis. However, you must write your homework assignments individually, in your own words; your text should reflect your own understanding of the material. Study groups can be different from project groups.

The instructors reserve the right to alter course content and/or adjust the pace to accommodate class progress. Students are responsible for keeping up with all adjustments to the course calendar.

Recommended Reading

Week | Date | Weekly Objectives/Topics | Reading (book by Ledolter) | Assignments
1 | [date] | Introduction, Data Summarization and Visualization | Text, Ch 1, 2 |
2 | [date] | Linear and Nonlinear Regression | Text, Ch 3, 4, 5, 6 | HW 1 is due

Score

Criteria, scored in three bands: (0 ≤ score < 6), (6 ≤ score < 9), (9 ≤ score ≤ 10)

Deep understanding of theory and its applications, using qualitative methods to answer business questions
(0 ≤ score < 6): Demonstrate inadequate understanding of some important concepts, methods, or their applications, e.g., choose wrong methods, conduct analysis inappropriately, or interpret results incorrectly.
(6 ≤ score < 9): Understand concepts and methods relatively well; analyze data using acceptable, though not perfect, methods; derive useful information for decision making.
(9 ≤ score ≤ 10): Demonstrate a sophisticated understanding of the concepts and methods; know the exact scope and possible limitations of each method; show the capability to use data analytics skills to make the right business decisions.

Implementation and interpretation of data analysis techniques
(0 ≤ score < 6): Use wrong techniques to analyze data; present inappropriate interpretations or conclusions.
(6 ≤ score < 9): Choose acceptable methods to analyze data; interpretations are sensible; derive useful results.
(9 ≤ score ≤ 10): Use advanced techniques to conduct thorough and insightful analysis; interpret the results correctly and draw the right conclusions based on data analysis.

Ability to solve real-world problems using quantitative methods
(0 ≤ score < 6): Data are inadequate or unstructured. Use inappropriate methods to analyze data and fail to retrieve useful information. Suggestions are not persuasive.
(6 ≤ score < 9): Collect and document just enough data; employ appropriate techniques to retrieve insightful information from data; make reasonable recommendations.
(9 ≤ score ≤ 10): Gather sufficient relevant data; conduct data analytics using scientific methods; make appropriate and powerful connections between analysis and real-world problems; provide constructive guidance in decision making.

Writing and presenting, especially organization and communication
(0 ≤ score < 6): Report is inadequately written and poorly organized. Analysis is insufficient. Conclusions are unconvincing.
(6 ≤ score < 9): Report is concise and clearly written. Analyze problems following scientific strategies; provide useful suggestions with detailed explanations.
(9 ≤ score ≤ 10): Report is well organized and insightfully written, with thorough and thoughtful details. Conclusions are convincing.

Total Score

Comments:

Score

Criteria, scored in three bands: (0 ≤ score < 6), (6 ≤ score < 9), (9 ≤ score ≤ 10)

Interpretation of data (qualitative)
(0 ≤ score < 6): Little or no attempt to interpret data; or there are significant errors; or some data are over- or under-interpreted.
(6 ≤ score < 9): Interpret most data correctly; some conclusions may be suspect; suggestions on future implementation are sound.
(9 ≤ score ≤ 10): Data are completely and appropriately interpreted; there is no over- or under-interpretation; draw convincing conclusions.

Statistical analysis (quantitative)
(0 ≤ score < 6): Statistical methods are completely misapplied, or applied with significant errors or omissions. Choose inappropriate methods and make wrong predictions.
(6 ≤ score < 9): Most statistical methods are correctly applied, but more could have been done with the data. Predictions are sensible but may deviate substantially from the true results.
(9 ≤ score ≤ 10): Statistical methods are fully and correctly applied; demonstrate superior data analysis skills; deeply mine the data and obtain useful insights for decision making.

Critical evaluation of findings
(0 ≤ score < 6): Blindly accept defective results; or recognize defective results but do not know how to fix them.
(6 ≤ score < 9): Recognize defective results and figure out the causes; understand the main sources of errors.
(9 ≤ score ≤ 10): Show deep understanding of the sources of errors; recognize defective results and eliminate their causes.

Ability to draw proper conclusions and make effective suggestions
(0 ≤ score < 6): Do not draw conclusions, or draw incorrect conclusions; suggestions are not acceptable.
(6 ≤ score < 9): Draw correct conclusions; suggestions may have a potential impact on the future business.
(9 ≤ score ≤ 10): Demonstrate substantial understanding of the problem; conduct deep data analytics using correct methods; draw correct conclusions with sufficient explanation and elaboration.

Total Score

Comments:

