Computer Science and Data Analysis Series

Computational Statistics Handbook with MATLAB
Second Edition

Wendy L. Martinez
The Office of Naval Research
Arlington, Virginia, U.S.A.

Angel R. Martinez
Naval Surface Warfare Center
Dahlgren, Virginia, U.S.A.

Chapman & Hall/CRC
Taylor & Francis Group
Boca Raton   London   New York

Chapman & Hall/CRC is an imprint of the
Taylor & Francis Group, an informa business

Table of Contents
Preface to the Second Edition .... xvii
Preface to the First Edition .... xxi

Chapter 1
Introduction
1.1 What Is Computational Statistics? .... 1
1.2 An Overview of the Book .... 2
    Philosophy .... 2
    What Is Covered .... 3
    A Word About Notation .... 5
1.3 MATLAB Code .... 6
    Computational Statistics Toolbox .... 7
    Internet Resources .... 8
1.4 Further Reading .... 9

Chapter 2
Probability Concepts
2.1 Introduction .... 11
2.2 Probability .... 12
    Background .... 12
    Probability .... 14
    Axioms of Probability .... 17
2.3 Conditional Probability and Independence .... 17
    Conditional Probability .... 17
    Independence .... 18
    Bayes' Theorem .... 19
2.4 Expectation .... 21
    Mean and Variance .... 21
    Skewness .... 23
    Kurtosis .... 23
2.5 Common Distributions .... 24
    Binomial .... 24
    Poisson .... 26
    Uniform .... 29
    Normal .... 31
    Exponential .... 34
    Gamma .... 36
    Chi-Square .... 37
    Weibull .... 38
    Beta .... 40
    Student's t Distribution .... 41
    Multivariate Normal .... 44
    Multivariate t Distribution .... 47
2.6 MATLAB Code .... 48
2.7 Further Reading .... 49
Exercises .... 52

Chapter 3
Sampling Concepts
3.1 Introduction .... 55
3.2 Sampling Terminology and Concepts .... 55
    Sample Mean and Sample Variance .... 57
    Sample Moments .... 58
    Covariance .... 60
3.3 Sampling Distributions .... 63
3.4 Parameter Estimation .... 65
    Bias .... 66
    Mean Squared Error .... 66
    Relative Efficiency .... 67
    Standard Error .... 67
    Maximum Likelihood Estimation .... 68
    Method of Moments .... 71
3.5 Empirical Distribution Function .... 72
    Quantiles .... 74
3.6 MATLAB Code .... 77
3.7 Further Reading .... 78
Exercises .... 80

Chapter 4
Generating Random Variables
4.1 Introduction .... 83
4.2 General Techniques for Generating Random Variables .... 83
    Uniform Random Numbers .... 83
    Inverse Transform Method .... 86
    Acceptance-Rejection Method .... 89
4.3 Generating Continuous Random Variables .... 93
    Normal Distribution .... 93
    Exponential Distribution .... 94
    Gamma .... 95
    Chi-Square .... 98
    Beta .... 99
    Multivariate Normal .... 101
    Multivariate Student's t Distribution .... 103
    Generating Variates on a Sphere .... 104
4.4 Generating Discrete Random Variables .... 107
    Binomial .... 107
    Poisson .... 108
    Discrete Uniform .... 111
4.5 MATLAB Code .... 112
4.6 Further Reading .... 113
Exercises .... 115

Chapter 5
Exploratory Data Analysis
5.1 Introduction .... 117
5.2 Exploring Univariate Data .... 119
    Histograms .... 119
    Stem-and-Leaf .... 122
    Quantile-Based Plots - Continuous Distributions .... 124
    Quantile Plots - Discrete Distributions .... 132
    Box Plots .... 138
5.3 Exploring Bivariate and Trivariate Data .... 145
    Scatterplots .... 145
    Surface Plots .... 146
    Contour Plots .... 148
    Bivariate Histogram .... 149
    3-D Scatterplot .... 155
5.4 Exploring Multi-Dimensional Data .... 158
    Scatterplot Matrix .... 158
    Slices and Isosurfaces .... 160
    Glyphs .... 166
    Andrews Curves .... 168
    Parallel Coordinates .... 172
5.5 MATLAB Code .... 179
5.6 Further Reading .... 181
Exercises .... 183

Chapter 6
Finding Structure
6.1 Introduction .... 187
6.2 Projecting Data .... 188
6.3 Principal Component Analysis .... 190
6.4 Projection Pursuit EDA .... 195
    Projection Pursuit Index .... 197
    Finding the Structure .... 198
    Structure Removal .... 199
6.5 Independent Component Analysis .... 204
6.6 Grand Tour .... 211
6.7 Nonlinear Dimensionality Reduction .... 216
    Multidimensional Scaling .... 216
    Isometric Feature Mapping - ISOMAP .... 220
6.8 MATLAB Code .... 224
6.9 Further Reading .... 227
Exercises .... 230

Chapter 7
Monte Carlo Methods for Inferential Statistics
7.1 Introduction .... 233
7.2 Classical Inferential Statistics .... 234
    Hypothesis Testing .... 234
    Confidence Intervals .... 243
7.3 Monte Carlo Methods for Inferential Statistics .... 246
    Basic Monte Carlo Procedure .... 246
    Monte Carlo Hypothesis Testing .... 247
    Monte Carlo Assessment of Hypothesis Testing .... 252
7.4 Bootstrap Methods .... 256
    General Bootstrap Methodology .... 256
    Bootstrap Estimate of Standard Error .... 258
    Bootstrap Estimate of Bias .... 260
    Bootstrap Confidence Intervals .... 262
7.5 MATLAB Code .... 268
7.6 Further Reading .... 269
Exercises .... 271

Chapter 8
Data Partitioning
8.1 Introduction .... 273
8.2 Cross-Validation .... 274
8.3 Jackknife .... 281
8.4 Better Bootstrap Confidence Intervals .... 289
8.5 Jackknife-After-Bootstrap .... 293
8.6 MATLAB Code .... 295
8.7 Further Reading .... 296
Exercises .... 298

Chapter 9
Probability Density Estimation
9.1 Introduction .... 301
9.2 Histograms .... 303
    1-D Histograms .... 303
    Multivariate Histograms .... 309
    Frequency Polygons .... 311
    Averaged Shifted Histograms .... 316
9.3 Kernel Density Estimation .... 322
    Univariate Kernel Estimators .... 322
    Multivariate Kernel Estimators .... 327
9.4 Finite Mixtures .... 329
    Univariate Finite Mixtures .... 331
    Visualizing Finite Mixtures .... 333
    Multivariate Finite Mixtures .... 335
    EM Algorithm for Estimating the Parameters .... 338
    Adaptive Mixtures .... 343
9.5 Generating Random Variables .... 348
9.6 MATLAB Code .... 356
9.7 Further Reading .... 357
Exercises .... 359

Chapter 10
Supervised Learning
10.1 Introduction .... 363
10.2 Bayes Decision Theory .... 365
    Estimating Class-Conditional Probabilities: Parametric Method .... 367
    Estimating Class-Conditional Probabilities: Nonparametric .... 369
    Bayes Decision Rule .... 370
    Likelihood Ratio Approach .... 377
10.3 Evaluating the Classifier .... 380
    Independent Test Sample .... 380
    Cross-Validation .... 382
    Receiver Operating Characteristic (ROC) Curve .... 385
10.4 Classification Trees .... 390
    Growing the Tree .... 394
    Pruning the Tree .... 399
    Choosing the Best Tree .... 403
    Other Tree Methods .... 412
10.5 Combining Classifiers .... 414
    Bagging .... 415
    Boosting .... 417
    Arcing Classifiers .... 420
    Random Forests .... 422
10.6 MATLAB Code .... 423
10.7 Further Reading .... 424
Exercises .... 428

Chapter 11
Unsupervised Learning
11.1 Introduction .... 431
11.2 Measures of Distance .... 432
11.3 Hierarchical Clustering .... 434
11.4 K-Means Clustering .... 442
11.5 Model-Based Clustering .... 445
    Finite Mixture Models and the EM Algorithm .... 446
    Model-Based Agglomerative Clustering .... 450
    Bayesian Information Criterion .... 453
    Model-Based Clustering Procedure .... 453
11.6 Assessing Cluster Results .... 458
    Mojena - Upper Tail Rule .... 458
    Silhouette Statistic .... 459
    Other Methods for Evaluating Clusters .... 462
11.7 MATLAB Code .... 465
11.8 Further Reading .... 466
Exercises .... 469

Chapter 12
Parametric Models
12.1 Introduction .... 471
12.2 Spline Regression Models .... 477
12.3 Logistic Regression .... 482
    Creating the Model .... 482
    Interpreting the Model Parameters .... 487
12.4 Generalized Linear Models .... 488
    Exponential Family Form .... 489
    Generalized Linear Model .... 494
    Model Checking .... 498
12.5 MATLAB Code .... 508
12.6 Further Reading .... 509
Exercises .... 511

Chapter 13
Nonparametric Models
13.1 Introduction .... 513
13.2 Some Smoothing Methods .... 514
    Bin Smoothing .... 515
    Running Mean .... 517
    Running Line .... 518
    Local Polynomial Regression - Loess .... 519
    Robust Loess .... 525
13.3 Kernel Methods .... 528
    Nadaraya-Watson Estimator .... 531
    Local Linear Kernel Estimator .... 532
13.4 Smoothing Splines .... 534
    Natural Cubic Splines .... 536
    Reinsch Method for Finding Smoothing Splines .... 537
    Values for a Cubic Smoothing Spline .... 540
    Weighted Smoothing Spline .... 540
13.5 Nonparametric Regression - Other Details .... 542
    Choosing the Smoothing Parameter .... 542
    Estimation of the Residual Variance .... 547
    Variability of Smooths .... 548
13.6 Regression Trees .... 551
    Growing a Regression Tree .... 553
    Pruning a Regression Tree .... 557
    Selecting a Tree .... 557
13.7 Additive Models .... 563
13.8 MATLAB Code .... 567
13.9 Further Reading .... 570
Exercises .... 573

Chapter 14
Markov Chain Monte Carlo Methods
14.1 Introduction .... 575
14.2 Background .... 576
    Bayesian Inference .... 576
    Monte Carlo Integration .... 577
    Markov Chains .... 579
    Analyzing the Output .... 580
14.3 Metropolis-Hastings Algorithms .... 580
    Metropolis-Hastings Sampler .... 581
    Metropolis Sampler .... 584
    Independence Sampler .... 587
    Autoregressive Generating Density .... 589
14.4 The Gibbs Sampler .... 592
14.5 Convergence Monitoring .... 602
    Gelman and Rubin Method .... 604
    Raftery and Lewis Method .... 607
14.6 MATLAB Code .... 609
14.7 Further Reading .... 610
Exercises .... 612

Chapter 15
Spatial Statistics
15.1 Introduction .... 617
    What Is Spatial Statistics? .... 617
    Types of Spatial Data .... 618
    Spatial Point Patterns .... 619
    Complete Spatial Randomness .... 621
15.2 Visualizing Spatial Point Processes .... 623
15.3 Exploring First-order and Second-order Properties .... 627
    Estimating the Intensity .... 627
    Estimating the Spatial Dependence .... 630
15.4 Modeling Spatial Point Processes .... 638
    Nearest Neighbor Distances .... 638
    K-Function .... 643
15.5 Simulating Spatial Point Processes .... 646
    Homogeneous Poisson Process .... 647
    Binomial Process .... 650
    Poisson Cluster Process .... 651
    Inhibition Process .... 654
    Strauss Process .... 656
15.6 MATLAB Code .... 658
15.7 Further Reading .... 659
Exercises .... 661

Appendix A
Introduction to MATLAB
A.1 What Is MATLAB? .... 663
A.2 Getting Help in MATLAB .... 664
A.3 File and Workspace Management .... 664
A.4 Punctuation in MATLAB .... 666
A.5 Arithmetic Operators .... 666
A.6 Data Constructs in MATLAB .... 668
    Basic Data Constructs .... 668
    Building Arrays .... 668
    Cell Arrays .... 669
A.7 Script Files and Functions .... 670
A.8 Control Flow .... 672
    For Loop .... 672
    While Loop .... 672
    If-Else Statements .... 673
    Switch Statement .... 673
A.9 Simple Plotting .... 673
A.10 Contact Information .... 676

Appendix B
Projection Pursuit Indexes
B.1 Indexes .... 677
    Friedman-Tukey Index .... 677
    Entropy Index .... 678
    Moment Index .... 678
    L2 Distances .... 679
B.2 MATLAB Source Code .... 680

Appendix C
MATLAB Statistics Toolbox
File I/O .... 687
Dataset Arrays .... 687
Grouped Data .... 687
Descriptive Statistics .... 688
Statistical Visualization .... 688
Probability Density Functions .... 689
Cumulative Distribution Functions .... 690
Inverse Cumulative Distribution Functions .... 691
Distribution Statistics Functions .... 691
Distribution Fitting Functions .... 692
Negative Log-Likelihood Functions .... 692
Random Number Generators .... 693
Hypothesis Tests .... 694
Analysis of Variance .... 694
Regression Analysis .... 694
Multivariate Methods .... 695
Cluster Analysis .... 696
Classification .... 696
Markov Models .... 696
Design of Experiments .... 697
Statistical Process Control .... 697
Graphical User Interfaces .... 697

Appendix D
Computational Statistics Toolbox
Probability Distributions .... 699
Statistics .... 699
Random Number Generation .... 700
Exploratory Data Analysis .... 700
Bootstrap and Jackknife .... 701
Probability Density Estimation .... 701
Supervised Learning .... 701
Unsupervised Learning .... 701
Parametric and Nonparametric Models .... 702
Markov Chain Monte Carlo .... 702
Spatial Statistics .... 702

Appendix E
Exploratory Data Analysis Toolboxes
E.1 Introduction .... 703
E.2 Exploratory Data Analysis Toolbox .... 704
E.3 EDA GUI Toolbox .... 705

Appendix F
Data Sets
Introduction .... 719

Appendix G
Notation
Overview .... 727
Observed Data .... 727
Greek Letters .... 728
Functions and Distributions .... 728
Matrix Notation .... 729
Statistics .... 729

References .... 731
Author Index .... 751
Subject Index .... 757
