
Glossary
Bimodal distribution: a non-normal distribution consisting of two modes; two
scores (or two ranges of scores) that share the greatest frequency; characterized by
two peaks in a histogram.

Central limit theorem: A fundamental component of inferential statistics, according
to which: (a) the mean of a sampling distribution will approximate the mean of the
population; (b) the standard deviation of a sampling distribution will vary as a function
of population variance and sample size; and (c) the shape of the sampling distribution
will be normal given sufficient sample size (N > 30).
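
A minimal simulation sketch of these three properties (not from the original glossary; assumes NumPy, and the exponential population, sample size, and number of samples are illustrative choices):

```python
# Illustrative simulation of the central limit theorem (assumed setup, not course code).
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # a skewed, non-normal population

sample_size = 30                                        # "sufficient sample size" per (c)
num_samples = 10_000
samples = rng.choice(population, size=(num_samples, sample_size))
sample_means = samples.mean(axis=1)                     # the sampling distribution of the mean

print(population.mean(), sample_means.mean())                        # (a) the means approximately agree
print(population.std() / np.sqrt(sample_size), sample_means.std())   # (b) spread depends on variance and N
# (c) a histogram of sample_means would look approximately normal
```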

Classical test theory (also known as true score theory): A theory of measurement
that assumes that, for each subject, an observed score (raw score) consists of a true
score and error, which may be due to bias and/or chance.

Confidence interval: In contrast to a point estimate, a confidence interval consists of
a range of values, taking into account sampling error and the degree of confidence
desired.
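
A minimal sketch of a 95% confidence interval around a sample mean (made-up scores; assumes NumPy and, for simplicity, the normal critical value 1.96 rather than a t critical value):

```python
# Illustrative 95% confidence interval for a mean (made-up data).
import numpy as np

scores = np.array([4.1, 5.3, 6.0, 4.8, 5.5, 6.2, 4.9, 5.1, 5.8, 5.0])
mean = scores.mean()
se = scores.std(ddof=1) / np.sqrt(len(scores))      # estimated standard error of the mean

lower, upper = mean - 1.96 * se, mean + 1.96 * se   # range of values around the point estimate
print(f"point estimate = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```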

Confounding variable: an unknown, unmeasured, or extraneous variable that is
correlated with both the independent variable and the dependent variable in
experimental research, thus compromising causal arguments.
Correlation (general): refers to any of a broad class of statistical relationships
involving dependence between two variables.
Correlation (Pearson product-moment, r): a measure of the relationship between two
continuous variables.
Correlation (Phi coefficient): a measure of the relationship between two categorical
variables.
Correlation (Point-biserial): a measure of the relationship between one continuous
variable and one categorical variable.
Correlation (Spearman rank order): a measure of the relationship between two
ranked (ordinal) variables.
Construct: an idealized object of investigation that is not directly observable.
Covariance: a measure quantifying the degree to which two variables vary together.
Also, unstandardized correlation.
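
A minimal sketch of covariance as unstandardized correlation, using made-up values for two continuous variables (assumes NumPy):

```python
# Illustrative covariance and Pearson's r on two made-up continuous variables.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.5, 3.0, 4.5, 7.0, 8.5])

cov_xy = np.cov(x, y, ddof=1)[0, 1]              # covariance: how much X and Y vary together
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))     # standardizing the covariance yields Pearson's r

print(cov_xy, r)
print(np.corrcoef(x, y)[0, 1])                   # the same r, computed directly
```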
Dependent variable: a variable that represents an aspect of the world that the
experimenter predicts will be affected by the independent variable.
Descriptive statistics: procedures used to summarize, organize, and simplify data.
Double blind experiment: an experiment in which neither the experimenter nor the
subject knows whether the treatment is experimental or control.
Homoscedasticity: an assumption underlying linear correlation and regression
analysis, according to which the variance of the residuals is constant across all values
of the predictor variable (X).
Independent variable: a variable manipulated by the experimenter.
Inferential statistics: procedures that allow for generalizations about population
parameters based on sample statistics.
Intercept: In a regression analysis, the predicted score on the outcome variable (Y)
when all predictors (X) equal zero. Also known as the regression constant.
Interval variable: a type of variable that is used not only to categorize cases but also
to distinguish them as greater than or less than; in addition, the intervals between
adjacent values are equal across the entire scale.
Leptokurtic distribution: a non-normal distribution with a high kurtosis value;
characterized by one high peak in a histogram.
Linear Regression: A statistical procedure used to estimate the relationship between
an outcome variable (Y) and one or more predictor variables (X). Simple regression
refers to analyses with just one predictor variable whereas multiple regression refers to
analyses with more than one predictor variable.
Mean (M = ΣX/N): A measure of central tendency used to describe the center point
of a distribution and/or sample (also known as the average).
Mean Squares (MS = SS/N): A measure of variability, more commonly referred to
as variance.
Median: A measure of central tendency equivalent to the 50th percentile rank of a
distribution/sample. Often preferred when distributions are skewed because it is more
resistant to extreme scores than the mean.
Mode: A measure of central tendency; the most frequent score in a
distribution/sample.
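
A minimal worked sketch of the mean, mean squares, median, and mode on made-up scores (assumes NumPy and the Python standard library):

```python
# Illustrative central tendency and variability calculations on made-up scores.
import numpy as np
from statistics import mode

x = np.array([2, 3, 3, 4, 5, 5, 5, 7, 9])

n = len(x)
m = x.sum() / n                      # Mean: M = sum of scores / N
ss = ((x - m) ** 2).sum()            # sum of squared deviations from the mean
ms = ss / n                          # Mean Squares: MS = SS / N (the variance)

print(m, ms)                         # mean and variance
print(np.median(x), mode(x))         # median = 5, mode = 5 (the most frequent score)
```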
Negatively skewed distribution: a non-normal distribution consisting of a few or
many extreme scores on the negative end of a scale (typically the left side of an x-
axis).
Nominal variable: the most basic variable type, used to assign cases to categories.
Normal distribution: also known as a Gaussian distribution, or bell curve, due to
greater frequency around the mean and symmetry.
Null Hypothesis Significance Testing (NHST): A form of hypothesis testing that
controls the probability of incorrectly deciding that a default position (null hypothesis)
is incorrect based on how likely it would be for a set of observations (data) to occur
if the null hypothesis were true.
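
A minimal NHST sketch using a one-sample t test (assumes SciPy; the scores and the null-hypothesis value of 5.0 are made up):

```python
# Illustrative null hypothesis significance test (one-sample t test, made-up data).
import numpy as np
from scipy import stats

sample = np.array([5.4, 6.1, 5.8, 6.3, 5.9, 6.0, 5.7, 6.2])
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)   # H0: population mean = 5.0

# If p_value falls below a preset alpha (e.g., .05), reject the null hypothesis;
# otherwise retain it. A small p means the data would be unlikely if H0 were true.
print(t_stat, p_value)
```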
Ordinary Least Squares (OLS): In linear regression analysis, a method for
estimating unknown regression coefficients (or parameters), in which the sum of the
squared residuals is minimized.
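
A minimal OLS sketch on made-up data, recovering the intercept, slope, and residuals defined elsewhere in this glossary (assumes NumPy):

```python
# Illustrative ordinary least squares fit of a simple regression (made-up data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # predictor (X)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])         # outcome (Y)

slope, intercept = np.polyfit(x, y, deg=1)       # least-squares estimates of the coefficients
predicted = intercept + slope * x
residuals = y - predicted                        # prediction errors

print(intercept, slope)
print((residuals ** 2).sum())                    # the sum of squared residuals OLS minimizes
```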
Ordinal variable: a type of variable that is used not only to categorize cases but also
to distinguish them as greater than or less than.
Parameter: a numerical measure that describes a characteristic of a population.
Population: the entire collection of cases to which one attempts to generalize.
Percentile rank: the percentage of scores that fall at or below a given score in a
distribution.
Platykurtic distribution: see uniform distribution.

Point estimate: Any sample statistic that is based on a single sample and represents
just one point in a sampling distribution.

Positively skewed distribution: a non-normal distribution consisting of a few or many
extreme scores on the positive end of a scale (typically the right side of an x-axis).
Quasi-independent variable: a variable that resembles an independent variable but is
not manipulated by the experimenter.
Ratio variable: a type of variable that has all the qualities of an interval variable but
also has a true zero point.
Reliability estimate: a statistic that estimates the consistency of a measurement
(methods include test/retest, parallel tests, and inter-item).
Residual: In a regression analysis, the prediction error, or the difference between
an individual's score on the outcome variable (Y) and the score predicted by the
regression model.
Sample: a subset of the population.

Sampling distribution: A hypothetical distribution of sample statistics, such as a
distribution of sample means. Used to make probability judgments about samples.
Also known as a probability histogram.

Slope: In a regression analysis, it is the predicted change in Y associated with a one-
unit increase in X. Also known as the regression coefficient.
Standard deviation (SD = SQRT(MS)): The square root of the variance; an estimate
of the average deviation of scores in a sample.

Standard error: The estimated amount of sampling error. Also, the standard
deviation of a sampling distribution.

Statistic: a numerical measure that describes a characteristic of a sample.
Sum of cross products (SP = Σ[(X − MX)*(Y − MY)]): used to calculate the
correlation and covariance between two variables, X and Y.
Sum of squares (SS = Σ(X − M)²): The sum of squared deviation scores.
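
A minimal worked sketch of SP and SS computed directly from the definitions above, on made-up values:

```python
# Illustrative sum of cross products (SP) and sum of squares (SS) from made-up values.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 2.5, 4.0, 5.5]

mx = sum(X) / len(X)                                   # MX, the mean of X
my = sum(Y) / len(Y)                                   # MY, the mean of Y

SP = sum((x - mx) * (y - my) for x, y in zip(X, Y))    # sum of cross products
SS = sum((x - mx) ** 2 for x in X)                     # sum of squared deviations in X

print(SP, SS)                                          # dividing each by N gives covariance and variance
```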
Type I error: In NHST, incorrectly rejecting the null hypothesis when it should have
been retained.
Type II error: In NHST, incorrectly retaining the null hypothesis when it should have
been rejected.
Uniform distribution: A non-normal distribution in which frequency is nearly
equivalent across all possible values. Also known as platykurtic.
Validity (construct): the notion that observations or measurement tools actually
represent or measure the construct being investigated.
Validity (content): the notion that the items or devices used to obtain a score on a
measure are representative of the underlying construct.
Validity (convergent): the notion that scores on a measure should correlate with
scores on other measures used to define the same, or similar, constructs.
Validity (divergent): the notion that scores on a measure should not correlate, or
weakly correlate, with scores on measures used to define unrelated constructs.
Validity (nomological): the notion that scores on a measure are consistent with more
general theories, including theories from other disciplines of science.
Variance (MS or SD²): A measure of variability, also known as Mean Squares (MS)
and equal to standard deviation (SD) squared.
Z-scale: A universal metric in statistics used to standardize different scales, such that
for any metric, M=0 and SD=1.
Z-score (Z = (X-M) / SD): A score on a Z-scale.
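
A minimal z-score sketch on made-up scores, showing that standardizing puts any metric on the Z-scale (assumes NumPy):

```python
# Illustrative z-scores: rescaling made-up raw scores so that M = 0 and SD = 1.
import numpy as np

scores = np.array([55.0, 60.0, 65.0, 70.0, 75.0])
m, sd = scores.mean(), scores.std()      # descriptive mean and SD (SD divides by N here)

z = (scores - m) / sd                    # Z = (X - M) / SD
print(z)
print(z.mean(), z.std())                 # approximately 0 and 1
```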
