
Descriptive Statistics Summary (Session 1-5)

Statistics is a science that helps us make better decisions in business and economics as well as in
other fields.
Statistics teaches us how to summarize, analyze, and draw meaningful inferences from data that
then lead to improved decisions.
Types of Data - Two Types
Qualitative (categorical or nominal) and Quantitative (measurable or countable).
• Nominal Scale - groups or classes
✓ Gender, color, professional classification, etc.
• Ordinal Scale - order matters
✓ Ranks (top ten videos, products, etc.)
• Interval Scale - difference or distance matters
✓ Temperatures (°F, °C)
• Ratio Scale - Ratio matters – “True Zero Point”
✓ Salaries, weight, volume, area, length, etc.
Population
Collection of all the items or individuals about which you want to draw a conclusion.
Sample
A portion of a population selected for analysis.
Parameter
A numerical measure that describes a characteristic of a population.
Statistic
A numerical measure that describes a characteristic of a sample.
Measures of Location

Population Mean: $\mu = \sum x_i / N$
Sample Mean: $\bar{x} = \sum x_i / n$
Weighted Mean: $\bar{x} = \sum w_i x_i / \sum w_i$
Geometric Mean: $\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n}$
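The means listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the sample values are assumptions, not part of the notes, and the geometric mean is computed through logarithms (which requires all values to be positive).

```python
import math

def arithmetic_mean(xs):
    # Sum of the observations divided by their count.
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    # Sum of w_i * x_i divided by the sum of the weights.
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

def geometric_mean(xs):
    # n-th root of the product, computed via logs (all x must be > 0).
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

values = [2.0, 8.0]                     # hypothetical data
print(arithmetic_mean(values))          # 5.0
print(weighted_mean(values, [3, 1]))    # 3.5
print(geometric_mean(values))           # approximately 4
```

Note that the geometric mean (about 4) is smaller than the arithmetic mean (5) for the same two values.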
Median: Middle value, middlemost or most central item

• Arrange the n observations in increasing order.
• If n is odd, the median is the ((n+1)/2)th observation.
• If n is even, the median is the average of the (n/2)th and (n/2 + 1)th observations.

Mode: The most frequent value, i.e. the value that is repeated most often.
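As a minimal sketch of the median and mode rules above (function names and data are illustrative assumptions), note that Python lists are 0-based, so the 1-based positions in the rules shift by one. When several values tie for the highest frequency, this sketch returns all of them.

```python
from collections import Counter

def median(xs):
    ordered = sorted(xs)              # arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n+1)/2)th observation, 1-based
    # average of the (n/2)th and (n/2 + 1)th observations, 1-based
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(xs):
    counts = Counter(xs)
    top = max(counts.values())
    # every value that reaches the highest frequency
    return sorted(v for v, c in counts.items() if c == top)

print(median([5, 1, 3]))        # 3
print(median([4, 1, 3, 2]))     # 2.5
print(modes([1, 2, 2, 3, 3]))   # [2, 3]
```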


Percentiles: To compute the pth percentile, find the data point at position (n + 1)p/100 in the ordered data.

⚫ Quartiles are the percentage points that break down the ordered data set into quarters.
⚫ The first quartile is the 25th percentile. It is the point below which 1/4 of the data lie.
⚫ The second quartile is the 50th percentile. It is the point below which 1/2 of the data lie. This is
also called the median.
⚫ The third quartile is the 75th percentile. It is the point below which 3/4 of the data lie.
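The position rule (n + 1)p/100 can be sketched as follows. The notes do not say what to do when the position is not a whole number; this sketch assumes linear interpolation between the two neighbouring observations and clamps positions that fall outside 1..n. The function name and data are illustrative.

```python
def percentile(xs, p):
    ordered = sorted(xs)
    n = len(ordered)
    pos = (n + 1) * p / 100           # 1-based position in the ordered data
    pos = min(max(pos, 1), n)         # assumption: clamp to the valid range
    lower = int(pos)                  # whole part of the position
    frac = pos - lower                # fractional part of the position
    if frac == 0:
        return ordered[lower - 1]
    # assumption: interpolate linearly between neighbouring observations
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

data = [1, 2, 3, 4, 5, 6, 7]          # hypothetical data, n = 7
print(percentile(data, 25))           # 2  (first quartile, position 2)
print(percentile(data, 50))           # 4  (median, position 4)
print(percentile(data, 75))           # 6  (third quartile, position 6)
print(percentile([10, 20, 30, 40], 50))   # 25.0 (position 2.5)
```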

Measures of dispersion:
Range: The difference between the highest and the lowest observed values.
Special fractiles: Deciles, percentiles and quartiles.
Interquartile range: Q3 - Q1
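A short sketch of the range and interquartile range, assuming Python's standard `statistics.quantiles` with `method="exclusive"`, which places quartiles at positions based on n + 1. The helper names and data are illustrative.

```python
import statistics

def value_range(xs):
    # highest observed value minus lowest observed value
    return max(xs) - min(xs)

def interquartile_range(xs):
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7]          # hypothetical data
print(value_range(data))              # 6
print(interquartile_range(data))      # 4.0
```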

Variance: Measures the variability in the data from the mean.


Standard Deviation: It is the positive square root of variance.

Population Variance: $\sigma^2 = \sum (x_i - \mu)^2 / N$

Sample Variance: $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$
The sample variance $s^2$ is an unbiased estimate of the population variance.
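The only difference between the two variance formulas is the divisor (N versus n - 1), as this sketch shows. The function names and data are illustrative assumptions.

```python
import math

def population_variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)          # divide by N

def sample_variance(xs):
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)  # divide by n - 1

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical data, mean = 5
print(population_variance(data))              # 4.0
print(math.sqrt(population_variance(data)))   # 2.0 (standard deviation)
print(sample_variance(data))                  # 32/7, slightly larger than 4
```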

• Coefficient of variation relates the SD and the mean by expressing the SD as a
percentage of mean.
• Coefficient of variation = $(\sigma / \mu)(100)\%$
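A minimal sketch of the coefficient of variation, assuming the population standard deviation is intended (for sample data, `statistics.stdev` could be substituted). The function name and data are illustrative.

```python
import statistics

def coefficient_of_variation(xs):
    sd = statistics.pstdev(xs)        # population standard deviation
    mean = statistics.mean(xs)
    return 100 * sd / mean            # SD as a percentage of the mean

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical data: mean 5, SD 2
print(coefficient_of_variation(data)) # 40.0
```

Because it is unit-free, the coefficient of variation allows the spread of data sets with different units or very different means to be compared.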

Skewness and Kurtosis

Skewness

Measure of the degree of asymmetry of a frequency distribution

• Skewed to left or negatively skewed


• Symmetric or unskewed
• Skewed to right or positively skewed

Kurtosis

Measure of flatness or peakedness of a frequency distribution


• Platykurtic (relatively flat)
• Mesokurtic (normal)
• Leptokurtic (relatively peaked)
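The notes give no formula for skewness or kurtosis. One common moment-based definition (an assumption here, not taken from the notes) divides the third or fourth central moment by the appropriate power of the variance. Under that definition, skewness is 0 for a symmetric data set, and kurtosis is about 3 for a normal (mesokurtic) distribution, lower for platykurtic and higher for leptokurtic ones.

```python
def central_moment(xs, k):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    # third central moment scaled by variance^(3/2)
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    # fourth central moment scaled by variance^2 (normal distribution: 3)
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

symmetric = [1, 2, 3, 4, 5]           # hypothetical data
right_skewed = [1, 1, 1, 2, 10]       # long tail to the right
left_skewed = [-10, -2, -1, -1, -1]   # long tail to the left

print(skewness(symmetric))            # 0.0
print(skewness(right_skewed) > 0)     # True
print(skewness(left_skewed) < 0)      # True
print(kurtosis(symmetric))            # below 3, i.e. relatively flat
```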

Methods of Displaying Data

Bar Graphs

• Heights of rectangles represent group frequencies

Histograms

• Histogram consists of a series of rectangles whose widths are defined by the limits of the
classes, and whose heights are determined by the frequency in each interval.
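The class frequencies behind a histogram can be sketched as below, assuming equal-width classes of the form [lower, lower + width). The function name, class limits, and data are illustrative assumptions.

```python
def histogram_counts(xs, lower, width, bins):
    counts = [0] * bins
    for x in xs:
        idx = int((x - lower) // width)   # index of the class containing x
        if 0 <= idx < bins:               # ignore values outside all classes
            counts[idx] += 1
    return counts

data = [1, 2, 2, 3, 5, 6, 6, 6, 9]        # hypothetical data
counts = histogram_counts(data, lower=0, width=5, bins=2)
print(counts)                             # [4, 5]

# crude text rendering: one '#' per observation in each class
for i, c in enumerate(counts):
    print(f"[{i * 5}, {(i + 1) * 5}): " + "#" * c)
```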

Frequency Polygons

• Height of line represents frequency

Ogives

• Height of line represents cumulative frequency
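The heights plotted in an ogive are running totals of the class frequencies. A minimal sketch, with a hypothetical list of class frequencies:

```python
from itertools import accumulate

def cumulative_frequencies(class_counts):
    # running total of the class frequencies, in class order
    return list(accumulate(class_counts))

print(cumulative_frequencies([4, 5, 1]))  # [4, 9, 10]
```

The last cumulative frequency always equals the total number of observations.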

Pie Charts

• Categories represented as percentages of total

Exploratory Data Analysis – EDA

Techniques to determine relationships and trends, identify outliers and influential observations, and
quickly describe or summarize data sets.

Stem-and-Leaf Displays

• Quick way of listing all observations


• Conveys some of the same information as a histogram
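A stem-and-leaf display can be sketched as follows, assuming non-negative integers with the tens digits as stems and the units digit as the leaf. The function name and data are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(xs):
    stems = defaultdict(list)
    for x in sorted(xs):
        stems[x // 10].append(x % 10)     # stem = tens, leaf = units
    return dict(stems)

data = [27, 12, 34, 15, 21, 27]           # hypothetical data
display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# 1 | 2 5
# 2 | 1 7 7
# 3 | 4
```

Every observation is still recoverable from the display, while the row lengths show the shape of the distribution much like a histogram turned on its side.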

Box Plots

• Median
• Lower and upper quartiles
• Maximum and minimum
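The five values a box plot displays can be computed as in this sketch, assuming quartiles from `statistics.quantiles` with `method="exclusive"` (positions based on n + 1). The function name and data are illustrative.

```python
import statistics

def five_number_summary(xs):
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    # minimum, lower quartile, median, upper quartile, maximum
    return (min(xs), q1, q2, q3, max(xs))

data = [1, 2, 3, 4, 5, 6, 7]              # hypothetical data
print(five_number_summary(data))          # (1, 2.0, 4.0, 6.0, 7)
```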

Scatter Plots:

• Scatter Plots are used to identify and report any underlying relationships among pairs of
data sets.
• The plot consists of a scatter of points, each point representing an observation.

Relations between the Mean and Standard Deviation


Chebyshev’s Theorem

• Applies to any distribution, regardless of shape. Places lower limits on the percentages of
observations within a given number of standard deviations from the mean.
• At least $(1 - 1/k^2)$ of the elements of any distribution lie within k standard deviations of the mean (for k > 1)
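Chebyshev's bound can be checked against a data set, as in this sketch. The function names and data are illustrative assumptions; the bound is only informative for k > 1.

```python
import statistics

def chebyshev_lower_bound(k):
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(xs, k):
    # observed share of values within k population SDs of the mean
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return sum(abs(x - mean) <= k * sd for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical data: mean 5, SD 2
print(chebyshev_lower_bound(2))           # 0.75
print(fraction_within(data, 2))           # 1.0, at least the bound of 0.75
```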

Empirical Rule

• Applies only to roughly mound-shaped and symmetric distributions. Specifies
approximate percentages of observations within a given number of standard deviations
from the mean.
• Roughly 68% lie within one standard deviation of the mean.
• Roughly 95% lie within two standard deviations of the mean.
• Nearly all (about 99.7%) lie within three standard deviations of the mean.
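The empirical-rule percentages can be reproduced under the assumption that "mound-shaped and symmetric" means exactly normal. In that case the share of observations within k standard deviations of the mean is erf(k / sqrt(2)). The function name is illustrative.

```python
import math

def normal_fraction_within(k):
    # P(|Z| <= k) for a standard normal variable Z
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_fraction_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

These values are well above Chebyshev's lower limits for k = 2 and k = 3 (0.75 and about 0.889), since Chebyshev's theorem must hold for any distribution while the empirical rule assumes a specific shape.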
