
Statistics

• MACHINE LEARNING – exciting!

• STATISTICS - boring
What Is Statistics?

Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data.
EXTRACTING INFORMATION
FROM DATA
• Data Analysis

Predictions Model
Population and Sample
• Population :- The collection of all individuals or items under consideration in a statistical study.

• Sample :- The part of the population from which information is collected.
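
As a small illustration of the two definitions, the sketch below treats a list of numbers as the population and draws a random sample from it. The population values, the sample size, and the seed are all made up for the example.

```python
import random
import statistics

# Hypothetical population: every item under consideration.
population = list(range(1, 1001))

# Sample: the part of the population we actually collect information from.
rng = random.Random(42)                 # fixed seed so the draw is repeatable
sample = rng.sample(population, k=50)   # 50 distinct items, chosen at random

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print("population mean:", population_mean)
print("sample mean:", sample_mean)
```

The sample mean will generally differ from the population mean; quantifying how far off it is likely to be is the job of inferential statistics.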
Descriptive and Inferential Statistics

• Descriptive Statistics :- Consists of methods for organizing and summarizing information.

• Inferential Statistics :- Consists of methods for drawing conclusions about a population, and measuring the reliability of those conclusions, based on information obtained from a sample of the population.
Data
Types

• Numeric Data:-
a) Discrete
b) Continuous

• Categorical Data:-
a) Nominal
b) Ordinal
The Central Tendencies
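• Mean :- The arithmetic average: the sum of the values divided by the number of values.

• Median :- Arrange the data in increasing order and take the midpoint. If n is odd, it is the value at position (n+1)/2; if n is even, it is the average of the values at positions n/2 and n/2 + 1.

• Mode :- The most frequently occurring data point.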
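
The three measures can be checked on a small data set with Python's standard `statistics` module. This is a minimal sketch; the numbers are invented for illustration.

```python
import statistics

# Hypothetical data set with an even number of values (n = 6).
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the values divided by the count
median = statistics.median(data)  # n is even, so the two middle values (3 and 5) are averaged
mode = statistics.mode(data)      # the most frequent value

print(mean, median, mode)
```

Here the mean is 5, the median is 4, and the mode is 3, so the three measures need not agree.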


Measures of variation
• Range :- Max - Min
• Quartiles
a) Lower quartile (Q1)
b) Middle quartile (Q2, the median)
c) Upper quartile (Q3)
d) Interquartile range (IQR = Q3 - Q1)
• Variance :- The average of the squared differences from the mean.

• Standard deviation :- The square root of the variance; it measures the dispersion of a set of data around its mean.
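
A minimal sketch of these measures using the standard `statistics` module, on made-up data. It uses the population variance (`pvariance`), which matches the "average of the squared differences" definition above. Quartile conventions differ between tools; `statistics.quantiles` is used here with its default method.

```python
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)

# Three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

variance = statistics.pvariance(data)  # mean of squared differences from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print("range:", data_range)
print("quartiles:", q1, q2, q3, "IQR:", iqr)
print("variance:", variance, "standard deviation:", std_dev)
```

For this data the mean is 5, the variance is 4, and the standard deviation is 2.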
Probability
• It is a branch of mathematics that deals with calculating the
likelihood of a given event's occurrence.
Methods of probabilities
• Classical Method --
Probability can be determined prior to conducting any
experiment.
• Empirical Method -- Probability is determined after conducting the experiment, from observed frequencies:
number of times the event occurred / total number of times the experiment was conducted
• Subjective Method -- Based on feelings, insights,
knowledge
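
To contrast the first two methods, the sketch below computes the probability of rolling a six on a fair die both ways. The die, the number of trials, and the seed are assumptions made for the example.

```python
import random

# Classical method: reason about the sample space before any experiment.
classical = 1 / 6  # one favourable face out of six equally likely faces

# Empirical method: run the experiment many times and count.
rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 60_000
hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
empirical = hits / trials  # times the event occurred / times the experiment ran

print("classical:", classical)
print("empirical:", empirical)
```

With enough trials the empirical estimate settles close to the classical value.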
Probability
• Types of probability
• Joint probability :- P(A AND B)
• Union probability :- P(A OR B)
• Marginal probability :- P(A)
• Conditional probability :- P(A|B) = P(A AND B) / P(B) = P(A) * P(B|A) / P(B)
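
The four types can be computed by hand for a single roll of a fair die. In this sketch the events A ("the roll is even") and B ("the roll is greater than 3") are chosen only for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # sample space: one roll of a fair die
A = {2, 4, 6}                # event A: the roll is even
B = {4, 5, 6}                # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

joint = prob(A & B)            # P(A AND B): outcomes in both events
union = prob(A | B)            # P(A OR B): outcomes in either event
marginal = prob(A)             # P(A)
conditional = joint / prob(B)  # P(A|B) = P(A AND B) / P(B)

print(joint, union, marginal, conditional)
```

The results are 1/3, 2/3, 1/2, and 2/3: knowing that the roll exceeded 3 raises the probability of an even roll from 1/2 to 2/3.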
Bayes Theorem
• It allows you to find reverse probabilities and to revise probabilities as new information arrives:

P(A|B) = P(B|A) * P(A) / P(B)
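
A worked example of revising a probability with Bayes' theorem. The scenario (a screening test) and all three input numbers are hypothetical; exact fractions are used so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical inputs.
p_disease = Fraction(1, 100)             # P(A): prior probability of the condition
p_pos_given_disease = Fraction(95, 100)  # P(B|A): test is positive when the condition is present
p_pos_given_healthy = Fraction(5, 100)   # P(B|not A): false positive rate

# P(B): total probability of a positive test.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
posterior = p_pos_given_disease * p_disease / p_pos

print(posterior, float(posterior))
```

Under these assumptions a positive result raises the probability from 1% to 19/118, or roughly 16%, which is the "reverse" probability P(A|B) recovered from P(B|A).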
Confusion matrix
A confusion matrix is a table that is often used to describe the
performance of a classification model on a set of test data for
which the true values are known.
The confusion matrix itself is relatively simple to understand,
but the related terminology can be confusing.
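
A minimal sketch of building a confusion matrix for a binary classifier by counting (actual, predicted) pairs. The labels below are invented, and the four cell names (true/false positives and negatives) are the standard terminology rather than something defined in these slides.

```python
from collections import Counter

# Hypothetical test data: 1 = positive class, 0 = negative class.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = Counter(zip(actual, predicted))

tp = counts[(1, 1)]  # true positives: actual 1, predicted 1
tn = counts[(0, 0)]  # true negatives: actual 0, predicted 0
fp = counts[(0, 1)]  # false positives: actual 0, predicted 1
fn = counts[(1, 0)]  # false negatives: actual 1, predicted 0

accuracy = (tp + tn) / len(actual)

print("            predicted 0  predicted 1")
print(f"actual 0    {tn:^11}  {fp:^11}")
print(f"actual 1    {fn:^11}  {tp:^11}")
print("accuracy:", accuracy)
```

For these labels the matrix holds 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, giving an accuracy of 0.75.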
DISTRIBUTIONS

• Binomial Distribution
• Geometric Distribution
• Poisson Distribution
• Exponential Distribution
Binomial Distribution
• The number of successes in n Bernoulli trials is called
a binomial random variable.
• Estimating Number of successes in n attempts

E(X) = np, Var(X) = npq, where q = 1 - p
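
The mean and variance formulas can be checked numerically from the binomial probability mass function. This is a sketch with assumed values n = 10 and p = 0.3; `binomial_pmf` is a helper written for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # assumed values for illustration
q = 1 - p

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the distribution.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print("mean:", mean, "expected np =", n * p)
print("variance:", variance, "expected npq =", n * p * q)
```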
Geometric Distribution
• When independent Bernoulli trials are repeated, each
with probability p of success, the number of trials X it
takes to get the first success has a geometric
distribution.
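
As a sketch, the probability that the first success arrives on trial k is (1 - p)^(k - 1) * p: k - 1 failures followed by one success. The value p = 0.2 is assumed for illustration, and `geometric_pmf` is a helper written for this example.

```python
def geometric_pmf(k, p):
    """P(X = k): first success occurs on trial k (k = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p

p = 0.2  # assumed success probability

# Two failures, then a success: 0.8 * 0.8 * 0.2
first_on_third = geometric_pmf(3, p)

# The expected number of trials is 1/p; approximate it with a long partial sum.
approx_mean = sum(k * geometric_pmf(k, p) for k in range(1, 2000))

print("P(X = 3):", first_on_third)
print("mean:", approx_mean, "expected 1/p =", 1 / p)
```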
Poisson Distribution
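• In a Bernoulli process, time is discrete: at each time unit a success occurs with the same probability p, independently of what happens at other time instants.
• The Poisson distribution describes the number of events in a fixed interval of continuous time when events occur independently at a constant average rate λ. It arises as the limit of the Bernoulli process as the time units become very short.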
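
The Poisson distribution is commonly introduced as the limit of this Bernoulli process: split a time interval into n very short units, each with success probability p = λ/n, and count the successes. The sketch below compares the two numerically; λ = 3 and n = 10,000 are assumed values, and both helper functions are written for this example.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) when events occur at an average rate of lam per interval."""
    return lam**k * exp(-lam) / factorial(k)

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

lam = 3.0    # assumed average number of events per interval
n = 10_000   # number of short time units the interval is split into
p = lam / n  # success probability in each short unit

for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))
```

The two columns agree to several decimal places, which is why the Poisson distribution is used for counts of rare events over continuous time.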
Exponential Distribution
• If events occur uniformly at random over time at a rate of λ events per unit time, then the random variable X giving the time to the first event has an exponential distribution.
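
A small sketch of the exponential distribution for an assumed rate of λ = 2 events per unit time. It evaluates the cumulative probability P(X <= t) = 1 - e^(-λt) and checks the mean waiting time 1/λ by simulation with `random.expovariate`; the seed and sample size are arbitrary.

```python
import random
from math import exp

def exponential_cdf(t, lam):
    """P(X <= t): probability that the first event occurs by time t."""
    return 1 - exp(-lam * t)

lam = 2.0  # assumed rate: 2 events per unit time

p_within_half = exponential_cdf(0.5, lam)  # 1 - e^(-1)

# Simulate waiting times and compare their average with 1/lam.
rng = random.Random(1)  # fixed seed so the run is repeatable
waits = [rng.expovariate(lam) for _ in range(100_000)]
simulated_mean = sum(waits) / len(waits)

print("P(X <= 0.5):", p_within_half)
print("simulated mean:", simulated_mean, "expected 1/lam =", 1 / lam)
```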
Normal Distribution
• A normal distribution is a bell-shaped
frequency distribution curve. Most of the data
values in a normal distribution tend to cluster
around the mean.
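EMPIRICAL RULE
68–95–99.7: about 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.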
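
The three percentages can be reproduced from the normal cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library with a standard normal (mean 0, standard deviation 1); the rule holds for any normal distribution once distances are measured in standard deviations.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"within {k} standard deviation(s): {prob:.4f}")
```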
