
Quantitative Methods : Some Terms

Dr Kishor Bhanushali
Faculty – Economics & Quantitative Methods
IBS – Ahmedabad
E-Mail : kishorkisu@rediffmail.com
Discrete Data

• A set of data is said to be discrete if the values / observations belonging to it are distinct and separate, i.e. they can be counted (1, 2, 3, ...). Examples might include the number of kittens in a litter; the number of patients in a doctor's surgery; the number of flaws in one metre of cloth; gender (male, female); blood group (O, A, B, AB).
Categorical Data

• A set of data is said to be categorical if the values or observations belonging to it can be sorted according to category. Each value is chosen from a set of non-overlapping categories. For example, shoes in a cupboard can be sorted according to colour: the characteristic 'colour' can have non-overlapping categories 'black', 'brown', 'red' and 'other'. People have the characteristic of 'gender' with categories 'male' and 'female'.
Nominal Data

• A set of data is said to be nominal if the values / observations belonging to it can be assigned a code in the form of a number where the numbers are simply labels. You can count but not order or measure nominal data. For example, in a data set males could be coded as 0, females as 1; marital status of an individual could be coded as Y if married, N if single.
Ordinal Data

A set of data is said to be ordinal if the values / observations belonging to it can be ranked (put in order) or have a rating scale attached. You can count and order, but not measure, ordinal data. The categories for an ordinal set of data have a natural order. For example, suppose a group of people were asked to taste varieties of biscuit and classify each biscuit on a rating scale of 1 to 5, representing strongly dislike, dislike, neutral, like, strongly like. A rating of 5 indicates more enjoyment than a rating of 4, so such data are ordinal.
Interval Scale

• An interval scale is a scale of measurement where the distance between any two adjacent units of measurement (or 'intervals') is the same but the zero point is arbitrary. Scores on an interval scale can be added and subtracted but cannot be meaningfully multiplied or divided. For example, the time interval between the starts of years 1981 and 1982 is the same as that between 1983 and 1984, namely 365 days. The zero point, year 1 AD, is arbitrary; time did not begin then. Other examples of interval scales include the heights of tides and the measurement of longitude.
Continuous Data

• A set of data is said to be continuous if the values / observations belonging to it may take on any value within a finite or infinite interval. You can count, order and measure continuous data. Examples include height, weight, temperature, the amount of sugar in an orange, and the time required to run a mile.
Frequency Table

A frequency table is a way of summarising a set of data. It is a record of how often each value (or set of values) of the variable in question occurs. It may be enhanced by the addition of percentages that fall into each category.
A frequency table is used to summarise categorical, nominal, and ordinal data. It may also be used to summarise continuous data once the data set has been divided up into sensible groups.
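A frequency table like the one described above can be sketched in a few lines of Python; the blood-group data below are hypothetical.

```python
from collections import Counter

# Hypothetical categorical data: blood groups of ten people
observations = ["O", "A", "B", "O", "AB", "O", "A", "O", "B", "A"]

counts = Counter(observations)   # how often each category occurs
total = sum(counts.values())

# Frequency table enhanced with the percentage in each category
for category in sorted(counts):
    n = counts[category]
    print(f"{category:>2}: {n}  ({100 * n / total:.0f}%)")
```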
Mean

• The mean is the average of the scores in the population. Numerically, it equals the sum of the scores divided by the number of scores. It is of interest that the mean is the one value which, if substituted for every score in a population, would yield the same sum as the original scores, and hence it would yield the same mean.
Median

• The median of a population is the point that divides the distribution of scores in half. Numerically, half of the scores in a population will have values that are equal to or larger than the median and half will have values that are equal to or smaller than the median.
Mode

• The mode is the score in the population that occurs most frequently. Please notice that the mode is not the frequency of the most numerous score; it is the value of that score itself.
Range

• The range is the distance between the highest and lowest score. Numerically, the range equals the highest score minus the lowest score.
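The four summary measures above (mean, median, mode, range) can be computed directly with Python's standard `statistics` module; the scores below are a hypothetical population.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical population of scores

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # point dividing the distribution in half
mode = statistics.mode(scores)      # most frequently occurring score
rng = max(scores) - min(scores)     # highest score minus lowest score
```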
Average Deviation

The average deviation is one of several indices of variability that statisticians use to characterize the dispersion among the measures in a given population.
To calculate the average deviation of a set of scores it is first necessary to compute their mean and then specify the distance between each score and that mean without regard to whether the score is above or below the mean. The average deviation is defined as the mean of these absolute values.
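The two-step recipe above (compute the mean, then average the absolute distances from it) might be sketched as follows, using hypothetical scores.

```python
scores = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical scores

mean = sum(scores) / len(scores)             # step 1: compute the mean
distances = [abs(x - mean) for x in scores]  # step 2: distance from the mean, sign ignored
avg_dev = sum(distances) / len(distances)    # mean of the absolute values
```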
Variance

• The variance is one of several indices of variability that statisticians use to characterize the dispersion among the measures in a given population. To calculate the variance of a given population, it is necessary to first calculate the mean of the scores, then measure the amount that each score deviates from the mean and then square that deviation (by multiplying it by itself). Numerically, the variance equals the average of the several squared deviations from the mean.
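A minimal sketch of this definition, checked against the standard library's population-variance routine; the scores are hypothetical.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical population
mean = statistics.mean(scores)

# average of the squared deviations from the mean
variance = sum((x - mean) ** 2 for x in scores) / len(scores)

assert variance == statistics.pvariance(scores)  # agrees with the library routine
```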
Standard Deviation

The standard deviation is one of several indices of variability that statisticians use to characterize the dispersion among the measures in a given population.
To calculate the standard deviation of a population it is first necessary to calculate that population's variance. Numerically, the standard deviation is the square root of the variance. Unlike the variance, which is a somewhat abstract measure of variability, the standard deviation can be readily conceptualized as a distance along the scale of measurement.
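Numerically the standard deviation is just the square root of the variance; a quick check with hypothetical scores:

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical population
variance = statistics.pvariance(scores)   # population variance
sd = math.sqrt(variance)                  # standard deviation = square root of variance

assert sd == statistics.pstdev(scores)    # agrees with the library routine
```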
Random Sample

• In statistical terms a random sample is a set of items that have been drawn from a population in such a way that each time an item was selected, every item in the population had an equal opportunity to appear in the sample.
Parameter & Statistics

• A parameter is a measurement on a population that characterizes one of its features.
• A statistic is a measure on the items in a random sample. Since the only reason to draw a random sample is to infer something about the population from which it came, when we calculate a given statistic we do so in order to estimate a corresponding parameter of the population from which the sample was drawn.
Sampling Distribution

• The Sampling Distribution of a statistic is the set of values that we would obtain if we drew an infinite number of random samples from a given population and calculated the statistic on each sample. In doing so, all samples must be of the same size (n).
Central Limit Theorem

The Central Limit Theorem consists of three statements:
[1] The mean of the sampling distribution of means is equal to the mean of the population from which the samples were drawn.
[2] The variance of the sampling distribution of means is equal to the variance of the population from which the samples were drawn divided by the size of the samples.
[3] If the original population is distributed normally (i.e. it is bell shaped), the sampling distribution of means will also be normal. If the original population is not normally distributed, the sampling distribution of means will increasingly approximate a normal distribution as sample size increases (i.e. when increasingly large samples are drawn).
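Statements [1] and [2] can be illustrated by simulation. The sketch below draws repeated samples from a clearly non-normal population (a fair die); the seed, sample size, and number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(0)
die = range(1, 7)                    # non-normal population: faces of a fair die
pop_mean = statistics.mean(die)      # 3.5
pop_var = statistics.pvariance(die)  # 35/12

n = 50                               # size of each random sample
means = [statistics.mean(random.choices(die, k=n)) for _ in range(2000)]

# [1] mean of the sampling distribution is close to the population mean
# [2] its variance is close to the population variance divided by n
print(statistics.mean(means), pop_mean)
print(statistics.pvariance(means), pop_var / n)
```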
Standard Error

The Standard Error, or Standard Error of the Mean, is an estimate of the standard deviation of the sampling distribution of means, based on the data from one or more random samples.
Numerically, it is equal to the square root of the quantity obtained when s² is divided by the size of the sample: SE = √(s²/n) = s/√n.
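A quick sketch of SE = √(s²/n) with a hypothetical sample:

```python
import math
import statistics

sample = [12, 15, 11, 14, 13, 16, 10, 15]   # hypothetical random sample
s = statistics.stdev(sample)                 # sample standard deviation s
n = len(sample)

se = math.sqrt(s ** 2 / n)                   # equivalently s / sqrt(n)
```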
Outcome

An outcome is the result of an experiment or other situation involving uncertainty.
The set of all possible outcomes of a probability experiment is called a sample space.
Sample Space

• The sample space is an exhaustive list of all the possible outcomes of an experiment. Each possible result of such a study is represented by one and only one point in the sample space.
Event

An event is any collection of outcomes of an experiment. Formally, any subset of the sample space is an event.
Any event which consists of a single outcome in the sample space is called an elementary or simple event. Events which consist of more than one outcome are called compound events.
Probability

• A probability provides a quantitative description of the likely occurrence of a particular event. Probability is conventionally expressed on a scale from 0 to 1; a rare event has a probability close to 0, a very common event has a probability close to 1.
• The probability of an event has been defined as its long-run relative frequency.
Subjective Probability

A subjective probability describes an individual's personal judgement about how likely a particular event is to occur. It is not based on any precise computation but is often a reasonable assessment by a knowledgeable person.
Like all probabilities, a subjective probability is conventionally expressed on a scale from 0 to 1; a rare event has a subjective probability close to 0, a very common event has a subjective probability close to 1.
A person's subjective probability of an event describes his/her degree of belief in the event.
Independent Event

• Two events are independent if the occurrence of one of the events gives us no information about whether or not the other event will occur; that is, the events have no influence on each other.
Mutually Exclusive Events

Two events are mutually exclusive (or disjoint) if it is impossible for them to occur together.
Formally, two events A and B are mutually exclusive if and only if P(A ∩ B) = 0.
If two events (each with non-zero probability) are mutually exclusive, they cannot be independent, and vice versa.
Addition Rule

The addition rule is a result used to determine the probability that event A or event B occurs, or both occur.
The result is often written as follows, using set notation:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
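The addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B) can be checked on a small hypothetical example, a single roll of a fair die:

```python
# Sample space of one die roll, with two events:
# A = "roll is even", B = "roll is greater than 3"
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {4, 5, 6}

def p(event):
    """Probability of an event under equally likely outcomes."""
    return len(event) / len(sample_space)

# P(A or B or both) = P(A) + P(B) - P(A and B)
assert abs(p(A | B) - (p(A) + p(B) - p(A & B))) < 1e-12
```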
Multiplication Rule

The multiplication rule is a result used to determine the probability that two events, A and B, both occur.
The multiplication rule follows from the definition of conditional probability.
The result is often written as follows, using set notation:
P(A ∩ B) = P(A | B) · P(B)
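Continuing the single-die illustration (a hypothetical example), the multiplication rule P(A ∩ B) = P(A | B) · P(B) checks out numerically:

```python
sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                        # "roll is even"
B = {4, 5, 6}                        # "roll is greater than 3"

def p(event):
    """Probability of an event under equally likely outcomes."""
    return len(event) / len(sample_space)

p_a_given_b = p(A & B) / p(B)        # conditional probability P(A | B)

# P(A and B) = P(A | B) * P(B)
assert abs(p(A & B) - p_a_given_b * p(B)) < 1e-12
```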
Conditional Probability

• In many situations, once more information becomes available, we are able to revise our estimates for the probability of further outcomes or events happening.
• The usual notation for "event A occurs given that event B has occurred" is "A | B" (A given B). The symbol | is a vertical line and does not imply division. P(A | B) denotes the probability that event A will occur given that event B has occurred already.
• A rule that can be used to determine a conditional probability from unconditional probabilities is
P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0.
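A tiny numeric sketch of P(A | B) = P(A ∩ B) / P(B); the probabilities below are made up for illustration.

```python
# Hypothetical unconditional probabilities
p_b = 0.5          # P(B)
p_a_and_b = 0.2    # P(A and B)

# Conditional probability of A given that B has occurred
p_a_given_b = p_a_and_b / p_b
```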
Bayes’ Theorem

Bayes' Theorem is a result that allows new information to be used to update the conditional probability of an event.
Using the multiplication rule gives Bayes' Theorem:

P(A | B) = [P(B | A) · P(A)] / [P(B | A) · P(A) + P(B | A') · P(A')]

Where: P(A) = probability that event A occurs
P(B) = probability that event B occurs
P(A') = probability that event A does not occur
P(A | B) = probability that event A occurs given that event B has occurred already
P(B | A) = probability that event B occurs given that event A has occurred already
P(B | A') = probability that event B occurs given that event A has not occurred already
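A sketch of the theorem with hypothetical screening-test numbers (A = "has the condition", B = "test is positive"; the prevalence, sensitivity, and false-positive rate are invented for illustration):

```python
p_a = 0.01              # P(A): prevalence of the condition (hypothetical)
p_b_given_a = 0.95      # P(B | A): positive test given the condition (hypothetical)
p_b_given_not_a = 0.05  # P(B | A'): false-positive rate (hypothetical)

p_not_a = 1 - p_a       # P(A')

# Denominator: P(B) = P(B | A)P(A) + P(B | A')P(A')
p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a

# Bayes' Theorem: updated probability of A given the positive test
p_a_given_b = p_b_given_a * p_a / p_b
```

Note how the updated probability stays well below the test's sensitivity because the condition is rare, which is exactly the kind of revision Bayes' Theorem formalises.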
Probability Distribution

• The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is also sometimes called the probability function or the probability mass function.
Probability Density Function

• The probability density function of a continuous random variable is a function which can be integrated to obtain the probability that the random variable takes a value in a given interval.
Discrete Random Variable

• A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, ... Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, and the number of defective light bulbs in a box of ten.
Continuous Random Variable

• A continuous random variable is one which takes an infinite number of possible values. Continuous random variables are usually measurements. Examples include height, weight, the amount of sugar in an orange, and the time required to run a mile.
Normal Distribution

• Normal distributions model (some) continuous random variables. Strictly, a Normal random variable should be capable of assuming any value on the real line.
• This probability density function (p.d.f.) is a symmetrical, bell-shaped curve, centred at its expected value µ, with its spread determined by the variance σ².
• Many distributions arising in practice can be approximated by a Normal distribution. Other random variables may be transformed to normality.
Poisson Distribution

• Poisson distributions model (some) discrete random variables. Typically, a Poisson random variable is a count of the number of events that occur in a certain time interval or spatial area. For example, the number of cars passing a fixed point in a 5 minute interval, or the number of calls received by a switchboard during a given period of time.
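The Poisson probability mass function is easy to sketch from its definition P(X = k) = e^(−λ) λ^k / k!; the rate of 2 cars per 5-minute interval below is a made-up figure.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Probability of seeing exactly 3 cars when 2 are expected per interval
p3 = poisson_pmf(3, 2.0)
```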
Binomial Distribution

Binomial distributions model (some) discrete random variables.
Typically, a binomial random variable is the number of successes in a series of trials, for example, the number of 'heads' occurring when a coin is tossed 50 times.
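The coin example can be sketched directly from the binomial formula P(k) = C(n, k) p^k (1−p)^(n−k):

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 25 heads in 50 tosses of a fair coin
p25 = binomial_pmf(25, 50, 0.5)
```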
Confidence Interval

A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
If independent samples are taken repeatedly from the same population, and a confidence interval calculated for each sample, then a certain percentage (confidence level) of the intervals will include the unknown population parameter.
Confidence intervals are usually calculated so that this percentage is 95%, but we can produce 90%, 99%, 99.9% (or whatever) confidence intervals for the unknown parameter.
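A 95% interval for a population mean can be sketched as mean ± z · SE. The sample below is hypothetical, and the normal z-value of 1.96 is used for simplicity (a t-value would be more appropriate for such a small sample).

```python
import math
import statistics

sample = [12, 15, 11, 14, 13, 16, 10, 15]               # hypothetical sample data
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error of the mean

z = 1.96                                     # z-value for a 95% confidence level
lower, upper = mean - z * se, mean + z * se  # 95% confidence limits
```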
Confidence Limit

Confidence limits are the lower and upper boundaries / values of a confidence interval, that is, the values which define the range of a confidence interval.
The upper and lower bounds of a 95% confidence interval are the 95% confidence limits. These limits may be taken for other confidence levels, for example, 90%, 99%, 99.9%.
Confidence Level

The confidence level is the probability value associated with a confidence interval.
It is often expressed as a percentage. For example, if the significance level is 0.05, then the confidence level is equal to (1 − 0.05) = 0.95, i.e. a 95% confidence level.
Example
Suppose an opinion poll predicted that, if the election were held today, the Conservative party would win 60% of the vote. The pollster might attach a 95% confidence level to the interval 60% plus or minus 3%. That is, he thinks it very likely that the Conservative party would get between 57% and 63% of the total vote.
