Statistics
Descriptive Statistics
Inferential Statistics
Few Basic Concepts
Population and Sample
Elementary Unit and Frame
Survey Versus Experiments
Census Taking or Sampling
Statistical Use in Business and Planning
Data
Why Do We Need Data
Different Data Focus
Nature
Source
Units or Types of Measurements
Levels of measurements
Variables
Numeric
Continuous
Discrete
Categorical
Rank
Most people associate the term statistics with numbers or, perhaps, tables and graphs that display
them. This mental image is reinforced daily as people encounter an abundance of numerical
information in newspapers, magazines, and on television screens regarding the prices of bonds and
stocks, performances of businesses and sports teams, rates of unemployment and inflation, poverty
and disease, accidents, crimes, and the weather; the list goes on. As a matter of fact, this widely held
view quite accurately depicts the original concern of the discipline as well as the continuing concern
of one of its modern branches.
1.1 Statistics
Statistical Applications 2
Statistics deals with data. Data is classified facts about a particular class of objects. The facts can be
quantitative or qualitative depending on the situation. Specifically, statistics deals with the collection,
organization, presentation, summarization, description, and analysis of data. Statistics is also
concerned with facilitating wise decision making in the face of uncertainty. The science of Statistics
can, therefore, be viewed as
the application of the scientific method in the analysis of numerical data for the purpose of
making rational decisions in the face of uncertainty and that, therefore, develops and utilizes
techniques for the careful collection, effective presentation, and proper analysis of numerical
information.
Statistics is broadly divided into two distinct groups: Descriptive and Inferential.
Descriptive statistics, as the name reflects, deals with description or summarization of various
features of a data set of a given group using tables, diagrams and numerical measures. It mainly
focuses on the distribution, central tendency and spread of the group. Presenting the data in a
descriptive form is usually the first stage in any statistical analysis, as it helps us to spot any patterns
in the data set. Descriptive statistics is sometimes called primary analysis or deductive statistics.
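The three features named above (distribution, central tendency, and spread) can be illustrated with a minimal sketch using Python's standard library. The height values are hypothetical and serve only as an illustration; the choice of mean, median, range, and standard deviation as summary measures is an assumption, not a prescribed list.

```python
import statistics
from collections import Counter

# Hypothetical heights (cm) of a small group; illustrative values only.
heights_cm = [150, 155, 160, 160, 165, 170, 175]

# Distribution: how often each value occurs.
frequency = Counter(heights_cm)

# Central tendency: typical value of the group.
mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)

# Spread: how far the values are scattered.
height_range = max(heights_cm) - min(heights_cm)
std_dev = statistics.pstdev(heights_cm)  # standard deviation of the whole group

print("frequency:", dict(frequency))
print("mean:", round(mean_height, 2), "median:", median_height)
print("range:", height_range, "std dev:", round(std_dev, 2))
```

Here `pstdev` is used because the summary describes the given group itself rather than estimating anything about a larger one.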
Inferential statistics involves further analysis of data, normally from a smaller group (called a sample), in
order to draw conclusions about the characteristics of a larger group (called the population or universe).
Probability is an integral part of inferential statistics. Inferential statistics is also called secondary analysis or
inductive statistics.
1.2 Few Basic Concepts
Understanding some basic terms is essential to becoming familiar with Statistics. These terms
include:
The set of all possible observations about a specified characteristic of interest is called a population or universe. The
word population can also refer to a collection of entities. If the heights of all 65 students in a class are of
interest, then the population consists of all these students.
The persons or objects possessing the characteristics that interest the statistician are referred to as
elementary units. A complete listing of all elementary units relevant to a statistical investigation is
called a frame. Thus, someone who wanted to learn about the racial composition of a firm's labor
force would quickly identify the individual employees of that firm as the elementary units, but
someone concerned about the amount of credit extended by that firm might view the individual credit
accounts issued by it as the elementary units to be investigated.
New data can be generated either by conducting a survey or by performing an experiment. A survey or
an observational study is the collection of data from elementary units without exercising any
Two types of surveys exist: complete and partial ones. A census, or complete survey, involves
observations about one or more characteristics of interest for every elementary unit that exists. When
the number of elementary units is very large, complete success in observing all of them is likely to
elude the census takers.
In a partial or sample survey, observations about one or more characteristics of interest are
made for only a subset of all existing elementary units. There are good reasons why sample surveys
are often undertaken in place of censuses.
Why Sampling?
The cost of collecting and processing data is obviously lower the fewer the elementary
units that have to be contacted.
Sampling saves time.
Study of the whole population is sometimes physically impossible, as when the number of
elementary units is infinitely large or when some of them are totally inaccessible.
Sampling error is often smaller than the non-sampling error that arises in a census.
A census is senseless whenever it produces information that comes too late.
Sampling can provide more accurate data than a census.
When testing destroys the units examined, sampling is the only practical choice, e.g., testing of tires,
respondents' views about the taste of a drink, etc.
A census needs skilled and trained interviewers.
Collecting information from a huge population is sometimes monotonous and boring, and
can introduce bias.
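The contrast between a census and a sample survey can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical frame of 65 students with made-up heights and assuming simple random sampling; the text does not prescribe any particular sampling method, and the names `frame`, `census_mean`, and `sample_mean` are introduced here for illustration only.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical frame: a complete listing of 65 elementary units (students),
# each with a made-up height in cm.
frame = {f"student_{i:02d}": random.randint(150, 185) for i in range(1, 66)}

# Census: observe the characteristic for every elementary unit in the frame.
census_mean = statistics.mean(frame.values())

# Sample survey: observe only a randomly chosen subset of 10 units.
sampled_ids = random.sample(sorted(frame), k=10)
sample_mean = statistics.mean(frame[unit] for unit in sampled_ids)

print("units observed in census:", len(frame))
print("units observed in sample:", len(sampled_ids))
print("census mean:", round(census_mean, 2))
print("sample mean:", round(sample_mean, 2))
```

The sample mean will generally differ from the census mean; that difference is the sampling error incurred in exchange for observing fewer units.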
As already noted, a sample has an overall advantage over a census in studying a population. When
a sample is used for inference about a population, the question of the validity of the results arises. Statistics can
help answer this question in terms of the level of confidence in the results.
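One common way to attach a level of confidence to a sample result is a confidence interval for the population mean. The sketch below is an assumption-laden illustration, not a method taken from this text: it uses the normal approximation (for small samples a t-based interval is usually preferred), and the function name `mean_confidence_interval` and the sample values are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumes a simple random sample and uses the normal approximation.
    """
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error: sample standard deviation divided by sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # z-value that leaves (1 - confidence) / 2 in each tail.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

# Hypothetical sample of 10 student heights (cm).
sample = [158, 162, 165, 167, 170, 171, 173, 175, 178, 181]

low_95, high_95 = mean_confidence_interval(sample, confidence=0.95)
low_99, high_99 = mean_confidence_interval(sample, confidence=0.99)
print("95% interval:", round(low_95, 1), "to", round(high_95, 1))
print("99% interval:", round(low_99, 1), "to", round(high_99, 1))
```

A higher level of confidence yields a wider interval: being more certain of covering the population mean requires a less precise statement.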
1.3 Data
As statistics deals with data, studying the science of statistics requires a thorough understanding of the
term data. Data are the facts, attributes, observations or characteristics of an object (e.g., income,
occupation, food habits, etc.). Any single observation about a specified characteristic of interest is
called a datum - the basic unit of the statistician's raw material. Any collection of observations about
one or more characteristics of interest, for one or more elementary units, is called a data set. Data can
be thought of as the information needed to help us make a more informed decision in a particular
situation.
1.3.1 Why do we need data?
Ranked data are also categorical in nature, but they have an inherent order or rank (hierarchy)
among their categories.
Example 1.8
Metric data need units of measurement and have continuity, i.e., these data have values that
are continuous over a certain range and are expressed with the help of standard units of measurement.
Example 1.9
Expenditures and income of a community can be examples of metric data. In many cases the
metric data can be converted to rank data, e.g.,
Income expressed as low (up to Tk. 5,000), medium (between Tk. 5,000 & Tk. 10,000) and high
(above Tk. 10,000),
or,
Height (cm) ranked in classes of 150-159, 160-169, 170-179, etc.,
or,
Weight (lb) ranked in classes of 105-115, 116-125, 126-135, etc.
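The conversion of metric data to rank data in Example 1.9 can be sketched as two small functions. The function names are hypothetical, and the treatment of the boundary values (exactly Tk. 5,000 counted as low, exactly Tk. 10,000 counted as medium) is an assumption, since the example does not state where the boundaries fall. The height function assumes whole-centimeter values.

```python
def income_rank(income_tk):
    """Convert a metric income (Tk.) into the rank classes of Example 1.9."""
    if income_tk <= 5000:      # "up to Tk. 5,000"
        return "low"
    if income_tk <= 10000:     # boundary handling is an assumption
        return "medium"
    return "high"              # "above Tk. 10,000"

def height_class(height_cm):
    """Place a whole-centimeter height into a ten-centimeter class such as 160-169."""
    lower = (height_cm // 10) * 10
    return f"{lower}-{lower + 9}"

incomes = [3200, 5000, 7400, 10000, 15500]
print([income_rank(x) for x in incomes])

heights = [152, 169, 170]
print([height_class(h) for h in heights])
```

The conversion runs in one direction only: once an income is recorded as "medium", the original amount cannot be recovered from the rank.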
Levels of Measurements
Measurement is the process of assigning a value to the data, and the rules defining the assignment of an
appropriate value determine the level of measurement (sometimes termed the scale of measurement).
The levels of measurement are distinguished on the basis of ordering or distance properties inherent
in the measurement rules. Knowledge of the rules and the levels of measurement is important
because each statistical technique is appropriate for data measured at certain levels only. Traditionally,
four levels of measurement are identified: Nominal, Ordinal, Interval, and Ratio.
Nominal
Nominal level is the lowest level of measurement. Each value is a distinct category, which serves
merely as a label or name for the category. No ordering or distance properties among categories are
assumed. The real number properties (i.e., addition, subtraction, multiplication, and division) do not
apply to the nominal level of measurement. Categorical data fall under this level of measurement. The
basic property of the nominal level of measurement is that objects in one category
are equal to each other in the aspect being measured, but not to objects in any other category. The logical properties of
equivalence are (a) reflexivity (i.e., every object in one of the categories is equal to itself), (b)
symmetry (i.e., if a=b, then b=a), and (c) transitivity (i.e., if a=b and b=c, then a=c). These three logical
properties are operative among objects within the same category, but not necessarily between
categories.
Example 1.10
Names of continents: Asia, Africa, Australia, Europe, North America, South America,
Classification of a population by religions: Muslim, Hindu, Buddhist, Christian, etc.
Labeling rooms on 1st, 2nd or 3rd floors by numbers in the 100s, 200s, or 300s, respectively.
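The three equivalence properties of the nominal level can be checked by brute force on a small classification. This is a minimal sketch, assuming a hypothetical set of five countries labeled by continent; the helper `same_category` is introduced here for illustration only.

```python
from itertools import product

# Hypothetical nominal classification: each country is labeled with a continent.
continent_of = {
    "Bangladesh": "Asia",
    "Japan": "Asia",
    "India": "Asia",
    "Kenya": "Africa",
    "Brazil": "South America",
}

def same_category(a, b):
    """Two objects are 'equal' at the nominal level if they share a label."""
    return continent_of[a] == continent_of[b]

objects = list(continent_of)

# (a) Reflexivity: every object is equal to itself.
reflexive = all(same_category(a, a) for a in objects)

# (b) Symmetry: if a = b, then b = a.
symmetric = all(same_category(a, b) == same_category(b, a)
                for a, b in product(objects, repeat=2))

# (c) Transitivity: if a = b and b = c, then a = c.
transitive = all(same_category(a, c)
                 for a, b, c in product(objects, repeat=3)
                 if same_category(a, b) and same_category(b, c))

print(reflexive, symmetric, transitive)  # prints: True True True
```

The labels support only this test of sameness. Nothing in the code orders the continents or measures a distance between them, which mirrors the absence of ordering and distance properties at the nominal level.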
quantitative level, whereas nominal and ordinal levels fall under qualitative levels. The formal
properties characterizing each level of measurement are summarized in Table 1.1.
Level
Nominal
Ordinal
Interval
Ratio
1.4 Variables
In simple terms, a variable is something that varies. Strictly, the variable itself does not vary; rather,
its values vary. In other words, a variable is a symbol (e.g., X, Y, H, x, P, etc.) that can assume any
of a prescribed set of values, called the domain of the variable. If the variable can assume only one
value, it is called a constant.
Any one elementary unit may possess one or more characteristics that interest the statistician. An
investigator may, indeed, be interested only in the age of each employee, but it would be just as
possible to observe, in addition, each employee's sex or salary. The characteristics of elementary units
are themselves called variables, presumably because observations about these characteristics are
likely to vary from one elementary unit to the next.
Example 1.14
Age represented by X is a variable. It can assume any value. But the age of a particular person at
a particular point in time is a constant.
Number of customers entering a departmental store each day (X) is an example of a variable.
Three types of variables can be distinguished in statistical applications: Numerical, Categorical, and
Rank.
Numerical
A numerical variable is a variable whose possible values are numbers. A numerical variable, which
can theoretically assume any value between two given values, is called a continuous variable;
otherwise it is called a discrete variable. Observations about a discrete quantitative variable can
assume values only at specific points on a scale of values, with gaps between them. Data that can be
described by a discrete or by a continuous variable are called discrete data or continuous data
respectively. As a rule of thumb, enumeration or counting gives rise to discrete quantitative data, which
differ from each other by clearly defined steps; while measurements give rise to continuous
quantitative data without any gap in between.
Example 1.15
The household size N, which can assume any of the values 1, 2, 3, 4, ..., but cannot be 2.4 or
3.75, is a discrete variable.
The height H of an individual, which can be 169 cm, or 169.6 cm, or 169.567 cm, depending
on the accuracy of measurement, is a continuous variable.
Lengths of 1000 bolts produced in a factory are an example of continuous data.
The number of passengers in a bus can only take integer values; hence, it is a discrete variable.
The number of books on a library shelf is an example of discrete data.
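The distinction drawn in Example 1.15 can be sketched as follows. The function name `is_valid_household_size` is hypothetical, and the midpoint check is only one simple way to show that a continuous variable admits a further value between any two given values.

```python
def is_valid_household_size(value):
    """A household size is a count, so only positive whole numbers are allowed."""
    return value >= 1 and float(value).is_integer()

# Discrete: only isolated points on the scale are possible, with gaps between them.
for candidate in (1, 3, 2.4, 3.75):
    print(candidate, is_valid_household_size(candidate))

# Continuous: between any two heights (cm) another possible height exists.
shorter, taller = 169.5, 169.6
midpoint = (shorter + taller) / 2
print(shorter < midpoint < taller)
```

In practice every recorded height is rounded to the accuracy of the measuring instrument, so continuity is a property of the variable in theory rather than of the recorded numbers.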
Categorical
A categorical variable is a variable whose values are expressed in words as categories rather than
numbers. A categorical variable with only two values is called a dichotomous variable.
Example 1.17
A categorical variable with more than two values is called a polytomous variable.
Example 1.18
Religion wise: Muslim, Hindu, Christian, and Buddhist community in Dhaka city.
Satisfaction. Values: Highly satisfied, Satisfied, Indifferent, Dissatisfied, Highly dissatisfied
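The dichotomous and polytomous labels depend only on how many distinct values a categorical variable has, which can be sketched as below. The function name `categorical_kind` and the yes/no responses are hypothetical. The sketch counts the values actually observed, which is an assumption: the full domain of the variable may contain categories that happen not to appear in the data.

```python
def categorical_kind(values):
    """Classify a categorical variable by its number of distinct observed values."""
    distinct = set(values)
    if not distinct:
        raise ValueError("no observations supplied")
    if len(distinct) == 1:
        return "constant"      # only one value: not really a variable
    if len(distinct) == 2:
        return "dichotomous"
    return "polytomous"

# Hypothetical yes/no responses.
responses = ["Yes", "No", "Yes", "Yes", "No"]

# Satisfaction values from Example 1.18.
satisfaction = ["Highly satisfied", "Satisfied", "Indifferent",
                "Dissatisfied", "Highly dissatisfied"]

print(categorical_kind(responses))
print(categorical_kind(satisfaction))
```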
Rank
Rank variables are variables whose values can be ranked. Rank variables can come from two sources:
naturally ranked variables and numerical variables converted to rank variables.
Example 1.20