
Chapter One

Statistics: Mathematical and Nonmathematical Aspects


Contents

Statistics
Descriptive Statistics
Inferential Statistics
Few Basic Concepts
Population and Sample
Elementary Unit and Frame
Survey Versus Experiments
Census Taking or Sampling
Statistical Use in Business and Planning
Data
Why Do We Need Data
Different Data Focus
Nature
Source
Units or Types of Measurements
Levels of measurements
Variables
Numeric
Continuous
Discrete
Categorical
Rank

Most people associate the term statistics with numbers or, perhaps, with the tables and graphs that display
them. This mental image is reinforced daily as people encounter an abundance of numerical
information in newspapers, in magazines, and on television screens regarding the prices of bonds and
stocks, the performance of businesses and sports teams, rates of unemployment and inflation, poverty
and disease, accidents, crimes, and the weather; the list goes on. As a matter of fact, this widely held
view quite accurately depicts the original concern of the discipline as well as the continuing concern
of one of its modern branches.
1.1 Statistics

Statistical Applications 2
Statistics deals with data. Data are classified facts about a particular class of objects. The facts can be
quantitative or qualitative depending on the situation. Specifically, statistics deals with the collection,
organization, presentation, summarization, description, and analysis of data. Statistics is also
concerned with facilitating wise decision making in the face of uncertainty. The science of statistics
can, therefore, be viewed as
the application of the scientific method in the analysis of numerical data for the purpose of
making rational decisions in the face of uncertainty and that, therefore, develops and utilizes
techniques for the careful collection, effective presentation, and proper analysis of numerical
information.
Statistics is broadly divided into two distinct branches: descriptive and inferential.
Descriptive statistics, as the name suggests, deals with the description or summarization of various
features of a data set for a given group, using tables, diagrams, and numerical measures. It mainly
focuses on the distribution, central tendency, and spread of the group. Presenting the data in a
descriptive form is usually the first stage in any statistical analysis, as it helps us to spot any patterns
in the data set. Descriptive statistics is sometimes called primary analysis or deductive statistics.
Inferential statistics involves further analysis of data, normally from a smaller group (called a sample), in
order to draw conclusions about the characteristics of a larger group (called the population or universe).
Probability is an integral part of inferential statistics, which is also called secondary analysis or
inductive statistics.
1.2 Few Basic Concepts in Statistics

Understanding a few basic terms is essential to becoming familiar with statistics. These terms
include:

Population and Sample

The set of all possible observations about a specified characteristic of interest is called a population or universe. The
word population can also refer to a collection of entities. If the heights of all 65 students in a class are of
interest, then the population consists of all these students.

A sample is a representative part of a population. If a sample of 45 households is taken from a
community for study, it should truly reflect the whole community. Even with a very carefully selected
representative sample, there is likely to be some discrepancy between the sample and the population, known as
sampling error.
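
The idea of sampling error can be illustrated with a small sketch. The heights below are randomly generated, hypothetical values (not data from this chapter), and the sample size of 20 is an arbitrary choice; the point is only that a sample mean usually differs somewhat from the population mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of all 65 students in a class.
population = [round(random.gauss(165, 8), 1) for _ in range(65)]

# A simple random sample of 20 students, drawn without replacement.
sample = random.sample(population, 20)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# Sampling error: the gap between the sample estimate and the population value.
sampling_error = sample_mean - population_mean

print(f"Population mean: {population_mean:.2f} cm")
print(f"Sample mean:     {sample_mean:.2f} cm")
print(f"Sampling error:  {sampling_error:+.2f} cm")
```

Drawing a different sample (for example, by changing the seed) would generally give a different sampling error.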

Elementary Unit and Frame

The persons or objects possessing the characteristics that interest the statistician are referred to as
elementary units. A complete listing of all elementary units relevant to a statistical investigation is
called a frame. Thus, someone who wanted to learn about the racial composition of a firm's labor
force would quickly identify the individual employees of that firm as the elementary units, but
someone concerned about the amount of credit extended by that firm might view the individual credit
accounts issued by it as the elementary units to be investigated.

Survey Versus Experiments

New data can be generated either by conducting a survey or by performing an experiment. A survey or
an observational study is the collection of data from elementary units without exercising any
particular control over factors that may make these units different from one another and that may,
therefore, affect the characteristics of interest being observed.
Example 1.1
A characteristic such as the annual salary of different workers is simply being observed and
recorded without regard to factors, such as education, work experience, or length of service, that
make workers different from one another and that may be responsible for differences in their
salaries.
An experiment involves the collection of data from elementary units, while exercising control over
some or all factors that may make these units different from one another and that may, therefore,
affect the characteristics of interest being observed.
Example 1.2
A firm may divide its 40 new employees into two similar groups of equal size (with the help of
some random device) and then administer an obligatory special training program to one of the
groups only. If the 20 employees who went through the program exhibited superior productivity
later on, the training program might justifiably be credited because other factors that could account
for this result (such as group differences in age, motivation, or prior work experience) were
eliminated by the random division of the original group of 40 newcomers.
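
The random division described in Example 1.2 might be sketched as follows. The employee identifiers and the use of a shuffled list as the "random device" are assumptions made for illustration only.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical identifiers for the 40 new employees.
employees = [f"employee_{i:02d}" for i in range(1, 41)]

# Shuffle a copy so that chance alone decides who lands in which group.
shuffled = employees[:]
random.shuffle(shuffled)

training_group = shuffled[:20]  # receives the special training program
control_group = shuffled[20:]   # does not receive it

print(len(training_group), len(control_group))
```

Because assignment depends only on chance, differences in age, motivation, or prior experience should tend to balance out between the two groups.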
Experimental data tend to provide stronger evidence than survey data. Unfortunately, most data in business and
planning, and in many other fields, are generated not by experiments but by surveys, because it is
often impossible, or extremely costly, to exercise proper experimental controls.

Census taking or Sampling

Two types of surveys exist: complete and partial ones. A census, or complete survey, involves
observations about one or more characteristics of interest for every elementary unit that exists. When
the number of elementary units is very large, complete success in observing all of them is likely to
elude the census takers.
In a partial or sample survey, observations about one or more characteristics of interest are
made for only a subset of all existing elementary units. There are good reasons why sample surveys
are often undertaken in place of censuses.
Why Sampling?
The cost of collecting and processing data is obviously lower the fewer the elementary
units that have to be contacted.
Sampling saves time.
Study of the whole population is sometimes physically impossible, as when the number of
elementary units is infinitely large or when some of them are totally inaccessible.
The sampling error of a sample survey is often smaller than the non-sampling error of a census.
A census is senseless whenever it produces information that comes too late.
Sampling can provide more accurate data than a census.
When testing destroys the item being tested, sampling is the only option, e.g., testing of tires or
asking respondents about the taste of a drink.
A census needs a large number of skilled and trained interviewers.
Collecting information from a huge population is sometimes monotonous and boring, and the
results may be biased.

1.2 Statistical Use in Business and Planning


As our world becomes more complex, both our need for information and the quantity of
information available continue to expand rapidly. Managers and researchers in every field must plan
carefully, so that the quantity and quality of the information they obtain are adequate to meet their needs.
1.2.1 Statistics for Decision-making
In the business world, the concepts, techniques, and results of statistics are indispensable components
of decision making. Statistics presents the decision-maker with relevant facts and, in many cases,
provides an estimate of the probability and/or the monetary consequences of making a wrong
decision. Before bringing a new product to market, a manufacturer wants to arrive at some assessment
of the likely level of demand, and a market research survey may be undertaken.
1.2.2 Statistics for Forecasting
One of the major objectives of managers and researchers is to assemble information of sufficient
quantity and quality to forecast. In the use of both management information systems and
scientific methods, persons trained in statistics can make an important contribution. Essentially,
forecasts of future values are obtained through the discovery of regularities in past behavior.
1.2.3 Statistics for Validity Measurement

As already noted, a sample has overall advantages over a census in studying a population. When a
sample is used for inference about the population, the question of the validity of the results arises. Statistics can
help in answering this question in terms of the level of confidence in the results.
1.3 Data
Because statistics deals with data, studying the science of statistics requires a thorough understanding of the
term data. Data are the facts, attributes, observations, or characteristics of an object (e.g., income,
occupation, food habits, etc.). Any single observation about a specified characteristic of interest is
called a datum, the basic unit of the statistician's raw material. Any collection of observations about
one or more characteristics of interest, for one or more elementary units, is called a data set. Data can
be thought of as the information needed to help us make a more informed decision in a particular
situation.
1.3.1 Why do we need data?

To provide the necessary input to a research study.
To measure performance in an ongoing service or production process.
To assist in formulating alternative courses of action in a decision making process.
To satisfy our curiosity.

1.3.2 Different Focus of Data


Data can be looked at from different angles.
Nature
By nature, data can be divided into two groups:
(i) Qualitative Data
Example 1.3
Sex, taste, letter grades.
(ii) Quantitative Data
Example 1.4
132 cm, 56 kilograms.
Source
Depending on the source, data can be mainly classified as primary or secondary.
Primary data are not readily available and therefore must be collected directly from the field or from
experiments. These are first-hand data. Primary data can be collected through face-to-face
conversation, observation, field surveys (using questionnaires), etc.
Example 1.5
1) Airline ticketing offices and travel agents have up-to-the-minute information regarding space
availability on flights and in hotels.
2) ATMs enable banking transactions to occur instantly, with the information immediately
recorded on account balances.
Secondary data are obtained from available sources, e.g., reports, records, and documents of different
organizations or research studies. These sources are the data compilers.
Example 1.6
Statistical Yearbooks, Journals, National trade data bank, World development report, etc.
Some authors use a further term, tertiary data. When the World Development Report cites data taken from
another source, those data become tertiary.
Units or Types of Measurement
On the basis of units of measurement or types of measurement, data may be classified as Categorical,
Ranked, and Metric.
Categorical data are those in which individual objects are simply placed in the proper category or
group, and the number in each category is counted. No units are required to identify these, e.g., sex,
religion, students' category on the basis of program, employment, etc.
Example 1.7
In a graduate-level business statistics class the students are categorized as male or female. The
corresponding numbers are counted and tabulated as follows:
Male = 16
Female = 11
Total = 27
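
The counting in Example 1.7 can be reproduced with a short sketch. The roster below is a hypothetical list built to match the counts given in the example; only the category of each student is recorded, which is all that categorical data require.

```python
from collections import Counter

# Hypothetical class roster matching the counts in Example 1.7.
sexes = ["Male"] * 16 + ["Female"] * 11

# Place each student in a category and count the members of each category.
counts = Counter(sexes)
total = sum(counts.values())

for category, count in counts.items():
    print(f"{category} = {count}")
print(f"Total = {total}")
```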

Ranked data are also categorical in nature, but they have an inherent order or rank (hierarchy)
among the categories.
Example 1.8

Student grades (e.g., A, B, C, D) have a distinct rank order among them.
A community can be ranked on the basis of income (e.g., low, middle, and high).
Students' education level (e.g., primary, secondary, higher secondary).
Members of a household can be ranked on the basis of their age (e.g., children, adolescents, young, and old).

Metric data need units of measurement and have continuity, i.e., these data have values that
are continuous over a certain range and are expressed in standard units of measurement.
Example 1.9
Expenditures and income of a community are examples of metric data. In many cases
metric data can be converted to ranked data, e.g.,
Income expressed as low (up to Tk. 5,000), medium (between Tk. 5,000 and Tk. 10,000), and high
(above Tk. 10,000),
or,
Height (cm) ranked in classes of 150-159, 160-169, 170-179, etc.,
or,
Weight (lb) ranked in classes of 105-115, 116-125, 126-135, etc.
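
The conversion of metric data to ranked data in Example 1.9 might be sketched as below. The function name `income_class` and the sample incomes are hypothetical, and treating an income of exactly Tk. 10,000 as medium is an assumption, since the example does not say which class a boundary value belongs to.

```python
def income_class(income_tk):
    """Convert a metric income (Tk.) into the ranked classes of Example 1.9."""
    if income_tk <= 5_000:
        return "low"
    if income_tk <= 10_000:  # assumption: exactly Tk. 10,000 counts as medium
        return "medium"
    return "high"

# Hypothetical incomes (Tk.) used only for illustration.
incomes = [3_200, 5_000, 7_500, 10_000, 12_400]
classes = [income_class(x) for x in incomes]
print(classes)
```

Note that the conversion loses information: two households in the same class may have quite different incomes.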
Levels of Measurements
Measurement is the process of assigning a value to the data, and the rules defining the assignment of an
appropriate value determine the level of measurement (sometimes termed the scale of measurement).
The levels of measurement are distinguished on the basis of the ordering or distance properties inherent
in the measurement rules. Knowledge of the rules and the levels of measurement is important
because each statistical technique is appropriate only for data measured at certain levels. Traditionally,
four levels of measurement are identified: nominal, ordinal, interval, and ratio.
Nominal
The nominal level is the lowest level of measurement. Each value is a distinct category, which serves
merely as a label or name for the category. No ordering or distance properties among categories are
assumed. The real number operations (i.e., addition, subtraction, multiplication, and division) do not
apply to the nominal level of measurement. Categorical data fall under this level of measurement. The
basic property of the nominal level of measurement is that the objects in one category
are equivalent to each other with respect to the characteristic being classified, but not to objects in any
other category. The logical properties of equivalence are (a) reflexivity (i.e., every object in one of the
categories is equal to itself), (b) symmetry (i.e., if a = b, then b = a), and (c) transitivity (i.e., if a = b and
b = c, then a = c). These three logical properties are operative among objects within the same category,
but not necessarily between categories.
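
The three properties of equivalence can be checked mechanically on a small example. The objects and their category labels below are hypothetical; `same_category` is an illustrative helper that treats two objects as "equal" when they share a nominal category.

```python
from itertools import product

# Hypothetical objects, each labelled with a nominal category.
category = {"a": "Muslim", "b": "Muslim", "c": "Muslim", "d": "Hindu"}
objects = list(category)

def same_category(x, y):
    """Two objects are equivalent if they carry the same category label."""
    return category[x] == category[y]

# (a) Reflexivity: every object is equivalent to itself.
reflexive = all(same_category(x, x) for x in objects)

# (b) Symmetry: if x ~ y, then y ~ x.
symmetric = all(
    same_category(y, x)
    for x, y in product(objects, repeat=2)
    if same_category(x, y)
)

# (c) Transitivity: if x ~ y and y ~ z, then x ~ z.
transitive = all(
    same_category(x, z)
    for x, y, z in product(objects, repeat=3)
    if same_category(x, y) and same_category(y, z)
)

print(reflexive, symmetric, transitive)
```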
Example 1.10
Names of continents: Asia, Africa, Australia, Europe, North America, South America.
Classification of a population by religion: Muslim, Hindu, Buddhist, Christian, etc.
Labeling rooms on the 1st, 2nd, or 3rd floors by numbers in the 100s, 200s, or 300s, respectively.
Political party affiliation: Democrat, Republican, Independent, Others.
Ordinal
In the ordinal level of measurement it is possible to rank-order all the categories according to a certain
criterion exhibiting some kind of relation. Typical relations are "higher," "greater," "more desired,"
"more difficult," and so on. Although the ordering property is present in the ordinal level of
measurement, the distance property is absent. Hence, the real number operations cannot be applied
when dealing with the ordinal level of measurement. Ranked data fall under this level of measurement.
Example 1.11
Letter grades of students: A, B, C, D.
Student class designation: Freshman, Sophomore, Junior, Senior.
Product Satisfaction: very satisfied, satisfied, neutral, dissatisfied, very dissatisfied.
Faculty Rank: Professor, Associate Professor, Assistant Professor, Instructor.
Interval
As can be seen in the above examples, in the absence of the distance property it is difficult to compare
the gap between A and B with the gap between B and C, and the like. In the interval level of measurement
the distance property is added to the ordering property. Here the distances between the categories are
defined in terms of fixed and equal units. It is important to note that in the interval level of
measurement we study the difference between things and not their proportionate magnitude. In other
words, an inherently determined zero point is not available on an interval scale. Metric data fall under this
level of measurement.
Example 1.12
Temperature. The difference between 40°C and 41°C is the same as the difference between 80°C and
81°C. But 40°C does not mean half of 80°C.
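
The temperature example can be worked through numerically. The Kelvin scale is not mentioned in this chapter; it is brought in here only as an illustration of a temperature scale with a natural zero, and the helper `celsius_to_kelvin` is a name chosen for this sketch.

```python
import math

def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (a scale with a natural zero)."""
    return c + 273.15

# Differences are meaningful on an interval scale: both gaps are one degree.
diff_low = 41 - 40
diff_high = 81 - 80

# Ratios are not meaningful in Celsius, because its zero point is arbitrary.
ratio_celsius = 80 / 40
ratio_kelvin = celsius_to_kelvin(80) / celsius_to_kelvin(40)

print(diff_low == diff_high)    # equal one-degree differences
print(ratio_celsius)            # 2.0, but this does not mean "twice as hot"
print(round(ratio_kelvin, 3))   # about 1.128 when measured from a natural zero
```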
Ratio
The ratio level of measurement has all the properties of an interval level of measurement in addition to a
well-defined zero point. The zero point is inherently defined by the measurement scheme.
Consequently, distance comparisons as well as ratio comparisons can be made. All real number
operations can be applied to the ratio level of measurement. Metric data fall under this level of
measurement.
Example 1.13
Weight of students, Income of a household, Height (in inches), Age (in years or days), Salary (in
Tk).
It is important to note that a true interval level of measurement is difficult to find. If the distances
between categories can be measured, a zero point can also be established. Another point to note is that
statistics developed for one level of measurement can always be used with higher-level variables, but
not with variables measured at a lower level. There are other typologies of measurement that we
come across quite frequently: quantitative and qualitative. The interval and ratio levels fall under the
quantitative level, whereas the nominal and ordinal levels fall under the qualitative level. The formal
properties characterizing each level of measurement are summarized in Table 1.1.
Table 1.1: Levels of Measurements and Their Characteristic Properties

Level      Equivalence   Greater Than   Fixed Interval   Natural Zero
Nominal    Yes           No             No               No
Ordinal    Yes           Yes            No               No
Interval   Yes           Yes            Yes              No
Ratio      Yes           Yes            Yes              Yes
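
The cumulative structure of Table 1.1, together with the rule that a statistic developed for one level can be used with higher-level variables, might be encoded as in the sketch below. The names `Level`, `properties_of`, and `technique_applies` are hypothetical and chosen only for this illustration.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Properties from Table 1.1, in the order in which the levels acquire them.
PROPERTIES = ["equivalence", "greater than", "fixed interval", "natural zero"]

def properties_of(level):
    """Each level has all the properties of the levels below it, plus one."""
    return PROPERTIES[: level.value]

def technique_applies(technique_level, variable_level):
    """A technique built for one level may be used at that level or higher."""
    return variable_level >= technique_level

print(properties_of(Level.INTERVAL))
print(technique_applies(Level.ORDINAL, Level.RATIO))    # True
print(technique_applies(Level.INTERVAL, Level.ORDINAL)) # False
```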

1.4 Variables
In simple terms, a variable is something that varies. Strictly speaking, the variable itself does not vary; rather,
its values vary. In other words, a variable is a symbol (e.g., X, Y, H, x, P, etc.) that can assume any
of a prescribed set of values, called the domain of the variable. If the variable can assume only one
value, it is called a constant.
Any one elementary unit may possess one or more characteristics that interest the statistician. An
investigator may, indeed, be interested only in the age of each employee, but it would be just as
possible to observe, in addition, each employee's sex or salary. The characteristics of elementary units
are themselves called variables, presumably because observations about these characteristics are
likely to vary from one elementary unit to the next.
Example 1.14

Age, represented by X, is a variable. It can assume any value. But the age of a particular person at
a particular point in time is a constant.
The number of customers entering a department store each day (X) is an example of a variable.
Three types of variables can be distinguished in statistical applications: Numerical, Categorical, and
Rank.
Numerical
A numerical variable is a variable whose possible values are numbers. A numerical variable that
can theoretically assume any value between two given values is called a continuous variable;
otherwise it is called a discrete variable. Observations about a discrete quantitative variable can
assume values only at specific points on a scale of values, with gaps between them. Data that can be
described by a discrete or by a continuous variable are called discrete data or continuous data,
respectively. As a rule of thumb, enumeration or counting gives rise to discrete quantitative data, which
differ from each other by clearly defined steps, while measurement gives rise to continuous
quantitative data without any gaps in between.
Example 1.15

The household size N, which can assume any of the values 1, 2, 3, 4, ..., but cannot be 2.4 or
3.75, is a discrete variable.
The height H of an individual, which can be 169 cm, or 169.6 cm, or 169.567 cm, depending
on the accuracy of measurement, is a continuous variable.
Lengths of 1000 bolts produced in a factory are an example of continuous data.
The number of passengers in a bus can take only integer values; hence, it is a discrete variable.
The number of books on a library shelf is an example of discrete data.

In business and planning, the concept of a variable is very often linked to non-numerical (qualitative)
entities.
Example 1.16

Flavor of ice cream, Sex, color of a pencil, etc.

Categorical
A categorical variable is a variable whose values are expressed in words as categories rather than
numbers. A categorical variable with only two values is called a dichotomous variable.
Example 1.17

Sex: Male and Female.
Reach an agreement: Yes and No.

A categorical variable with more than two values is called a polytomous variable.
Example 1.18

Religious community in Dhaka city: Muslim, Hindu, Christian, and Buddhist.
Satisfaction: Highly satisfied, Satisfied, Indifferent, Dissatisfied, Highly dissatisfied.

A categorical variable can be naturally categorical or can be obtained by converting a numerical variable into categories.


Example 1.19

Naturally categorized: Satisfaction, Response (Yes, No, No answer).
Converted to a categorical variable: Examination scores (0 to 100) converted to letter grades
(A, B, C, D).
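
The conversion of examination scores to letter grades could look like the sketch below. The cut-off points (80, 70, 60) are assumptions made purely for illustration; the chapter does not specify any grading scheme, and `letter_grade` is a hypothetical helper.

```python
def letter_grade(score):
    """Convert a numerical score (0 to 100) into a letter grade.

    The cut-offs used here are assumed for illustration only.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "A"
    if score >= 70:
        return "B"
    if score >= 60:
        return "C"
    return "D"

# Hypothetical examination scores.
scores = [92, 75, 61, 40]
grades = [letter_grade(s) for s in scores]
print(grades)
```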

Rank
Rank variables are variables whose values can be ranked. Rank variables can also come from two sources:
naturally ranked variables and numerical variables converted to rank variables.
Example 1.20

Naturally ranked: Class ranking (first, second, third, fourth, ...).
Converted to a rank variable: test scores of 15 subjects reduced to ranks from 1 to 15.
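
Reducing test scores to ranks can be sketched as follows. The 15 scores are hypothetical and deliberately all different, so the question of how to handle ties (which the example does not address) does not arise; giving rank 1 to the highest score is also an assumption.

```python
# Hypothetical test scores of 15 subjects (all distinct, so no ties occur).
scores = [72, 85, 64, 91, 58, 77, 69, 88, 95, 61, 80, 54, 74, 83, 66]

# Assumption: rank 1 goes to the highest score, rank 15 to the lowest.
ordered = sorted(scores, reverse=True)
ranks = [ordered.index(s) + 1 for s in scores]

for score, rank in zip(scores, ranks):
    print(f"score {score} -> rank {rank}")
```

After the conversion only the order of the subjects is retained; the size of the gaps between their scores is lost.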
