
Chapter I

Introduction
History of Statistics

• The word statistics derives from the New Latin statisticum collegium ("council of state") or the Italian word
"statista" ("statesman"); both terms relate to the political state or government.
• In the past, statistics was used mainly by rulers.
• Its application was very limited, but rulers and kings needed information about the lands,
agriculture, commerce, and population of their states to assess their military potential, wealth, taxation,
and other aspects of government.
• At the beginning of the 20th century, William S. Gosset developed methods for decision making based
on small sets of data.
• During the 20th century, many statisticians were active in developing new methods, theories, and
applications of statistics.
• Today, the availability of electronic computers is a major factor in the modern development
of statistics.

Meaning of Statistics
• The word statistics is used in two senses, which are discussed below:

(1) Plural Sense:

• In the plural sense, the word statistics refers to numerical facts and figures collected in a systematic manner
with a definite purpose in any field of study. In this sense, statistics are aggregates of facts
expressed in numerical form. Examples include statistics on industrial production and statistics on the population
growth of a country in different years.
(2) Singular Sense:
• In the singular sense, statistics refers to the science comprising the methods used in the collection, organization,
presentation, analysis, and interpretation of numerical data. These methods are used to draw conclusions
about population parameters.

Importance of Statistics in Different Fields


• Statistics plays a vital role in every field of human activity. It has an important role in determining the
existing position of per capita income, unemployment, population growth rate, housing, schooling, medical
facilities, etc. in a country.
• Nowadays, statistics holds a central position in almost every field, such as Industry, Commerce, Trade, Physics,
Chemistry, Economics, Mathematics, Biology, Botany, Psychology, and Astronomy, so the application of
statistics is very wide. Some important fields in which statistics is commonly applied are discussed below.
1. Statistics in business and management:
• Statistics helps business owners plan production according to the tastes of customers, and the quality of
products can be checked more efficiently by using statistical methods. Many business decisions therefore
rest on statistical information, including decisions about the location of the
business, the marketing of products, and financial resources.
2. Statistics in Economics:
• Statistics plays an important role in economics, which depends largely upon statistics. National
income accounts are multipurpose indicators for economists and administrators, and statistical methods
are used to prepare these accounts. In economic research, statistical methods are used for
collecting and analyzing data and for testing hypotheses. The relationship between supply and demand,
imports and exports, the inflation rate, and per capita income are problems that require a good
knowledge of statistics.
3. Statistics in Mathematics:
• Statistics plays a central role in almost all natural and social sciences. The methods of the natural sciences are
the most reliable, but the conclusions drawn from them are only probable because they are based on incomplete
evidence.
• Statistical computations help in describing these measurements more precisely. Statistics is a branch of
applied mathematics. A large number of statistical methods, such as probability, averages, dispersion, and
estimation, are used in mathematics, and techniques of pure mathematics, such as integration,
differentiation, and algebra, are used in statistics.
4. Statistics in State Management (Administration):
• Statistics is essential for a country. Government policies are based on statistics, and statistical
data are now widely used in administrative decisions. If, for example, the government wants to revise
the pay scales of employees in view of an increase in the cost of living, statistical methods will be used to
determine the rise in the cost of living. The preparation of federal and provincial government budgets
depends mainly on statistics, because they help in estimating expected expenditures and revenue from
different sources. Statistics are thus the eyes of the administration of the state.
5. Statistics in Astronomy:
• Astronomy is one of the oldest branches of statistical study; it deals with the measurement of the distances,
sizes, masses, and densities of heavenly bodies by means of observations. Errors are unavoidable in these
measurements, so the most probable values are found by using statistical methods.
Example: the distance of the moon from the earth. Since early times, astronomers have used
statistical methods such as the method of least squares to determine the movements of stars.
6. Statistics and industry:
• In industry, statistics is widely used in quality control. In production engineering, statistical tools such as
inspection plans and control charts are used to find out whether a product conforms to its
specifications.

7. Statistics, psychology and education:
• In education and psychology, statistics has found wide application, for example in determining the
reliability and validity of a test and in factor analysis.
8. Statistics and war:
• In war, the theory of decision functions can be of great assistance to military
personnel in planning "maximum destruction with minimum effort."
• Statistical tools are very useful in defense and war because they help to compare the military
strength of different countries in terms of manpower, tanks, warplanes, missiles, etc. They also
help in planning the future military strategy of a country, in estimating the losses due to war, and in
arranging war finance.
9. Statistics and State:
• Statistics are the eyes of the state, as they help in administration. In ancient times, ruling kings and
chiefs had to rely heavily on statistics to frame suitable military and fiscal policies. Similarly, modern
states make extensive use of statistical tools on various problems.
The state also conducts the population census and estimates national income to gauge the prosperity
of the country. In this way, the state is the single largest unit that not only collects the greatest amount of
statistics but also needs statistics on a very extensive scale.
10. Statistics in Research:
• Statistical techniques are of immense use in any research enquiry. In industry and commerce,
research is carried out to find the causes of variation in different products.
• Similarly, market research is carried out with the help of statistical techniques. Even in the literary field,
research makes use of various types of statistical data.
Everyday Reasons Why Statistics Are Important

• Statistics are sets of mathematical equations that are used to analyze what is happening in the world
around us. When used correctly, statistics reveal trends in what happened in the past and can be
useful in predicting what may happen in the future.
1. Weather Forecasts
• If you watch a weather forecast at some point during the day, have you ever heard the forecaster talk about
weather models? These computer models are built using statistics that compare prior weather conditions
with current weather to predict future weather.
2. Emergency Preparedness
• What happens if the forecast indicates that a hurricane is imminent or that tornadoes are likely to occur?
Emergency management agencies move into high gear to be ready to rescue people. Emergency teams
rely on statistics to tell them when danger may occur.
3. Predicting Disease
• News reports often include statistics about a disease. If the reporter simply reports
the number of people who either have the disease or have died from it, that is an interesting fact, but it
might not mean much to your life. When statistics become involved, you have a better idea of how
that disease may affect you.
• For example, studies have shown that 85 to 95 percent of lung cancers are smoking related. This statistic
should tell you that almost all lung cancers are related to smoking and that if you want to have a good
chance of avoiding lung cancer, you shouldn't smoke.
4. Medical Studies
• Scientists must show a statistically valid rate of effectiveness before any drug can be prescribed. Statistics
are behind every medical study you hear about.
5. Genetics
• Many people are afflicted with diseases that come from their genetic make-up and these diseases can
potentially be passed on to their children. Statistics are critical in determining the chances of a new baby
being affected by the disease.
6. Political Campaigns
• Whenever there is an election, news organizations consult their models when they try to predict who
the winner will be. Candidates consult voter polls to determine where and how they campaign. Statistics play a
part in who your elected government officials will be.
7. Insurance
• You know that in order to drive your car you are required by law to have car insurance. If you have a
mortgage on your house, you must have it insured as well. The rate that an insurance company charges
you is based upon statistics from all drivers or homeowners in your area.
8. Consumer Goods
• For example, a leading worldwide retailer keeps track of everything it sells and uses statistics to
calculate what to ship to each store and when.
9. Quality Testing
• Companies make thousands of products every day and each company must make sure that a good quality
item is sold. But a company can't test each and every item that they ship to you, the consumer. So the
company uses statistics to test just a few, called a sample, of what they make. If the sample passes quality
tests, then the company assumes that all the items made in the group, called a batch, are good.
10. Stock Market
• Another topic that you hear a lot about in the news is the stock market. Stock analysts also use statistical
computer models to forecast what is happening in the economy.
Difference Between Descriptive and Inferential Statistics

• Descriptive statistics is the branch of statistics concerned with describing the
population under study.
• What it does: Organizes, analyzes, and presents data in a meaningful way
• Form of final result: Charts, graphs, and tables
• Usage: To describe a situation
• Function: It explains data that are already known, in order to summarize the sample.

• Inferential statistics is the branch of statistics that focuses on drawing conclusions about the
population on the basis of sample analysis and observation.
• What it does: Compares, tests, and predicts data
• Form of final result: Probability
• Usage: To explain the chances of occurrence of an event
• Function: It attempts to reach conclusions about the population that extend beyond the data
available.
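The contrast can be made concrete with a minimal Python sketch. The exam scores are invented for illustration, and the interval uses a normal approximation (z = 1.96), which is an assumption added here and not part of the notes above.

```python
import math
import statistics

# Hypothetical sample of exam scores (illustrative data, not from the notes).
scores = [62, 70, 74, 78, 81, 85, 90]

# Descriptive statistics: summarize the data we actually have.
sample_mean = statistics.mean(scores)
sample_sd = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)

# Inferential statistics: use the sample to say something about the population.
# Approximate 95% interval for the population mean (normal approximation;
# a t-based interval would be more appropriate for a sample this small).
margin = 1.96 * sample_sd / math.sqrt(len(scores))
interval = (sample_mean - margin, sample_mean + margin)

print(f"mean = {sample_mean:.2f}, sd = {sample_sd:.2f}")
print(f"approximate 95% interval: ({interval[0]:.2f}, {interval[1]:.2f})")
```

The first two quantities only describe the seven observed scores; the interval is a statement about the unobserved population and carries uncertainty.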
Difference Between a Population and a Sample
• When we think of the term “population,” we usually think of people in our town, region, state or country
and their respective characteristics such as gender, age, marital status, ethnic membership, religion and so
forth. In statistics the term “population” takes on a slightly different meaning. The “population” in
statistics includes all members of a defined group that we are studying or collecting information on for
data driven decisions.
• In simple terms, a population is the aggregate of all elements under study that share one or more
common characteristics; for example, all people living in India constitute a population. A population is
not confined to people; it may also include animals, events, objects, buildings, etc. It can be of any
size, and the number of elements or members in a population is known as the population size, denoted by N,
while the sample size is denoted by n.
• A population is defined as the whole set of data, individuals, events, or objects on which the researcher is
performing research.
The whole area of study is included in a population, whereas a sample is relatively smaller: it is a subset of
the population. Since it is difficult to handle and analyze each and every member of the population, a
smaller, representative portion of the population is picked. This is called a sample.
Definition of Sample
• A part of the population is called a sample. The sample is a portion of the population, a slice of it, that
carries its characteristics. A scientifically drawn sample possesses approximately the
same characteristics as the population, provided it is drawn randomly.
• By the term sample, we mean a part of the population chosen at random for participation in the study. The
sample selected should represent the population in all its characteristics, and it should be free from bias,
so as to produce a miniature cross-section, because the sample observations are used to make generalizations
about the population.
• In other words, the respondents selected from the population constitute a 'sample', and the process of
selecting respondents is known as 'sampling.' The units under study are called sampling units, and the
number of units in a sample is called the sample size.
• In statistical testing, samples are mainly used when the population is too large for all of its members
to be included in the study.
• Sample and population are related to each other: a sample is drawn from the population, so without a
population a sample cannot exist. Further, the primary objective of the sample is to make statistical
inferences about the population that are as accurate as possible. In general, the greater the size of the
sample, the higher is the level of accuracy of generalization.
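The link between sample size and accuracy can be illustrated with the standard error of the mean, σ/√n. This is a sketch under stated assumptions: it presumes simple random sampling with independent observations, the value σ = 10 is invented, and the function name `standard_error` is hypothetical.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean for a population with standard deviation sigma."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)

# Assumed population standard deviation (illustrative value).
sigma = 10.0
for n in (25, 100, 400):
    # Each fourfold increase in n halves the standard error.
    print(n, standard_error(sigma, n))
```

Because the standard error shrinks with the square root of n, accuracy improves with sample size but with diminishing returns.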
Difference Between Parameter and Statistic/Estimate
• A fixed characteristic of a population, based on all the elements of the population, is termed a
parameter. Here population refers to the aggregate of all units under consideration, which share common
characteristics.
A descriptive measure (such as the mean, mode, or median) of a population is known as a parameter. It
numerically expresses the value of an attribute by summarizing the available data. As indicated earlier, it
is often impractical to obtain the values of an attribute for the whole population. Therefore, a sample is used
to calculate the measures, which are then used to infer the corresponding population values.
• A statistic is defined as a numerical value obtained from a sample of data. It is a descriptive
statistical measure and a function of the sample observations. A sample is a fraction of the
population that should represent the entire population in all its characteristics. The common use of a statistic is
to estimate a particular population parameter.
• For population parameters: µ (Greek letter mu) represents the mean, P denotes the population proportion,
σ (Greek letter sigma) the standard deviation, σ² the variance, and N the population size. The standard error
of the mean is written σx̄, the standard error of the proportion is written σp, the standardized variate is
z = (X - µ)/σ, and the coefficient of variation is σ/µ.

• For sample statistics: x̄ (x-bar) represents the mean, p̂ (p-hat) denotes the sample proportion,
s the standard deviation, s² the variance, and n the sample size. The standard error of the mean is written
sx̄, the standard error of the proportion is written sp, the standardized variate is z = (x - x̄)/s,
and the coefficient of variation is s/x̄.
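As a small illustration of the notation, the sketch below computes the parameters of a tiny population and the matching statistics of one sample from it, using Python's standard `statistics` module. The data values and the fixed choice of sample are invented for illustration.

```python
import statistics

# Hypothetical population of N = 8 values (illustrative data).
population = [4, 8, 6, 5, 3, 7, 9, 6]
# A sample of n = 4 values from that population (fixed here for reproducibility).
sample = [4, 6, 7, 9]

# Parameters: computed from every element of the population.
mu = statistics.mean(population)            # population mean
sigma2 = statistics.pvariance(population)   # population variance (divisor N)
sigma = statistics.pstdev(population)       # population standard deviation

# Statistics: computed from the sample only.
x_bar = statistics.mean(sample)             # sample mean
s2 = statistics.variance(sample)            # sample variance (divisor n - 1)
s = statistics.stdev(sample)                # sample standard deviation

# Standardized variate for the value X = 9, and the coefficient of variation.
z = (9 - mu) / sigma
cv = sigma / mu

print(mu, sigma2, x_bar, s2)
```

Note that the population formulas divide by N while the sample formulas divide by n - 1, which is why `pvariance` and `variance` are separate functions.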
What is a variable?

• A variable is any characteristic, number, or quantity that can be measured or counted. A variable may
also be called a data item. Age, sex, business income and expenses, country of birth, capital expenditure,
class grades, eye colour and vehicle type are examples of variables. It is called a variable because the value
may vary between data units in a population, and may change in value over time.
For example, 'income' is a variable that can vary between data units in a population (i.e. the people or
businesses being studied may not have the same incomes) and can also vary over time for each data unit
(i.e. income can go up or down).

What are the types of variables?


• There are different ways variables can be described according to the ways they can be studied, measured,
and presented.
Numeric variables have values that describe a measurable quantity as a number, like 'how many' or
'how much'. Therefore numeric variables are quantitative variables.
Numeric variables may be further described as either continuous or discrete:
• A continuous variable is a numeric variable. Observations can take any value within a certain range of
real numbers. The value given to an observation for a continuous variable can include values as small as
the instrument of measurement allows. Examples of continuous variables include height, time, age, and
temperature.
• A discrete variable is a numeric variable. Observations can take a value based on a count from a set of
distinct whole values. A discrete variable cannot take the value of a fraction between one value and the
next closest value. Examples of discrete variables include the number of registered cars, number of
business locations, and number of children in a family, all of which are measured in whole units (e.g. 1, 2, 3
cars).


• The data collected for a numeric variable are quantitative data.

Categorical variables have values that describe a 'quality' or 'characteristic' of a data unit, like 'what
type' or 'which category'. Categorical variables fall into mutually exclusive (in one category or in another)
and exhaustive (include all possible options) categories. Therefore, categorical variables are qualitative
variables and tend to be represented by a non-numeric value. Categorical variables may be further
described as ordinal or nominal:
• An ordinal variable is a categorical variable. Observations can take a value that can be logically ordered
or ranked. The categories associated with ordinal variables can be ranked higher or lower than another,
but do not necessarily establish a numeric difference between each category. Examples of ordinal
categorical variables include academic grades (e.g. A, B, C), clothing size (e.g. small, medium, large, extra
large) and attitudes (e.g. strongly agree, agree, disagree, strongly disagree).
• A nominal variable is a categorical variable. Observations can take a value that cannot be
organised in a logical sequence. Examples of nominal categorical variables include sex, business type, eye
colour, religion and brand.
• The data collected for a categorical variable are qualitative data.
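The classification above can be summarized in a short Python sketch. The names `VariableType`, `EXAMPLES`, and `is_quantitative` are hypothetical and exist only for this illustration; the example variables are taken from the text.

```python
from enum import Enum

class VariableType(Enum):
    CONTINUOUS = "numeric, continuous"
    DISCRETE = "numeric, discrete"
    ORDINAL = "categorical, ordinal"
    NOMINAL = "categorical, nominal"

# Examples from the text above, tagged with their type.
EXAMPLES = {
    "height": VariableType.CONTINUOUS,
    "temperature": VariableType.CONTINUOUS,
    "number of children in a family": VariableType.DISCRETE,
    "clothing size": VariableType.ORDINAL,
    "academic grade": VariableType.ORDINAL,
    "eye colour": VariableType.NOMINAL,
}

def is_quantitative(var_type: VariableType) -> bool:
    """Numeric variables yield quantitative data; categorical ones yield qualitative data."""
    return var_type in (VariableType.CONTINUOUS, VariableType.DISCRETE)

for name, var_type in EXAMPLES.items():
    kind = "quantitative" if is_quantitative(var_type) else "qualitative"
    print(f"{name}: {var_type.value} ({kind} data)")
```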
[Figure: Types of variables flowchart]
• Data can be understood as the quantitative information about a specific characteristic. The characteristic
can be qualitative or quantitative, but for the purpose of statistical analysis, a qualitative characteristic is
transformed into a quantitative one by assigning numerical values to it. A quantitative
characteristic is known as a variable, which may be discrete or continuous.
• These are also known as the result of numerical measurement or the
• An example is a small collection of sea shells gathered on the beach. All the shells in the collection are
similar: small disk-shaped shells with a hole in the center. But the shells also differ from one another in
overall size and weight, in color, in smoothness, in the size of the hole, etc. Any data set is something like
the shell collection. It consists of cases: the objects in the collection. Each case has one or more attributes
or qualities, called variables. This word “variable” emphasizes that it is differences or variation that is
often of primary interest. Usually, there are many possible variables. The researcher chooses those that
are of interest, often drawing on detailed knowledge of the system that is under study. The researcher
measures or observes the value of each variable for each case.
Univariate vs. Bivariate Data
• Statistical data are often classified according to the number of variables being studied.
• Univariate data. When one conducts a study that looks at only one variable, she is working with
univariate data. For example, in a survey to estimate the average weight of high school students,
the researcher works with only one variable (weight), so the data are univariate.
• Bivariate data. When one conducts a study that examines the relationship between two variables, she is working
with bivariate data. For example, in a study to see whether there is a relationship between the height and weight
of high school students, the researcher works with two variables (height and weight), so the data are
bivariate.
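The height and weight example can be sketched in Python. The five measurements are invented, and the Pearson correlation coefficient is used here as one common way to quantify a relationship between two variables; the notes themselves do not name a specific measure.

```python
import math
import statistics

# Hypothetical measurements for five students (illustrative data).
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [45, 52, 58, 63, 72]

# Univariate analysis: one variable (weight) summarized on its own.
mean_weight = statistics.mean(weights_kg)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Bivariate analysis: the relationship between two variables (height and weight).
r = pearson_r(heights_cm, weights_kg)

print(f"mean weight = {mean_weight} kg")
print(f"correlation between height and weight = {r:.3f}")
```

A value of r close to 1 indicates that taller students in this made-up sample also tend to be heavier.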
Qualitative and Quantitative Data
• Data collected about a numeric variable will always be quantitative and data collected about a categorical variable
will always be qualitative. Therefore, you can identify the type of data, prior to collection, based on whether the
variable is numeric or categorical.
Why are quantitative and qualitative data important?
• Quantitative and qualitative data provide different outcomes, and are often used together to get a full picture of
a population. For example, if data are collected on annual income (quantitative), occupation data (qualitative)
could also be gathered to get more detail on the average annual income for each type of occupation.
How can you use quantitative and qualitative data?
• It is important to identify whether the data are quantitative or qualitative, as this affects the statistics that can be
produced.
Examples:
• The age of your car. (Quantitative.)
• The number of musical instruments at home. (Quantitative.)
• The softness of a cat. (Qualitative.)
• The color of the sky. (Qualitative.)
• The number of coins in your pocket. (Quantitative.)
The Four Levels/Scales of Measurement

Why is Level of Measurement Important?


• It is important for the researcher to understand the different levels of measurement, as these levels of
measurement, together with how the research question is phrased, dictate what statistical analysis is appropriate.
• First, knowing the level of measurement helps you decide how to interpret the data from that variable.
• Second, knowing the level of measurement helps you decide what statistical analysis is appropriate on the values
that were assigned. If a measure is nominal, then you know that you would never average the data values or do a
t-test on the data.
• There are typically four levels of measurement that are defined:
• 1. Nominal
• 2. Ordinal
• 3. Interval
• 4. Ratio
• In nominal measurement the numerical values just "name" the attribute uniquely. No ordering of the cases is
implied. The nominal level of measurement is the lowest of the four ways to characterize data. Nominal means "in
name only" and that should help to remember what this level is all about. Nominal data deals with names,
categories, or labels.
• Data at the nominal level is qualitative. Colors of eyes, yes or no responses to a survey, and favorite breakfast
cereal all deal with the nominal level of measurement. Even some things with numbers associated with them,
such as a number on the back of a football jersey, are nominal since it is used to "name" an individual player on
the field.
• Data at this level can't be ordered in a meaningful way, and it makes no sense to calculate things such as means
and standard deviations.
• In ordinal measurement the attributes can be rank-ordered. Here, distances between attributes do not
have any meaning. For example, on a survey you might code Educational Attainment as 0=less than high
school; 1=some high school; 2=high school degree; 3=some college; 4=college degree; 5=post college. In
this measure, higher numbers mean more education. The interval between values is not interpretable in
an ordinal measure. Data at this level can be ordered, but differences between the data values are not
meaningful.
• In interval measurement the distance between attributes does have meaning. For example, when we
measure temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80. The
interval between values is interpretable. Because of this, it makes sense to compute an average of an
interval variable. But note that in interval measurement ratios don't make any sense: 80 degrees is not
twice as hot as 40 degrees (although the attribute value is twice as large). The interval level of
measurement deals with data that can be ordered, and in which differences between the data do make
sense.
• In ratio measurement there is always an absolute zero that is meaningful. This means that you can
construct a meaningful fraction (or ratio) with a ratio variable. Weight is a ratio variable. In applied social
research most "count" variables are ratio, for example, the number of clients in past six months. Why?
Because you can have zero clients and because it is meaningful to say that "...you had twice as many
clients in the past six months as you did in the previous six months."
• It's important to recognize that there is a hierarchy implied in the level of measurement idea. At lower
levels of measurement, assumptions tend to be less restrictive and data analyses tend to be less
sensitive. At each level up the hierarchy, the current level includes all of the qualities of the one below
it and adds something new. In general, it is desirable to have a higher level of measurement (e.g.,
interval or ratio) rather than a lower one (nominal or ordinal).
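The hierarchy can be sketched as a lookup of which summaries are usually treated as meaningful at each level. The mapping follows a common textbook convention and is an assumption added here; the names `ALLOWED` and `can_use` are hypothetical.

```python
import statistics

# Summaries commonly treated as meaningful at each level; every level
# keeps what is allowed at the level below it and adds something new.
ALLOWED = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean", "ratio"],
}

def can_use(level: str, summary: str) -> bool:
    """Return True if the summary is meaningful at the given level of measurement."""
    return summary in ALLOWED[level]

# Nominal: eye colours can only be counted, so the mode is the usual summary.
eye_colours = ["brown", "blue", "brown", "green"]
print(statistics.mode(eye_colours))

# Ordinal: the education codes from the text can be ranked, so a median makes sense.
education = [0, 2, 2, 3, 5]
print(statistics.median(education))

# Averaging nominal codes (such as jersey numbers) is not meaningful.
print(can_use("nominal", "mean"))
```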
