Chapter 1
Basic ideas in Statistics
What Is Statistics?
Statistics is the science of collecting, organizing, summarizing, analysing and presenting data,
and of drawing inferences from data.
Why Statistics?
In almost any area of work (business, medicine, engineering, education, the social sciences, the
physical sciences, etc.) you must be able to read, interpret, and apply the results of a statistical
analysis of research data.
Terminology
A statistical problem involves studying some characteristics associated with a group of
objects, commonly called experimental units or subjects.
A characteristic being studied that is associated with each experimental unit is called a
variable.
The collection of values measured on experimental units (called observations) associated with
the variable in the study is called a data set.
Population: The collection of all the objects or items that are of interest in a statistical study.
The individual objects in the population are the experimental units or subjects.
Sample: A finite portion (subset) of the population that is used to study the characteristics of
interest in the population.
Parameter: Characteristic or measure obtained from a population.
Statistic (not to be confused with Statistics): Characteristic or measure obtained from a
sample.
Census: The collection of data from every element in the population.
Sampling: The collection of data from the elements of a sample.
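The terms above can be illustrated with a small sketch. All values below are hypothetical: the marks, the student labels, the sample size of 4 and the fixed random seed are assumptions made only for illustration. The mean is used as the characteristic of interest, so the population mean is a parameter and the sample mean is a statistic.

```python
import random
import statistics

# Hypothetical population: every student in one small class.
# Each dict is one experimental unit; "mark" is the variable being studied.
marks = [52, 61, 47, 70, 58, 66, 49, 73, 55, 64]
population = [
    {"student": f"S{i:02d}", "mark": m} for i, m in enumerate(marks, start=1)
]

# Census: collect the variable from every element of the population.
data_set = [unit["mark"] for unit in population]
parameter = statistics.mean(data_set)  # a measure obtained from the population

# Sampling: collect the variable from a subset of the population only.
random.seed(1)  # fixed seed (an assumption) so the sample is repeatable
sample = random.sample(population, k=4)
sample_data = [unit["mark"] for unit in sample]
statistic = statistics.mean(sample_data)  # a measure obtained from the sample

print("Parameter (population mean):", parameter)
print("Statistic (sample mean):", statistic)
```

The statistic will usually differ from the parameter, and a different sample would usually give a different statistic.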
Data
The foundation of statistics is data.
Data: a collection of values that is used as a basis for describing situations and drawing conclusions.
The first step of the statistical process is data collection.
Data Sources
[Diagram: Data, branching into Primary Data and Secondary Data]
Primary Data
Primary data are first-hand information that you collect, compile and publish yourself for a
specific purpose. They are original in character and have not undergone any statistical
treatment.
Advantages
Applicable and usable, if collected properly
Accurate and reliable: can answer your research questions directly
Up-to-date, since you have collected the data yourself
Disadvantages
Expensive
Not immediately available: it takes time to define the problem, sampling frame, method and
analysis
Not as readily accessible
Secondary Data
Secondary data are second-hand information that has already been collected by someone else (a
person or an organization) for some other purpose and is available for the present study. Secondary
data are not original in character and have undergone statistical treatment at least once.
Advantages
Inexpensive
Easily accessible
Immediately available
Will also alert the researcher to any potential difficulties.
Disadvantages
Frequently outdated, e.g. census data
Potentially unreliable: the origin of the information is not always known
May not be applicable: may not fully answer your research questions
Lack of availability, i.e. no data available or very difficult to obtain
Personal Interviews
The people selected for the survey or test are met in person, and the information is acquired
through an interview (a meeting between two people). The person who conducts the interview is
called the interviewer; the interviewer should be well trained and able to develop mutual trust and
confidence. The person who is interviewed is called the informant.
Example: In a supermarket, ask people about their purchasing behaviour.
Advantages
(i) High response rate
(ii) Great flexibility (ability to adapt/explain questions)
Disadvantages:
(i) Relatively expensive
(ii) Possibility of interviewer and interviewee bias
(iii) Because of the personal nature of some questions (e.g., age or income), inaccurate or false
data may be given to the interviewer.
Mailed questionnaires
A questionnaire (also referred to as a self-administered questionnaire) is a data collection
tool in which questions are presented in written form and are answered by the respondents in
written form. A questionnaire can be administered in different ways:
o Sending questionnaires by post with clear instructions on how to answer the questions,
together with a self-addressed stamped envelope, and asking the respondents to send
them back;
o Gathering all or part of the respondents in one place at one time, giving oral or written
instructions, and letting the respondents fill out the questionnaires;
o Hand-delivering questionnaires to respondents and collecting them later.
Example: Collect data for a study on consumers' experience of and attitudes towards online
shopping in Sri Lanka by sending postal questionnaires.
Advantages
(i) Not very costly; saves time
(ii) Can obtain responses to questions that respondents would dislike answering in person
(iii) No interviewer bias
(iv) Consistent questions (for all respondents)
Disadvantages:
(i) Low response rates (relative to other survey types)
(ii) No one is available to clarify questions that are confusing
(iii) Responses may be of doubtful reliability
Telephone Surveys
A telephone survey is conducted by asking the respondent the questions over the phone.
Advantages
(i) Quick, although the calls cost some money (telephone bill)
(ii) Can cover large geographical areas
(iii) High response rates
Disadvantages:
(i) Not confidential
(ii) Not everyone has a telephone
Internet/Email (Web) Surveys
These are surveys in which the respondents enter their answers directly using a computer.
Advantages
(i) Relative speed and flexibility. An email questionnaire can gather several thousand
responses within a day or two.
(ii) Uses graphics and visual aids
Disadvantages:
(i) Technical skills required
(ii) Cannot cover all classes of people
Types of Variables
[Diagram: Variables, branching into Numerical (Quantitative) and Categorical (Qualitative)]
Categorical
Values or observations that can be sorted into groups or categories.
Examples: sex, eye colour and favourite colour.
Numerical
Values or observations that can be measured. These numbers can be placed in ascending or
descending order. Examples: height, arm span and weight.
The numerical data that we will use in this course fall into one of two categories: discrete and
continuous.
A type of data is discrete if only a finite number of values are possible, or if there is
a gap on the number line between every two possible values.
Continuous data makes up the rest of numerical data. This is a type of data that is usually
associated with some sort of physical measurement.
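As a rough illustration of these variable types, the sketch below classifies a few hypothetical variables. It rests on a simplifying assumption that is not part of the definitions: categories are stored as text, counts as whole numbers (`int`) and physical measurements as decimal numbers (`float`). Real data need judgement; a postal code, for example, is written with digits but is categorical. The function name `variable_type` and the sample observations are made up for this example.

```python
def variable_type(values):
    """Roughly classify a list of observations (assumptions in the text above)."""
    # Text labels can only be sorted into groups: categorical.
    if all(isinstance(v, str) for v in values):
        return "categorical"
    # Whole-number counts have gaps between possible values: discrete.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "numerical, discrete"
    # Remaining numbers are treated as measurements: continuous.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical, continuous"
    return "unclassified"


# Hypothetical observations on three students.
observations = {
    "eye_colour": ["brown", "blue", "brown"],
    "siblings": [0, 2, 1],
    "height_cm": [162.0, 175.5, 168.2],
}

for name, values in observations.items():
    print(f"{name}: {variable_type(values)}")
```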
Levels of Measurement and Measurement Scales