Data Collection

Getting Data Ready

Data Analysis

Week 10

Learning Outcomes

• Data Collection methods

• Data Analysis – Feel of Data

• Data Analysis – Goodness of Data

• Data Analysis – Hypothesis testing

What type of Data? – Primary and Secondary

PRIMARY DATA:

• Data originated by a researcher

for the specific purpose of

addressing the research

problem.

• Included among primary

sources are survey, interviews

SECONDARY DATA:

• Data collected from a source that has already been published in any form

[Diagram: TYPES of data: PRIMARY DATA, SECONDARY DATA]

• Included among secondary data

are reports available in the

libraries/ internet, magazines,

newspapers etc.

Primary Research Methods &

Techniques

[Diagram: Primary Research, branching into: in-house, self-administered; telephone, fax, e-mail, Web, Google Form; interviews; observation; simulation; case studies]

Collection of Secondary Data

Refers to information that has been already gathered

by someone (individual or agencies) and readily

available to the researcher.

characteristics:

• Reliability of data

• Suitability of data

• Adequacy of data

Secondary data

Data someone else has collected

Sources.

1. Internal company records

2. Company reports

3. Internal computer databases

4. Reports and publications of government agencies

5. Other publications

6. Computerized databases: EBSCO, PROQUEST, Emerald

The research method consists of how the researcher:

• collects,

• analyzes,

• interprets the data

(Creswell, 2009).

A Classification of Published Secondary

Sources

[Diagram: Published Secondary Data, branching into source and data categories, including Government Publications]

Using secondary data for research (2)

Types of secondary data

Figure 8.1 Types of secondary data

Sources in Malaysia

https://www.statistics.gov.my/

Share prices

Department of Statistics

Primary Data

Collection

Sources and Methods of Data

collection

1. Describe how the questionnaire was administered and discuss

problems encountered

• Internet- or intranet-mediated,

• postal,

• delivery and collection,

• telephone

• interview

• What method was used

• How many questionnaires/respondents

• How was the response rate and follow up

If pilot testing was done, state the reliability and validity testing results and what amendments were made to the questionnaire

Data Collection: Qualitative vs Quantitative

Approaches

Qualitative

• Focus Group

• Interview

• Case Study

• Participant observation

• Secondary data analysis

Quantitative

• Surveys

• Experiments

• Structured observation

• Secondary data analysis

Observation

• Intensive, usually long term, examination of a social

group, an organization, etc.

group members

– Observes their behavior and learns meaning systems

(which are tied to language)

developed in Classical Anthropology

Interviews

• Unstructured – e.g. ethnographic interviewing – researcher allows interview to proceed at respondent’s pace and subjects to vary by interviewee (to an extent)

• Semi-structured – predominantly open-ended questions, with further probing

• Structured – everyone asked exactly the same questions in exactly the same way, given exactly the same choices

Type of interview to use

• Individual

• Focus group

• Telephone

Focus group

A small group of people are brought together to discuss specific topics of interest to the researcher.

The group process tends to elicit more information than individual interviews because there is cross-conversation and discussion.

Different views can be explored.

• A type of group interview.

• Focuses on group interaction on a topic selected by the researcher.

• Ideally 4-12 participants.

• The interaction is directed by a moderator who asks questions and keeps the discussion on the topic.

Surveys

• A survey involves data

collection from a large number

of respondents using a

predesigned questionnaire.

• Four basic survey methods:

– Telephone survey

– Self Administered – mail/e-mail

– Person-administered surveys

(distribute and collect)

In data analysis we have three objectives:

1. getting a feel for the data,

2. testing the goodness of data, and

3. testing the hypotheses developed for the research

Getting Data Ready

Preparing the Data for

Analysis

Data Processing

coding/editing/transformation

Stages of Data Analysis

• Getting data ready for analysis (SPSS)

• Editing

• Handling Blank Responses

• Coding

• Categorizing

• Creating data file

• Programming

Preparing the Data for Analysis

Data Editing

• Editing consists of scrutinizing the filled-up survey

instrument to identify and minimize errors,

incompleteness, misclassification and gaps in the

information obtained from the target respondents.

Omissions: Respondents often fail to answer a single question

or a section of the questionnaire, either deliberately or

inadvertently.

Ambiguity: A response might not be legible or it might be unclear

which of the two boxes is checked in a multiple response system.

Inconsistencies: Sometimes two responses can be logically

inconsistent.

Ineligible respondent: An inappropriate respondent may be

included in the sample.

Coding

• All the information collected from the respondent must be

converted into numerical values before we enter the data

in the spreadsheet.

– Coding for closed-ended questions

– Coding for open-ended questions
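
As a rough illustration of coding a closed-ended question, the sketch below maps answer labels to numeric values before data entry. The codebook, labels and the helper name `code_response` are hypothetical and only show the idea; a real study would use the codes defined for its own questionnaire.

```python
# Hypothetical codebook for one closed-ended Likert item (labels and codes are illustrative).
LIKERT_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def code_response(answer, codebook):
    """Return the numeric code for an answer, or None if it is blank or not in the codebook."""
    if answer is None:
        return None
    return codebook.get(answer.strip())


# Raw answers as they might be typed in from the survey forms.
raw_answers = ["Agree", "Neutral", "", "Strongly agree", " Disagree "]
coded = [code_response(a, LIKERT_CODES) for a in raw_answers]
print(coded)  # [4, 3, None, 5, 2]
```

Blank answers stay as `None` here so that they can be handled later as missing values rather than being silently coded.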

Data Entry

Excel

SPSS

• To manage quantitative data, it has to be entered into statistical software. Most statistical packages, such as SPSS, SAS and STATISTICA, are able to handle large datasets.

There are two ways to open the data spreadsheet in SPSS:

Open SPSS, click on TYPE IN DATA and then click OK.

Open an existing SPSS data file under FILE click on NEW.

Replacing the Missing Value

The four options available are:

• Option 1: Replace values with numbers that are known

from prior knowledge or from an educated guess. Easily

done but can lead to researcher bias if you are not

careful.

• Option 2: Replace missing values with variable mean.

The simplest option but it does lower variability and in turn

can bias the result.

• Option 3: Replace missing values with a group mean (i.e.

the mean for prejudice grouped by ethnicity). The missing

value is replaced with the mean of the group that the

subject belongs to. A little more complicated, but there is

not as much of a reduction in the variability.

• Option 4: Using regression to predict the missing values.
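
Options 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure: the records, group labels and function names are made up, and a group with no observed scores at all is not handled.

```python
from statistics import mean

# Hypothetical records: (group, score); None marks a blank response.
records = [
    ("A", 4.0), ("A", None), ("A", 6.0),
    ("B", 1.0), ("B", 3.0), ("B", None),
]


def impute_overall_mean(rows):
    """Option 2: replace each missing score with the mean of all observed scores."""
    observed = [s for _, s in rows if s is not None]
    overall = mean(observed)
    return [(g, overall if s is None else s) for g, s in rows]


def impute_group_mean(rows):
    """Option 3: replace each missing score with the mean of the respondent's own group."""
    by_group = {}
    for g, s in rows:
        if s is not None:
            by_group.setdefault(g, []).append(s)
    # Assumes every group has at least one observed score.
    group_means = {g: mean(v) for g, v in by_group.items()}
    return [(g, group_means[g] if s is None else s) for g, s in rows]


overall_filled = impute_overall_mean(records)
group_filled = impute_group_mean(records)
print(overall_filled[1], group_filled[1])  # ('A', 3.5) ('A', 5.0)
```

In this toy data the overall mean pulls the missing value for group A towards group B's lower scores, while the group mean keeps it within group A's own range.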

Data Transformation

• A process of changing the original data to a new format.

• Summated scores

• Measurement scales that consist of both negatively and positively worded statements (reverse code the questions).

• Birth date needs to be transformed into real age of the respondents.

Summated and Average Summated Score using SPSS

1. Select Transform > Compute Variable

2. Type the variable name in the box labelled Target Variable

3. Click on parentheses to place in Numeric Expression box

4. Highlight first variable of interest and click on arrow box

5. Click on “+” sign

6. Repeat (4) and (5) until all variables of interest are placed in Numeric Expression box

7. Click OK to get the summated score (skip step 7 if interested in average summated score)

8. Click on the cursor to place it after parentheses and then click slash “/” sign

9. Click on required number (the required number is total number of variables in Numeric Expression box)

10. Click OK to get the average summated score
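
The same summated and average summated scores can be sketched outside SPSS. The example below assumes a 1 to 5 response scale and treats item `q3` as the negatively worded one; the item names and values are hypothetical.

```python
# Assumed 5-point scale; item q3 is taken to be negatively worded (hypothetical).
SCALE_MIN, SCALE_MAX = 1, 5
NEGATIVE_ITEMS = {"q3"}


def reverse_code(value):
    """Map 1->5, 2->4, ..., 5->1 on the assumed 1-5 scale."""
    return SCALE_MAX + SCALE_MIN - value


def summated_scores(respondent):
    """Return (summated score, average summated score) for one respondent's item answers."""
    adjusted = [
        reverse_code(v) if item in NEGATIVE_ITEMS else v
        for item, v in respondent.items()
    ]
    total = sum(adjusted)
    return total, total / len(adjusted)


respondent = {"q1": 4, "q2": 5, "q3": 2, "q4": 3}
total, average = summated_scores(respondent)
print(total, average)  # 16 4.0
```

Reverse coding comes first so that a high score means the same thing on every item before the items are added together.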

Descriptive & Inferential Statistics

Descriptive statistics:

• Organize

• Summarize

• Simplify

• Presentation of data

Using data gathered on a group to describe or reach conclusions about that same group only

Inferential statistics:

• Generalize from samples to population

• Hypothesis testing

• Relationships among variables

Using sample data to reach conclusions about the population from which the sample was taken

SPSS: Data View and Variable view window

Data View: This window shows the actual data values and the name of the variables.

Variable View: Contains information about the data set that is stored with the database

Data Analysis – Uma Sekaran

Data Analysis

In data analysis we have three objectives:

1. getting a feel for the data,

2. testing the goodness of data, and

3. testing the hypotheses developed for the

research.

We can classify analysis into three types:

1. Univariate analysis, involving a single variable at one time

2. Bivariate analysis, involving two variables at a time

3. Multivariate analysis, involving three or more variables

simultaneously

1. Feel of the data

We can acquire a feel for the data by checking the central tendency and the dispersion.

Examination of the measure of central tendency, and how clustered or dispersed the variables are, gives a good idea of how well the questions were framed for tapping the concept

[Diagram: Frequency Distributions; Mean, SD; Range]

• Frequency distributions – For categorical

variables

• The mean, standard deviation, range, and

variance on the other dependent and

independent variables

The mean, the range, the standard deviation, and the

variance in the data will give the researcher a good

idea of how the respondents have reacted to the items

in the questionnaire and how good the items and

measures are. Source: Sekaran: 2003
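
A minimal sketch of getting a feel for the data: a frequency distribution for a categorical variable, and the mean, range, variance and standard deviation for a scale variable. The variable names and values are invented for illustration.

```python
from collections import Counter
from statistics import mean, stdev, variance

# Hypothetical sample: one categorical variable and one 7-point scale variable.
gender = ["F", "M", "F", "F", "M"]
satisfaction = [3, 5, 4, 6, 7]

# Frequency distribution (counts and percentages) for the categorical variable.
frequencies = Counter(gender)
percentages = {k: 100 * v / len(gender) for k, v in frequencies.items()}

# Central tendency and dispersion for the scale variable.
summary = {
    "n": len(satisfaction),
    "min": min(satisfaction),
    "max": max(satisfaction),
    "range": max(satisfaction) - min(satisfaction),
    "mean": mean(satisfaction),
    "variance": variance(satisfaction),  # sample variance (n - 1 denominator)
    "sd": stdev(satisfaction),           # sample standard deviation
}
print(frequencies)
print(percentages)
print(summary)
```

A very small range or standard deviation on an item would suggest that respondents barely differed on it, which is one sign that the question may not have been framed well.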

2. Testing Goodness of data

Validity & Reliability

Reliability is:

the consistency of your measurement

instrument

Validity asks

if an instrument measures what it is supposed

to measure

Testing Goodness of data

Validity & Reliability

Reliability is: the consistency of your measurement instrument

Reliability: “…the degree to which a test or measure produces the same scores when applied in the same circumstances…” (Nelson 1997)

Validity asks if an instrument measures what it is supposed to measure

Validity: “Degree to which a test or instrument measures what it purports to measure” (Thomas & Nelson 1996)

Testing Goodness of data -

Reliability

• Reliability: “…the degree to which a test or measure produces the same scores when applied in the same circumstances…” (Nelson 1997)

Cronbach's alpha is a reliability coefficient that indicates how well the items in a set are positively correlated to one another.
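
As a hedged sketch, Cronbach's alpha can be computed from the item variances and the variance of the summated score using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total score), where k is the number of items. The item scores below are made up.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists.

    `items` holds one inner list per item, each with one score per respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    item_variances = [variance(scores) for scores in items]
    # Each respondent's summated score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - sum(item_variances) / variance(totals))


# Hypothetical answers: 3 items, 5 respondents.
items = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 4, 5],
    [2, 2, 4, 5, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items move together more strongly; the items in this toy data set were chosen to be highly consistent.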

Goodness of data - Validity

“Degree to which a test or instrument

measures what it purports to measure”

(Thomas & Nelson 1996)

“…instrument in measuring what it is designed to measure” (Vincent 1999)

VALIDITY IN QUANTITATIVE

RESEARCH

• Concurrent

• Construct

• Content

• Criterion-related

• Convergent & discriminant

• Face

• Predictive

• External

Normality

• Many statistical methods

require that the numeric

variables we are working

with have an approximate

normal distribution.

• For example, t-tests, F-tests, and regression analyses all require in some sense that the numeric variables are approximately normally distributed.

[Figure: Standardized normal distribution with empirical rule percentages.]

In a symmetrical distribution, median, mode and mean all fall at the same point

3. Hypothesis Testing

What is Hypothesis

A hypothesis is a statement or an assumption about relationships between variables.

Once the data are ready for analysis (i.e., out-of-range/missing responses, etc., are cleaned up, and the goodness of the measures is established), the researcher is ready to test the hypotheses already developed for the study.

Statistics: What’s What?

• Descriptive objectives/questions

• Comparative research objectives/hypotheses

Descriptive

Statistics

3 Types

1. Frequency Distributions: number of items that fall in a particular category

2. Graphical Representations

3. Summary statistics: describe data in just one number

Descriptive Statistics

• Number

• Frequency Count

• Percentage

• Measures of Central Tendency

– Mean

– Median

– Mode

• Variability

• Variance and standard deviation

• Graphs

• Normal Curve

Mode – the most frequently occurring observation

Median – the middle value in the data (50/50 split)

Mean – arithmetic average

Describe what's going on in our data.

Basic Statistics

Measures of Central Tendency

• Measures of central tendency are measures of the

location of the middle or the centre of a distribution.

• Mean

– The arithmetic mean (AM), commonly known as mean or

average, is the most often used measure of central

tendency. The mean is computed by adding all the data

values together and dividing by n, where n represents the

total number of data values. Symbolically, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i$ is the $i$-th data value.

Median

Example

Ordered Array includes:

3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22

• There are 17 terms in the ordered array.

• Position of median = (n+1)/2 = (17+1)/2 = 9

• The median is the 9th term, 15.

• The median of a set of scores represents the middle value (50th

percentile) when the scores are arranged as an array in

ascending or descending order.

• To calculate the median, first rank the scores in ascending or

descending order and follow these two guidelines:

– For an odd number of scores, the median is the middle

score.

– For an even number of scores, the median is the mean

(arithmetic average) of the two middle scores.
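
The two guidelines can be checked with a short sketch that applies them to the ordered array from the example above. The helper name `median_by_position` is only illustrative; Python's built-in `statistics.median` gives the same result.

```python
from statistics import median


def median_by_position(scores):
    """Median via the two guidelines: middle score (odd n) or mean of the two middle scores (even n)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # position (n + 1) / 2 when counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
print(median_by_position(scores))        # 15
print(median_by_position([1, 2, 3, 4]))  # 2.5
```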

Mode

• The mode represents the most frequently occurring score. When two scores occur with the same greatest frequency, each one equals the mode and the data set is considered bimodal. When more than two scores occur with the greatest frequency, the data set is said to be multimodal.

Example data:

35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48

• The mode is 44.

• There are more 44s than any other value.
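
The mode of the example data can be confirmed with the standard library. `multimode` returns every value tied for the highest frequency, so it also covers the bimodal and multimodal cases; the second list is a made-up bimodal example.

```python
from statistics import multimode

# The 24 values from the example table, read row by row.
data = [35, 41, 44, 45, 37, 41, 44, 46, 37, 43, 44, 46,
        39, 43, 44, 46, 40, 43, 44, 46, 40, 43, 45, 48]

modes = multimode(data)
print(modes)                       # [44]
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal)
```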

Measures of Dispersion

• Measures of dispersion or variability give us information

about the spread of the scores in our distribution.

Standard Deviation

The standard deviation plays a dominating role in the study of variation in the data and it is the most widely used measure of dispersion.

Variance

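It is defined as the average of the squared difference between each of the observations in a set of data and the mean. The variance is denoted by $\sigma^2$ and is calculated as:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

where $\bar{x}$ is the mean and $n$ is the total number of data values. The standard deviation is the square root of the variance.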
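
A small sketch of this definition, using made-up values. It divides by the number of observations, matching the "average of the squared difference" wording. Statistical packages often report the sample variance, which divides by n − 1 instead (`statistics.variance` in Python), so their figures can differ slightly.

```python
from math import sqrt
from statistics import pvariance


def population_variance(values):
    """Average of the squared differences between each observation and the mean."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
var = population_variance(values)
sd = sqrt(var)                     # standard deviation = square root of the variance
print(var, sd)                     # 4.0 2.0
print(pvariance(values) == var)    # True: matches the standard-library population variance
```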

Univariate data analysis: descriptive statistics

Variable       N    Minimum   Maximum   Mean     Standard Deviation
Convenience    30   3.67      7.00      5.0111   .93663

Measures of Shape

• One is typically interested to know how well the distribution

can be approximated by the normal distribution.

Skewness

• When the distribution of items in a series is perfectly symmetrical, the curve is technically described as a normal curve (Figure 11.1) and the related distribution as a normal distribution.

Normal Curve Shape

to the median.

The tails of the distribution are the parts to the left and to the

right, away from the mean

Positively skewed distribution OR Right skewed

Positively skewed distribution: If the frequency curve has a longer tail to the right of the distribution, it is known as a positively skewed distribution. In this case, mean > median > mode.

Notice that the mean is greater than the median. Also notice that the tail of the distribution on the right hand (positive) side is longer than on the left hand side.

Negatively skewed distribution OR Left Skewed

Negatively skewed distribution: If the frequency curve has

longer tail to the left of the distribution, it is known as

negatively skewed distribution . In this case, mean < median

< mode.

Kurtosis

A curve that is more peaked than the normal curve is known as leptokurtic. On the other hand, if the curve is more flat-topped than the normal curve, it is called platykurtic. A normal curve itself is called mesokurtic, which is neither too peaked nor too flat-topped.
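
Skewness and kurtosis can be sketched with moment-based formulas: skewness as the third central moment divided by the 1.5 power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3 (so a normal curve scores about 0). This is an assumption about which formula to use; statistical packages may apply small-sample corrections, so their values can differ. The data are made up.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over all values."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** order for x in values) / n


def skewness(values):
    """Positive for a longer right tail, negative for a longer left tail, 0 if symmetrical."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Above 0: more peaked (leptokurtic); below 0: flatter (platykurtic)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]
left_tailed = [-x for x in right_tailed]

print(skewness(symmetric))          # 0.0
print(skewness(right_tailed) > 0)   # True (positively skewed)
print(skewness(left_tailed) < 0)    # True (negatively skewed)
print(excess_kurtosis(symmetric) < 0)  # True (flat-topped, platykurtic)
```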

General forms of Kurtosis

Graphical Method of Presenting Data

Descriptive Statistics

Types of descriptive statistics:

• Organize Data

– Tables

– Graphs

• Summarize Data

– Central Tendency

– Variation

Descriptive Statistics

Types of descriptive statistics:

• Organize Data

– Tables

• Frequency Distributions

• Relative Frequency Distributions

– Graphs

• Bar Chart

• Histogram

• Stem and Leaf Plot

Graphical Method of Presenting Data—histogram

To show the distribution of values

[Figure: Histogram. Annotations: distribution may be leptokurtic (peaked); positively skewed?]

Graphical Method of Presenting Data—bar

chart

Graphical Method of Presenting Data—

multiple bar chart

Graphical Method of Presenting Data – pie chart

Description: Research Response Rate

2. What was the response rate

3. Any follow up done

4. Finally how many received

5. After editing, how many good questionnaires

used for analysis

Demographics

Present Demographics Describe the

sample:

1. The age, gender, or relevant related

information on the population

2. Summarize the demographics of the

sample, and present in a table format

after the narration (as per sample)

(Simon, 2006).

3. Otherwise, the table is included as an

Appendix and referred to in the

narrative of chapter four

To show highest and lowest

values

Example: Frequency

Distribution

Inferential

statistics

We use inferential statistics to make inferences from our data to more general conditions;

Eg: t-test, Analysis of Variance (ANOVA)

We use descriptive statistics to describe what's going on in our data.

Eg: Mean, Median, Normality

Research Methods for Degree Study

Inferential Statistics

Inferential statistics are used to draw conclusions about a population by examining the sample

[Diagram: several Samples drawn from a Population]

Parametric

• T-tests

• ANOVA

• Correlation

• Multiple regression

• ANCOVA

Non-parametric

Descriptive Statistics vs. Inferential Statistics

Descriptive statistics: allow us to draw conclusions through use of graphs

Inferential statistics: allow us to say whether a difference is significant

[Figure annotation: This difference is significant]

Levels of significance and Confidence

• The level of significance is the predetermined level at which a null hypothesis is not supported. The most common level is p < .05

– p = probability

– < = less than (> = more than)

A significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.

The null hypothesis is rejected if the p-value is less than a predetermined level, α. α is called the significance level, and is the probability of rejecting the null hypothesis given that it is true (a type I error). It is usually set at or below 5%.
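
To illustrate comparing a p-value with α, the sketch below uses an exact permutation test for a difference in means. This is a swapped-in technique chosen because it needs only the standard library; it is not the t-test or ANOVA named in these slides, and the two groups are invented.

```python
from itertools import combinations
from statistics import mean

ALPHA = 0.05  # significance level, fixed before looking at the data


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in group means.

    Counts the share of all possible regroupings whose mean difference is
    at least as large as the one actually observed.
    """
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total_sum = sum(pooled)
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total_sum - sum_a) / n_b)
        if diff >= observed - 1e-9:  # small tolerance for floating-point rounding
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical scores for two small groups.
group_a = [5.1, 5.8, 6.0, 6.3, 6.9]
group_b = [3.2, 3.9, 4.1, 4.4, 4.8]

p = permutation_p_value(group_a, group_b)
print(p, "significant" if p < ALPHA else "not significant")
```

Here every score in the first group exceeds every score in the second, so only 2 of the 252 possible regroupings are as extreme as the observed one, and the p-value falls below 0.05. Enumerating all regroupings is only practical for small samples.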

P-levels/Significance Levels

Inferential tests use probability to ascertain the likelihood that a pattern

of results could have arisen by chance.

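If the probability that the results arose by chance falls below a set level we assume these results to be significant

[Figure: decreasing probability of CHANCE: P ≤ 0.10, P ≤ 0.05, P ≤ 0.01, P ≤ 0.001]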

Tutorial

1. Working in groups, read the attached

sample of research (or any other past

research) and explain the

methodology used.

• Procedures/Data Collection

• Data Analysis

Validity

1. Face Validity

– extent to which the measurement method appears

“on its face” to measure the construct of interest

– Infers that a test is valid by definition

e.g. People might have negative reactions to an intelligence test that

did not appear to them to be measuring their intelligence

2. Content Validity

– Infers that the test measures all aspects contributing to the

variable of interest

– Extent to which the measurement method covers the entire

range of relevant behaviors, thoughts, and feelings that

define the construct being measured.

e.g. a test to assess one’s attitude toward taxes should include

items about thoughts, feelings, and behaviors.

Validity

3. Concurrent Validity

– Infers that the test produces similar results to a previously validated test

– When the criterion is something that is happening or being assessed at the same time as the construct of interest, it is called concurrent validity.

4. Criterion Validity

– It is the extent to which people’s scores are correlated with

other variables or criteria that reflect the same construct.

• Example: An IQ test should correlate positively with school

performance.

Validity

5. Predictive Validity

• Infers that the test provides a valid reflection of future

performance using a similar test

e.g. A new measure of self-esteem should correlate positively with

an old established measure.

6. Discriminant Validity

• It is the extent to which people’s scores are not correlated with

other variables that reflect distinct constructs.

Example: Imagine, that a researcher with a new measure of self-

esteem claims that self-esteem is independent of mood; a person

with high self-esteem can be in either a good mood or a bad

mood (and a person with low self-esteem can too).

Then this researcher should be able to show that his self-esteem

measure is not correlated (or only weakly correlated) with a valid

measure of mood.
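
The self-esteem example can be sketched with a Pearson correlation: a strong positive correlation with an established measure of the same construct, and a weak correlation with a measure of a distinct construct (mood). All scores below are invented for illustration, and the cut-offs used in the final check are arbitrary.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical scores for six respondents.
new_self_esteem = [12, 15, 11, 18, 20, 14]  # the new measure
old_self_esteem = [30, 36, 29, 41, 47, 33]  # established measure of the same construct
mood = [4, 6, 5, 5, 4, 6]                   # valid measure of a distinct construct

r_same = pearson_r(new_self_esteem, old_self_esteem)
r_mood = pearson_r(new_self_esteem, mood)
print(round(r_same, 2), round(r_mood, 2))

# Illustrative reading: high correlation with the old measure,
# only a weak correlation with mood (discriminant validity).
print(r_same > 0.9, abs(r_mood) < 0.3)
```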

