
Research Methodology

Data Collection
Getting Data Ready
Data Analysis

Week 10
Learning Outcomes
• Data Collection methods
• Data Analysis – Feel of Data
• Data Analysis – Goodness of Data
• Data Analysis – Hypothesis testing

Module Code and Module Title Title of Slides Dr Jugindar Singh


What type of Data? – Primary and Secondary
PRIMARY DATA:
• Data originated by a researcher for the specific purpose of addressing the research problem.
• Primary sources include surveys and interviews.

SECONDARY DATA:
• Data collected from a source that has already been published in any form.
• Secondary sources include reports available in libraries or on the internet, magazines, newspapers, etc.

Primary Research Methods & Techniques
Primary Research
• Quantitative Data
  – Surveys: mail; in-house, self-administered; telephone, fax, e-mail; web (e.g. Google Forms)
  – Experiments: observation, simulation
• Qualitative Data
  – Focus groups
  – Interviews
  – Observation
  – Case studies

Collection of Secondary Data
Refers to information that has already been gathered by someone else (individuals or agencies) and is readily available to the researcher.

Secondary data should possess the following characteristics:
• Reliability of data
• Suitability of data
• Adequacy of data


Secondary data
Data someone else has collected.
Sources:
1. Internal company records
2. Company reports
3. Internal computer databases
4. Reports and publications of government agencies
5. Other publications
6. Computerized databases, e.g. EBSCO, ProQuest, Emerald

Secondary data can be both quantitative and qualitative in form.

The research method consists of how the researcher collects, analyzes, and interprets the data (Creswell, 2009).

A Classification of Published Secondary Sources
Published Secondary Data
• General Business Sources: Guides, Directories, Indexes, Statistical Data
• Government Sources: Census Data, Other Government Publications


Using secondary data for research (2)
[Figure 8.1 Types of secondary data. Source: Saunders et al. (2006)]

Sources in Malaysia
• Department of Statistics: https://www.statistics.gov.my/
• Share prices


Primary Data
Collection



Sources and Methods of Data Collection
1. Describe how the questionnaire was administered and discuss any problems encountered:
• Internet- or intranet-mediated
• Postal
• Delivery and collection
• Telephone
• Interview

2. State the steps in data collection, i.e.:
• What method was used
• How many questionnaires were distributed and how many respondents there were
• What the response rate was and what follow-up was done

If pilot testing was done, state the results of the reliability and validity testing and what amendments were made to the questionnaire.

Data Collection: Qualitative vs Quantitative Approaches
Qualitative
• Focus Group
• Interview
• Case Study
• Participant observation
• Secondary data analysis
Quantitative
• Surveys
• Experiments
• Structured observation
• Secondary data analysis

Observation
• Intensive, usually long-term, examination of a social group, an organization, etc.
• The researcher becomes a participant in the lives of group members
  – Observes their behavior and learns meaning systems (which are tied to language)
• Most closely associated with ethnography, as developed in classical anthropology
• See what is happening

Interviews
• Unstructured, e.g. ethnographic interviewing: the researcher allows the interview to proceed at the respondent’s pace and the subjects to vary by interviewee (to an extent).
• Semi-structured: asks the same general set of questions (predominantly open-ended), with further probing.
• Structured: precisely worded questions. Everyone is asked exactly the same questions in exactly the same way and given exactly the same choices.

Type of interview to use
• Individual
• Focus group
• Telephone
• E-mail

Focus group
• A type of group interview.
• A small group of people is brought together to discuss specific topics of interest to the researcher.
• Focuses on group interaction on a topic selected by the researcher.
• Ideally 4 to 12 participants.
• The interaction is directed by a moderator who asks questions and keeps the discussion on the topic.
• The group process tends to elicit more information than individual interviews because there is cross-conversation and discussion.
• Different views can be explored.

Surveys
• A survey involves data collection from a large number of respondents using a predesigned questionnaire.
• Four basic survey methods:
  – Telephone survey
  – Self-administered (mail/e-mail)
  – Person-administered surveys (distribute and collect)
  – Website, e.g. Google Forms

In data analysis we have three objectives:
1. getting a feel for the data,
2. testing the goodness of data, and
3. testing the hypotheses developed for the research



Getting Data Ready
Preparing the Data for Analysis
Data Processing: coding, editing, transformation


Stages of Data Analysis

• Getting data ready for analysis (SPSS)
  – Editing
  – Handling blank responses
  – Coding
  – Categorizing
  – Creating data file
  – Programming


Preparing the Data for Analysis
Data Editing
• Editing consists of scrutinizing the completed survey instrument to identify and minimize errors, incompleteness, misclassification and gaps in the information obtained from the target respondents.
• Omissions: Respondents often fail to answer a single question or a section of the questionnaire, either deliberately or inadvertently.
• Ambiguity: A response might not be legible, or it might be unclear which of two boxes is checked in a multiple-response system.
• Inconsistencies: Sometimes two responses can be logically inconsistent.
• Ineligible respondent: An inappropriate respondent may be included in the sample.

Coding
• All the information collected from respondents must be converted into numerical values before the data are entered into the spreadsheet.
  – Coding for closed-ended questions
  – Coding for open-ended questions
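Coding a closed-ended question can be sketched as a lookup in a codebook. The example below is only an illustration: the five-point agreement scale, the `LIKERT_CODES` codebook and the `code_response` helper are assumptions made here, not part of the slides.

```python
# Hypothetical codebook for a closed-ended, five-point agreement item.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def code_response(raw, codebook, missing=None):
    """Return the numeric code for a raw answer, or `missing` if it is blank or unknown."""
    if raw is None:
        return missing
    key = raw.strip().lower()  # tolerate stray spaces and capitalisation
    return codebook.get(key, missing)


# Hypothetical raw answers as typed from the questionnaires.
answers = ["Agree", " strongly agree ", "", "Neutral"]
coded = [code_response(a, LIKERT_CODES) for a in answers]
print(coded)  # [4, 5, None, 3]
```

Blank answers are kept as `None` so that they can be handled explicitly at the missing-value stage rather than being silently coded as a number.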



Data Entry
Excel
SPSS
• To manage quantitative data, it has to be entered into statistical software. Most statistical packages, such as SPSS, SAS and STATISTICA, are able to handle large datasets.

Data entry in SPSS
There are two ways to open the data spreadsheet in SPSS:
• Open SPSS, click on TYPE IN DATA and then click OK.
• With an SPSS data file already open, click on NEW under FILE.


Replacing the Missing Value
The four options available are:
• Option 1: Replace missing values with numbers that are known from prior knowledge or from an educated guess. Easily done, but it can lead to researcher bias if you are not careful.
• Option 2: Replace missing values with the variable mean. The simplest option, but it lowers variability and in turn can bias the result.
• Option 3: Replace missing values with a group mean (e.g. the mean for prejudice grouped by ethnicity). The missing value is replaced with the mean of the group that the subject belongs to. A little more complicated, but there is not as much of a reduction in variability.
• Option 4: Use regression to predict the missing values.
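Options 2 and 3 can be illustrated with a minimal sketch. The records, group labels and function names below are hypothetical; the slides carry out these steps in SPSS, so this only shows the arithmetic involved.

```python
from statistics import mean

# Hypothetical records: (group, score); None marks a missing response.
records = [("A", 4.0), ("A", None), ("A", 6.0), ("B", 1.0), ("B", 3.0), ("B", None)]


def impute_variable_mean(rows):
    """Option 2: replace every missing score with the overall variable mean."""
    observed = [score for _, score in rows if score is not None]
    overall = mean(observed)
    return [(group, overall if score is None else score) for group, score in rows]


def impute_group_mean(rows):
    """Option 3: replace a missing score with the mean of the respondent's own group.

    Assumes every group has at least one observed score.
    """
    by_group = {}
    for group, score in rows:
        if score is not None:
            by_group.setdefault(group, []).append(score)
    group_means = {group: mean(scores) for group, scores in by_group.items()}
    return [(group, group_means[group] if score is None else score) for group, score in rows]


by_variable = impute_variable_mean(records)  # missing values become 3.5
by_group = impute_group_mean(records)        # missing values become 5.0 (A) and 2.0 (B)
```

In this toy data the group means (5.0 and 2.0) stay further apart than the single overall mean (3.5), which is the sense in which Option 3 loses less variability.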



Data Transformation
• A process of changing the original data to a new format.
• Summated scores
• Measurement scales that consist of both negatively and positively worded statements (reverse code the questions).
• Birth date needs to be transformed into the real age of the respondents.

Summated and Average Summated Score using SPSS
1. Select Transform, then Compute Variable.
2. Type the variable name in the box labelled Target Variable.
3. Click on the parentheses to place them in the Numeric Expression box.
4. Highlight the first variable of interest and click on the arrow box.
5. Click on the “+” sign.
6. Repeat (4) and (5) until all variables of interest are placed in the Numeric Expression box.
7. Click OK to get the summated score (skip this step if interested in the average summated score).
8. Click on the cursor to place it after the parentheses and then click the slash “/” sign.
9. Click on the required number (the total number of variables in the Numeric Expression box).
10. Click OK to get the average summated score.
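The same transformation can be sketched outside SPSS. The example assumes a five-point scale and one negatively worded item; the item names (`q1` to `q4`) and helper functions are hypothetical.

```python
SCALE_MIN = 1  # assumed lowest scale point
SCALE_MAX = 5  # assumed highest scale point


def reverse_code(score, low=SCALE_MIN, high=SCALE_MAX):
    """Reverse a negatively worded item: on a 1-5 scale, 1 becomes 5 and 2 becomes 4."""
    return high + low - score


def summated_score(item_scores):
    """Sum of all item scores (the expression built in steps 3 to 7)."""
    return sum(item_scores)


def average_summated_score(item_scores):
    """Summated score divided by the number of items (steps 8 to 10)."""
    return sum(item_scores) / len(item_scores)


# Hypothetical respondent; q3 is the negatively worded item.
respondent = {"q1": 4, "q2": 5, "q3": 2, "q4": 3}
negatively_worded = {"q3"}

scores = [
    reverse_code(value) if item in negatively_worded else value
    for item, value in respondent.items()
]
total = summated_score(scores)            # 4 + 5 + 4 + 3 = 16
average = average_summated_score(scores)  # 16 / 4 = 4.0
```

Reverse coding is done before summing so that a high score means the same thing on every item.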
Descriptive & Inferential Statistics

Descriptive Statistics (describing data)
• Organize
• Summarize
• Simplify
• Presentation of data
Using data gathered on a group to describe or reach conclusions about that same group only.

Inferential Statistics (making predictions)
• Generalize from samples to population
• Hypothesis testing
• Relationships among variables
Using sample data to reach conclusions about the population from which the sample was taken.
SPSS: Data View and Variable View windows
• Data View: this window shows the actual data values and the names of the variables.
• Variable View: contains information about the data set that is stored with the database.
Data Analysis – Uma Sekaran



Data Analysis
In data analysis we have three objectives:
1. getting a feel for the data,
2. testing the goodness of data, and
3. testing the hypotheses developed for the
research.
We can classify analysis into three types:
1. Univariate analysis, involving a single variable at one time
2. Bivariate analysis, involving two variables at a time
3. Multivariate analysis, involving three or more variables
simultaneously



1. Feel of the data
We can acquire a feel for the data by checking the central tendency and the dispersion (frequency distributions, mean, SD, range).
Examination of the measures of central tendency, and of how clustered or dispersed the variables are, gives a good idea of how well the questions were framed for tapping the concept.

The following statistics give a feel for the data:
• Frequency distributions for categorical variables
• The mean, standard deviation, range, and variance on the other dependent and independent variables

The mean, the range, the standard deviation, and the variance in the data will give the researcher a good idea of how the respondents have reacted to the items in the questionnaire and how good the items and measures are. Source: Sekaran (2003)
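As a rough illustration of these statistics, the sketch below computes a frequency distribution for a categorical variable and the mean, standard deviation, variance and range for a numeric item. The variable names and values are made up for the example.

```python
from collections import Counter
from statistics import mean, stdev, variance

# Hypothetical data for five respondents.
gender = ["F", "M", "F", "F", "M"]  # categorical variable
satisfaction = [4, 5, 3, 4, 2]      # interval-scaled questionnaire item

# Frequency distribution (counts and percentages) for the categorical variable.
frequencies = Counter(gender)
percentages = {value: 100 * count / len(gender) for value, count in frequencies.items()}

# Central tendency and dispersion for the numeric item
# (sample SD and variance, i.e. dividing by n - 1).
summary = {
    "mean": mean(satisfaction),
    "sd": stdev(satisfaction),
    "variance": variance(satisfaction),
    "range": max(satisfaction) - min(satisfaction),
}

print(dict(frequencies))
print(percentages)
print(summary)
```

A very small range or standard deviation on an item can signal that respondents all answered the same way, which is one of the things "getting a feel for the data" is meant to reveal.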
2. Testing Goodness of data
Validity & Reliability

Reliability is the consistency of your measurement instrument.

Validity asks if an instrument measures what it is supposed to measure.


Testing Goodness of data
Validity & Reliability
Reliability is the consistency of your measurement instrument:
“…the degree to which a test or measure produces the same scores when applied in the same circumstances…” (Nelson 1997)

Validity asks if an instrument measures what it is supposed to measure:
“Degree to which a test or instrument measures what it purports to measure” (Thomas & Nelson 1996)

Testing Goodness of data - Reliability
• Reliability
“…the degree to which a test or measure produces the same scores when applied in the same circumstances…” (Nelson 1997)

Cronbach’s alpha is a reliability coefficient that indicates how well the items in a set are positively correlated to one another.
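The slides do not give the formula, but the standard definition is alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The sketch below applies it to made-up data; the `cronbach_alpha` helper and the item scores are assumptions for illustration only.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    Each column holds the scores of the same respondents, in the same order.
    """
    k = len(items)
    sum_item_variances = sum(variance(column) for column in items)
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical data: 3 items answered by 5 respondents, perfectly consistent
# with one another, so alpha comes out at (approximately) 1.0.
items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Real questionnaire data will give a lower value; the closer alpha is to 1, the more strongly the items move together.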



Goodness of data - Validity
“Degree to which a test or instrument measures what it purports to measure” (Thomas & Nelson 1996)

“The soundness or appropriateness of a test or instrument in measuring what it is designed to measure” (Vincent 1999)


VALIDITY IN QUANTITATIVE
RESEARCH
• Concurrent
• Construct
• Content
• Criterion-related
• Convergent & discriminant
• Face
• Predictive
• External

Normality
• Many statistical methods require that the numeric variables we are working with have an approximately normal distribution.
• For example, t-tests, F-tests, and regression analyses all require in some sense that the numeric variables are approximately normally distributed.
[Figure: Standardized normal distribution with empirical rule percentages.]
In a symmetrical, unimodal distribution, the median, mode and mean all fall at the same point.
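The empirical rule percentages shown on the normal curve can be reproduced with Python's standard library. This is a small check rather than part of the original slides; the dictionary name `within` is arbitrary.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1

# Share of observations expected within 1, 2 and 3 standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} SD: {share:.2%}")
```

The three shares are roughly 68%, 95% and 99.7%, the usual empirical rule figures.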



3. Hypothesis Testing

What is a Hypothesis?
A hypothesis is a statement or an assumption about relationships between variables.

Once the data are ready for analysis (i.e., out-of-range/missing responses, etc., are cleaned up, and the goodness of the measures is established), the researcher is ready to test the hypotheses already developed for the study.


Statistics: What’s What?

• Descriptive objectives/questions: descriptive statistics
• Comparative research objectives/hypotheses: inferential statistics


Descriptive Statistics: 3 Types
1. Organize data: frequency distributions (number of items that fall in a particular category)
2. Graphical representations: graphs and tables
3. Summary statistics: describe data in just one number


Descriptive Statistics
Describe what's going on in our data.
• Number
• Frequency count
• Percentage
• Measures of central tendency
  – Mean: arithmetic average
  – Median: the middle value in the data (50% of values below, 50% above)
  – Mode: the most frequently occurring observation
• Variability
• Variance and standard deviation
• Graphs
• Normal curve

Basic Statistics
Measures of Central Tendency
• Measures of central tendency are measures of the location of the middle or the centre of a distribution.
• Mean
  – The arithmetic mean (AM), commonly known as the mean or average, is the most often used measure of central tendency. The mean is computed by adding all the data values together and dividing by n, where n represents the total number of data values. Symbolically,

    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i


Median
Example
Ordered array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
• There are 17 terms in the ordered array.
• Position of median = (n+1)/2 = (17+1)/2 = 9
• The median is the 9th term, 15.
• The median of a set of scores represents the middle value (50th percentile) when the scores are arranged as an array in ascending or descending order.
• To calculate the median, first rank the scores in ascending or descending order and follow these two guidelines:
  – For an odd number of scores, the median is the middle score.
  – For an even number of scores, the median is the mean (arithmetic average) of the two middle scores.
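The two guidelines translate directly into a short function. This is a minimal sketch using the ordered array from the example; the function name `median_of` is arbitrary.

```python
def median_of(values):
    """Median following the two guidelines: middle score, or mean of the two middle scores."""
    ordered = sorted(values)  # rank the scores in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the middle score, at position (n + 1) / 2.
        return ordered[mid]
    # Even number of scores: arithmetic average of the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
print(median_of(scores))        # 15 (the 9th of 17 terms)
print(median_of([1, 2, 3, 4]))  # 2.5
```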
Mode
• The mode represents the most frequently occurring score. When two scores occur with the same greatest frequency, each one is a mode and the data set is considered bimodal. When more than two scores occur with the greatest frequency, the data set is said to be multimodal.

Example data:
35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
• The mode is 44.
• There are more 44s than any other value.
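The example can be checked with `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value tied for the highest frequency and so also covers the bimodal and multimodal cases.

```python
from collections import Counter
from statistics import multimode

# The 24 scores from the example table, read row by row.
data = [
    35, 41, 44, 45,
    37, 41, 44, 46,
    37, 43, 44, 46,
    39, 43, 44, 46,
    40, 43, 44, 46,
    40, 43, 45, 48,
]

modes = multimode(data)
counts = Counter(data)
print(modes)       # [44]
print(counts[44])  # 5 occurrences, more than any other value

# A bimodal data set returns both modes.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```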
Measures of Dispersion
• Measures of dispersion or variability give us information about the spread of the scores in our distribution.

Standard Deviation
• The standard deviation (SD) plays a dominating role in the study of variation in the data and is the most widely used measure of dispersion.


Variance

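• Variance is another absolute measure of dispersion. It is defined as the average of the squared differences between each of the observations in a set of data and the mean. The variance is denoted by \sigma^2 and is calculated as:

    \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2

  where \bar{x} is the mean and n is the number of observations.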
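A short numerical illustration, using made-up data: `pvariance` and `pstdev` divide by n, matching the "average of the squared differences" definition, while `variance` and `stdev` divide by n − 1, the sample version that statistical packages such as SPSS report by default.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical scores, mean = 5

population_variance = pvariance(data)  # sum of squared deviations (32) / n (8) = 4.0
population_sd = pstdev(data)           # square root of 4.0 = 2.0

sample_variance = variance(data)       # 32 / (n - 1) = 32 / 7
sample_sd = stdev(data)

print(population_variance, population_sd)
print(round(sample_variance, 3), round(sample_sd, 3))
```

The standard deviation is simply the square root of the variance, which puts the measure back into the original units of the data.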



Univariate data analysis: descriptive statistics
Variable     N   Minimum  Maximum  Mean    Standard Deviation
Difficulty   30  1.75     5.88     3.9208  1.21834
Trust        30  1.00     6.00     4.3778  1.21527
Convenience  30  3.67     7.00     5.0111  0.93663
Security     30  1.60     7.00     4.7733  1.21000
Influence    30  1.00     6.33     3.7667  1.35937


Measures of Shape
• One is typically interested to know how well the distribution can be approximated by the normal distribution.

Skewness
• When the distribution of items in a series is perfectly symmetrical and bell-shaped, the curve is technically described as a normal curve (Figure 11.1) and the related distribution as a normal distribution.


Normal Curve Shape

• For symmetric distributions, the mean is approximately equal to the median.
• The tails of the distribution are the parts to the left and to the right, away from the mean.

Positively skewed distribution OR Right skewed

Positively skewed distribution: If the frequency curve has a longer tail to the right of the distribution, it is known as a positively skewed distribution. In this case, mean > median > mode.

For a right-skewed distribution, the mean is typically greater than the median. Also notice that the tail of the distribution on the right-hand (positive) side is longer than on the left-hand side.

Negatively skewed distribution OR Left Skewed
Negatively skewed distribution: If the frequency curve has a longer tail to the left of the distribution, it is known as a negatively skewed distribution. In this case, mean < median < mode.
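The direction of skew can be checked numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second), which is one common definition and not necessarily the one a given statistics package reports; the data sets are invented.

```python
from statistics import mean, median


def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5, using population moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]    # one large value stretches the right tail
left_skewed = [-x for x in right_skewed]    # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, round(skewness(data), 3), "mean:", mean(data), "median:", median(data))
```

In the right-skewed set the mean (3.5) is pulled above the median (3); in the mirrored set the mean falls below the median, matching the orderings given above.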



Kurtosis

• A curve having a relatively higher peak than the normal curve is known as leptokurtic. On the other hand, if the curve is more flat-topped than the normal curve, it is called platykurtic. A normal curve itself is called mesokurtic, which is neither too peaked nor too flat-topped.
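One way to put numbers on these labels is excess kurtosis, the fourth standardized moment minus 3, which is about 0 for a normal curve. The sketch below is illustrative only: the tolerance `tol` used to call a distribution mesokurtic and the two data sets are assumptions.

```python
def excess_kurtosis(values):
    """Fourth central moment divided by the squared second moment, minus 3."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


def classify(values, tol=0.1):
    """Label a distribution by the sign of its excess kurtosis (tol is an arbitrary cut-off)."""
    k = excess_kurtosis(values)
    if k > tol:
        return "leptokurtic"
    if k < -tol:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # evenly spread, flat-topped
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]  # most values at the centre plus two extremes

print(classify(flat))    # platykurtic
print(classify(peaked))  # leptokurtic
```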



General forms of Kurtosis



Graphical Method of Presenting Data



Descriptive Statistics
Types of descriptive statistics:
• Organize Data
– Tables
– Graphs

• Summarize Data
– Central Tendency
– Variation



Descriptive Statistics
Types of descriptive statistics:
• Organize Data
– Tables
• Frequency Distributions
• Relative Frequency Distributions
– Graphs
• Bar Chart
• Histogram
• Stem and Leaf Plot



Graphical Method of Presenting Data—histogram

Histogram: to show the distribution of values
[Figure: histogram annotated "Distribution may be leptokurtic (peaked)" and "Positively skewed?". Source: Saunders et al. (2009)]


Graphical Method of Presenting Data—bar
chart



Graphical Method of Presenting Data—
multiple bar chart



11.9.4 Graphical Method of
Presenting Data—pie chart



Description: Research Response Rate

1. How many questionnaires were sent
2. What the response rate was
3. Whether any follow-up was done
4. How many were finally received
5. After editing, how many good questionnaires were used for analysis
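The figures to report follow from simple arithmetic. The counts below are hypothetical, and "usable rate" is a label chosen here for the share of sent questionnaires that survive editing.

```python
def rate(part, sent):
    """Percentage of the questionnaires sent that fall into `part`."""
    if sent <= 0:
        raise ValueError("number of questionnaires sent must be positive")
    return 100 * part / sent


# Hypothetical counts for one survey.
sent = 200      # questionnaires distributed
received = 150  # returned after follow-up
usable = 138    # kept for analysis after editing

response_rate = rate(received, sent)  # 75.0
usable_rate = rate(usable, sent)      # 69.0
print(f"Response rate: {response_rate:.1f}%, usable: {usable_rate:.1f}%")
```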



Demographics
Present demographics to describe the sample:
1. The age, gender, or other relevant information on the population.
2. Summarize the demographics of the sample, and present them in a table after the narration (as per sample) (Simon, 2006).
3. Otherwise, the table is included as an appendix and referred to in the narrative of chapter four.


To show highest and lowest
values



Example: Frequency
Distribution

Table 2. Frequency distribution bar chart.



Inferential statistics
We use inferential statistics to make inferences from our data to more general conditions, e.g. t-test, Analysis of Variance (ANOVA).

We use descriptive statistics simply to describe what's going on in our data, e.g. mean, median, normality.
Research Methods for Degree Study
Inferential Statistics
• Inferential statistics are used to draw conclusions about a population by examining the sample.
[Figure: several samples drawn from a population]

Types of inferential statistics
• Parametric: t-tests, ANOVA, correlation, multiple regression, ANCOVA
• Non-parametric: Chi-Square

Descriptive Statistics vs. Inferential Statistics

• Descriptive statistics allow us to draw conclusions through the use of graphs.
• Inferential statistics allow us to say whether a difference is significant.
[Figure: chart annotated "This difference is significant"]

Levels of significance and Confidence
• The level of significance is the predetermined level at which a null hypothesis is rejected. The most common level is p < .05.
  – p = probability
  – < = less than (> = more than)
• A significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.

The null hypothesis is rejected if the p-value is less than a predetermined level, α. α is called the significance level, and is the probability of rejecting the null hypothesis given that it is true (a type I error). It is usually set at or below 5%.

P-levels/Significance Levels
Inferential tests use probability to ascertain the likelihood that a pattern of results could have arisen by chance.

If the probability of the results occurring by chance is below a certain level, we assume these results to be significant.

Chance levels, from least to most stringent: P ≤ 0.10, P ≤ 0.05, P ≤ 0.01, P ≤ 0.001

We can also write these as 10%, 5%, 1%, 0.1%.
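The decision rule can be sketched as a comparison between a p-value and the chosen level. The helper names are hypothetical, and the p-value itself would come from a statistical test (for example a t-test in SPSS), which is not computed here.

```python
# Conventional levels from the slide, most stringent first.
LEVELS = [0.001, 0.01, 0.05, 0.10]


def strictest_level(p_value):
    """Return the most stringent conventional level that p_value meets, or None."""
    for level in LEVELS:
        if p_value <= level:
            return level
    return None


def reject_null(p_value, alpha=0.05):
    """Reject the null hypothesis when the p-value is below the significance level alpha."""
    return p_value < alpha


print(strictest_level(0.03))           # 0.05
print(strictest_level(0.0005))         # 0.001
print(strictest_level(0.2))            # None: not significant at any listed level
print(reject_null(0.03))               # True at the 5% level
print(reject_null(0.03, alpha=0.01))   # False at the 1% level
```

The same p-value can therefore be significant at one level and not at another, which is why the level should be fixed before the analysis.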


Tutorial
1. Working in groups, read the attached
sample of research (or any other past
research) and explain the
methodology used.

• Procedures/Data Collection
• Data Analysis



Validity
1. Face Validity
– The extent to which the measurement method appears “on its face” to measure the construct of interest.
– Infers that a test is valid by definition.
e.g. People might have negative reactions to an intelligence test that did not appear to them to be measuring their intelligence.
2. Content Validity
– Infers that the test measures all aspects contributing to the variable of interest.
– The extent to which the measurement method covers the entire range of relevant behaviors, thoughts, and feelings that define the construct being measured.
e.g. A test to assess one’s attitude toward taxes should include items about thoughts, feelings, and behaviors.

Validity
3. Concurrent Validity
– Infers that the test produces similar results to a previously validated test.
– When the criterion is something that is happening or being assessed at the same time as the construct of interest, it is called concurrent validity.
4. Criterion Validity
– The extent to which people’s scores are correlated with other variables or criteria that reflect the same construct.
e.g. An IQ test should correlate positively with school performance.

Validity
5. Predictive Validity
• Infers that the test provides a valid reflection of future performance using a similar test.
e.g. A new measure of self-esteem should correlate positively with an old established measure.

6. Discriminant Validity
• The extent to which people’s scores are not correlated with other variables that reflect distinct constructs.
Example: Imagine that a researcher with a new measure of self-esteem claims that self-esteem is independent of mood; a person with high self-esteem can be in either a good mood or a bad mood (and a person with low self-esteem can too). Then this researcher should be able to show that the self-esteem measure is not correlated (or only weakly correlated) with a valid measure of mood.
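These correlation-based checks can be illustrated with Pearson's r. The scores below are invented so that the new measure tracks the established measure closely and is unrelated to mood; the `pearson_r` helper is written out by hand to keep the sketch self-contained.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five respondents.
new_self_esteem = [10, 12, 14, 16, 18]
old_self_esteem = [20, 25, 27, 33, 35]  # established measure of the same construct
mood = [3, 5, 1, 5, 3]                  # measure of a distinct construct

r_same_construct = pearson_r(new_self_esteem, old_self_esteem)
r_distinct_construct = pearson_r(new_self_esteem, mood)

print(round(r_same_construct, 3))      # close to +1: supports criterion-type validity
print(round(r_distinct_construct, 3))  # about 0: supports discriminant validity
```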