
COMM-316 RESEARCH METHODS

Department of Communication
University of Louisville
Fall Semester 2009, Section 75
Professor: Greg Leichty
Email: greg.leichty@louisville.edu
Office Phone-852-8175

Copyright 2009
Table of Contents

COMM-316 Research Methods, Department of Communication, University of Louisville, Fall Semester 2009, Section 75..........1
Professor: Greg Leichty, Email: greg.leichty@louisville.edu, Office Phone 852-8175..........1
Copyright 2009..........1
Course Introduction..........6
Unit 1: Introduction to Communication Research..........7
Ways of Knowing or Arguing..........8
Types of Research..........10
What is Distinctive about Communication Research?..........11
Evaluating the Quality of Research Resources..........12
Reading Communication Research Reports..........14
Publishing Communication Research..........16
Ethics and Communication Research..........17
Communication Research Paradigms..........18
Exercise: Which Research Paradigm?..........21
Unit 2: The Interpretive Paradigm-Participant Observation..........22
Participant Observation..........24
Recording Field Notes..........26
Unit 3: Interpretive Paradigm-The Depth Interview..........27
Planning for the Depth Interview..........30
Ethical Issues in Depth Interviewing..........32
Focus Group Interviews..........33
Exercise: Self-Analysis-Generating Interview Topics..........35
Problematic Questions and Question Patterns in the Depth Interview..........37
Developing Depth Interview Answers..........39
Exercise: Question Development..........40
Exercise: Evaluating a Depth Interview Protocol..........41
How Many Interviews Are Needed?..........42
Unit 4-Evaluating Communication: Rhetorical Criticism and Critical Research..........42
Evaluating Communication..........44
Rhetorical Criticism..........45
Critical Theory..........47
Unit 5-Interpretive Paradigm: Analyzing the Depth Interview..........48
Analyzing the Depth Interview..........50
Unit 6: Objectivist Paradigm-Research Questions and Hypotheses..........53
Research Questions and Hypotheses..........55
Exercise: Identifying Variables in Research Statements..........59
Types of Variables in Research Design..........60
Exercise: Research Questions and Null Hypotheses..........62
Exercise: Identifying Variable Types..........63
The Spiral of a Research Study..........67
Research across Time: Stages and Methods..........69
Exercise: Testing a Hypothesis..........71
Unit 7-Objectivist Paradigm-Basics of Measurement..........72
Developing Constructs..........73
Exercise: Construct Dimensions..........74
Levels of Measurement..........75
Selecting a Measurement Level for an Operational Definition..........77
Exercise: Levels of Measurement..........78
Exercise: Levels of Measurement #2..........79
Desirable Measurement Attributes..........80
Assessing Measurement Reliability..........81
Exercise: State or Trait?..........82
Exercise: Measurement Reliability..........83
Exercise: Content Analysis..........84
Assessing Measurement Validity..........88
Exercise: Measurement Validity..........90
Relating Measurement Reliability & Measurement Validity..........91
Improving Measurement Reliability and Validity..........92
Unit 8: Descriptive Statistics..........94
Describing a Distribution..........96
Exercise: Constructing a Frequency Distribution..........100
Exercise: Constructing a Frequency Distribution Example 2..........101
Standardized Scores..........102
Comparing Distributions Using Standardized Scores..........103
The Normal Curve..........104
Is Grading on the Curve Really a Good Idea?..........107
Exercise: Determining Proportions on the Normal Curve..........108
Exercise: How Exceptional are Janelle's Performances?..........109
Describing Variable Relationships..........110
Integrating Correlation and Regression..........112
Multivariate Tools for Describing Variable Relationships..........114
Exercise: Choosing Measures of Association..........117
Exercise: Correlation and Regression Problems..........118
Exercise: Additional Correlation and Regression Problems..........120
Perils in Describing Variable Relationships..........121
Exercise: Charlie's Career Decision..........122
Unit 9: Objectivist Paradigm-Inferential Statistics..........122
Statistical Procedures for Testing Hypotheses..........124
Exercise: Hypothesis Testing-Linear Correlation..........127
Reference Sheet: Some Common Difference Statistics..........132
Difference Testing Examples..........134
Multiple Regression: A Case Study..........137
Exercise: Weighing Type I and Type II Error..........142
Stories and Statistics..........143
Statistical Malpractice: A Top 10 List..........145
Unit 10: Objectivist Paradigm-Internal Validity and External Validity..........150
Correlation and Causation..........152
Exercise: Identifying Causal Models..........155
Identifying Artifacts that Compromise Internal Validity..........156
Assessing External Validity..........163
External Validity Case Study..........167
Relating Internal Validity and External Validity..........168
Unit 11: Objectivist Paradigm-Experiments..........169
Research Control in Experimental Design..........170
Exercise: Identifying Design Structure..........175
Estimating Population Parameters: Sources of Error..........178
Exercise: Drawing Confidence Intervals..........180
Issues in Sampling..........181
Question Formats..........184
Survey Design Issues..........187
Examples of Problematic Questions..........189
Exercise: Critiquing a Telephone Survey..........190
Exercise: Dealing With Social Desirability Bias..........192
Survey Example: Parent Communication Survey..........193
Concept Glossary..........199
Subject Index..........217

Course Introduction
Welcome to Research Methods. This text is organized into units. Each unit contains learning objectives,
readings, and class exercises. The first page of each unit lays out the learning objectives for the unit.
Course examinations are based on these objectives. These pages are your study guides for examinations.
A glossary and a concept index appear at the end of the text. You should check the glossary for definitions
of important course concepts. Toward that end, concepts set in bold italics also appear in the glossary. An
excerpt that appears in bold type indicates emphasis or an important principle. Words that appear in
italics indicate secondary concepts: concepts that, while important, do not appear in the glossary.

The text is intended to be a learning tool--please underline and write in the margins. You can't pass it on,
because the content changes somewhat each semester. However, this allows you to take notes in the book and
really use it as a study aid. Best wishes for a positive learning experience this semester.
Unit 1: Introduction to Communication Research
This unit introduces you to the conventions and standards of communication research. It compares
empirical research with other ways of answering questions that we have about the world (i.e., ways of
knowing or arguing). This unit also presents a brief overview of the history of communication research and
describes the basic types of communication research practiced today.

Unit Objectives

Upon completing this unit, you should be prepared:

1-1: To compare and contrast empirical research with other ways of knowing.
1-2: To describe concrete examples of the following types of research: academic research, proprietary
research, basic research and applied research.
1-3: To explain the commonalities between communication research and other scholarly disciplines.
1-4: To explain the differences between communication research and other scholarly disciplines.
1-5: To discuss the ethical implications of communication competence for communication professionals.
1-6: To specify what kind of information can be found in each section of a research report.
1-7: To explain the role of peer review in research publication.
1-8: To apply relevant standards to judge the quality of research sources, including Internet sources.
1-9: To compare and contrast the three communication research traditions in terms of their goals and
methods.
1-10: To explain the different role that theory plays in each research tradition.
1-11: To identify which research tradition a research study represents after reading a research abstract.
1-12: To describe the communication related research databases that are available in the University of
Louisville library.
1-13: To locate specific information from relevant communication research databases, including the Social
Sciences Citation Index.

Ways of Knowing or Arguing
In everyday life, we try to persuade people and change their minds by making arguments. People
sometimes ask, “How do you know that?” The question of how we know what we know is an area of
philosophical study called epistemology. Epistemology is a deep subject, so here I am writing only about
some common methods that we use to "certify" how we know something to be true. These methods are
referred to as ways of knowing. Empirical research is one way of knowing.

Research can be defined as a systematic inquiry as to the facts or truth about a matter. Something is
empirical when it is based on observation. Empirical research then is an inquiry about some aspect of the
natural world by means of our senses. Empirical research is limited to matters that we can observe or
detect. Empirical research does not address questions of meaning, philosophical speculation or religious
faith. Speculation about the "real" nature of things behind or beyond appearances is called metaphysics.

People often appeal to tradition to settle an argument. An appeal to tradition says, “We have always done
it this way and it has worked well.” The saying “If it isn’t broke, don’t fix it,” exemplifies this theme. In
many cases argument by tradition is quite rational. A group's culture represents the hard learned lessons of
trial and error experience. That which worked was retained; that which didn't work was discarded. Most
new ideas fail when they are actually tested. If we turn our back on tradition and always embrace the new
and novel, we will make many costly mistakes. Tradition can serve as a useful brake on unsound
experimentation. However, too much deference to tradition stifles innovation and creativity. If we take
tradition too seriously, cultural stagnation is likely to follow.

Anthropologist Jared Diamond gives a good example of a setting in which an inherent conservatism and
skepticism about new ideas is tied to a harsh and sensitive physical environment.1 The country of Iceland is
sometimes known for its resistance to trying new technologies. Diamond writes that this conservative
outlook is understandable because Iceland's physical environment is very fragile. "Icelanders have become
conditioned by their long history of experience to conclude, that whatever change they tried to make, it was
more likely to make things worse rather than better." (p. 202). When there is a very small margin for error,
relying on conventional methods often makes a lot of sense.

People sometimes appeal to intuition to certify an argument. Argument by intuition refers to an axiom that
the communicator believes her listeners accept as a self-evident truth. When someone says, "Everyone
knows X", the person is appealing to intuition. Mathematicians and philosophers perfected the art of the
syllogism. They take a premise and then use deductive logic to develop the implications of the beginning a
premise or axiom (e.g., geometry class). In the United States Declaration of Independence Thomas
Jefferson appealed to intuition when he wrote, “We hold these truths to be self-evident that all men are
created equal.” If the listener shares this belief with you, you may well persuade her to your point of view.
Appeals to intuition are powerful persuasive tools when the audience members share the communicator’s
intuitions. However, there are relatively few universally shared self-evident beliefs. Perhaps someone has
accused you of not listening when you simply disagreed with the person. The other person believes that if
you had listened, you would surely have noticed the self-evident truth and its “logical implications.” If you
find yourself in this situation, it is best to disengage as soon as possible, because if you don't acknowledge
the "truth", you will be treated as an "irrational" person.

A third way to settle an argument is to appeal to an authority (i.e., a respected person or text). An appeal
to authority asks the listener to believe something because a credible authority recommends it. In some
cases, the cited authority may be a text such as the Talmud, the Bible, or the Koran. Complete religious
systems and theologies are built upon arguments by authority. By appealing to authority, a communicator
relies upon the competence and character of the source. Appeals to authority are very common and
necessary in a complex society. A person can’t know everything, so we learn to trust people and complex
technical systems. I know relatively little about investing my money for retirement, so I rely on mutual
fund managers to manage my investments. I rely on my physician to diagnose my illnesses. In fact, we
can’t sort through many issues without an appeal to authority.

Appeals to authority have several useful features. First, they reduce the amount of complex information
search and processing that one must do. Second, in many areas of life, we lack the knowledge to make
informed judgments without the assistance of authorities (e.g., I rely on my brother who is a computer
engineer when I have questions about computer hardware). Difficulties arise, however, if the authority one
invokes is not a genuine authority (i.e. He doesn't know what he is doing or he has bad intentions).
Appeals to authority are also ineffective when the listener does not hold the authority in high esteem. If
someone does not believe that the Bible is sacred, quoting from the Bible will have little persuasive impact.
In some cases, different authorities contradict each other. When authorities disagree, (e.g. Is global
warming a reality or a fiction?), we face a quandary about how to resolve an issue.

An appeal to personal experience or private knowledge is a very popular form of argument. If a person
has had a unique experience, you have probably heard: “If you knew what I knew, you would _______”.
This is a special case of an appeal to authority: an appeal to personal authority based upon one's
experience(s). Since you have not had the same experience, “you should defer to a person like me who
really knows, because I have been there.” Whether an appeal to personal experience or private
knowledge is convincing depends on how competent and trustworthy the listener judges the communicator
to be. As with appeals to authority, people routinely disagree with each other about the adequacy of private
knowledge. When personal experiences conflict, there are no effective mechanisms to resolve the impasse.

The primary weakness of these ways of knowing is that they have no clear mechanism for testing opposing
claims. In contrast, an appeal to empirical inquiry provides a mechanism for handling disputes about
questions of fact. Research investigates questions where competing ideas exist. Empirical research goes
beyond existing premises to test new ideas and principles. Systematic empirical research is an open-ended
form of inquiry. If two conflicting hypotheses or ideas exist, we can subject them both to empirical tests to see
which framework works best. We observe and test under carefully controlled conditions to falsify our ideas
about reality and thereby create new knowledge.

Empirical research is so much an accepted part of our way of thinking that we are puzzled that people of
previous generations were confused by things we consider to be self-evident truths. Before Galileo, people
thought that heavier objects fell to earth faster than light objects. Oddly, no one had tested this belief
experimentally. Galileo supposedly disproved this with an experiment at the Leaning Tower of Pisa in
1590. He dropped two cannon balls, one large and one small, and measured the time of their descent.
When he found that they reached the ground at the same time, he postulated the Law of Falling Bodies.
This law states that bodies regardless of their mass accelerate at the same rate when other things like air
resistance are held constant. Today we wonder why no one ever actually tested the mistaken hypothesis in
the 2000 years between Aristotle and Galileo. However, experimental comparison is a recent cultural
invention. What is “common sense” today is very different from what was “common sense” only a few
generations ago. Common sense changes and evolves as cultural knowledge changes.
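
As a minimal illustration (not part of the original account), the reasoning behind the law can be written out with the standard constant-acceleration formula; the symbols d (drop height), t (fall time), and g (gravitational acceleration) are introduced here only for this sketch:

d = \frac{1}{2} g t^{2}, \qquad \text{so} \qquad t = \sqrt{\frac{2d}{g}}

Because the fall time t depends only on the height d and the acceleration g, and not on the mass of the falling object, two cannon balls of different weights dropped from the same height should land at essentially the same time once air resistance is held constant.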

Empirical inquiry is a valuable tool for answering questions of fact. Empirical research makes the
process of human discovery conscious and intentional rather than accidental learning by trial and
error. It speeds up the process of inquiry and makes it less hazardous than trial and error learning. The
secret of life is to learn from observation and experimentation. The alternative is to only learn from your
own bruises.

1) Jared Diamond (2005). Collapse: How societies choose to fail or succeed. New York: Viking Press.

Types of Research
Research falls into several different categories. One common distinction is between basic research and
applied research. Basic research builds theory about a set of objects or phenomena. It is usually set up to
discover relationships, test hypotheses, and to apply existing theories to new phenomena. Basic research
extends what we know about empirical phenomena. The researcher may not have a practical end or
application in mind for doing the research. Researchers know practical applications often develop out of basic
research, but these applications are not the reason for doing the research. Curiosity drives the researcher’s
immediate interest rather than anticipated commercial applications.

Applied research usually focuses on a particular question in a local context. The main goal of applied
research is to answer a practical question such as “Which method of instruction works best with this
particular audience?” The primary objective is not to develop theory. If theoretical development occurs, it
is a secondary outcome of the research process. Evaluation research is one important type of applied
research. Evaluation research investigates how well a particular program or intervention meets the goals
that it was designed to meet. For three years I served as a program evaluator for a federally funded
program that was designed to provide job training and counseling for people with disabilities. The goal of
the program was to help trainees obtain full-time jobs with benefits. My job was to collect data and
evaluate how well the program met the goals that were laid out in the original grant proposal.

In practice, the distinction between basic and applied research is fuzzy. People who promote the value of
basic research sometimes say, “There is nothing as practical as a good theory.” People with a more
applied bent sometimes reply: “There is nothing as theoretical as a good application.” Applied
researchers argue that if a theory isn’t useful in application, it isn’t a good theory. Both statements have
merit. Applied research can advance our knowledge of general theories, by raising new research problems
or by testing the generalizability of theory (i.e., how widely a theory applies). In many cases, a study
contributes both to theory development and to practical problem solving.

A second distinction is between academic research and proprietary research. Researchers at
academic institutions conduct research as a part of their official duties. They share research results with
the community of scholars studying a question. If research findings remain private, they do not contribute
to the knowledge of the field. A scholar's contribution to her field can be assessed according to how often
her published works are cited by others. Contributing to one’s field of study is an important part of a
professor’s job, hence the term, publish or perish. Proprietary research on the other hand, is the property of
the person or institution sponsoring the research. The decision to disclose or not disclose the research
findings is the prerogative of the research sponsor. Marketing research is a good example of proprietary
research. No firm wants to share its research findings with its competitors.

In practice, much proprietary research is applied research. However, some applied research is not
proprietary. In most cases, proprietary research is not basic research (See recent Human Genome research
as an exception to this statement). Proprietary research is usually conducted to solve a problem or to make
a profit. Basic research on the other hand, has a longer time span or outlook. Basic research is sometimes
characterized as the search for knowledge for its own sake. It satisfies human curiosity first without
immediate concern for potential applications, especially commercial applications. One never knows ahead
of time what the applications of basic research will be; sometimes there are none. In contrast, proprietary
research usually focuses on moving to application within a short period of time.
What is Distinctive about Communication Research?
Many different academic disciplines study human behavior and human culture. Communication research is
complex because it analyzes behavior at multiple levels of analysis and utilizes multiple research
traditions. Communication research has many similarities to political science research. Social science
disciplines like psychology focus on one level of analysis. Psychology takes a micro-focus by looking at the
individual, whereas sociology primarily takes a macro-focus and examines social institutions. These
disciplines are sometimes called horizontal disciplines. In contrast, disciplines like communication and
political science examine questions that cross-levels of analysis. Communication scholars look at
communication at the individual, dyadic, small group, organizational, institutional, and societal levels.
Communication and political science are sometimes called vertical disciplines.

Compared to other academic disciplines, communication researchers use more research methods.
Psychology uses the experiment as its primary method. A research methods course in psychology usually
focuses on experimental design. A research methods course in sociology usually focuses on how to design
surveys. Communication research uses a wider variety of methods, with no single method gaining
predominance. Communication researchers use both experiments and surveys. Communication researchers
also do content analyses and various types of qualitative research.

Communication research is sometimes classified as a social science, sometimes as one of the humanities, and
sometimes as a profession. The formal study of communication began in ancient Greece with the study of
rhetoric. Rhetoric is a humanities-based discipline that examines public discourse in its historical context.
Rhetoricians do research that is similar to what English professors do when they analyze a text (e.g.,
analyzing Chaucer's Canterbury Tales). Humanities-based disciplines put great emphasis on the individual
actor, history and context of action.

The study of communication evolved toward a social science orientation in the 20th century. The social
science tradition developed in speech communication beginning with World War I. An understanding of
communication as a social science also developed out of the study of mass communication. Before World
War II, scholars from many disciplines studied mass communication phenomena. With the arrival of World
War II and the explosion of mass communications thereafter (e.g., television), academic departments
specifically devoted to the study of mass communication emerged in American universities. A social
science tradition also developed in journalism departments, which had a predominantly craft orientation
(i.e., taught the conventions of professional journalists) prior to World War II.

Today, communication research includes both humanities oriented scholarship and social science oriented
scholarship. This class primarily focuses on the social science orientation to communication scholarship
and the three main research traditions within communication social science research. However, the
differences between the “scientific approach” and the "humanities approach" are less sharp than they once
were. In particular, rhetorical scholarship has contributed to the emergence of interpretive social science
and critical social science scholarship. In this course, we will examine research studies from the following
three social science traditions, 1) objectivist oriented/social science research, 2) interpretive social science
research, and 3) critical studies.

Communication is more eclectic in its theories and methods than many other academic disciplines. On the
negative side, the field sometimes seems to be very chaotic (i.e., very hard to describe). On the positive side,
the communication research community has an openness to new theories and methods. In fact, one frequent
complaint about communication research is that it borrows all of its theories and methods from other fields. On
the other hand, no matter what your philosophical and methodological preferences, you can easily find a
home in communication research. It may not be simple or orderly, but it is very inclusive.

Evaluating the Quality of Research Resources
The Internet is a great place to find information, but for purposes of well grounded research, it supplements
the library; the Internet does not replace the library. We must sometimes dig much deeper than a Google
search. Research libraries are the essential tool for doing systematic research. Much primary research
material is still not accessible on the Internet. If you only use Internet sources, your research will be quick,
easy and second rate. If you are doing a good literature review, there is no substitute for quality time spent
with library resources.

Here are five evaluation criteria that are traditionally used to judge the quality of resources. They can be
used to judge the quality of both traditional library sources and web sources.1

-Authority: the extent to which the material is created by a person or organization that is recognized as
having authoritative knowledge in a subject area.
-Accuracy: the extent to which the information is reliable and free from errors.
-Objectivity: the extent to which the material gives information and evaluations that are not distorted by
personal feelings or other biases.
-Currency: the extent to which the material can be identified as up to date.
-Coverage: the breadth of the topic(s) covered in a source and the depth with which the topic(s) are
covered.
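
Purely as an illustration (not part of the original text), the five criteria can be treated as a simple checklist when screening sources. The short Python sketch below records a 1-to-5 rating for each criterion and flags sources whose average rating falls below a cutoff; the class name, field names, example ratings, and the 3.5 cutoff are hypothetical choices made for this example, not standards given in the text.

# Illustrative checklist for the five source-evaluation criteria above.
# All names, ratings, and the cutoff are hypothetical examples.
from dataclasses import dataclass

@dataclass
class SourceEvaluation:
    title: str
    authority: int    # each criterion rated 1 (weak) to 5 (strong)
    accuracy: int
    objectivity: int
    currency: int
    coverage: int

    def average_score(self) -> float:
        scores = [self.authority, self.accuracy, self.objectivity,
                  self.currency, self.coverage]
        return sum(scores) / len(scores)

    def acceptable(self, cutoff: float = 3.5) -> bool:
        # An arbitrary threshold used for illustration, not a published standard.
        return self.average_score() >= cutoff

blog_post = SourceEvaluation("Anonymous blog post", 2, 2, 2, 5, 2)
journal_article = SourceEvaluation("Peer-reviewed journal article", 5, 5, 4, 3, 4)
print(blog_post.acceptable())        # False (average 2.6)
print(journal_article.acceptable())  # True (average 4.2)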

It is particularly important to scrutinize the authority, accuracy and objectivity of many Internet sources.
The gold standard for empirical research is something called the "peer review process". Research that is
"peer reviewed" is carefully examined by other researchers or experts in the particular field before it is
published. If the research is judged to fall short of the quality standards in a particular area, it will not be
published. Most research in research libraries is published in peer-reviewed journals. You can find peer
reviewed journals in the Ekstrom library in the stacks on the third and fourth floors. There you can find
thousands of volumes of peer reviewed journals, some going back for more than fifty years.

When a research article appears in a peer-reviewed journal, the reader knows that it has been through a
review process. The writer has sent the material to an editor of a research journal and the editor usually gets
three or more experts in the topic area to review the manuscript for accuracy, objectivity, currency and
coverage. In essence, the authority of the source is established by the rigor of its peer review processes. In
many cases, the reviewers will recommend that the study not be published because of methodological or
theoretical problems. In other cases, they may recommend that the writer improve the article and resubmit
it for further consideration (i.e., revise and resubmit). Peer review of research results is an important,
though not infallible check on the quality of research.

One potential advantage of Internet-published research occurs under the standard of currency. Research can
be published on the Internet in a much more timely way than is possible under the rather laborious peer review
process. The peer review process can take up to a year and a half if several revisions of an article are
considered. However, currency comes at a high price if it is gained at the expense of the four other standards
of information quality.

One might ask whether it is really necessary to put research manuscripts through a quality review process. It
would be possible to archive the materials of a discipline on an Internet site and let members of the discipline
read and evaluate the materials for themselves. In an ideal world, where readers had lots of time, this
might work, but it would require each researcher to shuffle through an extraordinary amount of material
to find the pieces of optimal quality. This also overlooks the fact that the peer review process
typically improves the quality of the research that is finally published, often significantly.

In the end, the key question about information quality is not where you find the information, but what
review process the published research information has been through. Increasingly, there are electronic
research journals that put submissions through a blind peer review process. In addition, many high
quality print journals now provide their contents through research databases such as Ebscohost
that are available through research university libraries. At this time, the highest quality communication
research publications tend to be available via online databases. However, one must typically be
associated with an academic or research institution to access materials in these databases.

In 2006, the library is still the heart of the University as it always has been. A research library collects and
catalogs the accumulated knowledge of the disciplines that make up the University. Textbooks and other
secondary materials derive from primary research. If you want to go back to the roots of what is known about
something, you end up at the library, whether it involves shelves of books and periodicals or online
databases. For better or worse, most primary or original research information is still available only in
research libraries or electronic databases. This may not be an eternal truth, but it accurately describes how
the world works in 2006.

1. Janet Tate & Marsha Ann Tate (1999). Web Wisdom: How to evaluate and create quality on the Web. Mahwah, NJ: Lawrence Erlbaum.

Reading Communication Research Reports
When you read a communication research article, you may be a bit overwhelmed. Here are some tips to
make your reading of a communication research article more efficient. Most communication research
articles have six parts: research abstract, literature review, methods, results, discussion, and references. This
structure closely follows the structure that you will find in master's theses and doctoral dissertations. In
some cases, the headings of the sections may differ, but most research reports follow this structure.

We must first acknowledge that not all articles in research journals report the results of original research.
Some articles are literature reviews. They summarize what is known about a particular topic at a given
point in time. Other articles may address philosophical issues in communication research. Some articles
may also critique research that has been done on a research topic. In addition, many communication
journals also include book reviews. However, the majority of articles in communication journals involve
reporting the results of original research. The structure of these articles tends to follow the following
format fairly closely.

The research abstract summarizes the article in 100 to 200 words. The research abstract is entered into
on-line databases such as Ebscohost. It enables you to understand the focus of a study at a glance, so you
can usually determine whether the article is relevant to your topic by reading the abstract alone.

The literature review immediately follows the abstract. In this section, the researcher sets up his or her
research problem. She describes what has been discovered about the topic in previous research. No matter
the discipline, a researcher tries to show her study is on the cutting edge (i.e., it addresses an important
research question or research hypothesis). This means that the study addresses a question or hypothesis that
has not been researched before, or that it replicates and extends previous studies in an important way. The
literature review shows how the researcher's study builds on previous research. There is no reason to
investigate already answered questions. A good research article makes a new contribution to what we
know. The researcher should write the literature review before she conducts the study. The research
questions and hypotheses of the study often appear toward the end of this section.

The methods section describes the procedures and methods that the researcher followed to investigate the
research question. This section describes the sample employed in the study. It should also carefully
describe 1) how the information was collected and how study variables were measured, 2) how
measurement reliability and validity were assessed, and 3) how the data were analyzed. As you read a
methods section, you may be surprised by how precisely some minute details of these procedures are
described. These small details are provided so that the reader can assess the quality of the study. An astute
reader can identify potential problems or limitations of the study by closely examining the methods section.
For instance, if a researcher fails to include information about measurement reliability, it casts doubt on
the measurement of one or more variables. The methods section also provides details that other researchers
will need if they want to replicate the study. Without an exact description of the research sample, methods
and procedures, researchers would have a great deal of difficulty in extending their knowledge of a subject
or in comparing their results.

The fourth section of a research report is the results section. This section reports what the researcher found
concerning the research questions and research hypotheses. This section usually reports the results of the
study in a straightforward way. It frequently includes tables and charts that visually summarize important
study results. The discussion section usually follows the results. The discussion summarizes and interprets
study results. It tells how the results fit with, extend, or change what we knew about a topic. It also
discusses limitations of the study design. Toward the end of this section, the researcher often identifies
new research questions/hypotheses arising from the study. A good research study often raises more
questions than it answers, because new questions and hypotheses emerge whenever we discover
something new.

The final section of the article is usually the bibliography. The bibliography, also sometimes called the
reference list, contains the works the author has cited in his or her research report. The reference list can be
valuable to the reader because it identifies other works that the reader can read to get a full understanding
of the research topic. Journals differ in how the citations are structured. Many journals list references
alphabetically. If footnotes are used, the references are given in the order that they are used in the article.
In some cases, the footnotes appear at the bottom of page, in other cases they appear in a separate section at
the end of the research report.

So how can you skim an article quickly and efficiently? With a little experience, you can get the gist of a
research article in five minutes. You begin by reading the research abstract. In many cases, you can decide
that the study is not directly relevant to your interests. If the abstract interests you in the study, then you
should quickly read the actual research questions or hypotheses investigated in the study. Then you can
usually look at the tables or charts that summarize the study results to get a good idea of the findings of
the study. If you want to judge the quality of the study, you should then check the methods section for
a description of the sample, data collection and data analysis. If it appears that the article deserves further
consideration, then you should read it in detail.

Publishing Communication Research
Where does one find communication research? In the past, the answer to this question was "in peer
reviewed journals in research libraries". A peer reviewed journal is a scholarly publication that is dedicated
to publishing theory and research articles in a given field. A journal has an editor who receives
manuscripts from scholars in the relevant field of study. The editor is responsible for choosing the material
a journal publishes. Since the editor is seldom an expert in all facets of the field, an editorial board usually
assists her. An editorial board is a group of experts in various parts of the field covered by the journal. The
editor might review a submitted article to determine if it is relevant to the focus of the journal. If she
decides it is, she ordinarily sends it to recognized experts in the field to get their recommendations. The
reviewers judge the merit of the submitted work on criteria such as its methodological rigor, its timeliness, and
its overall contribution to the literature.

Peer review means that other scholars and experts in the field carefully review manuscripts. The peer
review process is an important quality control mechanism in the research community. Editors are dedicated
to publishing the best research. The quality of a journal is reflected in the reputation of the editorial board
that reviews manuscripts for the editor. The editorial board is usually printed somewhere in the journal.
High quality journals publish only a small percentage of the submitted manuscripts. In the communication
field, high quality journals ordinarily publish less than a quarter of the submitted manuscripts.

When the editor receives the recommendations of the reviewers, she makes a decision on what to do with
the manuscript. In many cases, the editor will decide to reject the manuscript. In other cases, the editor may
decide that while the study does not have enough merit to be published as it is, it has enough potential to
justify inviting the author to revise and resubmit. The author(s) are advised on changes that they need to make before
the manuscript is resubmitted. This may entail collecting some new data, reanalyzing the existing data in a
new way, and changing the written report. It is up to the author to decide whether she wants to make the
required changes and resubmit the article for another round of reviews by the editorial board. In a few
cases, the editor will accept the manuscript and publish it with only minor modifications. The process of
editorial review and manuscript revision can take some time to complete. It may run up to 2 years between
when a research study is submitted and when it appears in a research journal. The process of peer review
can be tedious, but it maintains and enhances the quality of scholarship in a field of study.

Many scholarly organizations publish research journals. In the field of communication, several scholarly
research organizations publish peer reviewed research journals. The National Communication Association
(NCA) is an organization of scholars in the fields of rhetoric and speech communication. The Association
for Education in Journalism and Mass Communication (AEJMC) has journals that publish in the areas of
journalism and mass communication. The International Communication Association (ICA) is an
international association of both speech and mass communication scholars that publishes several journals.
Many smaller organizations, often focused on a region or a specialized topic area, also publish research journals. You
can find a list of the most important peer reviewed journals in communication in the Communication
Abstracts database that indexes more than 42 journals from many fields in communication. Many of these
journals are now available in an online format as well as in the traditional print format.
Ethics and Communication Research
If you plan to become a professional communicator, you have a special obligation to communicate and
interpret information ethically and intelligently. In a democratic society, the quality of public discourse
depends upon the truthfulness and dependability of public communications. If you lack credibility as a
professional communicator, you have nothing. If you degrade or demean the integrity of societal
communications, you undermine the very basis of your profession. There will always be people who lie,
misinform and mislead. However, after you take this course, it will be harder for you to mislead others
unintentionally. If you decide to misinform or mislead, I prefer that you know what you are doing and
feel the weight on your conscience for doing so.

Many, perhaps most, of the glaring examples of ethical malpractice among communication professionals
(journalists, public relations practitioners, advertisers etc.), arise from professional ignorance, not ethical
malfeasance. If you mislead or misinform members of the public due to carelessness or incompetence, the
effect is the same as if you had done so intentionally. If you communicate incompetently, you help build
cynicism and mistrust among the public toward professional communicators and the institutions that you
represent. These outcomes accumulate independent of your intentions. If I must choose between an
unethical communicator and an incompetent one, I will choose the unethical one. The unethical
communicator knows what he is doing, and may be bothered enough by his conscience to do the right
thing, or to repent. The incompetent communicator is unaware of what he is doing, and is therefore unable
to change or make amends.

If you claim to be a professional communicator, but do not follow the best ethical practices of your
profession, then your practice of communication is unethical. If you practice ignorantly, you are an
unwitting charlatan: you claim to be someone that you are not. If you claim to be a professional, you have
an ethical obligation to practice as knowledgeably and ethically as you can. If you settle for less, you
practice committed ignorance (I know I don’t follow best practices, but I don’t care). However, ignorance
does not release you from your ethical obligations. We all make mistakes, but this does not excuse us from
doing the best that we can with the available knowledge and resources.

An ethical communicator is also a knowledgeable communicator. As professional communicators, you
should be aware of some of the common logical fallacies that plague public argument in our society. You
should be familiar with the mistakes communicators often make when they inform and persuade. You need
to know how communicators sometimes use factually true statements to misinform and mislead. You also
need to know what information you need to provide so that people consuming your messages can make
informed and responsible choices. You also need to understand the limitations of our knowledge (statistics,
projections etc.). Most of all, you need to know how to employ this knowledge as you create and evaluate
messages. You will find it morally difficult to mislead others, and it will be very difficult for others to
mislead you. Several courses in the communication curriculum such as argument in everyday life and
research methods help you develop the critical faculties that are needed for competent and ethical
communication.

Communication Research Paradigms
This course introduces three different approaches to communication research. Each research tradition has
its own goals, its assumptions about the nature of the world (i.e., ontology), knowledge (i.e., epistemology), and
human nature (i.e., philosophical anthropology), and its typical research methods. Each research tradition is
briefly summarized and described below.

The Objectivist Paradigm

The objectivist paradigm, sometimes referred to as the scientific approach to communication, was the
dominant research paradigm in communication research in the 20th century. This paradigm also sometimes
goes by the names of empiricism and/or logical positivism. The objectivist tradition assumes that the goal
of research is to develop theories aimed at prediction and control. Objectivist researchers are interested in
finding definitive answers to questions of cause and effect. Objectivist researchers investigate the
commonalities that large numbers of people or events share. Objectivist research focuses on the general as
opposed to the specific. From this perspective, the study of persuasion should lead to the discovery of the
laws of persuasion: timeless truths about persuasion that apply across cultures and history. The objectivist
paradigm assumes that the world has a relatively simple structure: a few underlying variables account for
the seeming complexity of the world. All other things being equal, the best explanation of an event will be
the simplest or most parsimonious. The objectivist paradigm assumes that fixed internal and external
forces determine human behavior. The human organism responds to internal and external stimuli. Human
beings only think that they have free will.

The objectivist paradigm further assumes that observers can discover the underlying structure of reality by
careful observation and systematic testing. Objectivist researchers give a great deal of attention to devising
measuring technologies that attain objective measurement. Objective measures filter out the inaccuracies
of human subjectivity. Objective measures enable different observers to get comparable results.
Objectivists assume that disagreement between observers is measurement error. Objectivist researchers
believe proper attention to method will lead to the truth about the underlying nature of reality. Experiments,
surveys and content analyses are representative research methods of the objectivist paradigm. These are
often described as quantitative research methods.

The Interpretive Paradigm


The interpretive paradigm usually defines itself in opposition to the objectivist paradigm. Rhetorical
studies and other forms of humanities-based research fall under this general heading. The goal of
interpretive research is to develop the observer's understanding of the intentions and subjective viewpoints
of communicators. In contrast to objectivist researchers, interpretive researchers give high priority to
context. Interpretive researchers tend to focus on the unique features of people and context. The
interpretive researcher looks for contextual truths rather than universal truths. The interpretive researcher
assumes that the universe is relatively complex. Interpretive researchers do not deny the existence of cause
and effect; they simply believe that many cause-effect relations are unique and nonrepeatable. As a result,
interpretive researchers develop explanations that describe the subtleties and dynamics of context. For the
interpretive researcher, the ideal research report is one that is rich in detail and captures the subtle nuances
of the social and historical context.

The interpretive paradigm also contrasts with the objectivist paradigm in how it regards human beings.
Interpretive researchers assume that human behavior springs from creative interpretation of situational
events and stimuli. Interpretive researchers give a great deal of attention to meaning. Human beings do not
merely respond to events, they also use language and other symbolic resources to make sense of and give
meaning to events in their environment. Interpretive researchers believe that humans have free will, at least
within the possibilities that they are able to conceive of. Interpretive researchers analyze symbol systems
that people use to make sense of life. These symbol systems help to constitute the culture of a group of
people. Interpretive researchers study how language, art, rituals, myths and stories serve to create meaning.
In some cases, the internal relations of symbol systems such as language are analyzed.

Interpretive researchers begin with the proposition that it is understandable that two people will interpret
the same situation differently. Where objectivist researchers regard disagreement among observers to be
measurement error, interpretive researchers consider such differences to be worthy of investigation.
Interpretive researchers embrace human subjectivity and attempt to understand it. Depth-interviews,
participant observation and textual analysis are common research methods employed by interpretive
researchers. These research methods are often referred to as qualitative research methods.

The Critical Paradigm


The critical paradigm is the third general communication research paradigm. It is the most recent research
paradigm to develop, and though it has gained considerable strength, it is not as entrenched as the other two
paradigms. Critical researchers overtly disclose their political goals and aspirations. Whereas the other two
paradigms seek to understand, describe or predict events, critical research aims to restructure human
consciousness and human society in the direction of providing human liberation. Critical researchers study
the power relations that exist in a society. They trace the reciprocal relations between cultural systems and
the power relations inherent in social, political and economic institutions.

Critical researchers pay particular attention to the cultural devices that reproduce social institutions and the
power relationships embedded in them. They study the topics of ideology and false consciousness.
Ideology represents the ideas that powerful groups develop about the nature of reality and morality.
Critical theorists believe that ideology legitimates the positions and interests of the power elite in society.
False consciousness is the mirror image of ideology. It represents the distorted pictures that the non-elite
groups in society come to accept about the nature of the world and social reality. The critical theorist analyzes
the prevailing forms of ideology and false consciousness and shows how these false or distorted visions of
reality serve to pacify people. When Karl Marx said that religion was the opiate of the people,
he meant that Christian religion diverted people's attention from the suffering of their daily lives.
According to Marx, this made it less likely that they would act to change their lives in the present.

Critical researchers believe a critique should be useful. It should ultimately promote a better awareness of
the situation. Through critical reflection, people will discard the forms of false consciousness. Critical
reflection will help people sort out those aspects of social reality that are fixed by natural law from those
that are fixed by social ideology. By recognizing the possibilities of human action, human beings can
pursue collective changes that are in the best interest of their social class or group. Moreover, they tend to
believe that the resulting "enlightened consciousness" will lead people to embrace a program of radical
social change that reduces social inequalities and equalizes social power. Given its overt political
orientation, critical theory is more controversial than the other research paradigms.

The critical paradigm blends the concerns of the other two paradigms in unique ways. Critical researchers
share with interpretive researchers the recognition that several defensible interpretations of social reality
can exist. However, critical researchers also believe that some interpretations are much better than others
are. Specifically, the enlightened consciousness promoted by social critique is superior to the distortions
and partial truths of the dominant societal ideologies. In terms of method, critical researchers use both
qualitative and quantitative research methods. Compared to the other paradigms, the critical paradigm gives
a higher priority to historical research.

The Role of Theory in Each Research Paradigm


Each paradigm has a qualitatively different idea about what communication theory should be. In the
objectivist paradigm, theory serves the interests of prediction and control. A theory spells out how
variables in the real world relate to each other. A “good theory” predicts variable relationships better than
any other theory. A predictive theory enables strategic interventions to control the course of events. For
instance, a predictive theory of persuasion should enable message designers to design messages that are
effective in informing or persuading target audiences. We know that messages that discuss the pros and
cons of a proposal are more persuasive for audiences that hold opinions different from the communicator's.
On the other hand, messages that only present the pros of a proposal are more effective when audience
members have not already formed a strong opinion on a proposition. Knowledge about audience attitudes
on a topic enables one to choose the message form that will have the greatest persuasive impact. The objectivist
version of theory is the most common one in communication theory.

Because interpretive researchers believe that context is very important, they believe that predictive theories
are often of limited value. For instance, historians have traditionally held that the proper role of historical
research is to describe and interpret what happened, rather than to predict where history is going next.
Cause and effect operate, but explanations must take into account the processes of human interpretation. More
importantly, many cause-effect sequences are one-time events, because of the complexities of situations
and human interpretation. For interpretive theorists, a “good theory” provides analytical categories that
can be broadly applied to describe and analyze communication situations and interpretations.

Rhetorical theorists use Burke’s dramatistic pentad to describe and analyze persuasive events. The pentad
consists of five components that rhetorical theorists employ to describe and interpret what took place (i.e.,
act, scene, agent, agency, and purpose). Likewise, Penelope Brown and Stephen Levinson, two
communication ethnographers, developed an interpretive theory of politeness. The theory was derived
from an analysis of politeness rituals from three different cultures. Although each culture has different
politeness rituals, Brown and Levinson argued that politeness theory has both descriptive and analytical
power: its concepts can accurately describe any politeness ritual in any society. For an interpretive
researcher, a “good theory” is one that enables the researcher to describe and analyze the particulars of
communication in many different communication situations.

Critical researchers believe that a “good theory” enables activists to analyze social
and economic structures that perpetuate social inequality. A good theory raises the
consciousness of those who are oppressed and enables them to see their situation
“more objectively”. This is done by showing how communication structures and
rituals create interpretive frameworks that keep people from accurately
understanding their situation (i.e., create forms of false consciousness). A critical
theorist often does a dialectical analysis to expose the contradictions within the
existing social-economic and cultural ideologies. Critical researchers fault objectivist
theory for containing many elements of false consciousness. According to this
critique, objectivist theories often fail to differentiate between the way things are,
and the way things could be. In this respect, the critical paradigm shares concerns
and methods with the interpretive paradigm, which also stresses the importance of human
interpretation and choice. The very strong deterministic world view of objectivist
researchers obscures the fact that the structures of society can indeed be changed to
be more just and equitable. Critical researchers often do detailed historical analyses
to show how dominant groups create ideologies to make current social arrangements
appear to be “natural” and “inevitable”.

Recently, some communication researchers have criticized research on male and
female differences in communication. They maintain that communication
researchers have focused too much attention on differences between the sexes and
have given too little attention to commonalities in communication preferences
among men and women1. When examined from another angle, the differences
between men and women in their communication preferences and behaviors are
rather small. Much past research has functioned to create the illusion of very large
discrepancies between the communication styles of men and women. This
reinforces sex-role stereotypes and thereby supports social and economic
inequalities between men and women in our society. Men and women are definitely
not from different planets, popular press books to the contrary.2 Critical theorists do
not deny that some sex differences in communication exist, but they insist that the
differences are much less important than the commonalities.

Critical theorists also find fault with the theoretical stance of interpretive researchers.
Karl Marx complained that historical research was of little use because it was only
backward looking. From the perspective of Karl Marx and other critical theorists, a
theory is “good” to the extent that it “works” or helps change the world in the
direction of greater justice and social equality. In this respect, critical research
shares the forward-looking intent of the objectivist paradigm. Good theory enables
one to design effective interventions that actually change the world. As reiterated
earlier, the distinguishing feature of critical theory is that it is overtly political in its
orientation: it seeks to change the world in a particular direction.

Conclusion
This brief description of research paradigms in communication glosses over the considerable complexity
and diversity of research approaches within each research tradition3. You will see some of this diversity in
the research articles that we read and analyze this semester.

Footnotes
1. See Daniel Canary & Tara Emmers-Sommer (1997). Sex and Gender Differences in Personal
Relationships. New York: Guilford Press.
2. See the following for a popular press best seller that magnifies sex differences in the way described
here. John Gray (1992). Men are from Mars and Women are from Venus. New York: Harper Collins.
3. See the following text to find out how many types of theory exist within each research paradigm.
Steven Littlejohn (2005) Theories of Human Communication (9th Edition). Belmont, CA: Wadsworth.
Exercise: Which Research Paradigm?
Which research tradition do you think each of the following research abstracts represents? Each
abstract was taken from a recently published communication journal. Explain your answer below each
abstract.

1) This ethnographic inquiry reports on an African American Pentecostal Church in the southern U.S. This
site is important to communication scholars because the church is an essential institution in Black
communities and is involved in the communication of important values, behaviors and attitudes. This study
finds that this church plays a vital role in the lives of its members and acts as an important site of
community involvement, education and spiritual worship. Furthermore, the membership culture provides a
powerful corrective to the racist communication of the dominant white society by providing a spiritual
community that stands in opposition to racism.

2) Recent articles on the quality of health information on the Internet reveal 2 critical criteria: completeness
and credibility. This article investigates the effect of Web use motivation on the relationship between
completeness and consumer perceptions of credibility. Based on a 2 x 3 experiment conducted with 246
respondents, the article demonstrates that the extent of completeness of health information on the Internet
impacts consumer assessment of source and website credibility. In contrast to extant research on the
orthogonality of content and source characteristics, this research demonstrates their interaction.

3) This article examines the role of whiteness as a structuring absence to ethnographic audience research.
After ignoring whiteness altogether, media ethnographers have tended to essentialize whiteness within
narratives of structural dominance or individual vulnerability. Using poststructuralist theories of language,
whiteness, and hegemony, the author argues that these narratives for whiteness can be traced to experiences
in the field that are shaped by historical and institutional forces outside the field. Researchers both
perform whiteness in the field, by claiming its privilege and hiding its visibility, and codify whiteness for
others to identify outside the field. To illustrate, the author examines "narcissistic whiteness" and
"defensive whiteness" as two articulations that are visible in her field notes, interpreted through unifying
narratives and rearticulated through an alternative reading of the notes.

Unit 2: The Interpretive Paradigm-Participant Observation

The interpretive paradigm primarily utilizes qualitative research methods that investigate culture and
meaning. Qualitative research methods in communication research share a concern for capturing the
complexities of communication in natural contexts. Some researchers prefer the term “naturalistic
research” to the term “qualitative research.” Where objectivist research seeks the most parsimonious
explanations for empirical phenomena, the naturalistic researcher prefers to investigate the complexity of
naturally occurring events. The naturalistic researcher believes that "objective measurement" can lead to
serious distortions in our representation and understanding of communication. Naturalistic researchers
share the conviction that communication is highly contextualized (i.e., it depends upon the specific
situation). They maintain that something important is lost when researchers try to reduce everything to the
lowest common denominator of basic elements and relationships. Instead of investigating how people
adapt to researcher-constructed tasks, naturalistic researchers observe communication in naturally occurring
situations to identify the communication regularities and cultural understandings that they demonstrate. In
participant observation the observer participates in naturally occurring interaction and records her
observations. An observer becomes a “competent” observer when she can meaningfully participate in
communication in the scene of observation.

Unit Objectives
Upon completing this unit, you should be able:
2-1: To describe the kinds of research questions addressed in interpretive research studies.
2-2: To explain what participant observation is.
2-3: To discuss the advantages and disadvantages of participant observation as a research method.
2-4: To discuss how the researcher’s role affects the research process.
2-5: To discuss how researcher characteristics (e.g., gender and age) affect what a
participant observer sees and hears.
2-6: To explain the relationship between participant observation and ethnography.
2-7: To match specific research problems with the most appropriate participant-observer role.
2-8: To explain the functions of the different types of field-note entries.
2-9: To explain how a participant observer should organize field notes to document the
researcher’s learning process.
2-10: To compare and contrast deductive and inductive methods of analyzing field notes.
2-11: To discuss the practical problems that participant observers typically face.
2-12: To discuss ethical issues that participant observer researchers typically face.

Participant Observation
Participant observation is a primary research tool of naturalistic researchers. Participant observation is a
modest idea. The researcher immerses himself in the event. The researcher participates, observes and
reflects on what he has seen, heard and experienced. Through this process, the researcher becomes an
intimately familiar with all of the subtleties and nuances of the communication situation. Participant
observation is the preferred mode for learning about and describing the cultural practices of a group. The
researcher is usually an outsider to the situation or group that he is investigating. As such, he encounters
much that is strange and experiences considerable uncertainty as he tries to understand what is happening.
As he goes from being a cultural outsider to a cultural insider, the researcher learns about the culture
through an ongoing process of trial and error. The pragmatic test of whether the researcher has been
successful in learning the culture is his ability to communicate appropriately and effectively in that
situation. The participant-observer then documents what he learns as the process of his acculturation
unfolds. In other words, the researcher is the research instrument in participant observation.

Each of you has been through the process of cultural accommodation many times in your life. In fact, each
new class that you take in your university career involves important cultural adaptations on your part. In
each class you learn a new vocabulary and learn how to use it appropriately. The difference between
everyday learning and the learning of a participant observer is that the participant observer observes and
records his findings in a much more systematic way than we typically do in everyday life. In summary, a
participant observer systematically observes and describes what he sees, hears and concludes as he
encounters a culture. Students sometimes confuse participant-observation with ethnography. Ethnography
involves intimate study and description of a culture or setting over a long period (typically a year or more).
Participant-observation is a primary research tool for ethnographers, but they also use other research tools
such as depth interviewing, textual analysis and surveys. Therefore, it is important to differentiate between
participant observation as a research method, and ethnography as a long-term project of cultural
description.

So far, I have implied that participant observation is a standardized research method. In fact, there is an
observer-participant continuum reflecting various options in terms of how one can structure the
researcher’s role. The roles are ordered in terms of how much attention the researcher is able to devote to
observation and description in the research situation as portrayed in the figure below.

Complete Observer --- Observer as Participant --- Participant as Observer --- Complete Participant
Observation dominates <------------------------------------------------> Participation dominates
On the observation end of the continuum is the research role of complete observer. In this role the researcher
observes other people communicating, but he has little or no interaction with people in the scene. His
behavior has little impact upon how the situation progresses. This role allows the researcher to focus all of
his energies on observation. This can be an advantage when one wants to develop a very precise description
of the situation and events. A disadvantage of this approach is that it leaves the researcher looking in from
the outside. He may interpret certain communications incorrectly or arrive at distorted interpretations of
what is happening. This is sometimes described as “experience distant description”. Other than observing
communication in a natural context, this approach is similar to objectivist research. It is not a widely used
observer role in naturalistic studies.

A second observer role is the observer as participant. This role combines both participation and
observation and thus requires that a researcher develop a double consciousness. The researcher has a role
to play in the situation, but he must also keep his research role in mind as the situation unfolds. In this role,
observation is of primary importance, but the researcher must participate in the scene to some extent. The
researcher usually has a peripheral role in the situation. This approach offers the advantages of careful
description and self-learning gained from the researcher’s participation in the scene. In addition, the
researcher’s behavior does not substantially influence what happens in the situation. However, the degree
of participation may not provide the researcher with insight into things that are happening in backstage
areas that he does not have access to.

The third observer role is participant as observer. This role requires a lot of time and concentration so
observation is a subordinate part of the role. The researcher typically plays a central role in the
observational situation. This role opens up many opportunities for discovery and learning. The researcher
will be more likely to gain access to backstage settings and to view a wide variety of communication events
and communication practices. However, this is also likely to prevent the researcher from developing very
careful descriptions of the scene. He has more things to write about, but less opportunity to do so. This
difficulty can be diminished if the researcher carefully records his field notes very soon (i.e., within 24
hours) after the observation. Another concern with this role is that the researcher may significantly affect
the communication that takes place in the setting. This role also raises some touchy ethical issues, as the
researcher decides what to disclose and how to disclose it.

In the last slot on the participant side of the continuum, the researcher plays the role of complete
participant. Here the role is so consuming that the person is only able to reflect upon his experiences and
describe them after the fact (e.g., John Walker: My Life as a Taliban Warrior). This role has the serious
disadvantage that it relies upon the researcher’s memory for a description or account of the situation. As a
result, this kind of role is probably best reserved for describing experiences that are very memorable and
not subject to distortion over time (i.e., epiphanies). If the person’s experiences and memories can be
compared with other people who were present at the event, this role can be productive. The retrospective
role is also useful in situations when the significance of the event is not ascertained until after it takes place.

By now you are probably asking which of these observational roles is best. There is not one correct answer.
Which observational role is best depends on the goals of the researcher and the importance of the events.
The complete observer role is probably best when one wants a detailed description of a particular
communication episode. The two roles on the inside of the continuum are best adapted for studies in which
one wants to understand the intricacies of a culture. Finally, the complete participant role is the only way that
some events can be described and assessed. In the end, doing participant observation is a bit like writing a
novel: the study is only as good as the observation, interpretation and writing skills of the researcher(s) who
did the participant observation. Participant observation is a very “personal” method of research. It requires
that the researcher have a great deal of personal awareness and self-reflection. It is also a research
technique that some people have difficulty mastering. It is more difficult to teach people how to observe
and be good listeners than it is to teach people how to design an experiment. For this reason, some
observers say that participant observation research is as much an art as it is a science.

Recording Field Notes
Just as a journalist takes notes on what she sees or hears so that she can later write up the story, the
participant observer also records notes about what he sees, hears, and infers so that he can write up a
research report later. These observational notes are called “field notes” (i.e., notes collected while
observing “in the field”). Over the research project the researcher builds an archive of these field notes.
These field notes become the “data” that the researcher examines in the data analysis phase of the project.
This reading outlines some essentials for writing accurate and useful field notes.

Field notes are the record of the participant observer's experience in a naturalistic context. In her field
notes, the researcher records what she heard, descriptions of what she has seen or witnessed, and her
interpretations of what she has seen and heard. The recorded interpretations focus on the meanings that she
attributes to the actions at the time. The first truism of fieldwork is that it ordinarily involves immersion in
a culture. Good field notes distinguish the researcher's time in the field from that of any other participant in a
different place or culture. Naturalistic researchers ordinarily take many notes. A year in the field might
yield more than 1,500-2,000 pages of field notes. These notes are systematically analyzed. With such
a volume of data, it is important to collect and organize one’s field notes in an orderly way.

There are several simple but important steps to follow in writing field notes. First, the entries should be
legibly recorded, so they are intelligible to the researcher later on. It is also important to date the notes
because it preserves information about the observer’s learning process. This is important because many
important discoveries that the observer makes early in the process become so obvious and automatic in the
observer's mind that she will have forgotten the importance of those early discoveries. It is also important
for the researcher to describe the scene and to note her role in the scene. Information about the nature of the
setting and one's physical position in the scene provides important information about how the researcher's
observation may have been influenced or constrained by these situational factors. Many fieldworkers draw
pictures or take photographs to help document these factors.

It is important that the observer describe her observational role as it fits on the observation and participation
continuum. There are some trade-offs between participation and observation. When the observer is deeply
involved in the action, she ordinarily has less time and attention to devote to observing and recording her
observations. The demands of participation may overwhelm her cognitive and emotional resources. On the
other hand, deep participation often enables the person to get "backstage" and gain access to ordinarily
inaccessible events. It is also possible for the researcher to vary her levels of participation and observation
at different points during the research process. However, it is very useful to have information on the depth
of the researcher's observation/participation at the time of the observation.

It is also important to differentiate the types of data in one’s field notes. The notes should distinguish
between 1) things the observer directly saw or heard, 2) eyewitness descriptions of events, 3)
secondhand news, and 4) explanations or accounts that people give for why certain things happened. This
information helps the researcher judge the credibility of information. The observer should also record
details about who, what, where, when, and how the event unfolded. The observer should make explicit
notes about research procedures, sources, and inferences that the observer made. Recording small details
documents the learning process and helps the researcher establish the credibility of her research findings.
Research writers often recommend that the participant observer record her own thoughts and feelings in the
field notes. The observer should identify her expectations upon entering the scene as well as her thoughts,
feelings and emotions in the situation. The notes should also identify any surprises that she experienced, as
well as record her interpretation of what she thinks is going on. These recommendations require that entries
make a clear distinction between description, self-observation and interpretation. Following is a
recommended structure for field notes that accomplishes these objectives.

Structuring Field Notes


Good field notes allow you to reconstruct your learning curve. This requires a flexible but systematic
approach. One good system distinguishes between different types of account entries.1 According to this
scheme field notes should differentiate between 1) condensed account entries, 2) expanded account entries,
3) personal journal entries, and 4) provisional interpretation entries. The condensed account entry consists
of the shorthand that the observer makes in the situation or as soon as possible after leaving the situation.
The condensed account records the gist of important details and records what happened. It serves as the
basis for the expanded account entry. The expanded account elaborates on and fills in the gaps of the
condensed account. The expanded account focuses on description. It attempts to make the notes as explicit
as possible. The expanded account should be written within 24 hours of the observed events. The
personal journal entry gives an honest account of the observer's reactions and feelings concerning what
she sees and hears. These entries help document the learning process and help conceptualize the accounts
recorded in the field notes. The provisional interpretation entry gives the observer's interpretation of the
meaning of events at a given point in time and explains the basis of the interpretations. Later the observer
analyzes her field notes by carefully reading and coding them. However, the quality of the subsequent data
analysis and interpretation is no better than the quality of the field notes on which they are based. The
system provides “thick description” and enables the researcher to chart how her thought processes changed
over time.
Footnote:
1. Jerome Kirk & Marc L. Miller (1986). Reliability and Validity in Qualitative Research. Beverly Hills:
Sage Publications

Unit 3: Interpretive Paradigm-The Depth Interview


This unit introduces you to the depth interview, one of the most utilized methods of interpretive research.
The depth interview utilizes self-reports, just as surveys and structured interviews do. All involve having
people answer questions. However, the depth interview serves different purposes than a survey.
Comparing and contrasting the survey with the depth interview will give you a clear understanding of how
the objectivist and interpretive paradigms differ.

Interpretive research focuses on the meaning of facts and experiences. Facts do not interpret themselves;
they acquire a complex web of subjective and intersubjective meanings (i.e., meanings we share with
others). In many cases, we consider facts to have obvious and unavoidable interpretations. However, this
has as much to do with our own interpretive and communication practices as it does with “objective”
reality. Even when people largely agree about the facts, they often have very different interpretations of
what the facts mean. The primary aim of interpretive research is to gain an understanding of the meaning
that things and events have for people. Interpretive research seeks to gain an empathic understanding with
another individual or group of people. Interpretive researchers think that differences in perspectives
between individuals and groups need to be understood. In contrast, objectivist researchers regard
differences between observers or groups as measurement error: something to be eliminated whenever
possible.

Unit Objectives:

Upon completing this unit, you should be able:

3-1: To compare and contrast depth interviewing and focus group interviewing.
3-2: To identify when focus-group interviews should be used instead of depth interviews.
3-3: To identify the advantages and disadvantages of focus group interviewing.
3-4: To explain how depth interviewing differs from survey research.
3-5: To explain when depth interviewing is an appropriate research strategy.
3-6: To explain when depth-interviewing is not an appropriate research strategy.
3-7: To explain the different functions of open-ended and closed-ended questions in the depth interview.
3-8: To explain the different functions of main questions, probe questions and follow-up questions in the
depth interview.
3-9: To identify frequently used types of probes and follow-up questions in depth interviews.
3-10: To devise main questions that elicit answers which have depth, detail, nuance and vividness.
3-11: To explain the ethical issues associated with depth interviewing.
3-12: To describe recommended stages that a depth interview should go through.
3-13: To evaluate the appropriateness of a depth interview protocol.
3-14: To devise an interview protocol that appropriately orders main questions.
3-15: To conduct a depth interview and ask appropriate follow-up questions.
Depth Interviewing
Depth interviewing is designed to find out how others frame their experience. It enables you to gain
insight about the person’s perspective on experiences and events in which you did not participate. To do
this, you need to get the interviewee talking about his or her experiences. In particular, you give the
interviewee latitude to describe her experiences in her own words. Depth interviewing is designed to
develop thick description: description that is rooted in the interviewee’s language. Depth interviewing
involves asking open-ended questions, listening carefully, and adapting to the interviewee. There are
several specialized kinds of depth interviewing used in linguistics, anthropology and clinical psychology.
This segment describes a generic approach to interviewing.

Students sometimes equate the depth interview with other kinds of interviews such as journalistic interviews
or talk-show interviews (i.e., what Oprah does). However, the depth interview is different from these
types of interviews. The journalistic interview is usually oriented toward factual questions such as who,
what, where, and when. In contrast, depth interviews typically address questions such as how and why.
Journalistic interviews tend to use a high proportion of closed-ended questions. Moreover, the journalist
usually determines the direction and pace of the interview. In contrast, the depth interview typically has
more open-ended questions, and the direction and pace of the interview are mutually negotiated between the
interviewer and the interviewee.

The depth interview is also not an entertainment interview. The interviews that Barbara Walters and
Oprah Winfrey do are primarily for the entertainment of the TV viewing audience. The needs of television
broadcasting determine the structure and pace of such interviews. The purpose of such interviews is to
generate entertaining or provocative statements that will sustain the ratings of the show and thereby sell
advertising. In contrast, the depth interview involves a personal face-to-face encounter between the
interviewer and interviewee. A depth interview is for gaining an understanding of the interviewee's
perspective--there is no external audience to be concerned about.

So, what exactly is the depth interview? In reality, it is a guided conversation. It has some of the structure
of an interview, but it also has much of the spontaneity and moment-by-moment creativity of conversation.
The interviewer has a general destination in mind, but she is flexible about the means for reaching that end.
The interviewer has a general purpose and guides the conversation, but the interviewer and interviewee
negotiate topic and turn-taking on a moment-by-moment basis. There is no predefined script for the
interview. In the depth interview the interviewer assumes that the interviewee has a perspective or point of
view that is worth understanding. The interviewee must be given an opportunity to speak in her own
words. The interviewer must give the interviewee latitude to talk freely on the topics at hand. This feature
of mutual control most strongly differentiates the depth interview from journalistic and
entertainment interviews.

This is not to say that depth interviewing is synonymous with conversation. Conversations usually occur
between people who know each other fairly well, whereas many depth interviews are done with strangers
or acquaintances. Depth interviewing is more purposeful than ordinary conversation; the interviewer
attempts to get the interviewee to think and talk in more detail about their experiences than they ordinarily
do in casual conversation. On the listening side, the interviewer listens more carefully than do ordinary
conversationalists. The interviewer listens for nuances and connotative meanings that provide insight into
the interviewee’s personal construction of her environment. Depth interviewing can be characterized as
learning to hear the meaning of what is said. The disclosure in depth interviews is also more one-sided than in
ordinary conversations. In the depth interview, the interviewer asks most of the questions and does not self-
disclose a great deal.

Depth interviewing depends upon developing a personal connection between the interviewer and
interviewee; obtaining the trust of the interviewee is of utmost importance. While the “relationship” is often
confined to the interview situation, it is important to treat the interviewee as a true research partner. The
interviewee should be treated with the respect that is due a co-investigator. Either person can take the lead
in developing a new angle or direction in the interview. In contrast to objectivist research, in depth
interviewing, the interviewer does not try to remain emotionally uninvolved. In fact, the interviewer’s
expressions of understanding and empathy are valuable research tools. As a depth-interviewer, one must be
prepared to listen to and try to understand perspectives that she may not personally agree with. In fact, one
may need to be prepared to empathize with multiple and conflicting perspectives from the same person
during an interview. In the end, the interviewer seeks to get the person to reflectively talk about all of their
thoughts and feelings related to the target experiences. Being an attentive and empathetic listener is an
important part of being a good depth interviewer.

The techniques of depth interviewing are relatively transparent. The interviewer asks an open-ended
question to open up a topic area and get the person to talk. The interviewer uses more specific probes and
follow-up questions to get clarifications and examples, to elaborate the context, to obtain more details, and
to get the interviewee to reflect more deeply than she ordinarily would in a conversation. In
practice, this is hard work. A good depth-interviewer learns when to probe, when to ask follow-up
questions and, perhaps most importantly, when to be quiet and encourage the interviewee to continue
speaking. A great deal of the skill of depth interviewing is adapting to the unexpected. In contrast to
interviewing in structured surveys, depth interviewing requires an individual to develop her own
interviewing style: one in which she can relate to the interviewee in a genuine and sincere way. Compared
to objectivist-survey interviewing, depth interviewing entails personal involvement and greater personal
risk.

Planning for the Depth Interview
Planning for the depth interview is different from the kind of planning done for structured surveys and
interviews. By its nature, depth interviewing planning is ongoing and flexible. The depth interview is an
inductive and opportunistic research methodology. It is oriented toward the discovery of concepts, themes
and other symbolic elements. Because the researcher cannot predict what symbolic elements will be
discovered, he must use follow-up and probe questions when he discovers them. As the depth interview
focuses more on uncovering meanings than collecting facts, planning is provisional. A plan is developed,
but the researcher is willing to depart from the plan if an unanticipated opportunity presents itself.

Building a Conversational Partnership

The quality of the depth interview is no better than the trust and rapport that the interviewer is able to
develop with the interviewee. In the initial stages of the depth interview, the interviewer must often define
her role and that of the interviewee. The interviewer may find that the interviewee casts the interviewer into
a role such as a therapist, historian, social worker or friend. Indeed, the term "interview"
itself sometimes carries unique meanings. For instance, Professor Al Futrell tells how among prisoners an
"interview" is synonymous with an interrogation: something that is done by detectives, probation officers
and wardens. He was only able to break through prisoners' distrust and suspicion in the interview when
they decided that he was actually doing a "survey" not an interview.

One interviewer role that is particularly useful to cultivate is that of "student." The "student" role allows
one to develop a friendly and responsive persona. It is
also a role that most interviewees can easily understand. One tricky issue in negotiating interview roles is
determining how much deference to show to the interviewee. If one does not show enough deference, the
interviewee may be reluctant to open up. However, if one is too deferential one may fail to ask needed
follow-up questions and probes, or the interviewer may fail to keep the interview on target. In addition, the
interviewer should inform the interviewee about the purposes of the research and get permission to record
the interview. It is important to show the interviewee that the interviewer is interested in his story. If this is
done, one finds that most people like to talk about themselves.

Developing Questions for the Depth Interview

Main questions are developed in advance of the interview. Main questions open up topic areas for
discussion. As a group, the main questions should cover the overall topic. They should be open-ended
enough to encourage expression, but narrow enough to maintain the desired focus. Some closed-ended
questions are included among the main questions to inquire about facts that move the interview in one direction
or another (e.g., do you currently have school-aged children?). Main questions should appear in an order consistent
with the suggested stages of interviewing. However, the interview protocol is a provisional plan, not a rigid
script. An interviewer is free to add a follow-up on a question that opens up a topic area and to reorder the
sequence of main questions as the interviewee opens new topic areas on his own. The list of main
questions can be thought of as a check-off list rather than a rigid script.

Main questions get the interview started; however, much of the actual content of the depth interview is
developed by probes and follow-up questions. Probes are verbal and nonverbal cues that the interviewer
uses to indicate his interest and attention, to obtain clarification and to get the person to provide detail or
depth. Typical probes include cues that encourage the interviewee to continue talking such as head
nodding and back channel comments (e.g., yeah, uh huh) and short questions that press for elaboration
(e.g., What happened next?). Other types of probes seek clarification or confirmation or ask for evidence
for a statement (e.g., Who did you hear that from?). Probes help define the interviewing relationship and
help the interviewee understand the kind of answers the interviewer is seeking (e.g., detail, depth, vividness
and nuance). Probes are especially prominent in the initial stages of the interview.
Follow-up questions explore symbolic elements that the interviewer notices, elaborate context, and
develop the implications of something the interviewee has said. An astute interviewer follows up on some
of the following cues: contradictions, hesitations, new words, and statements about how people should
behave or expressions of praise or blame. Follow-up questions may seek elaboration of the attributes that
the person associates with a word or concept, determine how typical an event is, or try to identify whether
different labels have equivalent meanings. Some follow-ups can be anticipated, but most follow-up
questions arise spontaneously and respond to something the interviewee says. Asking good follow-up
questions requires presence of mind and interviewer skill. It is often wise to schedule a follow-up
interview to give the interviewer a chance to ask follow-up questions that he develops after listening to the
tape of the interview.

Each kind of question does important work in the depth interview. To steal a golf metaphor from Rubin and
Rubin (1995), main questions are like long drives in that they open topics of conversation; probes are like
shots from sand traps and other obstacles in that they help maintain the flow of the interview. Follow-up
questions are like putts that develop the desired depth, vividness and nuance in the interviewee's answers.

Suggested Stages for Depth Interview

Following are some suggested stages for the depth interview1. Stage 1 involves introductions and small
talk to get conversation started and to establish needed rapport. Stages 2 and 3 are not separate stages but
are important functions that the interviewer needs to accomplish early in the interview. For instance, the
interviewer liberally uses probes to get the interviewee accustomed to answering the questions posed in the
needed depth and detail. The fourth stage is usually the longest phase of the interview. The interviewer
should avoid touchy subjects or difficult questions that require a lot of thought early in the interview.
Delaying these topics until later in the interview allows the interviewer and interviewee to develop a level
of trust and rapport that puts the interviewee at ease. Once a person trusts you, she is much more likely to
answer touchy and difficult questions. Asking questions that are emotionally or cognitively challenging
should be reserved until the latter third of the interview. Once the interviewee has warmed up on the topic,
he is more willing and more able to address difficult questions. The final two stages listed below amount to
winding the interview down and closing out the interview in a comfortable manner.

Stage 1: Create natural involvement.
Stage 2: Encourage conversational competence.
Stage 3: Show understanding of factual content and emotional undertones.
Stage 4: Get facts and basic descriptions.
Stage 5: Ask difficult questions.
Stage 6: Tone down emotional level.
Stage 7: Close down interview while retaining contact.

Footnote:

1. Herbert Rubin & Irene Rubin (2005). Qualitative Interviewing: The Art of Hearing Data. Thousand Oaks,
CA: Sage.

Ethical Issues in Depth Interviewing
As with any research project, there are ethical responsibilities that a researcher must respect when she does
a depth interview. With a depth interview, this involves collecting and disseminating information in ways
that 1) respect the person’s autonomy, and 2) avoid harming the interviewee. In a depth-interview, the
researcher begins a conversational partnership that depends upon the interviewee’s cooperation. Both
common courtesy and the research ethics enforced by Institutional Review Boards require a researcher to
disclose certain things about the project, and obtain the necessary permissions.

The first priority is for the researcher to obtain informed consent regarding the interviewee's participation in
the project. For informed consent, the researcher must disclose information that might influence the
interviewee's decision on whether or not to participate in the study. At a minimum, informed consent
involves having accurate information about what one is consenting to. Researchers should disclose
pertinent details about the kind and extent of participation being requested. Interviewees also
need to be assured that their participation is voluntary. Even though a person may consent to the project at
one point in time, she also has the right to terminate the interview at any time for any reason without
incurring adverse consequences. The voluntary nature of the depth interview may be problematic when a
researcher is interviewing someone in an organizational context. Organizations sometimes sponsor research
studies, and interviewees may feel pressured by colleagues and supervisors to participate in the research. It is
the researcher’s responsibility to inform the interviewee that she has complete freedom to participate or not
participate in the study.

If one plans to record the interview, one should get the person’s verbal permission before the interview
begins. Journalists may simply place a tape-recorder out in the open and begin taping. If the person does
not object, this constitutes tacit consent to be interviewed. However, the conventions of research ethics
dictate that the interviewee gives her explicit consent before the recording begins. Getting permission to
tape record a conversation comes packaged with a number of other issues such as 1) what the taped
conversations will be used for, 2) who will have access to the tapes during transcription and data analysis,
3) whether the tapes will be taped over once they have been transcribed, 4) whether the person’s name will
be used in written reports, and 5) whether confidentiality can be guaranteed. The bottom line is that the
researcher must honor any promises that she makes on these issues. If she promises confidentiality, she
should be prepared to transcribe the tapes, store them, and write up research results in a way that fulfills her
promises. Obtaining informed consent does not have to involve high drama. It can be handled in a
professional but low key way.

The criterion of avoiding harm in interview studies most often comes into play when the interview involves
matters related to embarrassing or illegal behavior. If a researcher is doing interviews with drug traffickers, there
is a greater probability that the interviewee could be harmed by disclosing personal information. In
particular, both the researcher and the interviewee could face legal entanglements. The criterion of
avoiding harm also involves handling sensitive or potentially embarrassing subjects appropriately. The
researcher can encourage the interviewee to be honest, but she should drop a line of inquiry if the person is
reluctant to disclose further or if it is apparent that the interviewee is suffering real discomfort. Other
issues associated with causing the interviewee no harm come from how the results of the study are to be
communicated. Even if a person consents to having her name disclosed in conjunction with direct quotes,
the researcher should not use her name if it becomes apparent that the interviewee is likely to suffer legal,
personal or other adverse consequences as a result of the disclosure. The researcher
should avoid harming her subjects even when they have given informed consent.

If you conduct a research study through the usual university procedures, you must submit
your study plan to the University of Louisville Human Subjects Committee. This committee scrutinizes
research projects of University faculty and students that involve using human subjects. It is charged with
ensuring that ethical obligations to study participants are met. If you do a study as a part of your university
work and you plan to present the results in a research meeting or publication, you must have your project
approved by the human subjects committee.
Focus Group Interviews
Group interviews or focus groups have emerged as a popular interpretive research tool. Focus groups are
widely used in marketing research, legal research, political campaign research and academic research. A
focus group is a group discussion on a particular topic guided by a moderator. The moderator poses general
open-ended questions and uses other conversational management techniques to get people talking on the topic
and to keep them on topic as the discussion progresses. Focus groups typically involve from 6 to 10 people.

A focus group provides information about the characteristic patterns of verbal and nonverbal
communication when a topic is discussed in a group setting. Focus groups illuminate the meaning of
events; however, they do not produce meaningful quantitative data. Focus groups can be combined with
quantitative forms of analysis. For instance, a researcher may discover a puzzling correlation between two
variables in a survey. The researcher may then want to gather in-depth data to help her begin to understand how
people think about and understand the phenomena. Mass media researchers sometimes use focus groups to
help them unravel which aspect of a media portrayal was most salient or interesting. For many messages, it
is difficult to isolate which factors may be responsible for a particular effect. In focus groups, people can
give their reasons and rationales for behavior in an open-ended and unrestricted way.

Focus groups can obtain information more quickly than other research tools, especially when one is
conducting exploratory research. Focus groups usually involve a purposive sample: a sample of some
subset of the general population who fit a particular demographic or behavioral category (e.g.,
owns a multimedia computer). General ideas or themes identified in an exploratory focus group interview
can be tested more rigorously in follow-up research.

Focus group research can also stand by itself. Researchers may assemble members of a particular culture or
group in order to sample or simulate the language and communication practices of a group of people. The
focus group dialogue can be analyzed for themes, concepts and icons in the same way that depth interviews
are analyzed. In this situation, focus group interviews give the researcher a rich interactive database for
investigating group and cultural phenomena.

Focus groups sometimes offer an advantage over single interviews when a researcher is trying to get at tacit
information that group members know but don't talk about much. When you get people into a group
discussion, one person's talk will spur recollections and parallel talk in another person. This is sometimes
called the "snowball effect." Once a topic generates momentum, it may be possible to get more stories and
anecdotes in one group interview lasting 90 minutes than in one-on-one interviews on the same subject.
When the focus group involves a naturally occurring group, it can be used to investigate how people jointly
recreate memories of a situation, or jointly create a story. One interesting characteristic of group discussion
is that it can make use of the extensive knowledge and experience of the group as a whole. The pooled
knowledge and resources of a group are usually superior to those of the most experienced and articulate
group member. Hence, if the right conditions are present, the focus group can be an effective and efficient
tool for investigating some questions.

Focus groups also have some notable limitations as a data-collection method. First, a focus group should
not be used when the respondents know each other and the topic is sensitive. On such issues, individual
depth interviews will be more appropriate. People are not likely to talk very openly or candidly about sensitive
subjects in front of people they know. Moreover, if they do talk, serious ethical questions are raised, because the
things that an individual says may have serious repercussions. This violates the do-no-harm ethics criterion. In
such a situation, depth interviews are superior from an ethical standpoint, because the researcher has a greater
opportunity to protect the interviewee's confidentiality.

Focus group discussions can also fail to access the full range of what interviewees know and feel. Some
people are reticent and reluctant to talk; some people talk too much and drown other people out. If a group
discussion predominately reflects the views of only two people in the group, then the focus group is
probably a waste of time. Individual interviews are preferable to a group discussion that is skewed by the
views of only a few members. The skills of the focus group moderator are crucial in this regard. The
moderator can encourage reticent group members to express their opinions. The moderator can also use a
variety of conversational techniques to prevent one or two people from dominating the conversation. A
skillful moderator can also help the group to stay on the discussion topic.

Focus group findings can be misused. The most frequent misuse of focus group research comes when
people treat qualitative results as if they were quantitative results. When focus group results are
communicated in the same quantitative form as survey results (i.e., this many people thought this
and this percentage of people thought that), you are probably dealing with an incompetent or an
unscrupulous researcher. If you want information about population parameters, then a survey of a
representative sample of the relevant population is required. A good focus group discussion can identify
the variety of things that people in a group think about a phenomenon. However, you need a random
sampling procedure and a large sample size before quantitative data have any real meaning. Focus group
interviewing is a very flexible and useful research tool, but it is a lousy substitute for a survey drawn from a
representative sample.
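
The difference between a focus group and a representative survey can be made concrete with the standard margin-of-error approximation for a proportion. The short Python sketch below is my own illustration, not part of the course readings; it uses the usual 95% confidence factor of 1.96 and assumes a simple random sample, which a focus group is not.

import math

def margin_of_error(n, p=0.5):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (8, 50, 400):
    print(f"n = {n}: +/- {margin_of_error(n):.0%}")
# Roughly +/- 35% for n = 8, +/- 14% for n = 50, and +/- 5% for n = 400.

Eight participants leave a margin of error so wide that reporting "five of eight agreed" says almost nothing about the larger population; this is the sense in which focus group counts are not meaningful quantitative data.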
Exercise: Self-Analysis-Generating Interview Topics
Identify several topics on which someone else could profitably interview you (i.e., topics on which you are
knowledgeable, have relevant experience, and about which you can speak articulately). There are more of
these topics than you might think. Once you identify topics for yourself, you will have a better grasp on the
kinds of topics on which you can interview others.

Identifying Topics:
For each question below, identify one topic on which another person could interview you. Write a
description of the topic that is no more than 40 words.

1. Describe a role that you play that has a unique or interesting aspect to it (e.g., oldest sister to a bunch
of younger brothers).

2. Describe a setting or context, with artifacts, rituals and other symbols that may be difficult for an
outsider to understand (e.g., work setting, church setting).

3. Describe a group that you participate in that has its own jargon or lingo (e.g., communication code). How
is this communication code used to include and exclude people?

4. Describe a recent experience in which you had to learn the ropes in a new situation, or make a life
transition. What did you learn about yourself or about life in the process?

Generating Main Questions for a Depth Interview:
Choose one topic identified in the first part of this exercise and write down eight main questions,
preferably open-ended, that another person could use to interview you. Briefly explain why each question
would provide a good entry point to exploring an aspect of the topic.
Problematic Questions and Question Patterns in the Depth
Interview
There are several kinds of problematic questions and question patterns that you should avoid when you
develop main questions for a depth interview.

The Double-Barreled Question


This creature consists of two or three questions run together. It consists of run-on sentences and run-on
logic. The rule to follow in all question asking is: one question at a time. The double-barreled question
confuses respondents and usually leaves at least one of the questions unanswered.

Example: “Did you look forward to your senior year and has it lived up to all that you expected it to
be?”

In the example above, it would be better to break the question down into two open-ended questions. “What
did you expect of your senior year in high school?” and, “How has your senior year compared to your
expectations?” This combination of questions is likely to get more details about the interviewee’s specific
expectations for high school and about how the interviewee’s experiences actually compare to those
expectations.

The Candidate-Answer Question


This question has assumptions built into it that encourage the interviewee to respond to the
question in a particular way. The assumptions could be about the nature of reality, the knowledge of the
interviewee, the experience of the interviewee, or the feelings of the interviewee. The candidate-answer
question ends up putting words in the interviewee's mouth. In its worst form, the candidate-answer
question puts strong pressure on the interviewee to answer the question in a particular way. Let's start with
an absurd example.

Example: “Why do you think the Martians are trying to establish contact with human beings at this
point in time?”

The assumptions in this example are easy to pick out. It assumes that Martians exist, that they have means
of communication, and that they are trying to contact or communicate with human beings. If it were
common knowledge that the Martians were trying to establish contact with humans, then the question
would be fine. It is okay to build assumptions into a question when the questioner and listener share the
background knowledge about the issue that the question addresses.

In many cases, the embedded assumptions may not be so obvious. We may assume that a person's
feelings or experience will be a certain way because we project our imagined reactions to the situation onto
the other person.

Example: “It must be exhilarating to think of all of the choices that lie before you at this time of your
life. Please describe your thoughts and your feelings of excitement about planning for your future.”

This prompt overlooks the possibility that the person may be overwhelmed by the choices she faces or that
her reactions to the experiences in question are very different from the excitement and exhilaration
suggested by the interviewer. The assumptions in this prompt should be avoided. A more neutral and
open-ended way of rephrasing the example above would be to simply request of the interviewee: "Please
share what you are thinking and feeling as you make plans for your future." This instruction does
not program or limit the interviewee to talking only about her excitement and exhilaration.

Perhaps the most problematic kind of candidate answer question, at least for main questions, is the tag
question. The tag question consists of a statement about the world, followed by a question that asks for an
agreeable response from the addressee.

Example: “The senior year is a kind of sad and nostalgic time. Isn’t it?”

As a conversational move, the tag question has some value, especially as a persuasive device. However, as
an interview question, the tag question is likely to elicit polite and agreeable responses that have little
correspondence to what the person actually thinks. It is not surprising that people who ask many tag
questions are often the same people who think that almost everyone agrees with them. They state their
opinions and dare people to disagree with them; most people don't.

Of course, things are more complicated than this. Candidate answer questions are often appropriate as
probe questions or paraphrase questions. You use these kinds of questions when you want to confirm a fact
or an interpretation you have about the person’s feelings or experience.

Example: “It sounds like you are frustrated by the fact that you have many of responsibilities as a
young adult, but that lack many of the privileges that should go along with your responsibility. Does
this accurately summarize your point of view?”

In the probe question, the candidate answer form makes sense. It reflects a hypothesis that the interviewer
has formed about the perspective of the interviewee. It then asks for specific confirmation or rejection of
the interviewer’s conjecture. It is a checkout procedure similar to paraphrasing or “active listening”. For
main questions, however, you are trying to open up areas of conversation and not trying to steer the
interviewee towards particular conclusions.

The Interrogation Sequence


This occurs when the interviewer asks too many closed-ended questions in a row. Main questions
should primarily be of the open-ended variety. Perhaps two-thirds of your prepared main questions for a
depth interview should be open-ended questions. A sequence of closed-ended questions, especially early in
an interview, encourages the interviewee to provide short and unelaborated answers. Recall that stage two
of the depth interview consists of training the interviewee in the kinds of answers that you want: elaborated
and depth-oriented responses. Starting with a sequence of closed-ended questions will undermine your
ultimate goal for the depth interview.

You may be tempted to start with a string of closed-ended questions because you know relatively little
about the interviewee's background and experience. You want some basic facts so that you can determine
what questions you can and should ask. Let's say that you are interviewing a long-term member of the
Louisville fire department about her career. You want to know some of the facts about her career so that
you can develop suitable follow-up questions. You could ask a series of closed-ended questions about her
career, ask her to describe her career in an open-ended fashion, or use some combination of open and
closed-ended questions. Closed-ended main questions do have a role to play in the depth interview, but
their role is a subordinate one. For instance, closed-ended questions can serve as filter questions to help
you determine which open-ended questions are appropriate. You might ask if the interviewee
has any children and, if so, how many. If the interviewee has children, then you can go ahead and ask
questions about how the interviewee's professional life has affected her role as a parent. If not, you proceed
on to the next topic area. To the extent that you use closed-ended questions, it is best to use them in ways
that subordinate them to true open-ended questions. In many cases, it is possible to ask the person to
describe some of the facts that you are after, but to do so in a way that lets the person respond in an open-
ended way (e.g., "Please describe how you became a firefighter."). This approach will usually enable you
to avoid asking a large number of closed-ended questions.
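
If you draft your main questions in a text file, a rough, purely mechanical check of the open-to-closed balance is easy to sketch. The Python fragment below is a hypothetical aid of my own, not a procedure from the reading: it flags questions that begin with yes/no or quantity words as probably closed-ended, so you can see whether open-ended questions dominate. The word list is only a heuristic; the final judgment still requires reading each question.

CLOSED_OPENERS = ("do ", "does ", "did ", "is ", "are ", "was ", "were ",
                  "have ", "has ", "how many", "how much", "how often")

def looks_closed(question):
    """Heuristic only: flag questions that start with yes/no or quantity words."""
    return question.strip().lower().startswith(CLOSED_OPENERS)

# Invented draft schedule used purely for illustration.
draft = [
    "Do you have children attending School X?",
    "Please describe how you became a firefighter.",
    "How did your parents handle discipline at school?",
    "How many years have you worked for the fire department?",
]

closed = [q for q in draft if looks_closed(q)]
print(f"{len(closed)} of {len(draft)} draft questions look closed-ended:")
for q in closed:
    print(" -", q)

On this toy list, two of the four questions are flagged, a signal that the schedule leans more heavily on closed-ended items than the roughly two-thirds open-ended guideline above suggests.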
Developing Depth Interview Answers
Following are several attributes that we seek to develop in the interviewee’s answers.

Depth
You want your interviewee to provide you with a thoughtful answer based upon reflection and considerable
evidence. A depth answer comes from getting the person to reflect on her experience and develop insights.
A depth answer is more reasoned and reflective than something that comes off the top of the interviewee's
head. You will have to lead the interviewee to understand the level of depth you want in an answer. For
example, consider the following two queries: a) "How do you like being a college senior?" and b) "Tell me
how being a college senior differs from being a junior." The second instruction provides a specific
dimension of comparison for the interviewee to reflect on. People are also more willing to talk in depth
if they feel you are familiar with and sympathetic to their world.

Detail
Getting detail involves asking for the grubby particulars that convey the texture of everyday events and
rituals. Getting details means that you obtain clear examples, evidence and good description. You are
asking for detail when you ask for illustrations, examples and step-by-step accounts of what happened or
how something is done. Use follow-up questions when "free information" is disclosed. Free information is
any unexpected information that the other person provides you as the conversation unfolds. If you provide
the right feedback early in the interview, you give the interviewee cues as to what kind of detail you are
looking for. In time, your interviewees will begin providing these details without prodding. You also
should consider asking an interviewee how he accomplishes a particular task for which he is responsible.
If you are interviewing a hobo, you might ask: "How does a hobo find a place to
sleep?" Talking to a CCU nurse in an investigation of pain, you might ask, "How do you recognize when a
patient is in pain, but may be trying to conceal it?"

Vividness
Getting vivid answers involves getting at the range of emotion and feeling that the interviewee experiences in
a particular event or episode. Getting the person to describe her experiences develops vivid answers. Get
her to talk about concrete events rather than simply describing a phenomenon in the abstract. For example,
consider the following two queries: a) "Did you feel excited when you found out you had won first prize in
the contest?" and b) "Please describe all of the things you thought and felt upon hearing that you won
first prize in the contest." The first question can be answered as if it were a closed-ended question. The second
question requires the person to recall and report the complexity of the occasion.

Nuance
You want to develop the nuances of your interviewee's language and point of view. Certainly, you can do
this by the kinds of questions you ask. Nuance, in other words, involves developing precision in your
description; it explores the subtlety of meaning. Developing nuance will test your listening and follow-up
questioning skills. Some people speak in nuance, but others speak in bold distinctions that will require you
to ask for shades of meaning. First, you should listen for signs of ambivalence, hesitation or apparent
contradiction in your interviewee's answers. If you detect hesitation, you can follow up. For example, you
might say, "Your answer suggests that you may have some conflicting feelings on this topic. If so, could
you explain them for me?" In other cases, you can use follow-up questions that help develop the usual
context of the information provided: "Does it always happen like this?" or "Do other people feel the same
way about this as you do?" You can often get more nuance in the interviewee's answers by summarizing
the gist of the previous segment of talk and asking for confirmation.

Footnote:
1. Herbert Rubin & Irene Rubin (2005). Qualitative Interviewing: The Art of Hearing Data. Sage.

Exercise: Question Development
Develop questions for one of the topic areas for your depth interview. Try to develop questions and
strategies that will help develop depth, detail, vividness and nuance. Try to write questions that will get your
interviewees talking on your topic without overwhelming them with technical vocabulary. Approach the topic
indirectly. One vehicle for doing this is the verbal grand tour: questions that ask the person to describe an
ordinary day or week, or how routine events take place. For example, you might ask questions like, a) "Could
you tell me about the important things that happen during the school year?" or b) "Describe for me what you
do in your French class on a typical day." You can also ask what a person did in a concrete situation, or in a
hypothetical situation in the past, such as a) "How did your parents handle the situation if they found out that
you had been disciplined in school?" or b) "How did your parents train you to sit through long church services
as a young child?" Generate six questions. Then your partner will review your questions and make suggestions
to develop better depth, detail, vividness and nuance. You will do the same for your partner.
Exercise: Evaluating a Depth Interview Protocol
Analyze the following list of questions in terms of the organizing principles we have discussed for main
questions in this unit. Assume that the depth interview is part of an investigation of parents' involvement
with their child's school. The researchers want to get parents talking about their child's experience in the
school and the meanings that they attach to it. Critique the following list of questions for whether it
contains the right kinds of questions. Remember that the researchers want to limit the number of factual
and closed-ended questions. They want to avoid candidate-answer questions, and they mainly want open-
ended questions that will get parents talking in depth about their child's experiences with the school.
Rework individual questions that have problems, but also assess the adequacy of the interview schedule as a
whole. What questions should be added? Should the questions be reordered? If so, how should they be
reconfigured? Review the interview schedule below and then write your critique.

Proposed Interview Question list:

1) How many children do you have attending School X?


2) Please give me the name, age and grade-level of each child.
3) What classroom is your child in at school X? What is the name of his/her teacher?
4) Why did you choose to send your child to School X?
5) What subjects is your child studying in his/her classroom right now?
6) How much homework does your child have each evening?
7) How does your child get to school (e.g., by walking, by bus, or by being dropped off)?
8) In your opinion, are the classrooms in School X disorderly?
9) In your opinion, do students dress inappropriately at School X?
10) In your opinion, is the staff at school X incompetent?
11) In your opinion, do you feel unwelcome when you visit the school?
12) In your opinion, is the school principal unhelpful?
13) Do you know what homework your child is supposed to do each evening?
14) Does your child’s school have an after-school sports program?
15) Does your school have a Family Resource Center?
16) What services has your child received from the Family Resource Center?
17) What kind of education do you think your child is receiving at school X?
18) What, if anything, do you like about School X?
19) What, if anything, do you dislike about School X?

How Many Interviews Are Needed?
If you are doing a depth-interview study, you should continue doing interviews until you feel that you have
probed the topic adequately with a sample of people who are familiar with the phenomenon. It is
impossible to know in advance how many interviews this will require. Qualitative research is emergent and
open-ended. How far you go depends upon what new things you discover along the way. Hence planning
for a depth interview project is flexible, continuous and iterative.1 Planning is iterative because it involves
going back and forth between data collection and data analysis as the project proceeds. Planning on a
qualitative project is flexible because existing plans are modified and changed as new research agendas are
uncovered. Research planning is continuous as one builds new insights and discoveries from early stages of
the research process into the structure of the inquiry (e.g., new research questions or research procedures).
In contrast, quantitative research is usually fixed by the plan developed before the beginning of the study.

In a depth-interview project, researchers may begin with a destination in mind, but with an unspecified
itinerary. The researcher begins with questions that she believes will get the interviewee talking in depth
about his or her experiences. After the first interviews, she should carefully review her notes or the
tapes of the interviews so that she can make the necessary revisions in the interview protocol and
interviewing plan. If the initial interviews uncover unexpectedly rich new material, the researcher may want
to add new questions to the interview schedule or add new interviewees based upon these initial
discoveries. The researcher may also drop or revise questions that do not seem to be yielding interesting
material. In addition, the structure of the questions may be altered to better suit the researcher's purposes.
The researcher will continue to make adjustments throughout the research project as she goes back and
forth between provisional data analysis and data collection.

How does the researcher know when she has enough material? The guiding principle is that you should
keep going as long as needed to facilitate discovery. In practice, this means that you should continue the
interviews until the interviewees mostly repeat themes that have been highlighted in earlier interviews. When
the latest batch of interviews doesn't add anything particularly new to your body of data, you have reached a
"saturation point" or redundancy in your interviews. When this occurs, you can begin to wind down the
interviewing phase of your research project. It is possible for a researcher to close off depth interviews too
early if she only interviews people from the same location or with the same general mind-set. Some studies of
culture, for instance, tend to overlook variations in perspective among members of the same general
culture. On an interview project, it is also important to schedule interviews with a subset of people who are
likely to differ in the perspectives that they bring to a particular topic or issue.
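
The saturation rule of thumb can be illustrated with a small bookkeeping sketch. The Python fragment below is my own illustration, with invented theme labels, rather than a procedure from Rubin and Rubin: it simply counts how many genuinely new themes each batch of interviews contributes, so that a run of batches adding nothing new signals a likely saturation point.

def new_themes_per_batch(batches):
    """Return, for each batch of interviews, the number of themes not seen in earlier batches."""
    seen, counts = set(), []
    for batch in batches:
        fresh = set(batch) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts

# Hypothetical theme sets drawn from provisional analysis of three interview batches.
batches = [
    {"role conflict", "shift fatigue", "team loyalty"},
    {"team loyalty", "family strain", "shift fatigue"},
    {"family strain", "team loyalty"},
]

print(new_themes_per_batch(batches))  # [3, 1, 0] -- the last batch adds nothing new

Of course, a count like this is only as good as the provisional analysis behind it; interviewing people from a different location or mind-set may still surface new themes even after the numbers flatten out.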

Footnote

1. Herbert Rubin & Irene Rubin (1995). Qualitative Interviewing: The Art of Hearing Data. Sage Publications.
Unit 4-Evaluating Communication: Rhetorical Criticism
and Critical Research
This unit examines the research practices of rhetorical criticism and critical research. Rhetorical criticism
usually operates within the interpretive paradigm, whereas critical research operates within the
critical paradigm. However, both rhetorical criticism and critical research share the idea of critiquing
communication acts. A critique applies standards to an act of communication and judges the act according
to those standards. Rhetorical criticism departs from critical theory in the standards applied in the analysis
of communication. Rhetorical criticism typically asks whether and how well a communication act (e.g., a
speech or a letter to the editor) achieved its goals with a particular audience. Critical theory, on the other hand,
examines communication practices for the degree to which they reinforce or uphold the status quo.
Objectives
Upon completing this unit, you should be able to:

4-1: Explain how formal criticism differs from expressing an opinion.


4-2: Describe the purposes of rhetorical criticism.
4-3: Differentiate between rhetorical criticism and critical research.
4-4: Identify the standards of successful communication that a rhetorical critic is employing upon reading a
piece of rhetorical criticism.
4-5: Evaluate the quality of arguments in an example of rhetorical criticism.
4-6: Explain the purposes of critical research.
4-7: Identify what standards the critical theorist is employing upon reading a piece of critical research.
4-8: Evaluate the quality of arguments in an application of critical theory.

Evaluating Communication
One of the essential features of life is that we make judgments about things in our environment.
"Evaluation" is perhaps the most fundamental dimension of human meaning.1 This is another way of
saying that we form attitudes about the important things in our environment. Attitudes vary along a
continuum from positive evaluation to neutral to negative evaluation. In general, we tend to engage objects
toward which we have a positive attitude, and avoid things toward which we have a negative attitude. To
use an obvious example, most people listen to music they like and avoid listening to music they dislike.

Having and expressing attitudes is not a distinctive feature of human life. However, we as humans are
unique in the degree to which communication affects our beliefs and attitudes towards things. As social
animals, we communicate intensively with other people about how we evaluate things in our environment.
In particular, we actively compare information about things in our environment in order to determine how
we should evaluate them. One of our primary ways of reducing uncertainty about our social
environment is to compare notes with other people and see how our beliefs and attitudes about things
compare with those of the people around us. For instance, much research into interpersonal attraction finds that
we are particularly attracted to people with whom we share important attitudes. Even finding that someone
shares our negative attitude towards another person is often enough to form an initial bond
(i.e., my enemy's enemy is my friend). Choosing your friends, at least in part, amounts to choosing your
propaganda.

Most publications offer regular commentary or criticism on some set of objects. We read critiques of
movies, food establishments, music, clothes, and sports. In this sense, most of us are often critics, and most of
us frequently seek out the opinions of others. Before I go to see the latest movie release, I may want to
determine if the movie is likely to be worth my time and money. There is no objective standard here,
because what I like other people may dislike. However, if I can identify a critic who has evaluation
criteria similar to my own, I may repeatedly seek out this critic's opinions about things in a particular
domain. We do not consider everyone's opinion to be of equal value: we consider the opinions of some
critics to be more authoritative than the opinions of other critics.

What is it that differentiates formal criticism from the everyday evaluations that most of us make? More than
anything else, formal criticism involves selecting criteria, justifying them, and systematically applying
the criteria to a particular object. In the case of communication criticism, this usually involves critiquing a
communication text. Authoritative critics not only have an attitude on a subject, they can explain the
criteria that they used and how they applied these criteria in a particular case (e.g., explain why they give a
particular movie a positive or negative rating). Notice that the articulation of criteria is very informative
because it enables us to understand how different people come to different evaluations of the same object.
If I find that the criteria that a critic applies to what constitutes a "good movie" are different from the criteria
I would apply, it helps me understand how the critic's attitude and mine are likely to differ. In other
words, formal criticism enables me to understand how the critic arrived at his conclusions. Students often
make the mistake of thinking that merely expressing their attitudes toward something is informative.
Professors often say that understanding how you arrived at your conclusion is more important than the
particular conclusion that you favor. Students often give their conclusions without warranting their
evaluations, whereas professors expect formal criticism (i.e., evaluation criteria identified, applied and
explained).

It should not be surprising that the study of communication also involves a heavy dose of communication
criticism (critiques of particular communication acts or practices). The next reading examines rhetorical
criticism and the final reading examines critical theory.

Footnote:

1. Charles Osgood, George Suci, & Percy Tannenbaum (1957). The measurement of meaning. Urbana:
University of Illinois Press.
Rhetorical Criticism
Aristotle defined rhetoric as "the ability to see in a given case the available means of persuasion." A more
recent writer defined rhetoric as "the process of adjusting ideas to people and people to ideas." These
definitions share the idea that rhetoric functions to achieve instrumental goals or ends. In other words,
rhetoric is meant to have particular impacts, often persuasive impacts, on real people in real situations.
This means that it is possible to evaluate rhetoric for how well it functions or achieves its desired ends
within a designated communication context. This frequently involves some degree of judgment regarding
the propriety of the discourse or how well it is adapted to the setting, audience and occasion of the
discourse.1

Rhetorical criticism is the systematic description and evaluation of persuasive messages. The rhetorical
critic lays out evaluation criteria, describes how the standards apply to the communication being assessed,
and develops an argument that supports the critic's assessment. Aristotle developed a theory of the major
ways that people exercised influence via persuasive messages in Greek culture. In particular, he identified
three avenues toward persuasion: ethos, or the credibility of the source; logos, or how compelling
the evidence and reasoning were for an audience; and pathos, or the degree to which appeals to emotions
moved the audience to agreement and action. Aristotle and his followers also developed a theory of how
message design contributed to the effects of ethos, logos and pathos. They developed several canons of
rhetoric to judge how well a particular message was designed and executed. The canon of discovery
evaluates the degree to which the author identified the best available arguments and evidence in support of
her proposition (especially important for logos). The canon of organization assesses whether the
message was effectively structured to produce the desired result (important for both logos and ethos).
The canon of style evaluates the degree to which the language choices in the message were appropriate and
effective (particularly important for pathos). The canon of delivery appraises the appropriateness and
effectiveness of the nonverbal aspects of delivery (particularly important for ethos). Aristotelian
criticism evaluates the degree to which particular persuasive messages are appropriate and effective
according to these evaluation standards.

Contemporary rhetorical criticism still draws from the classical concepts of rhetoric, but the texts that are
studied and the criteria that are used to evaluate them have expanded. Rhetorical critics focus on
explaining the interaction between the text and its context. They explain how a communication text
responds to, or alters, the understandings of an audience. The rhetorical critic brings certain knowledge and
standards to the text, and these set up expectations regarding it. The rhetorical critic asks
questions such as: a) What expectations are created by the context? b) What does the text present to the
audience? and c) What features of the text are significant? The critic inspects the text from a variety of
angles and evaluates how well it works.1

When considering the context of a text or speech, a critic can focus on the context that was present at the
time the text was first created or enacted, or she can examine the context in which a particular audience
experiences the text. The rhetorical critic often explores the problem that the text addresses, the
communicator's credibility with the particular audience, and the genre that the text seems to fit. Each of
these features sets up expectations as to what the text should address and what form the text should take.
For instance, a speech genre often has a particular expected form, vocabulary, types of argument and story
lines.1 One much-studied genre in the area of public relations is corporate apologia. Apologia
is a discourse of defense in which an organization defends its actions or apologizes for misdeeds. One irony
that this research has discovered is that organizations sometimes respond to a crisis with equivocal speech that
expresses regret for an incident but denies responsibility. In other words, the organization finds itself in
the position of denying that it did anything wrong, while promising never to do it again.2

Rhetorical critics also examine how the communicator constructs his identity or presents himself in a text.
Communicators make choices about how to enhance their rapport and credibility with the audience. Not
surprisingly, a communicator who occupies a socially prestigious position has more options in self-presentation
than does the ordinary person. In a similar fashion, rhetorical critics examine how the audience is constructed in
the text. For instance, in political speeches, the audience of the American people is often portrayed as
perceptive, honest, and brimming with wisdom and common sense. The rhetorical critic may also examine
things that are absent in a text. For instance, in the mass media, people who are ordinary in their physical
appearance are severely underrepresented and "beautiful people" are significantly overrepresented.

Rhetorical critics also examine symbolic forms employed in a text that are relevant to the
judgments that the critic is making. For instance, rhetorical critics often analyze the arguments
that are embedded in a text. Many arguments come in the form of an enthymeme, in which part of the
argument is left unstated and left to the audience to infer. In analyzing an enthymeme, the rhetorical
critic can display the unstated beliefs and values that are required to make sense out of the text. Rhetorical
critics also investigate how messages are framed via the use of names and metaphors. Metaphors present a
particular view of reality by characterizing a new or unfamiliar object or experience in terms of something
else that is familiar. For instance, economic competition between businesses is often described in terms that
are characteristic of war or armed struggle. The rhetorical critic can examine how the metaphors that were
chosen contributed to the overall effect of the text.

It is somewhat ironic that the concept of rhetoric has such a bad name. "Rhetoric" is often equated with
empty bombast, manipulation or deception. In fact, rhetorical criticism often evaluates the ethical
appropriateness of the communication choices that the communicator made in constructing a text. The
rhetorical critic typically evaluates the choices that the communicator made in the context of the choices
that were available to the communicator. In this sense, the rhetorical critic can judge whether the text
contains the most appropriate, most ethical and most effective choices. In reality, different rhetorical critics
can apply different standards to the same text and arrive at rather different conclusions about the
effectiveness or appropriateness of the text. This is not surprising, because overall evaluations should shift
as the criteria for evaluation shift. In the end, there is no one correct way to approach rhetorical criticism.
However, the critic must articulate her standards, justify them and apply them in a clear and consistent way
if she wants the reader to take her descriptions and opinions seriously.

Footnote

1) Ann Gill & Karen Whedbee (1997). Rhetoric. In Teun A. van Dijk (Ed.), Discourse as Structure and
Process (pp. 157-184). London: Sage Publications.

2) Keith Hearit (2005). Apologia theory. In Robert Heath (Ed.), Encyclopedia of Public Relations (pp. 38-
40). Thousand Oaks, CA: Sage.
Critical Theory
Critical theory includes a variety of perspectives that scrutinize how the power structures of society tend to
distort communication. The other side of the coin is that critical theories examine the ways in which
communication is used to disguise the operation of power in social relations. Critical theorists are not
content to describe existing communication practices; they want to change communication practice for the
better. A critical theorist envisions a "better form of communication" and asks, "Why not?"

Critical theory scholars have a definite idea of what they mean by "communication distortion." Critical
theorists share a belief in the value of "rational public discourse." Rational public discourse is dialogue in
which everyone affected by a public decision has an opportunity to participate. The debate involves
unfettered discussion in which participants search for solutions to common problems facing a group or
society. Decisions that are made as a result of rational public discourse are free of the coercion, threats,
bribery, spin and manipulation that are often used to substitute a "counterfeit consensus" for a "real
consensus." According to critical theorists, distorted communication has very serious consequences. It
keeps members of society from seeing the real possibilities that are open to them. In the view of critical
theorists, we end up sacrificing our collective rights and opportunities because powerful elites tend to
monopolize discourse and divert it to serve their own private interests.

Critical theorists understand that rational public discourse is more an ideal than a reality.
However, the ideal speech situation of undistorted argumentation can be used to critique existing forms of
communication. According to critical theory, comparing communication and decision-making in actual
situations against the ideal serves to make people aware of how communication is distorted. Critique and
consciousness-raising can also motivate people to change the system in ways that make it less distorted.
For the critical theorist, "truth" is not an external objective state so much as it is an undistorted
communication process in which an argument succeeds or fails on its own merits.

One particularly influential framework for assessing communication has been the work of Jurgen
Habermas, a German philosopher. Habermas observed that economic, political and social inequalities
are the primary sources of systematically distorted communication. In other words, irrelevant factors
deriving from social and political inequality short-circuit the free discussion and inquiry that are necessary
to identify solutions and evaluate them. Habermas identified a series of criteria that can be used to
identify and evaluate the ways in which communication is distorted.

By analyzing public deliberation, critical theorists seek to make people aware of these distortions of
communication and thereby discredit these practices. For instance, critical studies show how
communication practices in marketing, advertising and public relations are used to spin information in
ways that distort the use of scientific information.1 Critical theorists seek to enlighten people and motivate
them to pursue social change that makes communication more equitable. This should reduce the tendency
for communication to be distorted.

Critical theorists scrutinize what is left unsaid in discourse. Communication can be distorted if the
perspectives and voices of actors are actively excluded from important communication forums. Critical
theorists are aware that many voices are silenced because many individuals and groups have very limited
access to the channels of communication, especially in contemporary societies where much communication
is mediated through the mass media.2 Many critical theorists do research to identify "texts of resistance."
These are voices or texts that were suppressed or excluded from a society's predominant communication
channels. For instance, feminist theory has explored the ways in which the voices of women have been
muted or marginalized in the communication professions.3 Other scholars with a critical bent work to help
activists and marginalized groups frame their messages in ways that are likely to be picked up and
discussed in the mass media.4

Critical theorists also examine how ideology distorts communication. An ideology is a set of ideas about
the world that are promoted to the point that they become taken-for-granted or naturalized. Most analyses
of ideology try to show that a prevailing ideology is an incomplete representation of reality. According to
critical researchers, ideology functions to "naturalize" and legitimize particular social and power
arrangements in society that benefit elite groups. For instance, a recent article in a public relations journal
critiqued the concept of civil society as it relates to communication and development in the third world.5
Civil society is a widely used term that refers to efforts by the U.S. government and nongovernmental
organizations to develop and spread western democratic ideals. The author concluded that the soothing
words of "civil society" actually function to justify and simultaneously hide the ways in which capitalist
societies intervene in third world countries to impose economic and political institutions that favor elites
and marginalize ordinary citizens.

The practice of critical theory arouses criticism and opposition from researchers working in the objectivist
and interpretive research paradigms. Objectivists fault critical researchers for discarding the standard of
objective inquiry that excludes personal subjectivity and bias. Critical theorists retort that the objectivist
paradigm merely hides subjectivity under the mantle of "objectivity," whereas critical theory makes its
point of view transparent and accountable. Interpretive researchers sometimes object that critical
theorists tend to force the perspectives of the people that they study into preformed categories and
concepts related to race, class and gender. Critical theory is also criticized by some because it seems to be
better at deconstructing existing communication practices than it is at providing solutions about how to
better structure communication. For instance, critical theorists tend to be very critical of the actions of
hierarchical authority. However, some people, including your instructor, argue that it is impossible for
society to operate efficiently and fairly without some forms of legitimate authority. For instance, some
people know a great deal more about a topic than others. Our legal system tends to privilege expert
testimony and limit the kinds of testimony that non-experts can provide in court proceedings. This may
seem to be a distortion of communication from an egalitarian point of view, but it makes a great deal of
sense if the opinions of some people are much more informed and relevant than the opinions of others. In
the end, critical theory is important because of the questions that it raises. Its answers, on the other hand,
must be treated as one provisional perspective, rather than a final authoritative verdict regarding
communication.

Footnotes

1) Sheldon Rampton and John Stauber (2001). Trust us, We’re Experts: How Industry Manipulates Science
and Gambles with your Future. New York: Tarcher/Putnam.

2) Gill, A., & Whedbee, K. (1997). Rhetoric. In Teun Van Dijk (Ed.) Discourse structure and process (pp.
157-184). London: Sage.

3) Aldoory, A. & Toth, E. (2002) Gendered discrepancies in a gendered profession: A developing theory
for public relations. Journal of Public Relations Research, 14, 103-126.

4) Ryan, C. Carragee K. & Schwerner C. (1998). Media, movements and the quest for social justice.
Journal of Applied Communication 26, 165-181.

5) Mohan Dutta-Bergman (2005). Civil society and public relations: Not so civil after all. Journal of
Public Relations Research, 17 (3), 267-289.
Unit 5-Interpretive Paradigm: Analyzing the Depth
Interview
This unit introduces you to qualitative data analysis. Depending on your personal preferences, you may
find qualitative analysis to be frustrating, intriguing, overly subjective, or creative. The focus of qualitative
data analysis is to develop an understanding of the meaning of an event or an object for a person or group
of persons. This primarily involves an analysis of the meanings that people attach to things and the
symbolic media they use to communicate those meanings. Symbolic analyses can involve everything from
examining the symbolic construction of food to the analysis of how we use space. In the case of your
interview data, this primarily involves an analysis of the language that a person uses to describe his
experiences. In particular, you will identify and examine the meaning of different symbolic elements
within the "text" of your depth interview. Textual analysis can be done on texts such as field notes, interview
transcripts, diaries, or videotapes. The purpose of the analysis is to explore the structure of meaning
embedded in your text. This introduction is necessarily superficial, but it gives you a feel for qualitative
analysis and for the kind of rigor needed to do a worthwhile analysis.

Unit Objectives:

Upon completing this unit, you should be able:

5-1: To describe the purpose of a symbolic analysis.


5-2: To explain how an analyst identifies the central symbols in a text for analysis.
5-3: To define the symbolic elements of concept, theme, icon, story, myth, account and narrative.
5-4: To identify concepts, themes and other recurring symbolic elements in textual material.
5-5: To examine the structure and meanings of symbolic elements in the textual material.
5-6: To identify and explain counter examples in textual material.
5-7: To write a narrative summarizing the findings of your depth interview consistent with the conventions
of qualitative reporting.

Analyzing the Depth Interview
You have finished your interview. You have written your post-interview notes and documented your
questions. You are ready to analyze your interview. There are many different ways that you can analyze
interview data. This unit introduces you to a generic type of symbolic analysis. As you will recall, a
symbol is something that stands for or represents something else. The individual word is an example of a
symbol, but material artifacts such as a wedding ring or a family coat of arms are also symbols. Symbols
represent or carry meaning for the people who use them. Moreover, most symbols do not exist in isolation.
They usually participate in a larger code system (e.g., language or nonverbal codes). A code consists of a
set of symbols and their interrelationships.

To assert that you are going to do a symbolic analysis doesn't get you very far, because human life is
saturated with symbols. Within communities, most symbolic elements are widely shared and mutually
understood. Ordinarily it makes little sense to focus on these widely shared facets of denotative meaning.
These facets of symbols are "common knowledge" and are already cataloged in dictionaries and
encyclopedias. A symbolic analysis usually focuses on noteworthy meanings or nuances of a symbol for a
person (unique connotative meanings), or for a group or subculture (restricted denotative meanings). A
symbolic analysis also focuses on those symbolic elements that help the researcher gain insight into the
world view of a person or group. A symbolic analysis distills the meanings of these elements. Like all
researchers, the interpretive researcher is selective in what he attends to and analyzes. Before the meanings of
symbolic elements can be developed, one must first identify those symbolic elements.

Locating Symbolic Elements

A symbolic analysis usually begins with the recognition of "difference" in symbolic or cultural elements on
the part of the person doing the analysis. The analysis aims to make the structure and meaning of these
symbolic elements understandable to an intended audience. While a symbolic analysis can focus on many
different symbols or symbol systems, your analysis of your interviews will focus on noteworthy linguistic
symbols embedded in the interviewee's talk in your interview. You will identify and explicate the meaning
of some or all of the following symbolic elements in your interview: concepts, icons, themes, stories,
narratives, myths and accounts.

A concept is the most basic symbolic element that you will analyze. Concepts are underlying ideas that
describe things and events in the interviewee's life experience. Concepts are the underlying ideas of
similarity and difference by which events, people and things are classified. Linguistically, concepts occur as
nouns or noun phrases (nouns and adjectives). The trick in a symbolic analysis is not simply to identify
concepts (talk is saturated with them), but to identify concepts that provide important insight about the
interviewee's perspective or the meaning of an event or experience.

Most concepts are widely shared and so much a part of the shared denotative code that they do not merit
attention. However, there are also many instances where a common concept acquires unique connotative or
specialized denotative meanings for a person or group. Even though the core meaning of the concept is
widely shared, it may also carry a unique meaning for the person or group. For instance, you may know a
person who sizes up people by their intelligence. For this person, a "three digit IQ" describes an individual
who is an interesting conversational partner. In contrast, a "two digit IQ" describes a person with whom
you should only engage in small talk. Together these concepts illuminate how the person sizes up people and
interacts with them. Although the concepts have a denotative element that all of us understand, there are
specialized connotative meanings apparent in how the person uses the concepts in conversation that
go well beyond the shared denotative code.

When looking for a central or meaningful concept, look for a noun or noun phrase that occurs frequently in
the talk. A concept that receives verbal or nonverbal emphasis is also a promising candidate for analysis.
Also, look for concepts mentioned with their opposites (e.g., two-digit vs. three-digit IQ). In some cases,
the opposites will only be implied, but they can be inferred with confidence because of the context of the
description (e.g., short in the NBA is anyone under six feet tall). In some cases, you may hear a core idea
described in several different ways, but not be given a single label by your interviewee. In this case, you
may need to infer the underlying concept and develop a label that seems to summarize the
underlying idea expressed by the interviewee. You should be careful in doing this because you want to
avoid "putting words in the interviewee's mouth." In your report, you should show that your inferred
concept accurately captures or summarizes an aspect of the interviewee's talk. The follow-up interview
affords an opportunity to check whether your inferred concept makes sense to the interviewee.
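
If your interview has been transcribed, a crude frequency count can help you spot candidate concepts to examine more closely. The Python sketch below is my own screening aid, assuming a plain-text transcript and an invented example; it is not a required analysis step, and the careful reading described above still does the real interpretive work.

from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "and", "or", "but", "to", "of", "in", "on",
             "i", "you", "it", "is", "was", "that", "we", "they", "my", "so"}

def candidate_concepts(transcript, top_n=10):
    """Return the most frequent non-stopword tokens as candidate concepts to read in context."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top_n)

# Invented fragment of an interview transcript, for illustration only.
sample = ("Downtime is everything on this unit. When we get downtime we restock, "
          "and when there's no downtime the whole shift falls apart.")

print(candidate_concepts(sample, top_n=5))
# [('downtime', 3), ...] -- "downtime" stands out as a concept to probe further in context

A raw count cannot tell you which words carry special connotative weight, so treat the output only as a list of places to look, not as findings.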

In some of your interviews, you will want to focus on specialized concepts shared by a group of people
such as members of an organization or occupational group. Every cohesive group of people develops its
own version of the common code. A group code makes for easier and more efficient communication
among group members. The group code also helps identify group insiders and outsiders. When you
identify an important concept for a group, you gain additional understanding of the shared group culture.

An icon is a symbolic element closely related to the concept. Icons are people, places, or things that
concretely represent an abstract concept or category. Icons often involve strong expressions of moral
approval or disapproval. When we speak of evil, we refer to icons like Hitler and Osama Bin Laden, who
personify evil. In contrast, Mother Theresa serves as a personification of good. When a person talks about
heroes and villains, he is identifying his icons. We are immersed in icons. In fact, advertising is the
business of icon creation. Marketing and advertising attempt to create products, brand names and corporate
identities that resonate with targeted audiences (e.g., "Like a good neighbor, State Farm is there").

Most anything can serve as an icon. Couples often make icons out of places, events or songs that typify
some important turning point in their relationship. Even time-periods develop into iconic representations.
Depending upon your political point of view, the 1950s represent an idyllic vision of a lost innocence in our
society, or a period of sexual and intellectual repression. Likewise, the 1960s represent freedom and
spontaneity for some, or the decline of American culture for others. As you conduct your analysis, you not
only want to identify the icons that the person uses, but you want to explicate the meaning of the icons and
relate them to other symbolic elements. Icons serve to make a person's concepts tangible and meaningful.
As such, you will want to show how the icon represents or links up to a concept. In your analysis, you
should articulate the moral judgments that the person makes when she uses the icon (i.e., How does using
the icon apportion praise and blame?).

A theme is the third symbolic element that you will want to give attention to in your symbolic analysis.
Indeed, the analysis of themes should be the centerpiece of your symbolic analysis of the interview. A
theme is a person's assertion or comment about some aspect of reality. A theme may explain why
something happened, express an immutable fact of life, assert how people should behave, or state a
principle that the person lives by. A theme directly expresses a part of an interviewee's world view (e.g., It
doesn't pay to fool around with Mother Nature). In some cases, a theme is stated in different ways. The
analyst must propose a statement that captures the thematic threads of the discourse. Of course, you must
exercise caution in articulating a theme that is not explicitly stated in the interviewee's talk. In your
analysis, you would need to show that your proposed theme accurately captures or summarizes important
aspects of the interviewee's talk and view of the world.

You should listen and look for some of the following elements as you try to identify important themes. A
theme often occurs in the opening section of a segment of talk. The person often expresses a theme and
then works out the implications of that theme in the talk that follows. Proverbs, aphorisms and clichés are
common examples of themes. Clichés are used so often, not because they are trite, but because people
believe clichés express important truths. You can often use the interviewee's nonverbal behaviors or tone
of voice to identify a theme that is important. The interviewee will often draw your attention to a major
point by giving it special emphasis. You can often infer a theme from an illustration or a story (e.g., the
stated or implied moral of the story). You can also infer that a theme is an important one if it is mentioned
several times, or it occurs relatively early in the interview or a segment of talk.

You have already probably figured out that concepts are important elements of themes. In the theme "You just can't fight city hall," the concept of city hall stands for a monolithic power structure that will
grind you down if you oppose it. Therefore, your analysis of themes should also include an analysis of
the noteworthy concepts and/or icons embedded within them. Your analysis should flesh out just what
the person seems to have in mind when she invokes the concept of "city hall" in the context of this theme.
City hall may represent power, but what kind of power does it seem to represent for the interviewee? Does
she imply shady deals made in smoke-filled rooms or something else? These are the kinds of details and
implied features that you should flesh out in your analysis.

Stories are a fourth important type of symbolic element. A story contrasts with a narrative. A narrative
directly answers the question of "What happened?" A story, on the other hand, is an account of events that is refined to assert a moral or to make a point. The line between a narrative and a story can be fuzzy, but a story is ordinarily told with more energy and emotion. Stories are often told as adventures and they are told with
relative ease. Sometimes a story is told to communicate a moral or lesson on a difficult question. One point
of your analysis is to figure out the point or "meaning" of the story. What question or issue does the story
address? What moral (i.e., theme) does the story assert or illustrate?

A story is related to the theme in much the same way that an icon is related to a concept. A story can
provide the concrete particulars to represent and vivify an abstract theme. Stories are symbol rich and
provide good material to develop a symbolic analysis. The energy and emotion embedded in stories help
one discern which themes are primary and which ones are secondary. I once worked at a University in
which the system of rules and procedures was often not followed. At the time, I repeatedly told a story
about how a mother intervened on behalf of her 31 year-old son in a dispute with the university's academic
vice-president. The mother requested and received the student’s file and took it across campus to meet with
the communication department chair. This contradicted stated procedures because the file held a lot of
confidential information. I told the story to illustrate the point that “if you yell loud enough there really
aren’t any rules around here." When I told the story, I dramatized my frustration with this system.

Myths and accounts are additional symbolic elements that can be analyzed. A myth is a specialized story
told and retold by a group of people to explain or portray some important fact about the origins of a group.
Several years back, I participated in a study of a local manufacturing company. Workers were fond of
telling a story about how one of the company founders went into an important meeting with bankers on
roller skates. She used roller skates to get around the plant efficiently. Workers thought that it was
important that she didn't change her behavior toward people based on their social status. They asserted that
she related to everyone in the same personal and direct way. For workers this myth symbolized the freedom
and spontaneity that they cherished about their environment.

An account offers a fertile opportunity for analysis. An account is the explanation that a person gives to
explain or justify her actions. In an account, a person gives reasons for her behavior that she thinks will
make her action understandable and acceptable to other people. The account answers the question of "Why
did you do that?" An account explains the reasoning or rationale behind the person's actions in terms of
reasons or values that are socially acceptable or warrantable. The researcher need not take the account at
face value as the real reason that the person did something. Instead, the researcher may analyze the account
for what it discloses about the values that the person professes to hold and thinks that the listener also
shares. The researcher does not have to believe the excuses, justifications, and denials in accounts to make
good use of them in her symbolic analysis. Even lies and rationalizations provide rich compost for
symbolic analysis.

Coding Procedures

In the ideal case, you would transcribe your interview. However, it takes from five to six hours to
transcribe one hour of tape (and one needs a transcription machine on top of that). The next best option is to
listen to your tape carefully at least three times. As you listen to the tape the first time, use the pause button
frequently as you identify important concepts and themes in the interview. You are going to attempt to
distill the important concepts and themes in your interview.
Coding is the process of grouping the interviewee's responses into categories that bring together and/or
demonstrate similar concepts and themes. When you code, assuming you have a transcript, you should
mark the beginning and end of the coding unit and mark it for the coding category into which you are
putting the material. After you have marked all of the material in the interview, then you put all material
with the same codes together. There are computer programs that help with this process. In reality, you can
code your interview data in a variety of ways. For instance, in addition to coding for concepts and themes,
you can also code for whether the person is speaking of first-hand experiences or hearsay.

Once you have identified concepts and themes, identify quotes, examples and stories that exemplify each
theme. If you are working from a transcript, you can code it simply by marking each theme or concept with a different color marker. As you go along, you may subdivide a concept or theme as you
begin to notice differences within the category. Alternatively, you might decide to combine two or more
similar concepts when you compare across categories. In the process of coding, it helps to start by coding
the clearest examples of talk that illustrate a theme. Once you have identified clear prototypical examples,
you can begin to deal with the stretches of talk that are not so clear-cut. As the analysis proceeds, you may
also eliminate themes or examples that seem unique and lack clear connections to other themes. You can also refine your concepts and themes until you have an exhaustive set of categories. This
means that you have a place to put all of the material. You are also satisfied that the material that you put in
each category is homogeneous (i.e., represents the same concept or theme). This process is not a linear one: you move back and forth, analyzing within your categories for similarities and dissimilarities. If you find important differences in the material coded within a category, you may split that category into two or develop a new category from some of its examples. You can also look for similarities across categories and decide to combine categories that you had previously coded separately.
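
If you keep your interview material in electronic form, even a very short program can stand in for the colored markers. The sketch below is only an illustration in Python: the category labels, quotes, and helper functions are all hypothetical, not part of the course procedures, but the operations mirror the coding steps just described, filing segments under categories and then merging or splitting categories as the analysis develops.

# A minimal sketch of keeping track of coded interview segments in Python.
# All category labels and quotes below are hypothetical illustrations, not real interview data.

from collections import defaultdict

codebook = defaultdict(list)  # maps a concept/theme label to the segments coded under it

def code_segment(category, quote):
    """File a quote or paraphrased segment under a concept or theme category."""
    codebook[category].append(quote)

def merge_categories(new_label, *old_labels):
    """Combine categories that turn out to represent the same underlying idea."""
    for old in old_labels:
        codebook[new_label].extend(codebook.pop(old, []))

def split_category(old_label, new_label, keyword):
    """Move segments containing a keyword into a new, more specific category."""
    keep, move = [], []
    for quote in codebook.pop(old_label, []):
        (move if keyword in quote.lower() else keep).append(quote)
    codebook[old_label] = keep
    codebook[new_label] = move

# Hypothetical use:
code_segment("city hall / power", "You just can't fight city hall.")
code_segment("rules", "If you yell loud enough there really aren't any rules around here.")
code_segment("rules", "Nobody follows the procedures unless it suits them.")
split_category("rules", "rules / enforcement", "procedures")

for category, quotes in codebook.items():
    print(category, "->", len(quotes), "coded segments")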

You should also identify symbolic elements that do not conform with your analysis. The exercise of
looking for exceptions or contradictions requires you to be “objective” in the sense that you examine
all of the data. This means that you do not merely select material that agrees with your emerging
hypothesis about the symbolic structure in your data. You should look for points where the symbolic
elements seem to contradict each other. Does the interviewee perceive that a contradiction exists? Is there
some way to resolve the apparent contradiction within the context of the interview?

In the final stage of your analysis, you should look for overarching themes that tie together lower order themes and for relationships between themes. Which themes tend to occur together? Which themes appear to be subordinate and which appear to be dominant? Look for a principle that organizes the whole and gives a rich understanding of the interviewee's perspective in ways that capture both its internal coherencies and its contradictions. In this stage, you may also need to find resolutions for some seeming contradictions in what the person says. How does the person make sense of what you see as contradictory symbolic elements?

Presenting the Results

There are several organizational schemes for presenting a symbolic analysis. You can write a
chronological report organized by time-periods. You can also start with conclusions and present a
systematic analysis of how you arrived at the conclusion. Following the second organizational model you
might present an important theme that you identified in the interview and present a depth analysis of the
concepts, stories and other symbolic elements that are connected with the theme. This approach works quite
well for writing up a depth interview. The researcher tries to convince the reader that she is representing
the interviewee’s perspective faithfully and accurately. Using the interviewee’s own words provides detail
and realism. A research report is rich when it presents not only the main theme, but also notes variations
and refinements on that theme. The rich report also provides enough of the context to make the themes
understandable.

Unit 6: Objectivist Paradigm-Research Questions and Hypotheses
This unit explores the roles that research questions and hypotheses play in research conducted under the
objectivist paradigm. The objectivist paradigm is particularly focused on answering questions or testing
hypotheses related to cause and effect. Research in each paradigm begins with a research question, but the
objectivist paradigm formulates research questions and hypotheses in a unique way. A sound objectivist
research study begins with a clear statement of the research problem: it has at least one clearly stated
research question or research hypothesis. The research question or hypothesis identifies what you want to
accomplish in the study. Only when you have a clear statement of the research problem, can you design a
study to obtain the needed information. The statement of the research problem drives decision-making and
evaluation throughout the various stages of objectivist paradigm research (i.e., planning, implementation,
data-analysis and interpretation).

This unit also considers the research cycle of a single study and the research cycle of a series of studies in
the objectivist paradigm. Replication is an important feature of objectivist research, because it enables
researchers to compensate for the inevitable flaws and limitations of a single study. Replicated results
increase our confidence that our theoretical understanding is sound (i.e., better than alternative
explanations).

Unit Objectives
Upon completing this unit, you should be able to do the following:

6-1: To explain the function of research questions and hypotheses within objectivist research.
6-2: To differentiate descriptive research questions, variable-analytic relationship questions and variable
analytic research hypotheses.
6-3: To critique the appropriateness of a sample research question or hypothesis against the standards of
internal consistency, parsimony, currency, and measurement feasibility.
6-4: To determine whether a research question or research hypothesis is most appropriate in a study given
the state of existing knowledge.
6-5: To formulate well-formed hypotheses and research questions.
6-6: To explain the role that the null hypothesis plays in evaluating research results.
6-7: To explain why testing a research hypothesis gives us more information than testing a research
question.
6-8: To identify the null hypothesis associated with a research question or a research hypothesis.
6-9: To identify the variables and relational terms in a research question or a hypothesis.
6-10: To identify the different types of variables present in a research hypothesis.
6-11: To explain the role that replication plays within the objectivist research tradition.
6-12: To describe the stages of a single research study.
6-13: To explain why a good study often raises more questions than it answers.
6-14: To compare and contrast the goals and typical methods of exploratory research, confirmatory
research and generalizing research.
6-15: To explain how the research stage affects the appropriateness of particular research methods.
Research Questions and Hypotheses
A research study begins and ends with a statement of the research problem. A research problem is stated in
the form of a research question or research hypothesis. A single study may have a single research question
or hypothesis, multiple research questions, multiple hypotheses or a combination of research questions and
hypotheses.

There are several standards that well formed research questions and hypotheses should meet under the
objectivist paradigm. A research question or hypothesis should be compatible with the existing
knowledge. Researchers do not start from scratch, but build on what other scholars have already learned.
There is no need to investigate questions or hypotheses that have already been answered. Research makes
the best contribution when it is on the cutting edge: when it replicates previous research and goes beyond
previous research problems in innovative ways. Good research builds on the old findings (i.e., replicates)
and explores new avenues. If a research question or hypothesis is not innovative, study results will be
trivial or unimportant. The study will have as much appeal as yesterday’s news.

Research questions or hypotheses must also be stated clearly and consistently. A research question or
hypothesis is internally consistent if the concepts that make it up are logically compatible with one another.
Research questions and hypotheses should consist of testable constructs. It should be feasible to measure objectively every construct embedded in the research question or hypothesis, using currently available measurement tools. If my research question concerns whether intelligent life exists elsewhere in our galaxy, there is little hope of success given our current inability to take measurements across vast regions of space. The research statement should also have constructs that can be observed, experimentally manipulated, or measured. Medieval theologians debated how many angels could stand on the head of a pin. However, since angels cannot be observed (or at least not observed reliably by independent observers), the question is only a philosophical curiosity. One of the marks of progress in objectivist research is that a research
question that is incapable of being researched at one time often becomes researchable at a later time as new
methods of measurement are developed. For example, the invention of the microscope made it possible to
investigate hypotheses derived from the germ theory of disease (i.e. little bacteria make you sick.).

Research questions and hypotheses can also be judged against the standard of parsimony. A parsimonious
explanation or hypothesis is relatively simple. A parsimonious hypothesis does not have more concepts
than current theory suggests are necessary. In contrast to the other two research paradigms, objectivist
research places a high value on parsimony: If two theories explain a phenomenon equally well, objectivist
researchers will prefer the theory that is less complicated (i.e. has fewer variables and relationships between
variables). In contrast, interpretive and critical researchers usually do not treat parsimony as a highly
desirable attribute. Objectivist researchers begin by formulating their research questions and hypotheses.
They define what they want to discover or what idea they want to test. A research question or hypothesis
defines an objective of the research study. An ill-conceived statement of the research problem leads to an
ill-conceived study.

The statement of the research problem (research question or hypothesis) guides a researcher's decisions and
evaluations throughout the research process. A research question or hypothesis generates the design
specifications for a research study. The researcher must design a study to get information that answers the
research question or tests the hypothesis. The researcher constructs and executes the research plan
accordingly. A study ends with an evaluation of how well the research study answered the research
question or tested a hypothesis. In reporting and assessing the results of a study, a researcher should
honestly discuss the shortcomings or limitations of the study. In reality, every study has design limitations.
The researcher has to make tradeoffs between the research situation and the standards of good research
design: between what is ideal and what is feasible in the given situation.

There are at least two kinds of research questions in objectivist oriented research. A descriptive research question asks how a communication variable is distributed in a given population (e.g. How many Fortune
500 companies have corporate web sites?). A variable analytic research question asks how variables relate
to each other (e.g. How does corporation size affect whether corporate social responsibility messages
appear on a corporate web site?).

A variable analytic research question asks whether a detectable relationship exists between variables. Some
textbooks distinguish between an open-ended question (e.g. Is the amount of self-disclosure to a person
correlated with how much one likes that person?) and a specific relationship question, which asks whether
a specific type of relationship exists between two variables (e.g. Is self-disclosure positively correlated with
how one likes a person?). In the instructor’s opinion, a specific relationship question is really a disguised
research hypothesis. For the remainder of the course, when we discuss variable-analytic research
questions, you can assume that we mean open-ended research questions. Variable analytic research
questions are appropriate in exploratory research when there is little knowledge or existing theory for
making predictions on how the variables are likely to relate to each other.

A research hypothesis makes a prediction about how two variables are related. Some textbooks distinguish
between open-ended research hypotheses (i.e., two-tailed statistical tests) and prediction specific research
hypotheses (i.e., one-tailed statistical tests). An open-ended research hypothesis predicts that some
unspecified relationship exists between two variables (e.g., self-disclosure is related to interpersonal
liking). Notice that this hypothesis says that the two variables are related in some way, but does not specify
any particular relationship. In contrast, a specific research hypothesis predicts that a particular relationship
will exist between variables (e.g., Self-disclosure to a target person and interpersonal liking are positively
associated with each other.).

In the instructor’s opinion, an open-ended or two-tailed research hypothesis is really a research question
disguised as a hypothesis. A research hypothesis should be used when there is enough theory to
specifically predict how two or more variables relate to each other. For the remainder of the course, when
we speak about a research hypothesis, you can assume that we are talking about a prediction-specific
research hypothesis.

Like a “well-stated” research question, a “good” research hypothesis should be clearly stated and have
constructs that can be measured. A good research hypothesis should also be compatible with the latest
theory or knowledge in a field. The literature review should be up to date, and the hypothesis should
build upon existing knowledge. If a research hypothesis contains constructs that can't be measured, the
hypothesis is untestable. Freud’s theories about the id, ego and superego were seldom tested, because
researchers had difficulty operationalizing these constructs.

Variable-analytic research questions and hypotheses in the objectivist paradigm differ in an important way
from research questions or hypotheses in the interpretive and critical paradigms. The difference is that
objectivist research pairs a null hypothesis with each research question or hypothesis. A null
hypothesis covers the grounds that are not covered by the research question or hypothesis. If a research
question asks whether a relationship exists between two variables, the null hypothesis states that no
relationship exists between the two variables. If a research hypothesis states that a particular relationship
exists between two variables, the null hypothesis states that the predicted relationship does not exist.

Let's say a researcher is interested in whether there is a relationship between a person's self-esteem and the
level of apprehension that a person experiences when giving a speech. If it is an exploratory research
investigation, the researcher should use a research question: "Is there a relationship between a person's level
of self-esteem and the apprehension that a person experiences when giving a speech?" The null hypothesis
for this research question would be: "There is not a relationship between a person's level of self-esteem and
the amount of apprehension that a person experiences when giving a speech." If the researcher had a
theory or evidence that the two variables are negatively related, he would use the following research
hypothesis: "There is a negative relationship between a person's level of self-esteem and the amount of
apprehension that the person experiences when giving a speech." The null hypothesis that is paired with
this research hypothesis would read: "There is not a negative relationship between a person's level of self-
esteem and the amount of apprehension that a person experiences when giving a speech."

The null hypothesis plays an important role in objectivist research. In the objectivist paradigm,
researchers do not prove their ideas true so much as show that the null hypothesis is probably not true. In
other words, a researcher demonstrates that the null hypothesis is probably not true under a set of
circumstances and he concludes that a particular relationship probably exists. Objectivist researchers are
aware that people tend to have a profound "confirmation bias". Unfortunately, people tend to look for
evidence that is consistent with their preexisting ideas. In other words, they fail to notice or look for data
that may be inconsistent with their hypothesis. Using a null hypothesis primes a researcher to look at all of
the data that are relevant to a research question or hypothesis. The objectivist researcher begins with the
presumption that the null hypothesis is true (i.e. the null hypothesis is regarded as true until it is shown to
be improbable by empirical results). He collects data and uses statistical procedures to test the likelihood
that the null hypothesis is true. When he has good evidence that the null hypothesis is likely not true, he
rejects the null hypothesis and concludes that a given relationship, often causal, exists between two
variables. Unit 9 covers the procedures and underlying logic of statistical hypothesis testing in more detail.
For now, it is important to recognize that the use of a null hypothesis helps eliminate some of the
confirmation bias that researchers sometimes fall into when they analyze cause and effect relationships.
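
Unit 9 takes up the statistical machinery in detail, but a small illustration may make the logic concrete now. The Python sketch below is only an illustration with simulated (made-up) data; it assumes the NumPy and SciPy libraries and is not drawn from any actual study. It tests the self-esteem and speech-apprehension hypothesis discussed above by asking how unlikely the observed correlation would be if the null hypothesis were true.

# A minimal sketch of null hypothesis testing for the self-esteem / speech-apprehension example.
# The data are simulated and purely hypothetical; only the decision logic is the point.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulate 60 hypothetical respondents: higher self-esteem tends to go with lower apprehension.
self_esteem = rng.normal(50, 10, size=60)
apprehension = 100 - 0.6 * self_esteem + rng.normal(0, 8, size=60)

# Pearson correlation; the p-value returned here is two-tailed.
r, p_two_tailed = stats.pearsonr(self_esteem, apprehension)

# For the directional (negative) research hypothesis, halve the two-tailed p-value
# when the observed correlation is in the predicted direction.
p_one_tailed = p_two_tailed / 2 if r < 0 else 1 - p_two_tailed / 2

alpha = 0.05  # the conventional 95% confidence standard
print(f"r = {r:.2f}, one-tailed p = {p_one_tailed:.4f}")
if p_one_tailed < alpha:
    print("Reject the null hypothesis: a negative relationship is the better explanation.")
else:
    print("Retain the null hypothesis: the data do not rule out 'no negative relationship'.")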

A researcher can be criticized for using a research question when he should have used research hypotheses
and vice versa. A research hypothesis adds more to our state of knowledge than a research question.
A research question is like a fishing expedition that casts a wide net. If one finds a correlation between
variables, this correlation needs to be replicated and explained in future research. Usually some kind of
causal analysis is called for. A researcher risks very little with a research question. However, the data
obtained are also less informative. In contrast, by stating a research hypothesis, a researcher runs the risk
of being proven wrong. However, the payoff is greater if the hypothesis survives the test. A research
hypothesis reduces our uncertainty more decisively than a research question does.

A researcher should use a research hypothesis whenever he has sufficient knowledge to make a prediction.
A researcher who uses a research question when a research hypothesis is called for is likely to fall prey to the post-hoc fallacy (i.e., finding a statistically significant relationship and developing a rationale for the finding even though that rationale was not apparent before the relationship was discovered). However, there is also a danger in using a research hypothesis prematurely. Researchers may conclude that no relationship exists between variables because the predicted relationship did not materialize. However, this may miss another type of relationship that exists between the variables (e.g., a curvilinear rather than a linear relationship). This can lead to missed opportunities and to "false knowledge": claiming that we know more from negative research results than we actually do. In the end, the decision to use a research question or a hypothesis depends on the insight of the researcher and the relevant research community.
Table: Examples of Research Questions and Hypotheses

Interpretive Research Question
  Example: How do young children interpret "violent acts" that appear in children's cartoons?
  Associated Null Hypothesis: None

Critical Research Question
  Example: How do depictions of cartoon violence reinforce gender inequality?
  Associated Null Hypothesis: None

Objectivist Descriptive Question
  Example: How many "violent acts" appear on average in the half-hour cartoon shows most popular with children between the ages of 4 and 9?
  Associated Null Hypothesis: None

Objectivist Variable-Analytic Research Question
  Example: What is the relationship between the amount of cartoon violence that a child watches and the child's typical aggression level?
  Associated Null Hypothesis: There is not a relationship between the amount of cartoon violence that a child watches and the child's typical aggression level.

Objectivist Variable-Analytic Research Hypothesis
  Example: There is a positive relationship between the amount of cartoon violence that a child watches and the child's typical aggression level.
  Associated Null Hypothesis: There is not a positive relationship between the amount of cartoon violence that a child watches and the child's typical aggression level.
Exercise: Identifying Variables in Research Statements
Variable analytic research questions and hypotheses are composed of at least two elements: constructs and
relational terms. Constructs are the names of variables. Relational terms include phrases like unrelated,
positively related, negatively related, or a curvilinear relationship. Relational terms specify the type of
variable relationship that the researcher is inquiring about. Following are some research questions and
hypotheses taken from a recent research article on the topic of self-disclosure. Identify the constructs
embedded in each example. Be careful not to confuse categories of a variable with the variable itself (e.g.,
Sex is a variable, male and female are categories within the variable of sex).

RH1: Women self-disclose more than men.

RH2: Individuals will disclose more evaluative information (feelings, thoughts and experiences) to their
spouses than to strangers, and more descriptive information (facts) to strangers than spouses.

RH3: As liking for a person increases, self-disclosure to that person will also increase.

RH4: Self-disclosure reciprocity will be greater among casual acquaintances than among long-term friends.

Types of Variables in Research Design
For purposes of research design and data analysis, researchers often categorize the variables according to
their function. A research hypothesis typically has at least one independent (causal) variable and one
dependent (effect) variable. The independent variable is the one that the researcher thinks of as the causal
variable. Many studies explore how the independent variable affects the dependent variable. In
experimental research, the researcher manipulates the independent variable and then observes how the
groups compare on the dependent variable.

The independent variable is sometimes called the antecedent variable (i.e., the one that comes before). This
means that the independent variable is logically before or occurs before the dependent variable. If smoking
is to be considered a “cause” of lung cancer, then you will have to show that at least some people smoked
before they got lung cancer. However, in many studies we measure our independent and dependent
variables at the same time. There are some attributes of people and events that we consider to be fixed (sex,
age, and amount of time spent working for a firm). If I discover a positive correlation between age and
emotional maturity, I am likely to consider age to be the independent variable and emotional maturity to be
the dependent variable. Ordinarily, we do not think of a person’s emotional maturity as a “cause” of a
person’s age. To make sense of a research study, you need to be able to identify the independent and
dependent variables.

In some cases, the researcher simply designates one of the variables as an independent variable and the
other as a dependent variable. In a subsequent study, the dependent variable may become the independent
variable. In studies of self-disclosure and interpersonal attraction (liking), self-disclosure is sometimes
regarded as a cause of interpersonal attraction or liking (e.g., the more we self-disclose to a person, the
more we will come to like him or her). In other studies, self-disclosure is regarded as a consequence of
interpersonal attraction or liking (e.g., the more one likes a person the more one will disclose to him or her).
Some variables are reciprocally related to each other. In the real world, some variables are hooked together
in positive or negative feedback loops.

Many studies have several independent variables and several dependent variables. Some studies examine
the effect of one independent variable on several dependent variables (e.g., How does degree of self-
disclosure affect interpersonal trust, liking, and willingness to interact with a person again in the future?).
Some studies examine how several independent variables affect one dependent variable (e.g., How does a
person’s level of extroversion and degree of self-disclosure affect how much one likes another person?).
Some research studies have multiple independent variables and multiple dependent variables (e.g., How do
a person’s level of extroversion and self-disclosure affect how much we like and trust another person?).

One reason for incorporating several independent variables in a study is that variables sometimes interact
with each other in unique ways in how they affect the dependent variable. A moderator variable is a type
of independent variable that interacts with or qualifies the effect of an independent variable on a dependent
variable. The simultaneous effects of the two variables are significantly different than the effects of each of
the two variables considered alone. A moderator variable may not directly affect the dependent variable,
but it may influence how the other independent variable affects the dependent variable. A moderator variable is often added to a study to determine the environmental or boundary conditions under which the independent variable affects the dependent variable (e.g., Does self-disclosure increase interpersonal liking equally for men and women?). In this example, sex is included as a moderator variable.
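
To make the idea of a moderator concrete, the sketch below simulates hypothetical data in Python and tests the moderation with the statsmodels library. The variable names, effect sizes, and sample size are all invented for illustration; this is not a required procedure for the course. The point is simply that a moderator shows up statistically as an interaction term: if the disclosure-by-sex interaction is significant, the effect of self-disclosure on liking differs for men and women.

# A minimal sketch of testing sex as a moderator of the self-disclosure -> liking relationship.
# All data are simulated and the effect sizes are invented purely for illustration.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 200

sex = rng.choice(["male", "female"], size=n)
disclosure = rng.normal(5, 2, size=n)

# Hypothetical world: disclosure raises liking more strongly for women than for men.
slope = np.where(sex == "female", 0.8, 0.3)
liking = 2 + slope * disclosure + rng.normal(0, 1, size=n)

df = pd.DataFrame({"sex": sex, "disclosure": disclosure, "liking": liking})

# The disclosure:sex interaction term is the statistical test of moderation.
model = smf.ols("liking ~ disclosure * C(sex)", data=df).fit()
print(model.summary().tables[1])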

Researchers often investigate complex processes. This often requires exploring the role of at least one
intervening variable. To use a non-communication example, research has discovered that people who take
Vitamin E have fewer heart attacks. However, we do not yet understand how Vitamin E works. We have
only measured the beginning (consuming Vitamin E) and the end (occurrence of heart attacks) of a
complex process. We have yet to understand how Vitamin E intake affects intervening variables that
ultimately trigger a heart attack.

[Figure: Chain Reaction Cause-Effect. Independent Variable → Intervening Variable → Dependent Variable]

Many communication research questions involve intervening variables. There has been much research on how persuasive strategies affect people's behavior. However, we have discovered several cognitive and affective variables that lie between exposure to a message and a person's actual behavior (e.g., awareness, knowledge, attitude, and behavioral intentions). This is one reason why the magic bullet or hypodermic needle model of communication effects did not pan out. The communication process is rather complex and involves numerous moderator and intervening variables. Today researchers can employ a number of rather sophisticated research/statistical designs such as structural equation modeling to chart the flow of influence of independent variables through moderator and intervening variables. The figure below illustrates the relationships between the variable types.

[Figure: Cause-Effect via a Prism. Independent Variable and Moderator Variable → Dependent Variable]

Exercise: Research Questions and Null Hypotheses
State the null hypothesis that goes with each research question below.

1) Do males and females differ in how much product information they recall after seeing a humorous
television advertisement?

2) Is there a relationship between the age of a child and the degree to which the child believes advertising
claims?

3) Is there a relationship between the effectiveness of a deception effort and the severity of the
consequences for getting caught in a deception?

4) Is there a relationship between a person's need for affiliation and the person's willingness to accept
criticism?

5) Does a person’s level of self-disclosure affect his/her level of liking for the other person in a developing
acquaintance?

6) Does the use of photographs in an article affect how well readers comprehend magazine article content?

7) Do heavy viewers of television soap operas and light viewers of television soap operas differ in their
levels of satisfaction in their personal relationships?

8) Is there a relationship between a person's physical attractiveness and a person's salary level?
Exercise: Identifying Variable Types
Identify the independent variables and dependent variables in the following hypotheses. Where appropriate
also identify moderator and intervening variables. Not every example includes every variable type. At a
minimum, each example will have an independent variable and a dependent variable.

1) Leadership style will affect the level of trust in the group, which in turn will affect the level of teamwork
on group problem solving tasks.

2) Phonics reading instruction will be superior to Whole Language reading instruction in terms of how well
students can read. However, the differences in reading achievement between the experimental groups will
be much greater for boys than for girls.

3) Persons with insecure attachment will have lower quality communication with their marital partners than
persons with secure attachment. This lower quality of communication for people with insecure
attachment will in turn be associated with lower marital satisfaction for the person with the insecure
attachment style.

4) The larger the corporation, the more likely the corporation will be to include messages about corporate
social responsibility on its web page. However, we expect that this tendency should be more pronounced
in non-manufacturing companies (e.g., retail, wholesale and services) than in manufacturing-based
companies (e.g., chemical, petroleum refining, and automobile manufacturers).

5) The greater the amount of geographic mobility within a community, the higher the crime rate will be.
This trend will be stronger for cities over 500,000 in population than for cities of less than 500,000 in
population.

6) The higher the rate of unemployment in a community, the higher the crime rate will be. However, the
strength of this relationship will be stronger in the South and the West than in the Midwest and the
Northeast.

7) The more interested a person is in politics, the more attention she will pay to campaign issues in the
media. Media exposure to campaign issues will in turn be positively correlated with an intention to vote
in the upcoming election.

8) The higher a person’s education level, the greater the likelihood that he will vote in the upcoming
election.
Truth, Proof and the Logic of Replication
You sometimes hear people say: “It's just a theory.” By this, the speaker usually means that it is only idle
speculation: something for which there is little proof. The statement seriously misrepresents what a theory
is, but it does point to one facet of theory development: theories develop and change from one time period to another. We can take it for granted that, in the future, new concepts and theories will supplant some of
the things that we think we know today. "Science” is not yet an accumulation of unchangeable laws or
certainties. Even after much research, what do we "really" know? Is scientific change progress or is it
merely change? Are we getting any closer to the truth? These are tough questions.

The scientific method never proves a proposition with final certainty. The concepts and theories that we use
to describe and predict events in our world are tools not truths. These tools help us cope with the world
around us. Our current theories are the theories that have not yet been disproved. Our proofs are always
contingent and provisional: phrased in probabilities rather than certainties.
This is evident in the way we set up the research process. When a researcher sets out to test a hypothesis,
she works indirectly. The experimenter does an experiment in which she manipulates an independent
variable and sees how this affects the dependent variable. However, the researcher begins with the premise
that an alternate hypothesis to the research hypothesis is true. This is the null hypothesis. If you believe
that humorous commercials are more effective than serious commercials in promoting product recall, your
null hypothesis would be, “Humorous commercials are not more effective in stimulating product recall than
serious commercials.” Your research procedure sets out to show that the results of the experiment would be
very unlikely if the null hypothesis is in fact true. If it appears that the results probably would not occur if
the null hypothesis is true, we reject the null hypothesis and accept the research hypothesis as a better
explanation of the results.

What does the researcher use as a standard of proof for rejecting the null hypothesis? Usually the researcher
uses a statistical test to determine whether the null hypothesis should be retained or rejected. The researcher
says, “If I do my experiment, collect the data and analyze the results, I will only reject the null hypothesis if
my experimental results would be very unlikely if the null hypothesis is true.” The researcher clearly
defines what she means by very unlikely. Usually she says something like, "I will only reject the null
hypothesis, if I am at least 95% confident that the null hypothesis does not account for the study results."
When the researcher rejects the null hypothesis, she does so with the knowledge that she may be wrong;
she is aware of the specific risk she is taking in making the decision (i.e., Type I error).
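
As an illustration only, the short Python sketch below applies this decision rule to the humorous versus serious commercial example. The recall scores are simulated, and the .05 cutoff corresponds to the 95% confidence standard just described; none of the numbers come from an actual study, and the SciPy library is assumed to be available.

# A minimal sketch of the decision rule above, applied to the humorous vs. serious commercial
# example with simulated (hypothetical) recall scores.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical recall scores (product facts remembered) for two groups of 40 viewers each.
humorous = rng.normal(7.0, 2.0, size=40)
serious = rng.normal(6.0, 2.0, size=40)

# Independent-samples t-test; the p-value returned here is two-sided.
t_stat, p_two_sided = stats.ttest_ind(humorous, serious)

# Directional hypothesis (humorous > serious): halve the p-value if the observed
# difference is in the predicted direction.
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2

if p_one_sided < 0.05:
    print("These results would be very unlikely if the null hypothesis were true: reject it.")
else:
    print("These results are not unlikely enough under the null hypothesis: retain it.")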

Why do we use such an indirect procedure? The answer is that empirical research gives only two
answers, probably not and maybe. To definitively prove something, we would need to run an infinite
number of tests. Since we are finite beings, this is impossible. I can have great confidence that the sun will
rise tomorrow, but I cannot prove it because it has not yet happened. If we ever arrive at the ground zero of
truth, we will never be certain that we have arrived. No one can tell whether tomorrow may bring an
improved explanation of the phenomena we seek to understand. The best we can do is to test our ideas in
terms of probabilities. Tomorrow we may develop better concepts and theories that move us beyond what
we know today. This recognition is profoundly unsettling, because we are sentimentally attached to the idea
of conclusive proof or certainty. However, certainty is not something that you can attain in this lifetime.
But don’t despair. The research process can give us confidence about our findings. We may not be able to
achieve certainty, but we do attain a level of confidence that we need to cope with the world.

The standards of scientific research require that independent researchers replicate the results of a single
study. If the results of a single study cannot be repeated, it will be assumed that the results of the original
study were due to factors such as measurement error, statistical error, or faulty research design. If
independent researchers cannot replicate the results of a study, then the research community will come to
believe that faulty research design, fraud, or mere chance accounts for the initial research finding. The
results of the study in question will be discredited and ignored. Replication serves as an error detection
mechanism. The elements that may lead to a wrong conclusion in a single study such as fraud, statistical
error, faulty research design, and measurement error will almost surely be detected in follow-up replication
studies. The standard of replicating results gives researchers an incentive to report their research findings
honestly and accurately.

In 1989, two researchers at the University of Utah shocked the world when they reported that they had
performed cold fusion at room temperature. Fusing hydrogen atoms into helium atoms provides the sun’s
energy source. The report was astounding because of the possibilities that it had for changing world
commerce. Imagine a fuel source of unlimited supply, incredible power and virtually no pollution. Not
surprisingly, researchers in the basic sciences from around the world rushed to try to replicate the results.
Unfortunately, the results could not be replicated. Within six months or so, scientific researchers concluded
that cold fusion was a measurement artifact and a dead end.

When research results are replicated, it dramatically increases our confidence that we are on the right track.
For instance, if research decision procedures have a 5% probability of leaving us with the wrong decision
in a single study, that probability drops to one quarter of 1 percent with just one replication of the study findings (i.e., .05 x .05 = .0025, or .25%). With two replications, the probability of a wrong result in all
three investigations falls to less than one chance in a thousand. With a few replications, we can move
from somewhat mild confidence to bold confidence in the soundness of our conclusions.
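
The arithmetic behind this claim is easy to check. The short Python snippet below simply multiplies the 5% error probability across studies; like the passage above, it assumes that each study's chance of reaching a wrong decision is independent of the others.

# Checking the replication arithmetic: probability that every study in an independent
# series reaches the wrong decision, when each study has a 5% chance of doing so.

alpha = 0.05
for studies in range(1, 4):
    print(f"studies = {studies}: probability all reach the wrong decision = {alpha ** studies:.6f}")
# studies = 1: 0.050000
# studies = 2: 0.002500  (one replication, i.e., one quarter of 1 percent)
# studies = 3: 0.000125  (two replications, about 1 chance in 8,000)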

There is a serious qualification to this rosy picture. In truth, replication sometimes gives contradictory
results. For instance, some studies support the research hypothesis, and others are inconclusive. In many
cases, these contradictory results occur because of some limitation or defect in the research such as faulty
research design, measurement error or inadequate sample size. These are instances where the contradictions
are only apparent contradictions. There are other instances, however, where the contradictory results are
real and not mere artifacts of the study’s design. In these cases, researchers must work through specifying
the boundary conditions for a correlation or a causal relationship. Boundary conditions specify the domain
within which the findings apply. For instance, in the field of interpersonal communication, uncertainty-
reduction theory said that communication in interpersonal relationships is ordinarily designed to reduce our
uncertainty about the other person, especially for first acquaintances or developing acquaintances. Later
theoretical developments explored the circumstances under which the predictions of uncertainty-reduction
theory applied in more developed types of relationships. Sometimes research results only have local
application. Researchers sometimes must work hard to discover what the boundary conditions are for a
given set of findings.

If the experts don't agree, what are we supposed to believe? This is a frequently asked question. We can
take comfort in the fact that researchers agree on many things. Moreover, just because researchers disagree
on an issue, does not mean that they will always disagree. In many instances, the disagreements are only a
part of the process by which research issues are resolved. Predicting the future is one area in which
scientific disagreement is sharp. Meteorologists are accurate in three to five day forecasts. However, 60-
day forecasts create more problems. Even a very small error in one's original calculations can cause very
dramatic errors on distant future projections. You can imagine the difficulties that researchers have in
predicting how much the global climate will warm up from greenhouse gases or whether there is enough
money to keep social security solvent in the year 2050. When it comes to predicting the long-term future,
we have educated guesses, but not certainties. Forecasting future trends is as much art as science.
The Spiral of a Research Study
This reading describes the typical stages of a research project that is done in the objectivist tradition. It
describes what researchers typically do at each research stage. It also describes how the respective research
stages relate to each other.

A research project in the objectivist tradition typically starts with generating the research problem. This
involves identifying at least one research question or hypothesis. Research questions and hypotheses are
usually derived from the relevant research literature on a subject or from the specific goals of an applied
research project. Generating the research problem often involves pulling some questions or hypotheses
from previous research and adding new twists so that the research question or hypothesis being investigated
is cutting edge in some respect.

After a research question or hypothesis is formulated the research project enters the research design phase.
In this stage the researcher develops conceptual definitions and operational definitions for the study's
constructs. The researcher also plans out how the desired information will be collected and analyzed.
When it comes to developing conceptual and operational definitions, researchers usually borrow from
previous research. Borrowing from previous work is efficient. Researchers also desire to use accurate
measurements and to replicate the work of previous researchers. Developing new measurements or
manipulations of study constructs is a risky. If you have a poor measurement scheme, you end up with
useless information. Many studies fail because of poor measurement of a core construct. Researchers
review the literature for measures that have performed successfully in previous research. A researcher may
improve on a scheme, but it seldom makes sense to start from scratch. Researchers want to compare their
results with previous work in the field. Replication of previous research results is difficult when one
develops a new measurement scheme.

The research implementation phase of a study is when the data is collected. The researcher carries out the
procedures specified in the study design. This is the phase that people ordinarily think of when they think
of “doing research.” It involves actually carrying out an experiment, conducting a survey, or doing a
content analysis. In practice, this usually involves more than simply following a research plan. It can
require a lot of adaptation and flexibility, as one has to adapt one’s research to real world contingencies. A
researcher may encounter new research opportunities not anticipated in the research plan. In other cases,
some aspect of the research plan is premature or too ambitious. In other cases, one cannot get access to the
information one would like. One semester I was on a thesis committee in which a student proposed an
ambitious study to determine how older people in our community perceive and react to patronizing speech.
I advised the student that she probably would have trouble getting all the community people that she
needed to participate in her study. I suggested that she drop one of her research hypotheses if she expected
to complete the project in a timely fashion. You often need to be flexible and adopt an alternate plan if you
want to complete a project.

The fourth phase of a research study is the data analysis phase. During this period, the researcher
organizes, analyzes and makes sense of the collected data. In the case of quantitative research, this
includes organizing the data for input into the computer, entering the data, programming the computer for
the needed statistical analyses and interpreting the results. In the case of qualitative research, the
researchers examine their interviews or field notes for recurring themes and patterns.

The final stage of a single research study is to refine the conceptual structure or ideas based on the results
of the study. This stage also involves making recommendations for future research studies. The researcher
realistically assesses the limitations of the study as well as its contributions. The meaning of the results is
interpreted in the context of what was previously known about the topic (i.e., the research that was
surveyed in the literature review). Recommendations are made for refining the conceptual structure in
subsequent studies. In addition, new research questions or hypotheses are developed.

[Figure: The research cycle. Research Questions → Research Design → Data Collection → Data Analysis → back to Research Questions]

A successful study often
raises more questions than it answers. If one finds that a correlation exists between two variables, it raises
the question of why the variables relate in the way they do. In other cases, we may know that a causal
connection exists between two variables, but know relatively little about the intervening process (e.g., We
may know that drinking red wine reduces the risk of having a heart attacks but still be quite unaware of
how wine consumption affects the human circulatory system). Even if one finds definitive evidence that
two variables are causally related, one still does not know in which real world contexts the relationship will
occur and in which ones it won't. We come full circle back to new research questions, but the research cycle is a spiral rather than a closed loop. If we have done our job well, the research questions or
hypotheses we end a study with reflect higher levels of knowledge and awareness than when we started.
Research across Time: Stages and Methods
One of the most important features of research in a given field is that it is systematic: it progresses and
matures. Research questions and research methods change as a field of study moves from exploratory
research, to confirmatory research, to generalizing research. Research in a new area usually begins when
someone notices an interesting pattern. Exploratory research is discovery oriented. The researcher
approaches the subject with a relatively open mind. Exploratory research primarily uses research questions
rather than hypotheses. The research questions are predominantly descriptive. The researcher
does not want to miss significant patterns. The focus of the inquiry is to develop new avenues of research.
Qualitative research methods such as depth interviews, participant observation, and focus groups are quite
useful in helping one to identify questions that future research should address.

Research textbooks seldom acknowledge that qualitative research often precedes confirmation oriented
quantitative research. For instance, a researcher inquiring into the relation between job stress and burnout
in health care settings may want to do depth interviews with health care professionals in each context.
Stressful things in one context may not induce stress in other settings. Starting with depth interviews will
help ensure that the researcher asks about the right stressors in each context.

The second general phase of research is confirmatory research. It is also sometimes called the “context of
verification or proof,” meaning that hypotheses and theories are actually tested. At this stage, there is more
emphasis on utilizing rigorous methods and procedures. Quantitative measurement tends to be predominant
at this research stage. The experiment is the best research tool for testing cause/effect relationships. In an
experiment, measurement can be done under very precise conditions: the independent and dependent
variables are isolated and other variables are held constant or are accounted for. The experimental conditions
may be artificial or unlike the real world contexts to which one wants to generalize the results.

The generalizing stage of research addresses whether the results observed in one study can be generalized
to other situations, samples, time-periods, and research methods. This involves searching for the boundary
conditions of a set of phenomena. This research stage is more eclectic in terms of the research methods
employed. In some cases, qualitative research methods such as focus groups and depth interviews are used
to explore why certain variable relationships exist. On the other hand, researchers may use experiments to
replicate previous results and gradually expand the scope in which the given concepts and theories apply.
Field research is also likely to be done during this research phase. The figure below summarizes the
objectives of each research stage.

[Figure: Objectives of each research stage. Exploratory Research: identifying patterns. Confirmatory Research: testing causal relations. Generalizing Research: exploring generalizability.]
A Research Exemplar of How Research Evolves
A number of scholars in the 1960s and 1970s were interested in identifying factors that predicted whether
an interpersonal relationship would last. The earliest model directly applied economic theory to social
relationships. The idea was that people seek to maximize profits from their social relationships in the same way that they seek to maximize profits in their other economic exchanges. The profit model of human
relationships posited that satisfaction and hence the durability of relationships was a function of rewards the
relationship provided minus the costs. Early studies showed that rewards and costs influenced satisfaction,
although rewards tended to be more important determinants of satisfaction than costs.

However, a number of theorists and researchers were dissatisfied with this simple model because it
overlooked the fact that people often continue in relationships that they consider unsatisfying. A number of
theorists noted that it is sometimes difficult for people to end relationships when there are few acceptable
alternatives to the given relationship or when parties have a great deal invested in a relationship. Caryl
Rusbult incorporated these insights into a more comprehensive model of relationship duration. Professor
Rusbult proposed that the primary determinant of relationship duration (stay/leave) decisions was
commitment. She also proposed a comprehensive model of behavioral commitment called the Investment
Model. Commitment was hypothesized to be a function of relationship satisfaction (rewards - costs) +
relationship investments - the availability of alternative relationships.

Early studies of commitment in the confirmatory research stage focused on commitment in dating
relationships. These studies showed that the model was an important advance over the satisfaction model
when it came to predicting relationship longevity. It was shown that each of the three major predictors of
commitment independently contributed to predicting a person's relationship commitment. Second, it was
shown that commitment was the most important intervening variable between the investment model
variables and the relationship's duration. In other words, relationship satisfaction, relationship investments,
and alternative relationships first influenced commitment, which in turn influenced whether the person
stayed in or exited the relationship.

Over the next 20 years many additional studies were done seeking to refine the concepts and extend the
application of the investment model. A recent literature review identified more than 50 studies that have
tested the investment model.1 For instance, several studies have found that when people are highly
committed to a relationship, they tend to downplay or discount the quality of alternative relationships that
are available to them. In addition, as one would expect in the generalizing stage of research, the investment
model of commitment was applied to other phenomena. Early studies showed that the investment model
predicted commitment in a number of interpersonal contexts outside of dating relationships (i.e., marital
relationships, gay and lesbian relationships). In addition, the investment model was applied to
noninterpersonal commitments that people have to their jobs, sports, schools, and clubs. The model was
still relatively successful in predicting job commitment, but considerably less successful in predicting other
noninterpersonal commitments.

In summary, a relatively simple model of relationship stability was replaced by a more sophisticated model
that added new constructs (i.e., commitment, relationship alternatives and investments). The model was
tested in dating relationships and was then successfully extended to a broad array of interpersonal
relationships as well as commitment in the noninterpersonal domain of job tenure. However, the boundary
conditions for the theory do not seem to extend as well to other domains (i.e., commitment to schools, clubs,
and sports). This example illustrates that research in a particular area tends to evolve both towards greater
precision (i.e., refining existing concepts and adding new ones) and towards greater scope (applying the theory
to new phenomena).

1. Benjamin Le and Christopher Agnew (2003). Commitment and its theorized determinants: A meta-
analysis of the investment model. Personal Relationships, 10 (1), 37-57.
Exercise: Testing a Hypothesis
Let’s say that a friend of yours says that she can correctly identify whether she is drinking a Coke or a
Pepsi. This is not a claim about comparing two drinks, but a claim that when she tastes one sample she can
correctly identify which cola it is. Devise an experiment that will test your friend’s claim under acceptable
standards of proof. The research hypothesis for the experiment is that your friend can correctly identify the
type of drink she is drinking. The null hypothesis is that your friend cannot correctly identify the cola she is
drinking at better than chance expectation (i.e., your friend’s accuracy will be what one would expect to
occur by chance alone). Design your experiment so that you can reject the null hypothesis and be at least
95% confident that you have made the correct decision. Hint: If the person correctly identifies which
type of cola it is on a single trial, we are only 50% confident that he or she actually knows how
to distinguish the taste of the two colas (i.e., there is a 50% chance that he or she could get the right
choice merely by guessing).
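
For those who want to check the arithmetic, the sketch below (in Python, not required for the exercise) shows how the chance probability shrinks as the number of correct identifications in a row increases. It assumes each taste trial is an independent guess with a 50% chance of success and considers only the all-correct outcome.

def chance_of_guessing_all_correct(num_trials):
    # Probability of guessing every trial correctly by chance alone.
    return 0.5 ** num_trials

for n in range(1, 8):
    p = chance_of_guessing_all_correct(n)
    verdict = "reject the null at 95% confidence" if p < 0.05 else "cannot reject the null"
    print(f"{n} correct in a row by chance: p = {p:.4f} ({verdict})")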

Unit 7-Objectivist Paradigm-Basics of Measurement
This unit deals with the concept of measurement. Measurement involves making definitions of concepts
observable and tangible. Without measurement, one can only speak hypothetically. If one wants to test
claims about the nature of the “real world”, then one needs to have constructs that can be measured. In
other words, the researcher makes choices about how to measure a concept. This sometimes involves
compromises between what would be ideal measurement and what is possible.

Objectivist research focuses on developing “objective” measures. In this regard, “objective” means that a
researcher utilizes transparent procedures that precisely map how events were categorized or counted.
"Objective” measurement procedures are necessary for purposes of replication. For a measure to be
“objective”, independent researchers examining the same event should get the same measurements. The
idiosyncrasies of individual intuition and subjectivity are removed from the measurement process. In
reality, researchers only approach the ideal of true objective measurement.

The development of objective measurement begins with developing the meaning of one’s constructs. In the
deductive research tradition, the researcher 1) generates a conceptual definition, 2) develops an operational
definition, and 3) validates the operational definition. In the inductive approach to construct development,
the researcher tries to identify a “latent” pattern or consistent correlation among a variety of empirical
indicators. In either case, researchers must show that their measurement procedures are sound.

Unit Objectives
Upon completing this unit, you should be able:
7-1: To explain the role that hypothetical constructs play in communication research.
7-2: To compare and contrast inductive and deductive approaches to construct
development.
7-3: To explain the characteristics of each of the four levels of measurement.
7-4: To identify the level of measurement upon reading an operational definition of a concept.
7-5: To explain why higher levels of measurement are usually preferred to lower levels of measurement.
7-6: To identify the type of reliability that should be reported for a particular variable.
7-7: To identify and assess reliability information in a research report.
7-8: To explain how each type of measurement validity should be assessed.
7-9: To assess the measurement validity of a variable in a research report.
7-10: To calculate interrater reliability using a reliability grid.
7-11: To suggest ways to improve intercoder reliability on the basis of reliability grid information.
7-12: To suggest ways to improve the measurement validity of a specific variable.
7-13: To explain how measurement reliability relates to content validity, criterion validity and construct
validity.
7-14: To explain the process researchers use to validate a new measure.
Developing Constructs
Developing a measure can be a complex process. A researcher often develops his intuitive sense of a
construct into a conceptual definition. A conceptual definition describes the construct you want to
measure. Some constructs are concrete, like the amount of time in television newscasts devoted to crime
stories. One has to define what counts as a “crime story”, but the actual measurement, running a stopwatch,
is straightforward. However, many communication constructs are "latent" rather than "manifest".
Manifest constructs are concrete and easily identified, but latent constructs must be inferred from
empirical indicators. Examples of latent constructs include leadership ability, credibility, intelligence,
and communication competence. Latent constructs are "hypothetical constructs" or creations of the
researcher.

Hypothetical constructs play an important role in communication research. They often have several distinct
but interrelated components. When developing a conceptual definition, researchers need to figure out if
their construct consists of one dimension (one-dimensional) or multiple dimensions (multidimensional). For
example, a researcher wanting to measure self-disclosure might decide that factual self-disclosure is
qualitatively different from emotional self-disclosure. The researcher would then develop a conceptual
description of self-disclosure that differentiated between "emotional" self-disclosure and "factual" self-
disclosure. She would then develop an operational definition of self-disclosure that taps both of these
dimensions. Not tapping all of the dimensions of one's conceptual definition in an operational definition
results in a measure that lacks content validity.

Starting with a conceptual definition and developing an operational or measurement definition for a
construct represents the deductive approach to construct definition. An operational definition involves
specifying how one will measure a construct. For instance, one might begin with a conceptual definition of
a personality variable (e.g., argumentativeness) and then develop a questionnaire that assesses
argumentativeness.

One can also take an inductive approach by using statistical techniques (e.g., factor analysis) to determine
which empirical indicators hold together in the real world. This approach is widely used in
exploratory research. The researcher measures a number of empirical indicators that she thinks might go
together. She then analyzes the data to see which empirical indicators hang together to form a "latent
variable". This is the procedure that persuasion researchers used in the 1960s to identify the components of
credibility. Part of the research task then is to decide what to name the latent variable. In this approach, the
conceptual definition develops out of the operational definition.
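
The inductive strategy can be illustrated with a small, purely hypothetical factor analysis. The sketch below is written in Python and assumes the numpy and scikit-learn libraries are available; the credibility items and the simulated ratings are invented for illustration and are not drawn from any actual study.

import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulate ratings of a speaker on six credibility items
# (rows = respondents, columns = items). In real research these
# would come from questionnaires, not from random simulation.
rng = np.random.default_rng(0)
expertise = rng.normal(size=200)         # hypothetical latent factor 1
trustworthiness = rng.normal(size=200)   # hypothetical latent factor 2

# Columns 0-2 are "expertise" items (e.g., qualified, intelligent, informed);
# columns 3-5 are "trustworthiness" items (e.g., honest, sincere, fair).
base = np.column_stack([expertise, expertise, expertise,
                        trustworthiness, trustworthiness, trustworthiness])
items = base + rng.normal(scale=0.5, size=(200, 6))

# Ask for two latent factors and inspect which items load on which factor.
fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
print(np.round(fa.components_, 2))   # rows = factors, columns = items

Items that load on the same factor "hang together"; the researcher's remaining task is to decide what to name each latent variable.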

Many researchers prefer the deductive approach because it starts with theory. Developing measures using
the deductive approach thus contributes more to theoretical development. The deductive approach to
construct development is more demanding (i.e., one actually tests one's ideas and runs the risk of failing).
Inductive approaches are more subject to the garbage-in, garbage-out phenomenon. One may "discover"
correlations that occur only by chance. Inductive approaches are sometimes disparaged as “blind
empiricism”. However, the inductive approach has its defenders as well. Qualitatively oriented researchers,
in particular, often take a “grounded theory approach” toward the development of constructs. They prefer
to stick very close to their “naturalistic” data and avoid prematurely imposing a foreign conceptual
framework on their data. The inductive approach is certainly defensible when one is doing exploratory
research, or when one has very rich naturalistic data.

Developing a construct requires careful thought and disciplined creativity. Whether one starts with the
deductive approach or the inductive approach, the thing that ultimately matters is whether the measure
works appropriately. We consider the properties of measurement reliability and
measurement validity later in this unit.

Exercise: Construct Dimensions
Below is a list of constructs used in communication research. Indicate whether you regard each construct as
one-dimensional or multidimensional. Explain briefly. If you think the construct is multidimensional,
please indicate what you think the separate dimensions of the construct are.

1) Amount of television watched

2) Communication Competence

3) Persuasive Ability

4) Teaching Effectiveness

5) Shyness

6) Salary
Levels of Measurement
Measurement is the art of assigning objects or events to categories according to rules. Most often, this
involves assigning a number to the event or object under consideration. From the standpoint of science, a
construct is meaningless if we can’t measure it. If you can't detect the presence of angels, then one can't
determine how many of them can stand on the head of a needle. In fact, several different measurement
schemes are employed in research. Here I use the term levels of measurement, because the measurement
schemes available to us can be categorized in terms of the amount of information that they contain. Each
time one goes up a level in the measurement ladder, the new level of measurement retains all of the
attributes of the subordinate measurement levels and adds a new characteristic. It is important for
researchers to be able to correctly identify the level of measurement. Whether a statistical procedure is
appropriate or not depends upon the level of measurement.

Nominal level measurement is the simplest level of measurement. Nominal level measurement consists of
assigning events to categories. Asking someone about his or her religious affiliation is an example of
nominal level measurement. The categories in this case would be a list of religious groups or
denominations (e.g., Christian, Jewish, Muslim, Buddhist etc.). Adequate nominal level measurement
contains an exhaustive list of categories (e.g., all of the important religious categories are represented) and
the response categories should be mutually exclusive (e.g., a person can only be classified as affiliating
with one religion). For instance, you wouldn't want to have one category "Christian" and a second category
of "Protestant", because the two categories overlap. Nominal level measurement should have a category
for everything, and everything that is classified should go in one and only one category. In nominal level
measurement, there is no underlying distinction among degrees of more or less. If you can’t rank order the
categories on some underlying dimension, you have nominal level measurement.

The second level of measurement is ordinal level measurement. At this level things are ranked on some
underlying dimension. A ranking of one goes to the object or event that is highest on the dimension and the
lowest ranking goes to the event lowest on the underlying dimension. For instance, I might ask you to rank 5
movies in terms of how much you liked them. I could ask you to give a rank of 1 to the one you liked the
most through a 5 to the one you liked the least. Rank ordering is utilized where one can detect differences,
but one lacks a precise scale for measuring the underlying dimension. We can detect ordinal level
differences, but we lack a precise scale for communicating these differences (e.g., pain). Ordinal level
measurement has all of the characteristics of nominal level measurement (exhaustive and mutually
exclusive categories) plus an underlying ordering of more and less.

The third measurement level is interval level measurement. This level of measurement contributes a
standard unit of measure for identifying differences in the underlying dimension. Let me illustrate with a
quaint historical example. In the English unit of measure, we still speak of a "foot" which has 12 inches. In
the Middle Ages, a foot literally meant the length of a man's foot. This was an approximate unit of
measure, but it wasn't very standard because men's feet differ in length. Ultimately
countries standardized their weights and measures so that each "foot" used in the realm for measurement
was the same as any other "foot". With a standardized unit of measure, movement from one point on the
scale to the next point on the scale always involves a constant unit of measurement. No matter if you are in
the third inch or the 11th inch, the degree of change between scale points remains the same. Interval level
measurement retains all of the characteristics of nominal and ordinal level measurement and adds the
characteristic of equal intervals between measurement points.

The fourth and highest level of measurement is ratio level measurement. We might also call this
proportional measurement. Ratio level measurement has a natural zero point. This means that you can't
have any quantity that is less than zero (i.e., no negative numbers). For instance, if I ask you how much
money you have in your wallet to the nearest cent, it is conceivable that you can respond, "I don't have any
money in my wallet." With ratio level measurement, we can make claims about ratios and proportions: "I
have twice as many dollars in my pocket as you do!" Or "I am one fourth as old as you are, dad!" Ratio level
measurement has all of the characteristics of interval level measurement with the added property of being
able to construct ratios or proportions.

Researchers prefer to measure at the highest possible level of measurement. Ratio level measurement
contains the most information. Nominal level measurement provides the least amount of information.
Conceptual and practical factors determine the level of measurement. The highest level at which a
construct can be measured is specified in its conceptual definition. Religious affiliation by definition
is a nominal level construct. However, degree of religiosity, ranging from not very religious to very
religious, can be an interval level scale. Measuring it at the nominal or ordinal level amounts to discarding
information.

Sometimes practical considerations force researchers to use a lower level of measurement than is ideal. For
instance, respondents to surveys are often reluctant to give their household income. Household income can
be measured at the ratio level, but researchers often give people four or five broad income categories that
are of unequal width. The researcher is content to measure at the ordinal level, because ordinal level
information about household income is better than no information at all. In other cases, ordinal level
information is all that the researcher feels she needs for the project at hand.

Scale construction adds complexity to the discussion of scales of measurement. A researcher constructs a
scale that combines multiple indicators of an underlying construct. The researcher collects information
about these multiple indicators and then combines them into a scale score. The test scores that you receive
in your classes are examples of scales. If a test consists of 100 true/false questions, your answer for a
single question is correct or incorrect. This is an example of nominal level measurement. However, we
can count the number of correct answers that you got on the test and construct a scale score. Your score
could range anywhere from 0 to 100. The scale, in fact, reflects ratio level measurement. A similar example
of this process would be coding a conversation for the types of speech acts that occur in a bargaining
session between a union and management. Let's assume that I have the following coding categories (i.e.,
comments, promises, threats, offers, questions, and requests). I then go through the transcript of the
bargaining session and code each comment in the transcript. Coding each comment into one of these
categories is obviously an example of nominal level measurement. However, if I then count the number of
comments that each party in the negotiations made in each category I am creating scales for each category
that have ratio level properties. My scales would consist of how many comments, promises and threats each
person in the bargaining session made. The scales would be ratio level because they would have a natural
zero point.
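
A minimal sketch of this counting process in Python, using a hypothetical coded transcript (the turns and codes below are invented for illustration):

from collections import Counter

# Hypothetical coded bargaining transcript: each tuple is
# (speaker, speech-act category assigned by a coder).
coded_turns = [
    ("union", "offer"), ("management", "question"), ("union", "threat"),
    ("management", "comment"), ("union", "offer"), ("management", "promise"),
    ("union", "comment"), ("management", "threat"), ("union", "request"),
]

# Each individual code is nominal, but counting the codes for each party
# produces ratio-level scale scores (a natural zero point, equal intervals).
counts = {}
for speaker, category in coded_turns:
    counts.setdefault(speaker, Counter())[category] += 1

for speaker, tally in counts.items():
    print(speaker, dict(tally))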

There is another property of counting nominal level responses that often confuses students. It is possible to
count the number of objects or events that fit in a category (the number of people in the sample who bought
a particular model of car). This is an example of aggregating results. This is different from scale
construction. In scale construction, the researcher combines multiple indicators for each unit of analysis.
In aggregating results, however, one counts the number of people or other units of analysis that fall into a
particular category (The number of people who got test item 1 right and the number who got it wrong).

Sometimes you may have a variable such as age that has a natural zero point, but its categories are of
unequal width. Technically, this reflects ordinal level measurement even though it has a natural zero point.
You cannot simply look for whether or not you have a natural zero point; you also have to check to
see whether the units or intervals of measure are constant. You also need to be able to distinguish between an
arbitrary zero point and a natural zero point. For example, I can measure attitudes using a semantic
differential scale. I can put any numbers in a semantic differential that I want, including a zero. However,
the selection of the numbers is arbitrary; if the measurement scheme has a zero, it has no natural meaning,
it is not a natural zero point. Contrast this with the question where I ask you how much money you have in
your wallet. My response in terms of dollars can range from zero to very large sums. However, it does not
make sense to speak of negative dollars or debt in your wallet. So when looking at a measurement
scheme with a zero point in it, you must determine whether it is a natural zero point or an arbitrary one.
Selecting a Measurement Level for an Operational Definition
Ordinarily, researchers would like to measure their constructs at the highest possible level of measurement.
Higher levels of measurement contain more information. In other words, you can do a lot more with ratio
level information in terms of how you analyze and present it than nominal level information. Ratio level
measurement is much more informative and it has greater communication power. A considerable amount
of work in the social sciences is directed toward developing ways to measure constructs that are more
precise and informative than those used in everyday life. In real life, we know about “pain”; we can
certainly differentiate between degrees of pain. However, we do not have an interval or ratio level scale for
measuring pain and communicating about it. However, medical practitioners have devised ways of getting
patients to communicate the degree of pain at an interval level. They typically give patients a numerical
reference point and some conceptual definitions as to what the standard at each end of the scale is (e.g.,
"On a scale of 1 to 10, where 1 is very minor discomfort and 10 is excruciating pain, what number
would you give your pain right now?"). This information gives medical personnel better information for
administering pain relievers than the ordinal categories of everyday language.

So, what determines the level of measurement at which one chooses to measure? When you are dealing
with multidimensional or hypothetical constructs, interval level measurement is often the highest level of
measurement that is possible. We can talk about intelligence, but it is not meaningful to talk about zero
intelligence. In reality, as you move away from mean intelligence in either direction, the measurement of
IQ becomes less like interval level measurement and more like ordinal level measurement (less than 55 or
more than 145). Usually the highest level of measurement that one can hope for with a construct is set by one's
conceptual definition of the concept. If you are classifying persuasive strategies, you are using a nominal
level measurement system. However, if you are rating some attribute of persuasive strategies such as the
degree of pressure that they apply to the target person, then one is measuring at the ordinal or interval level.

Sometimes researchers measure at a lower level of measurement than they could. The most common
examples involve cases where researchers use ordinal level measurement when they could have used
interval or ratio measurement. For instance, many surveys ask respondents to indicate their ages within a
specified range (e.g., 18-24, 25-34, 35-44, 45-60, 60+). While it is possible to ask people’s age to the
nearest year (i.e., ratio level), many researchers choose the ordinal level alternative. There are cases when
researchers choose ordinal level measurement because they think that getting information at the
ordinal level is better than getting no information at all. Some people are reluctant to reveal their
exact age or income. However, they are less reluctant to answer questions that provide a range. It’s not so
much that people want to lie about their age or their income; it's just that they prefer a little ambiguity in
some of the things that they communicate!

Exercise: Levels of Measurement
Identify the level of measurement in the following examples. Explain your answers.

1. Sex

2. Number of column inches devoted to stories on a given topic in a newspaper.

3. Listing of the top country songs in their order of popularity.

4. What kinds of civic organizations are you a member of? Check as many categories as apply.
____ Religious congregation
____ Civic or fraternal group
____ Professional or business organization
____ Neighborhood association
____ Recreational club or interest group
____ Political organization or a political action organization

5. Rank the following issues in the congressional campaign in terms of how important they are to you.
Give a rank of 1 to the most important issue, a rank of 2 to the next most important issue and so on.
____ Balancing the federal budget
____ Reducing crime
____ Maintaining a strong national defense
____ Protecting the environment
____ Cutting taxes
____ Supporting education

6. The University of Louisville should encourage more students to spend a semester studying overseas.
__ Strongly Agree ___ Agree ___ Uncertain ___ Disagree ___ Strongly Disagree

7. Do you plan to buy a new car this year? ___ Yes ___ No

8. How important do you consider television to be as a source of news?


Not Important______
Important ______
Very Important _____

9. How likely is it that you would buy this product?

Very Unlikely 0 1 2 3 4 5 6 7 8 Very Likely


Exercise: Levels of Measurement #2
One can measure the same construct in different ways. See several examples below. Identify the
level of measurement for each question. Then select the operational definition that you think
best measures each variable. Explain your reasoning.

1A) Have you watched any movies in a movie theater in the last three months?

_____ Yes _____ No

1B) How many movies have you watched in a theater in the last three months?

____ None ___ 1 ___ 2-3 ___ 4-6 ___ 7 or more

1C) How many movies have you watched in a theater in the last three months? _____________

Is 1A, 1B, or 1C the best measure of how many movies you have watched in a theater in the last three
months? Explain.

2A) How would you rate the entertainment value of the last movie that you saw?

___ Very entertaining


___ Somewhat entertaining
___ Not at all entertaining

2B) How would you rate the entertainment value of the last movie that you saw? Circle the number that
best approximates your feeling.

Boring 0 1 2 3 4 5 6 7 8 9 Very entertaining

Which example do you think is the best measure of the entertainment value of the last movie the person
saw, example 2A or example 2B? Explain.

Desirable Measurement Attributes
Thus far we have discussed how operational definitions of constructs are developed. As we have seen,
researchers have quite a few options when it comes to measuring a construct. Construct development plays
a very important role in the research process, and researchers who create quality measures of a construct
make an essential contribution to it. Indeed, objectivist research is only as good as the measures
employed to gather data for describing phenomena and testing hypotheses.

Within the objectivist research tradition, researchers must show that their operational definition of a
construct is reliable and valid. A measure is reliable if it exhibits the right kinds of consistency. Depending
upon the construct, researchers will be concerned with several different types of consistency. Researchers
may be concerned about how consistently researchers rate the same event or phenomenon (interrater
reliability), or they may want to show that a measure of a construct remains stable over time (test-retest
reliability). Researchers also often want to demonstrate that multiple indicators of a construct are internally
consistent (internal consistency). If a measure lacks the needed kind of consistency, it means that the
measurement contains a lot of measurement error.

Measurement reliability is only one part of the story. One may have a consistent but inaccurate measure.
Measurement validity involves an assessment of whether one is accurately measuring the underlying
construct (i.e., that your operational definition of the construct is a good one). Validity judgments are more
complex than reliability judgments. One can assess whether the operational definition captures all of the
relevant content of the conceptual definition. A test for content validity is an initial validity test that is
employed early in the process of developing a measure. A researcher also investigates whether the
measurement of a variable "works" as it is supposed to. Tests of criterion validity investigate whether the
measure relates to variables of interest. The most extensive kind of validity test is a test of construct
validity or theoretical validity. In this case, the researcher determines whether the measure "works" in the
ways that a relevant theory predicts it should work. This involves showing that the measure is related to
variables the theory predicts it should be related to and that it is not confounded with variables from which
it should be independent. Establishing the validity of a measure is usually an ongoing process.

Measurement reliability and validity are different but interrelated attributes. Measurement reliability is
independent of content validity. However, good reliability is an important precursor of criterion validity
and of construct validity. A measure that is unreliable will, almost by definition, have low criterion validity.
A measure can be reliable but invalid, but the reverse is not true. If a measure demonstrates excellent
criterion validity, it also should exhibit good reliability. Researchers first establish the content validity of
their operational definitions. Then they attempt to show that they have the needed levels of reliability.
When these are achieved, researchers try to show that a measure meets standards for criterion and/or
construct validity. Building good objective measures is an ongoing process: good measurement
requires ongoing development, testing, and adaptation.
Assessing Measurement Reliability
Measurement reliability refers to consistency of measurement. The type of reliability that is relevant
depends upon the concept measured. Measurement reliability is reported as a statistic such as percentage
agreement or a correlation coefficient. Interrater reliability, also called intercoder or interobserver
reliability, is an issue when a person places things into categories or rates something. We ordinarily expect
two raters of the same object or event to agree in their ratings or categorizations. If there are frequent
discrepancies between judges, interrater reliability is lacking. When observers make complex ratings, the
researcher is expected to report the exact level of interrater reliability that was achieved. If the coding task
is simple, interrater reliability assessment may not be necessary (e.g., counting the number of words in a
message).

When researchers categorize things into nominal level categories, 70% agreement between coders
represents a minimal level of interrater reliability. An agreement level of 80% or better is considered good
reliability, and an agreement level of 90% or more is considered to be excellent. Researchers often report
more sophisticated intercoder reliability statistics such as Scott's Pi or Cohen's Kappa. These statistics
adjust for high and low frequency categories. For instance, if 80% of the cases fall into one category, there
is a high probability of getting high levels of agreement due to chance alone. If I code all items into the
most frequent category, I would agree with the other coder 80% of the time. A Scott's pi of .70 or better
indicates adequate intercoder reliability. When attributes are coded at the interval or ratio levels, a
researcher can calculate an appropriate correlation coefficient to indicate the degree of interrater agreement.
A correlation coefficient of .70 indicates adequate reliability and a correlation coefficient of .80 indicates good
reliability.
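
The sketch below shows, with invented codings, how percentage agreement and Scott's pi can be calculated for two coders in Python; the expected-agreement term uses the joint proportion of each category across both coders.

def percent_agreement(codes1, codes2):
    # Proportion of items on which the two coders assigned the same category.
    agreements = sum(a == b for a, b in zip(codes1, codes2))
    return agreements / len(codes1)

def scotts_pi(codes1, codes2):
    n = len(codes1)
    observed = percent_agreement(codes1, codes2)
    categories = set(codes1) | set(codes2)
    # Expected (chance) agreement: square each category's joint proportion
    # across both coders and sum over categories.
    expected = sum(((codes1.count(c) + codes2.count(c)) / (2 * n)) ** 2
                   for c in categories)
    return (observed - expected) / (1 - expected)

coder1 = [1, 1, 2, 3, 2, 1, 3, 2, 1, 1]   # hypothetical codings
coder2 = [1, 2, 2, 3, 2, 1, 3, 1, 1, 1]
print(percent_agreement(coder1, coder2))    # 0.8
print(round(scotts_pi(coder1, coder2), 2))  # about 0.68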

Test-retest reliability is a desirable attribute for trait variables. A trait is a characteristic of a thing or
person that is ordinarily stable across time. On the other hand, a state variable is assumed to vary
considerably across time. A measure of a trait should be fairly stable across time. Personality dispositions
and abilities are trait variables, whereas measures of mood and affect are state variables. Failure to report
measurement stability of a trait in present or past research is an oversight. Hence, you have to ask yourself:
“Is this construct a trait or a state?” If your conceptual definition or theory tells you it is a trait, then you
should look for evidence of the stability of the measure. For instance, if a person gets very different ratings
on a personality scale on two successive days, it suggests that the underlying measurement scheme contains
a lot of error since personality traits ought to be pretty stable by definition.

In the study of personal relationships, there is a great deal of interest in attachment theory. Attachment
theory states that people develop relatively stable models of people and relationships from their early
experiences with caregivers. One's attachment style reflects how much one feels that he can trust other
people. Attachment style is a trait, so a person's attachment style should be pretty consistent over six month
intervals (70% stability for minimal reliability and 80% stability for good reliability). If one has ordinal
level measurement or above, test-retest correlation coefficients are usually reported (.70 or greater indicates
minimal reliability, .80 or greater for good reliability). Researchers sometimes refer to results of test-retest
reliability in previous studies to substantiate the stability of their particular measure.
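
A minimal sketch of a test-retest check, with invented scale scores from two administrations of the same trait measure six months apart (Python with numpy):

import numpy as np

# Hypothetical scale scores for eight people at time 1 and six months later.
time1 = np.array([22, 30, 18, 25, 27, 15, 29, 20])
time2 = np.array([21, 31, 20, 24, 26, 17, 28, 19])

# The test-retest coefficient is the correlation between the two administrations.
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 2))   # values of .70 or higher suggest adequate stability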

Internal Consistency is a desirable attribute when multiple indicators are used to measure a construct (e.g.,
a test with more than one test question). The use of multiple indicators is highly recommended when one
measures a complex concept (e.g., personality trait). A measure has internal consistency if the individual
items in the scale tend to run in the same direction. Internal consistency can be assessed with a split-half
correlation or an index called Cronbach's alpha. Cronbach's alpha reports the average consistency
across all possible combinations of the items added together. A Cronbach's alpha of .80 or better
represents good internal consistency, whereas .90 or greater indicates excellent internal consistency. The
American Psychological Association does not allow results of scales with less than a .60 level of reliability
to be published in association journals. Measures that lack minimal levels of reliability are almost certain
to lack high levels of criterion or construct validity.
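
A minimal sketch of the usual computing formula for Cronbach's alpha, using Python with numpy and hypothetical 1-to-5 ratings: alpha = k/(k-1) x (1 - sum of item variances / variance of the total score).

import numpy as np

def cronbach_alpha(item_scores):
    # item_scores: rows = respondents, columns = scale items.
    items = np.asarray(item_scores, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical five-item scale answered by six respondents (1-5 ratings).
responses = [
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [5, 5, 4, 5, 5],
    [3, 3, 3, 2, 3],
    [1, 2, 1, 1, 2],
    [4, 4, 5, 4, 4],
]
print(round(cronbach_alpha(responses), 2))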

Exercise: State or Trait?
Which of the following variables are state variables and which are trait variables? Explain your answers.

1. Whether a person is an extrovert or an introvert

2. The degree of anxiety a person feels before giving a particular speech

3. The characteristic degree of anxiety or apprehension a person feels before giving any speech

4. Physiological arousal

5. Satisfaction with Social Support

6. The number of close friends a person has

7. Attention

8. Persuasive ability
Exercise: Measurement Reliability
Which type of measurement reliability is at issue in the following examples (Intercoder, Test-Retest, or
Internal Consistency)? Explain your answers.

1. One referee calls a charging foul and another referee calls a blocking foul.

2. At week 1, a baby measures 25 inches long, but at week 2, the baby measures 24 inches long.

3) A researcher develops a questionnaire to measure students' attitudes towards intercollegiate athletics.
He is surprised to find that people have favorable opinions about intercollegiate athletics on some of the
items, but also have rather negative opinions about other facets of intercollegiate athletics.

4) You chastise a friend because he recommended a new movie release as “awesome”, but you found the
movie to be “juvenile and boring.”

5) You ask young children who their two best friends are. You are surprised to find that the answers
change dramatically from one week to the next.

6) A person is given a polygraph test which measures whether or not a person is being deceptive. One
polygraph examiner says the person passed, but another polygraph examiner says she didn't.

7) You were sick when you took the ACT the first time. When you retook the test your scores improved
dramatically.

Exercise: Content Analysis
Objectives: To illustrate the principles of content analysis and to introduce you to the interrater reliability
grid. Following is a task for coding the degree of sensitivity in messages designed to help a person cope
with a distressing event. Code each one of the 20 messages at one of the three levels described below.

Coding Emotional Sensitivity of Comforting Strategies


Level 1-Messages that suppress or ignore the individual's perspective. The speaker
condemns or ignores the specific feelings that exist in the situation for the person addressed.
This level includes the following kinds of messages.

-The speaker condemns the feelings of the other party.


-The speaker challenges the legitimacy of the other person’s feelings.
-The speaker ignores the other's feelings by shifting the focus of attention.

Level 2-Messages that implicitly acknowledge the individual’s perspective. The speaker indirectly
accepts the feelings of the other by positively responding to the target, but does not explicitly
discuss, elaborate or legitimize those feelings. This level includes the following kinds of
messages.

-The speaker attempts to divert the other person’s attention from the distressful situation and the feelings
arising from that situation.
-The speaker acknowledges the other’s feelings, but does not attempt to help the person understand why
those feelings are experienced or how to cope with them.
-The speaker provides a nonfeeling-centered explanation of the situation intended to reduce the other’s
distressed emotional state.

Level 3-Messages explicitly recognize and elaborate the perspective of the other. The speaker
explicitly acknowledges, elaborates and legitimizes the feelings of the other. These strategies
may include attempts to provide a general understanding of the situation. Coping strategies
may be suggested in conjunction with an explication of the other’s feelings. This level
includes the following kinds of messages.

-The speaker explicitly recognizes and acknowledges the other's feelings, but provides only truncated
explanations of these feelings; this is often coupled with an attempt to remedy the situation.
-The speaker provides an elaborated acknowledgment and explanation of the other's feelings.
-The speaker helps the other gain a perspective on his or her feelings and attempts to help the other see
those feelings in relation to a broader context or the feelings of others.

Scenario: A person has received a rejection from a prestigious management trainee internship. The person
is upset and melancholy about the rejection. Here are some versions of what one might say to cheer
the person up. Please rate the strategies on the three-level scale articulated above. Rate each message for
the predominant level of sensitivity that it exhibits.

_____I'm so sorry. I was sure you were going to get it. Well, maybe they'll have another opening soon and
you'll be on the top of the list. I know you're feeling bad now, but think of the good experience this
has been. Nobody gets the job they want the first time out, but now you've been through the
interviewing process and you've gotten your resume together so you're all set for the next time.

_____I know that you are disappointed, but don't let it get the best of you. You are obviously very well
qualified, or else you would not have been given so much consideration. If you progressed that far
with that company, it’s not inconceivable that you will be considered by another just as seriously. I
doubt it will be long before you get a job as good as or even better than the one you had hoped for.
_____ There are other jobs out there. You went very far with your interviews and you should feel
encouraged by that fact. You just need to get your mind off of it right now. Let's go get a beer.

_____There are other jobs out there. Jobs with more benefits such bonus each quarter, raises every six
months. Free lunches, paid insurance, medical and dental. So keep on looking, keep smiling and
you will soon get the job that you desire.

_____I'm very sorry to hear this-that company is making a mistake by not hiring you. You have a lot going
on for you and I'm sure that you will find something else you like just as well very soon.

_____Are you serious? I can't believe that I thought you were a cinch for it. I guess they had somebody
else already in mind. They probably gave it to a relative or something. Don't worry about it though.
You have many more chances to get a job. Something better will probably come along.

_____As intelligent and success motivated as you are, you will get a good job. Don't take it personal that
you didn't get the job. There is a reason the other person got the job, and it doesn't mean they were
any better than you, the company felt a need to hire them. Keep your head up and continue to
search people; like you always success, because you're determined and you know what you want.
Don't let one disappointment ruin your future.

_____Look Carla, you are at your lowest now, but here is always a pot of gold on the other side of the
rainbow. If I were you I would blow off this interview and chalk it up to experience. There are
plenty more jobs in your field out there just waiting for a nice looking intelligent girl like yourself.
So don't fret and it will all work out.

_____I would tell Nancy that to be one of the top 2 finalists is a great accomplishment and that I am very
proud of her. Also I would tell her that things always work out for the best, and I'm sure there was
some reason that she didn't get the job, not to worry through because I'm sure an offer just as good
probably better will come along. Then I would take her out and have a fun time together so she
would forget her sorrows.

____Tom, you might get mad at me for saying this, but you are kind of a whiner. When something doesn't
your way, you sometimes whine more than you should. Everyone has to go through some adversity.
Sure this is a disappointment, but whining isn't going to change the situation or bring about the
results that you want. You just have to put it behind you and go on to the next opportunity.

_____I know that you really wanted this job. I'm sorry that you didn't get it. I know that you felt fairly
confident going into this last interview. Did something occur before or during to make you
nervous? Did the person they hired have more experience? There are a lot more companies out
there looking for good hardworking people like... Then I would give her a hug.

_____Margaret I know that you're very disappointed that you didn't get the job. I'm not sure what to say.
You beat a lot of other applicants, so you know you did a great job. I hope you will start looking
again tomorrow for another job opportunity. Let me know if I can help you in any way.

_____Don't let that bring you down. There are a thousand other jobs out there waiting to be discovered.
This one job is nothing. You made one of the finalists just think of all the other applicants that were
just thrown away. You ought to feel good about that anyway. Who knows, maybe the person who
got the job will mess up or the company will realized that they picked the wrong person and call you
back. Keep your head up because you’re young, bright, intelligent, and you have a promising future."

_____You really shouldn't feel down about this. Lots of other people would be ecstatic just to make the
final round like you did. You went into this with unrealistic expectations. You're just going to have
to take your lumps until something works out: keep on sending out the resumes. You are dealing
with the real world now, so you need to keep your expectations realistic and not get so invested in
one possibility. Whatever doesn't kill you makes you stronger.

_____ Oh man, well, what did they say about it. Well, don't worry about it. There will always be another.
Don't let it get you down. You got to go out there and go after the next one. Gosh, look how well
you did with this company. They obviously were very interested. Cheer up and let's look for the
next one.

_____"Think of all of the free time that you have now. You didn't really want to face the real world yet
anyway. Besides, IBM is not all that good a company to work for from what I have heard."

_____ You and I know you did a good job in the interview. There are always certain aspects over which
you have no control. Maybe the boss's nephew needed a job. The bottom line is that this wasn’t
meant for you, so there must be something better waiting for you that you're not aware of yet.

_____ As you did, so should we all try our very best to achieve what we want in life. Apparently this job
was not meant to be. Keep on looking because I'm sure that is something is better suited for you.

_____ I'm sorry that the situation did not go as you had hoped for. I know that this job was very important
to you. On the other hand, I think you need to readjust your focus and realize that you were
obviously qualified for the job or you would not have made that far through the process. Thus there
is a good chance that company may contact you later for a position if you remain on file. Also, I
know that this was your first choice, but there are other companies that need your talents. Keep
your confidence and feel good that you are qualified. Look at that interview as experience.

_____Look it's just one job. It’s not the end of the world. Maybe you weren't the person they were looking
for. Maybe they just don't like Italians. I don't know. Just put this behind you. Learn from it if you
can. There’s got to be other jobs out there. I know how important it was. Take a little time to be
upset, but don't give up. There's bound to be some company that needs you. Everyone can't hate
Italians now can they?

Interrater Reliability Grid

In conjunction with your partner, map your codings with those of your partner in the reliability grid below.
Consider yourself to be Coder 1 and consider your partner to be Coder 2. Only one person in each pair
should record the tallies. Compare with your partner how each of you categorized the first message. Place a
tally in the box that intersects both your column and your partner's row. For instance, if you
categorized a message as falling in category 1 and your partner rated it as falling in category 2, then you would
place a tally in the box intersecting at column 1 and row 2. If you both categorized the message in category 1, you
would place a tally in the box intersecting at column 1, row 1. Place a small tally for each message you
categorized in the appropriate box (small so as to allow multiple tallies in each box). There should be only
one tally mark per message that the two of you coded. Each tally maps how both of you coded
each message in two-dimensional space. When you have completed this task, calculate the degree of interrater
agreement you had with your partner. Count the total number of tallies that fall in the following boxes
(boxes 1, 1; 2, 2; 3, 3; on the grid's left to right descending diagonal). This total represents the number
of agreements you had with your partner. Then divide this number by the total number of messages you and
your partner classified. This will give you the percentage of agreement. Then note any boxes with a
frequent number of marks off the diagonal (disagreements which occur with regular frequency). These
squares indicate areas of inconsistency and disagreement between you and your partner. If you have
several tallies in an off-diagonal box, it reflects a consistent disagreement between the two of you in how
you rate the two categories.

                          Coder 1          Coder 1          Coder 1
                          Category 1       Category 2       Category 3

Coder 2   Category 1      ________         ________         ________

Coder 2   Category 2      ________         ________         ________

Coder 2   Category 3      ________         ________         ________

Percentage of Agreement = Agreements/Total Categorizations=

Most frequent category disagreement = _________________

How could you use the information in the reliability grid to increase your level of intercoder reliability in
the next round of coding?
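
The Python sketch below performs the same arithmetic the grid asks for, using a hypothetical 3 x 3 tally matrix (rows = Coder 2's categories, columns = Coder 1's); the numbers are invented for illustration.

# Hypothetical tally matrix: rows = Coder 2's category, columns = Coder 1's.
grid = [
    [6, 1, 0],   # Coder 2 chose level 1
    [2, 5, 1],   # Coder 2 chose level 2
    [0, 1, 4],   # Coder 2 chose level 3
]

total = sum(sum(row) for row in grid)
agreements = sum(grid[i][i] for i in range(len(grid)))   # diagonal cells
print("Percentage of agreement:", round(agreements / total, 2))   # 0.75 here

# Off-diagonal cells with several tallies point to category pairs the two
# coders understand differently and should discuss before recoding.
row, col, tallies = max(
    ((i, j, grid[i][j]) for i in range(3) for j in range(3) if i != j),
    key=lambda cell: cell[2],
)
print("Most frequent disagreement: Coder 2 category", row + 1,
      "vs. Coder 1 category", col + 1, "with", tallies, "tallies")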

Assessing Measurement Validity
Measurement validity involves assessing the degree to which an operational definition of a variable
accurately measures the target construct. There are at least four different ways of assessing measurement
validity.

Content validity, also called face validity, involves comparing a conceptual definition with its operational
definition. An observer judges whether the operational definition actually taps the intended content. A
measure has content validity if independent observers agree that it adequately represents the target domain.
Anyone can have an opinion on the content validity of a measure; therefore, content validity is
sometimes called subjective validity. You can justify your opinion that a measure lacks content validity if
you can show that the measure lacks sufficient breadth (i.e., fails to include important dimensions of the
construct or indicators associated with it), or if you show it measures the construct in a superficial
way that does not correspond with the conceptual definition.

Content validity involves observer judgment, not statistical calculations. It is the first kind of validity that a
researcher seeks to establish. The researcher usually shows the measure to several content experts and asks
them if the content of the measure accurately indexes the construct. To assess the content validity of an
operational definition, you ask questions such as: “Does it measure all of the dimensions of the construct
(breadth)? Does it make the kinds of distinctions that one’s theory and conceptual definition call for
(breadth)? Does it have enough items to assess each domain of the construct (depth)? Does it match with
the conceptual definition?"

Criterion validity is a statistical assessment of validity. A measure has criterion validity when it predicts
some current or future criterion variable. Criterion validity is established by research results. It cannot be
established until research studies are actually carried out. There are two broad categories of criterion
validity. Concurrent validity is established when an operational definition correlates in the expected way
with a parallel measure of a construct. For instance, a self-report questionnaire of communication ability
has concurrent validity if it agrees substantially with ratings of the person's communication skills filled out
by her peers. Researchers sometimes try to show that a shorter version of a measure gives substantially the
same results as a longer version of the same instrument. By establishing that the short version of a
questionnaire concurs with the longer version of the questionnaire, the researcher builds a case for using
the shorter and more efficient instrument. Predictive validity is established when an operational definition
predicts a chosen criterion variable that is conceptually different from the predictor variable. When you
assess criterion validity, you ask yourself whether the measure of the construct adequately predicts the
criterion of interest (e.g., Does a personnel selection test predict who succeeds and doesn't succeed in a job?
Does the GRE exam really predict how well a student will actually do in graduate school as measured by
graduate school GPA?).

Construct validity, sometimes referred to as theoretical validity, is established by research results.
Construct validity involves establishing that a) your measure of a construct is distinct from similar
constructs, and b) it relates to other variables in the way that your theory says that it should. This involves
two separate aspects. The measure should not be substantially correlated with measures your theory
says it should not be related to (e.g., a measure of perspective-taking ability should not be related to how
much a person writes). However, the construct should be systematically correlated with other variables
your theory says it should be related to (e.g., persuasive ability should predict success in sales). It may
take several studies to establish the construct validity of a measurement technique or instrument.

Interrelationships among Types of Measurement Validity


A measure can have content validity but demonstrate low levels of criterion validity and construct validity.
Likewise, a measure may have low content validity, but demonstrate high levels of criterion validity. One
may be able to inductively construct a latent variable that predicts some important criterion variable (e.g.,
job turnover). The resulting measure may have low content validity, but work well in predicting an
important criterion variable. A researcher may know that it works but be unsure about why the measure
successfully predicts the criterion variable. Criterion validity is sometimes referred to as practical validity.
It can help one predict and control events even if one lacks a good theoretical understanding of why the
method works.

Establishing criterion validity depends upon having a valid measure of the criterion variable. Selecting the
wrong criterion variable(s) or using an invalid measure of a criterion variable will confound the
interpretation of your results. For instance, if you fail to successfully predict the criterion variable, this
could be due to invalid measurement of the criterion variable or to invalid measurement of the construct
you are attempting to validate. A measure may have high levels of criterion validity, but not fit with a
theory that explains the relationship. Establishing criterion validity is often an important step in establishing
the construct validity of a measure, but it does not guarantee it.

If your measure of a variable fails to relate to other variables in the way your theory predicts, you face the
daunting task of determining why the predictions didn't hold up. The first possibility is that your theory
may be faulty and make incorrect predictions. It is also possible that your criterion variable is measured
unreliably or invalidly. You may also have significant measurement error in how you are measuring your
target construct. It may take a lot of time and effort to determine which alternative best explains your
inability to find expected relationships. It often takes considerable time and ingenuity to establish the
construct validity of an operational definition of a variable.

Table: Summary of Types of Measurement Validity

Judgment Based        Criterion Based          Theory Based
Content Validity      Concurrent Validity      Construct Validity
                      Predictive Validity

Exercise: Measurement Validity
Which type of measurement validity is at issue in the following examples? Explain your answers.

1) For the first exam the professor decides to be lazy. He constructs a ten-item multiple choice exam in
place of the usual fifty-item multiple-choice exam. Students are quite upset and claim that the exam is
unfair.

2) On your second exam, the instructor draws 80% of the test questions from one of the five chapters covered by the
exam. Students bitterly complain that the exam is unfair.

3) Foreign students who score low on the TOEFL, a measure of English language proficiency, have
the same GPA as students who score very high on the same test.

4) A researcher has developed a set of procedures for measuring how achievement oriented grade school
students are. She gives the test to a group of students, one half of whom were identified by their teachers
as being high achievement oriented, whereas the other half were identified as low achievement oriented. The
researcher is surprised to find that both groups of students score about the same for achievement
orientation on the test.

5) I have a short questionnaire to measure a person's empathic abilities. I compare it with a much longer
questionnaire measure of empathic ability. Unfortunately, I find that the short questionnaire measure of
empathic ability does not accurately predict a person's empathy scores on the longer questionnaire.
Relating Measurement Reliability & Measurement Validity
There is no necessary relationship between the three forms of measurement reliability and content validity.
One's measure may please the content experts, but fail to demonstrate the needed consistency when it is
actually applied. Likewise, a measure may show high degrees of consistency, but lack content validity.
Measurement reliability and content validity are functionally independent of each other.

Measurement reliability and criterion and construct validity are asymmetrically related. Measurement
reliability is a necessary precondition for both criterion validity and construct validity. However, high
measurement consistency does not guarantee either criterion or construct validity. If a measurement
instrument lacks consistency, it will also lack criterion and construct validity. An unreliable measure is full
of error. Unreliable measurement places a ceiling on the magnitude of the possible relationships that a
researcher can discover. For instance, if the real underlying correlation between two constructs is .50, but
one of the constructs is measured with a reliability of only .50, then the correlation one can actually observe
will be noticeably smaller. Measurement error is ordinarily uncorrelated with a selected criterion
variable. As measures become less reliable, their criterion and construct validities decrease as well.
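To make this ceiling concrete, the sketch below (mine, not part of the course text) applies the classic Spearman attenuation formula, which says that the correlation one can observe equals the true correlation multiplied by the square root of the product of the two measures' reliabilities. The specific numbers simply echo the example above.

    from math import sqrt

    def observed_correlation(true_r, reliability_x, reliability_y):
        # Spearman's attenuation formula: r_observed = r_true * sqrt(r_xx * r_yy)
        return true_r * sqrt(reliability_x * reliability_y)

    # A true correlation of .50 where one measure has reliability .50 and the other .90
    print(round(observed_correlation(0.50, 0.50, 0.90), 2))   # about .34 -- much of the relationship is hidden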

However, the fact that you have a reliable measure does not mean that it also has criterion or construct
validity. Your measure may measure consistently but be consistently inaccurate. However, if a measure
demonstrates criterion or construct validity, one can reasonably infer that the measure also has adequate
levels of measurement reliability. Measurement reliability is a stepping-stone to establishing the
measurement validity of a construct. Of the two concepts, measurement validity is the more complex
and more important concept.

Content validity and the other forms of validity may be unrelated to each other in some instances. A
measure may have content validity but lack criterion validity or construct validity. Conversely, some
measures have criterion validity but may seem to be deficient in terms of content validity. There also is no
necessary relationship between criterion validity and construct validity. A measure may predict some
criterion variable quite well, but fail to fit the overall predictions of a theory.

Improving Measurement Reliability and Validity
Researchers spend a great deal of time trying to improve the reliability and validity of their instruments.
The first step in the process is to establish the content validity of a measure. The researcher will carefully
develop the conceptual definition of the construct. This should include a careful review of the literature to
identify all of the relevant elements or dimensions of the construct. Once the conceptual definition is
clearly stated, the researcher develops an initial instrument to operationalize the construct. Then the
researcher will ordinarily get opinions of other research experts in the field to judge whether the instrument
matches the conceptual definition. The researcher may also solicit suggestions for refining the measure and
incorporate the suggested improvements into the measurement instrument.

This first stage of concept development and review obtains a consensus among experts that the measure
closely aligns with the conceptual definition of the construct. The researcher can continue this process of
consultation and revision until fellow experts agree that a close alignment between the conceptual
definition and the operational definition has been achieved.

Ordinarily researchers then work to establish the measurement reliability of their instrument. If the
measurement involves complex codings, the researcher will check the intercoder reliability of the coding
scheme. For instance, several researchers developing a coding system for portrayals of violence on
television would independently code several episodes and compare their results. They would discuss the
patterns of agreement and disagreement that they discover. Then they would refine their coding system by
adding categories where needed, clarifying definitions, and adding examples to the code book to make the
coding rules more explicit. After they have done this, they will engage in another round of independent
coding. They will again compare their agreements and disagreements and refine the codebook as needed.
They will continue this process until they achieve adequate levels of intercoder reliability. Ultimately, the
codebook should be given to a person who did not participate in developing the coding system. The third
party codes the episodes using the codebook. If the third party's codings align well enough with the
previous codings, then the researchers know that other researchers can replicate their findings.

If the researcher needs to establish the stability of an instrument, she will conduct a longitudinal study to
see whether the same person or entity retains its target characteristics over an appropriate interval of time.
For instance, if I am measuring whether or not a person is shy, I expect that most people who rate
themselves as shy in one month will also rate themselves as shy the next month. If a measure fails to show
the needed test-retest reliability, the researcher will have to continue to refine and retest the instrument until
she obtains the desired level of stability. In practice, it is more difficult to improve the stability of a
measure than to improve the other two forms of reliability, because doing so requires longitudinal research.

Single item measures of complex constructs make it impossible to check the reliability or consistency of
the instrument. Single items often create problems for content validity. To assess the internal consistency
of a measurement scheme one must combine at least two items or indicators into a scale. The more
indicators one has in a scale, the higher its internal consistency is likely to be. A four-item scale is likely to
have higher levels of consistency than a two-item scale. As the number of items in a scale increases, the
less impact one bad item will have on the reliability of the scale (Increasing the number of items in a scale
can also increase the content validity of a scale.). The researcher can examine the degree of consistency
among the items in the scale. Items that show little consistency with other scale items can be dropped and
new items can be added. The researcher will continue revising and testing the scale until the desired levels
of internal consistency are achieved.
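As a concrete illustration of checking internal consistency, here is a minimal sketch (not from the course materials) that computes Cronbach's alpha, a widely used consistency index, for a small invented set of responses to a four-item scale. Alpha is the number of items divided by the number of items minus one, multiplied by one minus the ratio of the summed item variances to the variance of the total scores.

    from statistics import pvariance

    # Rows are respondents; columns are four hypothetical Likert-type items on one scale.
    responses = [
        [4, 5, 4, 5],
        [2, 2, 3, 2],
        [5, 4, 5, 5],
        [3, 3, 2, 3],
        [1, 2, 1, 2],
    ]

    k = len(responses[0])                                    # number of items in the scale
    item_variances = [pvariance(item) for item in zip(*responses)]
    total_scores = [sum(row) for row in responses]           # each respondent's summed scale score

    alpha = (k / (k - 1)) * (1 - sum(item_variances) / pvariance(total_scores))
    print(round(alpha, 2))    # values around .70 or higher are usually treated as adequate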

The third stage of operational definition testing is usually to establish the criterion validity of an instrument.
One strategy is to show that a measure provides the same result as other measures of the construct: it shows
concurrent validity. If I am developing a measure of the popularity of school children, I might show that
my instrument concurs with the classroom teacher's estimate of children's popularity with age-mates. In
some cases, a researcher will use a known-groups difference test to establish the concurrent validity of a
measurement. If I am trying to develop a survey scale to capture a person's attitudes towards the
environment, I might get members of the Chamber of Commerce and members of Greenpeace to fill out
my questionnaire. It is widely acknowledged that these two groups tend to hold very different attitudes on
environmental issues. The members of the two groups should differ considerably on my measure of
environmental attitudes. If the expected differences in attitude fail to materialize, the measure does not
concur with the known differences that go along with group membership.

Another use of concurrent validity is to determine whether a relatively short or inexpensive measure of a
construct concurs with a longer or more expensive measure of the same construct that has shown
acceptable criterion validity. For instance, a researcher might try to develop a shorter version of a
questionnaire that measures a personality trait to replace a much longer instrument. The researcher will then
attempt to show that the shorter instrument obtains similar results to the longer version. If her measure
concurs with the longer measure, she will be justified in using the shorter form. A researcher can refine
and retest an instrument until she gets the level of concurrence between the two measures that she desires.

Researchers also often concern themselves with establishing the predictive validity of an instrument. There
are two differences between predictive validity and concurrent validity. First, concurrent validity involves
comparing measures of the same underlying construct, whereas establishing predictive validity involves
showing that your measure accurately predicts scores on a conceptually different variable. The second
difference is that predictive validity often involves predicting some later event (e.g., a measure of job
aptitude accurately predicts how well a person will perform in a particular job). As with concurrent
validity, the criterion variable should be carefully selected. The criterion variable must be a reliable and
valid measure of the construct that it represents. The criterion variable should also be a relevant and
important criterion, rather than one that is the simplest to measure. Predictive validity is established when
one is able to show that one's measure accurately predicts scores on the criterion variable. A researcher may
continue to work and refine a measure until he gets an instrument that demonstrates the desired level of
predictive validity. Predictive validity establishes that a measure has a substantial reality about it. It is not
a measurement artifact or merely a creation of some researcher’s imaginative operational definition: it
predicts real world events.

Establishing the construct validity of a measure involves the same procedures as establishing predictive
validity, except that the researcher uses variables that are specified by the theory the researcher is testing.
The researcher may have a theory as to which constructs a measure should and should not relate to. It
usually takes a number of studies to establish an instrument’s construct validity. As with other validity
assessments, a researcher can continue to refine an instrument until she gets it right. In many respects,
construct validity is an ideal rather than a widely practiced standard in social science research. In new fields
of inquiry, there may not be a well-developed and substantiated theory to select criterion variables. In
exploratory research, researchers often develop their constructs inductively rather than deductively. If a
measure doesn't show the predicted relations, it can be difficult to establish whether the theory or the
measure is at fault. In the end, most researchers settle for measures that a) appear to have valid content, b)
demonstrate adequate levels of reliability, and c) have the needed levels of concurrent or predictive
validity.
Exercise: Additional Measurement Issue Problems
What measurement issue is at stake in the following examples? Explain briefly.

Content Validity        Interrater Reliability
Construct Validity      Test-Retest Reliability
Predictive Validity     Internal Consistency
Concurrent Validity

1) A person takes a self-report honesty exam that asks how she would handle a series of situations that
involve potential compromises of honesty. The person “flunks” the honesty exam because of low
consistency between her answers to similar questions.

2) A researcher wants to develop a short measure for measuring the personality trait of introversion-
extroversion. He develops a 10-item measure but is disappointed to find that it is uncorrelated with the
40-item measure ordinarily used to measure this personality dimension.

3) The researcher in problem 2 also finds that his short measure of introversion-extroversion is unrelated to
how much a person talks in a get-acquainted conversation.

4) The same researcher in problem 2 above finds that people score rather differently on their introversion
and extroversion scores in a two-month follow-up than they did in their first administration of the
introversion-extroversion scale.

5) A researcher investigates the degree to which commentators agree on their ratings of presidential
candidates’ debate performances. He compares the ratings of expert commentators and finds that there is
very little agreement or overlap in their opinions.

6) A researcher develops a measure of a person’s capacity to love other people. The measure asks how
much the person likes a wide variety of family and associates from parents, siblings, relatives, significant
others, children and neighbors.

7) My theory of love predicts that a generalized measure of love (i.e. how loving a person is) should
positively correlate with how much a person trusts people in general. Unfortunately, I find a slight
negative relationship between my measure of generalized love and level of trust that a person has in
others.

Unit 8: Descriptive Statistics


Descriptive statistics are powerful communication tools. They allow us to describe large amounts of data
in efficient and informative ways. We will investigate how statistics describe the distribution of a variable.
We will also discuss how a person can compare different distributions and we will find that it is indeed
possible to compare apples and oranges using standardized scores. We will also investigate some of the
properties of the normal curve. Finally, we will consider some of the ways that one can describe
relationships between variables.

Unit Objectives

Upon completing this unit, you should be able to:


8-1: To construct a frequency distribution from unordered data.
8-2: To use a frequency distribution to identify or calculate the following statistics: mode, median, mean
and range.
8-3: To explain what a standard deviation is and explain why it is the preferred measure of variation.
8-4: To select the most appropriate measure of central tendency when the skewness and the kurtosis of a
distribution are known.
8-5: To compute and interpret the meaning of a standardized score (z score).
8-6: To compare different distributions utilizing z scores.
8-7: To describe the properties of the normal curve and explain why these properties are important.
8-8: To calculate the proportion of a distribution between two points in an approximately normal
distribution.
8-9: To describe the assumptions of linear regression.
8-10: To determine which correlation coefficient should be employed to describe a variable relationship.
8-11: To determine the type and strength of a relationship depicted by a correlation.
8-12: To explain the concept of proportional reduction of error.
8-13: To explain the uses and limitations of multiple regression and other techniques of causal modeling.
8-14: To describe the three most common mistakes that researchers make when describing variable
relationships.

Describing a Distribution
Descriptive statistics describe how an attribute is distributed in a group or population. Descriptive statistics
are especially useful in summarizing large amounts of data. Descriptive statistics provide a snapshot of the
nature of a distribution in a single numerical index. Quantitative measurement enables researchers to
describe their results very efficiently. No matter how many people, events, or objects are being described, the
appropriate descriptive statistic accurately provides important information about a distribution in a single
number.

Constructing a Frequency Distribution

Let’s say that you take a test and get the results back several days later. The exam has the percentage of
your correct answers at the top. If you are a typical student, you will be particularly interested in what this
number “means”. Your syllabus may have an absolute grade distribution printed (e.g., 90-100%=A etc.),
but you know that the relative distribution of scores can also be important. You likely want to know how
your score compares with the scores of other class members.

Your instructor is also likely interested in how well students did on the test. The first thing she probably
does is to construct a frequency distribution. A frequency distribution summarizes how often a particular
score or value occurs in a data set (e.g., In the example of a test score distribution, it records how many
people got each score arranged in order from the lowest scores up to the highest scores). The frequency
distribution efficiently communicates information about the data. It can be used to construct other useful
charts and graphs. It can also be used to calculate useful descriptive statistics.

A frequency distribution has a series of columns. The first column is the value column (e.g., In the test
score example, a value refers to an actual test score). The second column, called the frequency column,
records how often that value occurs for that variable in the data set. A frequency distribution may also have
a cumulative frequency column that records the number of items occurring at that value and all prior values.
If your score on a test is 82, for example, the cumulative frequency column records the number of people in the
class who scored 82 or below on the test. There also may be a column titled cumulative percentage. This
column gives the percentage of total cases in the distribution that fall at that value or below. If your test
score is 82, the cumulative percentage gives you your percentile rank on the test in the class (e.g., 80% of
the test scores in the class were at or below yours). A column that often appears in a frequency distribution
is mysteriously titled fX. Actually, it is nothing more than the product of the value column multiplied by
the frequency column for each value. The numbers in this column are added together to get the sum for the
entire distribution. The mean is calculated by taking the total from this column and dividing it by the total
number of items in the distribution (the total at the bottom of the frequency column).
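The short sketch below (mine, not the text's) builds the columns just described from a handful of hypothetical test scores: the frequency, cumulative frequency, cumulative percentage, and fX columns, followed by the mean computed as the sum of the fX column divided by N.

    from collections import Counter

    scores = [68, 68, 75, 75, 75, 82, 82, 82, 90, 100]       # hypothetical test scores

    freq = Counter(scores)
    n = len(scores)
    cumulative = 0
    print("Value  f  cum f  cum %    fX")
    for value in sorted(freq):
        f = freq[value]
        cumulative += f
        print(f"{value:5}  {f}  {cumulative:5}  {cumulative / n:6.0%}  {value * f:5}")

    mean = sum(value * f for value, f in freq.items()) / n   # mean = (sum of fX) / N
    print("Mean:", mean)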

Measures of Central Tendency

After a test, students often ask me "What was the typical score on the test?" This amounts to asking for a
measure of central tendency. Several statistics can describe a typical score. In response to your question,
the professor might give you the mode or the test score that occurred most frequently. To identify the
mode in a frequency distribution, you identify the value that occurs most frequently. With nominal
measurement, the mode is the only measure of central tendency that one can use (with ordinal data, the
median is also available). However, when one has interval or ratio level data you should usually use a
statistic other than the mode. In some cases, a
distribution can be bimodal with two scores occurring most frequently very far apart (e.g., Imagine a test
where an equal number of people score 100 and score 50). In many cases, a unique or irregular feature of a
distribution can cause the mode to be a misleading indicator of central tendency.

As a student, you are more likely to prefer either the median or the mean as a measure of central tendency
for the test. The median is the middle case in the distribution. Half of the cases fall above this point and the
other half fall below this point. Unlike the mode, the median is not distorted by an irregular shaped
frequency distribution. You can find the median using the following formula and the cumulative frequency
distribution column: median position = (N+1)/2. In this equation, N refers to the number of items in the
distribution. The formula gives the case number that you look for in the cumulative frequency
column. For instance, in a distribution with 15 cases, you will look for the 8th case in the cumulative
frequency distribution (15+1)/2=8. If there is an even number of cases in a distribution, you will split the
difference between two cases (e.g., (16+1)/2=8.5). In other words, you will locate the midpoint between
the 8th and 9th case in the distribution. For instance, if the value of the 8th case is 10 and the value of the
9th case is 13, then the median is 11.5 (i.e., (10+13)/2=11.5).
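The following small function (an illustration of mine, not part of the original text) applies the (N+1)/2 rule to an ordered list of scores, including the even-numbered case where you split the difference between the two middle cases.

    def median(ordered_scores):
        n = len(ordered_scores)
        position = (n + 1) / 2                        # the case number described above
        if position.is_integer():
            return ordered_scores[int(position) - 1]  # Python lists are 0-indexed
        lower = ordered_scores[int(position) - 1]
        upper = ordered_scores[int(position)]
        return (lower + upper) / 2                    # split the difference between the two middle cases

    print(median([3, 7, 10, 13, 20]))                 # 5 cases -> case 3 -> 10
    print(median([3, 7, 10, 13, 20, 25]))             # 6 cases -> midpoint of cases 3 and 4 -> 11.5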

The mean is another indicator of central tendency. The mean is the average of the distribution (or in our
example the class average). The mean is the “arithmetic center” of a frequency distribution. When you
subtract the mean from each number in the distribution and add all of the difference scores together, you
get a sum of zero: the negative sums perfectly cancel out the positive sums. The mean takes all of the
scores in the distribution into account.

To calculate the mean, you add the values of all of the items in the distribution and divide by the total
number of items in the distribution. To calculate the mean in a frequency distribution, you create an fX
column in which you multiply each value by how many times it occurs. You then sum the fX column
and divide by the total number of items in the frequency distribution. In other words, if you already have an
fX column, you simply add it up and divide by the number of items in the
distribution (i.e., (Σ fX)/N).

By this time, you are probably wondering whether the mean or the median is the best measure of central
tendency. The answer is that it depends on the skewness of the distribution. A skewed distribution is one in
which there is an extreme score or outlier in the distribution (e.g., the presence of a billionaire in a group in
which you are trying to calculate “average” income). A distribution is positively skewed when the outlier is
a very high score in the distribution; a distribution is negatively skewed when the outlier is a very low score
at the bottom of the distribution. An outlier receives a heavy weight in calculating the mean. When one
has a highly skewed distribution, the median is the preferred measure of central tendency. However, when
one has a relatively unskewed distribution, the mean is the preferred measure. The mean takes all of the
items of the distribution into account. In inferential statistics, the mean is estimated from a sample with less
error than is the median.

You can identify skewness in your distribution by visually inspecting a frequency distribution for outliers.
An outlier is not balanced by other values on the other half of the distribution. You can identify the relative
skew of a distribution by comparing the median and the mean. When the mean and median are close to one
another, the distribution is relatively unskewed. When the mean is much larger than the median the
distribution is positively skewed (e.g. when one person got a 100 and the next highest score was a 75).
When the mean is much smaller than the median, the distribution is negatively skewed. You can also
calculate a skewness statistic with the computer to determine whether it is wise to use the mean (A skew
statistic of zero indicates an unskewed distribution in which the mean and the median are fairly close to one
another. A skewness statistic of greater than 3 or less than –3 indicates that the distribution is highly
skewed.).

Some variables are ordinarily quite skewed (income is usually positively skewed). One common statistical
trick to mislead people is to use the mean to represent a distribution when the distribution is highly skewed
(e.g., the mean salary in this company is $40,000). In recent years, Democrats and Republicans have
exchanged barbs about whether wages have stagnated in recent decades. Republicans emphasize that mean
family income is up considerably, whereas Democrats emphasize that median salary has grown very slowly
over the last several decades. Both statistics are true, but each side has selected the statistic that best
supports their particular political philosophy.

Measures of Variation

To return to our test example, the second set of questions that you are likely to ask will have to do with the
variation of scores on the test (e.g., how they were spread out). The simplest measure of variation is the
range. The range is simply the difference between the highest score in the distribution and the lowest score.
The range is easy to compute, but it is not particularly useful because the size of the range is positively
correlated with the number of items in the distribution (i.e., the more people who take the test, the larger the
range is likely to be).

For this reason, two other measures of variation or dispersion are often calculated: the variance and the
standard deviation. A distribution where the cases are tightly packed together has a smaller standard
deviation than a distribution in which the cases are widely spread out. The first step in calculating the
variance is to subtract the mean from each number in the distribution. The second step is to square each of
these differences and add all of the squared deviations together. The third step is to divide by the number
of items in the distribution. The variance is the average squared deviation of distribution items from
the mean. In statistical analysis, the variance describes relationships between variables or differences
between groups, but descriptively, it is not very helpful because it is not in the original units of measure
(i.e., What does it mean to say that the variance is 3 squared inches?). For this reason, statisticians usually
calculate the standard deviation. The standard deviation is the square root of the variance. It converts the
variance back to the original units of measure.
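The sketch below (not from the text) follows the steps just described: subtract the mean from each score, square and sum the deviations, divide by N to get the variance, and take the square root to get the standard deviation in the original units.

    from math import sqrt

    scores = [68, 68, 75, 75, 75, 82, 82, 82, 90, 100]   # hypothetical test scores

    mean = sum(scores) / len(scores)
    squared_deviations = [(x - mean) ** 2 for x in scores]
    variance = sum(squared_deviations) / len(scores)      # average squared deviation from the mean
    std_dev = sqrt(variance)                              # back in the original units of measure

    print(round(variance, 2), round(std_dev, 2))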

The standard deviation helps you understand how your score relates to the distribution mean. You should
request the standard deviation when you want to determine the relative meaning of your score. The
standard deviation is the most useful measure of variation or dispersion. In the next reading, we will
investigate a measure called the standardized score. This measure describes where a single point is located
in the frequency distribution relative to the mean.

Measures of central tendency and dispersion allow one to quickly and efficiently describe and summarize a
great deal of data. With the help of two simple measures of the scores (e.g., the mean and the standard
deviation), you can begin to figure out the “meaning” of your test score.

Symbolic Notation
The table below lists some of the common symbols that will appear in the rest of the book and then lists
some of the equations for some of the descriptive statistics employed in this unit.

Symbolic Notation for Population Parameters


X = any score in a series; xi = a specific case in the distribution, depending upon the subscript
X̄ = the mean (M is also commonly used to refer to the mean)
N = total number of scores in a distribution
Σ = the summation of the numbers to the right of the sign
Mo = mode
Md = median
R = range
S² = variance
S = standard deviation
Z = standardized score; Zi refers to a specific case in the distribution, depending upon the subscript
Equations for Common Statistics
R = Xhi - Xlo, where Xhi is the largest score in the distribution and Xlo is the smallest score in the distribution
X̄ = (Σ X)/N; for grouped data in a frequency distribution, X̄ = (Σ fX)/N
Md = case (N+1)/2 in the cumulative frequency column of a frequency distribution
Z = (X - X̄)/S
S² = Σ (X - X̄)²/N
S = square root of the variance (S²) above

Exercise: Constructing a Frequency Distribution
1A. Construct a frequency distribution from the following survey data in response to the question, “How
many hours of television did you watch yesterday?”
Respondent A: 3 hours Respondent H: 1 hour
Respondent B: 2 hours Respondent I: 1 hour
Respondent C: 1 hour Respondent J: 3 hours
Respondent D: 1 hour Respondent K: 3 hours
Respondent E: 4 hours Respondent L: 0 hours
Respondent F: 1 hour Respondent M: 3 hours
Respondent G: 4 hours Respondent N: 1 hour

Hours of TV watched   Frequency   Cumulative Frequency   Cumulative Percentage   fX
0 hours
1 hour
2 hours
3 hours
4 hours
Totals
1B. Calculate the range (i.e., subtract the smallest number in the value column from the largest number in
the value column). Then identify the mode and the median for this distribution.

1C. Calculate the mean for the above distribution; sum the fX column and divide by the number of items in
the distribution.
Exercise: Constructing a Frequency Distribution Example 2
In a survey of a communication class, students were asked: “How many on-line purchases have you made
in the last month?” The results for this question appear in the following frequency distribution.

# of online purchases   Frequency   Cumulative Frequency   Cumulative Percentage   fX
Zero                    14
One                      6
Two                      8
Three                    5
Four                     4
Five                     3
Totals                  40

1) Complete the remaining columns.

2) What is the range for this distribution?

3) What is the mode for the number of online purchases?

4) What is the median number of online purchases?

5) What is the mean number of online purchases?

Standardized Scores
So far, we have described the distribution as a whole. We can also describe where an individual value lies
in a distribution. We do this by calculating a standardized score or a z score. The standardized score
utilizes the distribution mean and standard deviations to locate the position of a particular value in a
distribution. The standardized score tells you how far a particular value is above or below the
distribution's mean in standard deviation units (i.e., how many standard deviations that a score is above
or below the mean).

To calculate a standard score, you subtract the mean from the value you are interested in and divide by the
standard deviation for the distribution: z = (Value - Mean)/Standard Deviation. If you got an 82 on the test
and the class mean is 75 and the standard deviation is 7, then your test score is one standard deviation
above the mean (i.e., z = (82 - 75)/7 = 1). If your score is 75, the class mean, your standardized score is 0 (i.e.,
(75 - 75)/7 = 0). For scores below the mean, the standardized score will be a negative number (e.g., (68 - 75)/7 =
-1, or one standard deviation below the mean). Standardized scores are useful for describing where values
lie in a particular distribution, but their real value comes in enabling us to compare different distributions.
The old saying that you can’t compare apples and oranges isn't accurate. You indeed can compare scores
from two different distributions if you can first convert them to the common metric of standardized scores.
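A tiny helper (my illustration, not the text's) makes the computation explicit, using the test example above: a score of 82 with a class mean of 75 and a standard deviation of 7.

    def z_score(value, mean, std_dev):
        return (value - mean) / std_dev

    print(z_score(82, 75, 7))    #  1.0 -> one standard deviation above the mean
    print(z_score(75, 75, 7))    #  0.0 -> exactly at the mean
    print(z_score(68, 75, 7))    # -1.0 -> one standard deviation below the mean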

Let’s say that you are taking a Spanish class. One evening you compare notes with a friend on how well
both of you are doing in the class even though you have different teachers and different tests. You received
a 75 on your first test and your friend got an 85 on his test. However, you suspect that your friend has an
easier instructor and that he took an easier test. You suspect that your score of 75 is actually better than his
score of 85. To make headway in this argument you need to calculate both of your z scores. If the mean of
your class was 65 and the standard deviation was 10, your z score is 1.0 (i.e., (75 - 65)/10 = 1). If the mean of
your friend’s class was 80 and the standard deviation was 10, his or her z score is .50 (i.e., (85 - 80)/10 = .50). You can argue that
your score of 75 is better than your friend’s score of 85, because your standard score is higher. Your friend
might argue that the scores are not comparable because the students in his class are more proficient in the
subject. Alternately he may argue that his instructor is superior and that students learn better as a result.
Relative comparisons may be misleading if the scores come from fundamentally different groups (e.g., novices
are being compared to experts). However, standardized scores can still be used to compare different
distributions. In real life, we make many comparisons like this as we try to make choices between different
distributions.
Comparing Distributions Using Standardized Scores
Janelle is an emerging track enthusiast. She really likes the sport, but she is trying to find her niche. She has
many events to choose from, but she isn’t sure where she should invest her limited training time. She
wants to choose the events in which she will be most competitive. After the first two meets of her first year
in high school, she has the following data regarding her performance. She compares her performances with
the normed data for her age group (group means and standard deviations). Rank the events in order of
training priority, giving the first ranking to the one she should devote the most time to, and so on. Calculate and use
the standard scores for each event to justify your answer (Note that in the case of running events, the best
performances are the ones that have the most negative values in terms of the z scores. Hence, use the
absolute value for the running events when comparing them with the field events).

Event             Janelle’s Performance   Group Mean   Standard Dev.
Long Jump         17’                     12’          5’
High Jump         5’6”                    5’           4”
Pole Vault        10’                     8’           1’
200 Meter Dash    25 seconds              28 seconds   2 seconds
400 Meter Dash    61 seconds              68 seconds   5 seconds

The Normal Curve
The normal curve is of particular interest to statisticians. It is also called the bell-shaped curve. Strictly
speaking, the normal curve is a theoretical curve. Real world variables only approach normality because
real world distributions are finite and the normal curve assumes an infinite number of items. This reading
explains some of the properties of the normal curve and explains why it is widely used in descriptive
statistics.

The normal curve has several interesting properties. First, the mean, mode and median are all equal.
The greatest proportion of cases falls in the region close to the mean, and there is a gradual tapering off
in the proportion of cases as you move away from the mean. The normal curve is also a symmetrical curve:
the cases on each side of the mean perfectly mirror each other. The normal curve also has the property that
the proportion of cases falling within given standard deviation units of the mean is known. As you may
have learned in a statistics class, approximately 34% of the cases in the normal curve fall between the mean
and one standard deviation above (or below) the mean, roughly another 14% fall between one and two
standard deviations, and only about 2% fall between two and three standard deviations on each side. The
normal curve proportions table below displays the percentage of cases that fall between the mean and each
z score up to three standard deviation units above or below the mean.
The importance of the normal curve derives from several factors. First, many real world distributions are
approximately normal. Second, in the normal curve the proportion of cases that falls within a known
standardized score of the mean is constant. If you know a score from a normal curve, you can also
determine the cumulative percentage or ranking of that score on that particular curve. Third, all
real world distributions can be compared to the normal curve. This means that we can use our knowledge of
the normal distribution to make estimates about many real world populations. However, the main reason
that the normal curve is so important is that random error is normally distributed. This means that the
normal curve can be used to make inferences about a population from known characteristics of a sample.

If a distribution is approximately normal, I can use my knowledge of how proportions fall on the normal
curve to estimate the percentage of cases that fall between given points in their real world distribution; I
can estimate the proportion of cases between any two points in the distribution if I know or can calculate
their standard scores (i.e., z scores). All I need to make these estimates is the distribution mean, the
distribution standard deviation, and the normal curve proportions table.
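If you prefer to let the computer do the lookup, the sketch below (mine; the course exercises assume you will use the table instead) computes normal-curve proportions directly from the error function in Python's standard library. Phi(z) is the cumulative proportion of a standard normal distribution falling below z.

    from math import erf, sqrt

    def phi(z):
        # Cumulative proportion of a standard normal distribution falling below z
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(round(phi(1.0) - phi(0.0), 3))    # proportion between the mean and z = 1.0; compare with 34.1% in the table
    print(round(phi(1.0) - phi(-1.0), 3))   # proportion between z = -1.0 and z = +1.0 (about 68%)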

Table: Normal Curve Proportions Table


Z scores Proportion Percentage
.05 .02 2%
.10 .04 4%
.15 .06 6%
.20 .079 7.9%
.25 .099 9.9%
.30 .118 11.8%
.35 .137 13.7%
.40 .155 15.5%
.45 .174 17.4%
.50 .192 19.2%
.55 .209 20.9%
.60 .226 22.6%
.65 .242 24.2%
.70 .258 25.8%
.75 .273 27.3%
.80 .288 28.8%
.85 .302 30.2%
.90 .316 31.6%
.95 .329 32.9%
1.0 .341 34.1%
1.05 .353 35.3%
1.10 .364 36.4%
1.15 .375 37.5%
1.20 .385 38.5%
1.25 .394 39.4%
1.30 .403 40.3%
1.35 .412 41.2%
1.40 .419 41.9%
1.45 .427 42.7%
1.50 .433 43.3%
1.55 .439 43.9%
1.60 .445 44.5%
1.65 .451 45.1%
1.70 .455 45.5%
1.75 .460 46%
1.80 .464 46.4%
1.85 .468 46.8%
1.90 .471 47.1%
1.95 .474 47.4%
2.00 .477 47.7%
2.05 .48 48%
2.10 .482 48.2%
2.15 .484 48.4%
2.20 .486 48.6%
2.25 .488 48.8%
2.30 .489 48.9%
2.35 .491 49.1%
2.40 .492 49.2%
2.45 .493 49.3%
2.50 .494 49.4%
2.55 .495 49.5%
2.60 .495 49.5%
2.65 .496 49.6%
2.70 .497 49.7%
2.75 .497 49.7%
2.80 .497 49.7%
2.85 .498 49.8%
2.90 .498 49.8%
2.95 .498 49.8%
3.00 .499 49.9%

As you can see, most cases fall relatively close to the mean. As one moves away from the mean, the
proportion of cases per standard deviation unit diminishes dramatically. In a normal distribution, more than
95% of the cases fall within +/- two standard deviations of the mean. Only about two or three cases in 1,000
fall more than three standard deviations from the mean. If you have a performance, aptitude or
characteristic that falls more than three standard deviations from the mean, you are exceptional in that
regard (either exceptionally poor or exceptionally good).

The fact that the normal curve can be quite useful for description raises the question of how one can assess
whether one’s distribution for a variable is approximately normal. Two descriptive statistics enable us to
determine if a variable is approximately normally distributed. The skewness statistic indicates how close a
distribution is to being symmetrical. A positive skew indicates that the mean of the distribution is larger
than the median, whereas a negative skew indicates that the mean is pulled to be less than the median. A
normal curve has a skew of zero. Kurtosis is a second statistic that researchers employ to determine
whether a curve is approximately normal. Kurtosis indexes how high or peaked the curve is. A normal
curve has a kurtosis of 3. A kurtosis of greater than 3 indicates a peaked and narrow distribution, whereas a
kurtosis of less than three indicates a relatively flat curve. The formulas for calculating skewness and
kurtosis statistics are rather complex. The computer calculates them for us with great ease. If a distribution
is approximately normal, we can use our knowledge of the normal curve to describe the distribution.
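If the SciPy library is available (an assumption on my part; the course text does not name any particular software), the computer will report both statistics for you. The option fisher=False puts kurtosis on the scale used above, where a normal curve has a kurtosis of 3.

    from scipy.stats import kurtosis, skew

    # Hypothetical, positively skewed incomes (one very large outlier)
    incomes = [28, 31, 33, 35, 36, 38, 40, 42, 45, 250]

    print(round(skew(incomes), 2))                    # well above 0 -> positively skewed
    print(round(kurtosis(incomes, fisher=False), 2))  # well above 3 -> sharply peaked with a heavy tail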

Is Grading on the Curve Really a Good Idea?
When I turn back a test students usually ask whether I graded the test on the curve. This is a case where
students would often be rather dismayed if I truly gave them what they asked for.

To “grade on the curve” means that the instructor grades on the actual distribution of scores rather than on
predetermined standards (i.e., getting a certain percentage of the questions right for each grade). Grading on
the curve assumes a normal distribution of exam scores. Under this scheme, the highest scores in the
distribution get As and the low scores on the test get Fs. To ask an instructor to grade on the curve is also
to accept the notion that the distribution of test scores approximates a normal curve.
To grade on the curve, the instructor would select a standardized score cutoff for each grade. The most
frequent grade would be a C, since this is the middle of the grading distribution. In addition, the number of
As would perfectly match the number of F’s. These categories would be the most infrequent categories.
Moreover, the number of B’s would perfectly match the number of D’s. The proportion of students getting
each letter grade would be set ahead of time. A typical scheme might be 10% of the class might receive
A’s, 10% F’s, 20% B’s, 20% D’s, and 40% C’s.

One of the first questions you might want to ask is what the mean and median scores for the test were. If
you have a normal distribution, these scores should be fairly close to each other. If the class average is
higher than 75%, which is a mid-range C on our grading scale at U of L, then the grades of the class will
actually be lowered if the instructor grades on the curve! You would likely become rather agitated with the
instructor if she graded on the curve rather than on the predetermined standards. However, the reasons to
grade on the curve when grades are too high are just as valid as the reasons to grade on the curve if grades
fall significantly below a 75% average. If knowledge of the subject is assumed to be normally distributed,
then you can make a case that an abnormally high proportion of A’s and B’s points to an invalid test. From
a statistical point of view, grading on the curve involves assigning grades by z-scores rather than raw scores. On
some exams, grading "on the curve" will raise student grades, but in other instances it would actually lower
them.
Exercise: Determining Proportions on the Normal Curve
If a distribution is approximately normal, we can closely approximate the percentage of cases that fall
below or beyond a given point, or between two points simply by knowing the z-score(s) of the relevant
point or points. In this exercise, you will determine the percentage of cases falling beyond or below a
certain point in a normal distribution. Assume that the scores on a test that you just took are approximately
normally distributed. Use the normal curve proportions chart. You will probably want to draw a diagram
for each problem to determine what proportions you will add and subtract. You may want to estimate the
proportions visually first by looking at a picture of the normal curve with the portions of the curves marked
out. Then compare your visual estimates with your final answer. Your visual estimate should be fairly close
to the answer that you get via the calculations.

1. If you score 1.2 standard deviations above the mean on a test, what percentage of the students in the class
scored below you on the test?

2. If your friend scored -1.3 standard deviations below the mean, what percentage of the students in the
class scored better than your friend?

3. What percentage of the students who took the test scored between you and your friend (i.e., between
-1.3z and +1.2z)?

4) On this test, Jan had a score of 75. The class mean was 80 and the standard
deviation for the test was 10. Assuming an approximately normal distribution of test
scores, what percentage of the students taking the test scored better than Jan?

5) What percentage of the cases on the exam fall between .8 z and 1.6 Z?

Exercise: How Exceptional are Janelle's Performances?
In an earlier exercise, you helped Janelle decide what track events she should pursue. Now help Janelle
determine her actual ranking in comparison to other girls her age in the following events. Assume that you
are dealing with an approximately normal distribution. What percentage of girls her age can she outperform
in each event? What percentage of girls her age does better than she does in each event? Use the Normal
Curve Proportions Table to derive your answers. Remember for running events, better performances
correspond to negative scores.

Event             Janelle’s Performance   Group Mean   Standard Dev.
Pole Vault        10’                     8’           1’
200 Meter Dash    25 seconds              28 seconds   2.0 seconds
400 Meter Dash    61 seconds              68 seconds   5 seconds
Describing Variable Relationships
Some statistics describe relationships between variables. The simplest case is to describe the relationship
between two variables (i.e. a bivariate relationship). This description answers the following question: As
one variable increases in value, what tends to happen with the values of the second variable (e.g., What
happens to the price of a house as the size of the house increases?). When we have variables that are
measured at the interval level or above, we can write equations to describe these relationships. In the case
of nominal or ordinal level measurements, correlation coefficients only summarize the strength of the
relationship between the two variables.

A good prediction equation allows a researcher to predict the value of one variable when one knows the
value of a second variable. For instance, by knowing your age, I also have some insight into the kind of
music you are likely to prefer. While there are many exceptions to the rule, the underlying correlation
between age and music preferences is the primary marketing tool that drives formats for commercial radio
stations. Radio marketers try to identify a niche in the market and to appeal to a specific demographic
segment in that market (e.g., women between the ages of 18 and 34). Knowing the relationship between
variables helps one predict an important outcome variable: it significantly reduces one’s prediction error
relative to merely guessing. In fact, a researcher can calculate the likely amount of prediction error once he
knows the underlying relationship between the variables.

The simplest way to portray the relationship between two variables is to visually display the relationship on
a graph. This is done by putting values of one variable on the X axis and putting the value of the other
variable on the Y axis and plotting the individual cases. It is a good idea to start with a visual graph,
because many equations or models can be used to describe variable relationships. Looking at a graph gives
you a good sense of which model will best fit the data. For instance, if your visual display seems to suggest
a curvilinear relationship between two variables (i.e., as one variable increases in value, the second variable
first increases in value and later decreases in value), it does not make sense to try to fit a straight line to the data. To
do so might lead you to conclude that no relationship exists between the variables when there actually is a
strong relationship between the two variables.

Why not just stop with the graph? The answer to this question is two-fold. First, a graph does not give us a
prediction tool. If I know the value of one variable, what specific value should I predict for the second
variable? Second, the graph does not give us a precise measure of how much prediction error we reduce
when we predict the value of one variable from the value of a predictor variable. Statisticians develop
equations to guide their predictions of one variable when they know the value of a second variable.
They enter the known value of the predictor variable into the equation and predict the value of the second
variable (e.g., an insurance agent enters your age into the equation to predict how likely you are to be
involved in a crash and hence make a claim against the insurance company).

Statisticians also compute correlation coefficients to describe the strength of the relationship between two
variables. By knowing the correlation coefficient one can easily calculate how much more accurate one’s
prediction is than mere guessing. Different correlation coefficients are used to describe different types of
relationships (i.e., linear, nonlinear). Variables measured at different levels have different measurement
properties and therefore require different statistics (i.e., variables measured at nominal, ordinal and
interval/ratio levels).

Statisticians usually start by applying the simplest equations to describe the relationship between two
variables. In the case when one has two variables measured at the interval or ratio level, this is done by
finding the equation that describes the straight line that best fits the data-points. This estimation technique
is linear regression. A linear regression model assumes that there is a constant rate of change in variable
Y for each unit increase in variable X (e.g., crime rates decrease at a constant rate as a cohort gets older).
The best fitting line is the one that has the least prediction error of all of the lines that can be drawn through the
data points. If you are wondering how one conjures this best fitting line and its equation, the answer is that
statisticians use a mathematical method called matrix algebra to calculate them. We won't cover matrix
algebra because it is a rather tedious business. However, computers are very willing to calculate these
equations for us, and we are very glad to let them.

The best fitting line is described with a simple equation which takes the form Y = a + bX, where Y is the
predicted value of Y, a is the Y intercept (the predicted value of Y when X equals zero), b is the slope or
pitch of the line, and X is the value of the predictor variable. Translated, this means that we can predict the
value of Y by multiplying the value of X by the line’s slope and adding the value of the Y intercept. When the value of b is positive,
one is describing a positive relationship between two variables (i.e., as X increases in value, Y also
increases in its predicted value). When the value of b is negative, one is describing a negative or inverse
relationship between the two variables (e.g., as the value of X increases, the predicted value of Y
decreases). The steeper the pitch of the regression line, either in the positive or negative direction, the
stronger the relationship between two variables. A very slight pitch indicates a relatively weak relationship
between the two variables. A pitch of zero (i.e., a horizontal line through the data points) indicates that
there is no relationship between the variables, as there is no predictable change in Y as the value of X
increases.
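To show what "best fitting" means in practice, here is a small sketch (mine, with invented numbers; the text itself leaves the computation to the computer) that finds the least-squares line Y = a + bX for a handful of hypothetical (X, Y) points using the standard least-squares formulas for the slope and intercept.

    xs = [1, 2, 3, 4, 5]                 # e.g., house size in thousands of square feet
    ys = [110, 150, 210, 240, 310]       # e.g., house cost in thousands of dollars

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # slope b = sum of cross-products of deviations / sum of squared deviations of X
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x              # intercept: the predicted Y when X is zero

    print(f"Y = {a:.1f} + {b:.1f}X")     # Y = 57.0 + 49.0X for these invented points
    print(a + b * 3)                     # predicted cost for a 3,000-square-foot house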

When one has a linear relationship between two variables, researchers are also interested in describing the
direction (i.e., positive or negative) and the strength of the relationship with one number: the Pearson
correlation coefficient. This coefficient describes relationships of variables measured at the interval or
ratio level. This correlation coefficient has a potential range from -1 to +1. A correlation of -1 indicates the
strongest possible negative relationship between two variables: one could perfectly predict the decreasing
value of Y as the value of X increases. A value of +1 indicates the strongest possible positive relationship
between two variables. A correlation coefficient of zero indicates no discernible relationship between the
variables (e.g., a straight line drawn through a random scatter of points). In summary, the direction of a
relationship is indicated by whether the correlation coefficient is positive or negative, and the relative
strength of the relationship is indicated by the absolute value of the correlation coefficient (i.e., how far it is
from zero). A correlation of -.3 indicates the same strength of relationship as a correlation coefficient of .3.

There are two other numbers that we must calculate from the correlation coefficient to identify the
proportional strength of the relationship between two variables. The Pearson correlation coefficient is not
itself a proportional index of the relationship’s strength; it is the square root of the proportion of variance
that the regression equation accounts for, with a sign attached to indicate whether the relationship is
positive or negative. One
needs to calculate the coefficient of determination to determine the absolute amount of prediction error or
variance eliminated by using the linear prediction equation. To get the coefficient of determination you
simply square the Pearson correlation coefficient and multiply by 100 (e.g., .3 x .3 x 100=9%). A
coefficient of .30 means that one has 9% less error or variance when using X to predict Y, than when one
simply guesses the value of Y by knowing the mean of its distribution.

To put the coefficient of determination in perspective, you should also calculate the coefficient of
nondetermination. This is simply the amount of variance or prediction error that is still unaccounted for by
the linear regression equation. Calculate the coefficient of nondetermination by subtracting the coefficient
of determination from 100%. In the case of a correlation coefficient of .30 with a coefficient of
determination of 9%, the corresponding coefficient of nondetermination is 91% (i.e., 100%-9%=91%).
Together, these statistics help us understand the practical significance of a correlation.
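Continuing the same kind of invented example, the sketch below (not from the text) computes the Pearson correlation, the coefficient of determination, and the coefficient of nondetermination for a small set of (X, Y) points.

    from math import sqrt

    xs = [1, 2, 3, 4, 5]
    ys = [110, 150, 210, 240, 310]

    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)

    r = cross_products / sqrt(ss_x * ss_y)
    print("Pearson r:", round(r, 3))
    print("Coefficient of determination:", f"{r ** 2:.0%}")        # prediction error eliminated
    print("Coefficient of nondetermination:", f"{1 - r ** 2:.0%}") # prediction error remaining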

When one does not have interval or ratio level measures, one must turn to other correlation measures to
describe the strength of the relationship between two variables. When two variables are measured at the
nominal level, Cramer's V can be used to describe relationship strength. This statistic ranges from zero to
1, with 0 indicating no relationship between the variables, and 1 indicating a perfectly predictable
relationship (e.g., if I know your sex, I can also perfectly predict whether or not you like country music).
When you have rank-order data (or interval data converted to ranks), one can use Spearman's rho. This correlation
coefficient has pretty similar properties to the Pearson correlation coefficient, with the correlation ranging
from +1 to -1, and a zero indicating no correspondence between the ranks on two variables. Consult the
information sheet on page 118 for additional information about each relationship statistic.
Integrating Correlation and Regression
This reading further explains the concepts of the coefficient of determination and the coefficient of
nondetermination in linear regression. Let’s begin with a review and a graphic illustration. When we have a
correlation of 0, we also have a regression line with a slope of zero. The relationship between two
variables such as the length of one's toes and musical ability should look something like the following
distribution.

[Figure: scatterplot with musical ability on the Y axis and length of one’s little toe on the X axis; the best-fitting regression line is horizontal at the mean of musical ability]

The relationship charted above shows no covariation between the variables. As one's small toe gets longer,
there is no corresponding change in one's musical ability. The best fitting regression line is a horizontal line
that is at the mean for musical ability. Another way to think of this is that if you had to guess any
individual's musical ability, your best guess for every individual would be the mean of the distribution.
If you were to calculate your prediction error using this strategy, you would have less prediction
error or error variance than with any other estimation strategy. Error variance = Σ(estimated value -
actual value)² / number of points estimated in the distribution.

Now let's imagine the opposite kind of distribution, one in which there is a perfect correlation of +1
between the variables of house size and house cost. In this example, if I know the size of a house, I can
perfectly predict the cost of the house. The predicted values and the actual values are all the same. The
prediction error in this case is equal to zero. All of the points fall directly on the regression line.

[Figure: scatterplot with house cost on the Y axis and house size on the X axis; every data point falls exactly on the upward-sloping regression line]

Of course, there are very few real world distributions where there is a perfect correlation between two
variables. As the regression line becomes steeper, the correlation coefficient that describes how closely the data
points fit around the line also increases in magnitude. Conversely, as the regression line flattens out, the
correlation coefficient becomes correspondingly smaller. In the example below, we see that there is a
general trend. As a person's communication apprehension increases, there is also a corresponding increase
in the person's preference for larger classes. There is a definite relationship, but there is still a fair amount
of prediction error or error variance. The error variance indicates how far the actual values deviate from
the predicted value on the regression line.

[Figure: scatterplot with preferred class size on the vertical axis and communication apprehension on the horizontal axis; data points scatter around an upward-sloping regression line]

Let's say that the regression equation for the above relationship is as follows:

Predicted preferred class size (Y) = 15 + 1X, where X is the person's communication apprehension score. Moreover, let us say that the actual Pearson correlation
between the two variables is .50. This would give us a coefficient of determination of 25% and a
coefficient of nondetermination of 75%. Now we can see visually what the coefficient of determination
means. Essentially it means that you reduce your error variance by 25% when you use the prediction
formula provided by the regression line compared to simply using the mean preferred class size to estimate
preferred class size for any single person (e.g., If a person has an apprehension score of 2, his or her
predicted preferred class size will be 17 students). The coefficient of determination gives you the
proportional reduction of prediction error that you get by using the regression formula. For instance, if
your variance is about 10 using the mean of the distribution, your variance will be 7.5 when you use the
regression line. The coefficient of nondetermination gives you the proportion of prediction error that remains.

The concept of proportional reduction of prediction error is very useful. It can be applied to many
prediction formulas outside of the regression examples and the interval/ratio level measures. You can think
of the coefficient of determination as an indicator of how much we have been able to reduce our
uncertainty. Correspondingly, the coefficient of nondetermination is the amount of uncertainty that remains as
we try to understand and predict the variation of the target variable.
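
The following Python sketch illustrates proportional reduction of error with a small invented data set for communication apprehension and preferred class size (the numbers are made up for illustration, not taken from the regression equation above). It compares the error variance from guessing the mean with the error variance left over after fitting a regression line, and shows that the proportional reduction equals r squared.

    import numpy as np

    # Invented data: communication apprehension (x) and preferred class size (y).
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
    y = np.array([17, 16, 20, 18, 22, 20, 25, 22], dtype=float)

    slope, intercept = np.polyfit(x, y, deg=1)            # best-fitting straight line
    predicted = intercept + slope * x

    error_using_mean = np.mean((y - y.mean()) ** 2)       # baseline prediction error
    error_using_line = np.mean((y - predicted) ** 2)      # error left after regression

    reduction = 1 - error_using_line / error_using_mean   # proportional reduction of error
    r = np.corrcoef(x, y)[0, 1]
    print(round(reduction, 2), round(r ** 2, 2))          # the two values match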
Multivariate Tools for Describing Variable Relationships
Our previous readings on describing variable relationships have been limited to bivariate relationships (e.g.,
simple correlation that investigates only two variable relationships at a time). This reading briefly defines
and explains the tools of multivariate analysis, or statistical tools that take into account three or more
variables at a time. These statistical tools include multiple regression, partial correlation, and causal
modeling.

Multiple regression is an extension of simple linear regression in that it uses several independent
variables to predict a single dependent variable. Multiple regression applies when one has interval or ratio
level variables for both independent and dependent variables. There are comparable techniques for when
one has variables at the nominal or ordinal level (e.g., logistic regression). The multiple regression
approach formulates an algebraic equation that best predicts the criterion variable using a combination of
predictor variables. For example, I might want to predict the frequency of teenagers' marijuana use using the
predictor variables of age and sensation seeking (a personality variable). An algebraic equation will be
created in the form Y = a + b1X1 + b2X2 (i.e., the predicted frequency of marijuana use (Y) is a
function of the intercept (a), plus the teenager's age (X1) times its slope (b1), plus the teenager's sensation
seeking score (X2) times its slope (b2)). This equation is the same as the simple regression equation,
except that it adds an algebraic term for the second predictor variable. Indeed, you can keep adding
additional predictors as long as your sample size reasonably permits.

Multiple regression enables researchers to build more powerful predictive models. When one has 4 or
more predictor variables, the researcher may be able to account for more than two-thirds of the variance of the
criterion variable. For instance, John Gottman, a researcher in the area of marital distress, was able to predict
which couples would get divorced over a four-year period using seven indicators, with a multiple coefficient
of determination of more than 90%.1 This is one reason that multiple regression is a popular statistical
technique in communication research and in social science research. Ideally, the researcher should have
predictor variables that are uncorrelated with each other. If there is considerable multicollinearity or
correlation among the predictor variables, it is difficult to untangle which individual variables are the
strongest predictors of the criterion variable, because the parameter estimates (the slope terms) vary widely
from one data set to the next.
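
Here is a hedged Python sketch of the idea, fitting Y = a + b1X1 + b2X2 by ordinary least squares. The data and variable names (age and sensation seeking predicting frequency of use) are invented solely to show the mechanics; a real analysis would of course use a statistical package and real data.

    import numpy as np

    # Invented data for two predictors (x1 = age, x2 = sensation seeking) and a criterion (y).
    x1 = np.array([13, 14, 15, 16, 17, 18, 15, 16], dtype=float)
    x2 = np.array([2, 5, 3, 7, 6, 9, 4, 8], dtype=float)
    y  = np.array([0, 2, 1, 4, 3, 6, 1, 5], dtype=float)

    # Design matrix: a column of 1s for the intercept (a), then the two predictors.
    X = np.column_stack([np.ones_like(x1), x1, x2])
    (a, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)

    predicted = a + b1 * x1 + b2 * x2
    r_square = 1 - np.sum((y - predicted) ** 2) / np.sum((y - y.mean()) ** 2)
    print(round(a, 2), round(b1, 2), round(b2, 2), round(r_square, 2))  # multiple coefficient of determination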

Partial correlation is another statistical technique that is related to multiple regression. Partial correlation is
used to check what kind of relationship exists between two variables (e.g., communication apprehension
and GPA) when a background variable that is correlated with both variables is taken into account (e.g.,
self-esteem). In this example, one can have the computer calculate a partial correlation between
communication apprehension and GPA, after the scores for each individual student have been adjusted for
the relationships between self-esteem and GPA, and self-esteem and communication apprehension. Partial
correlation is an invaluable technique for testing the effects of potential confounding variables and artifacts
in communication research. It is widely applied in survey research when researchers are unable to use
random assignment to groups to control for self-selection differences between groups.
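
A first-order partial correlation can be computed directly from the three bivariate Pearson correlations using a standard formula. The sketch below is in Python, and the three correlations (apprehension with GPA, apprehension with self-esteem, GPA with self-esteem) are invented numbers used only to show the calculation.

    import numpy as np

    def partial_r(r_xy, r_xz, r_yz):
        # Correlation between X and Y with Z held constant (first-order partial).
        return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

    # Invented correlations: apprehension-GPA, apprehension-self-esteem, GPA-self-esteem.
    print(round(partial_r(r_xy=-.30, r_xz=-.50, r_yz=.40), 2))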

Causal modeling combines the techniques of multiple regression and partial correlation to investigate the
relationships between multiple independent and dependent variables simultaneously (what we called
systemic causation in an earlier unit). Causal modeling is utilized when one has a model that includes
intervening variables and moderator variables. Causal modeling allows the researcher to describe the paths
of correlation between the multiple variables when the relationships for other variables are held constant.
Causal modeling is both useful and dangerous. It is a useful descriptive tool, but it is subject to
considerable abuse in that one can tinker with the data until one comes up with a best fitting model. Then
the question is whether one has discovered something real, or whether the model is simply an artifact of the
computer looking for order in the data set. Causal modeling is most useful when it is driven by an existing
theory. In this case, one is at least testing hypotheses and the temptation to let the computer make up its
own solution is held in check.

There are many other statistical tools for describing how variables relate to each other. One can use factor
analysis to identify latent variables that underlie sets of indicators. Alternatively, one can use canonical
correlation, which combines multiple regression and factor analysis, to determine the relationships between
multiple predictor and multiple criterion variables. Before the advent of the computer, these statistical tools
were too labor intensive for most researchers to calculate. Today, however, a researcher can use a
computer to calculate such complex statistics.

The use of more complex statistics does not necessarily make for better data analysis. The ease of
calculation tempts researchers to explore their data set and “crunch the numbers until they scream.” If you
go on a fishing expedition in your data, you can usually come up with something, even if the discovered
relationships are ones that are likely to be statistically significant just by chance alone. I have read research
papers that set out looking for any variable relationships they could find. The researchers then provide
extensive after the fact explanations for why the discovered relationships exist. What was not obvious
before testing often seems obvious after the fact. This is a common manifestation of the post-hoc
fallacy. We often look back at events and see clear causal patterns that we failed to see beforehand. The
saying that hindsight is better than foresight captures this logical fallacy. When it comes to analyzing
results, the human mind can usually come up with a "compelling causal explanation” for a correlation or
time-ordered relationship. Like all other statistical tools, complex or multivariate statistics can be useful if
they are used appropriately. However, when used without discretion, they can lead to misinformed
interpretations. Sometimes complex statistical tools best answer a research question; at other times, simple
statistics are preferable. Wisdom consists of knowing when to use each.

Citation Note:

1. Gottman, J. M. (1994). What predicts divorce? The relationship between marital processes and marital
outcomes. Hillsdale, NJ: Erlbaum.

Some Common Measures of Association


Cramer's V (nonparametric): Indicates the strength of the relationship between two nominal level variables. It is based on the chi-square. A value of 0 denotes no relationship or independence of the variables. A Cramer's V of 1 indicates the strongest possible relationship between the variables: if you know the value of one variable, you can perfectly predict the value of the second variable.

Spearman's rho (nonparametric): Indicates the strength and direction of the relationship between two ordinal level variables. Values of the statistic range from +1 to -1. A value of +1 indicates a perfect positive relationship between the variables, whereas a value of -1 indicates a perfect inverse relationship between the variables. A correlation of zero indicates no discernible relationship between the variables.

Pearson product-moment correlation (parametric): Indicates the direction and strength of a relationship between two interval level variables. Values of the correlation coefficient range between +1 and -1. A value of 0 indicates no relationship between the variables. A value of +1 indicates a perfect positive relationship between the variables, whereas a value of -1 indicates a perfect inverse relationship between the variables.

Partial correlation (parametric): Describes the strength of a correlation between two parametric variables when a third variable is held constant. It is interpreted in the same way as the Pearson correlation. The partial correlation is employed when you want to determine whether a relationship between two variables is confounded or suppressed by a background variable.

Regression (parametric): This is the predictive side of the Pearson product-moment correlation. Regression involves an equation that predicts a value for one variable at a given value of a second variable. For the purposes of this section, we are primarily talking about linear regression, which is the simplest type of regression.

Multiple regression (parametric): An extension of regression that predicts a dependent variable as an algebraic linear combination of two or more predictor variables. Multiple regression requires interval or ratio level measurement for both the independent and dependent variables.

Exercise: Choosing Measures of Association
In the following examples, identify the measure of association that should be used to describe the relationship
between the described variables. Explain your reasoning. You will need to figure out the level of
measurement for each variable and then select the appropriate statistic.

1) A manager of a retail computer store believes that women are more likely to buy maintenance
agreements when they buy a computer than men are. Which measure of association should she use to
determine the strength of this relationship?

2) You are on a search committee. There are 12 applicants for the job. Each person on the search committee
rates each of the 12 job candidates in terms of their perceived qualifications, giving a rank of 1 to the
most qualified, 2 to the second most qualified, and so on. You want to determine how much your rankings
agree with those of another person on the committee.

3) An organizational researcher wants to describe the relationship between tenure (length of time a person
has worked for an employer) and a person's salary as measured in thousands of dollars.

4) A school researcher believes a child's performance on science tests is a joint function of the child's
reading and math abilities. He wants to see how much of the variation in science scores he can predict
from knowing a) the child's current reading level, and b) the child's current mathematics level.

5) A school researcher believes that there will be no correlation between the amount of time that a child
spends on homework and his/her scores on achievement tests when the income level of the child's family
is held constant.
Exercise: Correlation and Regression Problems
1A) Identify the strongest and the weakest correlation coefficients from the following series: r1=-.60;
r2=-.10; r3=.30; r4=.50.

1B) Calculate the coefficients of determination for the strongest and weakest correlations in problem 1
above. How much more shared variance does the strongest correlation coefficient account for than the
weakest correlation coefficient?

1C) Now calculate the corresponding coefficients of nondetermination for the two correlations that you used
above in 1B. Interpret what each coefficient of nondetermination means in a practical sense.

2) As a researcher, I am interested in the relationship between success in school for
communication majors (as indicated by GPA), and success in their profession (as indicated by annual
salary two years after they graduate from college). I find that the regression line that best describes the
relation between these two variables in my study is as follows: Income=$25,000 + ($2000 X GPA).
Calculate the predicted income for graduates who had undergraduate GPAs of 2.0, 3.0, and 4.0
respectively.

3) Before my first child was born, my wife and I went through a typical Lamaze class. The childbirth
educator predicted the sex of the babies that were to be born in the class. She turned out to be right in 7
out of 9 cases. When I asked her how accurate she was at predicting the sex of babies in general, she
informed me that she had correctly predicted 78% of the 300 pregnancies in her childbirth classes over a
three-year period. A person who was merely guessing would on average make the correct prediction 50%
of the time. How much of the prediction error was the childbirth educator able to eliminate, compared to
someone who merely guessed the sex of the child? Explain.
Exercise: Additional Correlation and Regression Problems
1) Determine the coefficients of determination and nondetermination for a correlation of -.40. Explain
what each of these coefficients means.

2) In class, we focused on linear models of variable relationships. When is it inappropriate to use linear
regression to model a data set?

3) A researcher has data on the relationship between household income and the number of magazines that
a household purchases. The highest household income in the sample is $150,000. What
difficulties may the researcher get into if she uses the regression equation to predict the number of
magazines in households with incomes of $300,000 or more?

4) Suppose the number of magazine subscriptions per household is best described by the equation:
Predicted # of household subscriptions=2.0 + (1.5 x # of people in the household). What number of
subscriptions would you predict for a household of 2 persons? For a household of 4 persons?

Perils in Describing Variable Relationships
Researchers should avoid several pitfalls when describing variable relationships. The first of these pitfalls
is fitting an inappropriate model to the data. For instance, if you try to fit a straight line through a data set
that has a curvilinear relationship, you will find no relationship or get a highly misleading indicator of the
relationship that actually exists. Although a linear model best describes many variable relationships, some
relationships do not have a constant rate of change between the variables. A researcher should visually
inspect a graph of the target variables before applying a linear regression model. Researchers can also fit
different models to the data and see how much shared variance the competing models account for.

Researchers sometimes make the mistake of extrapolating or making predictions far beyond their
data points. Let’s say I want to predict the value of a house that has 10,000 square feet. Assume that I
have data on the relationship between house size and house cost in Louisville for houses up to 6,000 square
feet. I can certainly use the prediction equation to predict the probable price of the 10,000 square foot
house, but I run the risk of being wrong in my prediction. I can’t be certain that the relationship between
variables will continue to follow the same pattern outside the range for which I have data. What was a
linear relationship may begin to flatten out (e.g., the law of diminishing returns), or become curvilinear
beyond a certain point (e.g., correlation between room temperature and comfort).

Extrapolating beyond one’s data points also creates problems when one makes predictions about the future.
We usually assume that the future will continue to be a linear extension of past trends. If the crime rate is
going up, we think it will continue to increase next year. If stocks have increased more than 20% in value
each year for three years, then we think the trend will continue indefinitely. In the short term, predicting
the immediate future from the immediate is sensible. However, it is unwise to project too far into the future
because parameter relationships often change. Based upon historical evidence, we know that stocks tend to
increase in value over time. However, we don’t know which stocks will increase in value, and when they
will increase in value. The wise investor will diversify her investments and stay in the market for the long
haul. There are some long-term aggregate patterns that one can predict based upon extensive historical data,
but we don’t know when and where the gains and losses will occur. When it comes to predicting the future,
humility is always the best policy.

Outliers create a final problem for researchers trying to describe variable relationships. An outlier is a data
point that is extreme in some regard. Having an outlier in a data set can lead to serious distortions when it
comes to trying to describe a relationship between two variables. Outliers sometimes indicate that a data
collection or data entry error has occurred. For instance, if you enter a 9 into a column when you really
intended to enter a 0, you have likely created a serious outlier problem. Experienced researchers look for
outliers in their data sets before they begin doing statistical analyses. However, sometimes an outlier is
quite real. If one has a large data set, the influence of one outlier will probably be small. However, if one
has a relatively small data set (i.e., less than 100 cases) the distorting influence of one or two outliers can be
profound. When you inspect the data, you may see a real relationship between the variables if you exclude
an outlier from the data. Some researchers recommend that an outlier be excluded from the statistical
analysis. Excluding an outlier can be justified because of the serious distorting effect that the outlier has on
the regression equation. However, some researchers dislike excluding outliers because this may allow
experimenter bias to creep into the statistical analysis.

The best solution may be to compare the statistics for when the outlier is included and when the outlier is
excluded. The reader can decide if the researcher’s decision to exclude the outlier makes sense. In some
cases, outliers can become the focus of in-depth case studies that combine qualitative and quantitative data.
If one finds an important exception to a general rule, carefully examining the exception may extend our
understanding of the phenomenon in an important way. For instance, if we find that a child from an
impoverished background does much better in school than one would expect given the general correlation
between poverty and school performance, then we may learn something important by studying this
exceptional case. Researchers are often most concerned with the “typical” and the “normal”; however,
there are often good reasons to closely scrutinize exceptional cases or outliers.
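
To see how much damage a single extreme case can do, consider the following Python sketch with an invented small data set. The first correlation is computed on eight well-behaved points; the second adds one extreme, miskeyed case that runs against the trend.

    import numpy as np

    # Invented small data set with a clear positive relationship.
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
    y = np.array([2, 3, 3, 5, 6, 6, 8, 9], dtype=float)
    print(round(np.corrcoef(x, y)[0, 1], 2))           # a strong positive correlation

    # Add one extreme case that runs against the trend (e.g., a data entry error).
    x_out = np.append(x, 9.0)
    y_out = np.append(y, 0.0)
    print(round(np.corrcoef(x_out, y_out)[0, 1], 2))   # the correlation drops sharply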
Exercise: Charlie's Career Decision
1A) Charlie faces a difficult career choice. His career options include becoming a sports broadcaster, an
opera singer or a Wall Street investment banker. Charlie wants to pursue the profession for which he
has the greatest aptitude. Fortunately, there are objective tests that provide valid information about a
person's aptitudes in each area. Charlie takes the three tests: the sportscaster test, the opera singer test
and the investment banker test. Following are Charlie's scores on each test along with the mean scores
and standard deviations for each test. Calculate Charlie’s standardized Z score for each test. Then
explain which profession Charlie should pursue based on these test results.

Profession Aptitude Score Mean Score Standard Deviation


Sportscaster 80 60 8
Opera Singer 90 70 20
Investment Banker 78 50 10

1B) Assume that scores on the three aptitude tests are normally distributed. Determine Charlie’s percentile
ranking in each of the three areas.
Unit 9: Objectivist Paradigm-Inferential Statistics
The unit discusses how sample statistics are used to make inferences about unknown population parameters. The
unit also covers hypothesis testing using statistical decision procedures.

Unit Objectives:

Upon completing this unit, you should be able to do the following:

9-1: To describe the steps in hypothesis testing.


9-2: To explain the role that a sampling distribution plays in hypothesis testing.
9-3: To formulate a research hypothesis, specify how the hypothesis will be tested, and interpret relevant
statistical results.
9-4: To identify the appropriate statistic for testing for differences between the distributions of two or more
subgroups.
9-5: To explain the differences between parametric and nonparametric statistics.
9-6: To assess the practical significance of statistically significant results.
9-7: To explain the conceptual bases of Type I and Type II error.
9-8: To identify the tradeoffs between Type I error and Type II error in particular cases.
9-9: To identify instances of statistical malpractice in news media reports and other forms of public
communication.
9-10: To explain how one can ethically use statistics to select relevant examples to make a persuasive case.
Statistical Procedures for Testing Hypotheses
In a previous unit on setting up research questions and hypotheses, we discussed the general logic of
hypothesis testing. As you will recall, this involves setting up a clearly stated research hypothesis and a
null hypothesis. The researcher collects relevant data and then tries to determine how likely the study
results would be if the null hypothesis were true. Think back to the Pepsi/Coke demonstration. It is
relatively unlikely that a person who is merely guessing will correctly identify which cola is being
presented on five successive trials. If the results show that the null hypothesis is unlikely to be true given
the study's results, the researcher rejects the null hypothesis and accepts the research hypothesis as a better
explanation of those results. The rest of this section finishes our discussion of hypothesis testing by
exploring the statistical mechanics that our previous discussion did not cover.

All statistical hypothesis testing involves three elements: a) a test statistic that summarizes the study results,
b) a researcher-selected statistical significance level, and c) a relevant sampling distribution that specifies
how the test statistic is distributed by chance. Let's explore each of these elements in a bit more depth.

You are already pretty familiar with one kind of test statistic: the correlation statistic. Correlation
coefficients such as the Pearson correlation coefficient, Spearman's rho, and Cramer's V all describe the
strength of association between two variables. In contrast, test statistics such as the chi-square, the t-test,
and the analysis of variance F-test summarize the degree of difference on the dependent variable
between compared groups. These statistics all have different uses and properties, but they all serve the
same function: they summarize study results in a single number.

At this point, the researcher asks: how likely is it, given my sample size, that I would get a statistic of
this magnitude if the null hypothesis is true? For example, what is the probability of getting a Pearson
correlation coefficient of .40 between two variables with a sample size of 100 if the null hypothesis is true?
If the given result is sufficiently unlikely, the researcher will reject the null hypothesis and accept the
alternative hypothesis as a better explanation of the study results. This raises the question of what
counts as sufficiently unlikely. The answer is that the researcher selects the level of improbability
that the test statistic must reach before the null hypothesis is rejected. As you will recall, the conventional
standard of statistical significance in much social science hypothesis testing is .05. Translated, this means
that the researcher will only reject the null hypothesis if the test statistic is of sufficient magnitude that
it would occur less than five times out of a hundred due to chance alone. If the actual statistic is less
probable than the set level of statistical significance, that is, less than .05, the researcher rejects the null
hypothesis and accepts the research hypothesis.

There is nothing sacred about a statistical significance level of .05. For instance, it is sometimes
appropriate for a researcher to set a less stringent level of statistical significance in exploratory research
(e.g., .10). In addition, if there is a high practical cost associated with rejecting a true null hypothesis, a
more stringent level of statistical significance may be appropriate. Note that the researcher selects the level
of statistical significance before commencing with hypothesis testing. To adjust one's statistical
significance level after the fact so that one can "reject" the null hypothesis is cheating.

This brings us to the question of how we can estimate how likely a test statistic of a given magnitude would be
if the null hypothesis is true. This is where statisticians come to our aid. They provide us with information
about the probability distribution of the test-statistic at a given sample size. To return to the correlation
example listed above, if the null hypothesis is true and there is no relationship between two variables, I
could go to a random numbers table and generate two sets of 100 numbers and correlate them. If I were to
draw a second sample of 100 numbers and repeat the calculations, I would get a second chance correlation. If
I were to continue doing this, I would find that the resulting correlation coefficient under this random
procedure would vary from sample to sample due to random sampling error. The complete distribution that
describes this variation of the given statistic over an infinite number of samples is called a sampling
distribution. Over the decades, statisticians have mathematically derived the sampling distributions for
test-statistics at each sample size under the conditions of the null hypothesis.
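
You can approximate this idea yourself with a short simulation. The Python sketch below correlates two columns of random numbers (so the null hypothesis is true by construction) thousands of times and then asks how large a correlation turns up 5% of the time by chance; the seed and number of trials are arbitrary choices made for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 100, 10_000

    # Simulate the sampling distribution of Pearson's r under the null hypothesis:
    # correlate two unrelated columns of random numbers, over and over.
    chance_rs = np.array([
        np.corrcoef(rng.standard_normal(n), rng.standard_normal(n))[0, 1]
        for _ in range(trials)
    ])

    # Correlation magnitude exceeded by chance only 5% of the time, in one direction
    # and in either direction (rough one-tailed and two-tailed critical values).
    print(round(np.quantile(chance_rs, 0.95), 2))           # about .17 with n = 100
    print(round(np.quantile(np.abs(chance_rs), 0.95), 2))   # a bit larger, about .20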

In many statistics books, information about relevant sampling distributions is displayed in a critical values
table. A critical values table specifies the magnitude that a given test statistic must exceed in order to reject
the null hypothesis with a specific sample size. For instance, for the Pearson correlation coefficient, with a
sample size of 100 and a statistical significance level of .05, the researcher must have a correlation
coefficient of at least .17 or larger in order to reject the null hypothesis. In other words, a correlation
coefficient of .17 is sufficiently large that it would occur only 5% of the time due to chance alone. If I have a
correlation coefficient greater than .17, I can reject the null hypothesis. In the same example, if my
statistical significance level were set at .01, my correlation coefficient would need to be equal to or greater
than .20 in order to reject the null hypothesis.

If you take a close look at any critical values table for any test statistic, several trends become apparent.
The first is that the likelihood of rejecting the null hypothesis increases when the researcher has a large test
statistic: the larger the test statistic, the less likely it is that the statistic occurs due to chance alone. The
second trend is that one must have somewhat larger test statistics to reject the null hypothesis as one's level
of statistical significance becomes more stringent (e.g., is set at .01 rather than .05). A third trend is that
the magnitude of the critical value diminishes as the size of one's sample increases. This is because there is less
variation from sample to sample as sample size increases (i.e., the sampling distribution of test statistics is
more compressed as sample size increases).

Statistical Testing using Computer Generated Statistics


Few researchers consult critical values statistical tables these days. Computer programs such as the
Statistical Package for the Social Sciences (i.e., SPSS) calculate the test statistic and provide a probability
estimate (i.e., p value) for the test statistic from the relevant sampling distribution. The p value indicates
how likely it is that a test statistic of a given magnitude would occur if the null hypothesis is true (e.g.
p<.03 indicates that the test statistic of the given size would occur less than three times in a hundred due to
chance alone). The researcher notes the magnitude and the direction of the test statistic and then examines
the p value associated with the test statistic. The researcher rejects the null hypothesis if the p value is
smaller than the researcher-selected statistical significance level. If the p value is larger than the
researcher's statistical significance level, then the researcher retains the null hypothesis. In short,
hypothesis testing became much easier with the advent of sophisticated statistical programs such as SPSS.

One-tailed vs. Two-tailed Statistical Testing


The process of statistical significance testing is somewhat different for research hypotheses than it is for
research questions. A typical research question might ask, "Are variable X and variable Y significantly
correlated with one another?" The research question does not specify whether there is a positive or a
negative relationship. In this situation it is necessary to test for the possibility of both positive correlations
and negative correlations. Hypothesis testing for research questions is called two-tailed testing because it
tests for statistically significant relationships in either direction. In contrast, a research hypothesis might
specify that, "There will be a positive correlation between variable X and variable Y." This instance of
statistical testing is called a one-tailed statistical test as it only requires testing for a relationship or
difference in one specified direction. From a practical perspective, it is important for researchers to keep in
mind whether they are using one-tailed or two-tailed tests. One-tailed tests of statistical significance are
more powerful than two-tailed tests in that the size of the test-statistic required to reject the null hypothesis
is smaller. Computer programs provide both one-tailed and two-tailed tests of statistical significance, but
since two-tailed statistical tests are the default option, the researcher must request the one-tailed test option.
The one-tailed option is the one a researcher should use when testing a directional hypothesis.
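
For a correlation, the relationship between the two kinds of p value is simple: when the result falls in the hypothesized direction, the one-tailed p is half of the two-tailed p. The Python sketch below illustrates this with simulated data; scipy's pearsonr reports a two-tailed p by default.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.standard_normal(50)
    y = 0.3 * x + rng.standard_normal(50)      # data built to have a positive relationship

    r, p_two_tailed = stats.pearsonr(x, y)     # two-tailed p value

    # One-tailed p for the hypothesis of a positive correlation.
    p_one_tailed = p_two_tailed / 2 if r > 0 else 1 - p_two_tailed / 2
    print(round(r, 2), round(p_two_tailed, 4), round(p_one_tailed, 4))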

Statistically Significant Results and Practically Significant Results

When one is able to reject the null hypothesis, the results are said to be statistically significant.
Statistically significant results, however, do not always translate into practically significant or important
results. A statistically significant result only means that one can reject the null hypothesis. Whether or not
the results are important is a separate question. Recall that the size of the test statistic needed to reject the
null hypothesis decreases as the sample size increases. When sample size becomes very large even small
relationships can be statistically significant. A large sample size and the statistical power that comes with it
(i.e., ability to detect real but small relationships) are good things, but some researchers misuse these
properties. Researchers sometimes attribute more importance to statistically significant results than they
should.

Research results need to be statistically significant before they can be practically significant, but many
statistically significant results are not very important in a practical sense. The meaning or significance of
results should never exclude human judgment and subjectivity. “Statistical sense” should not be allowed to
crowd out “practical sense” in interpreting research results.

So what standards should we use to judge the practical significance of statistically significant results? One
answer is to say that it depends upon the context. Another standard for evaluating the practical significance
of statistically significant results is to consider the amount of variance that the relationship or group
difference accounts for. In the case of a Pearson correlation coefficient, the coefficient of determination
gives us this information. We might ask how large the coefficient of determination must be to be
considered "practically significant" in communication research. There is not a simple answer to this
question. In communication research, and the social sciences generally, we seldom find correlations of
greater than .6 (or a 36% coefficient of determination). For most bivariate correlations, the majority of the
variance is prediction error. This is not surprising given that many of our constructs are hypothetical and
involve some measurement error. In addition, multiple causal factors usually influence our dependent
variables. Therefore, any single relationship is likely to account for a relatively small proportion of the
variance of the criterion variable. Many researchers consider relationships with Pearson correlation
coefficients of smaller than .2 or -.2 to be rather trivial.

As you might expect, there are exceptions to this rule. For instance, if your dependent variable is very
important, then even a small correlation may be of interest. If the dependent variable is the number of lives
saved, a coefficient of determination of 3% may be considered “practically significant.” In addition, small
relationships may be considered practically significant in an exploratory study. Finding a correlation
between input and output gives the researcher a reason for exploring underlying processes more fully. What
is “practically significant” may change when a field reaches a relatively mature stage after the strongest
variable relationships have been verified. Finally, results that can be applied to an important problem may
be “practically significant” even if the amount of variance accounted for is small. Finding that a clinical
intervention reduces speech apprehension will get more attention than a finding that a personality trait
influences speech apprehension. A clinical intervention can be used to help people, but there is little that
can be done to change a person's personality traits. Determining what statistically significant results mean
requires a good dose of human subjectivity, interpretation and common sense.

Exercise: Hypothesis Testing-Linear Correlation
Describing the relationship between two variables at a time is the beginning of data analysis, but many
hypotheses include more than two variables. One can have multiple independent variables, intervening
variables, moderator variables and multiple dependent variables. Investigating such relationships is a
complicated business. Following is an example that I drew from a state rankings book that compiles
statistics from a variety of federal government and state related sources. I have selected thirteen variables
that are or could be related to the overall death rate in a state. The descriptive statistics for the 13 variables
appear in the table below. A brief description of each variable appears below the table.

Descriptive Statistics of State Variables

N Minimum Maximum Mean Std. Deviation


TAXCIGS 51 2.50 142.50 44.5667 32.52326
NOINSURE 51 5.90 23.50 12.9588 3.78445
HEXPEND 51 2760.00 8166.00 3781.7843 778.08946
SMOKEPER 51 12.90 30.50 22.8059 3.01021
RESPIR 51 23.60 76.60 47.5824 8.89714
HEART 51 191.00 355.00 258.5078 38.94224
STROKE 51 42.10 85.60 63.3294 8.95695
CANCER 51 110.00 261.00 199.4510 29.91676
DEATH 51 680.00 1082.00 890.6863 86.00756
BOOKS 51 1.70 5.00 3.1784 1.08357
VOTE 51 40.50 68.80 53.8065 6.85016
TAXRATE 51 29.20 39.90 33.2922 2.00498
DINCOME 51 18612.00 32820.00 24008.8039 3316.07037
Valid N 51

Taxcig=State tax on a pack of cigarettes


Noinsure=Percentage of population not covered by health insurance
Hexpend=Per capita personal health care expenditures
Smokeper=Percentage of adults who smoke
Respir=Age adjusted death rate by Chronic Lower Respiratory Diseases
Heart=Age-adjusted death rate by heart disease
Stroke=Age adjusted death rate by Stroke
Cancer=Estimated death rate by Cancer
Death=the Age adjusted death rate for each state
Books=Books per capita in public library systems

On this assignment, you need to complete the following steps:

a) State one research question and three hypotheses concerning linear relationships between selected
variables in the data set. You should state your research question and hypotheses in appropriate form (i.e.,
you must state what kind of relationship you expect to occur between the variables in each hypothesis). Give a
brief explanation for why you predict the specific correlation in each respective hypothesis (i.e., why you
expect a positive or negative correlation).
b) State the appropriate null hypothesis for your research question and each of your three hypotheses.
c) State the significance level that you plan to use in testing your hypothesis, including whether you plan to
use a one-tailed or two-tailed test of statistical significance.
d) Find the appropriate correlation and level of significance (i.e., p value) for each hypothesis from the
appropriate correlation table and report it.
e) Calculate the coefficient of determination and coefficient of nondetermination for each relationship.
f) Interpret the results. You should tell what you have decided with respect to the null hypothesis in each
case and discuss the "practical significance" of each finding (i.e., whether the relationship between the
variables is strong enough to have real world importance).

Fictitious Example:

Research Hypothesis: There is a negative correlation between the number of sunny days in a city and the
rate of depression in that city.

Rationale: A lack of sunlight should be a contributing factor to the incidence of depression. Cities in the
far north in some countries tend to have higher rates of drug dependency and other factors associated
with depression.

Null Hypothesis: There is not a negative correlation between the number of sunny days in a city and the
rate of depression in a city.

Test of Significance & Statistical significance level -One tailed significance test because this is a research
hypothesis. The level of statistical significance is set at the conventional level of significance .05.
**You may use any statistical significance level you want in your examples.

Results: r = -.50, p < .002.

Coefficient of Determination=-.50 x -.50 x 100%=25%


Coefficient of Nondetermination=100%-25%=75%

Interpretation: We can reject the null hypothesis because the probability that a negative correlation of -.50
is due to chance alone is less than 2 times in a thousand. This is less probable than our statistical
significance level, so we reject the null hypothesis and conclude that there probably is a negative
relationship between these two variables. The coefficient of determination shows that the number of sunny days is
one important contributing factor to the depression rate in a city. However, the coefficient of
nondetermination shows that the majority of the variation in depression rates between cities is still
unaccounted for.
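
If it helps, the decision logic of steps (d) through (f) can be written out in a few lines of Python. The numbers below are simply the ones from the fictitious example above, not values from the state data set.

    # Steps (d)-(f) for one hypothesis, using the fictitious example's numbers.
    alpha = 0.05                       # one-tailed significance level chosen in step (c)
    r, p = -0.50, 0.002                # correlation and p value read in step (d)

    determination = r ** 2 * 100       # 25% of the variance accounted for
    nondetermination = 100 - determination

    decision = "reject the null hypothesis" if p < alpha else "retain the null hypothesis"
    print(decision, determination, nondetermination)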
Research Question Test
a) Research question

b) Null hypothesis

c) Type of test (i.e. one-tailed vs. two-tailed) & Maximum Type I error permitted

d) Results & probability

e) Calculate coefficient of determination and coefficient of nondetermination

f) Interpret Results

Hypothesis #1 Test
a) Research Hypothesis & Rationale

b) Null hypothesis

c) Type of test (i.e. one-tailed vs. two-tailed) & Maximum Type I error permitted
d) Results & probability level

e) Calculate coefficient of determination and coefficient of nondetermination

f) Interpret Results

Hypothesis 2 Test
a) Research Hypothesis & Rationale

b) Null hypothesis

c) Type of test (i.e. one-tailed vs. two-tailed) & Maximum Type I error permitted

d) Results & probability level

e) Calculate coefficient of determination and coefficient of nondetermination

f) Interpret Results

Hypothesis #3 Test
a) Research Hypothesis & Rationale

b) Null hypothesis

c) Type of test (i.e. one-tailed vs. two-tailed) & Maximum Type I error permitted

d) Results & probability level

e) Calculate coefficient of determination and coefficient of nondetermination


f) Interpret Results
Reference Sheet: Some Common Difference Statistics

Single-sample chi-square (nonparametric): Compares differences in the categories of one variable. It tests whether the differences in frequencies between the categories of the variable differ from what would be expected due to chance alone. It is used with nominal or ordinal level measurement.

Multiple-sample chi-square (nonparametric): Compares the distribution of two groups of people or objects across the categories of a second variable (e.g., did men and women differ on whether or not they voted in the last election?). The chi-square indicates whether there is a significant difference between the expected frequencies in the cells and the actual frequencies. It is used when one has two or more variables measured at the nominal level.

Independent-sample t-test (parametric): Examines differences between two different groups on a dependent variable measured at the interval or ratio level. The t statistic is the ratio of the difference between sample means over the average variation within the groups. For large samples, the distribution of the t statistic is approximately normal.

Matched-sample t-test (parametric): Examines differences between two measurements derived from the same group (e.g., pretests compared to post-tests). Conceptually it is similar to the independent-sample t-test. The dependent variables compared are measured at the interval or ratio level.

One-way analysis of variance (parametric): Used to examine differences in the means of more than two groups on a single independent variable. It consists of a ratio of the average differences between group means divided by the average variation within groups. A significant F test requires follow-up tests to determine which differences attain statistical significance.

Factorial analysis of variance (parametric): Examines differences between groups on two or more nominal or ordinal independent variables. It has the same conceptual basis as the one-way analysis of variance. However, it has the added advantage that it allows one to test for interactions between independent variables.
Difference Testing Examples
Single-Sample Chi-Square Using Critical Values
A new car dealer wants to test whether males have a particular preference among the three models
of cars that they buy at the dealership. The dealer analyzes the sales records for a recent 5-week
period. The following pattern of results is obtained.

Men Bought
Model A 80
Model B 30
Model C 40
Total 150

The null hypothesis in this case is: "Among these three models of car, men as a group do not have a
particular preference for the model of car they buy." The car dealer is using a statistical significance level
of .05 to test the null hypothesis.

To test this null hypothesis, one must calculate a single-sample chi-square. The chi-square compares the
actual frequencies with the frequencies that one would expect if the null hypothesis is true. If the null
hypothesis is true, the expected frequency for each model of car would be 150 cars/3 models or 50 cars sold
in each category. To calculate the chi-square, one does the following calculations: 1) Subtract the expected
value from the actual value in each category and square the difference; 2) Divide the squared difference by
the expected value; and 3) Add the resulting values for each category together. The calculation looks like
the following.

Chi-square = (80-50)^2/50 + (30-50)^2/50 + (40-50)^2/50 = 900/50 + 400/50 + 100/50 = 18 + 8 + 2 = 28

Now we need to know how unlikely a chi-square of 28 would be under the conditions of the null
hypothesis. In other words, how unlikely is it to get a distribution like the one above due to chance alone.
At this point we need to check a critical values table for the critical value of the appropriate chi-square
distribution. In this case, we are going to look for the critical value of a chi-square distribution of two
degrees of freedom (The degrees of freedom= number of groups-1 for a single sample chi-square. In the
example above, once you know how many men bought model A and how many men bought model B, you
know the number who bought C. Once two values are known, the third is known as well.).

The critical value for this chi-square distribution at the .05 level of statistical significance is 5.99. In other
words, with this type of distribution, a chi-square of 5.99 or larger would be expected only five times out of 100 if
the null hypothesis is true. Any actual chi-square value that is equal or greater than this value leads us to
reject the null hypothesis. Since our chi-square of 28 is greater than the critical value of 5.99, we reject the
null hypothesis that men as a group do not differ in their preferences among these three types of
automobile. Practically speaking, we can conclude that men register a preference for Model A.
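
The hand calculation above is easy to reproduce in a few lines of Python; the scipy line at the end simply confirms the statistic and supplies the exact p value (this sketch assumes the same counts used in the example).

    from scipy import stats

    observed = [80, 30, 40]               # models A, B, and C bought by men
    expected = [150 / 3] * 3              # 50 per model if the null hypothesis is true

    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    print(chi_square)                     # 28.0, well above the critical value of 5.99

    print(stats.chisquare(f_obs=observed, f_exp=expected))   # same statistic, with its p value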

Chi-Square Example Using Computer Printout


In reality, few researchers calculate the chi-square and then consult the critical values table. Instead, we
rely on programs such as Statistical Package for the Social Sciences (i.e., SPSS) to calculate the test-
statistic and then provide the probability value from the appropriate sampling distribution. The printout
from the SPSS program provides us with the value of the test-statistic and the probability of a test statistic
that large under the conditions of the null hypothesis. If the p value is less than our selected level of
statistical significance, then we reject the null hypothesis and accept the alternate hypothesis.

The example that follows came from a survey that was recently conducted for WLKY TV and Humana.
Both of these organizations sponsor a program entitled "Success by Six". Success by Six promotes early
childhood education and parental initiatives through programs and public service announcements. The
sponsoring organizations wanted to assess awareness of the program among adults in the metropolitan
Louisville area. A telephone survey was completed using a sample of random telephone numbers from the
Louisville metropolitan area. A total of 198 interviews were completed. However, I wanted to test whether
the level of awareness for males and females was similar because more women than men had completed the
survey. It seems that women are more likely to answer the telephone in a household than males are. Hence,
if the levels of awareness for males and females differed, I would need to weight or adjust the estimated
percentages in my final sample to account for these differences when estimating the final population
parameter. Following is the data from our survey. See the labels at the bottom of the table to interpret the
results.

SEX * AWARE Crosstabulation

AWARE
Aware of
not aware Success
of program by Six Total
SEX male Count 45 24 69
Expected Count 33.8 35.2 69.0
% within SEX 65.2% 34.8% 100.0%
female Count 52 77 129
Expected Count 63.2 65.8 129.0
% within SEX 40.3% 59.7% 100.0%
Total Count 97 101 198
Expected Count 97.0 101.0 198.0
% within SEX 49.0% 51.0% 100.0%

Count= number of people in the given category


Expected Count=The number of people that would be expected under the null hypothesis of no difference
between males and females in their awareness of the program.
% within sex=percentages of lack of awareness/awareness by sex.

We see from the table above that approximately 60 percent of the females in the sample were aware of the
Success by Six Program, but that only 35% of the males were. However, I still needed to test whether or
not these differences in awareness for males and females were within the normal boundaries of what one
might expect by chance alone. The null hypothesis is that the sexes do not differ in how aware they are of
the Success by Six Program. To test this hypothesis, I requested a chi-square analysis from SPSS. The
resulting chi-square and the probability value appear in the chart below.

Chi-Square Tests

Asymp. Sig.
Value df (2-sided)
Pearson Chi-Square 11.160b 1 .001
N of Valid Cases 198
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The
minimum expected count is 33.80.

The chi-square of 11.16 is sufficiently large so as to occur less than one time in a thousand due to chance
alone. Hence I reject the null hypothesis and accept the alternate hypothesis that the females are more
aware of the program than are males. This result informed me that I needed to weight the percentages by
sex when it came to estimating the total percentage of people in the metropolitan area who were aware of
Success by Six.

I also calculated the Cramer's V to show that this problem could be approached from the perspective of
correlation as well. The result below shows a Cramer's V coefficient of approximately .24 and that this
statistic is significant at the .001 level. In other words, sex and awareness of the program are significantly
correlated. You will note that the difference testing approach using the chi-square and the correlation
approach using Cramer's V resulted in very similar outcomes.

Symmetric Measures

Value Approx. Sig.


Nominal by Phi .237 .001
Nominal Cramer's V .237 .001
N of Valid Cases 198
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null
hypothesis.

Conclusion:
There are a number of statistics for testing for differences between groups. In this example we have
examined the chi-square. The chi-square is utilized when both the independent and dependent variables are
measured at the nominal level. The first example showed the kinds of calculations one would do if one
were computing the chi-square statistic by hand. The second example exhibited the way that
researchers today actually operate. Contrary to the impression you may have developed in your statistics
class, most researchers do very little calculation. Researchers are now content to focus on what is most
important-interpreting what the statistics mean.
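
For readers who do not have SPSS handy, the same crosstabulation results can be reproduced with a short Python sketch; the counts below are taken from the SEX by AWARE table above, and the Cramer's V line uses the standard formula for a 2 x 2 table.

    import numpy as np
    from scipy import stats

    # Counts from the Success by Six crosstabulation above.
    observed = np.array([[45, 24],    # males:   not aware, aware
                         [52, 77]])   # females: not aware, aware

    chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
    print(round(chi2, 2), round(p, 4), dof)   # approximately the chi-square of 11.16 reported above

    # Cramer's V for a 2 x 2 table is the square root of chi-square divided by n.
    n = observed.sum()
    print(round(np.sqrt(chi2 / n), 2))        # about .24, matching the SPSS output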

Multiple Regression: A Case Study
So far, we have primarily discussed examples of bivariate analysis of two variables at a time. This type of
data analysis has two limitations. Most dependent variables are influenced by a number of variables.
Hence, any single predictor variable is likely to have only a modest coefficient of determination and a
rather large coefficient of nondetermination. Bivariate analysis also does not take into account background
variables that may be related to both variables. For instance, if I want to determine how well the smoking
percentage in a state predicts the overall age adjusted death rate in a state, I probably should also include
the overall affluence in the state because socioeconomic status is somewhat positively correlated with
general healthy lifestyle variables and negatively correlated with the death rate. If I want to avoid
confounding the variables of smoking percentage and average disposable income in the state, I should
include the average disposable income as a control variable. That way I can determine what the smoking
percent variable contributes to predicting the death rate above and beyond the variable of disposable
income.

With multiple regression a researcher uses two or more independent variables to predict the values of the
dependent variable. Multiple regression is an extension of simple regression. The only thing that varies is
that a new coefficient is generated for each additional predictor variable. Following is a multiple regression
example that I developed for predicting the death rate in a state. I was interested in finding out how much
variance in a state's overall death rate I could account for using the following independent variables in our
class dataset.

a) Smoking Percent=Percentage of adults in the population older than 18 who smoke. I expected this
variable to be positively related to the state's death rate.
b) Cigarette Tax=The state's excise tax on a pack of cigarettes. I expected states with a higher excise tax to have a somewhat
lower death rate (i.e., a negative correlation) because in many states excise tax proceeds are used to fund
public health initiatives in the state.
c) Health expenditures=the amount of money spent per capita on delivering health care in the state. One
would hope that this figure is negatively correlated with the death rate in the state.
d) Dincome=the disposable income is the per capita income available to people in the state after taxes have
been taken out. One would again expect this variable should be negatively correlated with a state's death
rate.
e) Uninsured rate=the percentage of people in a state who lack health insurance. If people who lack
insurance coverage are less likely to get timely medical attention for health problems (e.g., cancer), one
would expect that this variable would be positively correlated with a state's death rate.
f) Books=average number of books per capita in the county or city library. I have included this variable,
because communities with a higher education rate and literacy rate should probably be more informed
about healthful practices. Hence, I expected this variable to be negatively related to a state's overall death
rate.

Now, I could have looked at the six bivariate relationships between these independent variables and the
dependent variable. However, I know that many of my independent variables are also correlated with one
another. As a result, I cannot simply take the coefficients of determination for each of the six bivariate
relationships and add them together to get the total coefficient of determination. I want to investigate the
joint impact of these variables and to determine the unique amount of variance accounted for by each of the
predictor variables.

I ran the multiple regression equation using a stepwise procedure. The SPSS program identified the
variable that is the single best predictor of state death rate that exceeded the selected statistical significance
level (i.e., the slope of the coefficient was significantly different from zero at the .05 level of significance).
SPSS then identified the second most powerful statistically significant predictor of death rate and entered it
into the equation. Then it checked to see if the first variable should be discarded if its coefficient declined
to statistical insignificance after the second variable had been included. This stepwise process continued
until all of the statistically significant predictors were entered into the prediction equation. As it turned out,
five predictor variables were ultimately entered into the prediction equation. Moreover, none of the
predictor variables were reduced to statistical insignificance when other variables were added to the
equation. The variables were added in the following order a) smoking percent b) cigarette excise tax rate,
c) health expenditures, d) books per capita and e) disposable income. The uninsured rate variable did not
make it into the equation. In addition, the combined model had a multiple correlation coefficient of .79
(i.e., R) and a multiple coefficient of determination of .62% (i.e., multiply the R square by 100%). The
adjusted R square includes an adjustment for sample peculiarities and so it is preferred over the simple R
square, so the best estimate of the amount of variance accounted for by our model is 58%.

Model Summary

Model R R Square Adjusted R Square


1 .585(a) .342 .329
2 .667(b) .445 .422
3 .715(c) .512 .481
4 .765(d) .585 .549
5 .789(e) .623 .581
a Predictors: (Constant), SMOKEPER
b Predictors: (Constant), SMOKEPER, TAXCIGS
c Predictors: (Constant), SMOKEPER, TAXCIGS, HEXPEND
d Predictors: (Constant), SMOKEPER, TAXCIGS, HEXPEND, BOOKS
e Predictors: (Constant), SMOKEPER, TAXCIGS, HEXPEND, BOOKS, DINCOME

However, we are also interested in the relative contribution of each variable to the prediction equation. The
next part of the table displays the parameter estimates for the final model. The standardized coefficient
gives what is equivalent to the partial correlation between each predictor variable and the death rate after
adjusting or controlling for the other predictor variables. The t column gives the size of the t-statistic
for testing whether the coefficient is significantly different from zero, and the Sig. column gives the
probability of obtaining a beta coefficient this large by chance alone if the true coefficient were zero.

Coefficients (a)

Model 5          Standardized Coefficient (Beta)       t         Sig.
(Constant)                                             7.295     .000
SMOKEPER                      .398                     3.956     .000
TAXCIGS                      -.299                    -2.733     .009
HEXPEND                       .476                     4.137     .000
BOOKS                        -.255                    -2.524     .015
DINCOME                      -.268                    -2.142     .038
a Dependent Variable: State Death Rate

All of the coefficients have probabilities of occurring by chance alone that are smaller than our .05 level of
statistical significance. However, there is at least one notable surprise in these figures. The level of health
care expenditures in a state is positively correlated with the state's death rate, directly opposite of my
original prediction. This probably means that our health care system treats illness and injury much more
than it actively prevents illness and death. When we rank order the variables in terms of their predictive
power, we find the following ordering: 1) health expenditures, 2) smoking percent, 3) excise tax on
cigarettes, and 4) a virtual tie between per capita books and disposable income. In the end, this example
shows that multiple predictors used simultaneously can substantially reduce the amount of prediction error
compared to using bivariate regression and correlation analyses alone.
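
For readers who want to see the mechanics behind a table like this outside of SPSS, here is a minimal
sketch in Python (my own illustration, not the class procedure). It assumes a hypothetical file named
states.csv with one row per state and columns named to match the variables above; "DEATHRATE" is an
assumed name for the dependent variable column.

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("states.csv")                 # hypothetical file and column names
predictors = ["SMOKEPER", "TAXCIGS", "HEXPEND", "BOOKS", "DINCOME"]

X = sm.add_constant(df[predictors])            # add the intercept term
model = sm.OLS(df["DEATHRATE"], X).fit()       # ordinary least squares regression

print(model.rsquared, model.rsquared_adj)      # R square and adjusted R square
print(model.summary())                         # coefficients, t values, Sig. (p) values

To obtain standardized (beta) coefficients like those in the table above, one would z-score each variable
before fitting the model.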
Type I Error and Type II Error in Research Design

When a researcher sets up decision procedures about whether to retain or reject a null hypothesis, she wants
to manage two types of error; she wants to avoid rejecting a true null hypothesis (i.e., type I error) and she
wants to avoid retaining a false null hypothesis (i.e., type II error). When a researcher tests a null
hypothesis, she sets a level of proof that the study results must meet before the null hypothesis is rejected
and the research hypothesis is accepted as the better explanation of the results. The researcher sets a
statistical significance level to guide her decision-making. The significance level is the amount of error
that the researcher is willing to risk that she is wrong when she rejects the null hypothesis. This researcher-
selected error is Type I error (also known as α or alpha in your statistics course). If a researcher sets her
significance level at .05, she is taking a 5% risk of being wrong when she rejects the null hypothesis.
Another way of saying this is that if a researcher sets a significance level of .05 and the null hypothesis is
actually true, over a large series of tests she would mistakenly reject it only about 1 time in 20.
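
If this seems abstract, a small simulation can make it concrete. The following sketch (purely illustrative,
not part of the course materials) repeatedly compares two samples drawn from the same population, so the
null hypothesis is true by construction; at the .05 level, roughly 5% of the comparisons reject it anyway.

import numpy as np
from scipy import stats

rng = np.random.default_rng(316)
trials, rejections = 10_000, 0
for _ in range(trials):
    a = rng.normal(0, 1, 30)          # sample 1: drawn from the same population...
    b = rng.normal(0, 1, 30)          # ...as sample 2, so the null hypothesis is true
    t_stat, p_value = stats.ttest_ind(a, b)
    if p_value < 0.05:                # reject at the .05 significance level
        rejections += 1

print(rejections / trials)            # close to .05: the long-run Type I error rate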

Researchers can set any significance level except a 0% significance level (i.e., certainty that one is right).
The .05 significance level is the conventional standard of statistical significance in social science research.
There is nothing sacred about this particular significance level, except that the first statistical inference tables
were published at the .05 significance level. With the advent of computer statistical packages, the
researcher can use any significance level that she deems appropriate. For exploratory research, a researcher
may use a relaxed significance level of .10. In cases where greater certainty is desired, a researcher uses a
more rigorous level of proof for rejecting a null hypothesis (e.g., .01).

Why don’t researchers use more stringent levels of proof for testing null hypotheses? Why not simply set
significance levels at .001 (i.e., risk no more than 1 error in a thousand tests of statistical significance)? The
answer is that when we require higher levels of confidence with regard to Type I error, we dramatically
increase the probability of Type II error. Type II error is the error that we risk when we retain the null
hypothesis (i.e., the probability that we have retained a null hypothesis that is in fact false). If you set a very
stringent significance level, you will fail to reject many false null hypotheses. Researchers dislike Type II
error as much as they dislike Type I error. Ordinarily they seek to strike a balance between Type I error and Type II
error.

The probability of Type II error (the probability that a retained null hypothesis should have in fact been
rejected) is influenced by: 1) the researcher selected significance level for Type I error, 2) the sample size,
and 3) the strength of the underlying variable relationship that one is investigating (sometimes called the
effect size). It is impossible to know what the actual probability of Type II error is, because the researcher
never knows the actual strength of the variable relationship that she is investigating. However, researchers
can design their studies to have a given level of statistical power when it comes to managing Type II error.
The researcher will select a sample size and a significance level that will enable her to detect a relationship
of a given size (e.g., a real correlation of .30) a known percentage of the time. A study that has statistical
power of .80 for an effect size of .30 assures the researcher that she will detect real variable relationships
of .30 at least 80% of the time (i.e., if there is a real relationship of .30 or greater between the variables, she
will be able to correctly reject the null hypothesis 80% of the time).
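
The arithmetic behind such a power calculation is usually left to software. As a rough sketch (my own
illustration, using the Fisher z approximation for the sampling distribution of a correlation), the power to
detect a true correlation of .30 at the .05 level can be approximated as follows:

import numpy as np
from scipy.stats import norm

def correlation_power(true_r, n, alpha=0.05):
    """Approximate power of a two-tailed test of r = 0 when the true correlation is true_r."""
    z_crit = norm.ppf(1 - alpha / 2)                     # two-tailed critical value
    noncentrality = np.arctanh(true_r) * np.sqrt(n - 3)  # Fisher z of the true correlation
    return norm.sf(z_crit - noncentrality) + norm.cdf(-z_crit - noncentrality)

for n in (30, 60, 85, 120):
    print(n, round(correlation_power(0.30, n), 2))
# Power to detect a true correlation of .30 reaches roughly .80 at about 85 cases.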

The degree of statistical power in a study is important when a research hypothesis is not substantiated.
Research hypotheses often fail because the statistical power of the study is inadequate. Inadequate
statistical power is the leading reason why different studies often show inconsistent or inconclusive results.
A study with low statistical power is one in which the null hypothesis is more likely to be retained than it
should be. A well-designed study should have adequate statistical power as well as a respectable
significance level.

The two main routes to increasing statistical power are 1) to increase the level of Type I error that one is
willing to risk, and 2) to increase the size of the sample. Increasing sample size is the more attractive
strategy, because it allows the researcher to reduce levels of both types of error simultaneously. With a
large sample, a researcher can have confidence that she has made the right decision with respect to the null
hypothesis. Good research design includes planning for a sample size that can get the job done.

As a practical matter, a researcher must calculate the relative costs of making a Type I error or a Type II
error. These calculations help the researcher decide where to set her significance level. In some cases,
rejecting a true null hypothesis is more costly than a Type II error, but in other cases, it is just the reverse.
Exploratory research is usually most concerned with avoiding Type II error. The researcher will rely upon
replication research to decisively test any relationships or hypotheses that emerge. Because exploratory
research often does not utilize large samples, relaxed levels of Type I error (e.g., .10) are often employed.
In confirmatory research, the best alternative is to increase sample size so one can have low levels of both
Type I error and Type II error. The cost of each type of error can be quite high (leading to many inefficiencies
in the study of a phenomenon). In the end, it is the researcher’s responsibility to assess the relative costs of
both Type I and Type II error, and to then make the necessary adjustments in her research design. The
exercise that follows this reading gives you some practice in assessing these costs in specific cases, and
then requires you to specify what steps the researcher might take to deal with each problem.

Meta-analysis is a statistical technique that enables a researcher to pierce through inconsistent results
across studies, especially when some of the individual studies have small sample sizes. Meta-analysis
enables an analyst to combine the results of several studies into one large data set. The researcher
combines the results of studies that have used the same independent and dependent variables. The pooled
data set enables a much more decisive analysis. There have been many instances where a meta-analysis
has produced highly definitive findings even though the individual studies, examined one by one, showed
confusing inconsistency.
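
The core calculation is simpler than it sounds. Here is a toy sketch (the correlations and sample sizes are
made up, not drawn from real studies) of the inverse-variance pooling that is one common way of
combining correlations across studies:

import numpy as np
from scipy.stats import norm

# (correlation, sample size) for five hypothetical small studies of the same two variables
studies = [(0.18, 40), (0.25, 35), (0.10, 50), (0.30, 30), (0.22, 45)]

zs = np.array([np.arctanh(r) for r, n in studies])   # Fisher z transform of each correlation
weights = np.array([n - 3 for r, n in studies])      # inverse of each z's sampling variance

pooled_z = np.sum(weights * zs) / np.sum(weights)
pooled_r = np.tanh(pooled_z)                         # back-transform to a correlation
se = 1 / np.sqrt(np.sum(weights))
p_value = 2 * norm.sf(abs(pooled_z) / se)

print(round(pooled_r, 2), round(p_value, 4))
# Individually, none of these small studies would clear the .05 level;
# pooled, the combined estimate does so decisively.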

I once heard an angry radio talk-show host berate the Centers for Disease Control for their finding that
second hand smoke can cause lung cancer. He noted that only one of ten studies found such a relationship.
He ranted on for an hour about how absurd the CDC finding was. Then he dismissed meta-analysis as a
bunch of hocus-pocus. In fact, he had it entirely wrong: the meta-analysis helped compensate for the
inadequate sample sizes of the individual studies that made up the meta-analysis. Meta-analysis is one of
the most important data analysis tools to come along in recent decades. Meta-analysis studies have brought
light to many research areas that once seemed very murky and contradictory.

Table: Summarizing Type I & Type II Error


A researcher has collected her data and now is ready to analyze her results. She will face the choice of
either 1) rejecting the null hypothesis and accepting the alternate research hypothesis, or 2) retaining the
null hypothesis. Each decision has a type of error associated with it.
Type I error is risk of rejecting a null hypothesis that is true. The researcher selects
this level of risk before she conducts the statistical test. She does this by setting
her significance level which is known as α. Conventionally this is set at .05 or .01, but it can
be altered depending upon the needs of the researcher.
Type II error is the risk of retaining a false null hypothesis. To control this type of error, the researcher
needs to design the study to have adequate statistical power for her purposes. She can adjust the level of
Type I error that she is willing to risk, increase sample size, or increase the power of any experimental
manipulations that she is employing.

Type I Error and Type II Error Example

RH: Shy people will prefer larger classes than gregarious people.

NH: Shy people will not prefer larger classes than gregarious people.
The researcher collects and analyzes her data. She must decide whether to retain the null
hypothesis or to reject the null hypothesis and accept the research hypothesis as the better
explanation of study results. If she rejects the null hypothesis, she concludes that shy people do

141
prefer larger class sizes than gregarious people. There is always a chance that she is wrong in this
conclusion--the probability that she is wrong however, should be no greater than the level of Type
I error (the level of statistical significance that she chose to begin with). If she retains the null
hypothesis, she concludes that shy people do not prefer larger classes than gregarious people.
Again there is the possibility that she made the wrong conclusion (i.e., shy people do prefer larger
classes, but she just did not detect it). The probability of this type of error is Type II error.

Actual State of the World

                                          Shy people do not         Shy people prefer
                                          prefer large classes.     large classes.

Decision to retain the null
hypothesis (shy people do not             Correct Decision          Type II Error
prefer large classes)

Decision to reject the null
hypothesis and accept the
research hypothesis (shy people           Type I Error              Correct Decision
prefer large classes)
Exercise: Weighing Type I and Type II Error
In the following examples, write out the research hypothesis and the null hypothesis. Identify the costs
associated with Type I and Type II error in each example. Then explain which type of error(s) the
researcher should give the highest priority in designing the study. Explain carefully.

1) A soft drink company is test-marketing a new cola that it is thinking of using to replace an existing cola
product. The changeover to producing and marketing the new cola will cost several hundred million
dollars. The researchers want to determine if the new cola is likely to be a financial success.

2) Several researchers are testing a new drug for treating secondary infections such as pneumonia
associated with AIDS. The researchers want to determine if the new drug regimen is superior to the old
regimen (e.g., AZT) for treating secondary infections.

3) A politician is laying out her plans for the final weeks of campaigning before the election. She has been
leading her opponent in the opinion polls to date. She wants to see if she is retaining her lead. If her
opponent has pulled even with her, she will need to invest in a costly media campaign that will put her
campaign committee deeply in debt (even if she wins the election). She hires a polling firm to determine
what she should do.

Stories and Statistics
You have probably heard the quote by Benjamin Disraeli, a Prime Minister of Great Britain, who said:
“There are three kinds of lies: lies, damned lies and statistics.” The insinuation of this much-used quote is
that statistics are useless because they are so effective in misleading people. However, to disregard
statistics because they can be misused is similar to not trusting anyone because some people lie. Used
correctly and appropriately, statistics are immensely helpful. In fact, we have nothing to replace them.

When a person says he disregards statistics, he is not being truthful. What he actually means is that he
prefers his own personal statistics (i.e., statistics based on his experience) to those that describe and
summarize the experiences of many people. These personal statistics usually appear in story form. While it
may not appear to be so, telling a story is making use of a statistic, the statistic of a single instance. The
authentic story claims more authority than any statistic based upon a representative sample of people. The
storyteller sometimes says, “My personal experience of one instance is more real than anyone else’s
experience, especially all of those experiences described together.”

We know that small samples are very susceptible to sampling error. The experience reflected in a story
may be quite untypical; it may lead to serious misjudgments if it is regarded as an informal indicator of central
tendency. If we allow stories and anecdotes to substitute for statistics, we will likely produce horrible
estimates of population parameters.

Let us say that I am considering buying a new car. If I want to buy a reliable model, I can conduct my pre-
purchase research searching for stories, searching for statistics, or some combination of the two. If I ask a
friend whether her make of car has been reliable, I am collecting a story of one person's experience. If I buy
a car based on her recommendation alone, I am assuming that her experience is representative of all the
people who bought this model. However, even the most quality-minded automobile manufacturer
occasionally puts out a lemon. If my friend happened to buy a lemon from a high-quality manufacturer, I
will avoid that automobile based upon a sample of one, even though statistics accumulated from a large
number of people buying this model show that this company produces fewer lemons per thousand than its
competitors do. It is even more probable that I will underestimate my risk by using a sample of one. Most
automobile manufacturers produce automobiles that perform reasonably well most of the time, so whichever
single owner I happen to ask has probably had a satisfactory experience with the vehicle. I may conclude
that a make of car has good reliability when aggregate statistics show that this make of car has lower overall
reliability than competing models.

Should I give more weight to my friend’s evaluation or to the aggregate statistics in Consumer Reports?
The answer in this case is simple: give Consumer Reports more weight than the experience of my friend.
However, there are two important exceptions to this rule. Disregard the statistics if they are based on
invalid data collection techniques (e.g., measurement error), or if they are drawn from an unrepresentative
sample. In these cases, the story or experience of one informed authority may be a better guide than the
aggregate data with measurement error and systematic error. In fact, an error-ridden statistic has the
potential to produce more confusion and misjudgment than a story.

So why do we use stories so often for proof? Well, it turns out that a good story often has more persuasive
punch than a statistic does. This may reflect a weakness of human rationality, but it is understandable. Our
“common sense” provides us with tools to survive in our environment. In many instances, we have to rely
upon personal experience because it is all we have. We have to decide what to do; we don’t often have the
time to conduct a representative poll of other people’s experiences. In short, we are biased toward the
authenticity and representativeness of our own stories, because we are emotionally attached to our lessons
won via the hard knocks of experience.

Another reason that we often give more weight to a story than to a statistic is that we trust the storyteller
more than we trust the person(s) or institutions that collected and interpreted the statistic. In many cases,
the statistics are collected by anonymous sources (e.g., The Census Bureau) whereas we have learned to
trust the storyteller based on our long-term interaction with him. The depth, detail and vividness of the
story reassure us about the competence and goodwill of the storyteller. A good story also generates
emotional involvement and identification with the plot and with its characters. A good story engages our
moral faculties more directly than a well-selected statistic does.

There are situations where storytelling by a trusted authority should be seriously considered. An authority
or expert may have a good sample of experiences or stories from which to select typical or representative
cases. If we lack other information, the informed experience of the expert or authority may be much better
than statistics drawn from small and unrepresentative samples.

This raises the question of how we should use statistical information and stories when presenting our data.
If we use only statistics, we run the risk of being less persuasive than we can be. If we only use stories we
may be persuasive, but we may lead our audience to faulty conclusions. If we want to be both persuasive
and ethical, we will find ways to combine statistics and stories to meet both objectives. One way to do this
is to first carefully study the meaning of one’s statistics, and to select stories that are typical in some way.
In other words, select stories that are consistent with and amplify the statistics that you use. Combining an
important statistic with a relevant story tends to amplify audience members' understanding and retention of
important information.

In the end, we need to exercise cautious skepticism in judging the authenticity and significance of both
stories and statistics. In judging the quality of statistics, we need to assess measurement validity of relevant
variables, the quality of the sample, the quality of the statistical evidence, and the intentions of the source.
In the case of stories, we need to make the same assessments, giving extra special attention to whether the
stories told are typical cases or are outliers. Both cynicism and naïve trust serve us poorly. Statistics can
mislead, but they are also powerful tools to enlighten, persuade and inform people. The same is true of
stories. The ultimate moral and ethical responsibilities of how each are used rest with the people who
communicate them. Careful communication and skeptical listening are both necessary in a democratic
society.

Statistical Malpractice: A Top 10 List
There is a popular book, first published in 1954, titled How to Lie With Statistics. If I were writing a
similar book, I would follow David Letterman and title it “The Top Ten Ways People Commit Statistical
Malpractice.” In many cases, the person communicating the statistics does not mislead intentionally. Here
is my top 10 list of ways people misinterpret statistics.

#10. Do Selective Statistics and Selective Comparisons

This is a common practice. In any instance where one has to put one's best foot forward, there are usually
some statistics that look better than others. In this practice, you report the one statistic that sounds the best.
Imagine that you are a CEO of a company who has to give an accounting of your corporate performance to
stockholders. If the company has made good profits, you are likely to point this out and omit the fact that
the performance of your company lags behind the performance of other companies in your sector. On the
other hand, if your company lost money in a particular quarter, you are likely to point out that many other
companies have had a difficult quarter in your industry as well. In other words, you make industry
comparisons when they are favorable and neglect to mention them when they are not.

This practice is illustrated in selecting a measure of central tendency. As you will recall, we have three
candidate statistics to answer the question, “What was a typical score in the distribution?” (i.e., mode,
median, mean). Faced with the prospect of trying to make a persuasive point, communicators often choose
the statistic that will make the strongest point. If you want to emphasize that family incomes are up during
your tenure in office, you might select the mean that shows healthy growth over your years in office. If you
are the opposition, you may want to use the median, which shows only negligible growth over the
same period. Which statistic is true? Well, both of them can be accurate. The beauty of it is that no one can
ever accuse you of lying. The accepted consensus among statisticians is that one should use the
median to represent skewed distributions. However, there are cases where neither statistic is clearly
superior in the information that it provides. When the mean, median and mode diverge from each other, one
can report all of them and let the reader decide which best indicates central tendency.
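
A quick illustration with made-up incomes shows how far apart the mean and the median can drift in a
skewed distribution:

import statistics

# hypothetical annual family incomes in one neighborhood (note the one very high earner)
incomes = [28_000, 32_000, 35_000, 38_000, 41_000, 45_000, 52_000, 60_000, 750_000]

print(round(statistics.mean(incomes)))   # about 120,000: pulled upward by the outlier
print(statistics.median(incomes))        # 41,000: closer to the "typical" family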

#9. Monkey With the Visuals

Some statistical malpractice involves misleading visual comparisons. The reported statistics may be
accurate, but their visual presentation is misleading. Imagine that you are a journalist writing an article
about a recent increase in crime. If you want to highlight the recent increase in crime (say 3%), then cut off
the bottom of the chart and start the graph at the point of the recent upturn. It will leave the impression that
crime is escalating at an exponential rate. However, if you place the same increase in the context of
changes in crime patterns that have taken place over the last 10 years, the recent increase in reported crime
appears to be a rather inconsequential blip. There are a number of other ways to create misleading visual
impressions in how charts and graphs are constructed. When you look at a chart, assess whether it depicts
the big picture.
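
To see the effect for yourself, the following sketch (with hypothetical crime figures) plots the same series
twice, once with a truncated vertical axis and once with the full axis:

import matplotlib.pyplot as plt

years = list(range(2000, 2010))
crime_index = [100, 99, 101, 100, 98, 99, 100, 101, 102, 103]   # a ~3% uptick at the end

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(years, crime_index)
ax1.set_ylim(97, 104)                 # truncated axis: the recent rise looks dramatic
ax1.set_title("Crime is soaring!")

ax2.plot(years, crime_index)
ax2.set_ylim(0, 120)                  # full axis: the same rise is a minor blip
ax2.set_title("The ten-year picture")
plt.show()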

#8. Report Statistics from Inadequate Samples

This practice used to be a big problem. This is a smaller problem today because of the widespread use of
random sampling procedures. We often use samples to try to predict some unknown population parameter
(e.g., in what proportions will people vote for or against the referendum). Inference from samples to
populations is misleading if one has small or unrepresentative samples. In the case of small samples, the
random error from one small sample to the next is so large that it is impossible to estimate the
underlying population parameter with any precision. In everyday life, we often find that people use their
own experiences (a sample of 1) to make declarations about the nature of the world. If you have a
representative sample, sampling error decreases substantially as the size of the sample increases. In this
case, a large sample is much better than a small one.

However, a large sample is not necessarily a good sample. If one has an unrepresentative sample, which
systematically over represents some segments of the population and under represents others, a large sample
will lead most people to have unwarranted confidence in the results of a poll. If you have an
unrepresentative sample, you cannot estimate random error. This makes it impossible to generalize from
the sample to a larger population. The Literary Digest prediction for the Roosevelt-Landon election of
1936 is a famous example of the grave mistakes that result from making inferences from a biased sample.
For purposes of estimating population parameters, a large and representative sample is ideal. For better or
worse, they are now widely used in polls and surveys. This form of statistical malpractice has declined in
recent decades.

#7. Fail to Contextualize Statistical Claims

Much statistical information is presented as being more definitive than it really is. In recent years,
journalists have started reporting the confidence interval that accompanies an estimate of a population
parameter. For instance, if the researcher is estimating the percentages of people who will vote yes or no
on an upcoming referendum, we are told that the poll has a margin of error of plus or minus so many
percentage points.
This leads to the mistaken conclusion that the actual percentage of people is certain to fall within this range.
What the journalist fails to convey is the level of confidence that the researcher has that the true population
parameter will fall within this interval. Most researchers use a 95% confidence level when constructing
their estimated ranges of sampling error. They can construct intervals with more stringent confidence
levels, but the +/- margin of error will be larger as well. The researcher realizes that if 100 polls of the
same sample size were taken, on average 95 of those sample estimates should fall within the estimated
range. However, the researcher knows that in 5% of the cases, the true population parameter will fall
outside of the estimated interval. In making estimates, there is only confidence, never certainty. However,
journalists sometimes fail to make this clear. As such, they set up claims that researchers know not to be
true. Every competent researcher knows that her estimates will sometimes be wrong. Failure to report the
confidence level makes it appear that the researcher is claiming infallibility about her projections. In actual
practice, when reading such studies, you can assume that the confidence level associated with any
confidence interval is 95%.
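
The calculation behind the reported margin of error is straightforward. Here is a sketch for a hypothetical
poll in which 520 of 1,000 respondents favor a referendum; notice how raising the confidence level widens
the interval:

import math
from scipy.stats import norm

favor, n = 520, 1000                       # hypothetical poll results
p_hat = favor / n
se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of the sample proportion

for level in (0.95, 0.99):
    z = norm.ppf(1 - (1 - level) / 2)      # critical value for the confidence level
    print(f"{int(level * 100)}% confidence: {p_hat:.2f} +/- {z * se:.3f}")
# Neither interval accounts for measurement error or an unrepresentative sample;
# it estimates random sampling error only.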

A more serious form of this kind of statistical malpractice relates to the use of confidence intervals
themselves. If you didn’t know otherwise, you might assume that the confidence interval estimates all
sources of error. In fact, it only estimates random sampling error. It does not estimate systematic sampling
error or measurement error. If a sample is carefully selected, systematic sampling error should not be a
significant problem. However, measurement error (unreliable or invalid measurement) always is present to
some degree in our statistics. The confidence interval reflects the degree of error that is present only if
one has a perfectly representative sample and perfectly valid measurement. As such, the confidence
interval is more an estimate of minimum probable error rather than the maximum possible error. If one has
unreliable or invalid measurement, the confidence interval is of little use. The statistical information
reported in polls can be very useful. However, we should not place more confidence in our statistics than
they merit. Reckless interpretation of statistical results can create unwarranted distrust and cynicism.

#6. Treat Time as a Constant

This form of statistical malpractice takes place when comparisons are made over time. We may assume that
the statistic, collected at different points in time, measures the same underlying concept. However, the
meaning of a concept often changes over time. This malpractice is usually well intentioned. We see many
articles charting a supposed rise in cases of domestic abuse and child abuse. However, the underlying
assumption of this comparison is that the rate at which these crimes are reported has remained
constant. Perhaps these crimes are reported more often than they were in the past, as public attention has
been focused on these issues in the media.

In some cases, there are shifts in the manner in which an underlying statistic is calculated. Once every
decade or so the Educational Testing Service, the organization which administers the SAT, recalibrates the
scoring of the SAT. This scheduled recalibration took place several years ago. After each calibration, the
test mean is 500 and standard deviation is 100 points. Each calibration is scheduled years in advance.
Hence, it is misleading to directly compare changes in SAT scores from different periods. Even when the
calculation of a statistic remains constant, social and demographic changes can transform the nature and
meaning of the statistic. We hear conflicting statistics from the political parties concerning the progress of
the middle class. One political party proclaims that per capita income is up dramatically. The other claims
that the median worker’s wage has remained virtually stagnant. Which side is telling the truth and which
side is misleading? Both sides are doing both. Yes, median worker wages have moved relatively little, once
cost of living increases are taken into account. However, the cost of living index itself may overestimate
increases in cost of living so that we may have had more income growth than the figures show. So how can
per capita income be up dramatically over the same period? Simple: the number of children per household
has dropped over the last 30 years. Divide essentially the same income over fewer people and you have a
dramatic increase in per capita income in American families.
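
The arithmetic is easy to check with made-up numbers:

# hypothetical, inflation-adjusted figures chosen only to illustrate the arithmetic
income_1975, household_size_1975 = 52_000, 4.0
income_2005, household_size_2005 = 54_000, 2.6

print(income_1975 / household_size_1975)   # 13,000 per person
print(income_2005 / household_size_2005)   # about 20,800 per person
# Household income rose about 4%, yet per capita income rose about 60%.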

Comparisons across time also assume that the semantic meaning of concepts has remained the same over
time. However, comparisons over time may be suspect if the underlying meaning of the concepts has also
changed. Over the last several decades, I have seen polls comparing the numbers of people who classify
themselves as liberal, independent and conservative. Quite aside from the question of whether this
classification really captures people’s attitudes, we have the problem that the meanings of the terms, liberal,
independent and conservative have themselves changed dramatically in the intervening 30 years. In terms
of social mores and attitudes on many topics, the country is much more “libertarian” than it used to be. In
addition, “conservatives” now accept positions that were once “liberal”. Note the particular glee that
“conservatives” have in quoting Dr. Martin Luther King as they describe the evils of “affirmative action.”
Perhaps because of their successes, “liberals” had to continually redefine themselves to distinguish
themselves from “conservatives,” hence “liberals” became more “liberal.” Now imagine a person who
considered herself to be a “liberal.” Assume that this person’s attitudes on many issues have not changed.
Just because of the drift in the meaning of these terms, the person who was “liberal” now may consider
herself to be “conservative,” even though her underlying political beliefs have remained remarkably
consistent. Trend comparisons in how people classify themselves politically should be viewed skeptically.
Time changes many things, some of them quite hidden.

#5. Fail to Discuss Operational Assumptions and Procedures

This form of statistical malpractice is fast climbing the list. Its frequency and seriousness have increased as
computer simulation and statistical modeling have become more common. The current debate about
whether or not global warming is real comes down to whether or not you believe in computer modeling (or
more accurately, which computer model you believe). On one hand, we hear that global warming is a
calamitous threat; on the other hand, we hear that it is a myth. In terms of the costs of controlling
greenhouse gases, we hear that they can be handled quite easily, or that it will cost so much as to stifle our
economies. Researchers, each of whom believes that his computer model most accurately captures reality,
promote these competing claims.

Computer simulation is a very useful research tool. However, the assumptions that are programmed into a
computer model have a lot to do with how the computations will turn out. There is no easy way to deal with
this problem, because it is usually beyond the ability of the layperson to determine which set of
assumptions is completely valid. However, there are several things that journalists and other
communication professionals can do to contextualize computer-modeling claims. First, you can explain
how competing models differ in the assumptions that they make (e.g., how the underlying assumptions in
the President’s estimate of future governmental revenues differ from those of the Congress). Second,
journalists can get candid assessments of the adequacy of the model’s assumptions from nonpartisan
experts. Third, you should seek to find out how well the predictions of the computer models triangulate
with other methods of verification (e.g., field experiments etc.). Finally, communicators should treat the
predictions of such models as educated guesses, not as settled facts. If we were to do this, we might be able
to discuss these issues much more civilly.

#4. Promiscuously Extrapolate beyond the Data

When it comes to predicting the future, no one has a crystal ball, or at least none we know about. If I could
predict the future, I would be rich. I’d put a modest amount of money in the right place in the stock market
and with a little time, I would become quite rich.

Businesses, governments, indeed all institutions, need to make estimates about the future. In making these
estimates, we use the only tools that we have: we make predictions based on experience or performance.
The simplest version of extrapolation is to project present trends linearly into the future.
In the short run, extrapolating from past events works fairly well. In fact, we could not really function as a
complex modern society without such projections. However, such projections become more precarious, the
farther we predict into the future. Meteorologists have good accuracy in three-to-five-day forecasts,
moderate accuracy in 30-day forecasts, and lousy accuracy in six-month forecasts.

The first reason that long-term extrapolations fail, especially linear extrapolations, is that very small errors
in trajectory compound into very large errors over time. Perhaps some of you had the experience of
building a tower with blocks when you were a child. Even the slightest imbalance at the bottom of your
structure is magnified as your tower gets taller. Eventually the tower comes crashing down. The same is
true of projections of the future. When one predicts far into the future, small errors in your initial
calculations compound like interest. In the 2000 presidential campaign, the candidates put forward their
plans on what they wanted to do with the “budget surplus.” Well, we now know that the budget surplus
disappeared as quickly as it had appeared in the first place. The surplus appeared because the economic
forecasts slightly underestimated the level of economic growth for several years. Likewise, most of the
projected surplus disappeared as the reality of the current economic slowdown set in.

A second reason that long-term extrapolations fail is the familiar phenomenon of regression to the mean. A
trend that departs from its historic average tends to return to that average over the long haul. The stock
market had three exceptional years in 1997-1999. Stocks rose more than 20% per year. Many analysts also
predicted sharp increases for the year 2000. However, in 2000, 2001 and 2002, the stock market steadily
declined. This turn of events was entirely understandable for anyone familiar with statistical trends.

A third reason that extrapolations often fail is that relationships between parameters often change. With
each new technological innovation, “futurists” make grand predictions about how the new technology will
transform life in new and wonderful ways. On the other hand, there are gloom and doomers who always see
a terrible crisis around the corner. Both have a very bad record of predicting the future. Their failures
originate in changes in the ways the parameters relate to each other. The futurologists are not all wrong, but
they fail to anticipate the unintended consequences of human inventions (no one predicted the invention of
computer viruses, which we now spend much time and energy protecting ourselves against). Moreover, the
gloom and doomers cannot account for the feedback loops in human behavior and practice that will likely
occur. Thirty years ago, we had forecasts of a terrible population explosion that was going to plunge the
world into poverty, misery and starvation. Well, a number of things changed along the way. Birth rates
have plunged in many countries as living standards have increased. In fact, in many countries there is now
a concern about having a scarcity of young folk to take care of an aging population. China’s one-child
policy, for instance, will create a society without aunts, uncles and first cousins. When it comes to
predicting the future, we should do it with great caution. If they are honest, people who forecast the future
should practice a lot of humility.

#3. Equate Correlation with Causation

This form of malpractice is deeply rooted in the human condition. We tend to jump to premature
conclusions about cause and effect. If we see an instance or two where one event precedes another (we see
an occurrence of a correlation), we conclude that the antecedent event caused the subsequent one.
This is how superstitions begin. A star basketball player notices that she performed well on the night when
she wore an old ragged pair of sneakers. Therefore, she decides to wear them the next several nights. If
her hunch pans out the next several nights, an incidental or spurious correlation will turn into a firm
conviction. It may even snowball into a social institution if enough people become convinced of it.

This pattern also carries over to cases where the correlations between phenomena are real. As we have
discussed the problems in interpreting correlations elsewhere, I will not repeat myself here. I only repeat
that this is a very common and deadly practice. Anytime you discover a correlation, repeat the mantra,
“Correlation is not causation.” Repeat the mantra to anyone who falls prey to this seduction. To establish
causation requires that you first rule out alternative plausible explanations. Patience is a virtue when it
comes to making causal claims.

#2. Create Much Ado about Statistical Significance

This is a professional disease of academic researchers. In this scenario, a researcher uses inferential
statistics to disprove the null hypothesis. She finds statistically significant results and then makes a big
deal out of them. Researchers often overstate the importance of what they find. In the words of a familiar
cliché, they make a mountain out of a molehill.

Let me start with an example that plagues studies with rather broad research questions. If you will recall, a
research question does not make a prediction. The null hypothesis for a research question is simply that
there is not a relationship (or difference) of any kind between the two variables. Researchers
conventionally use a significance level of let’s say .05 as an acceptable level of Type I error. This means if
you run 20 correlations, on average one of those correlations will be statistically significant due to
chance alone. If you were to select numbers from a random numbers table, an average of one statistically
significant result would show up per trial of 20 correlations. Research hypotheses are favored over
research questions when it comes to confidence of statistical interpretation. With a research hypothesis you
take a risk with each statistical test you run. The researcher makes a prediction and risks being wrong. If you
simply go on a fishing expedition, you can always find something statistically significant, even if it is
merely a statistical artifact.

An even more deadly version of this practice occurs when researchers make a big deal of statistically
significant results that are trivial in a practical sense. If one has a huge sample (and a lot of statistical power
with it) then even trivial real world relationships will be statistically significant. We may find a small
correlation between teachers' pay and students’ test performances, but if this correlation only accounts for
2% or 3% of the variance in student scores, we can hardly say that we need to raise teachers' pay in order to
improve the quality of education. Statistics need to be evaluated within a social context to determine their
practical significance. Statistically insignificant results are never practically significant, but not all
statistically significant results are practically significant. Statistical decision procedures do not
substitute for using informed judgment about the real importance of statistical results.
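
A small simulation (illustrative only) makes the point: with ten thousand cases, even a correlation that
explains about 2% of the variance sails past the .05 level.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)
y = 0.15 * x + rng.normal(size=n)          # build in a weak relationship

r, p = stats.pearsonr(x, y)
print(round(r, 3), p)                      # r is about .15; p is far below .05
print(round(r ** 2 * 100, 1))              # yet only about 2% of variance is explained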

#1. Put Numbers on Garbage (i.e., report statistics on invalid data)

This is the most persistent and offensive form of statistical malpractice. It has risen in recent years, partly
because of the increasing use of polling and surveys. However, we continue to see many statistics reported
that are essentially meaningless. They are meaningless because they start with invalid measurement. Invalid
measurement never gives one anything more than garbage.

An obvious instance of this type of malpractice occurs when a pollster utilizes biased or leading questions.
Ask a question about abortion and you can prove most anything you want. Frame the question the way you
want and you can frame statistics to support your cause. Most people are against killing unborn children.
Many people also believe that government bodies should tread lightly when it comes to reproductive
choice. The conflicting statistics about abortion partially reflect the use of biased and slanted questions.
However, they also partially reflect the complexity and ambivalence of the public sentiments about
abortion. Yes, most people have reservations about abortion, especially late term abortions and abortions
involving minors. However, many people also oppose categorical bans on abortion. Perhaps we think that
the option should be available, but we think it should be regulated. Sometimes like the six blind men and
the elephant, we get reports of inconsistency because the underlying reality is more complex than our
measurement of it. Biased measurement may tap one dimension of a phenomenon without touching the
other dimensions.

A second manifestation of this form of statistical malpractice occurs when researchers create the sentiments
that they think they are measuring. If you ask someone his opinion about something, it implies that he
should have an opinion about it. Respondents sometimes create an opinion on the spot. The researcher
then publishes the results and expounds on their meaning. Pollsters ask people about all kinds of issues
about which they have little relevant knowledge. When it comes to public issues, polls and surveys should
be confined to those things most people know and care about.

Our society might be a better place if some of us expressed fewer opinions. I have opinions about many
things that I will not express because I lack essential information. I may have had an opinion about the
innocence or guilt of a celebrity defendant, but I’m not on a jury hearing the relevant facts and arguments.
Of what importance is my opinion? On questions that relate to facts, the knowledge of informed authorities
or people who really know is infinitely more important than the scrambled opinions of people who lack
credible information.

A third frequent form of statistical malpractice in this category is to use terms that lead to bypassing
(because they have different understandings of the concept, people believe they disagree when they don’t
or conclude that they agree when they don’t). Everyone thinks world peace is a good idea. Likewise, the
Gallup organization asks people whether they believe in God. They usually report that more than 90% of
people in the United States believe in some sort of God. Which God, pray tell?!? The radically different
ideas of what and who God is make this question trivial and meaningless.

There are many other subtle forms of this kind of statistical malpractice. For this reason, you should
carefully scrutinize and evaluate the operational definitions underlying statistics. Using your informed
judgment, you can at least assess the content validity of the measurement scheme.

Conclusion

Yes, there are some insidious forms of statistical malpractice. However, we shouldn’t throw statistics out
the window. We can’t replace them with anything better. Scrutinize statistical reports with a skeptical but
open mind. If you use some care and caution, you can protect yourself against statistical malpractice. Don’t
believe everything you hear, but don't disbelieve it either; listen to it and test it. A person with a skeptical
attitude says “Show me! I am ready to believe if you present a compelling case.”

Unit 10: Objectivist Paradigm-Internal Validity and External Validity

This unit introduces you to two important concepts of research design: internal validity and external
validity. Internal validity is a desired property of studies that inquire into questions of cause and effect.
The concept of “cause” is rather complex. The first reading looks at the different models of "cause" that are
used in everyday language. There are many correlations in our world. However, it often takes some
detective work to determine which causal relationship, if any, accounts for the correlation. A study has
internal validity when its design enables us to make claims about the causal relationships between
constructs. In other words, other causes or potential explanations for the results have been ruled out. Based
upon a long history of trial and error, researchers have identified a number of systematic threats to internal
validity called artifacts. We will examine the tools that researchers use to control for artifacts in their
studies. We will work with everyday examples of causal reasoning to determine which artifacts may be
present and how they can be controlled or accounted for.

External validity is a second desirable characteristic of a study. A study has external validity when one can
confidently generalize the results of the study to other samples and situations. External validity is a
complex entity, but it is maximized when one has a large representative sample, when the study utilizes
multiple measurement methods, and when the facets of the study are very much like the real world
phenomena one wants to describe or understand. The unit also examines the relationship between internal
and external validity.

Unit Objectives

Upon completing this unit, you should be able:

10-1: To define, compare and contrast the different models of causation.


10-2: To correctly identify the kind of causation employed in a causal claim.
10-3: To identify the artifact or artifacts that may threaten internal validity after reading a description of a
study design.
10-4: To propose concrete remedies to control for a specific artifact.
10-5: To identify potential limitations on the external validity of a study's findings.
10-6: To propose replication strategies that address potential limitations on external validity.
10-7: To explain the relationship between internal validity and external validity in research design.
Correlation and Causation
Correlation is not causation. Researchers continually have to remind their readers that finding a
relationship between two variables does not prove that the variables are causally related. A relationship
between two variables can be attributed to a number of factors. When people discover a correlation
between two variables, they typically have a preferred explanation for the relationship. However, a
researcher is accountable for systematically ruling out other explanations before she presents her
conclusions about cause and effect.

The concept of cause is rather complex. There are several different forms of causation. In everyday
speech, we seldom distinguish between these forms of causation. This results in a great deal of confusion
and miscommunication (sometimes purposeful). The strongest case of causation is sufficient causation: If
A then B. If event A occurs, then event B will always follow. Necessary causation is a slightly less
stringent form of causation. In this case, whenever B occurs, A always occurs before B. Variable A is a
necessary catalyst for the occurrence of B. Variable A may have to work in conjunction with some other
variable to create the effect on B. A still weaker class of causation is facilitative causation. The presence of
A tends to facilitate changes in B. However, B sometimes occurs even in the absence of A. In the social
sciences, most instances of causation that we talk about involve facilitative causation. In the social world,
many variables often affect or cause variations in our dependent variable.

To complicate things, cause-effect relationships are often not linear. Systems causation exists when there
are reciprocal feedback relationships between sets of variables. It can be very difficult to determine where
the cause starts and where it ends (e.g., A influences B which influences C which influences A). The
question of which comes first, the chicken or the egg, characterizes the dilemma researchers face when they
discover systemic patterns. In recent years, causal modeling using computers has become a preferred way
to investigate systemic causal patterns. Debates about global warming rest on conflicting claims being
made from different causal modeling programs that are being used to estimate the effect of greenhouse
gases on the world's climate.

In communicating about cause and effect, people often end up bypassing each other (thinking they agree on
the meaning of the term, when in fact they don’t). One person may have one kind of causation in mind,
when another person is thinking of another kind of causation. Some people maintain that cigarette smoking
does not “cause” lung cancer. The person who makes this claim is clearly thinking of sufficient causation
or necessary causation. Many people who smoke never develop lung cancer. Likewise, people sometimes
get lung cancer without ever having smoked. However, it is also true that smoking cigarettes tends to
facilitate the onset of lung cancer: the more one smokes, the greater the likelihood that one will get lung
cancer. Moreover, when one quits smoking, the risk of contracting lung cancer goes down. When we talk
about cause and effect, we are usually speaking of facilitative causation.

Researchers require three things to infer a causal pattern between two variables. First, there must be a
correlation between the two variables. Second, one of the variables must consistently precede the other
variable in time or logical precedence. Logical precedence means that, from a logical or common sense
point of view, one of the variables could not have been caused by the other (e.g., a person's age is not
determined by a person's level of knowledge on a subject). Third, other causal relationships must be ruled
out. Most of us take into account the first two conditions when we talk about cause and effect relations. It
is in the third area that formal research goes beyond our common sense.

Suppose we have a positive correlation between two variables A and B: when one increases, the other
tends to increase as well. Following are some of the ways that one can account for the correlation.
A⇒B A could be a cause of B.
B⇒A B could be the cause of A.

C⇒A & C⇒B Both A and B could be related to some background causal variable C (one is likely to find a
positive correlation between the crime rate and ice cream consumption). Both variables relate to the
temperature outside. Crime tends to go down when it is cold outside: there are fewer people around to rob,
and burglars don't like to freeze either.

A~~~~~~~~B
The two variables may also be spuriously related: they are related by chance, especially if the sample is
small. Spurious correlations are usually easy to disprove if one has a relatively large sample. Spurious
correlations are an important source of superstitions (i.e., athletes who wear the same pair of “lucky”
sneakers for weeks upon end).

A⇔B The two variables could also be reciprocally related to each other in a positively reinforcing spiral
(e.g., liking and self-disclosure are sometimes related this way).

Hypothetical Example: Pot Smoking and GPA


Let us say that we find that the amount of pot that students smoke is negatively correlated with their GPAs.
In most cases, we would probably think that the following causal relationship holds:

PS⇒GPA:
The more pot you smoke, the more it interferes with your motivation and higher cognitive processes.
However, it is also possible that a low GPA causes one to smoke pot. Perhaps students smoke pot to relax
and escape their troubles. A low GPA might create a need for escape and relaxation.
GPA⇒PS
A combination of these two explanations may also hold. Perhaps the two variables are related in a self-
reinforcing spiral. A failure on a test induces one to smoke dope, which in turn leads to poorer
performance on the next test, which leads to a greater need to escape.

GPA⇔PS
Both variables may also be related because they are related to a personality trait called sensation-seeking.
High-sensation seekers tend to get bored rather easily. A person who loses interest in studying may also be
more inclined to use pot.

Sensation Seeking⇒GPA & Sensation Seeking⇒Pot Smoking


So, how does one find out which cause best accounts for a correlation? The answer is by a process of
systematic observation and analysis. In a cross-sectional study with a reasonably large sample, one can
rule out the possibility that the relationship is spurious. Longitudinal studies can help one determine which
factor comes first in time.
Experimental studies involving a manipulation of an independent variable also help in this regard. Ruling
out alternate causal relationships is the most difficult task. This usually requires laboratory experiments in
which one manipulates an independent variable and holds all other factors constant. There are also certain
statistical tests that one can use in field studies (e.g., partial correlation) to try to test the plausibility of
various alternatives. Sometimes it takes careful detective work to discover the full causal pattern by
identifying moderator variables and intervening variables.
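
As a sketch of what that statistical detective work can look like, the following example (with simulated
data, purely for illustration) computes an ordinary correlation between pot smoking and GPA and then a
partial correlation that controls for sensation seeking:

import numpy as np

rng = np.random.default_rng(7)
n = 200
sensation = rng.normal(size=n)                     # hypothetical sensation-seeking scores
pot = 0.8 * sensation + rng.normal(size=n)         # both outcomes are driven by the trait
gpa = -0.8 * sensation + rng.normal(size=n)

def partial_corr(x, y, control):
    # correlate the residuals of x and y after regressing each on the control variable
    x_resid = x - np.polyval(np.polyfit(control, x, 1), control)
    y_resid = y - np.polyval(np.polyfit(control, y, 1), control)
    return np.corrcoef(x_resid, y_resid)[0, 1]

print(round(np.corrcoef(pot, gpa)[0, 1], 2))       # a sizeable negative correlation
print(round(partial_corr(pot, gpa, sensation), 2)) # near zero once the trait is controlled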

Exercise: Identifying Causal Models
Following are some common types of causal claims. Identify the type of causal model (sufficient,
necessary, facilitative, systems) used in each argument below. Explain your answer.

1) “Despite what all of the self-help books and positive thinkers say, to be a world class athlete in football,
you have to have exceptional speed. You can practice and put in all the effort you want, but it won’t make
you faster. Exceptional technique does not compensate for a lack of speed. Some athletes with
exceptional speed do not pan out as professional athletes, but there aren’t any that I know of who have
succeeded without it.”

2) “I don’t accept the idea that inhaling asbestos fibers causes cancer. When I was young, I worked with
many asbestos products in the construction industry. As you can see, I am still living and kicking fifty
years later despite having breathed a whole lot of the stuff. It didn’t cause lung cancer in me, so how
can they claim that it causes lung cancer in other people?”

3) “There are exceptions of course, but in general children tend to do better cognitively, socially and
emotionally when their parents remain married than when they get divorced, even in marriages which
have a lot of marital conflict.”

4) “To get out of debt, you need to plan ahead and avoid extra charges for late payments and emergency
purchase of services. However, you really can’t avoid these charges if you don’t have enough
household income in the first place.”
Identifying Artifacts that Compromise Internal Validity
Internal validity is a necessary characteristic for research designed to answer questions about cause and
effect. A study has internal validity when the researcher has justifiable confidence that differences in an
independent variable produce changes in a dependent variable. In causal relationships, the independent
variable must precede the dependent variable, the independent and dependent variables must be correlated
and the researcher must be able to rule out alternative causal explanations or explanatory variables.

The experiment is the research tool best suited to addressing questions of causation. The essence of good
experimental design is to rule out rival or alternative explanations for the results. The researcher does this
by carefully manipulating the experimental variables, creating equivalent experimental groups and
controlling for extraneous variables called artifacts. This reading identifies some guidelines on recognizing
the presence of artifacts in research design. Research methodologists have identified a number of artifacts
that occur frequently and create real problems for making inferences about cause and effect. Researchers must
account for possible artifacts in their research studies, because the presence of uncontrolled artifacts makes it
difficult to interpret research results and to determine what the results mean.

Researcher Behavior or Bias

Demand Characteristics refer to behaviors by the researcher or factors in the research situation that
inadvertently influence research subjects’ responses. For instance, some subtle nonverbal behaviors by the
researchers may differ according to the experimental condition and affect the dependent variables in
addition to the intended experimental manipulation. One famous example of demand characteristics is the
Hawthorne effect. This occurs when the very act of studying something leads to changes in the phenomena
studied. The Hawthorne effect is named after a study that was conducted at the Western Electric Hawthorne
Works near Chicago, Illinois, in the late 1920s. The researchers wanted to know how work environment factors such as the
amount of light in the work environment affected worker productivity. They changed a number of
environmental factors and found that worker productivity improved. Then they removed the changes and
the work environment reverted to the original conditions. They expected productivity to fall back to the
pre-experimental levels. In fact, productivity remained very high. After much soul searching, the
researchers concluded the study itself, in which the workers were treated as collaborators, had changed the
nature of the work environment and had created the productivity increases.

A closely related demand characteristic in medical research is the placebo effect. Medical studies have
shown that sick persons often feel less pain, or improve in their symptoms after they receive a placebo pill
(e.g., a sugar pill that has no expected therapeutic value). We don't understand exactly how the placebo
effect works, but it is a robust phenomenon. Since even a fake intervention can have effects, medical
researchers usually include a placebo group, in which the participants receive a placebo treatment or
medication. The results for the placebo group are compared with the experimental group. An experimental
drug must register significantly greater effects than the placebo treatment.

Demand characteristics are difficult to control completely. Some steps that are commonly built into studies
to reduce demand characteristics include: (1) concealing the specific hypotheses of the study from the study
participants (i.e., using a blind or double-blind procedure), (2) training research assistants to behave
consistently across all of the conditions in the study, and (3) giving instructions that guarantee study
participants anonymity or confidentiality.

A second artifact that sometimes occurs in this category is experimenter bias. Experimenter bias refers to
anything unintentional that a researcher may do as she collects and analyzes data that leads to confirming
the hypothesis. A researcher, just like the ordinary person, is sometimes prone to wishful thinking. She
may overlook data contradictory to her hypothesis or bias her perceptions of the data. For instance, if my
hypothesis is that women use more hesitant language in an informal conversation than men, my

expectations may bias me to code more hesitations in women's speech than in men's speech. Using blind or
double-blind procedures can control for experimenter bias. In a blind procedure, the person collecting
or analyzing the data is unaware of the hypotheses of the study, or of which group a research
participant belongs to. In a double-blind procedure, neither the subjects nor the people collecting
the data know which group a subject is in or what the study's hypotheses are.

A third type of researcher-induced bias emerges when the researcher fails to maintain consistent
experimental procedures. The artifact of procedure reliability and validity refers to the consistency with which
independent variables are manipulated and measures are administered. The first principle for
guaranteeing procedure reliability and validity is that everyone in the same group should
experience similar testing conditions. The second principle is that the only difference between the
groups being compared should be the intended manipulation of the independent variable. All other
differences should be controlled. Any other difference between the groups (e.g., setting of administration
etc.) becomes a possible alternative explanation for differences between groups on the dependent variable.
When analyzing the procedure reliability and validity in a study, look for consistency of procedure
within groups, and look for control of differences other than the independent variable between
groups. When you read research reports on experiments, you will find that researchers control for all kinds
of small details in order to guarantee procedure reliability and validity.
Time Related Artifacts

A second class of artifacts emerges with effects that can occur with the passage of time. In longitudinal
studies, studies that involve some period of time between a pretest and post-test, one must consider the
possibility that other events or processes during the time period also influenced the dependent variable.

History is a potential problem in any longitudinal study. History refers to any change in the external
environment that may influence the dependent variable(s) in addition to the independent variable. History is
a catchall label for any random or unplanned events during the period of the actual study. History can be
accounted for by using a control group. If external factors influence the dependent variables, they should
affect scores for the control group as well as the experimental group.

Statistical Regression to the Mean is sometimes referred to as the law of averages: what goes up will
eventually come back down and what is down relative to historical benchmarks is likely to come up.
People who depart from their baseline score or performance in some respect will tend to return to that
baseline of performance with the passage of time. If a study is looking at groups that have departed from
their past baseline performance in some respect, they will tend to return to their baseline performance
independent of any effect of the independent variable being manipulated in the study. Adding a control
group to a study allows one to take into account any statistical regression effects that have occurred. Any
statistical regression should occur in the control group as well as the experimental group.
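
A small simulation (hypothetical numbers, written in Python) illustrates the point. If we select only the people with the lowest pretest scores, their posttest scores drift back toward the overall average even though no treatment was given:

import numpy as np

rng = np.random.default_rng(1)
n = 1000

true_ability = rng.normal(100, 10, n)
pretest = true_ability + rng.normal(0, 10, n)    # observed score = ability + luck
posttest = true_ability + rng.normal(0, 10, n)   # a fresh draw of luck on the retest

# Select the people who are "slumping": the bottom 10% of pretest scores
slumping = pretest < np.percentile(pretest, 10)

print(f"Slumping group, pretest mean:  {pretest[slumping].mean():.1f}")
print(f"Slumping group, posttest mean: {posttest[slumping].mean():.1f}")  # drifts back up
print(f"Overall mean:                  {pretest.mean():.1f}")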

Maturation refers to any naturally occurring developmental changes in people or phenomena that also
influence the dependent variables. Growing up, becoming fatigued and practice effects all fall under the
heading of maturation effects. Maturation is a particular problem in longitudinal research. It is also
sometimes a problem in a within-subject or repeated measures design. The person receives multiple
experimental manipulations in these studies. In longitudinal studies, a control group helps one account for
maturation effects. In the case of repeated measures designs, there is no control group, so it is important to
randomize the order of experimental treatments for different study subjects. This assures that factors
such as practice, fatigue or order effects are not confounded with the variable manipulations.

Testing Sensitization refers to a situation where simply giving a pretest alters how the person scores on a
subsequent post-test. This effect can arise via two different routes. On one hand, giving a pretest may
sensitize the person to something in the experimental treatment that ultimately leads to an experimental
effect. This effect would be missing in a post-test only design. On the other hand, simply giving the pretest
may influence the person's score on any post-test. This second case may have strong similarities to the
artifact of maturation. There are several ways to try to control for testing sensitization. One is to use a post-
test only design, which simply eliminates the pretest. The use of unobtrusive measurement can also sometimes work
to reduce testing sensitization.

Background Differences between Groups

A final category of artifacts has to do with background differences between the groups of study participants
on variables that may influence the dependent variable. Selection refers to any difference between the
groups on a background variable that may account for an apparent relationship between the independent
and dependent variable. For instance, several years ago, I had an honors student who examined the
effectiveness of an anti-smoking program for fifth-graders. It turned out that she had to compare a
classroom of fifth-graders from the Catholic schools with a classroom of fifth-graders from the public
schools. Because there may be some differences in attitudes toward smoking between public school and
Catholic school children, we could not rule out selection as an alternative explanation of her results.
Selection can be controlled using pretests, statistical control, matching subjects on important background
variables such as sex, education or income, and randomly assigning research participants to groups.
Random assignment to group is the best of these procedures. This involves assigning each person to a
group by using a randomizing procedure such as rolling a die or using a random numbers table.
Randomly assigning people to groups makes it likely that any unaccounted-for differences between groups are
equalized, especially if the sample size in the groups compared is large (i.e., 30 or larger). Selection is one
of the most important of all artifacts to account for. It is always a potential problem in surveys comparing
naturally occurring groups (e.g., people choose whether or not to smoke, exercise, or eat high cholesterol
foods). It is often impractical or unethical to randomly assign people to groups. Therefore, researchers
have to use back-up procedures such as matching, statistical control or pretests to control for the most
likely selection possibilities (e.g., sex, age, education). However, none of these control procedures is as
effective as random assignment to group. With matching and statistical control, the possibility always
remains that some unknown background factor was the true causal factor. Only random assignment to
group allows the researcher to control for all possible background variables.

Artifacts in Context
Some of the artifacts are very common. Selection is the most pervasive artifact. Surveys and cross-
sectional studies are all prone to this artifact. Poor procedure reliability and validity is a constant concern
in experimental studies. Demand characteristics are also likely in many studies of human behavior.
These artifacts are the most worrisome because they are quite common and are relatively difficult to
control. Other artifacts like maturation, regression to the mean, history and testing sensitization are
common in longitudinal studies, but they are relatively easy to control (e.g., add a control group or use a
post-test only design). Researchers sometimes have to employ less than optimal designs because of
practical and ethical concerns. However, the researcher should always use the best methods for controlling
for artifacts that are feasible under the conditions that the researcher faces. We can’t demand perfection
from researchers, but we can require them to account for likely artifacts that may compromise the
interpretation of their research findings.

Exercise: Identifying Threats to Internal Validity
Which artifacts may compromise the interpretations of "study results" in the following examples? In
several examples, there may be 2 or more possible artifacts, but there is a primary research artifact in each.
Explain your answer. Choose from the following artifacts.

Selection
History
Maturation
Regression to Mean
Procedure Reliability and Validity
Experimenter Bias
Demand Characteristics

1) At the beginning of the baseball season, the team's star player is suffering from a terrible batting slump
compared to his performance the previous year. The general manager hires a batting coach to work with
the star player. After a month, the player's batting average returns to his performance of the previous year.
The general manager attributes the improvement to the excellent instruction of the batting coach.

2) A researcher finds that children who watch a lot of TV tend to be less trustful of people than children
who watch less TV. The researcher concludes that watching a lot of TV "cultivates" perceptions of a
"mean world" among children.

3) A network research company brings an audience into a theater to see how it rates three different hour-
long pilot programs. These programs are being considered for one time slot in the upcoming season. The
audience sees program A, followed by program B, which is followed by program C. The audience rates
the entertainment value of each program immediately after seeing it. The results show program A as
most popular, program B in second place, and program C in third place. The research company
recommends that the network adopt show A because of its higher ratings.
4) Ted came down with a cold and decided to see if vitamin C helped him recover faster. After four days of
inhaling vitamin C, Ted begins to feel better. He concludes that taking the vitamin C "cured" him.
5) A blood bank in a city recently started a major public relations drive to encourage people to donate more
blood. After a month, they find that the donation rate has gone up 10% from what it was before the
campaign. The blood bank's director concludes that the public relations campaign increased blood
donation rates.

6) A firm wants to see if its team building workshops have increased employee morale. The CEO talks to
50 randomly selected employees and asks them if their morale has improved. Employees report that their
morale has increased dramatically since the team building workshops were conducted.

7) A researcher for the American Heart Association wants to test whether children adopt healthy eating
habits to reduce their cholesterol after a nutrition instruction program. She uses existing school
classrooms to conduct her study. She uses a classroom from a parochial school to test the effectiveness of
the program on dietary habits, using a post-test only design. She then selects a classroom of students from
a public school as a control group, since all of the students in the parochial school system have received
the treatment. After she collects the data and runs the analysis, she finds that the children who received
the campaign have indeed adopted healthier eating habits.

8) In an experiment, children who scored very high on a pretest of creative ability show dramatic
declines on post-test creativity scores after watching an hour of superhero cartoons.

9) A researcher conducts a survey among boys ages 6 through 10. He finds a positive correlation between
the amount of TV that the boys report watching and the number of fights they report getting
into. The researcher concludes that watching a lot of television facilitates the occurrence of aggressive
behavior among young boys.

10) A researcher for a local radio station asks listeners to participate in an evaluation study. The subjects
come to a laboratory and listen to 10-second clips of music. Then they rate how much they like each

song. In the first 30 minutes listeners are asked to rate music the station is currently playing, while in the
second 30 minutes they rate new music selections. During the analysis, the researcher finds that listeners
say they like the current music better than the new music.

11) A researcher has a hypothesis that people tend to repeat themselves more when they tell a lie than when
they tell the truth. The experimenter asks study participants to make two truthful statements and two false
statements when asked specific questions by a conversational partner. The experimenter knows which
statements are true and which statements are false. The researcher videotapes the interaction behind a
one-way mirror. She codes the videotapes and compares the truthful statements with the false ones. She
finds that her subjects repeat themselves more when they are telling a lie.

12) A researcher is investigating the relative effectiveness of two methods of teaching math to middle
school students. The traditional curriculum is to be compared to the new curriculum. The researcher
randomly assigns students to the traditional math group (control group) and the new math group (i.e.,
experimental group). The researcher selects a long-term teaching veteran to teach the traditional math
section and a young instructor just out of school to teach the new math section. The researcher wants to
ensure that the actual instructor of each section is comfortable teaching that particular style. The
researcher runs the study and finds that the new math group has superior skills on the math skills test
taken at the end of the year of instruction.
Internal Validity-Extended Example
What artifacts may also account for the results at each stage of the following extended example? Explain.

1) An educator develops a curriculum to teach children perspective-taking skills. She administers a role-
playing exercise that measures perspective-taking skills to a group of second graders at the beginning of
the school year. She re-administers a parallel test of perspective-taking skills at the end of the year.
She finds that children perform dramatically better on her measure of perspective-taking skills than they
did at the beginning of the year. She concludes that her curriculum was a success.

2) The same researcher runs a correlational analysis and finds that children who play violent video games
score lower on perspective-taking than children who do not play violent video games. She concludes
that playing violent video games depresses a child’s perspective-taking skills.
3) The following year, the researcher decides to do a follow-up study with children who had the greatest
increase in their perspective-taking skills the preceding year. She wants to determine whether the
increase in perspective-taking skills is long-term. Unfortunately, she finds that this group of children
has a significant decline in their perspective-taking skills in the succeeding year. She concludes that her
perspective-taking curriculum has had only short-term effects.

Assessing External Validity
When a researcher assesses the likely external validity of a study, she addresses the question of whether the
study has found something of broad significance or something of only local significance. If the study is
replicated with some changes in procedure, sample and context, is it likely that similar results will be
obtained? Are there salient moderator variables or boundary conditions that are likely to influence the
outcomes? When one assesses the likely external validity of a study, a person tries to gauge the practical
significance of the findings (i.e., What does this study tell me about that part of the world to which I want
to apply my findings?). In assessing external validity, we ask ourselves if there are likely limitations on
how widely the results of the study can be generalized. One may have a very tightly controlled
experimental laboratory study that finds causal relationships between an independent variable and a
dependent variable. However, you may still be skeptical about how these variables relate to each other in "the real
world". This reading outlines some factors that can limit how far the findings of a study can be generalized
or applied.

Limitations Due to Sample Specific Results

One place to start in assessing the external validity of a study is to examine the sample used in the
study. Experimental studies seldom use a large sample that is representative of the general population. For
example, if all of the studies done on a particular research question only used samples of college
sophomores at American universities, one might not be able to generalize the findings beyond college
sophomores at American universities. If we are looking at basic physiological processes, this probably is
not a big deal, because the basic facts of human biology are probably universal. However, when one deals
with variables influenced by historical and cultural factors, the question of generalizability becomes
important. When reading a research report you can ask the question: “Are there characteristics of this
sample that are unique enough that they may limit how far one can generalize these findings?”

Limitations Due to Time Specific Results

The findings of a study may also be limited by the time-frame employed in the study. Most studies of
persuasion look at the short-term persuasive effects of a commercial, a communication campaign, or set of
persuasive strategies. Likewise, many studies of organizational innovation measure changes in
organizational climate or productivity over relatively short periods. If no longitudinal studies have been
done on a research question, one can ask whether the studies only prove a short-term effect or relationship
between the variables. Short-term effects may be real, but they may not be very important unless they
also translate into long-term effects.

Limitations Due to Dependent Variable Specific Results

The type of dependent variable employed may also limit the generalizability of study results. If a person is
only able to measure one dependent variable, or to employ one particular measure of a variable, then one
may not be able to generalize beyond the kind of data collection method or dependent variable utilized in
the study. If one uses a self-report to measure the dependent variable, one might ask if the effect could
generalize to measures involving behavioral observation. There is the possibility that the observed effect
only occurs when you look at self-reports (Perhaps people lie or are unaware of the factors that influence
their behavior). We can be more confident about the generalizability of study findings when the
results are confirmed with multiple methods of measurement. Likewise, we can be more confident
about the generalizability of the results, if more than one kind of dependent variable is measured. For
instance, if one were investigating the effectiveness of a campaign to convince kids to not take up smoking,
it would be better to have measurements of knowledge about tobacco products, attitudes towards tobacco
products, and behavior towards tobacco products, rather than to have information on only one of these
components. If one only has measures of behavior, one might miss positive influences of a campaign on
knowledge and attitudes. If one only measures knowledge and attitudes, it is not clear that the results
generalize to behavior (which of course is ultimately the dependent variable of most interest to the
campaign organizers).

Limitations Due to Independent Variable Manipulation Specific Results

One may also be skeptical about the generalizability of the results because of how the independent
variable was measured or manipulated. Scholars are sometimes concerned that an independent variable
manipulation lacks realism: it is qualitatively different from how the independent variable occurs under
natural circumstances. If you want to investigate the influence of humor in radio advertising, you may
wonder if listening to different versions of a commercial in a laboratory is similar to how people usually
listen to commercials while they are in their cars. One might be able to demonstrate an effect on knowledge
and attitudes in a laboratory condition that fails to approximate real-world listening conditions.
Likewise, any one study is probably limited in terms of the levels and kinds of stimuli employed. Hence,
one can question whether one would get similar results if one changed the levels of the independent
variable or the type of experimental stimulus employed.

Limitations Due to Lack of Ecological Validity

A final consideration falls under the heading of what I call “all things are not equal”. By this I mean that
the real world is often quite a complex mixture of factors that may override or supplant relationships that
one finds between variables in a given study. To investigate the relationship between only a couple of
independent and dependent variables, one must simplify the environment of the study to hold all other
things constant. Perhaps the study produced effects that only occur under the very specialized conditions.
It could be that there is a real relationship between the independent and dependent variables under specified
conditions, but other, more powerful independent variables normally suppress the influence of this variable.
Likewise, other variables may normally suppress a causal relationship between the two variables. If you
can think of likely factors that may intervene or prevent an effect from occurring, then further research is
called for under naturalistic conditions (e.g., a field study).

Making judgments about the likely external validity of a study is more provisional than judging the internal
validity of a study. It is impossible to prove that a study lacks external validity; one can only raise
reasonable questions about likely limitations on generalizing study findings. Raising questions about
external validity is raising questions about the boundaries within which the findings apply. Has the study
uncovered a widely applicable principle, or does it only occur in a rather specialized niche or context?
Questions about external validity serve as a springboard for further research. Researchers will try to
replicate the results by varying the samples, time-frames, independent variable manipulations,
dependent variable measurements, and control variables in the study. Research into the
generalizability of a phenomenon may take many studies over a number of years.
Exercise: Identifying Potential Limitations of External Validity
Below are several abstracts of studies that have appeared in communication journals. Identify at least one
factor in the study design and execution that should be investigated as a potential limitation on the
generalizability of study findings. Explain why your factor deserves consideration in a replication study.
How should the replication study be carried out?

1. This study examined the association between a listener’s response style and her affective reactions
following a conversation with a distressed individual. Thirty female subjects participated in a fifteen-
minute dyadic interaction in which they talked with a confederate enacting a depressed role.
Observational coding of listener response strategies revealed that subjects who relied on advice-giving,
chit-chat, and/or joking were significantly more depressed and more rejecting of their partners than

were subjects who acknowledged the confederate’s mood and used supportive listening techniques.

2. An empirical study was conducted to investigate the practical management of interactions sustaining
close friendships. Ten pairs of close friends were interviewed individually on two occasions and
together on a third occasion. An interpretive analysis of subjects' remarks identified a dialectical
principle governing the communicative organization of friendship. The dialectic of the freedom to be
independent/freedom to be dependent conceptualizes the patterns of availability and co-presence in a
close friendship. Basically, while each person is free to pursue individual interests apart from the other
without the friend's interference or help, each retains the liberty to rely on the other should it be
necessary. In granting each other a combination of these two freedoms, the individuals co-create a
basis for patterns of interaction in their relationship that may curtail their individual liberties.

3) This study investigated the structure of public relations roles. A five-page questionnaire was mailed
to 321 public relations professionals in the Public Relations Council of Alabama (PRCA). The return
rate was 43% with a total of 137 returned surveys. A cluster analysis revealed four primary
practitioner roles and one minor role. Two practitioner types gave high priority to technical activities
even though they also scored high on managerial and boundary spanning activities. A follow-up
analysis showed that the practitioner roles could be differentiated on relevant variables such as
number of years as a professional and the size of the organization that the practitioner worked for.
4) This study examines the relationship between cognitive complexity and conversational recall. It was
hypothesized that individuals high in cognitive complexity will recall more person-oriented
conversational information, recall more person-oriented information in an interference condition, and
recall more total information from a conversation than subjects low in cognitive complexity.
Subjects were 72 college students who observed a videotaped conversation. Recall was elicited by
means of a free recall task and an 18-item multiple choice test. MANOVA and canonical
discriminant-analysis results supported each hypothesis.

External Validity Case Study
When I taught at the University of South Alabama in Mobile, Alabama, my department chair asked me to
do a survey of 600 households in the Mobile area to test awareness of what I can best describe as a radio
publicity stunt.

Radio stations have to fight for what is left over of the advertising dollar in any metropolitan market. The
newspaper in a given market is likely to attract up to 50% of the advertising dollars. The television stations
take up another third. This leaves the multiple radio stations competing with direct mail and billboards for
the remaining 17%. The radio stations engaged in a cooperative stunt to try and draw attention to the power
of their medium.

Here is what they did. They developed a campaign for what was equivalent to a $15,000 radio only
campaign to run on the fifteen radio stations in the metropolitan market. The product was to be advertised
two times morning and evening during drive time for two weeks. They advertised that a mall was going to
be built under the Mobile River next to the Bankhead Tunnel, a local landmark that everyone knew about.
Of course the mall was fictitious. To add to the intrigue, the advertisement was placed by an out-of-town
agency in Tennessee, and the person at the contact agency was conveniently out of town during the
campaign. In addition, the only people who knew about the scheme were the station CEOs and the
advertising managers. The news staffs of the radio stations were not informed about the plans. At the end of
the campaign, the radio stations ran a disclaimer commercial informing everyone of the publicity stunt and
proudly proclaiming that it showed the power of radio advertising.

The coordinator of the stunt was a close friend of my department chair and asked him to do a survey to
judge how effective the campaign was, as measured by the percentage of people who could accurately recall the name of the
mall or its location. I got volunteered for the job because I was the instructor for the research methods
course. I didn't much like the assignment because I had suspicions that they wanted to misuse and
misrepresent the data. My research methods classes knew what was happening as the campaign unfolded
and were sworn to secrecy.

As you might expect, the radio people had quite a bit of fun with their campaign. All of the news staffs of
the newspaper, the television stations, and the radio stations tried to run down the story. There was a flood
of interpersonal discussion about the "project" as well, judging by comments my classes heard. The
message recording bank of the contact telephone number that was given in the advertisement was
immediately overloaded with inquiries, including inquiries from city officials, state representatives,
congressional representatives, and senators. It only took about four days for one of the local TV stations to run
down what was actually happening, with the help of the radio institute in New York. The debunking story
ran as the lead story in the evening news.

After two weeks of the campaign we did the campaign awareness survey. We completed 600 short
interviews with a random sample of households in the Metropolitan area. We found that 50% of our
respondents could accurately recall either the name of the new mall or its location. An additional number
had heard that a new mall was coming to town but could not recall any further information.

When presented with the results, the head of the radio coalition said: "We can now say to potential
advertisers that if you spend $15,000 on a radio only campaign, this is the kind of awareness that you can
develop for your product." Critique the radio manager's claim. In other words, assess the likely
generalizability of the particular findings of this survey.
Relating Internal Validity and External Validity
Internal validity and external validity are two features that we would like to maximize in the design of
research. Unfortunately, it is difficult to have high levels of both internal validity and external validity in
the same study. The research procedures that maximize internal validity (i.e., radically simplifying the
research environment, holding environmental factors constant and using relatively homogeneous
populations in order to minimize variance of the dependent variable) also tend to limit the generalizability
of study findings. Likewise, attempts to maximize external validity (e.g., use naturalistic study conditions)
make it more difficult to control for artifacts.

In the case of experiments, internal validity is the highest priority, and external validity is a secondary
priority. In an experiment, the researcher carefully designs the independent variable manipulation. All other
variables are controlled so that the only differences between the compared groups are the levels of the
independent variable. To achieve this kind of control, the experimenter has to set up conditions that are
unlike the naturalistic conditions that the researcher wants to explain. Therefore, it may be difficult to
generalize the results of the experiment to the “real world.” Many experiments only investigate one
independent variable at a time. In the real world, multiple factors usually influence the dependent variable.
Even if one finds that a variable has a causal relationship with a dependent variable in laboratory
conditions, this effect may seldom occur in the real world because other factors usually operate to negate it
or overwhelm it.

In contrast, if a study carefully describes/reflects communication in context, it will ordinarily have low
internal validity. We may be able to describe the relationships between variables in the scene, but it will be
difficult to make precise statements regarding causation. Communication in context involves a complex
combination of variables that are outside the control of the researcher. Studies that have high ecological
validity or external validity usually will have lower levels of internal validity.

The key to handling this dilemma is to conduct a systematic line of research. This allows the researcher
to compensate for the limitations of one study by doing a later study that addresses those limitations. In the
confirmatory phase of research, internal validity tends to be of most interest. When one then wants to know
whether or not the results can be generalized, the original study can be replicated with new samples, new
manipulations of the independent variable, new measures of the dependent variable, and in new contexts.
Field studies often occur in the generalizability stage of research to follow-up on earlier experimental
studies.

Unit 11: Objectivist Paradigm-Experiments
This unit expands on the issue of internal validity. Experiments are designed to be internally valid. They can
provide answers about the causal relationships between variables after alternative explanations (i.e.,
confounding variables) have been accounted for.

The experiment is a relatively recent social invention. Only in the last 100 years have researchers perfected
the principles and techniques of experimental control. Researchers use research control to rule out
alternative explanations of cause and effect. Classic experimental design manipulates an independent
variable, while holding all relevant environmental and other causal factors constant. The researcher then
observes the effect that the independent variable had on the dependent variable(s). In practice, this requires
creativity and a persistent attention to detail.

Experimental design requires creativity to construct a situation that provides a fair test of an experimental
hypothesis. This unit describes some elementary principles of experimental design. It examines some of the
strengths and weaknesses of the experimental paradigm. The unit also reviews several approaches to
experimental design.

Unit Objectives

Upon completing this unit, you should be able:


11-1: To explain what must be done to create a valid manipulation of an experimental variable.
11-2: To explain how researchers control for specific artifacts in an experiment.
11-3: To explain the role of moderator and control variables in experimental design.
11-4: To explain the advantages and disadvantages of factorial experiments.
11-5: To explain the difference between a within subjects factor and a between subjects factor.
11-6: To identify the kind of experimental design that is employed upon reading a description of an
experimental study.
11-7: To assess how appropriate a particular experimental design is for a given problem.
11-8: To diagnose a specific problem in an experimental design (i.e., research artifacts) and explain how it
can be corrected.
11-9: To propose an appropriate experimental design for an elementary research question or hypothesis.
Research Control in Experimental Design
Experiments are about developing and exercising research control. What do experimentalists control?
They manipulate the independent variable(s) in the experiment; they control how comparison groups are
created; and they control or account for any extraneous or confounding variable(s) that might also affect the
dependent variable(s). Experimentalists obsess about research control because it is the only way that they
can get clean answers to their questions about cause and effect relationships.

Manipulating the Independent Variable


The first step is to design and manipulate the independent variable. The term “manipulate” might be better
phrased as “administer the different experimental treatments.” Developing an operational definition of the
independent variable can be tricky. The researcher must choose or develop levels of the independent
variable that are consistent with the research objectives. Secondly, the researcher must develop a valid
manipulation of the independent variable. Let’s say that I want to develop an experiment to test the impact of
humor in advertising on consumers’ intentions to buy a product. To determine how humor affects
consumers’ intentions, I need to compare humorous advertisements with something else. In the language of
experimental design, I need a control level or comparison level of the variable. The simplest version of my
independent variable might be to create two advertisements. The two advertisements would contain the
same amount of information, but they would differ in the degree of humor that they contain (i.e., create
a humorous commercial and contrast it with a serious commercial). It is important that my two ads be
equivalent in terms of other background variables such as length of ad, type of product etc. If there are any
systematic differences between the ads, then it will undermine my interpretation that the differences in
humor are responsible for group differences in consumers’ intentions.

If my research question or hypothesis demands it, I may need to go beyond this simple kind of design. If I
ask about how different levels of humor affect consumers’ attitudes and intentions, I will need to develop
and compare advertisements at several levels of “humorousness”. If I want to compare several types of
humor, I will need to identify relevant types and develop at least one advertisement representing each type.
At a minimum, my manipulation of the independent variable should include at least two advertisements.

How can I know that my manipulation of advertisement humor is valid? Perhaps I have an odd sense of
humor. If the people in my experiment don’t find the “funny ad” to be humorous, then I have not tested the
effect of ad humor on consumer attitudes and intentions. One solution to this problem is to pretest my
advertisements. I can have independent judges view the advertisements and rate the ads. I can also include
a manipulation check of the levels of the independent variable in the experiment. If someone asks, "Did
your respondents think the funny ad was funny and the serious ad was serious?" I can show them that this
was the case. In the end, judgments about the validity of one’s independent variable depend upon one’s
assessment of the experimental manipulation’s content validity.
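
For example, the manipulation check might simply compare the humor ratings the two ads receive. A minimal sketch, assuming a hypothetical 1-7 "How funny was this ad?" rating item and the SciPy library:

from scipy import stats

# Hypothetical 1-7 ratings of "How funny was this ad?" from two groups of viewers
funny_ad_ratings = [6, 7, 5, 6, 6, 7, 5, 6, 4, 6]
serious_ad_ratings = [2, 1, 3, 2, 2, 1, 3, 2, 2, 1]

t, p = stats.ttest_ind(funny_ad_ratings, serious_ad_ratings)
print(f"t = {t:.2f}, p = {p:.4f}")
# A large t and a small p support the claim that the "funny" ad really was
# perceived as funnier than the "serious" ad.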

Creating Equivalent Groups

The second major task for the researcher is to create equivalent experimental groups. I do not want the
groups of people that I compare to be systematically different on background demographics or personality
variables. Any systematic background differences between the experimental groups will create a confound
that undermines my interpretation of how the independent variable affects the dependent variable (i.e., the
artifact of selection). At a minimum, I must create at least two groups for comparison. One group will
usually be designated to be the control group. In medical studies, the control group is often called a placebo
group. One group gets a sugar or placebo pill because numerous medical studies have shown that getting a
treatment of any kind can often lead people to “get better”. In many experiments, the control group may
get a conventional treatment. For instance, an experiment might compare a group that gets the standard

curriculum with a group getting the experimental curriculum.
So how does the experimentalist create equivalent groups? Several different strategies are possible. First,
the experimenter can create groups that match on demographic variables, such as sex and age, that are
likely to confound the interpretation of the results. Alternatively, the researcher may pretest the
experimental groups on the dependent variables to show that there are no prior differences between the
groups. Researchers can also use statistical controls such as partial correlation to account for potential
confounding variables. However, the best tool for creating equivalent groups is to randomly assign people
to groups. In my humor-in-advertising study, I might use a random numbers
table to decide which group a person goes into. With relatively large experimental groups, random
assignment reduces the possibility that the experimental groups are significantly different on any
background variables that might influence the dependent variable. Random assignment to group is the best
tool for controlling for the artifact of selection.
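
A random numbers table and a software randomizer accomplish the same thing. Here is a minimal sketch, assuming a simple two-condition version of the humor study and hypothetical participant IDs:

import random

participants = [f"P{i:02d}" for i in range(1, 41)]   # 40 hypothetical participant IDs

random.seed(42)            # documenting the seed makes the assignment reproducible
random.shuffle(participants)

humor_group = participants[:20]      # first half of the shuffled list sees the humorous ad
serious_group = participants[20:]    # second half sees the serious ad

print("Humorous ad group:", sorted(humor_group))
print("Serious ad group: ", sorted(serious_group))

Recording the seed or the random numbers used lets another researcher verify that the assignment really was random.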

Controlling for Artifacts

The researcher must also control extraneous variables or artifacts. We have previously discussed the
artifacts related to time such as maturation, history and regression to the mean. Each of these can be
handled with the presence of a control group. If you are concerned about any of these time-dependent
changes influencing the dependent variables, then these changes should appear in both the control groups
and the experimental groups equally. If the groups subsequently vary on the dependent variables, we can be
confident that the differences are due to the experimental treatment and not to changes that would occur
merely with the passing of time.

If an experimentalist is concerned about testing sensitization (something in the pretest sensitizes the person
to the experimental treatment and hence affects their scores on the dependent variables at the posttest), he can use
a post-test only design, or add groups that include a post-test only design. If he is concerned about demand
characteristics (participating persons' responses are biased by what they think the purposes of the study
are), the researcher can carefully design the study to conceal the real intent of the study. As you read some
experimental studies, you may be puzzled by some of the procedures that are put into the study. These
procedures often control for demand characteristics.

To control for experimenter bias, the researcher can ensure that the study is carried out by people who are
blind to the experimental hypotheses. A classic control feature of experimental design is a “double blind
study”. In a double-blind study, neither the research administrator, nor the persons participating in the
study know the hypotheses associated with the study. In the case of a placebo study, neither the nurse nor
the doctor taking the results will know whether the person is getting the placebo or the experimental drug.
A double blind study is complex to administer, but it is a powerful research tool because it controls for both
demand characteristics and experimenter bias.

The experimentalist also needs to control the administration of the experiment to ensure procedure
reliability and validity. The researcher must ensure that every person within the same experimental
group gets the same independent variable manipulation. The experimentalist must also ensure that the
only systematic difference between the compared groups is the desired difference on the independent
variable. Unreliable variable manipulation and inconsistent administration of measurement protocols make
experimental results uninterpretable. Any unintended systematic differences between groups create
potential confounds. It is impossible to control for all possible compromises of procedure reliability and
validity. However, a good experimentalist controls for the threats to procedure reliability and validity that
are likely to be important.

It is possible to rate experimental designs on how well they exercise research control. The most inadequate
designs lack a control group. A single period case study with a pretest and a post-test falls into this
category. One-time case studies are subject to the artifacts of history, maturation, and regression to the
mean. One-group case studies are best used to generate hypotheses. In some cases, it is possible to utilize
time comparisons/changes as a baseline to establish some basis for making claims about causation.
If a researcher only has one group to work with, she might consider using single group time-series
comparisons. This approach compares a group to its own baseline including several measurements before
and after the experimental manipulation. If a significant change occurs only in the period where the
experimental treatment occurs, one can have confidence that the change is not due to maturation or
statistical regression. However, it is more difficult to completely discount the possibility of history. An
experimental variable that is tested by exposing the same group to repeated experimental conditions is often referred to as a
within-subjects (or within-group) factor because the same people are involved in each experimental condition. An experimental
variable that assigns different people to the experimental groups is called a between-subjects (or between-groups) factor.

A second type of inadequate design involves cases where it is impossible to ensure that experimental
groups are equivalent. In field research, nonequivalent research designs are often required because the
institutional setting does not permit randomly assigning people to groups. Several years ago I advised an
honors thesis in which a student was evaluating the effectiveness of an antismoking campaign with fifth
graders. All of the fifth graders in the Catholic system had received the educational program. To get a
comparison group, she had to get access to a class in the public school system that had not received the
same program. As there are some important demographic and social differences between parochial students
and public school students, we could not ensure that selection was not a factor influencing the results.
Hence, we treated the study as an exploratory study rather than a rigorous hypothesis testing procedure. If a
researcher must utilize potentially nonequivalent groups, she can use pretests/posttests. A more adequate
research design includes safeguards such as random assignment and pretests and posttests. If testing
sensitization is a major concern, random assignment post-test only designs are appropriate.

Factorial Experimental Designs

Up to this point, I have described experimental studies that have one independent variable. In fact, many
experimental studies have two or more independent variables or factors. Studies with two or more
independent variables are called factorial designs. Factorial designs have several advantages. First, they allow an
experimenter to combine several experiments into one. Therefore, factorial designs are more efficient than single-
factor designs. The researcher can assess the main effects of each independent variable. Factorial designs
also enable the researcher to include control or moderator variables that may interact with another
independent variable in terms of the effect it has on the dependent variable. Finally, factorial designs allow the
researcher to investigate potential interactions between the independent variables.

An interaction exists when the effect of one independent variable changes when it is combined with
different levels of another independent variable(s). Interactions come in many shapes and sizes. The effect
of one independent variable can be enhanced or suppressed by a level of the second independent variable.
We might find for instance, that humorous advertisements have a considerable positive impact on male
consumers’ intentions to buy a product, but have only modest effects on female consumers’ intentions to
buy the product. This is an example of an interaction in which the sex of the viewer qualifies effect of the
ad humor variable on what viewers recall. In other words, the type of ad has different effects on male and
female viewers. Sex and ad humor interact in affecting the dependent variable. If a researcher has a
significant interaction, he must interpret the interaction before he interprets the main effects of each
independent variable. If a person has three or more independent variables, the interpretation of
interactions can become very complex. A study with three independent variables has four potential
interactions, whereas a study with four independent variables has 11 potential interactions. The potential
difficulty of interpreting complex interactions is both a strength and weakness of factorial designs.

In a research article, you may see something like the following: “It was a 2 X 2 between subjects design.”
Each “2” refers to an independent variable with two levels. A 2 X 2 design is the simplest type of factorial
design. It has four cells or groups. If I have two types of ad I am comparing, and I have included sex as a
second control variable, then there are four experimental groups or cells (i.e., males who saw the serious
ad, males who saw the humorous ad, females who saw the serious ad, and females who saw the humorous
ad). If I had a study comparing three types of ads across the sexes, I would have a 3 X 2 design with six

groups. You can figure out the number of cells by multiplying the numbers in the study description. The
number of independent variables is indicated by how many numbers are in the factorial design description.
The magnitude of each number reflects the number of levels for that particular variable.

There are two graphs depicting factorial designs below. One depicts an interaction and one depicts a main
effects only outcome. In the main effects example, the effects of the independent variables are additive
(i.e., the graphed lines are parallel to one another). In the interaction, the effects of the independent
variables are interactive, not additive (i.e., the graphed lines are not parallel to one another). In an interaction,
the independent variables influence each other in terms of how they affect the dependent variable.
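
The same patterns can be generated from a small set of made-up cell means. The following Python sketch (assuming the matplotlib plotting library is available, and using hypothetical purchase-intention means for the humor-by-sex example) draws parallel lines for a main-effects-only outcome and non-parallel lines for an interaction:

import matplotlib.pyplot as plt

ad_type = ["Serious ad", "Humorous ad"]

# Hypothetical mean purchase-intention scores (1-7 scale) for each cell of a 2 x 2 design
main_effects_only = {"Male viewers": [3.0, 5.0], "Female viewers": [4.0, 6.0]}   # parallel lines
interaction = {"Male viewers": [3.0, 6.0], "Female viewers": [4.0, 4.5]}         # non-parallel lines

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, (title, cell_means) in zip(axes, [("Main effects only", main_effects_only),
                                          ("Interaction", interaction)]):
    for group, means in cell_means.items():
        ax.plot(ad_type, means, marker="o", label=group)
    ax.set_title(title)
    ax.set_xlabel("Type of ad")
axes[0].set_ylabel("Mean intention to buy")
axes[0].legend()
plt.tight_layout()
plt.show()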
Exercise: Identifying Design Structure
Read the research abstracts in the examples below and answer the questions that follow.

1. Research Abstract: Some restaurant owners want to get as many diners in and out of restaurants as
possible without having anyone feel rushed. Researchers piped music into the university's faculty and
student cafeteria. They observed 13 regular patrons eating on three separate days. One day the researchers
played fast instrumental music (120 beats per minute), the next day slow instrumentals (60 beats per minute), and on
the third day, no music. They repeated this pattern for 12 days. People ate slowest of all (3.15 bites per
minute) when no music was played. With slow music they munched a bit more quickly (3.99 bites per
minute). However, on fast instrumental days, they chomped away at a rate of 5.2 bites per minute.

A. Who are the subjects? How were they assigned to experimental conditions?

B. What is the independent variable? How was it operationalized? What is the dependent variable? How is
it operationalized? Which group serves as the control group for this experiment?

C. What kind of design is this (single factor or factorial)?

D. Is this a between or within-subjects design? Explain.


2. Identify the independent variable and very briefly describe how it was operationalized. Then identify the
dependent variable and very briefly describe how it was operationalized.

This experiment randomly assigned 417 undergraduate male and female students to watch one of three 70-
minute films (male oriented pornography, female oriented pornography, and a control group that saw a
romantic comedy). The participants filled out a 13-item Rape Myth Scale (i.e., the belief that women who
get raped have done something to deserve it). Compared to the control group, persons in the two groups
that saw a pornographic movie were more likely to endorse the rape myth.

3. Name the independent variable in the following abstract and list its three levels. Then identify the
dependent variable. List two research hypotheses in this example. Fifty-three married couples were
randomly assigned to spend 1.5 hours each week for 10 weeks on activities they defined as (a) exciting or
(b) pleasant, or were placed in a (c) no-special-activity control group. When the two activity
groups were compared to the control group, there were no significant differences in marital satisfaction.
However, when the exciting activities group was compared with the other two groups together, the couples
in the exciting activities group reported higher levels of marital satisfaction.

In a 2 X 3 X 2 experimental factorial design, how many independent variables does the study have?
How many cells or groups?

Unit 12: Objectivist Research-Surveys
This unit gives a brief overview of survey research, focusing on directive interviews or surveys
that use a standardized list of questions. Such surveys primarily consist of closed-ended questions (i.e.,
researcher-defined answers). Surveys collect factual information about behaviors and preferences from a
broad spectrum of people. They are usually given to a sample of a larger group (population). Sample
statistics are then used to estimate unknown population parameters (e.g., percentage of households in
Jefferson County with broadband Internet access). This unit introduces common issues related to sampling
and methods of survey data collection.

Unit Objectives:

Upon completing this unit, you should be able to:


12-1: To explain how different types of error affect a researcher's ability to make inferences about
population parameters.
12-2: To explain how each component of survey error can be minimized or taken into account.

12-3: To propose an adequate sampling design for an applied sampling problem.
12-4: To evaluate the value of different probability sampling procedures for a specific research question.
12-5: To explain the relative advantages and disadvantages of self-administered surveys, personal
interviews, and telephone interviewing.
12-6: To explain the kinds of information that should be included in a survey report so that readers can
evaluate survey results.
12-7: To explain the advantages and disadvantages of open-ended survey questions.
12-8: To describe the advantages and disadvantages of the most common forms of closed-ended questions
in survey research.
12-9: To critique a survey protocol and improve it.
12-10: To design a survey protocol for a specific research problem.
Estimating Population Parameters: Sources of Error
The federal government conducts a census every 10 years, as the United States Constitution requires. The
census is an attempt to count the number of inhabitants in the country, and census takers attempt to contact
every household. The primary purpose of the census is to allocate the correct number of congressional
representatives to each state. In fact, the census does much more than count the number of people in the
United States. The United States Census Bureau also collects information about the economic and social
conditions of the country. Government bodies use this information for planning and public policy purposes,
and businesses use census information to plan business activity. It takes several billion dollars to pull off a
census, so it is not surprising that this event occurs only once every ten years.

Let’s say that I would like to know how many people in Jefferson County have Internet access from
their homes. The actual percentage of people who do is a population parameter: an attribute of the
population that can be represented numerically. Because of the overwhelming costs of doing a census, we
usually fall back on using a sample statistic to estimate the population parameter. A sample is a subset of
a population that is chosen to represent the larger population. A sample statistic is any attribute of a sample
that can be represented numerically. Researchers ordinarily select a sample, collect information about the
attributes of that sample (e.g., Do people have Internet access from home?), and then calculate the sample
statistic. The sample statistic is used to estimate the range within which the underlying population
parameter is thought to lie (i.e., the actual percentage of people in Jefferson County that have Internet
access from home).

Estimating a population parameter from a sample statistic is always a challenge. There are three sources of
error researchers must cope with as they make such estimates. The problem of estimation error is that we
seldom find out where the true population parameter is located (election results are a notable exception).
First, there is the familiar problem of measurement error. If I have an invalid measurement of Internet
access (e.g., social desirability bias), my estimate of the population parameter (i.e., the percentage of people
who have Internet access at home) will be inaccurate. Measurement error associated with question wording
is a persistent problem in surveys. It is a “bad” form of error because it cannot be estimated. Because of
this, researchers should eliminate as much measurement error as possible.

Systematic sampling error is a second type of survey error. Systematic sampling error occurs when
some parts of the population are systematically over-selected and other parts of the population are
under-selected. A sampling procedure can be biased towards over-selecting some groups and under-
selecting others. If you conduct telephone surveys, your sampling frame leaves out households that do not
have telephones. If you use the telephone book for your sampling frame, you leave out people with
unlisted telephone numbers, people who have changed their residence since the telephone book was last
published, and people who have just moved into the community. As with measurement error, some residual
systematic sampling error is inevitable. There are always people in a population that are inaccessible
(people in prison, nursing homes, or people who are seldom home). In addition, some people refuse to
participate in a survey. Even if you have an exhaustive sampling frame, your final sample may be biased.

It is not possible to precisely estimate systematic sampling error. Researchers are under an obligation to
reduce it as much as possible. The usual recommendation is to use a probability sample. In a probability
sample, every member of a population has a known probability of being selected. If you have a complete
roster of the members of the population, a simple random sample can be drawn. However, researchers
usually don't have such a list, so statisticians have developed techniques such as cluster sampling and
random digit dialing, to generate probability samples. For instance, although telephone interviews exclude
households that do not have telephones, in most locales in the United States, nearly 95% of all households
have telephones, so this source of bias should create only minimal problems. Hence, random digit dialing is
often the method of choice for survey research.

179
Random sampling error is the third type of survey error. Random sampling error is error that occurs due
to chance. If you were to draw repeated samples of a given size from a population, your sample statistics
would vary somewhat from sample to sample. This variation in sample statistics from sample to sample is
random sampling error. The random variation of a sample statistic from sample to sample (i.e., for samples
of a given size) constitutes a sampling distribution. Statisticians identify the sampling distributions that
accompany particular statistics and sample sizes. Because we know these properties, we are able to
estimate random sampling error that occurs when we attempt to estimate population parameters from a
sample statistic.
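
To make the idea concrete, here is a minimal simulation sketch in Python (with hypothetical figures: a population of 700,000 households, 62% of which have home Internet access) showing how a sample statistic varies from sample to sample:

import random

random.seed(1)
POP_SIZE = 700_000   # hypothetical number of households in the population
TRUE_P = 0.62        # hypothetical true proportion with home Internet access
population = [1] * int(POP_SIZE * TRUE_P) + [0] * int(POP_SIZE * (1 - TRUE_P))

# Draw repeated simple random samples of the same size and record each sample statistic.
sample_stats = []
for _ in range(1000):
    sample = random.sample(population, 400)
    sample_stats.append(sum(sample) / len(sample))

# The spread of these estimates around the true proportion is random sampling error;
# their distribution approximates the sampling distribution for samples of n = 400.
print(min(sample_stats), max(sample_stats))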

Researchers regard random sampling error as a “good” type of error. Random sampling error is “good” in
the sense that it can be estimated. Researchers can decide what amount of error they can tolerate in
estimating a particular parameter given available resources (i.e., time and money). When a researcher
estimates a population parameter from a sample statistic, he or she usually calculates a confidence interval, or
range within which the true population value is likely to fall. In survey research using nominal level
measurement, you will hear or read this reported as a margin of error of plus or minus so many percentage
points. There is always a confidence level (conventionally 95% confidence) associated with this confidence
interval. In common parlance, when a researcher draws a confidence interval at a stated level of confidence,
she is saying that if she drew repeated samples, the true population parameter would fall within her
confidence intervals on average about 95% of the time.

One can construct confidence intervals at higher and lower levels of confidence than 95%. The 95%
confidence level is a social science convention. If you need more confidence in making a parameter
estimate, you can construct a confidence interval at a more rigorous confidence level (e.g., a 99%
confidence level). With one exception, you can construct a confidence interval for any confidence level in
estimating a population parameter. The exception is that there is no confidence interval for certainty or
100% confidence. Statistics are useful, but they deal with probabilities, never certainties. The downside of
demanding more confidence is that the confidence interval becomes wider.

The second important feature of random sampling error is that it decreases as sample size increases. This
gives researchers something positive that they can do when they need high levels of confidence in making
their parameter estimates. The researcher chooses a confidence level and a confidence interval that he
thinks is appropriate for the project, and then finds the sample size that is necessary to achieve these
objectives. In reality, the cost of drawing a large sample is often the deciding factor in how much
estimation error a researcher can live with. Whatever decision the researcher makes, it will be an informed
one: she will know what kind of accuracy she is getting for her money.
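
These trade-offs can be sketched with a short Python example (assuming a simple random sample and the usual normal approximation for a proportion; the sample figures are hypothetical):

import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    # Normal-approximation margin of error for a sample proportion.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95% confidence
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat, n = 0.62, 400   # hypothetical sample proportion and sample size
for conf in (0.80, 0.95, 0.99):
    moe = margin_of_error(p_hat, n, conf)
    print(f"{conf:.0%} confidence interval: {p_hat - moe:.3f} to {p_hat + moe:.3f}")

# Quadrupling the sample size roughly halves the margin of error at a fixed confidence level.
print(margin_of_error(p_hat, n, 0.95), margin_of_error(p_hat, 4 * n, 0.95))

Note how the interval widens at 99% confidence and narrows when the sample size is quadrupled.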

Now there are some problems in application. One is that estimates of random sampling error assume that
one has no measurement error and no systematic sampling error. In fact, the confidence interval and
confidence levels are meaningless if large amounts of these errors are present. Small amounts of
measurement error and systematic sampling error are present in any data collection effort. Therefore,
researchers should be cautious in how they use these figures. Researchers should be clear that the estimates
only apply to random sampling error. They do not account for instances in which people misunderstand a
survey question or disclose false information, or for sampling procedures that over-select certain groups. The confidence interval is a
rough estimate or approximation of the likely degree of parameter estimation error. More than anything
else, researchers have an ethical obligation to be modest in the claims that they make.

Sample size is less important than the representativeness of the sample. Increasing sample size decreases
sampling error only when one has a probability-based sample. If you have a very large sample with
systematic sampling error built into it, increasing sample size does not reduce that error. A large
systematically biased sample leads people to have more confidence in the parameter estimate than they
should. A probability sample of 2,000 people provides a more accurate parameter estimate than a biased
sample of 10 million people.
Exercise: Drawing Confidence Intervals
The SGA is conducting a poll to determine if students support a proposal to implement a $20.00 semester
charge for printing on laser printers on campus. A student would receive a card to enable him to print up to
500 copies (4 cents per copy). Six hundred full-time students are randomly selected from the registrar's list
of enrolled students. These students are interviewed via telephone (Assume that it is a simple random
sample). The survey finds that 52% of the students surveyed are in favor of the fee increase and 48% are
opposed to the fee increase. The sample has a standard error of 2.0%. Construct an 80% confidence interval
around your best estimate of both population parameters (both percentages). Then determine whether you
can conclude that more students actually favor the printer proposal than are against it at the 80% confidence
level. Use the normal curve table to find the multiplier that corresponds to the selected interval. Explain
your conclusion.

Issues in Sampling
Sampling is a complex topic. You can take statistics courses that deal with nothing but sampling. As we
have previously noted, the most important element in sampling is to draw a representative sample. A
second important element in sampling is to draw an adequate sample size. The first attribute is more
important than the second attribute. The next several pages explore some of the issues related to sampling
and describe different sampling procedures.

Non-Probability Sampling Procedures

You have probably filled out surveys for class projects or been stopped by marketing researchers in a
mall. In this “version” of sampling, the interviewer asks anyone who is willing to fill out the survey. This
type of sampling is called convenience sampling, but it is really haphazard sampling. It is haphazard
because there are no controls for systematic sampling error. Students often confuse haphazard sampling
with random sampling. I have seen class presentations where students claimed to have carried out “random
sampling” in their research for clients. Students have heard that random sampling is a good thing, so they
are tempted to describe their sample as a random one. However, further description reveals that the sample
was a haphazard or convenience sample. When I challenge this characterization of the sample, I hear
something like, “well I just picked people at random.” To which I must respond, “No you didn’t! At best,
you randomly sampled the people you haphazardly bumped into! The people whom you associate
with are a highly selective subset of all of the people in the population.” For purposes of honesty and
ethical practice, please remember the distinction between convenience sampling and probability sampling.

The problem with convenience sampling is that one shouldn’t use sample statistics from convenience
samples to estimate population parameters. Haphazard sampling procedures tend to over-represent some
portions of the population and under-represent others. If a research project draws subjects from the people
who frequent a local mall, it over represents the types of people who frequent malls (that mall in particular)
and under represents others who seldom visit malls (e.g., middle-aged males). Interviewers also have a
tendency to approach people who are similar to themselves in terms of age, sex and socioeconomic status.
Haphazard sampling is not viable if you want to estimate the parameter for a population.

There are situations where convenience samples are defensible. If you are doing an exploratory study, a
convenience sample is often appropriate. For instance, it is perfectly okay to use a convenience sample to
pretest a survey. It is also permissible to use a convenience sample to examine relationships between
variables. Convenience samples are fine because you are not using your sample to estimate an attribute of
a larger population. The lack of a random sample may bring up important questions about the
generalizability of the findings, but this is a secondary concern in many studies. For instance, many
experimentalists prefer to minimize individual variation when they construct experimental groups. The
more that people in the groups come from similar backgrounds, the less likely it is that the artifact of
self-selection will be a problem. In addition, confining one’s sample to a select subgroup makes it easier
for experiments to detect or document the influence of independent variables on dependent variables. In
marketing surveys, for instance, the researchers are sometimes interested in identifying differences in
consumer preference in different demographic groups. If you look at the samples used in communication
research articles, many of the samples are convenience samples.

A somewhat better sampling technique is quota sampling. In quota sampling, the researcher knows the
overall demographic composition of the population on given variables (e.g., sex, race, age, residence).
Frequently, these percentages come from census data in a geographic area. The researcher uses this
information to set up sampling percentages for each sampling category that match those of the overall
population. For instance, if 10% of the target population is white males over the age of 55, then 10% of
the total sample will also be white males over the age of 55. In a quota sample, each interviewer receives a
quota of the number of people she must interview in each demographic cell (e.g., 18-24 year old female).
The quota sample ensures that the final sample proportions do not over-represent or under-represent the
percentages of people on the selected demographic variables. Quota samples are usually constructed
around variables such as sex, age, race, education and income. From a practical standpoint, it is difficult to
develop quotas on more than three variables, because it puts too much of a burden on the interviewers to
find people who fit in very specific categories (e.g., white female, between 24-35 with a grade school
education). Despite its limitations, a quota sample is a significant improvement upon a convenience sample
when it comes to estimating a population parameter.
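
A minimal sketch (with hypothetical census percentages for a target area) of translating population proportions into interviewer quotas for a target sample size:

# Hypothetical population shares taken from census data for the target area.
population_shares = {
    ("female", "18-24"): 0.06,
    ("female", "25-54"): 0.28,
    ("female", "55+"):   0.17,
    ("male", "18-24"):   0.06,
    ("male", "25-54"):   0.27,
    ("male", "55+"):     0.16,
}

target_n = 400  # desired total sample size

# Each demographic cell receives a quota proportional to its share of the population.
quotas = {cell: round(share * target_n) for cell, share in population_shares.items()}
for cell, quota in sorted(quotas.items()):
    print(cell, quota)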

A quota sample may suffer from systematic sampling bias. It is possible to have serious distortions within
quota categories on some other background variable. Interviewers tend to get hold of the most accessible
people within each category. For instance, if interviewers are doing telephone surveys using the quota
method, they may primarily conduct their surveys during the day. They may be able to fill their quotas, but
the final sample would over-represent the retired, the unemployed, or people who have nontraditional work
schedules. Because of the possibility of systematic sampling error, it is not possible to calculate precise
estimates of random sampling error. While quota samples are not as desirable as probability samples, they
are much better than convenience samples. Moreover, it is impossible to entirely get rid of systematic
sampling error, even if one begins with a perfectly representative sampling frame. In the author’s opinion,
quota sampling is often acceptable because it can be applied very efficiently. If you select the right
variables to set up your quota (i.e., variables which affect the attribute being estimated), quota samples can
provide useful estimates of underlying population sentiments and trends.

A purposive sample is a third kind of sampling procedure that is often used. In a purposive sample,
members of an existing social group or groups are sampled. A group is chosen because it represents a
particular viewpoint. If I want to get an indication of how people with different political viewpoints feel
about a proposed environmental law, I might survey members of the Chamber of Commerce and a local
environmental group. Purposive sampling is employed when researchers want to compare the points of
view of people with different social and political philosophies. If the purposive sample represents a
particular point of view, it can provide useful information. However, population parameters should not be
estimated from purposive samples.

Probability Sampling Procedures

This brings us to a discussion of probability sampling in which every member of a population has a known
probability of being selected. Probability sampling procedures are also called random sampling
procedures. Probability samples are advantageous because they reduce systematic sampling error and the
remaining random sampling error can be estimated.

Simple random sampling is the simplest probability sampling procedure. This procedure begins with a list
of all members of the target population. Then one randomly selects members from the list using a random
numbers table or some other type of randomizing device. Systematic sampling is a slight variation of
simple random sampling. In this version, if one wants to randomly select 1/10th of the total population, the
researcher uses a random numbers table to provide a random start for cases 1 through 10. After the first
case is selected, (e.g., person number 3), the researcher then uses a skip interval to select every tenth case
after that (3rd case, 13th case, 23rd case, etc.). In practice, there is little difference between simple
random sampling and systematic sampling.
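
A brief Python sketch of both procedures (assuming, hypothetically, that a complete roster of 500 population members is available):

import random

roster = [f"member_{i}" for i in range(1, 501)]   # hypothetical complete population list

# Simple random sample: draw 50 members at random from the roster.
simple_sample = random.sample(roster, 50)

# Systematic sample: to select 1/10th of the roster, take a random start among the first
# ten cases and then every tenth case after that.
start = random.randint(0, 9)
systematic_sample = roster[start::10]

print(len(simple_sample), len(systematic_sample))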

Stratified random sampling involves taking multiple random samples. Let’s say that I have a population
list that is separated out or stratified by sex and age. I would calculate the percentage of cases in the
population in each category, and then I would randomly select the correct proportion of cases in each
category. The difference between quota sampling and stratified random sampling is that the researcher
has a random sample within each sampling category in a stratified random sample compared to a haphazard
sample within each category in a quota sample. If I have a sample size of 1000, and I have calculated that
15% of my population consists of females under 30 (i.e., sample is stratified by sex and age), I would select
150 females under 30 at random from the overall population of females under 30 years old. I would also do
this for all of the other combinations of sex and age. Stratified random sampling is superior to simple
random sampling if one uses the right stratification variables (e.g., ones that are correlated with the variable
being estimated). Stratified sampling removes the stratification variables as potential sources of random
sampling error (i.e., we know that the final sample does not over-represent or under-represent the
proportions of the population in each category).
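
A minimal sketch (with a hypothetical roster that records each member's sex and age group) of drawing a proportionate stratified random sample:

import random
from collections import defaultdict

# Hypothetical roster: (name, sex, age_group) for every member of the population.
roster = [(f"person_{i}", random.choice(["F", "M"]), random.choice(["under_30", "30_plus"]))
          for i in range(10_000)]

target_n = 1000

# Group the roster by the stratification variables.
strata = defaultdict(list)
for person in roster:
    strata[(person[1], person[2])].append(person)

# Randomly sample each stratum in proportion to its share of the population.
stratified_sample = []
for members in strata.values():
    k = round(len(members) / len(roster) * target_n)
    stratified_sample.extend(random.sample(members, k))

print(len(stratified_sample))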

The main problem with the random sampling procedures discussed so far is that they require a complete list
(i.e., sampling frame) of the population. In the real world, we usually lack a complete list of population
members. An incomplete sampling frame is an ongoing problem that researchers cope with. For instance,
researchers are sometimes tempted to use telephone book listings as their sampling frame. However, in
many areas, the telephone book is a very selective sampling frame. It leaves out people who do not have
phones, people with unlisted telephone numbers, people who have recently moved into the community, and
people who have recently moved within the geographic area. The incomplete nature of the telephone book as a
sampling frame means that one will likely draw a sample that has a lot of systematic sampling error built
into it.

Over the years, researchers have developed a number of techniques to overcome the fact that they often do
not have lists of all population members. One of the most frequently used procedures is random digit
dialing. In this procedure, a researcher develops a comprehensive list of telephone numbers and then
randomly selects telephone numbers to call. This leaves out households without a phone, but it is a
significant improvement over using the telephone book. Random digit dialing is widely used because it can
be done relatively quickly and cheaply. In practice, however, random digit dialing can be time consuming because
one ends up calling many business phones and unallocated numbers (i.e., on a given exchange, only the
first 8,000 numbers may be allocated, meaning that 20% of the calls the interviewers make get disconnect
messages).
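
A very small sketch (with hypothetical area codes and exchanges) of how random digit dialing numbers can be generated:

import random

# Hypothetical telephone exchanges known to serve the target area.
exchanges = ["502-555", "502-554", "502-553"]

def random_phone_number():
    # Append four random digits to a randomly chosen exchange; some of the resulting
    # numbers will belong to businesses or be unallocated, as noted above.
    return f"{random.choice(exchanges)}-{random.randint(0, 9999):04d}"

call_sheet = [random_phone_number() for _ in range(20)]
print(call_sheet)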

Cluster or multistage sampling is another sampling procedure used to overcome the lack of a list of a
population. You can use a cluster sample if you have a list of larger sampling units (clusters) that contain the
population members. For instance, even though I may not have a list of all high-school students in the state of
Kentucky, I can easily obtain a
list of high schools in the state along with their enrollments. In cluster sampling, I would first sample
clusters, and then get rosters of members from each selected unit. I might first randomly select 50 schools
in the state of Kentucky, get a list of the students at each school, and then randomly select students within
each school. If the clusters that I am sampling vary according to their size, I can adjust the probability for
selecting each cluster to reflect its relative size (e.g., a simple random sample of schools with the same
number of students sampled from each school would ultimately over-represent students from small schools
and under-represent students from large schools). Cluster sampling usually involves more random error
than simple random sampling because sampling error from each stage enters into the process. This problem
diminishes, however, if one samples a relatively large number of clusters in the first stage. However, this
also makes the data collection effort more expensive.
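
A minimal two-stage sketch (with hypothetical school enrollments), selecting schools with probability proportional to size and then sampling students within each selected school:

import random

random.seed(2)
# Hypothetical first-stage list: each school's name and enrollment.
schools = {f"school_{i}": random.randint(200, 2000) for i in range(1, 301)}

# Stage 1: select 50 schools with probability proportional to enrollment
# (selection with replacement keeps the sketch simple).
names = list(schools)
weights = [schools[name] for name in names]
selected_schools = random.choices(names, weights=weights, k=50)

# Stage 2: obtain each selected school's roster and draw the same number of students from each.
sample = []
for school in selected_schools:
    roster = [f"{school}_student_{i}" for i in range(schools[school])]
    sample.extend(random.sample(roster, 20))

print(len(sample))   # 50 schools x 20 students each = 1,000 students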

In the end, few samples that researchers draw are perfectly representative of the target population.
However, with care researchers can reduce the likelihood of making large errors. If samples are carefully
selected and the resulting sample statistics are interpreted appropriately, survey research can provide very
useful information.
Question Formats
This reading introduces several question formats that are widely used in communication surveys. The
advantages and disadvantages of each format are discussed.

Rank-order questions are popular in communication research. For instance, in agenda setting research in
mass communication studies, survey respondents indicate how important various issues are to them in a
political campaign, as in the example below.

Please rank order the following issues in the 3rd District congressional race in terms of how important
they are to you. Please give a rank of 1 to the issue that is of greatest importance to you, a rank of 2 to the
next most important issue and so on.
____ Crime
____ Education
____ Health Care Reform
____ Campaign Finance Reform
____ Environmental Issues
____ Social Security
____ Tax Cuts
____ Foreign Affairs
____ Unemployment
____ Economic Growth

Rank-order questions are easy to construct. They are also easy for respondents to understand. The format is
also economical in terms of the amount of space it requires. One is able to cover a number of items with
one heading and set of instructions, as opposed to needing a separate question for each item. Most
respondents can answer rank-order questions as long as the number of ranked items is manageable (i.e., not
more than 10-12 items).

There are also several disadvantages of the rank-order format. First, rank-order items are hard to employ in
telephone surveys. Telephone surveys depend upon the respondent being able to easily remember the
question and the answer options. Ordinarily, telephone surveys should not have rank-order formats that
have more than four or five options. Another disadvantage of rank-order items is that they provide only
ordinal level measurement. Other question formats more closely approximate interval level measurement
and hence permit the use of more powerful statistical techniques than do ordinal level measures. A third
disadvantage of the rank-order format is that beyond 10-12 items, people begin to lose their ability to
make meaningful discriminations in rank. People are often clear about their top choices and their least
preferred choices, but they are often uncertain about their rankings in-between. This raises the possibility of
significant measurement error and reinforces the measurement principle that you can't force people’s
judgments to be more precise than they actually are.

One of the most widely used formats in social science research is the Likert item. The Likert format is
named after the social psychologist Rensis Likert. Likert items consist of a statement with
which the respondent agrees or disagrees.

Circle the response that most accurately reflects your position on the following statement.

"The Jefferson County Public Schools should use a phonics program to teach reading."

Strongly Agree
Agree
No Opinion
Disagree
Strongly Disagree

Likert items approximate interval level measurement. Likert scales are popular because they are very easy
to construct and they are ideal for telephone surveys. One can cover many items with one set of
instructions. The response categories are easy to remember, so an interviewee can respond to an extended
list of questions. The ease of construction and the efficiency of the format contribute to its popularity.

Despite its popularity, the Likert scale has several limitations. First there is the question of whether one
should include a response item between the Agree and Disagree options. Many researchers put in a middle
category, but they differ on what they put there. Sometimes the middle category is listed as neutral, in
other cases it is listed as uncertain or don't know. A person can be committed to a neutral attitude on a
topic. On some issues, people avoid forming an opinion because they think the issue is trivial or because
they don't want to take sides. This is a rather different position than not being able to form an opinion on a
topic because of conflicting feelings or because of a basic lack of information. In some cases a researcher
may want to distinguish neutrals from the uncertains/don't knows. In this case, she may want to include
"neutral" as a middle response, and include a "don't know" response off the scale. People who give the
uncertain or don't know response are not included in the calculation of group means etc. on the question.
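
A small sketch (hypothetical coding: Strongly Disagree = 1 through Strongly Agree = 5, with don't-know answers recorded separately) of computing a group mean that leaves out the don't-know responses:

# Hypothetical responses to one Likert item; None marks an "uncertain/don't know" answer.
responses = [5, 4, 4, None, 3, 2, None, 5, 1, 4]

valid = [r for r in responses if r is not None]   # drop don't-know responses before averaging
mean_score = sum(valid) / len(valid)
print(f"n = {len(valid)}, mean = {mean_score:.2f}")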

A second interpretive difficulty with Likert scale items is that they must be stated in unequivocal language.
If you were to use a statement like "It probably would be a good idea for the Jefferson County Schools
to go to a year-round school schedule," the response categories don't make sense. One can't strongly
agree or strongly disagree with a statement that is hedged or is stated in equivocal language. In studying
attitudes on an issue, researchers are limited to statements that clearly capture the polar opposites of the
attitude continuum. Likert items are often better at measuring the attitudes of people with very strong
opinions, but less accurate in gauging the shades of gray in-between. The statement format also makes Likert
scale items more subject to idiosyncratic responses due to question wording. Hence, it is usually not a good
idea to compare surveys in which similar but not identically worded Likert items have been used. An
alternate format that gets around both problems is to use labels that directly indicate attitude strength on an
issue, as in the example below.

Circle the response that best characterizes your attitude toward the proposal that the Jefferson
County Schools go to a year-round school schedule.
Very Negative
Negative
Neutral
Positive
Very Positive
Don't Know

A third popular format for determining the overall meaning that an object, person or concept has for a
person is the semantic differential. This semantic differential approach to measuring concept meaning was
first employed in cross-cultural research. According to research by Osgood and Tannenbaum, humans
make judgments on the dimensions of evaluation, activity, and potency. On a semantic differential scale, a
respondent rates an object on numerical scales anchored by polar descriptors.

Louisville's Triple A baseball team is going to change its name to the Bats. Please rate the new team
name on the following scales. What associations does the name bring to mind for you?

Bad 0 1 2 3 4 5 6 7 8 Good (Evaluation dimension item)

Weak 0 1 2 3 4 5 6 7 8 Strong (Potency dimension item)

Passive 0 1 2 3 4 5 6 7 8 Active (Activity dimension item)

The semantic differential has several distinct advantages as a question format. The most important
advantage is that it provides a means for comparing the meaning of an object or a concept among very
different groups of people. Research shows that these dimensions capture a considerable portion of
meaning in human cognition. A related advantage of the semantic differential is that a researcher can
easily develop multiple-item inventories to gauge each of the three dimensions based upon past research.
Semantic differential scales usually have good reliability because extensive research has refined the
semantic differential dimensions. Even though the concepts rated may change dramatically, the underlying
consistency of the rating scales remains quite stable. The semantic differential is also advantageous as a
question format because it constitutes a comprehensive approach to determining the meaning that an object
or concept has for a person. One can compare the meaning of concepts in three-dimensional space and
compare the meaning of different objects. Semantic differential scales are widely used by marketing
researchers who want to develop a niche or identity for a product. The semantic differential provides a
powerful tool to analyze the meaning of a product in three-dimensional space. A final advantage of the
semantic differential is that it represents interval level measurement. One can usually utilize parametric
statistics without reservation.

There are several limitations in using semantic differentials. The most important limitation is that
researchers are often interested in different dimensions of meaning than the three dimensions of evaluation,
activity and potency. These are three important dimensions, but they do not exhaust the meanings that
objects and concepts have for people. In particular, researchers that are oriented toward qualitative research
suspect that the power of the dimensions varies across cultures. People sometimes have difficulty filling
out the semantic differential correctly. Respondents sometimes circle the adjectives at the end of the scales
rather than the numbers. Researchers should consult with previous theory and research when developing
semantic differential scales. Semantic differential scales can be difficult to use in telephone surveys.
Respondents sometimes find it hard to visualize the adjectives and the intervening points. Finally,
respondents are sometimes unclear about what the midpoint on a scale means because it does not have a label
attached to it. Despite these limitations, the semantic differential is a staple tool of survey researchers.

A fourth popular question format is the forced-choice question. Forced choice items ask you to indicate
which of two alternatives you prefer, or are most like you. Forced choice items are particularly popular in
personality tests such as the Minnesota Multiphasic Personality Inventory (MMPI). Some of you may have
taken career aptitude tests that use forced choice items such as in the following example.

Which of the following activities would you choose as a leisure activity?


A. Read a novel on the New York Times best-seller list
B. Go shopping

Forced-choice items are used to determine what tradeoffs a respondent makes between a specific set of
alternatives. Each item is paired against all other items in the array. This format challenges a respondent to
make specific choices among the set of alternatives. By systematically comparing all of the elements in the
set, one gets comprehensive information about a person’s preferences.

On the disadvantage side, forced-choice questions may require respondents to make choices with which they
have little familiarity. For instance, in the example above, I may seldom engage in either activity. A
respondent may feel that the choices she must choose between are so unfamiliar that her answer means very
little. Alternatively, she may find the choices so unappealing that she doesn’t consider any of the options
feasible (e.g., Would you prefer to be executed by hanging or by electrocution?). Forced-choice
questions are usually aggregated into scales to summarize a person’s preference of the alternatives. The
resulting scales are seldom independent of one another. If you score high on one of the alternatives, you
will score lower on the other scales. This creates some difficulties when it comes to interpreting the scale
scores. It is probably best to focus most on the relative rankings of the alternatives rather than differences
of magnitude between the options. Because of the non-independence of the scores, one should also use
nonparametric statistics in the data analysis.

The four question formats reviewed here are the most commonly used ones. Many other types of questions
appear in survey research. In fact, researchers adapt whatever tools they need to meet the measurement
requirements of a particular survey research project. As in any other line of work, no one tool has universal
advantages.

Survey Design Issues
In American culture, we have become quite familiar with surveys. Surveys may be the single most utilized
social science research method. The methodology of the survey is deceptively simple. We ask people
questions about their behavior, knowledge, values and attitudes. Their answers are taken to accurately
reflect some aspect of reality. However, the ordinary survey makes some large assumptions about people’s
willingness and ability to report information. A survey is an indirect measure of a person’s behavior,
beliefs and values. In some cases, a survey may be the best available way to collect information; in other
situations, it is a highly inadequate research methodology.

Surveys deal with self-report data. To assume that self-report data are valid, we assume that the person
understands the question as the researcher intended. If we survey a number of people using the same
question, we also assume that all the people in the sample understand the survey question and response
options in similar ways. In other words, to compare people’s answers we need to assume that all the people
in the survey share denotative meanings for the research question and response options. The assumption of
“common meaning” limits what the survey researcher can hope to accomplish. The researcher must use
common concepts that all survey respondents are likely to share. This entails using words and
vocabulary that are understandable to the least educated respondent answering the survey.

If one assumes that survey data are valid, one also assumes that the person’s answer is a true response and
not merely a measurement artifact. This is especially a problem with surveys that inquire about
respondents’ attitudes and beliefs. When you ask people about their attitudes or beliefs on a topic, you
assume that they have attitudes on the topic or issue. There are many objects, people and concepts that I
don’t have a meaningful opinion about. For instance, I don’t really have a definable attitude as to whether
or not the “X games” are a “true sporting event”. I haven’t seen an actual competition, so I don’t really
have an attitude on the question. People will sometimes create opinions that didn’t exist prior to being
asked a survey question. In these instances, one can legitimately question whether the results are
meaningful or are merely “measurement noise”. Researchers should restrict their survey questions to
topics with which people have some knowledge and familiarity.

A third assumption is that the respondent actually knows the answer to the question. For instance, I would
make a rather poor survey subject when it comes to answering some questions about our household. My
spouse is a nurse with considerable medical training. When it comes to knowledge about health practices
and procedures relating to my children, I don’t have a clue. I am content to let my spouse make most of
the important medical decisions facing the family. As a result, I am not a good respondent
when it comes to answering questions about the health care that my family receives. Survey questions
sometimes also assume that people keep track of mundane information. For instance, I don’t correctly
recall how many glasses of water I drank yesterday. I don’t ordinarily keep track of such things. I have
some idea of how much water I drink a day, but it may not correspond very well with the actual reality. It
might be better to have me record how many glasses of water I consume in a day than it is to ask me a
survey question. Generally, researchers should limit their surveys to matters that respondents know and
care about.

Researchers must also assume that the person can accurately recall the requested information. A person’s
memory is best for events that were memorable and/or relatively recent. In addition, a person may
remember an event, but forget exactly when the event occurred and so misreport details concerning the
timing of the event. Generally, it is best for researchers to limit their use of survey questions that make
detailed inquiries about past events, unless those events are quite memorable.

Finally, a researcher must assume that respondents are answering the survey questions honestly and
accurately. This is indeed a concern, but it is probably not the most problematic assumption that one must
make about survey data. The tendency to answer survey questions dishonestly increases when the subject
is highly sensitive. For instance, surveys show a sizable discrepancy between heterosexual males and
heterosexual females in terms of the number of sex partners that they report they have had. Since there is
relative balance between the number of males and females in the overall population, one would expect the
mean for males and females to be equal. Men may over-report the number of sexual partners, women may
under-report the number of sexual partners, or perhaps both tendencies occur. On many subjects, however,
respondents have little to gain by giving dishonest answers on a survey.

Researchers follow a number of procedures to improve the reliability and validity of their surveys. The
most common practice is to standardize the interview procedures. For surveys, everyone should hear
exactly the same questions, receive the same instructions, and have the same response format. This goes
down to the level of ensuring that different interviewers conducting the survey have a standard nonverbal
demeanor in their interactions with respondents. As much as possible, the researcher wants to minimize
differences in interviewer behavior that may lead to differences in how respondents answer the questions.

Researchers should word survey questions so that the question and response options have the same
meaning for all respondents. In reality, this can only be approximated. In addition, this condition becomes
more difficult to achieve when one surveys a diverse population. At a minimum, researchers need to avoid
words that have highly idiosyncratic connotative meanings for different members of the population (e.g.,
gay, liberal). In addition, researchers should provide clear definitions for concepts as needed.

Let’s consider a deceptively simple example. Suppose someone asked you: “Did you eat breakfast today?”
Well, what exactly constitutes breakfast? Are a Coke and a donut breakfast? Is it breakfast if it is eaten
after 11:00 AM? “Breakfast” has different connotations for people. How might the researcher deal with
this problem of common meaning? He could offer a definition of what he means by breakfast. A better
option would be to ask a broader factual question: “What did you eat yesterday before 10:00 AM?” This takes
the burden of definition off the respondent and enables the researcher to code the answers precisely.

For the most part, survey researchers prefer to use standardized closed-ended questions in surveys. Open-
ended questions are much more difficult to standardize. For instance, educated people tend to give longer
and more elaborated responses than less educated people. Standardized questions eliminate this problem.
Closed-ended questions are easier to code and tabulate. Closed-ended formats significantly reduce the cost
of data coding and analysis. The primary limitation of closed-ended questions is that they often provide
little insight as to why a person has a particular opinion.

Survey researchers often distinguish between factual questions and subjective questions. An answer to a
factual question can be checked against other sources. For instance, if I ask you to report the number of
days that you were absent from work last year, it may be possible to check the accuracy of your answers
against company archives. In contrast, subjective questions inquire about the hidden phenomena of
attitudes, beliefs and values. These can only be validated indirectly (e.g., observe to see if a person’s
expressed opinions and behaviors agree). As you might expect, there are different requirements for
improving the validity of each type of question.

To improve the accuracy of factual questions, researchers should question the person in a household who is
most informed on a topic. Researchers should also limit their questions to recent or memorable events.
Likewise, researchers have developed a number of procedures to minimize social desirability responses in
surveys such as using self-administered questionnaires where needed to assure the respondent's anonymity.

For subjective questions, the question of validity is more complex. Questions about subjective states can
only be answered in relative terms, not in absolute terms. Researchers have found that it is best to avoid
complex statements. It is also best to include an adequate number of response options that approximate the way
people actually think. For instance, people do have attitudes in terms of shades of gray, so a five-point scale
is better than a two-point scale. However, it is unhelpful to include more than seven response
categories because people probably don’t make more distinctions than that. Most experts advise
researchers to consider the answers that people provide to subjective questions with caution. Questions that
have slightly different wording may not be as similar as the reader assumes they are. Subtle differences in
question wording can have dramatic effects on how people answer “subjective” survey questions.

Examples of Problematic Questions
The following problematic questions came from a student project several years ago. The project was designed to
assess how students felt about their advising experiences at the University of Louisville. Some of the
problems are obvious, but experience shows that it is much easier to identify the defects in someone else’s
questions than it is to analyze your own work.

Question Problem: Vague Terminology

“On a scale of 1 to 10 (10 being the highest) please rate your overall advising experience at U of L?”
Comment: We don’t know the dimension of measurement and we don’t know if highest (10) indicates
satisfaction or dissatisfaction.

Question Problem: Ambiguous Terms

“Is your advisor accessible?”___ Yes ____ No


Comment: Accessible could mean several different things (available at times convenient to the student vs.
not aloof in communication). A continuum of responses should be used too (i.e., something between all and
nothing).

Question Problem: Two Questions or One?

“Are you a transfer student, and if so, was advising better at your former school?”
Comment: You need at least two questions here. This is an example of a double-barreled question.

Question Problem: Overlapping Ranges

“How many hours do you work per week?” ___ 5-10 ___ 10-20 ___ 20-30 ___ 30+
Comment: The first category should also include 1 to 4 hours. Response categories should be mutually exclusive
(i.e., categories should not overlap; see 10 hours, 20 hours, and 30 hours).

Question Problem: Large Gaps in Ordinal Scales

“How often do you see your advisor?”


___ Never ___ Once a semester ___ Twice or more a semester
Comment: There is too large a gap between never and once a semester. Intermediate categories are needed
(e.g., once a year) to avoid overestimating the frequency with which students see their advisors.
Question Problem: Question or Statement?

“Would you say that the advisors care about you as an individual?”
____ Strongly Agree ____Agree ____ Uncertain ____Disagree ____Strongly Disagree
Comment: You can’t agree with a question. Make sure the response options are compatible with the
form of the item.

Question Problem: Request for Vacuous Information

“Do you feel your advisor could be more helpful somehow?”


Comment: This is close to a leading question, suggesting that advisors are less helpful than they should be.
The bigger problem is that if they say yes, we don’t have any idea of how they could be more helpful.
Exercise: Critiquing a Telephone Survey
Assume that the following is a telephone survey designed to investigate the relationship between a person's
sources of news and his perceptions of and attitudes about crime. Identify what is wrong with each
question and rewrite the question to get rid of the defects. You may use several questions to replace a
defective one if you want, or you can use a better measurement methodology.

1. How many felonious crimes, if any, have you been a victim of over the last three years?
____ Not at all
____ Once
____ Several times
____ Many times
Critique/Rewrite

2. Would you say that is very frequently, frequently, occasionally, or very seldom that you watch television
newscasts?
Critique/Rewrite

3. Please rank the sections of the newspaper in terms of how much attention you give to each section.
Give a one to the section that you devote the most attention to etc.
____ Editorial page
____ sports section
____ Advice columns
____ Classified ad
____ National and international news
____ Obituaries
____ Comic section
____ Local news
Critique/Rewrite

4. How many stories do you read about crime in the average week in the newspaper or in news magazines?
Critique/Rewrite

5. What proportion of your total television viewing time do you devote to watching news programming?
Critique/Rewrite

6. How do you react to the following statement? "Newspapers and television sensationalize crime when
they report the news."
___ Strongly Agree
___ Agree
___ Disagree
___ Strongly Disagree
Critique/Rewrite

7. How do you react to the following statement? "District attorneys shouldn't plea bargain with hardened
criminals?"
____ Agree
_____Disagree
Critique/Rewrite

8. Crime is getting worse each year. Don't you think?


____ Agree
____ Disagree
Critique/Rewrite

9. What is your attitude toward President Bush’s proposal to use military tribunals to prosecute terrorists?
_____Favorable
____ Unfavorable
Critique/Rewrite

10. Do you feel that the exclusionary rule should be abolished?


____ Yes____ No
Critique/Rewrite
Exercise: Dealing With Social Desirability Bias
When I was in graduate school, one of my professors assigned me to analyze a data set that dealt with how
much people valued various job attributes. A shortened version of the question format appears below.

Please rate how important you consider the following aspects of your work environment to be for you
when it comes to being satisfied with your job. Please use the following rating scale below.
5=very important, 4=important, 3=somewhat important, 2=of little importance, 1=of no importance
___ Salary
___ Fringe-benefits
___ Opportunities for advancement
___ Opportunities to learn and develop new skills
___ Recognition for good work
___ Good relationships with co-workers
___ Good relationship with supervisor
___ Interesting work

When I did the first data run, I discovered a significant problem. Most of the respondents rated nearly
every attribute as very important. On a five-point scale, most of the means were between 4.5 and 4.8. Very
few people rated any of the attributes as less than a 4. Because we wanted to understand individual
differences in how important respondents thought these attributes were, the results were quite
uninformative. There was little variation to analyze.

What was the problem with the question format? Well, the structure of the question format sets up a social
desirability bias. In fact, these are probably all things that most any person would like to have in a job.
However, in the real world, we often have to make trade-offs between some of these attributes (e.g., what
do you do when you have to decide between good relationships with co-workers and $5,000 in extra
salary). The real question comes down to which trade-offs we will make when we have to.

Your task is to devise a measurement scheme that you think can overcome this social desirability bias and
really determine how important various job and work climate attributes are to workers. You should explain
how your question format solves the social desirability problem in this example.

Survey Example: Parent Communication Survey
I'd like to start by having you rate how satisfied you are with communication from the school.

1. Overall, how informed do you feel about what is going on at the school? Would you say that you are
very informed, somewhat informed, or not very informed?
____ Very informed
____ Somewhat informed
____ Not very informed

2. Overall, how informed do you feel about what is going on in your child's classroom? Would you
say that you are very informed, somewhat informed, or not very informed?
____ Very informed
____ Somewhat informed
____ Not very informed

3. Overall, how satisfied are you with communication you receive from the school? Would you say that
you are very satisfied, satisfied, have no opinion, dissatisfied, or very dissatisfied?
_____ Very Satisfied
_____ Satisfied
_____ No Opinion
_____ Dissatisfied
_____ Very Dissatisfied

For the following statements, please tell me whether you strongly disagree, disagree, have no opinion,
agree, or strongly agree with the statement.

4. School personnel consider my opinion when it comes to decisions concerning my child.


Strongly Agree Agree No Opinion Disagree Strongly Disagree

5. The school actively seeks ideas from parents concerning school-wide policies and programs.
Strongly Agree Agree No Opinion Disagree Strongly Disagree

6. School personnel consider parents’ opinions when making decisions about school-wide policies and
programs.
Strongly Agree Agree No Opinion Disagree Strongly Disagree

7. The next questions ask how you prefer to receive information from the school. Tell me whether
each method of communication is an excellent method, a good method, or not a good method to
keep you informed.

Outdoor signs excellent method good method not a good method


Information board in school lobby excellent method good method not a good method
Written notes from child's teacher excellent method good method not a good method
Telephone calls from child's teacher excellent method good method not a good method
Parent phone tree excellent method good method not a good method
Mailings to families excellent method good method not a good method
Automated telephone calling system excellent method good method not a good method
Written notes from school principal excellent method good method not a good method
Parent information meetings excellent method good method not a good method
PTA newsletter excellent method good method not a good method
E-mail message excellent method good method not a good method
8. When you visit the school, how welcome do you feel? Do you feel very welcome, welcome, somewhat
welcome or not very welcome?
_____ Very Welcome
_____ Welcome
_____ Somewhat welcome
_____ Not very welcome

9. How helpful are school personnel if you have a question or problem? Are school personnel very helpful,
helpful, somewhat helpful, or not very helpful?
_____ Very helpful
_____ Helpful
_____ Somewhat helpful
_____ Not very helpful

10. How important are the following information sources for keeping you informed about the school? Is the
source a major information source, a minor source, or not a source you rely on?
Your child's teachers major source minor source Not a source I rely on
Signs/displays at school major source minor source Not a source I rely on
PTA newsletter major source minor source Not a source I rely on
School principal major source minor source Not a source I rely on
Other parents major source minor source Not a source I rely on
Parent information meetings major source minor source Not a source I rely on
School programming events major source minor source Not a source I rely on
Your child major source minor source Not a source I rely on

11. How interested would you be in receiving information on the following topics at PTA meetings or
parent informational meetings? Would you be interested, somewhat interested, or not interested in
receiving information on the topic?

Grouping of Students Interested Somewhat interested Not interested


Discipline Policy Interested Somewhat interested Not interested
Student Assignment Plan Interested Somewhat interested Not interested
Transition from Primary to Intermediate Interested Somewhat interested Not interested
KERA Testing Interested Somewhat interested Not interested
Transition to Middle School Interested Somewhat interested Not interested
School Decision-Making Procedures Interested Somewhat interested Not interested
School Committee Tasks Interested Somewhat interested Not interested

12. What other topics, if any, would you like to receive information about?
______________________________________________________________________________________
______________________________________________________________________________________

13. Did you have a child at Walton Elementary last year? Yes No
If no, skip to question 18.

14. Think about your contact with the school in the last year. You may answer don't recall if you can't
remember. In the past year have you:

met your child's teachers? Yes No Don't recall


visited a class at the school during the day? Yes No Don't recall
attended a parent-teacher conference? Yes No Don't recall
attended a school open house? Yes No Don't recall
attended any other after school function? Yes No Don't recall
volunteered to help with a school activity? Yes No Don't recall
worked on a school committee? Yes No Don't recall

15. Were you asked about your interest in volunteering at the school?
_____ Yes
_____ No
_____ Don't recall

16. Have you received a PTA newsletter this fall?


____ Yes
____ No
____ Don't recall

17. How much of the PTA newsletter do you usually read? Please estimate the percentage.
________________________________

18. How much emphasis should the PTA newsletter give to the following topics? Should the PTA
newsletter give the topic major emphasis, moderate emphasis, or minor emphasis?

Announcements of upcoming school events Major Moderate Minor


News from teams Major Moderate Minor
Reports from parent representatives on school Major Moderate Minor
Committees
Updates from the principal Major Moderate Minor
Reports on special school projects Major Moderate Minor
Explanations of school procedures & policies Major Moderate Minor
Articles on educational topics Major Moderate Minor

19. Are there any other topics you would like to see covered in the newsletter? If so, what are they?
________________________________________________________________________________

20. Do you have Internet access from your home?


______ Yes
______ No If respondent answers no, skip to question 23

21. How useful would the following types of information be to you on a school web page? Would the
information be useful, somewhat useful, or not very useful to you?

Announcements of upcoming school events Useful Somewhat useful Not very useful
Announcements of committee meetings Useful Somewhat useful Not very useful
Contact numbers for PTA officers Useful Somewhat useful Not very useful
Contacts for parent reps on school committees Useful Somewhat useful Not very useful
Volunteer opportunities at the school Useful Somewhat useful Not very useful
School policies Useful Somewhat useful Not very useful
Committee minutes Useful Somewhat useful Not very useful
Description of school program Useful Somewhat useful Not very useful
Listing/Description of school personnel Useful Somewhat useful Not very useful
Web board for parent questions/feedback Useful Somewhat useful Not very useful
22. What else, if anything, would you like to see on a school web page?
______________________________________________________________________________________

23. What one thing should the school do to improve communication with parents?
______________________________________________________________________________________
______________________________________________________________________________________
That was my final question. Thank you very much for your time and cooperation.
Parent Survey Interviewing Tips Sheet
I recognize that most of you are amateurs when it comes to interviewing, although many of you may have
done some "telephone" work sometime in your life. In this case, we are working for real clients who are
counting on us to provide them with reliable and valid information. The most important thing for you to
keep in mind is that we want to be able to generate answers that will be comparable across respondents.
This means that we do not want you, the interviewer, to artificially influence or bias the results in one
direction or the other. Ideally, you will serve as a neutral medium through which valid information is
collected. We don't want the answers to be slanted or influenced by your personality, style or creativity.
Following are some requirements and some suggestions that will enable you to obtain the respondent's
cooperation and ensure that you have valid data.

1. Keep accurate and timely records on the call sheet. This is important because we need accurate
records to manage the data collection task and avoid mistakes. We want to finish as many surveys
as we can, but we also want to avoid mistakes. We want to know where we stand as the survey
proceeds. The quality of our effort will depend in large part on simply taking the time and care to
keep our records up-to-date and accurate.
2. Use the selection and survey procedures as designed. We need to ensure procedure reliability
across all of the people surveyed. We want each person to hear the same questions. Please do not
"improvise" with the script of the questionnaire. Whenever you improvise by changing the
question order or wording, you run the risk of undermining the validity of the data collection
effort. When it comes to structured interviewing, creativity is almost always a bad thing; it
introduces all sorts of idiosyncratic elements into measurement. Stick to the script.
3. Your phone voice should be pleasant and engaging. Your voice should convey interest in wanting
to find out the interviewee's point of view, but you never want to imply that one answer is more
correct or more proper than another is. You not only need to stick to the script, but you need to
follow a standard nonverbal script as well. Indicate interest but never hint at approval or
disapproval of the person's answer or opinion. Your job as the interviewer is to be an intelligent
automaton. Don't let your personality bias answers by being either too enthusiastic or too
detached. Be interested and businesslike.
4. The quality of our effort will depend upon how many complete interviews we generate. For this
reason we want to have as few refusals as possible. Some people will be reluctant to participate in
the survey because we got them at a bad time. When confronted with a reluctant respondent,
please ask him or her when would be a better time to call back and complete the interview. If the
person has a question, please answer that question to the best of your ability. You should be able
to answer most of the questions from the fallback statements sheet provided for you in this packet.
Please memorize this sheet and have it visually available when you start an interview.
5. If you encounter a direct refusal, please respect the refusal. Please be sure that you record it as
a refusal on your code disposition sheet. The last thing we want to do is to mistakenly call back a
household that has already refused to participate. This would only create bad feelings toward the
school on the part of the parents.

6. Try to leave a good last impression: sincerely thank the respondent at the end of the interview.
Parent Survey Fallback Sheet

Who are you doing the survey for?


We are doing the survey for a communication committee at Walton Elementary. The committee is

composed of the principal, three teachers and four parents. The committee wants to use the results of the
survey to develop a comprehensive communication plan for the school.

How did you get my number?


We are using the 2003-2004 school PTA directory.

Who will find out my answers?


We will report the percentages of parents who answered the questions in a particular way. Individual
responses will not be identifiable. Your answers will remain confidential. We hope that you will answer the
questions freely and candidly.

How long will the survey take?


For most people the survey will take from six to nine minutes. Interviews vary in length because some of the
questions do not apply to many people.

What kinds of questions are you asking?


We are not asking any personal questions about things such as your income or education. We are asking
questions about the best ways the school can communicate with you and what sort of information you
would like from the school.

Who are you? Why are you doing the project?


I am doing this as part of a course project in Communication Research Methods. The class has several
projects that give students hands-on experience with different research methods. The project is supervised
by Dr. Greg Leichty. He is a professor in the Communication department at the University of Louisville.
You can reach him or leave a message at 852-8175 if you have questions about the survey project.
Parent Survey Call Sheet
____________________________
Target Telephone Number
Parent(s) (Circle One that Does Interview) Children at ***** Elementary
__________________________________ ____________________________________________
__________________________________ ____________________________________________
____________________________________________
____________________________________________
Survey Introduction:
Hello, My name is ___________________. Could I speak with _________________________
Or ______________________________?

I am a student in a communication class at the University of Louisville. We are surveying parents of
Walton Elementary students. The Communication Committee at Walton wants to get your ideas and
suggestions to develop a communication plan for the school. You should have received a letter from
Principal ***** ****** explaining the survey. Your answers will remain confidential. The survey will
last about seven or eight minutes. May I begin?

Contacts Time/Date/Disposition/Comments/Interviewer Initials

___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________

Telephone Call Disposition Codes: Related Instructions

Be sure to let the telephone ring at least five times. Use the fallback sheet to answer queries. If the person
is busy, offer to call back at a more convenient time. Record the time when the person will have time to
answer the survey. Be sure to thank the respondent upon completing the interview as well. After each call,
use the following disposition codes to indicate the outcome of the call.

1=No answer after 5 rings.


2=Busy. Be polite.
3=Answering Machine.
4=Number disconnected or no longer in service.
5=Target person not available.
6=Interview time rescheduled.
7=Refusal by Target Respondent.
8=Partial interview.
9=Completed Interview.

Concept Glossary
academic research: research done by professors and students that adds to the public knowledge of an
academic discipline.

account: an explanation of why someone did something. Excuses, apologies and justifications are
examples of accounts. An account is one of the symbolic elements analyzed in depth interview transcripts.

Analytic induction: A method of textual analysis in which a researcher derives categories inductively from
his texts. The category system that is derived must account for all of the cases through a systematic
analysis of negative cases that do not support the working hypothesis.

appeal to authority: an argument that a claim should be accepted because a trustworthy source endorses it.

appeal to empirical inquiry: an argument that a claim should be accepted because it is consistent with
empirical research findings. This form of appeal provides a means for adding knowledge to a cultural
system. It is particularly useful for adjudicating disputes about factual questions when authorities, intuitions
or traditions conflict with one another.

appeal to intuition: an argument that a claim should be accepted because it is based on principles that are
clear and readily apparent or are “common sense”.

appeal to personal experience: an argument that a claim should be accepted based on the communicator’s
personal experience as it relates to the claim.

appeal to tradition: an argument that a claim should be accepted because it is a long-established convention.

applied research: research that pursues a practical end or application. Applied research pursues a short-
term goal as opposed to developing new theory.

artifact: an alternate explanation for research results other than the independent variable. The trials and
errors of research have revealed the following systematic artifacts that researchers must pay attention to in
research design and data interpretation. See also the individual artifacts of history, maturation, regression
to the mean, selection, experimenter bias, demand characteristics, testing sensitization, and procedure
reliability and validity.

attribute: a characteristic of a person, object or event that can be measured (e.g., personal income).

basic research: research that inquires about a phenomenon without direct concerns for the practical value
of the findings. Basic research is sometimes referred to as a search for knowledge for its own sake. (See
also applied research).

bibliography: A list of sources that have been used in a research report. The bibliography usually appears
in footnotes or in a reference list at the end of the article.

blind procedure: a procedure in which the persons collecting or coding study data do not know the hypotheses
of a study or do not know to which experimental group a person belongs. Blind procedures help control for
the artifact of experimenter bias. A double blind study is one in which neither the subject nor the data
collection and processing agents are aware of which group the subject is in.
boundary conditions: specifying to what part of the empirical world a particular theory or research
finding applies. Specifying boundary conditions is a particular concern of the generalizing stage of
research.

candidate-answer question: a problematic question in a depth interview protocol. A candidate-answer
question makes implicit assumptions that are questionable. Care should be taken to reduce the
number of problematic assumptions that are posed in a question or line of questions.

causal modeling: The use of sophisticated statistical procedures to test systems models of causation
that include feedback loops and other forms of nonlinear causation between variables.

cause: to infer causation, the cause must originate prior to the effect; there must also be a consistent
correlation between the two factors, and alternative explanations for the effect must be excluded. See
also sufficient causation, necessary causation, facilitative causation, and systems causation.

central tendency: a single value chosen to represent a typical score in a distribution (e.g., mode, median,
mean).

Chi-square: a nonparametric statistic that tests whether the distribution of cases of a variable is different
from what one would expect due to chance alone. It is used with variables that are measured at the nominal
or ordinal level.
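
For illustration, a short sketch in Python of a chi-square goodness-of-fit check (the counts, labels, and use of the scipy library are hypothetical and only meant to show the idea):

    from scipy.stats import chisquare

    # Hypothetical counts: which communication method 90 parents named as their favorite.
    observed = [40, 30, 20]             # e-mail, written notes, phone calls
    expected = [30, 30, 30]             # what we would expect if all three were equally popular
    chi2, p = chisquare(observed, f_exp=expected)
    print(round(chi2, 2), round(p, 3))  # a small p-value suggests the counts depart from chance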

closed-ended question: a question in which the respondent's answer is restricted to a specific response set
developed by the researcher.

Cohen’s Kappa: a measure of intercoder reliability that compensates for the rate of agreement expected by
chance alone. It adjusts for the categories used most frequently in a coding system.
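
To illustrate the computation, a brief Python sketch with invented coder proportions (the numbers are made up for the example):

    # Two coders classified 100 open-ended answers as Positive or Negative.
    p_observed = 0.80                     # both coders agreed on 80 of the 100 answers
    p_chance = 0.50 * 0.60 + 0.50 * 0.40  # chance agreement from each coder's category proportions
    kappa = (p_observed - p_chance) / (1 - p_chance)
    print(round(kappa, 2))                # 0.60 for these made-up figures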

Cronbach’s alpha: a measure of the internal reliability or consistency of a scale. The average correlation
between all possible combinations of split halves of the scale is calculated.
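
As an illustration, the sketch below computes alpha with the common variance formula, which is mathematically related to the average split-half correlation described above (NumPy, the function name, and the scores are hypothetical):

    import numpy as np

    def cronbach_alpha(items):
        """items: a respondents x items array of scale scores (hypothetical helper)."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]                               # number of items in the scale
        item_variances = items.var(axis=0, ddof=1).sum()
        total_variance = items.sum(axis=1).var(ddof=1)   # variance of each person's total score
        return (k / (k - 1)) * (1 - item_variances / total_variance)

    scores = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 2]]  # four respondents, three Likert items (made up)
    print(round(cronbach_alpha(scores), 2))                # about .89 for these invented scores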

Cluster sampling: see multiple stage sampling.

coefficient of determination: the percentage of prediction error reduced in a target variable using one or
more predictor variables in linear regression.

coefficient of nondetermination: the percentage of prediction error in the criterion variable that is left
unexplained by a linear regression equation.
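
A quick worked example connecting the two coefficients (the correlation value is hypothetical):

    r = 0.70                       # Pearson correlation between a predictor and a criterion variable
    determination = r ** 2         # .49: proportion of prediction error reduced
    nondetermination = 1 - r ** 2  # .51: proportion of prediction error left unexplained
    print(determination, nondetermination)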

Communication criticism: a form of case-study analysis of a public text such as a speech in which the
researcher seeks to understand and/or evaluate the text according to standards of what constitutes effective
communication in the situation in question.

concept: a label or category that people use to make sense of their environment in qualitative textual
analysis. Concepts are underlying ideas of similarity and difference by which events, people and things are
classified. The meaning of a concept is usually defined by a paired contrast in the context of usage (e.g., tall
vs. short). A concept is a noun phrase (noun and adjectives). Researchers are particularly interested in
concepts that have meanings that depart in some way from the shared denotative code. A concept may
have a unique connotative meaning for an individual, or the concept may be a part of a specialized
denotative code for a group of people. A textual analysis often explicates the meaning of a concept for a person
or a group by closely examining how the concept is used in conjunction with other symbols.

conceptual definition: a description of a construct in words, usually in terms of how it compares or relates
to other constructs.

complete observer: an observational role in which the researcher is uninvolved in any meaningful way in
the observational scene. Maximum attention can be devoted to observation.

complete participant: an observational role in which the researcher is so involved in the scene of
observation, that an account of the events can only be constructed after the fact.

concurrent validity: the extent to which a measure generates equivalent results to another measure of the
same construct. Concurrent validity checks usually attempt to show that a new and more economical
measure gives similar results to a more complex and more expensive measurement instrument.

condensed account entry: a field note entry that is written in the scene of participant observation, or is
written directly after leaving the scene. The condensed account entry records key facts and details about the
scene that will be used to write up the expanded account. (See also expanded account entry).

confidence interval: a range in which a population parameter is estimated to fall. The confidence interval
takes into account random (i.e., sampling) error, but it does not account for either measurement error or
systematic error. The confidence interval usually has a confidence level associated with it, say 95%. This
means that if the sampling and estimation procedures were replicated, on average 95% of the interval
estimates would include the actual population parameter.
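
For illustration, a 95% confidence interval for a sample proportion, sketched in Python with invented numbers:

    import math

    # Hypothetical result: 120 of 200 sampled parents (60%) say they feel "very informed."
    p_hat, n = 0.60, 200
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = 1.96 * standard_error           # 1.96 corresponds to a 95% confidence level
    print(p_hat - margin, p_hat + margin)    # roughly .53 to .67 for these made-up figures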

confidence level: the probability that one risks making the wrong decision when one rejects the null
hypothesis (i.e., the null hypothesis is actually true). This is sometimes referred to as the alpha level and/or
type I error. This error standard is selected by the researcher prior to the study.

confirmability: a standard for evaluating the quality of a qualitative research study. A study is confirmable
to the extent that the researcher makes the interpretive process that he followed transparent. In other words,
the researcher shows how he arrived at his interpretations from the empirical data that was collected.

confirmation bias: the subconscious tendency that people have to ignore evidence that is inconsistent
with their preconceived ideas or hypothesis. In other words, we tend to only look for evidence that is
consistent with our hypothesis. For instance, confirmation bias is one reason that group stereotypes are
often quite resilient. Objectivist researchers use procedures such as testing the null hypothesis to force
researchers to examine all of the evidence that is relevant to a hypothesis or research question.

confirmatory research: stage of research that is particularly concerned with testing causal theories.
Confirmatory research typically uses research hypotheses. This is the intermediate research stage in the
evolution of research in a particular content area.

confounding variable: a variable that is related to both the independent and dependent variables. The
presence of a confounding variable makes it difficult to determine the causal sequence. Artifacts are
common confounding variables that researchers have discovered through a tedious process of trial and
error. Good research design controls for confounding variables.

connotative meaning: the unique shades of meaning that an object or concept develops for an individual
that are not shared with other members of a person's social group or society. Connotative meaning also
includes unique affective associations that a person makes with a concept.

construct: a concept developed for theoretical or measurement purposes. A construct can be measured
many different ways, so a construct is often contrasted with a variable that is linked to a particular
operational definition of a construct.

construct validity: sometimes referred to as theoretical validity. An operational definition has construct
validity to the extent that it relates to measurements of other constructs in ways predicted by one’s theory.
The measure relates to constructs the theory predicts it should be related to, and is unrelated to constructs
that the theory predicts it should be unrelated to.

content analysis: analyzing the properties of messages (visual or textual). This usually involves coding for
the presence or absence of qualities, or for coding given messages into categories.

content validity: the extent to which the operational definition is judged to cover the range of meanings or
dimensions included in a concept. Content validity is established when content experts agree that a
measure accurately and adequately indexes a particular domain.

Contingency table: a table that represents the relationships between variables in the form of percentage
distributions. A contingency table often accompanies a chi-square analysis of a relationship between two
or more nominal or ordinal level variables.

control group: serves as a comparison group with the experimental group or groups. This is sometimes
referred to as the no treatment condition (or the placebo condition in medical studies). The scores of the
control group on the dependent variable are compared with the experimental treatment groups to assess the
effects of the independent variable. The control group helps rule out time dependent artifacts such as
history, maturation, and regression to the mean.

convenience sample: a nonprobability sample where people are included in the sample based on their
availability (such as relying upon student volunteers to fill out a survey). A convenience sample is
sometimes called a haphazard sample. A convenience sample is of little use when one is trying to estimate
population parameters, but convenience samples are often used in studies investigating relationships
between variables.

correlation: covariation between two or more variables. There are several different kinds of correlation
patterns. A positive linear relationship is where two variables increase together at a constant rate. A
negative linear relationship indicates that one variable decreases at a constant rate as the other variable
increases (i.e., an inverse relationship). A nonlinear relationship is any systematic relationship between
variables that must be described by some mathematical function other than a straight line (e.g., a
curvilinear relationship).

correlation coefficient: a goodness of fit test that describes how well a particular estimate of a relationship
between two variables actually fits the data. While regression provides estimates of the value of the
predicted variable from a predictor variable, a correlation coefficient roughly describes how well the
predicted values fit with the actual values. Correlation coefficients come in various forms, depending upon
the levels of measurement for both of the variables. The Pearson Correlation coefficient is one of the most
widely used correlation coefficients. It is a goodness of fit test for linear models that are fit when the
related variables approximate interval or ratio level data.
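
To illustrate, a small Pearson correlation computed in Python with made-up data (the variables and the use of scipy are hypothetical):

    from scipy.stats import pearsonr

    # Hypothetical data: hours a parent volunteered and number of contacts with the school.
    hours    = [0, 1, 2, 2, 3, 4, 5, 6]
    contacts = [1, 1, 2, 3, 3, 4, 4, 6]
    r, p_value = pearsonr(hours, contacts)
    print(round(r, 2), round(p_value, 3))   # r near +1 indicates a strong positive linear relationship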

Cramer's V: a correlation coefficient that describes the strength of a relationship between two variables
measured at the nominal level. This correlation coefficient is derived from the chi-square and ranges from
0, which indicates no relationship between the two variables, to 1, which indicates the maximum possible
departure from the expected probabilities.
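
A brief sketch of the calculation in Python (the table is invented; scipy is used only for the chi-square step):

    import math
    from scipy.stats import chi2_contingency

    # Hypothetical 2x3 table of parent responses (rows) by grade level (columns).
    observed = [[20, 30, 10],
                [40, 20, 30]]
    chi2, p, dof, expected = chi2_contingency(observed)
    n = sum(sum(row) for row in observed)
    k = min(len(observed), len(observed[0]))      # the smaller of the number of rows and columns
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    print(round(cramers_v, 2))                    # about .30 for these made-up counts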

criterion validity: the extent to which a measure predicts some other variable of interest. Criterion validity
is sometimes referred to as “practical validity.” The two major types of criterion validity are concurrent
validity and predictive validity.

criterion variable: a variable that researchers wish to predict from the values of other known variables
using statistical procedures such as regression or multiple regression.

critical case sampling: a type of nonprobability sampling that is often used in qualitative research studies.
The researcher seeks out and analyzes instances that embody a given phenomenon in a dramatic way. For
instance, a historian may search for and analyze cases which seemed to be turning points in the
development or trajectory of a group or an institution.

critical research paradigm: investigates the production and reproduction of systems of domination and
power via communication. Critical research seeks to change communication practices and thereby achieve
greater social equality and social justice.

critical values table: a table that contains information about the sampling distributions of test statistics for
different sample sizes. The critical values table gives the value that a test statistic must equal or exceed in
order to reject the null hypothesis for a given level of statistical significance and degrees of freedom.

culture: the knowledge and practices of a social group that are taught to new group members.

data analysis phase: Phase of research where research data that has previously been collected is analyzed
and conclusions with regard to the research questions or hypotheses are drawn. The data analysis phase
will be relatively longer when qualitative research methods are employed compared to quantitative research
methods.

deductive approach: an approach to construct development in which the empirical indicators of a construct
are drawn from preexisting theoretical or constitutive definitions of a construct. See also inductive
approach.

demand characteristics: an artifact that can threaten cause/effect conclusions in a study. Mere observation
of the subject or the subject’s knowledge of the study hypothesis creates the observed effect. The
Hawthorne Effect is a famous example of a demand characteristic. Behavior is sometimes affected merely
by having someone observe it. A second well-known demand characteristic is the Placebo effect in medicine.
The mere fact that a person gets a fake pill or placebo often leads to an improvement in the patient's medical
condition. A third example of a demand characteristic is the Social Desirability Bias, where one behavior
or answer is considered to be socially prestigious or correct. The respondent answers or behaves in a way
that departs from his usual behavior. There is no single method for controlling for demand characteristics.
Ordinarily researchers attempt to minimize participants' knowledge of study hypotheses. Respondents
should also be assured that their answers will be anonymous or confidential. Blind procedures that conceal
the hypothesis of the study are also often used to control for demand characteristics in experiments.

denotative meaning: the core meaning of a concept or object that is widely shared with one's social group
or society. Denotative meanings can be codified and published in dictionaries.

dependent variable: the variable that the researcher wants to explain the variation of (i.e., how does the
independent variable affect the dependent variable). Sometimes referred to as the effect variable.

dependent variable specific results: study results that may not generalize to conceptually similar dependent
variables or to other operational definitions of the dependent variable. This threat to external validity
should be assessed with study replications that utilize somewhat different operational definitions of the
dependent variable.

depth interview: interview in which an interviewee talks in depth about his experiences and the meaning
those experiences have for him.

descriptive research question: a research question that inquires about how a variable is distributed in a
particular sample or population (e.g., What percentage of households in the Louisville metropolitan area have
broadband access to the Internet?).
descriptive statistics: statistics utilized to communicate information about the distribution of a variable or a
relationship between variables.

discussion section: section of a research report that assesses the significance of the research findings. This
section typically includes a discussion of study limitations as well as suggestions for future research.

dispersion: measures of the degree of variation of an attribute in a particular group. Common measures of
dispersion include the range, the variance, and the standard deviation.
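
For illustration, the three common dispersion measures computed with Python's standard library (the scores are invented):

    import statistics

    scores = [2, 4, 4, 4, 5, 5, 7, 9]
    value_range = max(scores) - min(scores)   # range = 7
    variance = statistics.variance(scores)    # sample variance
    std_dev = statistics.stdev(scores)        # sample standard deviation
    print(value_range, variance, std_dev)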

double-barreled question: a question that poses two questions at the same time. Double barreled questions
should be avoided in depth-interview protocols as well as in standardized surveys.

double blind study: a study in which neither the subjects nor the people collecting study data are aware
of the hypotheses of the study. A double blind study is an effective protection against the artifacts of
demand characteristics and experimenter bias.

double consciousness: an essential skill for participant observer researchers. Double consciousness refers
to a person’s ability to participate in a role while still insightfully observing and analyzing the unfolding
events in a scene.

ecological fallacy: drawing conclusions about individuals based solely on aggregate data. For instance, one
commits the ecological fallacy if one uses census data, finds that census districts with lower incomes have
higher crime rates, and concludes from this that individuals with lower incomes commit more crime.

ecological validity: the extent to which the situation in which the research is conducted is similar to the real
world settings the researcher would like to generalize to. Threats to ecological validity can be assessed
through replication studies in more realistic settings such as field experiments. Ecological validity becomes
of particular concern in the generalizing stage of research.

ethnography: the study of a culture of a group of people. The ethnographer usually intends to
systematically describe that culture to another audience unfamiliar with the culture. Ethnographers typically
spend a considerable amount of time “in the field.” Ethnography typically uses a variety of research
techniques, especially qualitative research techniques such as participant observation and depth-interviews.

evaluation research: a type of applied research in which one assesses the effectiveness of a program or
intervention according to predetermined criteria. For instance, a researcher might seek to determine if a
worker training program actually achieves the goals that the program was designed to achieve such as
placing trainees in the desired types of jobs.

expanded account entry: a field note entry that provides an extended descriptive account of what took
place in the scene of observation. (See also condensed account entry, personal journal entry and
provisional account entry).

experiment: a research study that investigates the effect of an independent variable on a dependent
variable. An independent variable is typically manipulated, and its effect on the dependent variable is
observed. A well designed experiment controls for artifacts.

experimenter bias: researcher expectations or hypotheses sometimes bias measurement and create effects
for the independent variable in a study. For instance, researchers often have a confirmation bias where they
pay more attention to data that are consistent with their hypotheses than to data that are inconsistent with
the hypothesis. Researchers often use blind procedures to protect against this artifact.

exploratory research: sometimes referred to as the discovery phase of research. In this stage researchers
typically seek to identify patterns or correlations among events without specifying causal mechanisms.
Exploratory research typically uses research questions rather than hypotheses. Qualitative research
methods are utilized more heavily in this research phase. (See also confirmatory research and generalizing
research).

external validity: a study has external validity to the extent that its findings can be generalized to other
groups of people, measurement conditions, times and situations. Unusual features of a study can limit the
degree to which one can make inferences beyond the present study. An analysis of external validity
identifies potential moderator variables that need to be checked in subsequent research studies.

Face validity: a judgment of content validity where one asks whether a measure appears to tap the full
range of a variable as conceptualized by the researcher.

facilitative causation: asserts that the presence of some prior condition X facilitates the subsequent
occurrence of effect Y (i.e., If X, then an increased probability of Y). This is the most common type of
cause in communication research. It concedes that there are multiple causes for an outcome (e.g.,
performance on an exam).

factorial experiment: an experiment with two or more independent variables or factors. Factorial designs
allow the researcher to test for interactions between independent variables in terms of how they affect the
dependent variable.

factual question: In survey research, a question about factual matters. The accuracy of an answer to a
factual question can be validated against other sources of information. See also subjective question.

false consciousness: the ideas that nondominant groups have about societal arrangements that have been
distorted by the ideology of dominant social groups. According to researchers in the critical paradigm, this
false consciousness is cultivated by the ideologies of dominant groups and serves to conceal the real
interests of subordinate societal groups. Critical researchers seek to pull back the veil of false
consciousness, and give subordinate groups a more objective and realistic understanding of their situation.
According to critical researchers, unmasking false consciousness is a first step toward making societal
arrangements more equitable.

focus group: a guided group discussion on a topic of interest to a client. Transcripts of the group
discussions are content analyzed. Focus groups are particularly popular in marketing research.

follow-up question: a question that explores new information provided by the interviewee in a depth-
interview.

forced-choice item: a question format in which the respondent is asked to choose between two statements
that have been paired to represent different sets of alternatives. For instance, measures of a person’s
conflict management style usually ask respondents to indicate which of two statements best characterizes how
they prefer to handle a conflict situation.

frequency distribution: tally of the number of times that particular values occur in a data set.

generalizing research: stage of research that is primarily concerned with testing whether existing theories
can be extended to new situations and contexts. Generalizing research is particularly concerned with
identifying the boundary conditions for theories. This involves a systematic search for moderator variables.
The generalizing research stage often includes research data that is collected in more naturalistic situations
than laboratory research (i.e., field experiments). External validity of findings is a primary concern of
research in the generalizing stage.

history: an artifact that threatens conclusions about causation in longitudinal research (i.e., involves
comparisons between a pretest and a posttest). History refers to any unplanned or external event in the
course of a longitudinal study that may affect the dependent variable in addition to the independent
variable. Adding a control group controls for history.

hypothesis: a prediction about the relationship between at least two variables that is subjected to a test.

hypothesis testing: statistical procedures that are utilized to determine whether a null hypothesis should be
retained or rejected.

hypothetical construct: a concept that researchers create in order to account for observable events. A
hypothetical construct is inferred from multiple empirical indicators and is sometimes called a latent
variable. For instance, a personality trait is inferred when a person detects an underlying consistency in a
set of behaviors that a person typically engages in.

icon: a specific person, event or thing that serves as a concrete pictorial representation of an abstract
concept or theme. Even a time-period such as the 1960's can come to be an icon (i.e., freedom or
debauchery depending upon your political loyalties). The icon is regarded as a prototype or a
personification of the category. Icons are usually charged with strong evaluative and moral judgments. A
symbolic analysis attempts to unpack the specialized meanings (connotative and denotative) of the icon for
the person or group that uses it.

independent variable specific results: study results that may not generalize to conceptually similar
independent variables or to other operational definitions of the independent variable. This threat to external
validity should be assessed via replication studies that use somewhat different operational definitions of the
independent variable.

ideology: any group’s prevailing ideas about the nature of the world and the moral principles that should
apply. Many analyses of ideology originate in critical research that tries to show that a prevailing ideology
is an incomplete representation of reality. According to critical researchers, ideologies function to
“naturalize” and legitimize particular social and power arrangements in society that benefit elite groups.
See also false consciousness.

independent variable: the researcher's candidate causal variable. The researcher may design (manipulate)
the independent variable and then record the effect that it has on the dependent variable.

idiographic explanation: a form of explanation in which the researcher seeks to identify the contextual
causes of a behavior or an event. Idiographic explanation seeks an in-depth explanation of a particular case.
See also nomothetic explanation.

inductive approach: an approach to construct development that searches for patterns among a myriad of
empirical indicators. Inductive approaches to construct development may use qualitative research methods
such as grounded theory or multivariate statistical procedures such as factor analysis to identify new
constructs. See also deductive approach.

inferential statistics: Statistical procedures that enable researchers to draw conclusions about populations
based on data from probability samples.

informed consent: an ethical standard for conducting research, where research participants must be
informed about all matters that would affect their willingness to participate in a study, including any
benefits and harms that might arise from the study. With this information, the respondent can make an
informed choice about participating or declining to participate in the study.

interaction: an experimental effect where the effect on the dependent variable depends upon which levels
of two or more independent variables are combined. At least one of the independent variables qualifies the
effect of a second independent variable (e.g., at least one variable acts as a moderator variable). When a
statistically significant interaction is present, the interaction must be interpreted before the effect of a single
independent variable is assessed.

interrater reliability: the degree of agreement among two or more observers rating the same object or
event. Intercoder reliability and interobserver reliability are equivalent terms for interrater reliability.
Common indices of interrater reliability include a) Percentage of agreement, b) Scott’s Pi, and c) Cohen’s
Kappa. Scott’s Pi and Cohen’s Kappa adjust for the high and low frequency categories.

internal consistency: a form of measurement reliability that indexes the degree of consistency between the
multiple indicators of a construct. For instance, if one is using a questionnaire to measure attitudes in a
particular domain, one should expect a fairly high degree of consistency in the answers a person gives to
similar questions in the questionnaire. Common indices of internal consistency are split half correlations
and Cronbach’s alpha. Cronbach’s alpha is often reported simply as alpha, using the Greek letter (e.g., α = .89).

internal validity: a study has internal validity if variations between study groups on dependent variables
can be attributed to the effects of the independent variable and not to some other background factor or study
procedure. A study has internal validity to the extent that the study controls for artifacts.

interpretive paradigm: a research tradition that focuses on meaning. It seeks to explicate the denotative,
connotative, and mythic levels of meaning of codes, symbol systems and narratives. It relies heavily on
qualitative research methods such as participant observation and depth interviews.

interrogation sequence: a series of closed-ended questions in a depth-interview protocol; such a sequence is
highly undesirable.

interval level measurement: a common metric or unit of measure in the measurement scheme. There are
equal intervals between each measurement point.

intervening variable: lies between the independent variable and dependent variable in time. The
independent variable influences the intervening variable, which in turn affects the dependent variable.

kurtosis: a measure of how peaked a distribution is. The normal curve has a kurtosis of 3. Peaked curves
will have a kurtosis value of greater than 3. Flat distributions will have a kurtosis that is significantly less
than 3. The kurtosis statistic can be used, in conjunction with a distribution’s skewness statistic, to
determine whether or not a distribution is approximately normal.
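
To illustrate, a quick check in Python (NumPy, scipy, and the simulated scores are purely illustrative):

    import numpy as np
    from scipy.stats import kurtosis

    rng = np.random.default_rng(0)
    scores = rng.normal(loc=50, scale=10, size=10_000)   # roughly normal, made-up test scores
    print(round(kurtosis(scores, fisher=False), 2))      # close to 3, as expected for a normal curve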

levels of measurement: there are four qualitatively different kinds of measurement that have been
identified. They form a hierarchy from the simplest form of measurement to the most complex: nominal
level, ordinal level, interval level and ratio level. The higher levels of measurement contain more
information in their measurement schemes.

Likert item: a popular question format where respondents indicate their level of agreement/disagreement
with a statement. Responses typically range on a continuum from Strongly Disagree through Strongly Agree.
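
As an illustration of how Likert responses are typically coded for analysis (the 1-to-5 coding shown is one common convention, and the answers are invented):

    # Hypothetical coding of an agreement item like question 4 of the parent survey.
    likert_codes = {
        "Strongly Disagree": 1,
        "Disagree": 2,
        "No Opinion": 3,
        "Agree": 4,
        "Strongly Agree": 5,
    }
    answers = ["Agree", "Strongly Agree", "No Opinion", "Agree"]
    coded = [likert_codes[a] for a in answers]   # [4, 5, 3, 4]
    print(sum(coded) / len(coded))               # mean agreement for these made-up answers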

linear regression: a statistical procedure that determines the best fitting line through a set of points using
one predictor variable and one output variable. The best fitting line is one that minimizes the average
squared deviations of actual output variable values from the expected values on the prediction line. See
also correlation and multiple regression.
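
A minimal sketch in Python with invented data (scipy's linregress is one of several ways to fit the line):

    from scipy.stats import linregress

    # Hypothetical data: minutes spent reading school communications per week
    # and a 1-10 rating of how informed the parent feels.
    minutes  = [0, 5, 10, 15, 20, 25, 30]
    informed = [2, 3, 5, 5, 7, 8, 9]
    fit = linregress(minutes, informed)
    print(fit.slope, fit.intercept, fit.rvalue ** 2)   # slope, intercept, and r-squared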

literature review: a section of a research report that reviews existing research that has been done on a
particular topic. The literature review shows how a study extends or expands on existing research.

main effect: an effect of an independent variable on the dependent variable when an interaction is not
present or has been accounted for.

main questions: planned questions that are developed in advance of a depth interview, as opposed
to probes and follow-up questions that are developed in the interview itself. Main questions are intended to
cover the topic and to give the interviewee an opportunity to talk in an open-ended way.

manifest construct: a construct that is easily read from surface indicators. In most instances, the sex of a
person is quite manifest. A manifest construct contrasts with a hypothetical construct which is constructed
and legitimated by the researcher.

manipulation check: a procedure employed in an experiment to determine if experimental conditions were
perceived in the way that the experimenter intended. For instance, a manipulation check of humor in a
commercial would substantiate that respondents who heard the "humorous" commercial thought it was
funny and that the people who heard the “serious” commercial thought that it was serious.

maturation: an artifact that threatens cause/effect conclusions in a longitudinal study. Maturation refers to
any changes in the dependent variable that normally occur with the passage of time. Maturation includes
factors such as boredom, fatigue, practice effects and human development. It is best controlled for by
adding a carefully selected control group to the study design.

mean: a measure of central tendency. The mean is often referred to as the average. It is calculated by
adding the values of all of the items in a distribution and dividing by the number of items in the distribution.
The mean is the preferred measure of central tendency in an unskewed distribution.

median: a measure of central tendency. The median of a set of numbers arranged in order of magnitude is
the middle value when an array has an odd number of values or the arithmetic mean of the two middle
values when there is an even number of values. The median is the preferred measure of central tendency
when a distribution is highly skewed.
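
A small illustration of why the median is preferred when a distribution is skewed (the income figures are invented, in thousands of dollars):

    incomes = [28, 30, 32, 35, 40, 41, 250]        # one extreme outlier
    mean = sum(incomes) / len(incomes)             # about 65.1, pulled upward by the outlier
    median = sorted(incomes)[len(incomes) // 2]    # 35, a more typical value for this group
    print(round(mean, 1), median)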

measurement: the assignment of categories or numbers to objects according to a set of rules.

measurement error: inaccuracies in measurement because of unreliable or invalid measurement.

measurement reliability: the consistency of a measure. This is reported in the form of a reliability statistic,
usually in the methods section of an article.

measurement validity: the assessment of whether an operational definition accurately measures a construct.
The three major types of measurement validity are content validity, construct validity and criterion validity.

meta-analysis: a statistical procedure used to combine findings from several studies. The studies must
employ similar operational definitions for the variables included in the analysis. Meta-analysis helps make
sense of contradictory findings that occur when individual studies have small sample sizes and thus have
limited statistical power.

Maximum variation sampling: a type of purposive sample that is widely used in qualitative research
studies. It is a nonprobability sample in which a researcher seeks out people or cases that are known to
vary considerably in their perspective or experiences in a domain of interest.

methods section: section of the research report that explains the procedures that were employed in
collecting research data. This section describes how variables were manipulated or operationalized and
provides information on measurement reliability for study variables.

mode: the most frequent value in a frequency distribution.

moderator variable: qualifies the effect of an independent variable on a dependent variable (creates an
interaction). The joint effects of two variables are significantly different than the effects of each of the two
variables considered alone (The whole is different than the sum of the parts). Potential moderator variables
may be included in a study to test the generalizability or external validity of study findings.

multidimensional construct: a construct that consists of two or more semi-independent elements or
dimensions (e.g., credibility is conceived to consist of the elements of trustworthiness, competence and
dynamism).

multiple regression: a statistical technique that uses multiple variables to predict the values of a single
output variable. Multiple regression is an extension of simple linear regression.
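
A brief sketch of the idea using NumPy's least-squares solver (the predictors and scores are invented; dedicated statistical packages report the same coefficients along with significance tests):

    import numpy as np

    # Hypothetical predictors: newsletters read (X1) and meetings attended (X2),
    # predicting a parent's informedness score (y). The column of 1s fits the intercept.
    X = np.array([[1, 2, 0],
                  [1, 5, 1],
                  [1, 8, 1],
                  [1, 3, 2],
                  [1, 9, 3]], dtype=float)
    y = np.array([3, 5, 6, 5, 9], dtype=float)
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(coefficients)    # intercept followed by one weight per predictor
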
multiple stage sampling: employed when a researcher doesn’t have a complete list or sampling frame of all
of the elements of the population. In cluster sampling, one randomly samples clusters (e.g., voting
precincts) in stage one, and then samples randomly within each selected cluster. Cluster sampling
ordinarily entails more random error than the other probability sampling procedures.

multivariate statistics: statistical analyses where multiple dependent variables are analyzed as a common
set.

myth: a story about the origins of a particular group or institution that is frequently retold by group
members to convey something important about the culture of a group. A myth is a symbolic element that is
frequently explored in an interpretive study of a group’s culture.

narrative: a factual account concerning events that transpired in a particular setting that provides few
inferences or conclusions about the meaning of events. See also story.

necessary causation: the second strongest form of causal reasoning. It asserts that some prior condition X
is necessary for an effect Y to occur (Y never occurs without X having occurred before it). This kind of
causation recognizes that there are cases where X alone may not be sufficient to cause Y. This usually is
the strongest form of causal reasoning that communication researchers invoke.

negative case analysis: a part of the process of analytic induction-a qualitative research method. The
researcher seeks out and analyzes cases that do not seem to fit with his working hypothesis. A
careful analysis of such cases serves to bring needed adaptations to the working hypothesis or to identify
boundary conditions where the hypothesis does not apply.

nominal level measurement: assigns items to categories with different labels. There is no underlying
dimension in the category system. Nominal level measurement is the most basic level of measurement in
that it provides less information than the other levels of measurement.

nomothetic explanation: seeks to identify a few causal agents that widely impact conditions or events.
Nomothetic explanation seeks conclusions that generalize across many events and contexts. See also
idiographic explanation.

nonparametric statistic: statistical tests of significance that do not involve assumptions about how a
variable is distributed (e.g., does not assume that the variable is normally distributed). Nonparametric
statistics are usually used with variables that are measured at the nominal or ordinal level. The chi-square
is an example of a nonparametric statistic.

nonprobability sample: often referred to as nonscientific samples. This means that members of the target
population do not have a known probability of being selected in the sampling procedure. Nonprobability
samples do not enable a person to estimate sampling error. Examples of nonprobability samples include
convenience samples, purposive samples, and quota samples.
normal curve: a symmetrical bell-shaped curve that possesses specific mathematical characteristics. Many
real world populations have attributes that are approximately normally distributed. Most importantly,
sampling error is normally distributed. Knowledge of this useful property is utilized in hypothesis testing.

null hypothesis: a prediction that a particular relationship between two variables does not exist. The null
hypothesis that accompanies a variable analytic research question states that no relationship of any kind
exists between two variables. In contrast, the null hypothesis that accompanies a variable-analytic research
hypothesis states that the particular relationship predicted by the research hypothesis does not exist. In
objectivist research, the researcher tries to discredit or falsify the null hypothesis.

objectivist paradigm: the philosophical paradigm that informs research that focuses on developing general
theories of phenomena. It emphasizes the importance of measurement methods and seeks to identify
common patterns of cause and effect in human communication.

observer as participant: a role in participant observation in which a researcher plays a relatively minor role
in the scene of observation. The participation demands of the role are such that the researcher can devote
most of his time and effort to observation. (See also participant as observer, complete participant and
complete observer).

one-tailed test: a test of statistical significance used for directional hypotheses. It tests for a particular type
of relationship. A one-tailed test is considerably more sensitive in its ability to detect true relationships
(i.e., reject the null hypothesis) than a two-tailed test.

open-ended question: a question in which the respondent has latitude to define the nature and extent of an
answer (e.g., Why did you decide to become a teacher?).

operational definition: defining a concept by reference to the concrete details of data collection and data-
processing. The operational definition specifies what information was collected and how it was processed
to produce the data utilized in the study.

ordinal level measurement: assigns ranks to objects. The items are ordered along an underlying dimension
(e.g., popularity of songs). However, the distance between each rank is not specified.

outlier: an extreme value in a distribution that skews or inflates statistics such as the mean or standard
deviation. Outliers can distort correlation statistics that describe the relationship between two variables.

parametric statistic: a parametric statistic is a test of statistical significance that makes assumptions about
the distribution of the variable from which a study sample was drawn. Some common assumptions are that
the variables are normally distributed or that groups being compared have equal variances. If these
conditions are not met, then the confidence that one can have regarding Type I error in a study may be
undermined. Examples of parametric statistics are t-tests, analysis of variance, correlation coefficients and
multiple regression.

parsimony: a standard for judging the quality of a hypothesis or theory in objectivist research. A
parsimonious explanation is economical; it explains more than competing explanations do while using fewer
concepts. Objectivist researchers prefer that their hypotheses and theories be as parsimonious as possible.

partial correlation: a statistical analysis in which the statistical relationship between two variables is
calculated while statistically controlling for the influence of one or more background variables. Partial
correlation is a preferred method for testing for the artifact of selection in survey research.

participant as observer: a role in participant observation, where the researcher plays an important or
central role in the scene of observation. The researcher must devote considerable time and effort to the role
requirements. In this situation, the ability of a researcher to achieve a double consciousness is essential if
the account of events is to be informative.

participant observation: a research method where the researcher plays the dual roles of observer and
participant, usually in a naturally occurring situation. Participant observers record their observations in the
form of field notes. Participant observation is a primary research method of ethnographers.

Pearson correlation coefficient: a goodness of fit statistic that indicates how well a particular linear
regression matches the data when one has variables measured at the interval or ratio levels of measurement.
This correlation coefficient provides information on the direction of the relationship (positive or negative
slope) as well as the strength of the relationship.
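
A brief sketch, assuming Python with the scipy package and hypothetical data on television viewing and exam performance:

    from scipy import stats

    tv_hours = [1, 3, 5, 2, 8, 4]
    exam     = [92, 85, 70, 88, 60, 78]

    r, p = stats.pearsonr(tv_hours, exam)
    print(r)   # the sign gives the direction of the relationship; the magnitude gives its strength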

peer review: the process by which a research report is evaluated by experts in a field of study before it is published.

peer reviewed journal: a scholarly journal that uses blind peer review procedures to evaluate and select
articles that will appear in the journal.

personal journal entry: an account in field notes that records how a researcher responded emotionally to
observed events. These are recorded separately from other field note entries. See also expanded account
entry and provisional account entry.

population: a group of objects, subjects or units that has attributes one wants to describe.

population parameter: a numerical attribute of a population. A population parameter is usually unknown.
Researchers ordinarily use a sample statistic to estimate a population parameter.

post-hoc fallacy: attributing causality after the fact when the results were not initially predicted. People
tend to be able to construct plausible explanations in hindsight for results they did not initially anticipate.

practical significance: the common sense assessment of the importance of statistically significant results.
When one has a large sample, test statistics of relatively small magnitude may be statistically significant yet
rather trivial according to standards of practical significance.

prediction error: The degree to which actual values of an output variable differ from the predicted values
of that variable when the values of one or more predictor variables are known.

predictive validity: a form of measurement validity. It involves an assessment of whether a measure
predicts or is significantly related to a selected criterion variable.

probability sample: a sample in which each member of a population has a known probability of being
selected into the sample. Examples of probability samples include simple random sampling,
cluster sampling, and stratified sampling.

probe: a question or prompt in a depth interview that encourages the interviewee to keep talking, provide
clarification, or provide additional details. Probes give the interviewee instructions as to the attributes that
the interviewer desires a well-formed answer to have (i.e., depth, detail, vividness and nuance). Probes are especially
useful in the initial stages of a depth interview.

procedure reliability and validity: an artifact that threatens cause/effect conclusions in a study due to
inconsistent administration of the study or faulty design. A lack of procedure reliability and validity
originates in any inconsistency in how the study procedures were implemented between groups or within
groups. Next to selection, procedure reliability and validity is the most common and most insidious
research artifact. Researchers attempt to control for procedure validity by developing independent variables
that have content validity and by using manipulation checks to assure that experimental conditions were
experienced as they were intended. Procedure reliability is accomplished by ensuring that exactly the same
procedures are employed for each study administration. This ensures that the only differences between
groups are the ones designed by the researcher.

proportional reduction of prediction error: degree to which a statistical prediction procedure diminishes
the level of uncertainty in estimating the values of an output variable. In simple linear regression and
correlation, the coefficient of determination provides the proportional reduction of prediction error. Similar
indices can be calculated for other statistical prediction procedures.
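
A small sketch, assuming Python with scipy and the hypothetical viewing/exam variables used above, showing the coefficient of determination as the squared Pearson correlation:

    from scipy import stats

    x = [1, 3, 5, 2, 8, 4]
    y = [92, 85, 70, 88, 60, 78]

    r, p = stats.pearsonr(x, y)
    r_squared = r ** 2   # proportional reduction of prediction error
    print(r_squared)     # e.g., 0.80 would mean 80% less error than predicting the mean of y for every case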

proprietary research: research that is the property of the business or group that commissioned the research.
It is available to the public only if the sponsoring organization releases it.

provisional interpretation entry: an entry in field notes that gives the participant observer’s unfolding
initial interpretation of events that are described in the expanded account entry. This entry provides a
record of how the researcher’s understandings developed and changed as the researcher became
acculturated to the group and the scene.

purposive sample: selecting a known group of people to sample from, because they have characteristics
that one is interested in studying (e.g., sampling members of Greenpeace to investigate attitudes of
environmental activists).

qualitative research methods: research methods that produce data that are primarily non-numerical in form.
Examples of qualitative research methods include depth interviewing, discourse analysis and other forms
of data collection that primarily focus on the analysis of language and symbolic elements. Qualitative
research methods are particularly compatible with the interpretive research paradigm.

quantitative research methods: research methods that structure data collection and analysis in numerical
terms. Quantitative research methods are particularly compatible with the objectivist research paradigm.

quota sample: setting up quotas for the proportion of people to be selected in certain demographic
segments of the target population. If the overall population is 53% female and 47% male, then the final
overall sample will reflect these percentages. Quota sampling involves less systematic error than
haphazard sampling. Quota sampling is widely used in survey research.

random assignment to group: used in experiments to control for the artifact of selection. Random
assignment means that persons who participate in the study are randomly assigned to the experimental
groups. This experimental procedure is not to be confused with random sampling which deals with how a
sample is drawn for a study.

random digit dialing: a sample of telephone numbers drawn at random. Random digit dialing is usually a
variation on simple random sampling. Random digit dialing excludes people who do not have telephones.

random sampling: a sample that is drawn from a population in which each member of the population has a
known probability of being selected. Random selection reduces systematic sampling error. This sampling
procedure is not to be confused with the experimental procedure of random assignment to group.

random sampling error: random fluctuations that occur from one sample to the next. Random sampling
error can be represented by a sampling distribution. Increasing sample size reduces random error if one has
a representative sample. Because it can be estimated, random error is called a “good” type of error.

range: a measure of the dispersion of values in a distribution. The range is the highest value minus the
lowest value in a distribution. The size of the range is highly correlated with the number of items or events
in the distribution. Therefore, statisticians tend to favor the standard deviation as a better indicator of true
variation in a distribution.

rank order item: a question format that asks respondents to rank order a range of alternatives on some
underlying dimension, with a rank of 1 given to the item that is highest on that dimension (e.g., a rank of 1
for the most popular song). Rank order items represent ordinal level measurement.

ratio level measurement: enables comparisons of proportions or ratios between two measurements because
the measurement scheme has a natural zero point as well as equal interval units of measure.

refine conceptual structure phase: stage of research where preexisting theories and models are refined and
further developed in the light of recent research findings.

regression to the mean: an artifact that threatens cause/effect conclusions in longitudinal studies. It refers
to the tendency for very high or low scores in one time period to move back toward a mean over time. In
longitudinal studies, one might expect changes over time if the groups one is measuring have had unusual
performances. A control group helps control for this artifact.

replication: conducting a study that duplicates some aspect of a previous study. Replication is important in
the research process because it allows researchers to confirm the results of earlier research studies and to assess
the external validity of research findings.

research abstract: summarizes study design and findings in a short paragraph. The research abstract is
often included in research databases such as Communication Abstracts.

research design phase: phase of research where researchers design procedures to collect and process data
to answer prospective research questions or hypotheses.

research hypothesis: a prediction that the researcher accepts as being the best description of empirical
events, after the null hypothesis has been rejected.

research implementation phase: phase of research where researchers implement data collection and
processing procedures.

research paradigm: a set of assumptions and research objectives that inform a particular study. These
typically include assumptions about the nature of the world (i.e., ontology), assumptions about people
(philosophical anthropology), assumptions about the nature of knowledge (i.e., epistemology), and
assumptions about research objectives. Communication research has three broad research paradigms or
research traditions (See also objectivist research paradigm, interpretive research paradigm and the
critical research paradigm).

research question: question that asks about the presence or absence of a quality or inquires about possible
relationships between constructs in a particular study. In objectivist research, research questions do not
specify the type of relationship in advance. Therefore, they use two-tailed statistical significance tests.

results section: section of a research report that summarizes the findings with regard to the study’s research
questions or hypotheses.

rhetoric: the art of identifying and selecting appropriate means of persuasion. Aristotle described rhetoric
as “the faculty of discovering in the particular case, what are the available means of persuasion.”

rhetorical criticism: evaluating the degree to which a particular instance of strategic communication
measures up to standards of communication excellence.

sample: a group that is drawn from a larger population. Sample statistics are often used to estimate
population parameters. See also probability sampling.

sample specific results: study findings that may not replicate with other samples because of peculiarities in
the sample employed in the study. For instance, most early research on cardiac disease almost exclusively
utilized male subjects. Subsequently researchers questioned whether these findings about cardiac disease
could be generalized to women. Replication is needed to test for the likelihood of sample specific results.

sample statistic: a number that describes some attribute of a sample.

sampling distribution: the distribution of a sample statistic that would occur if all possible samples of a
given size were drawn from a population. Sampling distributions estimate random sampling error.
Sampling distributions are derived by statisticians. Knowledge of sampling distributions allows researchers
to estimate the probable error in estimating a population parameter and in testing hypotheses.

sampling frame: the list of population elements from which a sample is drawn.

saturation point: this is the point in a depth interview study where new respondents voice the very same
symbolic elements such as concepts and themes that previous respondents have used. The interviewing
process can be terminated when this point is reached.

Scott’s Pi: a measure of intercoder reliability that compensates for the rate of agreement that is expected by
chance alone. It adjusts for the most frequently used categories in a coding scheme.
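
A rough sketch of the calculation in plain Python, using two hypothetical coders who each classified the same ten units:

    from collections import Counter

    coder_a = ['pos', 'pos', 'neg', 'pos', 'neu', 'neg', 'pos', 'neg', 'pos', 'neu']
    coder_b = ['pos', 'neg', 'neg', 'pos', 'neu', 'neg', 'pos', 'pos', 'pos', 'neu']

    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # expected agreement is based on the pooled proportion of each category across both coders
    pooled = Counter(coder_a) + Counter(coder_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())

    scotts_pi = (observed - expected) / (1 - expected)
    print(scotts_pi)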

selection: an artifact that threatens cause/effect conclusions in a study when different groups are compared.
Selection refers to any background differences between study groups that create an artificial correlation
between the independent variable and the dependent variable. Researchers must assume that groups being
compared are equivalent in all important respects. Selection is sometimes called "self-selection," meaning
that the groups being compared are formed by behaviors that people selected for themselves. Hence, these groups may
differ in personality, habits and demographic characteristics. Random
assignment to group is an effective experimental procedure for controlling for selection. However,
statistical controls are required when random assignment to group is not feasible. Selection may be the
most widespread research artifact.

semantic differential: a question rating scale in which an attitude object is rated on numbers that are placed
between two bipolar adjectives. The semantic differential represents interval level measurement. It was
specifically developed to measure the meaning dimensions of evaluation, potency and activity.

simple random sampling: using a random numbers table or some other procedure to assure that people or
elements are sampled at random from the sampling frame of the population.
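
A tiny sketch of the idea, assuming Python's standard library and a hypothetical frame of 100 ID numbers:

    import random

    sampling_frame = list(range(1, 101))       # hypothetical sampling frame of 100 elements
    sample = random.sample(sampling_frame, 5)  # every element has an equal chance of selection
    print(sample)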

single factor experiment: an experiment with one independent variable.

skewness: a measure of the degree of asymmetry in a distribution. A skewed distribution ordinarily has at
least one outlier in it. A distribution where the mean is greater than the median is a positively skewed
distribution. A distribution where the mean is less than the median is a negatively skewed distribution.

Spearman's rho: a correlation coefficient that describes how strongly two sets of ranked variables are
related to each other.

standard deviation: the most frequently used indicator of variation in a distribution. To calculate the
standard deviation, one subtracts the mean from each score in the distribution, squares these differences,
and adds them together. Dividing this sum by the number of items in the distribution gives the
distribution's variance. The standard deviation is the square root of the variance.
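
A small worked sketch of these steps in plain Python, using a hypothetical distribution of five scores:

    scores = [4, 8, 6, 5, 7]                      # hypothetical distribution

    mean = sum(scores) / len(scores)
    squared_diffs = [(x - mean) ** 2 for x in scores]
    variance = sum(squared_diffs) / len(scores)   # average squared distance from the mean
    std_dev = variance ** 0.5                     # square root of the variance
    print(variance, std_dev)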

standard error: the standard error is the standard deviation of a sampling distribution for a given statistic
(e.g., the Standard Error of the Mean is the standard deviation of a sampling distribution of means). As
sample size gets larger, the standard error of a sampling distribution becomes smaller.
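
A quick numerical illustration (plain Python, hypothetical numbers) of how the estimated standard error of the mean shrinks as sample size grows:

    sd = 15                         # hypothetical sample standard deviation
    for n in (25, 100, 400):
        print(n, sd / n ** 0.5)     # prints 3.0, 1.5, and 0.75 respectively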

standardized score: gives the distance of a score from the distribution's mean in standard deviation units.
Scores greater than the mean have a positive standard score and scores less than the mean have a negative
standard score. Scores located at the distribution mean have a value of zero. It is sometimes referred to as
the z-score.
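
A brief sketch in plain Python, converting the hypothetical scores used above into standardized scores:

    scores = [4, 8, 6, 5, 7]
    mean = sum(scores) / len(scores)
    sd = (sum((x - mean) ** 2 for x in scores) / len(scores)) ** 0.5

    z_scores = [(x - mean) / sd for x in scores]  # distance from the mean in standard deviation units
    print(z_scores)                               # scores above the mean are positive, below are negative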

statistical power: the probability of rejecting a null hypothesis that is false when the effect size,
significance level and sample size are specified. Statistical power is equal to 1 minus the probability of Type
II error. Good research design adjusts the levels of Type I error (i.e., statistical significance level) and
sample size to balance the risks of Type I and Type II error.
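
A hedged sketch, assuming Python with the statsmodels package, of how power, effect size, sample size, and the significance level trade off for a two-group t-test:

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Power with 50 people per group, a medium effect size, and alpha = .05
    power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=0.05)
    print(power)       # probability of correctly rejecting a false null hypothesis

    # Sample size per group needed to reach power of .80 under the same assumptions
    n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)
    print(n_needed)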

statistical significance level: a statistical standard one uses to test a null hypothesis. If the actual result is
less probable than the statistical significance level (e.g., the results would occur less than five times out of a
hundred due to chance alone), the null hypothesis is rejected and the results are “statistically significant.”
Different statistical significance levels can be used depending upon the purposes of the researcher. The
statistical significance level is synonymous with type I error.

story: a narrative that is related to make a point, often a moral point. Stories are told to illustrate a moral
principle or make a particular point. Stories highlight some facts in a scene and ignore others in
constructing the significance of the event. Stories are often analyzed as symbolic elements in qualitative
research studies.

stratified random sampling: a random sample that draws from subpopulations within the overall sampling
frame. For instance, one might conduct a random sample of males and a random sample of females.
Stratified sampling can produce samples with less sampling error than simple random sampling.

subjective question: a question in survey research that inquires about the subjective perceptions or
experience of an individual. The accuracy of one’s answer to a subjective question cannot be validated
against an external source. See also factual question.

sufficient causation: the strongest form of causal reasoning. If some prior condition X exists, then some
following condition Y always occurs. This kind of causation is seldom if ever successfully invoked in
explaining communication or human behavior.

survey method: asking questions about a person’s behaviors, beliefs or attitudes and treating the answers as
verifiable data.

symbolic analysis: an analysis of the unique connotative meanings of symbols such as concepts and themes
for a person or the unique denotative meanings shared by a social group or subculture.

systematic error: error that arises because some portions of a population are overrepresented in the sample
and other portions are underrepresented. Increased sample size makes systematic sampling error worse, not
better. A probability sample reduces the likelihood of systematic sampling error.

systematic random sampling: a probability sampling procedure where every nth item in a sampling frame
is selected after a random start (e.g., After a random number between 1 and 10 is drawn to select the first
item, every 10th person in the sampling frame is selected—5th, 15th, 25th.).
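
A short sketch of the procedure, assuming Python's standard library and a hypothetical frame of 100 elements:

    import random

    sampling_frame = list(range(1, 101))     # hypothetical sampling frame
    interval = 10
    start = random.randint(1, interval)      # random start between 1 and 10
    sample = sampling_frame[start - 1::interval]
    print(sample)                            # e.g., a start of 5 selects the 5th, 15th, 25th, ... elements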

systems causation: a nonlinear form of causation. It takes into account multiple factors interacting
simultaneously. It is difficult to identify independent variables and dependent variables because the
variables are linked via feedback loops. Systems causation is the most complex form of causal reasoning
researchers use. It is most often used in causal modeling done with computers.

test-retest reliability: the degree of consistency between measurements of a construct at different points in
time. Trait variables should show consistency over time. This type of reliability is often reported simply as
a Pearson correlation coefficient of time 1 and time 2 scores.

testing sensitization: an artifact that can threaten cause/effect conclusions in a longitudinal study. Testing
sensitization occurs when a person receives a pretest which in turn affects the person's score on a posttest.
Testing sensitization is a special case of the artifact of maturation. Testing sensitization can be controlled
for by adding a post-test only group to the experimental design.

theme: a statement that makes an assertion about reality. A theme may explain why something happened,
describe how a person should behave, or express a fact of life. A theme involves evaluative and moral
judgments. A symbolic analysis identifies important themes that crystallize a person or group's worldview.
The analysis explains the full meaning of the theme for the person or group.

time specific results: study findings that may not generalize to other time periods, or in the case of a
longitudinal study, to other studies that involve different time periods. Unusual events that occur
during the time of a study increase the likelihood of time specific results. These threats to external validity
should be assessed through replication.

trait variable: an enduring attribute of an object or person. Trait variables can be contrasted with state
variables which are very changeable. Measures of trait variables should demonstrate test-retest reliability
over appropriate periods of time.

transferability: a standard that is applied to judging the quality of a qualitative research study. A study is
judged to be transferable to the extent that the researcher provides detailed information so that the reader
can assess how study findings may apply to other contexts and populations.

trend study: a longitudinal study in which one or more characteristics of a population are monitored over
time. For instance, a researcher could follow the degree to which a new technology is adopted by members of a
target group or population.

triangulation: using several measurement methods to investigate a research question or hypothesis often
using a combination of qualitative and quantitative methods. The data that are collected via different
methods are explicitly compared. For instance, a researcher may compare information from closed-ended
surveys with data from depth interviews that use a lot of open-ended questions. If the results between the
two methods agree or triangulate, it increases the researchers' confidence that the results are not simply
artifacts of the kind of research method that is used (i.e., method specific results).

t-test: a parametric inferential statistic that investigates differences between two groups on a comparison
variable that is measured at the interval or ratio level.

two-tailed test: a test of statistical significance used in testing the null hypothesis associated with a research
question. It tests for whether any discernable relationship exists between two variables. For example, one
might test whether either a positive or a negative correlation (i.e., different from zero) exists between two
variables.

type I error: the error that a researcher risks when she rejects the null hypothesis in hypothesis testing (i.e.,
the probability that the rejected null hypothesis is true). Type I error is equal to the significance level
selected by the researcher.

type II error: the error one risks when the null hypothesis is retained (i.e., the probability that the null
hypothesis one has failed to reject is false). The probability of Type II error is affected by sample size, the level
of Type I error selected, effect size, and measurement error. A researcher commits Type II error when
she fails to reject a null hypothesis that is false.

unidimensional construct: a construct or variable that is relatively homogeneous. It has only one
underlying dimension. One-dimensional constructs are relatively less complex than multidimensional
constructs.

unitizing: segmenting sequential data into units for purpose of analysis. For instance, a researcher might
break down a transcript of a debate into the separate arguments that make up the debate. The researcher
could then code each argument identified in the coding process into a relevant category.

variable: any construct that is operationally defined and can be measured (i.e., can take on two or more
values).

variable analytic research question: a research question that asks whether any sort of relationship exists
between two variables (e.g., Do men and women differ in their levels of self-disclosure?).

variance: a measure of variation in a distribution. It is the average squared distance of items from the mean
of the distribution. The variance is more often used in inferential statistics than in descriptive statistics
because its units of measure are cumbersome. However, the square root of the variance provides a more
convenient measure of variation: the standard deviation.

ways of knowing: strategies that people employ to resolve disagreements concerning matters of fact. See
also appeal to authority, appeal to empirical research, appeal to intuition, appeal to personal experience and
appeal to tradition.

z-score: see standardized score.


Subject Index
—A—
academic research, 7, 196
account, 50, 196
appeal to authority, 5, 196
appeal to empirical inquiry, 6, 196
appeal to intuition, 5, 196
appeal to personal experience, 6, 196
appeal to tradition, 5, 196
applied research, 7, 196
artifact, 148, 153, 196
attribute, 196
—B—
basic research, 7, 196
bibliography, 12, 196
blind procedure, 196
boundary conditions, 58, 64, 67, 160, 197
—C—
candidate-answer question, 197
causal modeling, 111, 197
cause, 197
central tendency, 197
Chronbach’s alpha, 79, 197
closed-ended question, 197
Cluster sampling. See Multiple stage sampling
coefficient of determination, 108, 197
coefficient of nondetermination, 108, 197
Cohen’s Kappa, 79, 197
complete observer, 21, 198
complete observer role, 22
complete participant, 22, 198
concept, 48, 197
conceptual definition, 71, 198
concurrent validity, 86, 98, 198
condensed account entry, 24, 198
confidence interval, 143, 176, 198
confidence level, 143, 176, 198
confirmation bias, 55, 198
confirmatory research, 67, 137, 198
confounding variable, 198
connotative meaning, 198
construct, 199
construct validity, 86, 91, 199
content analysis, 82, 199
content validity, 86, 90, 199
control group, 167, 199
convenience sample, 178, 199
correlation coefficient, 107, 199
Cramer's V, 108, 199
criterion validity, 86, 200
criterion variable, 200
critical case sampling, 200
critical paradigm, 16
critical research paradigm, 200
critical values table, 122
culture, 16, 200
—D—
data analysis phase, 65, 200
deductive approach, 71, 200
demand characteristics, 153, 168, 200
denotative meaning, 200
dependent variable, 58, 201
dependent variable specific results, 201
depth interviews, 201
descriptive research question, 54, 201
descriptive statistics, 201
discussion section, 201
dispersion, 201
double blind study, 168, 201
double consciousness, 22, 201
double-barreled question, 201
—E—
ecological fallacy, 201
ecological validity, 201
ethnography, 21, 201
evaluation research, 7
expanded account entry, 24, 202
experiment, 202
experimenter bias, 153, 168, 202
exploratory research, 31, 54, 67, 91, 136, 137, 202
external validity, 148, 165, 202
—F—
faciliative causation, 149, 202
factorial design experiment, 202
factual question, 185, 202
false consciousness, 16, 202
focus group, 31, 203
follow-up question, 29, 203
forced-choice item, 183, 203
frequency distribution, 94, 203
—G—
generalizing research, 67, 203
—H—
history, 154, 168, 203
hypothesis, 203
hypothesis testing, 203
hypothetical construct, 71, 203
—I—
icon, 49, 203
ideology, 16, 203
idiographic explanation, 204
independent variable, 58, 204
independent variable specific results, 203
inductive approach, 71, 204
inferential statistics, 95, 146, 204
informed consent, 30, 204
interaction, 169, 204
intercoder reliability, 90
internal consistency, 79, 90, 204
internal validity, 148, 153, 165, 204
interpretive paradigm, 15, 204
interrater reliability, 70, 78, 79, 204
interrogation sequence, 205
interval level measurement, 73, 205
intervening variable, 58, 205
—K—
kurtosis, 103, 205
—L—
levels of measurement, 205
Likert item, 181, 205
linear regression, 107, 205
literature review, 9, 11, 54, 65, 205
—M—
main effect, 205
main questions, 28, 205
manifest construct, 71, 205
manipulation check, 167, 205
maturation, 154, 168, 205
Maximum variation sampling, 206
mean, 94, 154, 206
measure of central tendency, 94
measurement, 70, 206
measurement error, 15, 16, 25, 64, 140, 175, 179, 206
measurement reliability, 11, 79, 89, 90, 206
measurement validity, 86, 206
median, 94, 96, 206
meta-analysis, 137, 206
methods section, 11, 12, 206
mode, 94, 96, 206
moderator variable, 58, 206
multidimensional construct, 206
multiple regression, 111, 206
multiple stage sampling, 207
multistage sampling, 180
multivariate statistics, 112, 207
myth, 50, 207
—N—
narrative, 207
necessary causation, 149, 207
negative case analysis, 207
nominal level measurement, 73, 207
nomothetic explanation, 207
nonparametric statistic, 207
nonprobability sample, 207
normal curve, 101, 104, 207
null hypothesis, 54, 55, 63, 207
—O—
objectivist paradigm, 15, 208
observer as participant, 22, 208
one-tailed test, 208
open-ended question, 208
operational definition, 208
ordinal level measurement, 73, 208
outlier, 95, 118, 208
—P—
parametric statistic, 208
parsimony, 53, 208
partial correlation, 111, 208
participant as observer, 22, 208
participant observation, 16, 21, 209
Pearson correlation coefficient, 108, 209
peer review, 13, 209
peer reviewed journal, 9, 13, 209
personal journal entry, 24, 209
population, 209
population parameter, 142, 143, 175, 198, 209, 212
post-hoc fallacy, 55, 112, 209
practical significance, 209
prediction error, 110, 209
predictive validity, 86, 91, 209
probability sample, 179, 209
probe, 28, 209
procedure reliability and validity, 154, 168, 209
proportional reduction of prediction error, 110, 210
proprietary research, 7, 210
provisional interpretation entry, 24, 210
purposive sample, 179, 210
—Q—
qualitative research methods, 16, 20, 67, 200, 204, 210
quantitative research methods, 15, 210
quota sample, 178, 210
—R—
random assignment to group, 155, 168, 210
random digit dialing, 180, 210
random sample, 210
random sampling error, 176, 210
range, 96, 211
rank order item, 211
rank-order questions, 181
ratio level measurement, 73, 211
refine conceptual structure phase, 65, 211
regression to the mean, 145, 168, 211
replication, 63, 65, 211
research abstract, 11, 12, 211
research design phase, 65, 211
research hypothesis, 53, 54, 211
research implementation phase, 65, 211
research paradigm, 211
research question, 53, 211
results section, 11, 212
rhetoric, 212
rhetorical criticism, 43, 212
—S—
sample, 212
sample specific results, 212
sample statistic, 175, 176, 212
sampling distribution, 121, 176, 212
sampling frame, 212
saturation point, 40, 212
Scott’s Pi, 79, 212
selection, 155, 212
semantic differential, 74, 182, 212
simple random sample, 179, 212
single factor experiment, 213
skewness, 95, 103, 213
Spearman's rho, 108, 213
standard deviation, 96, 101, 144, 213
standard error, 213
standardized score, 96, 101, 104, 213
statistical power, 136, 213
statistical significance level, 136, 213
story, 213
stratified random sample, 213
stratified sample, 179
subjective questions, 185, 213
sufficient causation, 149, 213
survey method, 214
symbolic analysis, 214
systematic error, 140, 214
systematic random sample, 214
systematic sample, 179
systematic sampling error, 143, 175, 178, 179, 210
systems causation, 149, 214
—T—
testing sensitization, 154, 168, 214
test-retest reliability, 214
theme, 49, 214
time specific results, 214
trait variable, 214
transferability, 214
trend study, 214
triangulation, 215
t-test, 215
two-tailed test, 215
type I error, 136, 138, 215
type II error, 138, 215
—U—
unidimensional construct, 215
unitizing, 215
—V—
variable, 215
variable analytic research question, 54, 215
variance, 96, 110, 215
—W—
ways of knowing, 5, 215
—Z—
z-score, See standardized score