
Correlation/Regression

RELATIONSHIP ANALYSIS

PPT 18-1
Learning Objectives
• The meanings and uses of regression and
correlation analyses
• Calculate regressions and correlation
• Basics of multivariate statistical analysis
techniques

PPT 18-2
Statistics Not Always Black and
White
• How does the story relate to marketing research?
• Explain the meaning of this statement from the story:
“Statistical fallacies by themselves might create a certain
amount of random mischief. But the big problem is that
statistics which seem to confirm the dogmas of the
intelligentsia are seized upon and trumpeted throughout
academia and the media, with little or no concern for
“multicollinearity” or any of the other pitfalls.”
• How can the Internet be used to help you understand
multicollinearity, correlation, and other statistical
concepts?
PPT 18-3
Relationship Analysis

The examination of the association between
two or more variables. In marketing, some
of the more apparent relationships include
associations between advertising and sales,
company size and advertising budget,
supply and demand for products, and
customer satisfaction and customer loyalty.

PPT 18-4
Scatter Diagrams

• Two related variables, called bivariate data, plotted as points
on a graph.
• Each point on the diagram represents a pair of values, one
based on the X scale (independent variable) and the other
based on the Y scale (dependent variable).
• Making a scatter diagram usually is the initial step in
investigating the relationship between two variables, because
the diagram shows visually the shape and degree of closeness
of the relationship.
• A scatter diagram also indicates whether the relationship
between the two variables is positive or negative.
PPT 18-5
Simple Regression Analysis
Refers to statistical techniques for measuring the
relationship between a dependent variable and one or
more independent variables. The relationship
between two variables is characterized by how they
vary together. Given pairs of X and Y variables,
regression analysis measures the direction (positive or
negative) and rate of change (slope) in Y as X changes,
or vice versa. Using the values of the independent
variable, it attempts to predict the values of an
interval- or ratio-scaled dependent variable.

PPT 18-6
Regression Analysis Requires Two
Operations
• Derive an equation, called the regression equation, and
a line representing the equation to describe the shape
of the relationship between the variables. The
regression line is the line drawn through a scatter
diagram that “best fits” the data points and
accurately describes the relationship between the
two variables. The equation and its line may be
linear or curvilinear.
• Estimate the dependent variable (Y) from the
independent variable (X), based on the relationship
described by the regression equation.
PPT 18-7
Correlation Analysis
• Statistical techniques for measuring the
closeness of the relationship between variables.

• It measures the degree to which changes in one
variable are associated with changes in another.
• It can only indicate the degree of association or
covariance between variables. Covariance is a
measure of the extent to which two variables
are related.
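
As a minimal sketch of the covariance idea above, the snippet below computes the sample covariance of two variables. The advertising and sales figures are invented for illustration, and dividing by n - 1 (the usual sample convention) is an assumption, since the slides do not give a formula.

```python
def sample_covariance(xs, ys):
    """Sample covariance: average co-movement of x and y around their means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, divided by n - 1 (assumed convention).
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (n - 1)


# Hypothetical data: advertising spend (x) and sales (y), invented for illustration.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
cov_xy = sample_covariance(ad_spend, sales)
print(cov_xy)
```

A positive result means the two variables tend to move together; a negative result means they tend to move in opposite directions.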
PPT 18-8
Correlation Analysis -
continued
• Regression and correlation analysis may
be either simple or multiple. Simple
analysis uses only two variables, one
dependent and one independent.
Multiple analysis deals with three or more
variables, one dependent and two or more
independent.

PPT 18-9
Regression Equation and Line
Researchers estimate the regression line using the following
equation:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
$\beta_0$ = the Y intercept when X equals zero
$\beta_1$ = the slope of the regression line, which is the increase or
decrease in Y for each change of one unit of X
$X_i$ = a given value of the independent variable
$i$ = the observation number
$\varepsilon_i$ = the error term associated with the ith observation

PPT 18-10
Regression Equation and Line -
continued
The model involves parameters that are
unknown ($\beta_0$ and $\beta_1$) but can be estimated
from sample data. The error term, $\varepsilon_i$,
referred to as “epsilon,” is also unobservable,
but can be estimated from sample data.

PPT 18-11
The Lack Of Precision Can Be Due To

• Complexity of most marketing and other business
problems
• The functional form of the relationship between the
dependent and independent variables may differ from
the one selected
• Measurement of the variables may be imperfect
• Data are typically available only at an aggregate level
• Data are based on human behavior, so the error term
in the model may account for a “random” component
in behavior

PPT 18-12
Least-Squares Method
• A statistical technique that fits a straight
line to a scatter diagram by minimizing the
sum of the squared vertical distances of all
the points from the straight line.
• The equation derived by this method will
yield a regression line that best fits the
data.
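
The least-squares fit for a single independent variable can be sketched with the standard closed-form formulas for slope and intercept. The function names and the small data set are hypothetical, chosen only to illustrate the method.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all X values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = fit_line(xs, ys)
best_sse = sum_squared_errors(xs, ys, intercept, slope)
# Any other line, such as this slightly shifted one, has a larger squared error.
other_sse = sum_squared_errors(xs, ys, 2.0, 0.7)
print(intercept, slope, best_sse, other_sse)
```

Comparing the fitted line against a nearby alternative shows what "best fits" means here: no other straight line gives a smaller sum of squared vertical distances.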

PPT 18-13
• Regression coefficients are the values that
represent the effect of the individual
independent variables on the dependent
variable.
Standard Deviation of
Regression
The standard deviation of the Y values from
the regression line (Yc). This is also called
the standard error of estimate, since it can
be used to measure the error of the
estimates of individual Y values based on
the regression line.
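
A minimal sketch of the standard error of estimate follows. It assumes the common convention of dividing the squared residuals by n - 2 for a simple regression; the slides do not state a formula, and the data are invented.

```python
import math


def standard_error_of_estimate(xs, ys):
    """Standard deviation of the Y values around the fitted regression line."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Squared vertical distances between observed Y and computed Y (Yc).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # n - 2 degrees of freedom: an assumed convention for simple regression.
    return math.sqrt(sse / (n - 2))


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
std_error = standard_error_of_estimate(xs, ys)
print(std_error)
```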

PPT 18-14
Total Deviation

Total deviation = Unexplained deviation +
Explained deviation

The terms “explained” and “unexplained” are used
here to indicate whether or not a portion of the total
deviation is reduced by the introduction of the X
values in computing Yc values. When these deviations
are squared individually and then summed, they estimate
the explained and unexplained variation of Y.
PPT 18-15
Coefficient of Determination (r2)

• The strength of association or degree of closeness
of the relationship between two variables
measured by a relative value. It demonstrates how
well the regression line fits the scattered points.
• It indicates the amount of variation in the
dependent variable that is explained by the
variation in the independent variable and vice
versa.
• It is defined as the ratio of the explained variation
to the total variation.
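
The ratio definition above can be sketched as follows: fit the line, split the total variation of Y into explained and unexplained parts, and divide. The function name, dictionary keys, and data are hypothetical.

```python
def variation_breakdown(xs, ys):
    """Split the total variation of Y into explained and unexplained parts."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)
    explained = sum((yc - mean_y) ** 2 for yc in predicted)
    unexplained = sum((y - yc) ** 2 for y, yc in zip(ys, predicted))
    return {
        "total": total,
        "explained": explained,
        "unexplained": unexplained,
        "r2": explained / total,  # ratio of explained to total variation
    }


# Hypothetical data, invented for illustration.
parts = variation_breakdown([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(parts)
```

With these made-up numbers, 60 percent of the variation in Y is explained by the regression on X.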
PPT 18-16
Coefficient of Determination (r2) -
continued

• When r2 is close to 1, the Y values are very
close to the regression line. When r2 is
close to 0, the Y values are not close to the
regression line.
• r2 is never negative. It cannot
tell whether the relationship between the
two variables is positive or negative.

PPT 18-17
Correlation Coefficient
• The correlation coefficient (r), the square
root of r2, is frequently computed to
indicate the direction of the
relationship in addition to indicating the
degree of the relationship.
• It is the correlation between the observed
and predicted values of the dependent
variable.

PPT 18-18
• Since the range of r2 is from 0 to 1, the
coefficient of correlation r will vary within
the range from -1 to +1.
• A negative sign on r means a negative
correlation. The sign of r is the same as
the sign of b (the slope) in the regression
equation.
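
The sign behavior of r can be sketched by computing it directly from the cross-products of deviations, so the sign carries through instead of being lost in a square root. The function name and data are hypothetical.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r, which lies between -1 and +1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    # The numerator sxy also drives the slope (sxy / sxx), so r and the
    # slope always share the same sign.
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]
falling = [5, 4, 5, 4, 2]
r_positive = correlation_coefficient(xs, rising)
r_negative = correlation_coefficient(xs, falling)
print(r_positive, r_negative)
```

Both series give the same r2, but only r shows that one relationship slopes upward and the other downward.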
Calculating Regressions Using
Computers
• To run the calculations using SPSS
– Click on “Statistics”
– Then click on “Regression” and “Linear”
– These commands designate the statistical test
to be run
• To run calculations using Excel
– Click on “Tools” and “Data Analysis”
– Then click on “Regression.”

PPT 18-19
Multiple Regression Analysis
• This test will determine the association or
relationship between dependent and
independent variables.
• In multiple regression analysis, more than
two variables are included in the
examination. While the dependent
variable is still represented by Y, the
independent variables are represented by
X1, X2, X3, . . . and so on
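
A minimal sketch of multiple regression with two independent variables follows. It solves the least-squares normal equations by Gauss-Jordan elimination in plain Python; this is an illustrative approach, not the procedure SPSS or Excel uses internally. The data are constructed without noise from Y = 1 + 2*X1 + 3*X2, so the recovered coefficients are known in advance.

```python
def fit_multiple_regression(rows, ys):
    """Return [b0, b1, ..., bk] for Y = b0 + b1*X1 + ... + bk*Xk by least squares."""
    # Design matrix with a leading 1 in each row for the intercept.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = []
    for i in range(k):
        row = [sum(d[i] * d[j] for d in design) for j in range(k)]
        row.append(sum(d[i] * y for d, y in zip(design, ys)))
        aug.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # Perfect multicollinearity: the coefficients cannot be separated.
            raise ValueError("independent variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical (X1, X2) pairs and Y values built exactly from Y = 1 + 2*X1 + 3*X2.
observations = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y_values = [9, 8, 19, 18, 29]
coefficients = fit_multiple_regression(observations, y_values)
print(coefficients)
```

The collinearity check shows the extreme case of multicollinearity: when one independent variable is an exact linear function of the others, no unique set of regression coefficients exists.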
PPT 18-20
• Since with multiple regression we are
dealing with more than one independent
variable, we refer to the association
between the dependent and independent
variables as the coefficient of multiple
determination, denoted by R2.
Calculating Multiple Regression Using
Computers
• To perform the computations using SPSS
for Windows
– Click on “Statistics”
– Then click on “Regression” and “Linear”
– These commands designate the statistical test to
be run
• To run the calculations using Excel
– Click on “Tools” and “Data Analysis”
– Then click on “Regression.”
PPT 18-21
Forecasting Using Time Series
Analysis
• Numerical variables that are calculated,
measured, or observed sequentially on a regular
chronological basis are called time series
• An organization’s time series is
the result of interactions of many changing
forces
• The forces can be business, economic, political,
and social influences as well as the forces of
nature.

PPT 18-22
Time Series Patterns Or
Components
• Secular trends - direction of a time series
movement over a long period of time usually
represented by a straight line or a smooth
curve.
• Seasonal variation - repeating periodic
movement of a time series

PPT 18-23
• Cyclical fluctuations or “business cycles” -
expansions (ups) and contractions (downs) of
business activities around the normal value
• Irregular movements - erratic movements,
including all types of time series movements
other than secular, seasonal, or cyclical
Two Popular Forecasting
Techniques
• Trend Analysis - Used when historical data is
plotted or extrapolated to project some outcome in
the future.
• Exponential Smoothing - Type of weighted
average forecasting technique that assigns heavier
weights to recent data and lighter weights to less
recent data. When forecasting, the more recent
data are more likely to be better predictors of the
near future than are earlier periods.
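
Simple exponential smoothing can be sketched in a few lines. The smoothing weight alpha, the choice to start from the first observation, and the sales figures are all assumptions made for illustration.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed series; the last value serves as the next-period forecast."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    smoothed = [series[0]]  # assumed starting point: the first observation
    for value in series[1:]:
        # New estimate = alpha * latest observation + (1 - alpha) * previous estimate,
        # so older observations receive geometrically smaller weights.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly sales, invented for illustration.
monthly_sales = [100, 110, 120]
smoothed_sales = exponential_smoothing(monthly_sales, alpha=0.5)
next_period_forecast = smoothed_sales[-1]
print(smoothed_sales, next_period_forecast)
```

A larger alpha puts more weight on the most recent data and reacts faster to change; a smaller alpha gives a smoother, slower-moving forecast.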
PPT 18-24
Multivariate Statistical Analysis
• Any simultaneous analysis of more than two
variables.
• Many times, multivariate techniques are a means of
performing in one analysis what used to take
multiple analyses using univariate techniques
(analysis of single-variable distributions).
• Common multivariate techniques: multiple
discriminant analysis, multidimensional scaling, factor
analysis, cluster analysis and conjoint analysis.

PPT 18-25
Multiple Discriminant Analysis
(MDA)
• Appropriate tool for testing the hypothesis that the
group means of a set of independent variables for two
or more groups are equal.
• Used if the dependent variable is categorical (either
dichotomous or multichotomous) and the
independent variables are either interval or ratio data.
• When two classifications are being examined, it is
referred to as a two-group discriminant analysis. When
three or more classifications are identified, then
multiple discriminant analysis is used.
PPT 18-26
Multiple Discriminant Analysis
(MDA)
- continued
• The intent of this technique is twofold:
(1) to understand group differences
(2) to predict the likelihood that a case will belong to a
particular group, based on several independent variables.
• The linear combination of the independent variables is
known as the discriminant function.
• An important function of discriminant analysis is to
create a classification matrix, which shows the
number of correctly and incorrectly classified cases.
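
A classification matrix can be sketched by tallying actual group membership against predicted membership. The group labels and predictions below are invented; in practice the predictions would come from a fitted discriminant function, which this sketch does not estimate. The "hit ratio" (share of correctly classified cases) is a commonly used summary that the slides do not name.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Return (labels, matrix): rows are actual groups, columns are predicted groups."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def hit_ratio(actual, predicted):
    """Share of cases whose predicted group matches the actual group."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical group memberships and predictions, invented for illustration.
actual = ["buyer", "buyer", "buyer", "non-buyer", "non-buyer", "non-buyer"]
predicted = ["buyer", "buyer", "non-buyer", "non-buyer", "non-buyer", "buyer"]
labels, matrix = classification_matrix(actual, predicted)
ratio = hit_ratio(actual, predicted)
print(labels, matrix, ratio)
```

Correctly classified cases fall on the diagonal of the matrix; every off-diagonal count is a misclassification.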

PPT 18-27
Factor Analysis
• Groups attributes that are alike.
• Used to examine interrelationships among many
variables and to explain these variables in terms of
their common underlying and unobservable
dimensions (called “factors”).
• Factor analysis can be used to reduce the
information contained in several original variables
into a smaller, more manageable, set of variables
while losing as little information as possible.
• Data must be gathered from interval scales.
PPT 18-28
Cluster Analysis
• Grouping data into “clusters” such that elements in the
same group are similar to each other, and elements in
different groups are as different as possible.
• Partitions a sample into homogeneous classes.
• Used to identify market segments--groups of consumers
with relatively similar needs.
• Seeks to identify constructs that underlie objects.
• Interval scales must have been used during data gathering.
• Creates the groups itself; it does not require previous
knowledge of the group membership for each item included.

PPT 18-29
Multidimensional Scaling

• Also referred to as perceptual mapping.
• Used to identify important dimensions underlying
respondents’ evaluations of test objects.
• Convert consumer judgments of similarity or
preference into distances represented in
multidimensional space.
• Multidimensional scaling techniques are used to
identify important dimensions underlying customer
evaluations of products, services, or companies.

PPT 18-30
Conjoint Analysis
• Provides information about the relative
importance respondents place on
individual attributes when choosing from
multiple brands.
• Built on the assumption that consumers
make complex decisions based not on one
factor at a time but on several factors
“jointly” (thus the term “conjoint”).

PPT 18-31
Net Impact
• The Internet
– Will not help researchers with statistical analyses.
– Will lend qualitative support for the research
findings obtained from the quantitative analyses.
– Can inform researchers about advancements
made in statistical analyses through published
manuscripts, discussion groups, and chat groups
– Researchers also use electronic mail extensively
to share their research findings

PPT 18-32
Decision Time!
If correlation analysis is a popular and informative
statistical method, why should researchers bother
using the somewhat intimidating multivariate
statistical techniques? Do you feel there is really much
to gain from these methods?

http://www.swlearning.com/marketing/shao/powerpoint/CH18_7.ppt#5

PPT 18-33
