
Chapter 9

Characteristics of Multivariate Analysis
Dependency Techniques
-- Multiple Regression Analysis
-- Discriminant Analysis
Interdependency Techniques
-- Factor Analysis
-- Cluster Analysis

What is Multivariate Analysis?

Multivariate analysis is defined as "all statistical techniques which simultaneously analyze more than two variables on a sample of observations." In other words, multivariate analysis helps the researcher evaluate the relationships among multiple (more than two) variables simultaneously.

Example
The sales level of a company's product is influenced not only by demand but also by competitors' strategies. While analyzing sales, the manager of the company has to take this variable into consideration as well.

Multivariate techniques are broadly classified into two categories:
-- Dependency Techniques
-- Interdependency Techniques

Dependency Techniques
Dependency techniques aim at explaining or predicting one or more dependent variables based on two or more independent variables. Here, the focus is on defining the relationship between one dependent variable and many independent variables that affect it.

Multiple Regression Equation

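$$Y = a + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n + e$$
Where Y is the dependent variable, $X_1, \dots, X_n$ are the independent variables, and $\beta_1, \dots, \beta_n$ are the slope coefficients, each representing the change in the dependent variable Y for a change of 1 unit in the corresponding X variable while the other X variables are held constant. $a$ is the Y-intercept (the value of Y when every X = 0), and $e$ is the error term.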
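As a minimal sketch of how the intercept and slope coefficients in the equation above can be estimated, the following uses ordinary least squares from NumPy. The function name `fit_multiple_regression` and the data are hypothetical and chosen only for illustration; the data are generated without noise from Y = 2 + 3·X1 − 1·X2, so the fit should recover those coefficients.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Estimate the intercept a and slope coefficients by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted coefficient is the intercept a.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


# Hypothetical data (assumption): five observations of two independent variables.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

a, betas = fit_multiple_regression(X, y)
print("intercept:", a)
print("slopes:", betas)
```

With real data the error term is not zero, so the estimates only approximate the underlying relationship.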

Key Purposes
Multiple regression analysis is used for two key purposes:
-- To identify relationships between variables
-- To predict outcomes
Multiple regression analysis can help the researcher evaluate the association between a single dependent variable and two or more independent variables.

Coefficient of Multiple Determination

The coefficient of multiple determination measures the strength of the association among the variables involved in multiple regression. It is denoted by $R^2$. In mathematical terms, it measures the proportion of the variation in variable Y that is explained by the independent variables, usually expressed as a percentage.
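In terms of sums of squares, this definition can be written as below. SST is a symbol introduced here for illustration and denotes the total sum of squares of Y around its mean; SSR is the sum of squares due to regression and SSE is the residual sum of squares. The decomposition assumes a least-squares fit that includes an intercept.

```latex
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad SST = SSR + SSE
```

For instance, a value of 0.80 means that 80% of the variation in Y is explained by the independent variables.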

Test of Significance
Null hypothesis H0: $R^2 = 0$
Alternative hypothesis H1: $R^2 \neq 0$
The test statistic can be determined using the formula

$$F = \frac{SSR / k}{SSE / (n - k - 1)}$$

where
-- SSR is the sum of squares due to regression
-- SSE is the residual sum of squares
-- n represents the sample size
-- k represents the number of independent variables in the problem
We reject the null hypothesis if the calculated F-value exceeds the tabular F-value. We fail to reject the null hypothesis if the calculated F-value is less than the tabular F-value.
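The test statistic can be computed directly from a fitted regression. The sketch below is illustrative only: `regression_f_test` is a hypothetical helper and the six observations are made up. It returns both the coefficient of multiple determination and the F-value, which would then be compared with the tabular F-value for k and n − k − 1 degrees of freedom.

```python
import numpy as np


def regression_f_test(X, y):
    """Return (R^2, F) for a least-squares regression of y on the columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape  # sample size and number of independent variables
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ssr = np.sum((fitted - y.mean()) ** 2)  # sum of squares due to regression
    sse = np.sum((y - fitted) ** 2)         # residual sum of squares
    r_squared = ssr / (ssr + sse)
    f_value = (ssr / k) / (sse / (n - k - 1))
    return r_squared, f_value


# Hypothetical data (assumption): six observations, two independent variables.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

r_squared, f_value = regression_f_test(X, y)
print("R^2:", r_squared)
print("F:", f_value)
```

Because SSR/SSE equals R²/(1 − R²), the same F-value can also be obtained from R² alone, which is why testing R² = 0 and testing the overall regression are the same test.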

Issues in Multiple Regression Analysis

While using multiple regression analysis, the researcher has to consider the following factors:
-- Multicollinearity
-- Dummy variables
Other dependency techniques include:
-- Discriminant analysis
-- Canonical correlation analysis
-- Multivariate Analysis of Variance (MANOVA)

What is Discriminant Analysis?

Discriminant analysis is the technique used for classifying a set of observations into predetermined groups based on a set of variables known as predictors or input variables. Business researchers often face situations where they need to classify a population or a set of objects into certain groups.

A financial institution may want to classify various investment options into high-return, medium-return, and low-return investments. Similarly, a market research agency might want to assess the quality of various car models and classify them into high-quality, medium-quality, and low-quality categories.

Where is it used?
-- By using the discriminant equation, we can classify objects into particular predefined groups.
-- It can be used to predict the success or failure of the objects.
-- Based on the classification of objects, we can answer questions such as which investment option will provide higher returns, or who the potential customers are.
-- Discriminant analysis also helps in determining the factors that aid in discriminating between the objects. This can be used in marketing to understand how customer preferences differ across brands.

General discriminant analysis equation

The equation is
$$Z_i = b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni}$$
where
-- $Z_i$ is the discriminant score of object i
-- $X_{1i}, \dots, X_{ni}$ are the discriminating variables (independent variables) measured on object i
-- $b_1, b_2, b_3, \dots, b_n$ are the discriminant coefficients or weights corresponding to each independent variable
The discriminant score is determined for each object. Using these scores as the basis, the researcher decides which group the object belongs to. This equation is also used to identify the major factors that help in discriminating the objects.

The number of discriminant equations required to carry out discriminant analysis depends upon the number of categories into which the objects are to be classified. We need to develop n-1 discriminant equations, where n here represents the number of categories.

For example
If the problem consists of categorizing the objects into two groups (such as eligible and ineligible candidates, or buyers and nonbuyers), then we need to develop a single discriminant equation. For a problem consisting of three categories (high-return, medium-return, and low-return stocks), we need to develop two discriminant equations.
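For the two-group case, a single discriminant equation can be sketched with Fisher's linear discriminant, which is one common way to obtain the weights; the slides do not say which estimation method is intended, so this choice is an assumption. The function names, the buyer/nonbuyer data, and the midpoint cutoff rule are all hypothetical.

```python
import numpy as np


def fit_two_group_discriminant(group_a, group_b):
    """Return (weights, cutoff) of Fisher's linear discriminant for two groups."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Pooled within-group covariance matrix of the discriminating variables.
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False)
              + (len(b) - 1) * np.cov(b, rowvar=False)) / (len(a) + len(b) - 2)
    weights = np.linalg.solve(pooled, mean_a - mean_b)
    # Cutoff: discriminant score of the midpoint between the two group means.
    cutoff = weights @ (mean_a + mean_b) / 2.0
    return weights, cutoff


def discriminant_scores(weights, observations):
    """Z = b1*X1 + b2*X2 + ... for each row of observations."""
    return np.asarray(observations, dtype=float) @ weights


# Hypothetical data (assumption): two predictors measured on buyers and nonbuyers.
buyers = np.array([[6.0, 7.0], [7.0, 8.5], [8.0, 7.5], [9.0, 9.0]])
nonbuyers = np.array([[2.0, 3.0], [3.0, 2.5], [4.0, 4.5], [3.5, 3.0]])

weights, cutoff = fit_two_group_discriminant(buyers, nonbuyers)
buyer_scores = discriminant_scores(weights, buyers)
nonbuyer_scores = discriminant_scores(weights, nonbuyers)
print("weights:", weights, "cutoff:", cutoff)
```

An object whose score lies above the cutoff is assigned to the first group (buyers here) and one below it to the second; the relative sizes of the weights indicate which variables discriminate most between the groups, provided the variables are on comparable scales.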

Interdependency techniques are used in situations where no distinction is made between independent and dependent variables. Instead, the interdependent relationships between the variables are examined. Prominent interdependency techniques are factor analysis, cluster analysis, metric multidimensional scaling, and non-metric multidimensional scaling.

Factor analysis can be defined as a set of methods in which the observable or manifest responses of individuals on a set of variables are represented as functions of a small number of latent variables called factors. Factor analysis is used when the research problem involves a large number of variables, which makes the analysis and interpretation of the problem difficult. It helps the researcher reduce the large number of variables to a few dimensions, called factors, that summarize the available data, thereby making the analysis easier.

Benefits of Factor Analysis

-- Factor analysis can be used to identify hidden dimensions or constructs which may not be apparent from direct analysis.
-- It can also be used to identify relationships between variables.
-- It helps in data reduction.
-- It can also help the researcher cluster the products and population being analyzed.

Terminology in Factor Analysis

A factor is an underlying construct or dimension that represents a set of observed variables.

Factor loadings:

Factor loadings are the correlation coefficients between the variables and the factors, and are therefore also called factor-variable correlations. They measure how closely each variable is associated with a factor, which helps in interpreting and labeling the factors.

Eigenvalues:

An eigenvalue measures the variance in all the variables that is accounted for by a factor. It is calculated by adding the squared factor loadings of all the variables on that factor. Eigenvalues aid in explaining the importance of the factor with respect to the variables.

Communalities:

Communalities, denoted by $h^2$, measure the proportion of variance in each variable that is explained by the extracted factors. A communality is calculated by adding the squared factor loadings of a variable across the factors, and it ranges from 0 to 1. A high communality indicates that most of the variance in the variable is explained by the factors extracted from the factor analysis.

Total variance explained:

It is the percentage of the total variance of the variables that is explained by the extracted factors. This is calculated by adding the communality values of all the variables and dividing the sum by the number of variables.

Factor Variance Explained:

It is the percentage of the total variance of the variables that is explained by a single factor. This is calculated by adding the squared loadings of all the variables on that factor (the factor's eigenvalue) and dividing the sum by the number of variables.
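The four quantities defined above all follow from the matrix of factor loadings by simple arithmetic. The sketch below works through them on a small, hypothetical loading matrix (four variables, two factors); the numbers are invented for illustration and are not taken from any analysis in this chapter.

```python
import numpy as np

# Hypothetical loadings (assumption): rows are variables, columns are factors.
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])
n_variables = loadings.shape[0]
squared = loadings ** 2

# Eigenvalue of a factor: sum of squared loadings down its column.
eigenvalues = squared.sum(axis=0)
# Communality of a variable: sum of squared loadings across its row.
communalities = squared.sum(axis=1)
# Share of total variance explained by each factor, and by all factors together.
factor_variance_explained = eigenvalues / n_variables
total_variance_explained = communalities.sum() / n_variables

print("eigenvalues:", eigenvalues)                  # approx. [1.18 1.22]
print("communalities:", communalities)              # approx. [0.65 0.53 0.82 0.40]
print("per factor:", factor_variance_explained)     # approx. [0.295 0.305]
print("total:", total_variance_explained)           # approx. 0.60
```

The total variance explained equals the sum of the per-factor shares, since both are built from the same squared loadings.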

Procedure followed for Factor Analysis

Prominent methods are the centroid method, the principal components method, and the maximum likelihood method. The steps followed in factor analysis are as follows:
1) Define the problem
2) Construct a correlation matrix that measures the relationships among the variables
3) Select an appropriate factor analysis method
4) Determine the number of factors
5) Rotate the factors
6) Interpret the factors
7) Determine the factor scores
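The correlation-matrix and extraction steps can be sketched with the principal components method, one of the methods named above. This is an illustrative outline rather than a full factor analysis: `principal_component_loadings` is a hypothetical helper, the number of factors is simply passed in, and the data are simulated from two made-up latent factors.

```python
import numpy as np


def principal_component_loadings(data, n_factors):
    """Return (correlation matrix, eigenvalues, loadings) for the top factors."""
    data = np.asarray(data, dtype=float)
    # Correlation matrix among the observed variables (columns of data).
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: eigenvectors scaled by the square roots of their eigenvalues.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return corr, eigvals, loadings


# Simulated data (assumption): 50 observations of 4 variables driven by 2 factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
pattern = np.array([[1.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 1.0]])
data = latent @ pattern + 0.3 * rng.normal(size=(50, 4))

corr, eigvals, loadings = principal_component_loadings(data, n_factors=2)
print("eigenvalues:", eigvals)
print("loadings:\n", loadings)
```

The squared loadings in each column add up to that factor's eigenvalue, which ties this extraction step back to the terminology defined earlier.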

Interpretation of factor analysis

Example: A market researcher at a credit card company wants to evaluate the credit card usage and behavior of customers using various variables. The variables include age, gender, marital status, income level, education, employment status, credit history, and family background. Here, age, gender, and marital status can be combined under a factor called demographic characteristics. Income level, education, and employment status can be combined under a factor called socio-economic status. Credit history and family background can be combined under a factor called background status.

The factor analysis results obtained by the credit card company

                        Loadings on factors
Variable             Demographic       Socio-economic   Background   Communality
                     characteristics   status           status       (h^2)
Gender                    75                06              -8           82
Marital status            83                05              -7           80
Income levels             21                78             -11           53
Education                 17                83             -11           69
Employment               -04                81              -6           57
Credit history            32                12             -75           62
Family background         14                32             -71           70

Percentage variance explained (first factor): 43%
Note: loadings and communalities are printed without the decimal point (for example, 75 stands for 0.75).
The Results of the Factor Analysis can be Interpreted in the following ways
Factor loadings:
In the above table we observe that the variables age, gender, and marital status have high loadings on the first factor compared with the other two factors. Thus we can infer that these variables are highly correlated and represent an underlying common factor. In this way the analysis of factor loadings helps in interpreting and labeling the factors.

Total variance explained:

It can help in understanding how well the factors are able to summarize the data. In the table, the first factor explains 43% of the variation in the data and the second factor explains 10% of the variation in the data. The total variance explained by the three factors is 73%. The remaining 27% of the variance in the data remains unexplained.

Factor scores:
As the variables are grouped into factors, the factors become new variables that are used for subsequent analysis. The values of each observation on these new variables are called factor scores.

Rotation of factor matrix:

Using the process called rotation, the factor matrix is further simplified to make the factors easier to interpret. Rotation helps in developing clearer factor loading patterns, with some variables having high loadings on a particular factor and other variables having loadings near zero. There are two broad families of rotation: orthogonal rotation, which keeps the factors uncorrelated and of which varimax is the most widely used method, and oblique rotation, which allows the factors to be correlated.
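As an illustration of an orthogonal rotation, the sketch below implements a commonly used iterative form of the varimax algorithm based on the singular value decomposition. It is a sketch under stated assumptions, not a reference implementation: the function name, the iteration limit, the tolerance, and the unrotated loadings are all hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix orthogonally toward the varimax criterion.

    Returns (rotated loadings, rotation matrix).
    """
    L = np.asarray(loadings, dtype=float)
    n_variables, n_factors = L.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient-like target matrix of the varimax criterion.
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        target = L.T @ (rotated ** 3 - rotated @ column_ss / n_variables)
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt  # closest orthogonal matrix to the target
        new_criterion = s.sum()
        if new_criterion < criterion * (1.0 + tol):
            break  # no meaningful improvement, stop iterating
        criterion = new_criterion
    return L @ rotation, rotation


# Hypothetical unrotated loadings (assumption): every variable loads on both factors.
unrotated = np.array([
    [0.6, 0.6],
    [0.7, 0.5],
    [0.6, -0.6],
    [0.5, -0.7],
])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each variable's communality is the same before and after rotation; only the way the explained variance is distributed across the factors changes.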

Meaning Of Cluster Analysis

Cluster analysis can be defined as a set of techniques used to classify objects into relatively homogeneous groups called clusters. It involves identifying similar objects and placing them in the same group.

Uses Of Cluster Analysis

Cluster analysis is used in business research for various purposes:
-- This technique is widely used in marketing in order to segment the market.
-- Cluster analysis can also be used to identify new product ideas by clustering the company's products into homogeneous groups and comparing them with the offerings available in the market; this can help identify gaps in the company's product portfolio.
-- Cluster analysis can also be used as a data reduction technique.

Procedure followed in Cluster Analysis:

1) Defining the problem: We first need to define the problem and decide upon the variables on which the objects are to be clustered.
2) Selection of similarity or distance measures: The similarity measure examines the proximity between the objects. There are three major ways to measure the similarity between objects:
-- Euclidean distance measures
-- Correlation coefficients
-- Association coefficients
3) Selection of clustering approach: There are two types of clustering approaches:
-- Hierarchical clustering approach: It consists of either a top-down (divisive) approach or a bottom-up (agglomerative) approach. Prominent hierarchical clustering methods are single linkage, complete linkage, average linkage, Ward's method, and the centroid method.
-- Non-hierarchical clustering approach: There are three prominent non-hierarchical clustering methods: the sequential threshold method, the parallel threshold method, and the optimizing partitioning method.
4) Deciding on the number of clusters to be selected: One way is to decide it intuitively. Another way is to take inputs from the pattern of clusters that a method generates. The researcher can use the distance between objects as the criterion: set a distance value and stop the clustering process at the point where the distance between the clusters being merged exceeds that value.
5) Interpreting the clusters: This can be done using the centroid, the average of the objects in a cluster on each variable. The centroid helps the researcher explain the cluster and give it an appropriate label.
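The procedure above can be sketched with a bottom-up hierarchical method. The example below assumes Euclidean distance and single linkage, stops merging once the closest pair of clusters is farther apart than a chosen distance value, and then computes the centroids for interpretation. The function names, the six two-dimensional objects, and the distance value of 2.0 are hypothetical.

```python
import numpy as np


def single_linkage_clusters(points, max_distance):
    """Agglomerative single-linkage clustering with a distance stopping rule.

    Returns a list of clusters, each a list of row indices into points.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all objects.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    clusters = [[i] for i in range(len(points))]  # start with one object each
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist[a, b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_distance:
            break  # the nearest clusters are too far apart to merge
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cluster_centroids(points, clusters):
    """Mean of the member objects of each cluster on every variable."""
    points = np.asarray(points, dtype=float)
    return [points[members].mean(axis=0) for members in clusters]


# Hypothetical objects (assumption) measured on two variables.
points = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
                   [8.0, 8.0], [8.5, 8.2], [7.8, 8.4]])
clusters = single_linkage_clusters(points, max_distance=2.0)
centroids = cluster_centroids(points, clusters)
print(clusters)
print(centroids)
```

With these made-up objects the first three and the last three end up in separate clusters; a larger distance value would eventually merge them into one.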

Multidimensional Scaling
It is defined as a technique that represents perceptions of and preferences for objects as points in a multidimensional space. This statistical technique is used to reveal the underlying dimensions on which consumers perceive two objects to be similar. It is commonly used in motivational research.
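One way to place objects as points in a low-dimensional space is classical (metric) multidimensional scaling, sketched below. It is only one variant of the technique and is used here as an assumption; the function name and the four objects are hypothetical. Starting from a matrix of pairwise distances, it recovers coordinates whose distances match the input.

```python
import numpy as np


def classical_mds(distances, n_dims=2):
    """Return coordinates in n_dims dimensions from a pairwise distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain an inner-product matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    inner = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(inner)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_dims]
    top = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top)


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2))


# Hypothetical objects (assumption) whose true positions are known, for checking.
true_points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
distances = pairwise_distances(true_points)

coords = classical_mds(distances, n_dims=2)
recovered = pairwise_distances(coords)
print(np.round(coords, 3))
```

The recovered coordinates may be shifted, rotated, or reflected relative to the original positions, but the distances between the objects are preserved, and it is those relative positions that the researcher interprets as perceptual dimensions.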

Thanks for your Attention