DE MONTFORT UNIVERSITY LEICESTER BUSINESS SCHOOL DEPARTMENT OF ACCOUNTING AND FINANCE

BRIEF NOTES IN THE USE OF SPSS 12™

Dr. Panagiotis Andrikopoulos
Bosworth House 8.10
Pandrikopoulos@dmu.ac.uk

A. INTRODUCTION

These notes are based on experience of SPSS 12. There are numerous functions and activities that you can perform using this statistical software. However, we will focus our attention only on the basic/introductory ones, as further examination requires prior knowledge of advanced statistics and econometrics.

B. DATA ENTRY

SPSS has facilities for data entry, but I always enter the data on a spreadsheet first and then transfer it to SPSS. This gives you a chance to do any preliminary analysis with the spreadsheet. Put column (variable/field) headings in the first row, and then enter the data for each case (person, company, etc.) on successive rows (we'll see examples later). When you transfer to SPSS, tick the box telling the package to pick up variable names from the first row. SPSS is sometimes a bit fussy about what it will accept, so it is a good idea to save the file in an old Excel format (e.g. *.xls); however, version 12 can open even Excel 2000 files. It is preferable to use short variable names - no more than 8 characters, without spaces or punctuation marks. You should code groups or variables using numbers - 1, 2, 3 instead of A, B, C (only necessary for some procedures, but it usually makes your life easier). If any data is missing, leave the cell blank - do not enter 0. SPSS will give you the opportunity to exclude these cases from your analysis if you want. Yes/No is best coded as 1 for Yes and 0 for No; then the average of the column will give you the proportion answering Yes.

C. ANALYSIS: COMMONLY USED METHODS

In SPSS 12, after loading your data, you can use the main menu to perform the required statistical or econometric analysis. Analytically:

Analyze / Descriptive Statistics
You use this function to estimate frequencies, means, standard deviations, variances, histograms, etc. for any particular statistical sample or statistical universe.
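For readers who like to cross-check their spreadsheet outside SPSS, the same first steps can be sketched in Python with pandas. This is purely illustrative and not part of the SPSS procedure; the file name survey.xls and the column answered_yes are invented for the example.

```python
# A minimal sketch, assuming a hypothetical spreadsheet "survey.xls"
# with variable names in the first row and blank cells for missing data.
import pandas as pd

# Read the spreadsheet; blanks are read as missing values (NaN), not zeros.
df = pd.read_excel("survey.xls")

# Frequencies for a coded variable (e.g. 1 = Yes, 0 = No).
print(df["answered_yes"].value_counts(dropna=True))

# Means, standard deviations and other descriptives for numeric columns.
print(df.describe())
print(df.var(numeric_only=True))

# The mean of a 1/0 column is the proportion answering "Yes".
print(df["answered_yes"].mean())
```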


Analyze / Compare Means / Means
You use this facility to compare means (of different groups of cases) and do an ANOVA examination or a simple t-test. The dependent variable is the numerical measurement, and the independent variable is the grouping variable. Take the option of an ANOVA table. (Use a paired-samples t-test if you are comparing the means of two variables for the same cases.)
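The same comparisons can be sketched outside SPSS with scipy; the numbers below are invented purely to show the calls and are not taken from any of the examples in these notes.

```python
# Illustrative sketch of the Compare Means options using scipy.
from scipy import stats

group_a = [5.1, 6.2, 4.8, 5.9, 6.4]   # hypothetical measurements, group A
group_b = [4.2, 3.9, 5.0, 4.4]        # hypothetical measurements, group B

# Independent-samples t-test: compares the means of two groups of cases.
t, p = stats.ttest_ind(group_a, group_b)

# One-way ANOVA: the same idea, generalised to two or more groups.
f, p_anova = stats.f_oneway(group_a, group_b)

# Paired-samples t-test: two measurements taken on the same cases.
before = [10.2, 9.8, 11.1, 10.4]
after  = [10.9, 10.1, 11.5, 10.8]
t_paired, p_paired = stats.ttest_rel(before, after)

print(t, p, f, p_anova, t_paired, p_paired)
```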


Analyze / Descriptive Statistics / Crosstabs
This facility is used for a cross-tabulation. Check that Statistics is set to the Chi-square test. For tables with two rows and two columns, selecting Chi-square calculates the Pearson chi-square, the likelihood-ratio chi-square, Fisher's exact test, and Yates' corrected chi-square (continuity correction). For 2 × 2 tables, Fisher's exact test is computed when a table that does not result from missing rows or columns in a larger table has a cell with an expected frequency of less than 5; Yates' corrected chi-square is computed for all other 2 × 2 tables. For tables with any number of rows and columns, selecting Chi-square calculates the Pearson chi-square and the likelihood-ratio chi-square. When both table variables are quantitative, Chi-square yields the linear-by-linear association test. Overall, what you need to remember is that for all two-by-two tables the Fisher exact test gives exact probabilities (significance levels); for larger tables the Pearson chi-square gives an acceptable approximation. A very important stage in this form of analysis is to make sure that all cell contents are set so that you get the appropriate percentages. Other estimates that you can get using the Crosstabs function are:
o The Contingency Coefficient.
o The Phi and Cramér's V measures.
o The Lambda measure of association.
o The Uncertainty Coefficient.

Reading an advanced text in statistics to understand these measures is highly recommended.
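As a cross-check on the Crosstabs output, the core 2 × 2 tests can also be sketched with scipy. The table below is invented for illustration only.

```python
# Hedged sketch of the chi-square and Fisher exact tests on an invented 2x2 table.
import numpy as np
from scipy import stats

# Rows: group 1 / group 2; columns: "Yes" / "No".
table = np.array([[12,  8],
                  [ 5, 15]])

# Pearson chi-square with Yates' continuity correction (the default for 2x2).
chi2, p, dof, expected = stats.chi2_contingency(table)

# Pearson chi-square without the correction.
chi2_raw, p_raw, _, _ = stats.chi2_contingency(table, correction=False)

# Fisher's exact test: exact significance level for a 2x2 table,
# appropriate when an expected cell count falls below 5.
odds_ratio, p_exact = stats.fisher_exact(table)

print(chi2, p)
print(expected)        # expected counts let you check the "less than 5" rule
print(p_raw, p_exact)
```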


Analyze / Correlate
This facility should be used to estimate the correlation coefficient between two or more variables. The selected variables will form a correlation matrix.

You also have the option to select any one of the following three alternative statistical methodologies:
o The Pearson product-moment correlation coefficient. This coefficient, often referred to as the sample correlation coefficient, is the most appropriate parametric method for examining the relationship between variables when those variables are measured at an interval level.
o Kendall's tau-b bivariate correlation coefficient.
o The Spearman rank correlation coefficient. This is a non-parametric statistical technique. It only assumes that the data are at an ordinal level of measurement and therefore requires fewer assumptions about the distribution of the data sample. The Spearman coefficient converts the sample values into ranks, assigned separately for each variable X and Y; any tied values are assigned the mean of the ranks they would otherwise occupy.
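A quick sketch of these three options outside SPSS, using scipy on two invented series:

```python
# Illustrative sketch of the three correlation measures with scipy.
from scipy import stats

x = [1.0, 2.1, 2.9, 4.2, 5.1, 6.0]   # invented series
y = [2.3, 2.9, 3.8, 4.1, 5.6, 6.2]   # invented series

# Pearson product-moment correlation: parametric, interval-level data.
r_pearson, p_pearson = stats.pearsonr(x, y)

# Kendall's tau-b: rank-based measure of association.
tau, p_tau = stats.kendalltau(x, y)

# Spearman rank correlation: non-parametric; values are converted to ranks
# and tied values receive the mean of the ranks they would occupy.
rho, p_rho = stats.spearmanr(x, y)

print(r_pearson, tau, rho)
```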


Analyze / Regression / Linear
This is the facility for linear regression/association between variables. For multiple regressions, SPSS offers help with choosing which independent (explanatory) variables to enter in the model.

Of the five different variable-entry methods, Stepwise selection is the most commonly used in practice. Under this method, at each step the independent variable not yet in the equation that has the smallest probability of F is entered, if that probability is sufficiently small. Variables already in the regression equation are removed if their probability of F becomes sufficiently large. The method terminates when no more variables are eligible for inclusion or removal. This is a cautious approach to introducing variables into a multiple-factor model. To perform this procedure, select Method / Stepwise and keep the default entry and removal criteria.
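Outside SPSS, stepwise selection has to be written by hand. The sketch below imitates only the entry step (forward selection by p-value) with statsmodels on invented data, to show the idea rather than replicate the exact SPSS algorithm, which also removes variables whose probability of F grows too large.

```python
# A rough forward-selection sketch, an assumption-laden stand-in for SPSS Stepwise.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 4)), columns=["x1", "x2", "x3", "x4"])
y = 1.5 * X["x1"] - 2.0 * X["x3"] + rng.normal(size=100)   # invented response

selected, remaining, p_enter = [], list(X.columns), 0.05
while remaining:
    # p-value of each candidate variable when added to the current model
    pvals = {}
    for var in remaining:
        model = sm.OLS(y, sm.add_constant(X[selected + [var]])).fit()
        pvals[var] = model.pvalues[var]
    best = min(pvals, key=pvals.get)
    if pvals[best] > p_enter:        # no candidate is significant enough to enter
        break
    selected.append(best)
    remaining.remove(best)

print("entered:", selected)
print(sm.OLS(y, sm.add_constant(X[selected])).fit().summary())
```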

In all cases, you will need to explore the statistics and options available. The online help in SPSS is useful but tends to be a bit brief. There should also be a wide selection of books and manuals available in the library - check the keyword SPSS. These will explain the statistics and the rationale behind them, as well as how to use this particular software. One good introductory manual is: Norusis, M. J. (2000) SPSS 10 Guide to Data Analysis. Chicago: SPSS Inc. You should be able to print your results easily. To save results as a file and import them into a word processor or spreadsheet, use File / Export, or simply select the item/table you want to transfer and Copy/Paste.


D. ANALYSIS: LESS COMMON METHODS

Analyze / Non-parametric Tests
Non-parametric techniques are used when the data sample or statistical universe cannot be assumed to be normally distributed (with large samples, departures from normality are often easy to detect). One way to test the normality hypothesis is with the One-Sample Kolmogorov-Smirnov test; using it you can test the data for conformity to the Normal, Uniform, Poisson and Exponential distributions. If the data reject the normality hypothesis, we can then proceed with other non-parametric techniques, such as testing for randomness using a Runs test. For these types of tests all data have to be numerical. You then specify the cutoff, and SPSS will count runs of numbers above and below the cutoff (0 will often be appropriate for the cutoff).
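A rough sketch of these two ideas with scipy/numpy follows; the series and the reference-distribution parameters are invented for the example, and the run count below is only the basic above/below-cutoff count, not the full SPSS Runs Test output.

```python
# Hedged sketch: one-sample K-S test for normality and a simple runs count.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
returns = rng.normal(loc=0.0, scale=0.02, size=250)   # hypothetical daily returns

# K-S test against a normal distribution with the sample's own mean and sd
# (strictly, estimating the parameters from the data makes the test approximate).
d_stat, p_value = stats.kstest(returns, "norm",
                               args=(returns.mean(), returns.std(ddof=1)))
print(d_stat, p_value)

# A basic runs count above/below a cutoff (here 0), as in the SPSS Runs Test.
cutoff = 0.0
signs = returns > cutoff
n_runs = 1 + np.count_nonzero(signs[1:] != signs[:-1])
print("number of runs:", n_runs)
```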

Even though this methodology seems easy and straightforward, it is not. Specific assumptions have to be made that relate strictly to the nature of the data examined, and in most cases the researcher has to understand their statistical properties. For example, if you test the hypothesis of random behaviour of stock market prices/returns (the EMH), a mean/median/mode cutoff point in a simple runs test is not directly relevant. In this case it is preferable to use the Wald-Wolfowitz runs test.


This is a more general test, which aims to detect differences in both the locations and the shapes of the data distributions. The Wald-Wolfowitz runs test combines and ranks the observations from both groups of data. If the two samples are from the same population, the two groups should be randomly scattered throughout the ranking; otherwise the null hypothesis that the price observations are random should be rejected.

Graphs / Time Series / Autocorrelation
This facility should be used only for time-series data. It allows you to plot the autocorrelation function and the partial autocorrelation function of one or more series. As most researchers in theoretical finance and investments use long time series, the problem of autocorrelation is very common. By this term we mean correlation between the error terms arising in time-series data. This problem is also called serial correlation and can be defined mathematically as the correlation of the error term u_t at time period t with the error terms u_(t+1), u_(t+2), ... and u_(t-1), u_(t-2), ... and so on. Such correlations in the error terms often arise from correlation among the omitted variables that the error term captures.

To graph these autocorrelations you follow the above procedure. However, to identify the problem of autocorrelation in the first place in a multiple regression model, you have to use the Durbin-Watson test statistic. This can be found in the Statistics section of the linear regression analysis.

The Durbin-Watson statistic d for first-order autocorrelation ranges between 0.0 and 4.0, with a value of around 2.0 indicating no autocorrelation at all (values near 0.0 indicate positive autocorrelation and values near 4.0 negative autocorrelation).
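The same diagnostics can be sketched with statsmodels on invented data: the Durbin-Watson statistic on regression residuals, and the (partial) autocorrelation function of a series.

```python
# Illustrative sketch of autocorrelation diagnostics with statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(2)
x = rng.normal(size=200)                   # invented explanatory series
y = 0.8 * x + rng.normal(size=200)         # invented dependent series

residuals = sm.OLS(y, sm.add_constant(x)).fit().resid
print("Durbin-Watson:", durbin_watson(residuals))   # ~2 suggests no autocorrelation

print("ACF :", acf(y, nlags=5))
print("PACF:", pacf(y, nlags=5))
```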

Analyze / Classify / Discriminant
This facility is used for discriminant analysis, which is useful for situations where you want to build a predictive model of group membership based on observed characteristics of each case (a small coded sketch follows below).
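Outside SPSS, the same idea can be sketched with scikit-learn. The tiny data set below anticipates the climate-zone example discussed later in this section; all values and variable names are invented for illustration.

```python
# Hedged sketch of discriminant analysis with scikit-learn.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Predictors for 8 hypothetical countries: [calories per day, % urban].
X = np.array([[3300, 75], [3100, 80], [3400, 70], [3200, 78],
              [2400, 35], [2600, 40], [2500, 30], [2300, 45]])
# Group membership coded as integers, as in SPSS: 1 = temperate, 2 = tropical.
groups = np.array([1, 1, 1, 1, 2, 2, 2, 2])

lda = LinearDiscriminantAnalysis().fit(X, groups)
print(lda.coef_, lda.intercept_)   # coefficients of the discriminant function
print(lda.predict([[2900, 60]]))   # classify a new case of unknown group
```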


The procedure generates a discriminant function (or, for more than two groups, a set of discriminant functions) based on linear combinations of the predictor variables that provide the best discrimination between the groups. The functions are generated from a sample of cases for which group membership is known; the functions can then be applied to new cases with measurements for the predictor variables but unknown group membership. Always bear in mind that the grouping variable can have more than two values. The codes for the grouping variable must be integers, and you need to specify their minimum and maximum values. Cases with values outside these bounds are excluded from the analysis.

An example of a discriminant analysis: on average, people in temperate-zone countries consume more calories per day than those in the tropics, and a greater proportion of the people in the temperate zones are city dwellers. A researcher wants to combine this information in a function to determine how well these variables can discriminate between the two groups of countries. The researcher also thinks that population size and economic information may be important. Discriminant analysis allows you to estimate the coefficients of the linear discriminant function, which looks like the right-hand side of a multiple linear regression equation. That is, using coefficients a, b, c, and d, the function is:

D = a * Climate + b * Urban + c * Population + d * GDP per capita

If these variables are useful for discriminating between the two climate zones, the values of D will differ for the temperate and tropical countries. If you use a stepwise variable selection method, you may find that you do not need to include all four variables in the function.

Relevant statistics that can also be estimated:
o For each variable: means, standard deviations, univariate ANOVA.
o For each analysis: Box's M, within-groups correlation matrix, within-groups covariance matrix, separate-groups covariance matrix, total covariance matrix.
o For each canonical discriminant function: eigenvalue, percentage of variance, canonical correlation, Wilks' lambda, chi-square.
o For each step: prior probabilities, Fisher's function coefficients, unstandardized function coefficients, Wilks' lambda for each canonical function.

E. TEST OF NULL HYPOTHESES

By the end of this short session you should be able to:
o Understand the rationale of tests of null hypotheses and the meaning of significance levels (p-values).
o Use the Compare Means and Crosstabs routines in SPSS and interpret the results.


1. Introduction

Imagine you've got some data which seems to indicate that:
o Women are cleverer than men; or that
o Companies in one sector are more profitable than companies in another; or that
o People who go out jogging are less likely to die than those who don't (in the news recently - based on a large sample in Denmark); or that
o Men are better at throwing than women (in the news recently - based on, I think, a sample of 27 people).

In each of the above cases the data come from samples; therefore, it is possible that the result may not hold for the population as a whole. With another sample, the answer may be different. Null hypothesis tests are a way of seeing if you can reasonably rule out sampling error as an explanation and conclude that the result does hold in a more general context than the sample you have actually studied. There are many null hypothesis tests in statistics texts, e.g. ANOVA (analysis of variance), the t-test, the chi-square test, the Mann-Whitney U test, the runs test, and so on. The steps in all of them are identical; the only differences lie in how you do Steps 3 and 4 below.

Step 1 - State the null hypothesis (H0). This is the hypothesis that there is no difference between the groups, or no relationship between variables, or that any relationship observed is just due to chance. It is a hypothesis set up to be shot down. The hypothesis you want to demonstrate is the alternative hypothesis (H1): i.e. that there is a difference or a relationship. (Sometimes the alternative hypothesis is one-sided - e.g. you may just be interested in the possibility that women are cleverer than men.)

Step 2 - Get the data. This will normally come from random samples.

Step 3 - Check whether the data supports the hypothesis you want to demonstrate, e.g. check that joggers are less likely to die. This is usually a matter of finding the difference of two means or proportions, or a correlation coefficient. In particular, check that the difference is large enough to be interesting: if it is not, don't bother with Step 4.

Step 4 - Estimate the probability of obtaining results as extreme as or more extreme than those obtained if the null hypothesis is true. This probability is called the significance level or p-value. It provides a rough measure of the plausibility of the null hypothesis; low p-values indicate that the null hypothesis is not plausible, so there is good evidence that the alternative hypothesis is true.
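Step 4 can also be made concrete by simulation. The sketch below, on invented data, runs a simple permutation test: it shuffles the group labels many times and counts how often a difference in means at least as extreme as the observed one appears under the null hypothesis of no group difference.

```python
# A small simulation sketch of Step 4: a two-tailed permutation test on invented data.
import numpy as np

rng = np.random.default_rng(3)
group_1 = np.array([6.1, 5.8, 7.0, 6.4, 5.9])
group_2 = np.array([5.2, 4.9, 5.5, 5.1])

observed = group_1.mean() - group_2.mean()
pooled = np.concatenate([group_1, group_2])

count = 0
n_sims = 10_000
for _ in range(n_sims):
    shuffled = rng.permutation(pooled)
    diff = shuffled[:len(group_1)].mean() - shuffled[len(group_1):].mean()
    if abs(diff) >= abs(observed):     # "as extreme or more extreme", both directions
        count += 1

print("estimated p-value:", count / n_sims)
```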


Conventionally, a cutoff value of 5% is taken, with values below this indicating a statistically significant result - i.e. the evidence suggests that the alternative hypothesis is true. If the alternative hypothesis is one-sided then "as extreme or more extreme" is interpreted in terms of one direction only, and the test is a "one-tailed" test. Most tests are two-tailed - i.e. extremes in both directions are included. The p-value can be estimated by probability theory (as packaged in a standard test such as the t-test) or by computer simulation.

2. Choosing a Suitable Test

In practice this is not easy and you may need to read further or ask for help. The two tests you will need for the examples below are:
o Analysis of variance - this is the test when you are comparing the means of two or more subgroups.
o Chi-square test - this has various uses, including a comparison of proportions. (The Fisher exact test is used instead of chi-square in some circumstances; SPSS will decide when these circumstances apply.)

Typical errors that researchers can make while doing a null hypothesis test are:
o If you conclude the result is significant and accept the alternative hypothesis, you may be wrong - the null hypothesis may be true and your result may be due to chance. This error can be made less likely by taking a small significance level.
o If the result is not significant, you should not draw any definite conclusions except that the result is "not proven". This is unhelpful, but you can't be wrong.

Things to check for when doing a null hypothesis test:
o The samples should be randomly drawn from a more general population or process - or at least this should be a reasonable assumption. Your conclusions will refer to this more general population or process.
o The size of the effect found should be large enough to be interesting. The p-value does not tell you how large or important the effect is, just whether chance can be ruled out as the explanation. (The word "significant" is potentially misleading: it does not mean important when used in its statistical sense.)
o The method of calculating the probability (Step 4) will depend on assumptions you need to check. These will always include randomness. The assumptions are less likely to be a problem with a simulation method. For example, the additional assumptions behind the ANOVA procedure are that the distribution of values in each group should be roughly normal, and that the variance (or standard deviation) of each of the groups should be roughly the same.


The second assumption is more crucial than the first. If these assumptions are not met, an alternative test for comparing two groups is the non-parametric Mann-Whitney U test.

3. Exercises

(In each case you will need to do Steps 1, 3 and 4 above.)

a) The data below come from two groups, 1 and 2. What is the difference between the means of the measurements from the two groups? Is this difference statistically significant? What does this indicate (you must state the null hypothesis to answer this)?

Group 1 Group 2 7 1 5 4 6 3 3 4 4

(Use Compare Means and take the Independent-Samples t-test and One-Way ANOVA options.)

b) The data below show an identical pattern to the data above, but the sample sizes are twice as big. Answer the same questions, and explain the differences in the answers.

Group 1 Group 2 7 1 5 4 6 3 3 4 4 1 7 4 5 3 6 4 3 4

c) Another two examples of the use of null hypothesis tests are:

o The results below (McGoldrick & Greenland, 1992) come from a survey on the service offered by banks and building societies:

Aspect of service             Banks' mean rating   Building societies' mean rating   L.O.S. (p)
Sympathetic / Understanding   6.046                6.389                             0.000
Helpful / Friendly staff      6.495                6.978                             0.000
Not too pushy                 6.397                6.644                             0.003
Time for decisions            6.734                6.865                             0.028
Confidentiality of details    7.834                7.778                             NS
Branch manager available      5.928                6.097                             0.090


The data were obtained from a sample of customers who rated each institution on a scale ranging from 1 (very bad) to 9 (very good). The above six dimensions are a selection from the 22 reported in the paper. The p-values in the final column of the table give the estimated probability of obtaining the results actually observed - or more extreme ones - if there is really no difference between banks and building societies. (NS means not significant, which in this table means that the p-value is greater than 0.1.)

o Analysis of some questionnaire data (top left of the data spreadsheet; columns are ID, GP, Q1-Q8):

ID 1 2 3 4 5 GP 11 Q1 1 2 5 4 3 Q2 4 1 4 5 2 Q3 4 3 4 5 4 Q4 4 1 4 6 2 Q5 2 4 3 6 4 Q6 4 Q7 3 4 5 5 3 Q8 1 4 3 5 2

(Note: leaving a cell blank indicates missing data.)

ANOVA Table
Source           D.F.   Sum of Squares   Mean Squares   F Ratio   F (p)
Between Groups   10     49.3789          4.9379         2.4055    0.0227
Within Groups    43     88.2693          2.0528
Total            53     137.6481

Group    Count   Mean    95% Confidence Interval for Mean   Minimum   Maximum
Grp 01   3       6.333   4.8991 to 7.7676                   6.0000    7.0000
Grp 02   3       4.333   2.8991 to 5.7676                   4.0000    5.0000
Grp 03   5       3.400   0.5415 to 6.2585                   1.0000    6.0000
Grp 04   3       4.333   -0.8379 to 9.5045                  2.0000    6.0000
Grp 05   3       3.000   -1.9683 to 7.9683                  1.0000    5.0000
Grp 06   11      3.636   2.7722 to 4.5005                   2.0000    6.0000
Grp 07   8       4.500   3.5008 to 5.4992                   2.0000    6.0000
Grp 08   4       3.000   0.0949 to 5.9051                   1.0000    5.0000
Grp 09   7       2.142   1.3107 to 2.9750                   1.0000    3.0000
Grp 10   4       3.500   1.9088 to 5.0912                   2.0000    4.0000
Grp 11   3       3.666   -0.1280 to 7.4613                  2.0000    5.0000
Total    54      3.685   3.2453 to 4.1251                   1.0000    7.0000


F. REGRESSION AND CORRELATION ANALYSES

1. Simple Linear Regression: Some Theory

Assume we have two variables, y and x, with the former being the dependent variable and the latter the independent one. The relationship between y and x is denoted by y = f(x). This relationship can be either:
o Deterministic, i.e. a mathematical relationship; or
o Probabilistic. A probabilistic relationship does not give unique values of y for given values of x, but it can be described exactly in probabilistic terms.

A deterministic relationship between two variables is outside the scope of these notes; students should be able to understand how such a function works from their prior studies in basic algebra. So we will focus only on the probabilistic relationship between these two variables. In order to express this probabilistic relationship, we introduce the error term u, which captures the random variation in y that is not explained by x. Ideally this error term - and hence the relationship - is normally distributed; however, that is not always the case, as we will see later in the notes.

A simple linear regression model is nothing more than a stochastic relationship between a dependent variable (y) and an independent variable (x).

Equation of a straight line:

y = α + βx + u

where y is the dependent variable, x is the independent or explanatory variable, α is the regression intercept, β is the measure of the slope or gradient of the line, and u is the error term or disturbance. However, in real life relationships between two variables are often imprecise. Therefore, the best way to replicate a two-variable relationship is by employing the best-fitting regression line

y_c = α + βx

where y_c stands for the value of the variable y computed from the relationship for a given value of the variable x. The slope coefficient β is referred to as a regression coefficient. As actual values will often deviate from the best-fitting regression line, a method to estimate these deviations needs to be employed. These deviations between the actual and computed y values are also referred to as residuals or errors and are measured by e_i = y_i - y_c.


To make the analysis of these deviations meaningful, the best-fitting line is obtained by ensuring that the sum of the squared deviations is as small as possible. This is the principle of the Ordinary Least Squares (OLS) method:

Minimise Σ(y_i - y_c)², i.e. minimise Σe_i².

On the basis of this method we can then calculate β and α using the formulae:

β = (Σx_i y_i - n·x̄·ȳ) / (Σx_i² - n·x̄²),   and   α = ȳ - β·x̄

Most problems in real life require the testing of various hypotheses, e.g. that the slope of the regression line is either downward or upward and that, therefore, x can be used to predict the value of y. The principles of hypothesis testing therefore also apply in the case of regression analysis. So, let's take a notional hypothesis as an example:

H0: β = 0
H1: β ≠ 0, β > 0 or β < 0

The next step is to estimate the standard error of the sample statistic β:

SE(β) = σ̂ / √(Σx_i² - n·x̄²),   where   σ̂ = √[ (Σy_i² - α·Σy_i - β·Σx_i y_i) / (n - 2) ]

and the numerator σ̂ denotes the standard error of the regression.

If we have appropriate population information we use the z-test statistic; otherwise we use the t-test statistic. As the latter assumes that we do not know the population parameters, it is more commonly used in practice. The formula is:

t = β / SE(β)

This has a t-distribution with (n - 2) degrees of freedom (d.f.). We can then compare this value with the critical value of t from the t-tables for the required level of significance, α. If the calculated value of t falls outside the critical limits ±t_α (one-tailed test) or ±t_(α/2) (two-tailed test), we accept the alternative hypothesis and conclude that β > 0 or β < 0 (one-tailed test) or β ≠ 0 (two-tailed test).
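As a numerical sketch of the formulae above, the short example below computes β, α, SE(β) and t on invented data and compares them with scipy's built-in simple-regression routine.

```python
# Hedged sketch: OLS slope, intercept, standard error and t statistic by hand.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # invented data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

beta  = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
alpha = y.mean() - beta * x.mean()

residuals = y - (alpha + beta * x)
sigma_hat = np.sqrt(np.sum(residuals**2) / (n - 2))   # standard error of the regression
se_beta   = sigma_hat / np.sqrt(np.sum(x**2) - n * x.mean()**2)
t_stat    = beta / se_beta                            # compare with t table, n-2 d.f.

print(alpha, beta, se_beta, t_stat)
print(stats.linregress(x, y))   # slope, intercept, p-value and stderr should agree
```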


2. Assumptions behind linear regression models

The basic OLS formulae will always find the best values of the coefficients in the least-squares sense. However, for the results to be useful in practice, the researcher should check the following:

o The relationship is approximately linear. If this is not so, it should be obvious from the data or from common sense. If the relationship is not linear, it may be possible to transform the data (e.g. using the logarithms of one of the variables) or to fit a non-linear model.

The next two assumptions are important if estimates of significance levels and confidence intervals are to be reliable.

o The error terms (also known as residuals, u_i) are assumed to have a roughly constant variance (or constant standard deviation). This should be obvious from the diagram: if the errors are greater for high values of x, for example, this assumption would not hold (the term heteroskedasticity is used to describe this situation).

o There are no obvious correlations among the errors (residuals). For example, if each error has a tendency to be similar to the last, there will be an autocorrelation with a lag of 1, i.e. a serial correlation. This is particularly likely with time-series data. Again, any striking patterns should be obvious from the diagram. The presence of autocorrelation may indicate that a different sort of model is appropriate.

The final assumption applies to multiple regressions only.

o There should be no large correlations between the independent variables. There is no definite rule for deciding how large is too large; judgment is needed. The difficulty with large correlations is that the effects of the variables in question cannot be separated, and the coefficients become unstable and unreliable.

3. Correlation analysis: Some Theory

o Correlation measures the strength of a relationship between two variables.
o Correlation does not imply causation. Nevertheless, there must be some theoretical justification for believing that there is a meaningful relationship between the two variables; otherwise, an apparently significant degree of correlation may simply represent spurious correlation.

The strength of the relationship is measured by the correlation coefficient, r. It ranges from -1 to +1, with -1 denoting a perfectly negative correlation and +1 a perfectly positive one.


The correlation coefficient r can be defined as:

r = Σ(x_i - x̄)(y_i - ȳ) / [ (n - 1) s_x s_y ]

where s_x is the sample standard deviation of the x values and s_y is the sample standard deviation of the y values.

As we saw earlier in the SPSS facilities, there are numerous versions of formulae for calculating correlation coefficients. Nevertheless, the two most popular ones are Pearson's product-moment correlation coefficient and the Spearman rank correlation coefficient. The latter is a non-parametric technique and can therefore be used in cases where the data examined are not normally distributed or where the variables y and x are expressed in rank-order form. Assuming that the data follow a normal distribution, we can use the former version with the following formula:

r = [ n·Σx_i y_i - Σx_i·Σy_i ] / √{ [ n·Σx_i² - (Σx_i)² ][ n·Σy_i² - (Σy_i)² ] }

In order to test the significance of the correlation coefficient r, we need to examine the difference between the value of r calculated from the sample data and the hypothesized population coefficient ρ. Thus, as with the regression coefficient β, we need to determine the probability of this difference occurring and whether or not it is statistically significant. The test statistic is calculated as follows:

t = r / √[ (1 - r²) / (n - 2) ]

By following the hypothesis testing procedure we can check possible significance. In order to do so, we need to compute both the standard error and the t-test statistic using the following formulae:

Standard error: s_r = √[ (1 - r²) / (n - 2) ]

Test statistic: critical value ±t with (n - 2) d.f.; actual value t = r / s_r
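A short numerical sketch of this test on invented data, checked against scipy's built-in Pearson routine:

```python
# Hedged sketch: r, its standard error, and the t statistic for significance.
import numpy as np
from scipy import stats

x = np.array([1.2, 2.4, 3.1, 4.0, 5.3, 6.1, 7.2])   # invented data
y = np.array([1.0, 2.1, 2.8, 4.4, 5.0, 6.3, 6.8])
n = len(x)

r = np.corrcoef(x, y)[0, 1]
s_r = np.sqrt((1 - r**2) / (n - 2))   # standard error of r
t = r / s_r                           # compare with critical t, n-2 d.f.

r_check, p_value = stats.pearsonr(x, y)
print(r, t, r_check, p_value)
```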

Now, I think it is about time to proceed to more advanced and challenging testing.


4. Advanced Examples

CASE A
Open the SPSS file: Example 1 - Contrarian Profit Results. Test the following two hypotheses:
1. Value portfolios' annual returns are larger than growth portfolios' annual returns.
2. Most return measures are highly correlated with each other.
Are all differences statistically significant? Comment on the results.

CASE B
Open the Excel file: Example 2 - Beta Differences D10 - D1. Test the following two hypotheses:
1. The systematic risk of value portfolios is higher than that of growth portfolios.
2. There is a partial correlation among the various beta estimates for the different portfolio classifications.

CASE C
Import into SPSS the following Excel file: Example 3 - Stock Markets.
1. Formulate the hypothesis that the DAX index's daily returns are related to the returns of the DJIA.
2. Run a regression model for these two indices and test your hypothesis. Check for possible autocorrelation in the sample data.
3. Form a multiple regression model assuming that the movements of all three other indices can explain FTSE-100 returns.
4. Check for partial correlations and multicollinearity.
5. Explain your results.
6. What is the value of R²? What does this mean?
7. Check the assumptions above. Are they reasonable for your data?
8. Look at the significance test results. What are the null hypotheses? What do they mean?
9. Look at the confidence interval for the x-coefficient. What does this mean? Will a 50% confidence interval be wider or narrower?

G. REFERENCES

Norusis, M. J. (2000) SPSS 10 Guide to Data Analysis. Chicago: SPSS Inc.
Maddala, G. S. (2001) Introduction to Econometrics, 3rd edition. Chichester: Wiley.
Fleming, M. C. and Nellis, J. G. (2000) Principles of Applied Statistics. London: Thomson Learning.

Copyright © 2005 Dr. Panagiotis Andrikopoulos
