Homework on Classification and Regression

May 1, 2018

1 Dataset
This task should be answered using the Weekly data set. The data set contains 1089 weekly returns
of the S&P 500 stock index over 21 years, from the beginning of 1990 to the end of 2010. It is a data
frame with 1089 observations on the following 10 variables.

• Id: unique integer corresponding to the number of the week.

• Year: The year that the observation was recorded.

• Lag1: Percentage return for previous week.

• Lag2: Percentage return for 2 weeks previous.

• Lag3: Percentage return for 3 weeks previous.

• Lag4: Percentage return for 4 weeks previous.

• Lag5: Percentage return for 5 weeks previous.

• Volume: Volume of shares traded (average number of daily shares traded in billions).

• Today: Percentage return for this week.

• Direction: A factor with levels Down and Up indicating whether the market had a negative or
positive return in a given week.

Source: Raw values of the S&P 500 were obtained from Yahoo Finance and then converted to
percentages and lagged.
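
The conversion described in the source note (raw index values turned into percentage returns and then lagged) can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_weekly_frame` is hypothetical, the price values are synthetic stand-ins rather than real S&P 500 data, and labelling a zero return as Down is an arbitrary choice that the data set description does not specify.

```python
import numpy as np
import pandas as pd


def build_weekly_frame(prices: pd.Series, n_lags: int = 5) -> pd.DataFrame:
    """Turn weekly closing values into Today, Lag1..LagN and Direction columns."""
    # Percentage return of each week relative to the previous week.
    today = prices.pct_change() * 100.0
    frame = pd.DataFrame({"Today": today})
    # LagK holds the percentage return observed K weeks earlier.
    for k in range(1, n_lags + 1):
        frame[f"Lag{k}"] = today.shift(k)
    # The first n_lags + 1 weeks lack a complete history, so drop them.
    frame = frame.dropna().reset_index(drop=True)
    # Assumption: a return of exactly zero is labelled "Down".
    frame["Direction"] = np.where(frame["Today"] > 0, "Up", "Down")
    return frame


# Synthetic weekly closing values, used only to exercise the function.
prices = pd.Series([100.0, 102.0, 101.0, 103.0, 104.0, 102.0, 105.0, 106.0])
weekly = build_weekly_frame(prices)
print(weekly.round(3))
```

With eight synthetic values and five lags, only the last two weeks have a complete history, so the resulting frame has two rows.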

1.1 Questions
1. Produce some numerical and graphical summaries of the Weekly data. Do there appear to be
any patterns?

2. Use the full data set to perform a regression of the quantitative variables as a function of the
week, treating the data as a time series. Use the KNN method to solve this question.

3. Search and study the Ridge Regression method (sklearn.linear_model.Ridge). Write
a report about this method, including its pros and cons. Now split the data into train
and test subsets and compare the resulting models to the KNN method. Use this method to
perform a regression as in the previous question. Estimate the test error for different train-test
ratios and create a plot of error vs. train percentage.

4. Use the full data set to perform a logistic regression with Direction as the target and the five
lag variables plus Volume as predictors. Do any of the predictors appear to be statistically
significant? If so, which ones?

5. Search and study the support vector classification method (sklearn.svm.SVC). Write
a report about this method, including its pros and cons. Split the data into train and
test subsets, use the SVC method to generate classification models, and compare them to
the logistic regression results. Use this method to perform a classification as in the
previous question. Create a density plot of the probability of the target "Direction", including
a scatter plot of the data. Estimate the test error for different train-test ratios and create a
plot of error vs. train percentage.

6. Write a report with all the results and their analysis.
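
The "error vs. train percentage" procedure requested in questions 3 and 5 can be sketched with one helper that refits a model for several train fractions and records the test error. This is a minimal sketch, not a reference solution: the helper name `error_vs_train_fraction` is hypothetical, the arrays are synthetic stand-ins for the six predictors (five lags plus Volume), and the default shuffled split of `train_test_split` is an assumption that may not suit time-ordered data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import mean_squared_error, zero_one_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVC


def error_vs_train_fraction(model, X, y, fractions, loss, seed=0):
    """Return one test-set loss per training fraction."""
    errors = []
    for frac in fractions:
        # Assumption: a shuffled random split; pass shuffle=False to keep time order.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=frac, random_state=seed
        )
        model.fit(X_tr, y_tr)
        errors.append(loss(y_te, model.predict(X_te)))
    return errors


# Synthetic stand-in data (NOT the Weekly data set): 300 rows, 6 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
weights = np.array([0.5, -0.3, 0.0, 0.0, 0.1, 0.2])
y_reg = X @ weights + rng.normal(scale=0.5, size=300)  # regression target
y_clf = np.where(y_reg > 0, "Up", "Down")              # classification target

fractions = [0.5, 0.6, 0.7, 0.8, 0.9]
results = {
    # Regression models are scored with mean squared error.
    "KNN": error_vs_train_fraction(
        KNeighborsRegressor(n_neighbors=5), X, y_reg, fractions, mean_squared_error
    ),
    "Ridge": error_vs_train_fraction(
        Ridge(alpha=1.0), X, y_reg, fractions, mean_squared_error
    ),
    # Classifiers are scored with the misclassification rate.
    "Logistic": error_vs_train_fraction(
        LogisticRegression(), X, y_clf, fractions, zero_one_loss
    ),
    "SVC": error_vs_train_fraction(SVC(), X, y_clf, fractions, zero_one_loss),
}

for name, errs in results.items():
    print(name, np.round(errs, 3))
```

Each list in `results` can then be plotted against `fractions` to obtain the requested curve. The hyperparameters shown (`n_neighbors=5`, `alpha=1.0`, default `SVC` settings) are placeholders and would normally be tuned.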
