
Edvin Aspelin

Abstract— This project aims to explore the field of stock market prediction using deep neural networks, more specifically LSTM networks with Monte Carlo dropout to model prediction uncertainty. Previously, the exponential moving average has shown to be very capable of predicting stock prices from day to day, but the aim here is short-term predictions, where the time span of the prediction is several days. The performance of the network is evaluated over different time horizons, where both the mean and the variance of a prediction are illustrated. The report also goes through the details of implementing such a network and improving upon it. However, the network is limited by its dataset, which contains only daily price and volume data for each stock.

I. INTRODUCTION

The ability to reliably predict the stock market is obviously a rewarding concept, as the result would be rather profitable. But no matter how tempting that implementation may be, the task turns out to be a very difficult one. As one might imagine, there have been many attempts. For example, in [1] the authors manage, with some success, to capture stock behaviour with one-day-ahead predictions using various machine learning techniques. Generally, though, predicting further into the future has not yet proven reliable.

An interesting note is that most networks proposed in current articles take a deterministic approach: the prediction outputs a single value, and the model is evaluated by its accuracy or loss. This implies using, for example, a mean square error loss to determine performance during training. However, if the network were to be used on a real stock market, there would be no way of telling the loss of the current prediction, or the risk of an investment. The key problem with such a method is that there is no real way of determining accuracy on real-time data, where the answer is unknown.

Risk is an important factor in stock market trading; it is an indication of how uncertain an investment is. In other words, we want to know how certain our predictions are. The behaviour of stock markets tends to be stochastic, with much behaviour that is uncorrelated with the market's own previous values. If we accept that we do not have the tools to make reliable stock market predictions, can we instead measure their uncertainty? Implementing Monte Carlo dropout in prediction leads the model to inherit Bayesian properties [2], satisfying the desire for a way to measure uncertainty and thus creating a tool to analyse the prediction with a Bayesian approach.

Making predictions of the future in stock markets can be expressed as associating observations of previous price trends with future yield. Somehow the network will need to capture these trends in time and make appropriate predictions from them. Hence, the central work will revolve around an LSTM-based network. While there is no claim that this is the obvious choice, LSTMs have proven very effective at processing sequential data.

II. BACKGROUND THEORY

A. Monte Carlo Dropout

As with Monte Carlo techniques in reinforcement learning (RL), the goal of the method is to create a unique scenario each time a sequence is run through the network. Compared to RL, one way of implementing the same behaviour in networks with, for example, LSTM layers is to introduce dropout. Dropout is generally used to inhibit overfitting: the dropout layer is applied during training in an attempt to make the expressiveness of the network more general.

Fig. 1. Illustration of how dropout affects a classic network of nodes, where blacked-out nodes are dropped. Network to the right without dropout and to the left with dropout.

Classically, the approach shown in Fig. 1 is used only during training. The difference with Monte Carlo dropout is that the same behaviour also applies to predictions. It then follows that each prediction produces a different result, so one can extract a mean and a variance. Suppose a trained network is a generic function, y(x) = f(x). If the function is stochastic and we fix the input x, the properties of the output y can be analysed as follows:

E(y) ≈ (1/N) Σ_{n=1}^{N} ŷ_n    (1)

Var(y) ≈ (1/(N−1)) Σ_{n=1}^{N} (ŷ_n − E(y))²    (2)

With Eq. (1) and (2), in combination with the law of large numbers, the mean and variance estimates will converge as N increases.

It is not obvious that this is a Gaussian process. However, the proof is beyond the scope of this project and is covered by [2]. The important conclusion from that derivation is that averaging forward passes through the network is in fact equal to Monte Carlo integration over a Gaussian process posterior estimate.
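The report contains no code, but the procedure above can be sketched in a few lines. The following is a minimal, hypothetical numpy illustration (not the report's implementation): a toy one-hidden-layer network stands in for the trained LSTM, dropout is kept active at prediction time, and the mean and variance of Eq. (1) and (2) are estimated over N forward passes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained network": one hidden layer whose units are randomly
# dropped at prediction time, which is what makes each forward pass
# stochastic under Monte Carlo dropout.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 1))

def stochastic_forward(x, p_drop=0.5):
    h = np.tanh(x @ W1)
    # Dropout stays ACTIVE at prediction time (the Monte Carlo part);
    # inverted dropout keeps the expected activation unchanged.
    mask = rng.random(h.shape) > p_drop
    h = h * mask / (1.0 - p_drop)
    return (h @ W2).item()

def mc_dropout_predict(x, n_samples=1000):
    samples = np.array([stochastic_forward(x) for _ in range(n_samples)])
    mean = samples.mean()       # Eq. (1)
    var = samples.var(ddof=1)   # Eq. (2): N - 1 in the denominator
    return mean, var

x = rng.normal(size=8)
mean, var = mc_dropout_predict(x)
```

With a real Keras-style model the same effect is usually obtained by calling the model with dropout enabled at inference time; the sketch above only makes the mechanics of Eqs. (1) and (2) explicit.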

B. LSTM networks

A major shortcoming of traditional neural networks is their inability to use previous events in a sequence to make better predictions. Long Short-Term Memory (LSTM) networks try to solve this problem by introducing a cell state that the LSTM layer can read from and write to. The cell state is forwarded through the whole sequence, which is why information can easily be handed forward through the sequence.

Fig. 2. A single cell in an LSTM network. Note that there are variations of this, but it is the structure used in this report.

As illustrated in Fig. 2, the cell state c is passed forward through the sequence of several LSTM cells. Applying this structure to stock market prices means that the network can more easily capture price behaviour through time, and potentially produce better predictions.
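For concreteness, a single LSTM step can be written out directly. This is an illustrative numpy sketch of the standard textbook LSTM equations (the gate names and weight shapes are the usual convention, not taken from the report):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, b):
    """One LSTM step: the gates read [h_prev, x]; the cell state c
    is what gets carried forward through the whole sequence."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate: what to erase from c
    i = sigmoid(W["i"] @ z + b["i"])        # input gate: what to write to c
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell update
    c = f * c_prev + i * c_tilde            # new cell state
    o = sigmoid(W["o"] @ z + b["o"])        # output gate: what to expose
    h = o * np.tanh(c)                      # new hidden state
    return h, c

# Run a short (toy) price sequence through the cell, carrying (h, c) forward.
rng = np.random.default_rng(1)
n_in, n_hidden = 1, 4
W = {k: rng.normal(scale=0.5, size=(n_hidden, n_hidden + n_in)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for price in [64.2, 64.8, 65.1, 64.9]:
    h, c = lstm_cell(np.array([price / 65.1]), h, c, W, b)
```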

III. METHOD

A. Performance Evaluation

When evaluating a network, one generally uses classic loss functions such as cross entropy or mean square error. The model should learn to minimise the Euclidean distance between the true value and the prediction. Thus the model is set to minimise the Mean Square Error (MSE) between prediction and ground truth, as this is simply the squared Euclidean distance. However, the output of the model in prediction will be a Gaussian distribution. To evaluate the network's ability to capture this distribution, it is no longer sufficient to calculate the MSE alone. Instead, the measurement of the network's performance in prediction will be to minimise the Standard Error (SE) between the posterior distribution and the prediction. The SE is a measurement of how many standard deviations the prediction is from the truth. To summarise, the loss function during training is the MSE, and the SE is used when evaluating predictions.

Fig. 3. An example of a Gaussian distribution with standard deviations and percentiles illustrated.

From Fig. 3 we can conclude that minimising the SE expresses the error of our prediction in terms of the shape of the distribution. In other words, a bad prediction is measured differently depending on the uncertainty of the prediction.

The SE, σ_e, is calculated from the prediction ŷ and its standard deviation σ_ŷ, compared with the ground truth y:

σ_e = √((ŷ − y)²) / σ_ŷ    (4)

Eq. (4) will then express the performance in the final prediction test on the trained network.
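As a worked illustration of Eq. (4), the SE can be computed directly from a set of Monte Carlo samples; the sample values below are made up for demonstration:

```python
import numpy as np

def standard_error(y_hat_samples, y_true):
    """Standard Error of a Monte Carlo prediction (Eq. 4): how many
    predictive standard deviations the mean prediction lies from truth."""
    y_hat = np.mean(y_hat_samples)             # Eq. (1)
    sigma_hat = np.std(y_hat_samples, ddof=1)  # square root of Eq. (2)
    # sqrt((y_hat - y)^2) is just |y_hat - y|, written as in Eq. (4)
    return np.sqrt((y_hat - y_true) ** 2) / sigma_hat

samples = np.array([64.1, 64.6, 65.3, 64.9, 64.4])
se = standard_error(samples, y_true=65.0)
```

Here the mean prediction is 64.66, about 0.34 below the truth, and the sample standard deviation is roughly 0.46, so the prediction sits well within one standard deviation of the ground truth.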

B. Data Preprocessing

When training the model, the Huge Stock Market Dataset [3] is used. The dataset is available on Kaggle free of charge. It contains stock price information from the very beginning of NYSE, NASDAQ, and NYSE MKT. The stored data are limited to {Date, Open, High, Low, Close, Volume, OpenInt}; an important note is that the prices are adjusted for dividends and splits. Training data for the network is taken from the closing price of each day and fed as a time sequence.

Fig. 4. Illustration of how the data from the major time sequence is split up into x and y data for the network to train on.

A key component when training networks is data preprocessing. Without it, the network might learn a different task than intended. Stock prices can differ vastly between just a couple of different stocks, and the only reasonable way to measure yield or loss is to analyse it in percentages. Predicting stock markets therefore forces some kind of generalisation of the input data, so that the network does not mislearn from stock prices that differ from the one to predict. This can be applied in various ways. Suppose an input sequence x, with normalised input x̃:

x̃ = x / |x|_∞    (5)

ỹ = y / |x|_∞    (6)

With this normalisation, the problem of differing stock prices is solved. Note that the normalisation is done for each input sequence, not for the stock's entire price sequence, which avoids learning differently from stocks that have increased or decreased a lot. However, this is not completely true: the same problem still persists, just on a smaller time scale. But the small bias it produces in the input is considered negligible.
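The splitting of Fig. 4 together with the per-sequence normalisation of Eqs. (5) and (6) might be implemented along these lines (a sketch; the window and horizon values are illustrative, not the report's settings):

```python
import numpy as np

def make_training_pairs(prices, window, horizon):
    """Split a closing-price sequence into (x, y) pairs as in Fig. 4,
    normalising each pair by the sup-norm of its own input window
    (Eqs. 5 and 6) rather than by the stock's whole history."""
    xs, ys = [], []
    for t in range(len(prices) - window - horizon + 1):
        x = prices[t : t + window]
        y = prices[t + window + horizon - 1]   # target `horizon` days ahead
        scale = np.max(np.abs(x))              # |x|_inf for this window only
        xs.append(x / scale)
        ys.append(y / scale)
    return np.array(xs), np.array(ys)

prices = np.array([10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.0])
X, y = make_training_pairs(prices, window=3, horizon=2)
```

Because the scale is recomputed per window, a $600 stock and a $6 stock produce inputs in the same range, which is the generalisation the text asks for.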

To find an optimal design of the network, several models were designed and tested, all with only LSTM and dropout layers but with various layer widths and depths. Some experiments were done with varying layer widths throughout the network, but it was concluded that the best result was achieved with roughly the same width in all layers.

TABLE I
MODEL WITH BEST PERFORMANCE

Model      | No. of layers | No. of nodes in each layer
Best model | 2             | 512

In all models there was only one dropout layer, which was applied to the last LSTM layer. When trying to find the best model, deeper models were also experimented with, but depth did not noticeably increase performance beyond the best model and only increased the training time.

Fig. 5. Training loss with various models over epochs.

Fig. 5 shows that the models' performance converges given a sufficient number of training epochs. An explanation of this could be the lack of input data: currently, the input is a 2-dimensional vector with time and price. If more information were fed to the deeper networks, they might put the increased expressiveness to use.

D. Model Prediction Uncertainty

As stated previously, the mean and variance are extracted from the following equations:

E(y) ≈ (1/N) Σ_{n=1}^{N} ŷ_n

Var(y) ≈ (1/(N−1)) Σ_{n=1}^{N} (ŷ_n − E(y))²

Training our network in a classical manner yields the trained parameters used in the prediction. Then, using Monte Carlo dropout in the prediction and iterating over it gives the mean and variance. To make sure that we obtained the correct mean and variance, we iterated until the values converged (≈ 1000 times).
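The convergence check described above (iterating until the running mean and variance settle, around 1000 passes) could look like the following sketch, where a noisy stand-in replaces the network's stochastic forward pass and `tol` is an assumed threshold:

```python
import numpy as np

rng = np.random.default_rng(2)

def stochastic_prediction():
    # Stand-in for one Monte Carlo dropout forward pass through the
    # trained network; here simply a noisy value around a price of 65.
    return 65.0 + rng.normal(scale=0.5)

def mc_until_converged(tol=1e-3, min_samples=50, max_samples=10_000):
    """Keep sampling until the running mean and variance both move by
    less than `tol` between iterations, mirroring the report's
    iterate-until-converged procedure."""
    samples = []
    prev_mean, prev_var = np.inf, np.inf
    while len(samples) < max_samples:
        samples.append(stochastic_prediction())
        if len(samples) >= min_samples:
            mean = np.mean(samples)
            var = np.var(samples, ddof=1)
            if abs(mean - prev_mean) < tol and abs(var - prev_var) < tol:
                return mean, var, len(samples)
            prev_mean, prev_var = mean, var
    return np.mean(samples), np.var(samples, ddof=1), len(samples)

mean, var, n = mc_until_converged()
```

Checking both the mean and the variance deltas is a design choice; with a noise scale of 0.5 the loop typically settles after a few hundred to a thousand samples, consistent with the ≈ 1000 iterations reported.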

Fig. 6. Using the network to predict three days into the future, illustrating how the stock price compares to the prediction, with both the mean and the 3σ-lines to show how it aligns with the distribution.

The predictions also appeared to have some bias. During several tests the same behaviour as seen in Fig. 6 was persistent, and no solution to it was found.

Fig. 7. Predicting the stock price at a single given point in time, with an increasing prediction horizon.

Now consider predicting from a single point in time with an increasing prediction horizon. In Fig. 7 the horizon goes from 1 to 10 days into the future. Looking carefully, the variance actually behaves as one would expect: it increases through time. Often in these cases the prediction was actually better at the end than in the middle; compare, for example, the predictions at day 5 and day 10. This could be explained by the stock market's natural movement being a slightly increasing line around which the market oscillates, and our network just happened to catch that in the given time frame.

IV. DISCUSSION

The result of predicting three days into the future, seen in Fig. 6, may resemble the correct price rather well. But keep in mind that the error from day to day is often around $1 out of a total of ∼$65, which is roughly a 1.5% error, and the mean error is around 1%. We concluded quite early that stock market predictions were going to be unreliable. Predicting a stock price with a mean square error of roughly 1% just one day ahead is not a very good prediction; a stock is almost expected to move by that much in a single day. However, it was very interesting to look at the error in standard deviations instead, giving it another meaning. We could then conclude that the network was actually quite uncertain of its predictions, with the 3σ-lines bounding all true stock market prices. Despite the poor estimation, there is great strength in knowing the uncertainty of the network. Another approach to introducing similar behaviour could be a Bayesian Neural Network (BNN), which introduces weight uncertainty.

It was also quite difficult to improve upon the model. The best model was a quite simple one with only two LSTM layers, and other attempts to make it better did not really pay off.

Further improvements to the network would definitely include increasing the information fed to it. In reality, an investor looks at a vast amount of data to assess a company's value or future yield. Key figures like the solvency ratio, liquidity ratios, and debt ratios are all important information for the investor, so why would they not be for the network? The short answer is that they probably would be beneficial, as more information correlated with the prediction generally improves performance. The catch is that none of the datasets available free of charge includes such data. It would be interesting to see how the network would analyse a more complex input, and which figures actually matter the most.

It is important to note that the uncertainty of the prediction is not measured in terms of the true uncertainty, or variance, of a stock price, as getting the true uncertainty would mean that we could tell in hindsight how a stock price has aligned with its distribution. We are not saying that there is no way of approximating this, but that is not what the network does. The output of the network tells how it believes the price will develop, as well as its own uncertainty about that prediction. If the implementation of the network with Monte Carlo dropout is just right, the uncertainty may very well approximate the true distribution of the stock price, but we have no way of verifying this with the tools used in this project.

V. CONCLUSIONS

Stock market prediction aside, introducing uncertainty in a network is an interesting take on the classical deterministic approach. In some cases one can already retrieve the network's probabilities from the softmax layer, for example in a simple classification network. But Gaussian processes have some interesting properties, and many methods assume the distribution to be just that. Hence, applying a Bayesian approach might open up doors to further improving our networks.

REFERENCES

[1] O. Hegazy, O. S. Soliman, and M. Abdul Salam, "A Machine Learning Model for Stock Market Prediction," International Journal of Computer Science and Telecommunications, 2013.
[2] Y. Gal and Z. Ghahramani, "What My Deep Model Doesn't Know..." [Blog]. Available: http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html [Accessed 27 Oct. 2018].
[3] B. Marjanovic, "Huge Stock Market Dataset," Kaggle, 2017. Accessed: 10 Oct. 2018. Available: https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs
