Machine Learning for forecasting stock market movements with support vector machine

Abstract

Support vector machine (SVM) is a class of learning algorithms characterized by capacity control of the decision function, the use of kernel functions, and the sparsity of the solution. I investigate here how well the monthly movement of various financial instruments can be predicted with an SVM. To evaluate the forecastability of the SVM I use cross-validation. Various kernels were tried in the first stages of the analysis.

Data structures and approach to data analysis
Conclusions from literature review and personal experience in financial markets
The econometrics literature suggests that daily, weekly and monthly price differentials are to some degree correlated with future price movement. However, an instrument's price is rarely independent of wider world events. Classic technical analysis is mostly concerned with an instrument's own price movement, but in practice world economics exerts strong pressure and is sometimes more decisive than the stock itself. This is especially true for indices, which incorporate multiple stocks. Radial Basis Function (RBF) kernels have proved useful in financial market prediction problems (see http://cs229.stanford.edu/proj2013/ChenChenYe-ForecastingTheDirectionAndStrengthOfStockMarketMovement.pdf).

Experimental design

I decided to use SVM for machine learning due to its many advantages and popularity. I investigated SVMs with polynomial, radial and linear kernels and various values of C, ε and σ (where relevant). After settling on a kernel and pre-processing method, different values of C, ε and σ were tried for the SVM. As RBF kernels have often proved useful in financial market prediction problems, I use this kernel with parameter values optimised per index instrument. In addition, the feature set for each instrument to be predicted was selected separately, based on the importance or noisiness of each feature. This means that each instrument has its own combination of feature set and kernel parameters (gamma and cost) constructed from historical data. The difficulty with index prediction is that any predicted move has to be large enough that the profit beats brokerage and the probability of error. This is why I treat the movement as a multiclass classification problem: I attempt to predict whether the trader should go long, go short or stay neutral (e.g. an iron condor option strategy).
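The long/short/neutral framing amounts to mapping each monthly return to one of three classes. A minimal sketch in Python, using the ±2% threshold and the 1/2/0 labelling described in this report (the function name is mine, for illustration):

```python
import numpy as np

def label_moves(monthly_returns, threshold=0.02):
    """Map monthly returns to trade classes:
    1 = go long  (move of +2% or more),
    2 = go short (move of -2% or more),
    0 = neutral  (anything in between, e.g. an iron condor)."""
    returns = np.asarray(monthly_returns, dtype=float)
    labels = np.zeros(returns.shape, dtype=int)
    labels[returns >= threshold] = 1
    labels[returns <= -threshold] = 2
    return labels

print(label_moves([0.035, -0.027, 0.004]))  # [1 2 0]
```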
With this in mind, an SVM per index instrument is trained on data with the following labelling: 1 for a move of 2% or more up, 2 for 2% or more down, and 0 for anything in between.

Kernel parameter selection

The gamma parameter

For the Radial Basis Function kernel, the gamma parameter controls the influence of a single training example, with low values meaning 'far' and high values meaning 'close'. Gamma can be seen as the inverse of the radius of influence of the samples selected by the model as support vectors. In other words, a large gamma yields a low-bias, high-variance model, and vice versa.

The C parameter

C implements the 'soft margin' idea: it trades off misclassification of training examples against simplicity of the decision surface. The lower C, the smoother the decision surface. The higher C, the more (or all) training examples get classified correctly, because misclassification is penalised heavily; the model then tends to select more samples as support vectors. This makes it potentially fragile on new (future) events which do not closely match historical ones. In other words, C is the cost of misclassification.

The parameter grid search

The behaviour of the model is very sensitive to these parameters, so a proper selection method matters a great deal. After first trying default parameters and then brute-force parameter selection, I settled on an exhaustive search over specified parameter values for an estimator. At the moment I use a grid search: the parameters are optimised by cross-validated search over a parameter grid. The current version uses a logarithmic grid with base 10 over a range of each parameter that extends beyond the values eventually found optimal. In the next version I plan to use this only as an initial step, followed by finer tuning, a kind of 'zoom in' approach. This can be achieved, but at a much higher cost, and is planned for future releases.
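The cross-validated search over a base-10 logarithmic grid for C and gamma can be sketched with scikit-learn's GridSearchCV. The feature matrix and labels below are random placeholders, not the actual index data:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Hypothetical training data: 200 monthly samples, 6 features,
# labels 0/1/2 as in the labelling scheme above.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = rng.integers(0, 3, size=200)

# Logarithmic grid with base 10, deliberately wider than the
# region where the optimum is expected to lie.
param_grid = {
    "C": np.logspace(-2, 4, 7),
    "gamma": np.logspace(-4, 2, 7),
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

A finer "zoom in" pass would simply repeat the search with a narrower grid centred on `best_params_`.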
For details see http://pyml.sourceforge.net/doc/howto.pdf and http://journal.imbio.de/articles/pdf/jib-201.pdf

Feature selection

As explained earlier, although a large number of features can be defined for financial markets overall, each financial instrument may need its own combination of features, as some features are effectively noise and should be removed. I use a very basic algorithm that checks whether a feature improves correlation, and excludes it per financial instrument if it does not. This is performed as a separate operation, and the result is saved in the database for later use by the classification fit, prediction and other operations.

Scaling

Another necessary task is to standardise the features by removing the mean and scaling to unit variance. Standardisation of a dataset is a common requirement, as an SVM with an RBF kernel may behave badly if the individual features do not look like standard normally distributed data. The RBF kernel assumes that all features are roughly centred and have variance of the same order; otherwise a feature with a much larger variance than the others may dominate the kernel function and prevent the estimator from learning well, or at all, from the other features. This is why I scale the feature dataset. See https://pdfs.semanticscholar.org/e9d7/44aa8fe841927006ccff08263a0aa1fa1bac.pdf

General workflow

There are 5 steps to follow to use this tool. Every instrument to be predicted goes through the same steps, and all steps accept batches of instruments specified as a comma-separated list parameter. Some steps require a particular order, others do not. Some steps rely on data in the database, which is sometimes stored as configuration and sometimes as the result of a previous step. The steps are:
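The standardisation step can be sketched with scikit-learn's StandardScaler, here wrapped in a Pipeline so that the same mean/variance estimates are reapplied automatically at prediction time (the data is a synthetic placeholder with one deliberately dominant feature):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Second feature has far larger variance than the first; without
# scaling it would dominate the RBF kernel distance.
X = np.column_stack([rng.normal(0, 1, 300),
                     rng.normal(0, 1000, 300)])
y = rng.integers(0, 3, size=300)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X, y)

# After scaling, every feature has zero mean and unit variance.
scaled = model.named_steps["standardscaler"].transform(X)
print(np.round(scaled.std(axis=0), 2))  # [1. 1.]
```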
Notes: every time Step 2 runs, Step 4 must be rerun before Step 5, as the feature array may change. At the same time, you may want to rerun Step 1 after Step 2, so it makes no sense to run Step 4 automatically every time; this is why the steps are kept separate. The most expensive steps are 1 and 2. Steps 4 and 5 are cheap; Step 5 can even run on low-performance devices such as small cloud instances or mobile devices.

Experimental results

So far the mean of the cross-validation prediction scores across various ASX financial indices is around 80%, showing that at least the direction of the move is correctly predicted for the following month. Below is a CSV of results:

Date,Symbol,Outcome,OrigPrice,PredictedPrice,ActualChange
2017-05-11,^AORD,2,5912.0,5766.5,0.975389039242
2017-05-11,^AXDJ,0,2253.600098,2199.100098,0.975816472475
2017-05-11,^AXSJ,2,9696.299805,9086.599609,0.937120323395
2017-05-11,^AXEJ,2,9544.5,9019.0,0.944942113259
2017-05-11,^AXFJ,0,6686.399902,6399.100098,0.957032213417
2017-05-11,^AXXJ,0,7457.600098,7137.100098,0.957023707924
2017-05-11,^AXHJ,2,23931.400391,23616.099609,0.986824808542
2017-05-11,^AXNJ,0,5762.200195,5824.399902,1.010794437
2017-05-11,^AXIJ,0,903.599976,857.200012,0.948649883541
2017-05-11,^AXMJ,1,9634.200195,9803.400391,1.01756245382
2017-05-11,^AXJR,0,3311.300049,3329.5,1.00549631587
2017-05-11,^AXUJ,2,9068.099609,8843.700195,0.975253975621

Where Outcome is a prediction with labels:
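The quoted ~80% figure is a cross-validation mean; a sketch of how such a score is obtained with scikit-learn's cross_val_score (the data here is synthetic, so the printed number itself carries no meaning):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Placeholder feature matrix and 0/1/2 labels standing in for the
# per-instrument historical dataset.
rng = np.random.default_rng(2)
X = rng.normal(size=(240, 6))
y = rng.integers(0, 3, size=240)

# Mean prediction score over 5 cross-validation folds.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print(scores.mean())
```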
As the results above show, the model predicted the market direction well, but some predictions did not follow the predicted label outcome.

Overall conclusions

Learning the kernel parameters and selecting the feature set requires significant computational time. However, this need only be done once every 3 to 12 months, say, to accommodate possible changes in market behaviour. The fitting takes very little processing power, and prediction even less. With pre-calculated and preloaded models, the prediction itself could potentially be run on a mobile device, which creates a potential business opportunity to sell such a service.

How good are the results, and in what context should we read these numbers? Does 80% mean an 80% chance of making a profit, so that investing real money would yield 80% of a 2% monthly gain? I think unfortunately not. To me, the context of these results is closer to an OCR task. In OCR, results like these would mean every 5th character is determined wrongly. The system might get similar characters close, but still wrong: '1' and 'l' look alike in some fonts, and while they obviously should not be misclassified, they would be. 80% accuracy would produce a text full of mistakes, not useful for most real-life applications without human intervention. In our case of financial market prediction, erroneous swings of the market might wipe out some of the previous gains. If you also factor in the brokerage to beat on top of those errors, the trader's or investor's task becomes even more challenging. I still see this as useful as one of several potential signals for a market entry/exit decision. I would compare it to a tailwind for an aeroplane: a good thing which we might use, but not the determining factor for a successful flight.
Potential commercial opportunities and challenges

As the prediction itself, with pre-calculated and preloaded models, could potentially be run on a mobile device, this suggests an interesting service for anyone interested in financial indicators. For example, we could create and host the models, disseminate them to subscribers, and let the users themselves run a prediction, obtain an outcome as an indicator, and interpret the results themselves. A potential issue is that in most countries any financial market prediction service requires licensing, which is a complex matter. In our case, however, the service amounts to a mathematical calculation, like, say, a stochastic indicator, which anyone can compute themselves on their own computer or mobile device. That means we would not be advising a customer with a prediction of financial market direction, but rather letting them run a mathematical (very complex, but still just mathematical) indicator themselves and derive their own conclusions from its result. We can also provide detailed explanations that this is not our prediction but only a mathematical calculation, and how it is made. In a sense I see it as just a more complex indicator than a moving-average crossover: a machine-learnt indicator.