Machine Learning for the stock market

Machine learning for forecasting stock market movements with support vector machines


A support vector machine (SVM) is a learning algorithm characterized by capacity control of the decision function, the use of kernel functions and the sparsity of its solution. I investigate here whether the movements of various financial instruments can be usefully predicted with an SVM by forecasting the monthly movement. To evaluate the forecasting ability of the SVM I use cross-validation. Various kernels were tried in the first stages of the research.

Data structures and approach to data analysis

  1. Table of daily data as downloaded, keyed by instrument and date, with OHLC columns

    1. Gets updated by a downloader on a regular basis

    2. Source is Yahoo Finance, freely available

  2. A database view was created which, for each instrument, contains its adjusted-close differentials for day, week and month, plus the most important world indices, commodities and forex data, also with close or adjusted-close price (depending on availability) differentials for day, week and month.

  3. A few helper Postgres functions were written to simplify extraction of data usable for machine learning datasets.

  4. Helper tables were added: a list of instruments whose data needs to be updated, and an instrument properties table keeping details for machine-learnt instruments such as serialized fit classifiers, mean score parameters, the best chosen features list, etc.

  5. Existing limitations of data analysis

    1. Bond data is not present

    2. Volume is not used, as in my previous machine learning research it did not show a worthwhile correlation. However it remains a candidate for inclusion and has priority

    3. The main gap at the moment is a lack of additional pre-programmed functions for technical analysis such as stochastics, money flow, breakouts, etc.

    4. The calendar month itself might make a difference in predictions.

    5. It would be interesting to add economic news, such as the GDP advance release or private-sector manufacturing report differentials against historical values for large economies, if such a data feed can be obtained consistently and reliably.
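
The day/week/month differentials that the database view exposes (item 2 above) can be sketched in Python. This is only an illustration; the function names, the ratio-style definition of a differential, and the 5- and 21-trading-day approximations for week and month are assumptions, not the actual view code.

```python
# Hypothetical sketch of the adjusted-close differentials computed by
# the database view. A "differential" here is the relative change of
# the latest close versus a fixed number of trading days back.

def differential(closes, lag):
    """Relative change of the latest adjusted close vs. `lag` rows back."""
    if len(closes) <= lag:
        return None  # not enough history for this lag
    return closes[-1] / closes[-1 - lag] - 1.0

def row_differentials(closes):
    """Day (1 row), week (~5 trading days), month (~21 trading days)."""
    return {
        "day": differential(closes, 1),
        "week": differential(closes, 5),
        "month": differential(closes, 21),
    }
```

In the real system these values are produced per instrument by the view and by the helper Postgres functions rather than in application code.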

Conclusions from literature review and personal experience in financial markets

The econometrics literature points to the fact that daily, weekly and monthly price differentials may be correlated with future price movement. However, an instrument's price is rarely independent of other world changes. While classic technical analysis is mostly concerned with an instrument's own price movement, in reality world economics exerts strong pressure and is sometimes more decisive than the stock itself. This is especially true for indices, which incorporate multiple stocks.

Radial Basis Function kernels have proved useful in financial market prediction problems.

Experimental design

I decided to use an SVM for machine learning due to its many advantages and popularity. I investigated SVMs using polynomial, radial and linear kernels with various values for C, ε and σ (where relevant). After settling on a kernel and pre-processing method, different values of C, ε and σ were used for the SVM.

As Radial Basis Function kernels have often proved useful in financial market prediction problems, I use this SVM kernel, with parameter values optimised per index instrument. In addition, the feature set for each instrument to be predicted was selected separately, based on the importance or noisiness of each feature. This means that each instrument to be predicted has its own specific combination of feature set and kernel parameters for gamma and cost, constructed from historical data.

The problem with index prediction is that any predicted move has to be significant enough to profit from after beating brokerage and the probability of error. This is why we are concerned with a multiclass classification of that movement: I attempt to predict whether the trader should go long, short or neutral (e.g. an iron condor option strategy).

With this in mind, the SVM for each index instrument is trained on data with the following labelling: 1 for a move of 2% or more up, 2 for 2% or more down, and 0 for anything in between.
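
This labelling can be sketched as a small Python function; the function name and the ratio-based definition of the move are assumptions for illustration.

```python
# Sketch of the three-class labelling described above: 1 for a move
# of at least +2%, 2 for a move of at least -2%, 0 for anything in
# between. The function name is hypothetical.

def movement_label(price_now, price_next, threshold=0.02):
    change = price_next / price_now - 1.0
    if change >= threshold:
        return 1   # go long
    if change <= -threshold:
        return 2   # go short
    return 0       # neutral (e.g. iron condor)
```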

Kernel parameter selection

Gamma parameter

For the Radial Basis Function SVM kernel, the gamma parameter defines the influence of a single training example: low values mean 'far' and high values mean 'close'. Gamma can be seen as the inverse of the radius of influence of the samples selected by the model as support vectors.

In other words, a large gamma leads to low-bias, high-variance models prone to overfitting, and vice versa.
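
A direct computation of the RBF kernel value, k(x, y) = exp(-gamma * ||x - y||^2), illustrates this: with a small gamma, even distant points remain similar (far-reaching influence), while a large gamma drives the similarity of the same pair toward zero.

```python
import math

# Illustration of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
# With a large gamma a training example influences only points very
# close to it; with a small gamma its influence reaches far.

def rbf_kernel(x, y, gamma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

For two points at distance 2 (squared distance 4), gamma = 0.01 gives a similarity of about 0.96, while gamma = 10 gives roughly 4e-18, effectively no influence at all.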

The C parameter

The C parameter implements the "soft margin" idea: it trades off misclassification of training examples against simplicity of the decision surface. The lower C, the smoother the decision surface. The higher C, the more (or all) training examples get classified correctly, because misclassification is penalized heavily; the model fits the historical data tightly, which makes it potentially brittle on new (future) events that do not exactly match historical ones. In other words, C is the cost of misclassification.

The parameter grid search

The behaviour of the model is very sensitive to these parameters, so a great deal depends on a proper selection method.

After first trying default parameters and then brute-force parameter selection, I settled on an exhaustive search over specified parameter values for an estimator: at the moment I use a grid search. The parameters are optimized by cross-validated grid search over a parameter grid. The current version uses a logarithmic grid with base 10 over a range for each parameter that extends beyond the values eventually found optimal. In the next version I plan to use this only as an initial step, followed by finer tuning, a kind of "zoom in" approach. This can be achieved, but at a much higher cost, and is planned for the next releases.

For details see and

Feature selection

As explained earlier, although there is a large number of features defined for the overall financial markets, each financial instrument may have its own specific combination of features, as some features might be noise and need to be removed. I use a very basic algorithm to check whether a feature improves correlation, and exclude it if necessary, per financial instrument. This is performed as a dedicated operation and the result is saved in the DB for later use by the classification fit, prediction and other operations.


Another task is to standardise features by removing the mean and scaling to unit variance. Standardisation of a dataset is a common requirement, as an SVM with an RBF kernel might behave badly if the individual features do not look like standard normally distributed data.

The RBF kernel of an SVM assumes that all features are roughly centered and have variance of the same order. Otherwise a feature with a variance much larger than the others might dominate the kernel function and make the estimator unable to learn well, or at all, from the other features.

This is why I perform scaling on the feature dataset.


General workflow

There are 5 steps to follow to use this tool. Every instrument to be predicted should go through the same steps, and all steps accept batches of instruments specified as a comma-separated list parameter. Some steps need a particular order, some do not. Some steps rely on data in the database, stored sometimes as configuration and sometimes as the result of a previous step. The steps are:

  1. Learn Kernel parameters

    1. Gamma and C parameters are learnt through grid search and saved in the DB per financial instrument in the batch (a batch can be a single instrument)

    2. It can use all available features (provided by the view) or the learnt best features, depending on the parameter supplied.

    3. Notes: sometimes you want to run this step first and then run the feature selection step with these results stored in the DB; sometimes the opposite; sometimes run this first on all features, then rerun it on the best features learnt later.

  2. Learn best features set

    1. The best feature set is learnt, with noisy features excluded, and saved into the DB

    2. It expects the best gamma and C parameters to have been learnt and stored in the DB for each instrument in the batch

  3. Optionally rerun step 1 with the best feature set, then rerun step 2 with the new gamma and C

  4. Run fit-and-save to store the learnt serialized model in the DB for each instrument in the batch

  5. Run predictions
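
The serialization in step 4 can be sketched with Python's pickle module. The dict below stands in for a real fit classifier, and storing the resulting bytes in a Postgres bytea column is an assumption about the schema.

```python
import pickle

# Sketch of step 4: serialize a fit classifier for storage in the DB
# and restore it later for prediction. The dict stands in for a real
# fit SVM model.

def serialise_model(model):
    return pickle.dumps(model)

def restore_model(blob):
    return pickle.loads(blob)
```

A round trip through these two functions must return an equivalent model, which is what lets the cheap prediction step run anywhere the blob can be shipped.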


Every time step 2 runs, you need to rerun step 4 before step 5, as the feature array may change. At the same time you may want to rerun step 1 after step 2, so it makes no sense to run step 4 automatically every time. That is why the steps are separated.

The most expensive steps are 1 and 2. Steps 4 and 5 are cheap; step 5 can even be done on low-performance devices such as small cloud instances or mobile devices.
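
The workflow can be sketched as a simple orchestration over a batch of instruments. All function names are hypothetical stubs standing in for the real operations; the optional step 3 rerun is omitted for brevity.

```python
# Hypothetical orchestration of the workflow for a batch of
# instruments. Each stub stands in for the real operation.

def run_workflow(batch, steps):
    log = []
    for symbol in batch:
        for name, step in steps:
            step(symbol)            # real work would touch the DB here
            log.append((symbol, name))
    return log

steps = [
    ("learn_kernel_params", lambda s: None),  # step 1: grid search gamma/C
    ("learn_best_features", lambda s: None),  # step 2: drop noisy features
    ("fit_and_save",        lambda s: None),  # step 4: serialize model to DB
    ("predict",             lambda s: None),  # step 5: cheap, even on mobile
]
```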

Experimental results

So far the mean of the cross-validation prediction scores across various ASX financial indices is around 80%, and shows that at least the direction of the move is correctly predicted for the coming month. Below is a CSV of results:

Date,       Symbol, Outcome, OrigPrice,     PredictedPrice, ActualChange
2017-05-11, ^AORD,  2,       5912.0,        5766.5,         0.975389039242
2017-05-11, ^AXDJ,  0,       2253.600098,   2199.100098,    0.975816472475
2017-05-11, ^AXSJ,  2,       9696.299805,   9086.599609,    0.937120323395
2017-05-11, ^AXEJ,  2,       9544.5,        9019.0,         0.944942113259
2017-05-11, ^AXFJ,  0,       6686.399902,   6399.100098,    0.957032213417
2017-05-11, ^AXXJ,  0,       7457.600098,   7137.100098,    0.957023707924
2017-05-11, ^AXHJ,  2,       23931.400391,  23616.099609,   0.986824808542
2017-05-11, ^AXNJ,  0,       5762.200195,   5824.399902,    1.010794437
2017-05-11, ^AXIJ,  0,       903.599976,    857.200012,     0.948649883541
2017-05-11, ^AXMJ,  1,       9634.200195,   9803.400391,    1.01756245382
2017-05-11, ^AXJR,  0,       3311.300049,   3329.5,         1.00549631587
2017-05-11, ^AXUJ,  2,       9068.099609,   8843.700195,    0.975253975621

Where Outcome is the predicted label:

  • 0: within 2% up or down

  • 1: up more than 2%

  • 2: down more than 2%

As we see from the above results, the market direction was generally predicted well, but some of the actual moves did not fall within the predicted label's band.
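
This can be checked mechanically by mapping the realised change ratio (the ActualChange column) back to a label with the same 2% bands used for training, and comparing it with the predicted Outcome. The helper name is an assumption for illustration.

```python
# Map a realised change ratio to the label it implies, using the same
# 2% bands as the training labels, so predictions can be scored
# against outcomes.

def label_from_ratio(ratio, threshold=0.02):
    if ratio >= 1.0 + threshold:
        return 1  # rose more than 2%
    if ratio <= 1.0 - threshold:
        return 2  # fell more than 2%
    return 0      # within the 2% band
```

For example, ^AORD (predicted 2, ratio 0.9754) implies label 2, a correct prediction; ^AXDJ (predicted 0, ratio 0.9758) also implies label 2, so the direction was acceptable but the predicted label was wrong.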

Overall conclusions

Learning the kernel parameters and the feature set requires significant computational time. However this need be done only once every 3 to 12 months, say, to accommodate possible changes in market behaviour. Fitting takes very little processing power, and prediction even less. The prediction itself, with pre-calculated and preloaded models, could potentially be done on a mobile device, which creates a potential business opportunity to sell such a service.

How good are the results? In what context should we use these numbers? Does it mean an 80% chance of making a profit, so that investing real money would yield 80% of a 2% monthly gain? I think unfortunately not. To me the context of these results is closer to an OCR task. In OCR, with such results, every 5th character would be determined wrongly. The model might get similar characters close, but still wrong: '1' and 'l' look similar in some fonts, and while they obviously should not be misclassified, they would have been. 80% accuracy would produce a text full of mistakes, which would not be useful for most real-life applications without human intervention. In our case of financial market prediction, erroneous swings of the market might wipe out some of the previous gains. If you also take into consideration the brokerage to beat on top of the errors, the trader's or investor's task becomes even more challenging. I still see this prediction mechanism as useful as one of several potential signals for a market entry/exit decision. I would compare it to a tailwind for an airplane: it might be a good thing, and we might use it, but it is not the determining factor for a successful flight.

Potential commercial opportunities and challenges

As the prediction itself, with pre-calculated and preloaded models, could potentially be done on mobile devices, it suggests an interesting service for everyone interested in financial indicators. For example, we could create and host models, disseminate them to subscribers, and let users run a prediction themselves to obtain an outcome as an indicator, and interpret the results themselves.

A potential issue is that in most countries any financial market prediction service requires licensing, which is a complex matter. However, in our case the service is equivalent to a mathematical calculation, like, say, a stochastic indicator, which anyone can compute themselves on their own computer or mobile device. This means we would not advise a customer with a prediction of financial market direction, but rather let them run a mathematical (very complex, but still just mathematical) indicator themselves and derive their own conclusions from its result. We could also provide detailed explanations that it is not our prediction but only a mathematical calculation, and describe how it is performed. In a sense I see it as just a more complex indicator than a moving-average crossover: a machine-learnt indicator.