Saturday, December 21, 2024

Archived Discussions

Recent member discussions

The Algorithmic Traders' Association prides itself on providing a forum for the publication and dissemination of its members' white papers, research, reflections, works in progress, and other contributions. Please note that archive searches and some of our members' publications are reserved for members only, so please log in or sign up to gain the most from our members' contributions.

Machine Learning: Questions about the quality/(quantity) for Labels, Features, Samples and Accuracy


 Mikael Furesjö, Quantitative Researcher

 Monday, February 16, 2015

Assuming that you have done your cross validation, over/under-fitting correction, etc. in a technically correct and acceptable way, I have some questions regarding machine learning for stock market prediction in Python and scikit-learn.

Label settings: What kind of label settings do quants normally use when doing "multi-label classification", and is it preferable that the labels are evenly distributed across the samples in the dataset?

Example: I have started using five labels, ranging from 2 to -2, indicating the future performance after X bars, where X is preferably taken from the first numbers of the Fibonacci series. I use the moving average of the future bars' Low values to indicate a stable positive return and the moving average of the future bars' High values to indicate a stable negative return. Anything in between is considered a neutral outcome and labeled "0".

Features quantity/quality: How many and what kind of features do quants generally work with? Is it normal to use only features derived from the original HLOC data (i.e. TA indicators)?

Example: I have vectorized the price data 100 bars back in time by taking the percentage change from the present Close to the High, the Low, the Open and the Close of each of the 100 previous bars, making 400+ features.

Sample quantity: What is the minimum amount of data considered necessary for making quality predictions? 100 MB, 1 GB, 100 GB or 1 TB?

Example: Using 15-minute data on SPY from 1999 to 2011, including pre-market and after hours, makes around 144K samples. With around 250 features (4 decimals), that dataset is around 250 MB. Would that be considered little, average or a lot for a stock market predictive system?

Prediction accuracy: What scoring accuracy would be enough for quants to put a system into any kind of live production? Of course, this has to take into account the random-guess accuracy for a multi-labeled target array. Is 50 percent pretty good for an evenly distributed target array with five labels (i.e. compared to the 20 percent random accuracy)?

Thanks in advance for any reply // Mikael
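For readers trying to picture the feature layout described in the question, here is a minimal sketch of the 100-bar look-back vectorization. It is not the poster's code: the function name `lookback_features`, the synthetic price series, and the direction of the percentage change (past value relative to the present Close) are all assumptions made for illustration.

```python
import numpy as np


def lookback_features(high, low, open_, close, n_back=100):
    """Hypothetical helper: for each bar t, the percentage change from the
    present Close to the High, Low, Open and Close of each of the previous
    n_back bars, giving 4 * n_back features per row."""
    high, low, open_, close = (
        np.asarray(a, dtype=float) for a in (high, low, open_, close)
    )
    rows = []
    for t in range(n_back, len(close)):
        past = slice(t - n_back, t)  # the n_back bars before bar t
        ref = close[t]               # present Close
        # Assumed convention: 100 * (past value / present Close - 1).
        row = np.concatenate(
            [100.0 * (series[past] / ref - 1.0)
             for series in (high, low, open_, close)]
        )
        rows.append(row)
    return np.array(rows)


# Synthetic bars for illustration only (not real SPY data).
rng = np.random.default_rng(0)
n_bars = 105
close = 100.0 + np.cumsum(rng.normal(0, 0.1, n_bars))
open_ = close + rng.normal(0, 0.05, n_bars)
high = np.maximum(open_, close) + 0.05
low = np.minimum(open_, close) - 0.05

X = lookback_features(high, low, open_, close, n_back=100)
print(X.shape)  # (5, 400)
```

Each row uses only bars that precede the bar being described, so the features themselves carry no look-ahead. Note also that a single target taking one of five values is what scikit-learn calls multiclass rather than multi-label classification.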



4 comments on article "Machine Learning: Questions about the quality/(quantity) for Labels, Features, Samples and Accuracy"


 Krzysztof F., Independent Telecommunications Professional

 Saturday, February 21, 2015



I'm using label 0 when trade is a loser and label 1 when trade is a winner.



Training set: 20000 samples x 200 features.



The measures I'm using are the Matthews correlation coefficient, kappa, precision, recall and accuracy.



Krzysztof



http://www.trade2win.com/boards/trading-software/105880-3rd-generation-nn-deep-learning-deep-belief-nets-restricted-boltzmann-machines.html
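The measures listed in the comment above are all available in scikit-learn's `sklearn.metrics` module. The sketch below computes them for a binary winner/loser labeling like the one described; the label arrays are made up for illustration and are not anyone's trading results.

```python
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    matthews_corrcoef,
    precision_score,
    recall_score,
)

# Made-up labels for illustration only: 1 = winning trade, 0 = losing trade.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

scores = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "kappa": cohen_kappa_score(y_true, y_pred),
    "matthews": matthews_corrcoef(y_true, y_pred),
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Kappa and the Matthews correlation coefficient both correct for agreement that would occur by chance, which is why they are often reported alongside raw accuracy when the question is whether a classifier beats random guessing.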



 Muhammad A., Independent Day Trader at Equity Day-Trader

 Monday, February 23, 2015



Not sure I understood the part where you talked about the features. Did you say that for each of the OHLC values you calculated the change over 100 days, giving you the 400 features? If yes, that is too many features; there are better ways to do that with less data.



 Muhammad A., Independent Day Trader at Equity Day-Trader

 Monday, February 23, 2015



Correction: a hundred bars, not days.



 Krzysztof F., Telecommunication consultant. Data analyst

 Monday, February 23, 2015



Ah, the question was:

"Making the assumption that you have technically correct made your cross validation"

Cross validation? We deal with time series here! Cross validation = future leak! Love it.
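The concern raised here is that shuffled k-fold cross validation lets a model train on bars that come after the ones it is tested on. The commenter does not name an alternative; one common option, shown below as a sketch, is scikit-learn's `TimeSeriesSplit` (added to the library after this thread was written), where every training fold ends before its test fold begins. The 12-sample array is a stand-in for chronologically ordered data.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for 12 samples in chronological order (index = time).
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X))

for train_idx, test_idx in splits:
    # Training indices always precede test indices, so no future data leaks.
    print("train:", train_idx, "test:", test_idx)
```

When labels are built from several future bars, as in the original question, samples near the end of a training fold can still overlap the test period. `TimeSeriesSplit` has a `gap` parameter in recent scikit-learn versions that leaves out a number of samples between train and test for this reason.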


TRADING FUTURES AND OPTIONS INVOLVES SUBSTANTIAL RISK OF LOSS AND IS NOT SUITABLE FOR ALL INVESTORS
Copyright 2018 Algorithmic Traders Association