Wednesday, February 5, 2025

Archived Discussions

Recent member discussions

The Algorithmic Traders' Association prides itself on providing a forum for the publication and dissemination of its members' white papers, research, reflections, works in progress, and other contributions. Please Note that archive searches and some of our members' publications are reserved for members only, so please log in or sign up to gain the most from our members' contributions.


Machine Learning: Questions about the quality/(quantity) for Labels, Features, Samples and Accuracy

Mikael Furesjö, Quantitative Researcher

Monday, February 16, 2015

Assuming that cross-validation, over/underfitting correction and so on have been done in a technically correct and acceptable way, I have some questions regarding machine learning for stock market prediction in Python and scikit-learn.

Label settings: What kind of label settings do quants normally use when doing "multi-label classification", and is it preferable that the labels are evenly distributed across the samples in the dataset? Example: I have started using five labels, ranging from 2 to -2, indicating the future performance after X bars, with X preferably taken from the first numbers of the Fibonacci series. I use the moving average of the future bars' Low values to indicate a stable positive return, and the moving average of their High values to indicate a stable negative return. Anything in between is considered a neutral outcome and labeled "0".

Features quantity/quality: How many and what kind of features do quants generally work with? Is it normal to use only features derived from the original HLOC data (e.g. TA indicators)? Example: I have vectorized the price data 100 bars back in time by taking the percentage change from the present Close to the High, the Low, the Open and the Close of each of the 100 previous bars, making 400+ features.

Sample quantity: What is the minimum amount of data considered necessary for making quality predictions: 100 MB, 1 GB, 100 GB or 1 TB? Example: Using 15-minute data on SPY from 1999 to 2011, including pre-market and after-hours, gives around 144K samples. With around 250 features (4 decimals), that dataset is around 250 MB. Would that be considered little, average or a lot for a stock market predictive system?

Prediction accuracy: What level of accuracy would be enough for quants to put a system into any kind of live production? Of course, taking into account the chance-level accuracy for a multi-class target array. Is 50 percent pretty good for an evenly distributed target array with five labels (i.e. compared to the 20 percent random accuracy)?

Thanks in advance for any reply // Mikael
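For readers who want to try the scheme from the question, here is a minimal sketch in Python with pandas. The post does not give the thresholds that separate labels 1 and 2 (or -1 and -2), so `strong=0.01` and the zero cut-off are assumptions, as are the function names `make_labels` and `make_features` and the synthetic price series. It is one possible reading of the description, not the author's actual code.

```python
import numpy as np
import pandas as pd


def make_labels(df, horizon=5, strong=0.01):
    """Label each bar from -2 to 2 using the next `horizon` bars.

    The thresholds (0 and `strong`) are illustrative assumptions.
    """
    # Mean Low / High over the NEXT `horizon` bars (current bar excluded).
    fut_low = df["Low"].rolling(horizon).mean().shift(-horizon)
    fut_high = df["High"].rolling(horizon).mean().shift(-horizon)
    up = fut_low / df["Close"] - 1.0     # > 0: even the future lows sit above the close
    down = fut_high / df["Close"] - 1.0  # < 0: even the future highs sit below the close
    labels = pd.Series(0, index=df.index, dtype="int64")
    labels[up > 0] = 1
    labels[up > strong] = 2
    labels[down < 0] = -1
    labels[down < -strong] = -2
    # Bars without a complete future window get NaN instead of a label.
    return labels.where(fut_low.notna())


def make_features(df, lookback=100):
    """Percentage change from the present Close to O/H/L/C of each prior bar."""
    cols = {}
    for k in range(1, lookback + 1):
        for name in ("Open", "High", "Low", "Close"):
            cols[f"{name}_lag{k}"] = df[name].shift(k) / df["Close"] - 1.0
    return pd.DataFrame(cols, index=df.index)


# Synthetic bars, only so the sketch runs on its own.
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.001, 500))))
open_ = close.shift(1).fillna(close.iloc[0])
spread = np.abs(rng.normal(0, 0.05, 500))
df = pd.DataFrame({
    "Open": open_,
    "High": np.maximum(open_, close) + spread,
    "Low": np.minimum(open_, close) - spread,
    "Close": close,
})

X = make_features(df, lookback=100)
y = make_labels(df, horizon=5)
valid = X.notna().all(axis=1) & y.notna()
X, y = X[valid], y[valid].astype(int)

print(X.shape)
# Share of samples per label, to check how evenly the classes are distributed.
print(y.value_counts(normalize=True).sort_index())
```

Printing the label shares is a quick way to see whether the five classes are anywhere near evenly distributed before comparing a model's accuracy with the 20 percent chance level.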


4 comments on article "Machine Learning: Questions about the quality/(quantity) for Labels, Features, Samples and Accuracy"

Krzysztof F., Independent Telecommunications Professional

Saturday, February 21, 2015

I'm using label 0 when trade is a loser and label 1 when trade is a winner.

Training 20000x200 features

The measures I'm using are the Matthews correlation coefficient, kappa, precision, recall and accuracy.

Krzysztof

http://www.trade2win.com/boards/trading-software/105880-3rd-generation-nn-deep-learning-deep-belief-nets-restricted-boltzmann-machines.html
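The measures listed in this comment are all available in scikit-learn. A small sketch for a binary winner/loser labeling like the one described; the two label arrays are made-up values for illustration, not results from any real system.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    matthews_corrcoef,
    precision_score,
    recall_score,
)

# Hypothetical outcomes: 1 = winning trade, 0 = losing trade.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0])

scores = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "kappa": cohen_kappa_score(y_true, y_pred),
    "mcc": matthews_corrcoef(y_true, y_pred),
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Kappa and the Matthews correlation coefficient both correct for agreement expected by chance, which makes them more informative than raw accuracy when winners and losers are unevenly represented.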

Muhammad A., Independent Day Trader at Equity Day-Trader

Monday, February 23, 2015

Not sure I understood the part where you talked about the features. Did you say that for each of the OHLC values you calculated the change over 100 days, giving you the 400 features? If yes, that is too many features; there are better ways to do that with less data.

Muhammad A., Independent Day Trader at Equity Day-Trader

Monday, February 23, 2015

Correction: a hundred bars, not days.
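The comment does not say which "better ways" it has in mind. Purely as an illustration, one common option for shrinking a wide, highly correlated feature matrix is principal component analysis. The sketch below uses scikit-learn on synthetic data; the 10 latent factors and the 95 percent variance target are assumptions chosen for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 400 correlated lag features driven by 10 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 10))
mixing = rng.normal(size=(10, 400))
X = latent @ mixing + 0.1 * rng.normal(size=(1000, 400))

# Keep as many components as needed to explain 95% of the variance.
reducer = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
)
X_small = reducer.fit_transform(X)

print(X.shape, "->", X_small.shape)
```

With real price data the reducer should be fitted on the training period only and then applied to later data, otherwise the projection itself uses information from the future.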

Krzysztof F., Telecommunication consultant. Data analyst

Monday, February 23, 2015

Ah, the question was:

"Making the assumption that you have technically correct made your cross validation"

Cross-validation? We are dealing with time series here! Cross-validation = future leak! Love it.
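The leak this comment points at can be shown with a small sketch. Plain k-fold splitting trains on bars that come after the test bars, while scikit-learn's `TimeSeriesSplit` only ever trains on the past. The 20-sample index array is a toy stand-in for chronologically ordered bars.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Toy data: row i is the i-th bar in chronological order.
X = np.arange(20).reshape(-1, 1)

kfold_folds = list(KFold(n_splits=4).split(X))
ts_folds = list(TimeSeriesSplit(n_splits=4).split(X))

print("KFold")
for train_idx, test_idx in kfold_folds:
    leak = train_idx.max() > test_idx.min()
    print(f"  test {test_idx.min()}..{test_idx.max()}  trains on later bars: {leak}")

print("TimeSeriesSplit")
for train_idx, test_idx in ts_folds:
    print(f"  train 0..{train_idx.max()}  test {test_idx.min()}..{test_idx.max()}")
```

When the labels look ahead by some number of bars, as in the original question, the last training labels still overlap the start of the test window. `TimeSeriesSplit` accepts a `gap` argument that drops that many samples between the training and test sets, which can be set to the label horizon.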


About our Association

The Algorithmic Trader’s Association, established in 2009, is the world's leading professional organization and resource center for the discussion of algorithmic trading strategy, methods, software, and technical analysis.
