The Breakout Bulletin

The following article was originally published in the December 2010 issue of The Breakout Bulletin.
 

Is Scrambled Data Suitable for Strategy Testing?

Different traders will often disagree on what constitutes a good trading strategy. However, everyone generally agrees that trading strategies should be robust. What does robust mean when it comes to algorithmic or systematic trading strategies? Generally speaking, I'd say a robust trading strategy is one that is insensitive to variations in the market. These variations might include market noise -- that is, anything other than the signal that constitutes the tradable part of the market data -- as well as variations in a broader sense, such as changes in cycle length, volatility, trending behavior, and so on. The more robust a strategy is, the greater the likelihood that it will hold up in the future when the market changes, as markets inevitably do.

 

One common technique to build robust strategies is to test the strategy over multiple markets. For example, a strategy that performs well over corn, crude oil, E-mini S&P, and T-bonds is probably pretty robust. However, even if you specialize in one particular market and prefer to focus solely on that market, a variation on this approach can still be used. I wrote an article several years ago about creating synthetic price data. The basic idea, developed by Chande [1], is to randomly select price differences from the original series and use them to reconstruct a synthetic price series. Chande refers to this technique as data scrambling.
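A minimal Python sketch of the scrambling idea follows, assuming a close-only series: close-to-close differences are drawn at random (with replacement) and chained onto the first close. The function name is hypothetical, and Chande's published procedure and the spreadsheet from my earlier article may handle the open, high, and low differently.

```python
import random


def scramble_with_replacement(closes, n_bars, rng=None):
    """Hypothetical close-only sketch of data scrambling.

    Draws close-to-close differences at random (with replacement) from the
    original series and chains them onto the first close, so the synthetic
    series can be any length.
    """
    if len(closes) < 2 or n_bars < 1:
        raise ValueError("need at least two closes and n_bars >= 1")
    rng = rng or random.Random()
    diffs = [b - a for a, b in zip(closes, closes[1:])]
    synthetic = [closes[0]]
    for _ in range(n_bars - 1):
        synthetic.append(synthetic[-1] + rng.choice(diffs))
    return synthetic


closes = [100.0, 101.5, 101.0, 102.25, 103.0, 102.5]
longer = scramble_with_replacement(closes, n_bars=20, rng=random.Random(7))
print(len(longer))  # 20
```

Because the differences are sampled with replacement, n_bars is not limited by the length of the original data.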

 

Two beneficial characteristics of data scrambling are that (1) you can create a price series as long as you want by randomly sampling from the original price changes as often as necessary, and (2) the synthetic prices retain some of the basic structure of the original data, since they're based on the same statistical distribution of price changes. For these reasons, synthetic price series are good candidates for building robust strategies. Rather than testing your strategy over different markets, you can construct several different synthetic price series from the same market data and test the strategy over each of them. If the strategy performs well over all the different synthetic series, it is insensitive to the variations among them, which is indicative of a robust strategy.
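The testing workflow described above can be sketched as a loop: scramble the data several times, run the same strategy on each synthetic series, and compare the results with those on the original data. In this sketch the scrambler shuffles the close-to-close changes, and momentum_pnl is a hypothetical toy strategy included only to make the example runnable; you would substitute your own strategy evaluation.

```python
import random
from statistics import mean


def scramble(closes, rng):
    """Shuffle the close-to-close changes and chain them onto the first close."""
    diffs = [b - a for a, b in zip(closes, closes[1:])]
    rng.shuffle(diffs)
    synthetic = [closes[0]]
    for d in diffs:
        synthetic.append(synthetic[-1] + d)
    return synthetic


def momentum_pnl(closes):
    """Hypothetical toy strategy: long one unit for the next bar whenever the
    latest close is above the prior close; returns total points gained."""
    pnl = 0.0
    for i in range(1, len(closes) - 1):
        if closes[i] > closes[i - 1]:
            pnl += closes[i + 1] - closes[i]
    return pnl


def robustness_check(closes, evaluate, n_series=10, seed=0):
    """Evaluate a strategy on the original data and on n_series scrambled copies."""
    rng = random.Random(seed)
    results = [evaluate(scramble(closes, rng)) for _ in range(n_series)]
    return {
        "original": evaluate(closes),
        "synthetic": results,
        "worst": min(results),
        "average": mean(results),
    }


closes = [100.0, 101.5, 101.0, 102.25, 103.0, 102.5]
summary = robustness_check(closes, momentum_pnl, n_series=10)
print(summary["original"], summary["worst"], summary["average"])
```

Comparing the worst and average synthetic results with the original result gives a quick sense of how much the strategy depends on the particular ordering of price changes.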

 

However, in light of last month's article on nonrandomness in the markets, there's one question that must be addressed before this approach can be recommended. Namely, does the scrambled data retain the nonrandomness of the original series? If a trading strategy is profitable because it exploits the nonrandomness of a price series, it's important to make sure this nonrandomness is not removed by the data scrambling process. If the synthetic price series doesn't contain nonrandomness, then there's no reason to expect the trading strategy to be profitable on the synthetic prices, which would render them useless for strategy testing.

 

To answer this question, I started with the market data I used in last month's article: daily bars of the E-mini S&P 500 futures from 5/17/2006 to 11/19/2010 (symbol @ES.D in TradeStation 8). I then applied the spreadsheet-based approach I described in my article on synthetic price data (spreadsheet available on the download page) to create 10 different synthetic price series. This approach randomizes the order of the original price changes, so each synthetic series is the same length as the original data. I started the new prices at the close of the first bar of the original series so that all the series would be as similar as possible except for the randomized order of the price changes. This means that if the price changes were not randomized, my "synthetic" price series would be identical to the original series. Lastly, I copied the date, time, volume, and open interest columns from the original series to complete the synthetic price series.
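The setup in the previous paragraph can be sketched as follows, again on closes only. The bar layout (dicts with a 'close' key plus other columns) and the function name are assumptions standing in for the spreadsheet columns. Passing the identity order demonstrates the property noted above: without randomization, the synthetic series reproduces the original closes.

```python
import random


def build_synthetic_bars(bars, order=None, rng=None):
    """Hypothetical close-only sketch of the scrambling setup.

    bars  : list of dicts, each with a 'close' key plus other columns such as
            'date', 'time', 'volume', 'oi' (assumed layout).
    order : permutation of the price-change indices; if None, a random
            permutation is drawn. The identity order reproduces the original.
    """
    diffs = [b["close"] - a["close"] for a, b in zip(bars, bars[1:])]
    if order is None:
        order = list(range(len(diffs)))
        (rng or random.Random()).shuffle(order)
    if sorted(order) != list(range(len(diffs))):
        raise ValueError("order must be a permutation of the change indices")
    close = bars[0]["close"]  # start at the close of the first original bar
    synthetic = [dict(bars[0])]
    for bar, idx in zip(bars[1:], order):
        close += diffs[idx]
        # Copy every other column (date, time, volume, OI) from the original bar.
        synthetic.append({**bar, "close": close})
    return synthetic


bars = [{"date": f"2010-11-{d:02d}", "close": c, "volume": 1000, "oi": 500}
        for d, c in zip(range(15, 19), [100.0, 101.5, 101.0, 102.25])]
unshuffled = build_synthetic_bars(bars, order=[0, 1, 2])
print([b["close"] for b in unshuffled])  # [100.0, 101.5, 101.0, 102.25]
```

Because only the order changes, every synthetic series built this way also ends at the original final close.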

 

For each synthetic price series, I loaded the data into a TradeStation chart using the 3rd Party tab of the Symbol Lookup window under Format Symbol (available from the "Lookup" button on the Settings tab). I set the symbol properties to the same values as those of the E-mini S&P 500 futures (@ES.D).

 

Once the data were loaded into the chart window, I applied the same approach as described in last month's article to estimate the degree of nonrandomness in the market. In particular, I inserted the indicator "AutocorrMax" into the chart with MaxLag set to 20 and NBars set to 100. On each bar, the indicator calculates the autocorrelations over the prior 100 bars for lags from 1 to 20 and plots the most significant one, that is, the one with the largest absolute value.
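The AutocorrMax indicator comes from last month's article, and its code isn't reproduced here. The Python sketch below is one plausible reading of the description, under stated assumptions: autocorrelations are computed on close-to-close changes (not raw closes) with the standard sample estimator, and the significance band is the common large-sample approximation ±1.96/√N with N = NBars. The function names are hypothetical.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation of sequence x at the given lag (standard estimator)."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0  # flat window: treat as no correlation
    num = sum((x[i] - m) * (x[i - lag] - m) for i in range(lag, n))
    return num / denom


def autocorr_max(closes, max_lag=20, n_bars=100):
    """For each bar with n_bars prior changes, return (r, lag, band): r is the
    autocorrelation with the largest absolute value over lags 1..max_lag,
    and band is the approximate 95% significance level 1.96 / sqrt(n_bars)."""
    if max_lag >= n_bars:
        raise ValueError("max_lag must be smaller than n_bars")
    changes = [b - a for a, b in zip(closes, closes[1:])]
    band = 1.96 / math.sqrt(n_bars)
    results = []
    for end in range(n_bars, len(changes) + 1):
        window = changes[end - n_bars:end]
        lag, r = max(
            ((k, autocorrelation(window, k)) for k in range(1, max_lag + 1)),
            key=lambda pair: abs(pair[1]),
        )
        results.append((r, lag, band))
    return results
```

Under this approximation the band is the same for every lag, so the largest |r| is also the most significant one; with NBars = 100 the band is 1.96 / 10 = 0.196.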

 

As I did for last month's article, I saved the indicator values to a file by selecting Data Window from the View menu in TradeStation and clicking the disk icon to save the data to a text file. This saves both the price data and any indicator values plotted for each bar on the chart. I opened the text file in Excel and counted the bars whose values were statistically significant based on the significance bands, which the indicator also plots and which are saved in the text file. This allowed me to calculate the percentage of bars for which the largest (in absolute value) autocorrelation over the different lags was statistically significant. A statistically significant autocorrelation means the serial correlation in the daily closes is significant, which means the price changes are nonrandom. This percentage is therefore an estimate of how much of the market data is nonrandom.
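The counting step, done in Excel for this article, reduces to a few lines once each bar's maximum autocorrelation and significance band are available. The input format (one (value, band) pair per bar) and the function name are assumptions; the actual export columns aren't specified here.

```python
def percent_nonrandom(rows):
    """rows: iterable of (max_autocorr, band) pairs, one per bar (assumed format).
    A bar counts as nonrandom when |max_autocorr| exceeds its significance band."""
    rows = list(rows)
    if not rows:
        raise ValueError("no bars to evaluate")
    hits = sum(1 for r, band in rows if abs(r) > band)
    return 100.0 * hits / len(rows)


bars = [(0.25, 0.196), (-0.30, 0.196), (0.10, 0.196), (0.05, 0.196)]
print(percent_nonrandom(bars))  # 50.0
```

Using the absolute value means significant negative autocorrelations (mean reversion) count as nonrandom, just like positive ones.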

 

As I reported in last month's article, when I performed this analysis on the original E-mini S&P data series, I found that 62% of the bars had significant nonrandomness. Here are the results for the 10 synthetic price series derived from the original E-mini S&P data:

 

Series   % Nonrandom
1        29
2        61
3        37
4        64
5        44
6        47
7        42
8        66
9        48
10       73
Average  51.1
Std dev  14.2

 

The values range from 29% to 73%, with an average of 51% and a standard deviation of 14%. This tells me two things. First, randomizing the price changes to create the synthetic data doesn't completely remove the serial correlations as measured by the autocorrelation. In fact, the average degree of nonrandomness (51%) is not far from that of the original data (62%). Second, although the sample is small, it suggests that, on average, synthetic prices will show levels of nonrandomness similar to those of the original series.
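The summary statistics can be checked directly from the table. The quoted standard deviation of 14.2 corresponds to the sample standard deviation (dividing by n - 1); the population version would give about 13.5.

```python
from statistics import mean, stdev

pct_nonrandom = [29, 61, 37, 64, 44, 47, 42, 66, 48, 73]  # from the table
original_pct = 62  # original E-mini S&P series

avg = mean(pct_nonrandom)
sd = stdev(pct_nonrandom)  # sample standard deviation (n - 1 denominator)
print(f"range {min(pct_nonrandom)}-{max(pct_nonrandom)}%, average {avg:.1f}, "
      f"std {sd:.1f}, shortfall vs original {original_pct - avg:.1f}")
# range 29-73%, average 51.1, std 14.2, shortfall vs original 10.9
```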

 

On the other hand, if one of our synthetic series has a particularly low degree of nonrandomness, the strategy may perform poorly on that series for reasons unrelated to its robustness. This suggests that it might not be wise to rely on the results from just a few synthetic price series when testing for robustness. With only a few synthetic price series, the odds are much higher that all of them will have relatively low levels of nonrandomness compared to the original series. It would be better to average the results over a larger number of synthetic price series.
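A rough calculation shows why a handful of series is risky. In the table, 5 of the 10 results (29, 37, 42, 44, 47) fall more than one standard deviation (14.2 points) below the original 62%. If each new synthetic series independently had that same 50% chance of being "low", the chance that all k series are low would be 0.5 to the power k. Treating a 10-series sample frequency as a probability and assuming independence are simplifications for illustration only.

```python
pct_nonrandom = [29, 61, 37, 64, 44, 47, 42, 66, 48, 73]  # from the table
low_cutoff = 62 - 14.2  # one standard deviation below the original 62%

# Fraction of the sample below the cutoff, used as a rough probability
p_low = sum(v < low_cutoff for v in pct_nonrandom) / len(pct_nonrandom)

for k in (1, 2, 3, 5, 10):
    # Chance that all k series are "low", assuming each is independent
    print(f"{k:2d} series: {p_low ** k:.4f}")
```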

 

Another approach to using synthetic prices to test for robustness would be to build a calculation of nonrandomness, such as the one used here, into the generation of the price series. That way, you could keep generating synthetic price series until you obtained one with roughly the same degree of nonrandomness as the original data. A reasonable threshold for acceptability might be one standard deviation. For example, based on the data above, an acceptable percentage of nonrandomness would be 47.8% to 76.2% (i.e., 62 ± 14.2%). In other words, you'd generate synthetic price series randomly until one had a percentage of nonrandomness between 48% and 76%.
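The generate-and-check approach is a simple accept/reject loop. In this sketch, generate and measure are hypothetical caller-supplied functions (for instance, a scrambler and a percent-nonrandom calculation), and the default window of 62 ± 14.2 uses the figures from this article. A cap on attempts keeps the loop from running forever if the window is rarely hit.

```python
import random


def generate_until_similar(generate, measure, target=62.0, tolerance=14.2,
                           max_attempts=1000, rng=None):
    """Keep generating synthetic series until the measured nonrandomness lies
    within target +/- tolerance percentage points.

    generate(rng) -> series and measure(series) -> percent nonrandom are
    hypothetical hooks supplied by the caller."""
    rng = rng or random.Random()
    low, high = target - tolerance, target + tolerance
    for attempt in range(1, max_attempts + 1):
        series = generate(rng)
        pct = measure(series)
        if low <= pct <= high:
            return series, pct, attempt
    raise RuntimeError(f"no series within {low:.1f}-{high:.1f}% "
                       f"after {max_attempts} attempts")
```

For example, with stub hooks that yield 29, 37, and then 61, the first two candidates are rejected and the third is accepted on attempt 3.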

 

Conclusions

Synthetic prices generated using the data scrambling method appear to retain a substantial degree of nonrandomness. This suggests they may be useful for testing trading strategies for robustness. To use them effectively for this purpose, either enough different synthetic price series should be used to reduce the chance that a few series with low nonrandomness distort the results, or each price series should be checked to make sure its degree of nonrandomness is similar to that of the original series.

 

Trading strategies that perform well over different price series are more likely to hold up well going forward. Because synthetic price series retain key characteristics of the original series, including nonrandomness, while introducing variation into the price data, they can be a useful tool to test for this type of robustness.

 

Reference
1. Tushar Chande, Beyond Technical Analysis, 2nd ed., John Wiley & Sons, Inc., New York, 2001, pp. 346-352.

 

 

Mike Bryant

Breakout Futures

 

 

HYPOTHETICAL OR SIMULATED PERFORMANCE RESULTS HAVE CERTAIN INHERENT LIMITATIONS. UNLIKE AN ACTUAL PERFORMANCE RECORD, SIMULATED RESULTS DO NOT REPRESENT ACTUAL TRADING. ALSO, SINCE THE TRADES HAVE NOT ACTUALLY BEEN EXECUTED, THE RESULTS MAY HAVE UNDER- OR OVER-COMPENSATED FOR THE IMPACT, IF ANY, OF CERTAIN MARKET FACTORS, SUCH AS LACK OF LIQUIDITY. SIMULATED TRADING PROGRAMS IN GENERAL ARE ALSO SUBJECT TO THE FACT THAT THEY ARE DESIGNED WITH THE BENEFIT OF HINDSIGHT. NO REPRESENTATION IS BEING MADE THAT ANY ACCOUNT WILL OR IS LIKELY TO ACHIEVE PROFITS OR LOSSES SIMILAR TO THOSE SHOWN.