**Improving the Accuracy of Trading System Evaluations**

An integral step in developing
a trading system is evaluating its performance. When we test a system on
historical data, we are, in effect, simulating how that system will perform
when we trade it. It only makes sense, then, that the more accurate the
simulation, the better we can evaluate the system. If we could accurately
account for all aspects of trading -- slippage, commissions, account size,
number of contracts, market behavior, etc. -- then we would know just what
to expect if we were to trade the system, and our decision about whether or not
to trade the system would be better as a result.

Of course, there's no 100%
accurate way to simulate trading a system, but there are ways to increase the
accuracy of the simulation. There are two aspects in particular that I've
addressed in my own work that I'd like to share. The first is position sizing.
Those of you who, like me, spend a lot of time working in TradeStation are
probably used to seeing the results of system testing expressed in terms of
one-contract profits and losses. The TradeStation performance report is based on
the assumption that all trades are the same size. For futures, this means
each trade is typically one contract. For stocks, you would typically take 100
shares or some multiple of that for each trade.

TradeStation does allow for a
variable number of shares or contracts for each trade. EasyLanguage makes
it relatively simple to program just about any method you can think of to vary
the number of contracts or shares for each trade. The problem is that most of
the metrics in the TradeStation performance report lose their meaning if
the number of contracts or shares is not fixed. Metrics such as the average
trade, standard deviation of the average trade, the average win, average loss,
maximum dollar drawdown, and net dollar profit become distorted and difficult to
interpret if the number of contracts or shares varies from trade to trade.
This is particularly true if you program rules to increase the position size as
profits accumulate. In that case, many of the performance metrics, such as
average trade and dollar drawdown, tend to increase with the number of
trades.

Because TradeStation isn't
designed to report performance results when using a variable number of contracts
or shares, most traders probably stick to a uniform trade size when
evaluating systems in TradeStation. What's wrong with that? In a nutshell,
it makes it difficult to see the connection between risk and reward. We're all
familiar with the concept that the greater the risk, the greater the
reward. Higher profits are the compensation for taking on a greater risk. No one
wants to assume more risk without being compensated for it. However, if you
don't understand the true risk inherent in a system, then how do you know if
you're being fairly compensated for it? More to the point, when comparing
trading systems or when comparing parameter sets for a given system, we
generally want to choose the one that produces the greatest reward for a given
level of risk. In order to do this, we need to understand the risk-reward
characteristics of our system.

As an example, consider the
one-contract results from the following system:

Net profit: $5,944

Number of trades: 487

Ave. trade: $12

Ave. winning trade: $28

Ave. losing trade: -$23

Percent profitable: 69.6%

Profit factor: 2.73

Largest loss: -$212

Max drawdown: -$1,573

Max consecutive losers: 4

These results are from a day
trading system for the E-mini S&P 500 futures. Most people probably wouldn't
trade this system because its average trade is so small. Also, notice
that the largest loss is -$212. This comes from the fixed-size stop used
for each trade. This means that you're risking $212 on each trade to make an
average win of $28. Not exactly a favorable risk-reward ratio. Put another way,
with an average trade of $12, every time a trade is stopped out, the results of
roughly the past 17 trades are erased. On the other hand, the system has a high profit factor, and
the drawdown is pretty reasonable for one contract. Assuming the average trade
is achievable, would it make sense to try to scale this system up by trading
more contracts or is the risk-reward ratio an insurmountable
obstacle?
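The arithmetic behind these observations can be checked in a few lines of Python. This is only an illustrative sketch built from the rounded figures in the summary above; the variable names are made up for the example, and because the inputs are rounded, the implied values come out close to, but not exactly equal to, the reported $12 average trade and 2.73 profit factor.

```python
# One-contract figures quoted in the performance summary (rounded values).
win_rate = 0.696    # percent profitable, as a fraction
avg_win = 28.0      # average winning trade, dollars
avg_loss = 23.0     # average losing trade, dollars (magnitude)
avg_trade = 12.0    # average trade, dollars
stop_loss = 212.0   # largest loss, set by the fixed-size stop

# Expected profit per trade implied by the win rate and average win/loss.
expectancy = win_rate * avg_win - (1 - win_rate) * avg_loss

# Profit factor implied by the same numbers (gross profit / gross loss).
implied_profit_factor = (win_rate * avg_win) / ((1 - win_rate) * avg_loss)

# Number of average trades needed to make back one full stop-out.
trades_to_recover = stop_loss / avg_trade

print(f"expectancy per trade: ${expectancy:.2f}")
print(f"implied profit factor: {implied_profit_factor:.2f}")
print(f"average trades to recover one stop-out: {trades_to_recover:.1f}")
```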

An accurate simulation of this
system taking into account position sizing, account equity, and margin
requirements would answer this question. The type of position sizing I have
in mind is based on risking a percentage of the trading account on each
trade. For example, we might risk 3% of account equity on each trade. For the
system above, if we had a $15,000 account and a risk per contract of $212,
risking 3% of the account would give us 0.03 * 15000/212 = 2.12, or 2 contracts after rounding down. As
the account equity grew, we'd be risking 3% of a larger number, which would give
us more contracts. Added to a trading system simulation, this type of position
sizing allows us to relate risk to reward.
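The sizing rule just described can be sketched as a small Python function. The function name and its parameters are hypothetical, chosen for this illustration; the only assumption beyond the text is that fractional contracts are always rounded down.

```python
import math

def contracts_for_risk(equity, risk_fraction, risk_per_contract):
    """Whole contracts such that total trade risk stays within
    risk_fraction of the current account equity (rounded down)."""
    if risk_per_contract <= 0:
        raise ValueError("risk_per_contract must be positive")
    return max(0, math.floor(risk_fraction * equity / risk_per_contract))

# The example from the text: 3% of a $15,000 account, $212 risk per contract.
print(contracts_for_risk(15_000, 0.03, 212))

# The same 3% applied to a larger account yields more contracts.
print(contracts_for_risk(30_000, 0.03, 212))
```

Because the result is recomputed from the current equity before each trade, the number of contracts rises as the account grows and falls after losses.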

If we ran the trade simulation
for the system above assuming 3% of equity was risked on each trade, we could
see what kind of drawdown we might expect, what kind of equity curve we might
get, and what kind of returns to expect. We could try other risk percentages,
too. If we did, we'd see that higher risk percentages give higher rates of
return but higher drawdowns as well. By testing a number of different risk
percentages, we could get a pretty good sense of the relationship between risk
and return for this system. This is what I meant when I said that position
sizing is a way to relate risk to reward.

You might have noticed that I'm
using the word "risk" in two different ways. On the one hand, we use risk to
refer to the amount of money or percentage of the trading account at risk on a
particular trade. If the trade is a loss, we could lose $212, for example, or
perhaps 3% of the trading account. This is the trade risk. On the other hand,
the worst-case peak-to-valley drawdown of a trading system is a common and
practical measure of the overall risk of a trading system. By risking a
percentage of the account on each trade, the simulation can relate the trade
risk to the drawdown risk as well as the rate of return to the drawdown risk.

This brings us to the second
way to improve the accuracy of trading system simulations. Inasmuch as maximum
peak-to-valley drawdown is a useful measure of system risk, improving the
calculation of the drawdown will improve our simulation results and thereby
provide us with a better evaluation of the system. Although we can't predict how
the market will differ tomorrow from what we've seen in the past, we do know it
will be different. If we calculate the maximum drawdown based on the historical
sequence of trades, we're basing our calculations on a sequence of trades we
know won't be repeated exactly. Even if the distribution of trades (in the
statistical sense) is the same in the future, the sequence of those trades is
largely a matter of chance. Calculating the drawdown based on one particular
sequence is somewhat arbitrary. Moreover, the sequence of trades has a very
large effect on the calculated drawdown. If you choose a sequence of trades
where five losses occur in a row, you could get a very large drawdown. The same
trades arranged in a different order, such that the losses are evenly dispersed,
might have a negligible drawdown.
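To see how strongly the ordering matters, consider the following minimal sketch. The trade values are hypothetical (15 wins of $50 and 5 losses of $100, not taken from the system above), and the drawdown is measured in dollars on cumulative closed-trade profit for a single contract.

```python
def max_drawdown(trades):
    """Largest peak-to-valley drop in cumulative closed-trade profit, in dollars."""
    equity = 0.0
    peak = 0.0
    worst = 0.0
    for pnl in trades:
        equity += pnl
        peak = max(peak, equity)            # highest cumulative profit so far
        worst = max(worst, peak - equity)   # deepest drop below that peak
    return worst

# Hypothetical trades: same wins and losses, two different orderings.
clustered = [50] * 10 + [-100] * 5 + [50] * 5   # five losses in a row
dispersed = [50, 50, 50, -100] * 5              # losses spread evenly

print(sum(clustered), max_drawdown(clustered))
print(sum(dispersed), max_drawdown(dispersed))
```

Both orderings end with the same net profit, but the clustered ordering shows a drawdown several times larger than the dispersed one.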

As a way to address
this problem, we can apply a Monte Carlo approach. The idea is to
randomize the sequence of historical trades and calculate the rate of return and
drawdown for the randomized sequence. We then repeat the process several hundred
or several thousand times. Looking at the results in aggregate, we might find, for
example, that in 95% of the sequences, the drawdown was less than 30% when 4% of
the equity was risked on each trade. We would interpret this to mean that
there's a 95% chance that the drawdown will be less than 30% when 4% is risked
on each trade. I discuss this process in more detail in the user's guide for the
MonteCarlo console program. The user's guide is available for free download.*
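The procedure can be sketched in Python as follows. This is not the MonteCarlo console program itself, only a minimal illustration under stated assumptions: the function names are hypothetical, the trade list is invented for the example, contracts are rounded down, and the "95%" figures are taken as the return that 95% of the shuffled sequences met or exceeded and the drawdown that 95% of them stayed at or below.

```python
import math
import random

def simulate(trades, start_equity, f, risk_per_contract):
    """Run one sequence of one-contract P&Ls with fixed-fraction sizing.
    Returns (return_pct, max_drawdown_pct) on a closed-trade basis."""
    equity = peak = start_equity
    worst_dd = 0.0
    for pnl in trades:
        # Risk the fraction f of current equity, rounded down to whole contracts.
        contracts = max(0, math.floor(f * equity / risk_per_contract))
        equity += contracts * pnl
        peak = max(peak, equity)
        worst_dd = max(worst_dd, (peak - equity) / peak * 100.0)
    return (equity / start_equity - 1.0) * 100.0, worst_dd

def monte_carlo(trades, start_equity, f, risk_per_contract,
                n_runs=1000, confidence=0.95, seed=0):
    """Shuffle the trade order n_runs times and report the return and
    drawdown at the given confidence level."""
    rng = random.Random(seed)
    returns, drawdowns = [], []
    for _ in range(n_runs):
        shuffled = list(trades)
        rng.shuffle(shuffled)
        r, d = simulate(shuffled, start_equity, f, risk_per_contract)
        returns.append(r)
        drawdowns.append(d)
    returns.sort(reverse=True)  # best return first
    drawdowns.sort()            # smallest drawdown first
    k = min(n_runs - 1, math.ceil(confidence * n_runs) - 1)
    return returns[k], drawdowns[k]

# Hypothetical one-contract trades (dollars); not the author's data.
trades = [28] * 139 + [-23] * 56 + [-212] * 5

for f in (0.02, 0.04, 0.06):
    ret, dd = monte_carlo(trades, 20_000, f, 212)
    print(f"f={f:.2f}  return>={ret:.1f}%  drawdown<={dd:.1f}%")
```

Fixing the seed makes a run repeatable; in practice one would vary it, or increase `n_runs`, to check that the percentile estimates are stable.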

Combining the Monte Carlo approach with risk-based position
sizing improves our system trading simulations considerably. As an
example, let's go back to the system results presented above. I took 200
consecutive trades from the system, spanning about 10 months. The risk
for each trade was the same: $212. I started with an account size of
$20,000. Running the trades through my Monte Carlo simulator produced the
following table of results:

RESULTS AT 95% PROBABILITY

| f value | Return(%) | Drawdown(%) |
|---------|-----------|-------------|
| 0.01    | 0         | 0           |
| 0.02    | 20.575    | 4.48405     |
| 0.03    | 39.475    | 6.80517     |
| 0.04    | 60.63     | 9.81719     |
| 0.05    | 84.6125   | 12.359      |
| 0.06    | 111.568   | 14.7371     |
| 0.07    | 141.488   | 17.3188     |
| 0.08    | 174.873   | 20.2822     |
| 0.09    | 211.705   | 22.2958     |
| 0.1     | 252.18    | 25.5526     |
| 0.11    | 295.582   | 26.8196     |
| 0.12    | 344.142   | 30.3462     |
| 0.13    | 397.335   | 31.8658     |
| 0.14    | 456.025   | 34.3499     |
| 0.15    | 518.737   | 37.4941     |
| 0.16    | 586.433   | 38.1433     |
| 0.17    | 661.447   | 42.035      |
| 0.18    | 740.9     | 44.3473     |
| 0.19    | 827.923   | 45.8333     |
| 0.2     | 921.35    | 46.3659     |
| 0.21    | 1021.49   | 47.9949     |
| 0.22    | 1128.46   | 51.2275     |

The first
column, "f value", is the fraction of the account risked on each trade, also
known as the fixed fraction. For example, 0.03 means that 3% is risked on each
trade. The second column, "Return(%)", is the net rate of return on the starting
equity over the period, and "Drawdown(%)" is the maximum (i.e., worst-case)
peak-to-valley drawdown expressed as a percentage of the equity existing prior
to the start of the drawdown. A drawdown of 20%, for example, means the account
equity fell 20% from the highest equity peak preceding the drawdown. All
calculations are on a closed trade basis. The results are tabulated at a
confidence level of 95%.

This table
provides an answer to the question posed earlier: would it make sense to try to
scale this system up by trading more contracts or is the risk-reward ratio an
insurmountable obstacle? Again, assuming the average one-contract trade
of $12 is achievable in practice, the Monte Carlo simulation suggests that,
yes, this system is viable. Consider, for example, a fixed fraction of 0.06. By
risking 6% of account equity on each trade, the Monte Carlo simulation estimates
that the rate of return would be about 112% and the worst-case drawdown would be about
15% with 95% confidence. As expected, we'd need to trade a fairly large number
of contracts. A risk percentage of 6% implies that we'd be starting with 0.06 *
20000/212 = 5.66, or 5 contracts (rounded down). With an initial margin requirement of
about $3500 per contract, this would require about $17,500 in initial margin. This would
leave enough available equity to cover the position if it were stopped out at the
maximum loss of $212 per contract. Even though the drawdown at the slightly higher risk
percentage of 7% is only 17%, we would not be able to afford the number of
contracts that this would require: 7% of $20,000 gives 6 contracts, and 6 contracts
would need about $21,000 in initial margin, more than the starting account. The
largest risk percentage that would work for this system is about 6%.
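The margin check described above can be sketched in Python. The function name is hypothetical, and the affordability test is an assumption drawn from the reasoning in the text: the initial margin plus the worst-case loss on the position must fit within the account equity.

```python
import math

def sizing_check(equity, f, risk_per_contract, margin_per_contract):
    """Return (contracts, initial_margin, affordable) for a risk fraction f.
    'affordable' assumes margin plus the full stop-out loss must fit in equity."""
    contracts = math.floor(f * equity / risk_per_contract)
    margin = contracts * margin_per_contract
    worst_loss = contracts * risk_per_contract
    return contracts, margin, margin + worst_loss <= equity

# Figures from the text: $20,000 account, $212 risk and about $3500 margin per contract.
for f in (0.05, 0.06, 0.07):
    contracts, margin, ok = sizing_check(20_000, f, 212, 3500)
    print(f"f={f:.2f}: {contracts} contracts, margin ${margin:,}, affordable={ok}")
```

Under these assumptions the check passes at 5% and 6% and fails at 7%, which matches the conclusion that about 6% is the practical upper limit for a $20,000 account.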

The Monte Carlo
simulation is particularly useful for this system because it properly accounts
for the unusual distribution of wins and losses in this system. While most of
the trades in the system are small wins or small losses, there are periodic
large wins and large losses. On a one-contract basis, the system produces very
little profit. However, it's difficult to determine from the one-contract
results if the relatively large risk represented by those periodic large losses
would allow it to be scaled up to a large enough number of contracts to be
viable. The Monte Carlo simulation takes all these factors into account to tell
us that it should work.

While most
systems are probably more straightforward than the example presented here, the
rationale for combining risk-based position sizing and Monte Carlo simulation is
simple: the more accurately we can simulate the performance of a trading system,
the better we can evaluate the system. And the better we can evaluate a system
or a set of parameter values for a system, the better our trading is likely to
be.