One of the challenges I’ve experienced in testing a variety of systems is determining whether one system is “better” than the other. Obviously there are questions of timeframe, risk objective, and style, but even having narrowed it down to those specifics, there were often times that I had to choose between two or more systems. Just recently I have finalized a system to track, but it was only after exploring a variety of metrics to make the decision between two choices. It occurs to me that walking through that decision might make a worthwhile explanation of what risk and return metrics I find useful in developing systems.

I have two trading plans with nine years of backtested results. The first plan, Plan A, has a standard deviation (StDev) of annual returns equal to 45.1%, while Plan B, the second plan, has a standard deviation of annual returns equal to 27.9%. Plan A looks a lot riskier! Ah, but what are the average returns in relation to the variability? Is the extra risk of Plan A versus Plan B worth it, given the returns that Plan A and Plan B generate?

Average Annual Gain (AAG) is just the gain for each year, summed, and divided by the number of years.

Plan A has an average annual gain of 43.8%.
Plan B has an average annual gain of 36.2%.

CAGR is Compound Annual Growth Rate. Equity at the end is divided by equity at the beginning, and the Nth root of that ratio (N being the number of years observed) gives the annualized growth rate needed to go from beginning to end. Usually this number is lower than the average annual gain, because a loss takes a proportionally larger gain to recover from. A 10% loss demands an 11.1% gain to recover, a 20% loss demands a 25% gain, etc.

Plan A has a CAGR of 39.0%.
Plan B has a CAGR of 34.2%.
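
To make the difference between the two averages concrete, here is a minimal sketch of both calculations plus the recovery arithmetic. The yearly returns are made-up numbers for illustration only, not the results of Plan A or Plan B, and the function names are my own.

```python
# Hypothetical yearly returns as fractions (0.50 = +50%); NOT either plan's data.
yearly_returns = [0.50, -0.20, 0.30, 0.10]

def average_annual_gain(returns):
    """Arithmetic mean: yearly gains summed, divided by the number of years."""
    return sum(returns) / len(returns)

def cagr(returns):
    """Constant yearly rate that compounds to the same ending equity."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r  # running ratio of ending equity to beginning equity
    return growth ** (1.0 / len(returns)) - 1.0

def gain_to_recover(loss):
    """Gain needed to get back to even after a fractional loss."""
    return 1.0 / (1.0 - loss) - 1.0

print(f"AAG:  {average_annual_gain(yearly_returns):.1%}")
print(f"CAGR: {cagr(yearly_returns):.1%}")
print(f"Gain to recover a 10% loss: {gain_to_recover(0.10):.1%}")
print(f"Gain to recover a 20% loss: {gain_to_recover(0.20):.1%}")
```

With any losing year in the mix, the CAGR printed here comes out below the AAG, which is the gap described above.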

Is Plan A’s larger return worth the “risk” in my opinion?

One measure of risk-adjusted return is the CAGR/StDev.

Plan A has a CAGR/StDev of 0.865.
Plan B has a CAGR/StDev of 1.227.

In terms of this measure, Plan B is better. However, the CAGR/StDev isn’t always the best measurement. Imagine investing in T-Bills or CDs; the annual return on cash isn’t very high, but the “risk” is negligible. When a return in the low single digits is divided by a standard deviation that is close to zero, the CAGR/StDev goes through the roof! So while I like the CAGR/StDev measurement, I know that I need either some backdrop of target returns that must be exceeded before I use it, or another measurement altogether that takes some benchmark into account. In this case, with CAGRs well above 30%, it may be appropriate by itself.
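
A quick sketch of the ratio, using the rounded CAGR and StDev figures quoted above, shows both the plan scores and the cash problem. The cash line is a hypothetical: the 4.5% return and the 0.5% standard deviation are assumptions chosen only to illustrate the effect. Because the inputs are rounded to one decimal, the last digit may differ slightly from the three-decimal figures in the text.

```python
def cagr_per_stdev(cagr, stdev):
    """Return per unit of variability: CAGR divided by the StDev of annual returns."""
    return cagr / stdev

# Rounded figures quoted in the text, as fractions.
plans = {"Plan A": (0.390, 0.451), "Plan B": (0.342, 0.279)}
for name, (plan_cagr, plan_stdev) in plans.items():
    print(name, round(cagr_per_stdev(plan_cagr, plan_stdev), 3))

# Hypothetical cash-like holding: low return, tiny variability (assumed numbers).
cash = cagr_per_stdev(0.045, 0.005)
print("Cash", round(cash, 3))
```

The cash score dwarfs both plans even though nobody would call a low single-digit return the better result, which is why the ratio needs a minimum return as a backdrop.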

I talked twice before about the meaninglessness of Alpha and Beta outside of their regression equations, but now it’s time to really put them in their proper place. To measure these, as well as to measure the Sharpe ratio, I need definitions for a Risk Free Rate of Return (RF), Excess Return (ER), and a Benchmark.

The benchmark is one place where analysts run into trouble. The right benchmark for a U.S.-traded stock plan is probably some index of stocks traded in the U.S., maybe the S&P 500, maybe the NYSE composite, who knows. The right benchmark for global macro trading plans is probably some global asset allocation scheme; the benchmark for merger arbitrage hedge funds should be some index of merger arbitrage funds, etc. In other words, the benchmark should be a rather inclusive subset or index of the universe of securities that the trading plan samples from. Improper benchmarking can make a plan look better than it is! For my purposes, both Plan A and Plan B are based on U.S.-traded stocks and are benchmarked to the S&P 500.

The Risk Free Rate of Return (RF) is just the return on cash. For giggles, I’m using 4.5% annually.

Excess Return (ER) is the return of a plan that exceeds what one could get from cash. I take each year’s return and subtract 4.5% to obtain the ER for that year, for that plan. I also need to do this for the Benchmark, in this case, the S&P 500.

For each plan to be evaluated, I need to do a simple linear regression, with the Excess Returns (ER) of the Benchmark as the X variable, and the ER of the plan as the Y variable. The slope of the regression line is Beta. The Y-axis intercept, that is, the value the regression line takes when the ER of the Benchmark is zero, is Alpha.

Outside of the above definition, Alpha and Beta are meaningless. They are a tool for evaluating returns generated by a money manager or by a system, nothing more, nothing less.

Plan A has Alpha of 34.4% and Beta of 1.770.
Plan B has Alpha of 30.1% and Beta of 0.582.
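
The regression described above can be sketched in a few lines of ordinary least squares. The yearly returns below are invented for illustration and are not the plans' or the S&P 500's actual numbers; the 4.5% risk-free rate is the one used in the text, and the function name is my own.

```python
from statistics import mean

RISK_FREE = 0.045  # the 4.5% annual return on cash used in the text

def alpha_beta(plan_returns, benchmark_returns, risk_free=RISK_FREE):
    """Regress plan excess returns (Y) on benchmark excess returns (X).

    Returns (alpha, beta): the intercept and slope of the least-squares line.
    """
    y = [r - risk_free for r in plan_returns]
    x = [r - risk_free for r in benchmark_returns]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx                 # slope of the regression line
    alpha = y_bar - beta * x_bar     # value of the line where benchmark ER is zero
    return alpha, beta

# Hypothetical yearly returns, same years in the same order (assumed data).
benchmark = [0.20, -0.10, 0.05, 0.15]
plan = [0.45, 0.02, 0.30, 0.33]

alpha, beta = alpha_beta(plan, benchmark)
print(f"Alpha: {alpha:.1%}  Beta: {beta:.3f}")
```

Both series have to cover the same periods in the same order, otherwise the pairs fed to the regression are meaningless.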

Neither of these plans has any judgment applied in the selection of stocks or in the timing of entry and exit, i.e., they are both completely mechanical in nature. The significance of the Alpha and Beta measurements is simply that both plans outperform the benchmark by a wide margin, but Plan A, with its higher Beta, tends to outperform even more when the S&P 500 is having a good year, and tends to give up much more of that edge when the S&P 500 is having a bad year.

The Sharpe ratio is another tool for measuring risk-adjusted returns. The calculation is pretty simple: take the straight average of several years’ Excess Return (ER) and divide it by the standard deviation (StDev) of the Excess Return (ER). It’s very similar to the CAGR/StDev ratio, except that Sharpe takes into account the Risk Free Rate of Return (RF) as a benchmark.

What does the Sharpe ratio tell me at some particular Risk Free Rate of Return (RF)? In terms of the statistic itself, it’s really a Z-Test for how likely it is that the observed returns are significantly different from Risk Free. Higher numbers are better. Another way of looking at it is by saying that, if my Sharpe is high at a certain Risk Free Rate of Return (RF), then I could borrow tons of money from my broker at that rate and execute this plan profitably with no worries at all! Rrrriiiigggghtt! Tell me how well that works at 20:1 leverage, OK?

Plan A has Sharpe of 0.871 at RF = 4.5%.
Plan B has Sharpe of 1.139 at RF = 4.5%.
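
As a rough sketch of the calculation just described, the function below takes yearly returns and a hurdle rate. The use of the sample standard deviation (`statistics.stdev`) is my assumption, since the text does not say whether sample or population StDev was used, and the example returns are made up.

```python
from statistics import mean, stdev

def sharpe_ratio(yearly_returns, hurdle=0.045):
    """Average excess return divided by the StDev of excess return.

    hurdle defaults to the 4.5% risk-free rate used in the text.
    Uses the sample standard deviation (an assumption).
    """
    excess = [r - hurdle for r in yearly_returns]
    return mean(excess) / stdev(excess)

# Hypothetical yearly returns as fractions (NOT either plan's data).
example_returns = [0.60, -0.05, 0.35, 0.20, 0.45]
print(round(sharpe_ratio(example_returns), 3))
```

Since the hurdle is a constant, subtracting it shifts the average but leaves the standard deviation unchanged, so a higher hurdle can only lower the score.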

Now, is the Risk Free Rate of Return (RF) an appropriate benchmark? I mean, would I ever seriously consider holding cash as a viable alternative in this world of fiat currency inflation? Well, maybe not, but I could always use some other benchmark, say a targeted minimum return annually instead of the RF to determine the scores of my plans. Suppose I wanted to compound at 2% monthly. My target annual return is then 26.8%.

Plan A has Sharpe of 0.377 at Target = 26.8%.
Plan B has Sharpe of 0.338 at Target = 26.8%.
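
The 26.8% target is just 2% compounded twelve times, and because a constant hurdle does not change the standard deviation, the target-based score can be approximated from the averages and StDevs already quoted. This sketch assumes the quoted StDev of annual returns is the one used in the Sharpe calculation; since the inputs are rounded, expect the last digit to drift from the figures above.

```python
def annual_target(monthly_rate, months=12):
    """Annual return equivalent to compounding monthly_rate every month."""
    return (1.0 + monthly_rate) ** months - 1.0

def approx_sharpe(avg_annual_gain, stdev, hurdle):
    """(average return - hurdle) / StDev, for a constant hurdle."""
    return (avg_annual_gain - hurdle) / stdev

target = annual_target(0.02)
print(f"Target: {target:.1%}")

# AAG and StDev as quoted in the text (rounded to one decimal place).
for name, aag, sd in [("Plan A", 0.438, 0.451), ("Plan B", 0.362, 0.279)]:
    print(name, round(approx_sharpe(aag, sd, target), 3))
```

Raising the hurdle from 4.5% to 26.8% takes the same bite out of both numerators, which hurts the lower-return Plan B proportionally more and is why the ranking flips.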

Drawdown (DD) occurs when the equity of a system falls below where it once was. I think of an index making a new high and then a correction, and the period of time where the index is below a prior high is the drawdown (DD). Drawdowns have magnitude (how much equity is lost) and duration, and a system or index could spend much of its life in a drawdown. Not counting dividends, the S&P 500 was in drawdown from 2000 until 2007, and at one point was down almost 50% from its high!

Plan A has a maximum DD of 22.9% with 56.0% of months spent in DD.
Plan B has a maximum DD of 20.3% with 43.1% of months spent in DD.
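
Both drawdown figures can be pulled from a series of month-end equity values. The sketch below assumes a month counts as "in drawdown" whenever equity is below the highest value seen so far; the exact counting convention behind the numbers above is not stated, and the equity series is invented.

```python
def drawdown_stats(equity):
    """Return (maximum drawdown, fraction of periods spent below a prior high)."""
    peak = equity[0]
    max_dd = 0.0
    periods_in_dd = 0
    for value in equity:
        if value >= peak:
            peak = value                      # new high (or tie): not in drawdown
        else:
            periods_in_dd += 1                # below the prior high
            max_dd = max(max_dd, 1.0 - value / peak)
    return max_dd, periods_in_dd / len(equity)

# Hypothetical month-end equity values (assumed data).
equity = [100, 110, 99, 105, 120, 108, 125]
max_dd, time_in_dd = drawdown_stats(equity)
print(f"Max DD: {max_dd:.1%}, months in DD: {time_in_dd:.1%}")
```

Magnitude is measured from the most recent high, not from starting equity, so a system can be well above where it began and still be in drawdown.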

Yet another methodology would be to take the CAGRs generated above, and divide them by the maximum drawdowns.

Plan A has a CAGR/DD of 1.700.
Plan B has a CAGR/DD of 1.684.

I know me, and inevitably, I am going to examine the returns of any system I use over short time frames. It’s nice to have a system that consistently generates positive returns and beats the S&P 500 on a month-to-month basis.

Plan A beats the index 62.9% of the time, with 68.1% of months positive.
Plan B beats the index 66.4% of the time, with 71.6% of months positive.
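
These two percentages are simple counts over monthly returns. Here is a minimal sketch with invented monthly figures; treating a flat month as "not positive" and a tie with the index as "not a beat" are my assumptions.

```python
def monthly_hit_rates(plan_monthly, index_monthly):
    """Return (fraction of months beating the index, fraction of months positive)."""
    if len(plan_monthly) != len(index_monthly):
        raise ValueError("both series must cover the same months")
    n = len(plan_monthly)
    beats = sum(1 for p, i in zip(plan_monthly, index_monthly) if p > i) / n
    positive = sum(1 for p in plan_monthly if p > 0) / n
    return beats, positive

# Hypothetical monthly returns as fractions (assumed data).
plan_months = [0.03, -0.01, 0.02, 0.00]
index_months = [0.01, -0.02, 0.03, 0.01]
beats, positive = monthly_hit_rates(plan_months, index_months)
print(f"Beats index: {beats:.1%}, positive months: {positive:.1%}")
```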

Most of the above statistics are compiled with non-overlapping full years of returns, but there’s something else worth looking at, in my opinion. For every possible starting point of executing the system, what is the best 12-month result in testing, and what is the worst result?

Plan A had a best of +171.6% and a worst of -14.1%.
Plan B had a best of +99.0% and a worst of -11.7%.
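
The best and worst figures come from sliding a 12-month window across the monthly returns, one starting month at a time. A rough sketch, assuming monthly returns are compounded within each window and using invented data:

```python
def rolling_window_returns(monthly_returns, window=12):
    """Compounded return for every possible starting month with a full window."""
    results = []
    for start in range(len(monthly_returns) - window + 1):
        growth = 1.0
        for r in monthly_returns[start:start + window]:
            growth *= 1.0 + r
        results.append(growth - 1.0)
    return results

# Hypothetical two years of monthly returns (assumed data).
monthly = [0.04, -0.02, 0.03, 0.01, -0.05, 0.06, 0.02, 0.00, -0.01, 0.05, 0.03, -0.03,
           0.02, 0.07, -0.04, 0.01, 0.03, -0.02, 0.04, 0.00, 0.02, -0.06, 0.05, 0.01]
windows = rolling_window_returns(monthly)
print(f"Best 12 months:  {max(windows):+.1%}")
print(f"Worst 12 months: {min(windows):+.1%}")
```

Overlapping windows give many more observations than calendar years do, but they are not independent of one another, so they are best read as a range rather than as separate trials.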

It’s interesting to note that none of the above metrics really give me a definitive answer to my question, that is, is the extra risk of Plan A versus Plan B worth it, given the higher returns that Plan A generates? In my book, I tend towards the simpler metrics. I also tend to see testing metrics that are relatively close as being approximately equal. Thus, I will present my “scorecard” for debate, knowing full well that it will be debated by my readers.

I think that Plan A has a strong lead in AAG, but not so much in the CAGR category, where I think that Plan A has a very slight lead. If I had used the last seven years of backtested results to evaluate the systems, instead of the last nine years, Plan B would have the lead in both measures. Given the inherent randomness of “only” nine years, I don’t count this difference as being very large, but I really like the high numbers generated. It’s interesting to note that CAGR is a measure of risk-adjusted return when compared to AAG, since the amount of gain it takes to recover from losses results in CAGR being lower than AAG somewhat in proportion to the StDev of returns.

I like the advantage of Plan B in the CAGR/StDev metric, and the CAGR is large enough that I don’t feel the need to adjust for a benchmark, so I discard Alpha, Beta, and Sharpe in my analysis. I recognize that Beta and Sharpe are excellent “selling points” if an institution were examining these plans, but I also think they’d be much more interested in how much money could be run through them – which is another discussion entirely!

I consider the Max DDs as equal, but like the fact that Plan B spent fewer months in DD.

I think that Plan B has a slight edge in terms of consistently eking out positive months and months that outperform the index. If I were to ever need a plan to trade for a living, that is, make constant small withdrawals from trading equity in order to pay bills, etc., I would favor consistency, if long-term returns were equal.

Plan B has an edge in its “worst 12 months” being better than Plan A’s. This comes at the expense of forgoing the windfall profits of Plan A’s best year.

For my money, I consider Plan B to be superior to Plan A in risk-adjusted return, based on the backtested results.

There is one, usually unmentioned, assumption in analyzing returns. Generally it is assumed that the sample of returns the system came from is somehow indicative of the possible future returns. The standard disclaimer applies, past results are no guarantee of future returns, etc., blah blah blah, yada yada yada. I would be suspicious of returns from only a bull market or a bear market, but these systems have nine years of data from 1997 to 2007 and cycle through 100+ round trips per year, holding 20 positions at a time. Is that good enough? I dunno. Backtesting is also prone to errors in the data files used and in entry/exit and expense assumptions, and I don’t have dividends in the returns; there are a host of possible errors involved. The point isn’t really about Plan A or Plan B, even though both exist, and I will be tracking one of them at my new site, The Rempel Report at billrempel.com. No, the point is that there are lots of different ways to evaluate returns, and I felt that my readers might want to see them explained in the context in which I use them.

As regular readers know, I’ve moved my personal trades and the tracking of specific, actionable trading plans to the new site, The Rempel Report at billrempel.com. The new site won’t link out, except in the context of a post’s content, and is reserved for actionable trading items only! I expect to make 2-3 posts a week there, primarily on the weekends, unless something drastic happens in the markets during the week. Please visit, and if you like what you see, you can register to receive email updates with each new post, leave comments, and join the discussion!