
Ty, what did you shoot today?
Oh, Judge, I don’t keep score.
Then how do you measure yourself against other golfers?
By height.
Benchmarking … is important. It is one of many basics that most traders ignore, or do in an ignorant fashion. It is related to evaluating performance metrics, but it’s not the same thing as measuring performance.

You know, you should play with Dr. Beeper and myself. I mean, he’s been club champion for three years running and I’m no slouch myself.
Don’t sell yourself short, Judge, you’re a tremendous slouch.
In a game of golf, the score is the performance metric. The benchmark is something else; it’s the standard you compare a score against to judge whether it is good or not. For the club champion, finishing position in the tournament is the metric, and the composite performance of the tournament participants is the benchmark. If somebody talks about their handicap, they are applying a performance metric to the benchmark of “par.”

This is a hybrid. This is a cross, ah, of Bluegrass, Kentucky Bluegrass, Featherbed Bent, and Northern California Sensemilia. The amazing stuff about this is, that you can play 36 holes on it in the afternoon, take it home and just get stoned to the bejeezus-belt that night on this stuff.
Some benchmarks, apparently, involve multitasking.
In trading, there are two basic types of benchmark: relative and absolute.
Relative benchmarks mean that performance is measured against an index, a category, or some other measurement that moves from year to year as the market moves. There are two separate reasons why this might be done: in the case of money managers, the index is supposedly representative of the trading style used, and in the case of many individual investors, the index is the alternative investment choice if the portfolio were unmanaged.
The rationale for measuring a manager against a relative benchmark is that we shouldn’t hold things against the manager if those things were outside the manager’s control. The best gold sector fund manager in 1981 had a tough row to hoe! Relative benchmarking says, “hey, it’s not your fault, let’s see how you did against the other guys in the same type of fund!” The poorest-quality manager in emerging market funds has probably given the best-quality manager in U.S. domestic diversified funds a good clock cleaning, performance-wise, but how did each of them do against their respective benchmarks?
The problem many people have is assigning the proper benchmark. Is the proper benchmark the index that approximates the universe of investable issues for that fund? Is that by default the S&P 500, the Russell 1000, the DJ Total Market? Usually this comes up as a problem when the manager can broadly diversify.
For example, what is the benchmark for a hedge fund? Is it the hedge fund index? Which hedge fund index? What about merger arbitrage funds? Stat arb funds? Global Macro? If the improper benchmark is selected, for example, measuring a stat arb fund against the S&P 500, our metrics might make us believe that the fund’s management added value when it didn’t, or didn’t add value when it did.
Alpha and beta come from regressing the manager’s excess returns over the risk-free rate against a benchmark’s excess returns over the risk-free rate. It’s a mouthful, but it boils down to needing: (1) the return of “cash” over the time period, (2) the return of an appropriate relative benchmark over the time period, with “appropriate” being a problematic word, (3) the manager’s returns over the time period, and (4) a computer to run the linear regression. I find it very unfortunate that “alpha” and “beta” have entered the vernacular, because nowadays it’s almost impossible to know exactly what most pundits, bloggers, and scholars mean when they use the terms unless you ask them, and even then you may not know, because THEY can’t define it! Outside of the regression equation used to evaluate a return series, the words “alpha” and “beta” are meaningless. I include them here because, when used mathematically and correctly, they are benchmarking functions that might possibly be useful.
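A minimal sketch of that regression, assuming three equal-length lists of per-period returns expressed as decimals (0.01 = 1%). The function name `alpha_beta` and the sample figures are hypothetical. It fits ordinary least squares of the manager’s excess returns on the benchmark’s excess returns: beta is the slope and alpha is the intercept, both per period and not annualized.

```python
from statistics import mean


def alpha_beta(manager, benchmark, cash):
    """OLS regression of manager excess returns on benchmark excess returns.

    All three arguments are per-period returns as decimals covering the
    same periods. Returns (alpha, beta); alpha is per period.
    """
    if not (len(manager) == len(benchmark) == len(cash)):
        raise ValueError("return series must cover the same periods")
    y = [m - c for m, c in zip(manager, cash)]    # manager excess returns
    x = [b - c for b, c in zip(benchmark, cash)]  # benchmark excess returns
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("benchmark excess returns have no variance")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx                 # slope: sensitivity to the benchmark
    alpha = y_bar - beta * x_bar     # intercept: return not explained by it
    return alpha, beta


# Hypothetical series built so that alpha = 0.2% per period and beta = 1.5.
cash = [0.003] * 4
bench = [0.01, -0.02, 0.03, 0.005]
mgr = [c + 0.002 + 1.5 * (b - c) for b, c in zip(bench, cash)]
a, b = alpha_beta(mgr, bench, cash)
print(round(a, 4), round(b, 2))  # 0.002 1.5
```

The arithmetic is the easy part; the numbers are only as meaningful as the choice of which series gets passed as `benchmark`.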
For retail traders, often the relative benchmark is the alternative investment. This is actually a pretty easy selection: ask yourself, “Self,” because that’s what I call myself when I talk to myself, “Self, what would you buy and hold if you weren’t playing the home game?” And there’s your relative benchmark.
We might argue whether the rate on cash varies enough from year to year to be considered a “relative” benchmark, but it’s my post, so I’m including it here as one. Various performance metrics use cash (or the “risk-free rate”) as a benchmark, the Sharpe Ratio being one of them. The Sharpe Ratio compares returns to the risk-free rate as a benchmark, providing either the odds that the return is statistically better than cash, or the odds of making money if using leverage at cash to fund the system, depending on how you look at it.
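For reference, a sketch of the usual computation, assuming monthly returns as decimals and annualizing by the square root of 12. That scaling treats the monthly returns as uncorrelated, which is an assumption rather than something stated here. The second function shows one way to connect the ratio to the “statistically better than cash” reading: the per-period Sharpe ratio times the square root of the number of periods is the one-sample t-statistic for the mean excess return. Both function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def excess_returns(returns, cash):
    """Per-period returns minus the per-period risk-free ("cash") rate."""
    if len(returns) != len(cash):
        raise ValueError("series must cover the same periods")
    return [r - c for r, c in zip(returns, cash)]


def sharpe_ratio(returns, cash, periods_per_year=12):
    """Mean excess return over its sample standard deviation, annualized.

    Annualizing by sqrt(periods_per_year) assumes uncorrelated returns.
    """
    ex = excess_returns(returns, cash)
    return mean(ex) / stdev(ex) * sqrt(periods_per_year)


def excess_return_t_stat(returns, cash):
    """t-statistic for "mean excess return is zero"; equals the
    per-period (unannualized) Sharpe ratio times sqrt(n)."""
    ex = excess_returns(returns, cash)
    return mean(ex) / (stdev(ex) / sqrt(len(ex)))
```

A larger t-statistic means the excess return is less likely to be a fluke of the sample, which is roughly the “odds” interpretation given above.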
But wait, there’s more! What about … absolute benchmarks?
With an absolute benchmark, the target return or performance is constant regardless of what the market around it does. This doesn’t mean the benchmark is set in stone for decades; it could be revisited annually, but it is set irrespective of the broader market’s moves.
I can think of two good reasons to set an absolute benchmark. First, if the goal is set based on a minimal desired outcome, and second, if the goal is set based on a backtested strategy result.
Imagine “Mr. Retired” who needs to spend, this year, about 4% of his investable wealth to live life as he is accustomed. Further, he anticipates his style of life will increase in costs at 4% a year. Perhaps he should set something around +8.2% as a minimum benchmark? Maybe even crank a little compounding into the mix, so that his “cushion” in times of equity drawdown or unforeseen expenses will grow? This absolute benchmark is set based on a need or desire. If the index posts a +25% that year, “Mr. Retired” is still happy if he gets his steady, consistent +8.2%, because that’s his benchmark.
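The +8.2% figure can be reproduced by compounding the two 4% figures: 1.04 × 1.04 = 1.0816. Below is a minimal sketch of that arithmetic, with a hypothetical `cushion` parameter for the extra compounding suggested above. The exact number shifts a little depending on whether withdrawals are taken at the start or the end of the year, which this sketch ignores.

```python
def minimum_absolute_benchmark(spend_rate, cost_growth, cushion=0.0):
    """Minimum annual return needed to fund spending and keep pace with costs.

    All rates are decimals (0.04 = 4%). `cushion` is an optional extra margin
    compounded on top as a buffer against drawdowns or surprise expenses.
    The timing of withdrawals within the year is ignored.
    """
    return (1 + spend_rate) * (1 + cost_growth) * (1 + cushion) - 1


# "Mr. Retired": spend 4% of wealth, costs rising 4% a year.
needed = minimum_absolute_benchmark(0.04, 0.04)
print(f"{needed:.2%}")  # 8.16%
```

The same arithmetic applies to “Mr. Day Trader” below; only the inputs change.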
Now take “Mr. Day Trader,” who needs to make 40% on his stake to meet expenses. He’d darn sure better build some compounding and cost of living into his benchmark, and then carefully consider whether he’s got the capital to trade for a living, given his rather high absolute benchmark.
Finally, take “Mr. System Developer” who has tested a system over a long time period. It would make sense that his benchmark isn’t set on a need or desire, but is set based on what he believes the system is capable of providing. In this case, perhaps the index returns a typical +10% in total return, and “Mr. System Developer” makes a +18% that year. Is he happy? Not if his benchmark is +25%, he isn’t!
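To make the contrast concrete, here is a small sketch that scores one year’s result against both kinds of benchmark at once, using the figures from the example above. The function name and the dictionary keys are hypothetical; treating the absolute benchmark as a target to meet (rather than strictly beat) is a design choice of this sketch.

```python
def benchmark_report(result, index_return, absolute_target):
    """Score one year's return against a relative and an absolute benchmark.

    All values are decimals (0.18 = +18%). Margins are positive when the
    result came out ahead of the benchmark.
    """
    vs_relative = result - index_return
    vs_absolute = result - absolute_target
    return {
        "vs_relative": vs_relative,
        "vs_absolute": vs_absolute,
        "beat_relative": vs_relative > 0,
        "met_absolute": vs_absolute >= 0,  # absolute target: meet it or not
    }


# "Mr. System Developer": +18% in a year the index made +10%,
# judged against his backtest-based absolute benchmark of +25%.
report = benchmark_report(0.18, 0.10, 0.25)
print(report["beat_relative"], report["met_absolute"])  # True False
```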
Benchmarkin’ ain’t easy, but it’s necessary. Is your benchmark a relative one, or an absolute one? And why?





12 Comments
Absolute, because you can’t eat relative!
I have a single benchmark, which is my percentage of being right. The money management skills take care of the rest.
Jeff
Two comments for absolute benchmarking so far …
Sliding scale of absolute.
I don’t want to rest or get tight if I hit a target early. For instance, it could’ve been my year for a double had I not gotten so conservative when I hit 50%. Other than that, say I start the year with a nasty drawdown - getting back above water would be nice, and then when I get there start looking around for how much I can eat with what’s left of the year.
I thought about this topic after the series of remarks at Alephblog and just caught up with the elaboration here. The main difference of view that I have is much less confidence in anybody’s ability to accurately estimate what absolute performance in the future will be. Can you say a bit more about where and why you are confident in such estimates? I tend to see all stock market phenomena as non-stationary, and to see relative performance as having a lot lower variance (or whatever is analogous to variance for non-stationary estimation). Since very long term performance of most of the indices is expected to look similar to GDP growth rate plus inflation, I feel more confident thinking about layering relative performance on top of that than believing particular absolute numbers will be accurate.
One point where my view diverges is a belief that estimating relative performance is easier in a statistical sense (or whatever is analogous to “statistical” for a non-stationary estimation problem).
In my opinion, the best estimate of future long term performance for a system is the actual long term performance or a long term backtest. When such a test includes a variety of market conditions stretching over close to a decade, I tend to have a good deal of confidence in it. I would make an exception in terms of years tested if the system was very active and very short term, since those would have more trades per unit time.
If the system were based on known anomalies, that would increase my confidence in the backtest over a given period of time (or alternately reduce the amount of time needed for confidence).
I’ve been known in the past to trade experimentally based on known anomalies while trying to work out the optimal methods for using them, but for right now, I think I’ve got enough systems that I like, so I’m tending to leave the experimenting on the computer.
Trivially, you could use a long-term stock market return as your absolute benchmark and thus keep it constant.
We can agree to disagree about which is easier to estimate, relative or absolute.
Some participants demand a minimal rate of return and need an absolute benchmark. For the most part, I think a lot of amateurs’ focus on relative returns is in the context of a “consolation prize.” Your mileage may vary.
I’m not quite following. The 1990s were mostly a bull market; the 1970s were mostly a range-bound market (with very poor returns after inflation). The 2000s have been a bull market for emerging markets and commodities and a U-shaped market for U.S. equities. Can you backtest going back to the late 1960s for a U.S. common stock and bond based strategy?
If a trading system is both simple and successful, then it makes intuitive sense to think it will be adopted by enough people that the advantage will disappear over time. Academics claim that is the case. Do you think they are wrong about that in the long term?
When I was a part-time investor, I noticed post hoc that my annual performance looked a lot like a small cap value index +10%. My goal as a more or less full-time investor was to try to beat some mixed cap index by roughly 20% annually (but which index?). So far I am ahead of that by most ways of reckoning, but I haven’t been trying long enough to judge whether that is sustainable, and constructing an appropriate index for last year was difficult because I traded both emerging markets and U.S. small value stocks, which were miles apart in sector performance.
Index timing systems that use end of day data can be tested back to the 50s for the S&P and to the 20s for the Dow. 1997-today includes a bull, a crash, the bull continued, a bear, and another bull market - quite a variety, and lots of retail databases go back that far. If you have access to CompuStat, you have end of month data for equities going back many, many decades.
The academics are full of it on that point (as well as many others), both long-term and short-term.
Lack of an appropriate index to benchmark against may be a reason to set an absolute benchmark. If you have a fairly consistent methodology, perhaps you should quantify the rules that you are using (or some facsimile thereof) into a mechanical system and backtest it …
I use most of the strategies you lay out here - http://www.billakanodoodahs.com/2007/01/fundamental-technical-and-fundatechnical-stock-selection-criteria/
but I also pay attention to index oscillators and my own macro judgments. Combining macro judgments with the required data cleaning for all the funda databases I’ve looked at is something I’m not close to automating.
Couple more questions: 1) What funda database do you find complete and clean enough to use for backtesting? 2) Funda/value screening hasn’t been helpful over the last 7 months; IMO, it’s potentially harmful in any period where small cap value is substantially underperforming, so perhaps it makes sense to factor that into a funda screening based methodology. What do you think? 3) you often mention using momentum but where does it fit in the post linked above?
I considered putting this in a post, but decided it was better left in the comments section. Darn it, more people should read comments!
(1) the AAII database has been compiled (by others) historically in an “as was” condition, and I use it currently through their website. When I first started value investing, I subscribed to ValueLine for a brief period and assembled my own limited data set. I accept that the data will never be perfectly clean, and I am used to making judgments based on that, professionally speaking, since insurance company databases are typically about as pure as the meat in hot dogs.
(2) I should have used this link in my first response, re: the academics. It applies here, read completely then digest it a bit, along with the following points.
No method has been shown to outperform the indices over each and every X-month period, and humans being humans, they will lose confidence in the system and ditch it at that point. This downturn in value methods will likely discourage lots of weak hands and weak-minded wannabe Buffetts from pursuing that method. Greenblatt alluded to this phenomenon when he disclosed the basic methodology that he still uses (and probably had a poor year with, but I haven’t checked).
Some people just can’t accept the fact that other ideas besides theirs might work (reference most value guys).
Even the most open-minded individual may never have taken a good look at alternative trading philosophies. Take a look at David’s curriculum vitae and this recent post. David’s a professional investor and long-standing columnist, and yet he’s JUST NOW downloading some of those papers and just recently investigating momentum as a positive aspect, despite the fact it’s one of the more powerful historical anomalies. Eddy Elfenbein just recently did some posts where he was “surprised” by the impact of momentum, saying he was “planning to look into more of the academic literature.” So there are experienced people who I read and respect (as opposed to just read but don’t overly respect, read and openly disrespect, and neither read nor respect (I’m trying to move those in the second category into the third)), who haven’t done the homework that they would NEED to be doing, in order to arbitrage away these “anomalies.”
Now, I’m not picking on David or Eddy personally, they’re just timely examples of people who’ve been around a while and who obviously haven’t been fully exposed to the vastness of strategy options available. I don’t pull punches often, whether it’s friend or foe. Those are two cats whose writing and thinking styles I like, even though I don’t agree with them always.
Read around on others’ blogs, and you’ll see predominantly the same thing: an ignorance, or in some cases an outright denial, of others’ ideas that work. Keep in mind that the vast majority of people have a closed mind to investigating what might work if it is outside their comfort zone or experience. I was very much like that initially, but as I looked around and saw lots of other people making money (without being “value investors”), I decided to investigate the broader range of styles.
Many of the styles are either mutually exclusive or independent. Jim Simons’ trades will be adding liquidity to the trades of those with longer horizons. A lot of value guys will catch the falling knife, and make up for it by holding (maybe even doubling down) long enough for it to go up to a good profit. Deep value investors buy later, but all the value investors will probably sell long before the stock makes a new all-time high, whereupon it will first attract the attention of trend-trader stock funds, possibly having caught the CANSLIM and Marty Zweig crowd much earlier along the way. In the meantime, the StockBees of the world will probably have traded it two or three times, and it might grab the day-traders’ attention a dozen or more times along the way. In all that time, different anomalies will be capitalized on by different players in different timeframes. One could be “right” and profitable being short while at the same time another, longer-term player is “right” and profitable being long …
This is some of the reasoning why the vast majority of “anomalies” will never be arbitraged out! The pros are not necessarily any better-informed about what works than amateurs and the self-taught are. Most people never get beyond unconscious incompetence to conscious incompetence, much less read and digest any form of literature on the market, or do any rational investigation. Face it, the vast majority of people AREN’T RATIONAL, which means a key EMH assumption is flawed.
Some people find that things “don’t work” because they don’t work FOR THEM, or because they use faulty methods of analyzing them (you hearing this Victor?).
That’s why I’ve stated several times that one should use a multi-year period to look at returns from a system, and that one bad year, maybe two, in a row aren’t indicative of anything for an intermediate-to-long-term player. That’s also why I’ve been so critical about 5 calendar years of underperformance for Hussman’s methodology, since it’s FIVE years IN A ROW and characterized by being in the same type of market, which is indicative of a specific problem, i.e., his method fails to identify bull markets.
So finally getting around to the short answer to your question. Adding a small-cap momentum filter to your value trading system might work, but I have to ask, what would you use instead of value during those times? Would you be better off with a blend of systems (I address this in several posts, use the search feature on the right sidebar)? Or, you could stick with a system that’s having a bad year, possibly raiding the sofa cushions for cash to throw into it right now. A long backtest should include times of relative underperformance and times of drawdown.
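One way to make the “five years in a row” test concrete: a hypothetical helper that counts the longest run of consecutive years in which a system trailed its benchmark. It works with either kind of benchmark; for an absolute one, pass the same target for every year. The sample returns below are made up for illustration.

```python
def longest_underperformance_streak(system_returns, benchmark_returns):
    """Longest run of consecutive years in which the system trailed its benchmark.

    Both arguments are annual returns as decimals, oldest first, covering the
    same years. For an absolute benchmark, pass the same target for every year.
    """
    if len(system_returns) != len(benchmark_returns):
        raise ValueError("both series must cover the same years")
    longest = current = 0
    for system, benchmark in zip(system_returns, benchmark_returns):
        if system < benchmark:
            current += 1                    # another year behind the benchmark
            longest = max(longest, current)
        else:
            current = 0                     # streak broken
    return longest


# Made-up annual returns against a constant +10% absolute benchmark.
system = [0.14, 0.06, 0.03, 0.08, 0.02, 0.05, 0.21]
print(longest_underperformance_streak(system, [0.10] * len(system)))  # 5
```

A streak of one or two years says little either way; a long streak concentrated in one type of market is the pattern worth investigating.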
(3) “Trends persist over a medium term” is the essence of the second bullet point under “technical factors” in that post. Days/weeks or several years = reversion, on the same timeframe. Several months to a year or so = persistence for a few months or more.
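A rough sketch of a medium-term trend (momentum) score, assuming month-end closing prices listed oldest first. The 6-month lookback and the one-month skip of the most recent month are assumptions picked for illustration; skipping the latest month is a common convention in the academic momentum literature, meant to sidestep short-term reversion. The function names, tickers, and prices are hypothetical.

```python
def trailing_return(prices, lookback=6, skip=1):
    """Return over `lookback` months, ending `skip` months before the latest close.

    `prices` holds month-end closes, oldest first.
    """
    end = len(prices) - 1 - skip
    start = end - lookback
    if start < 0:
        raise ValueError("not enough price history for this lookback")
    return prices[end] / prices[start] - 1


def rank_by_momentum(price_history, lookback=6, skip=1):
    """Tickers sorted from strongest to weakest trailing return."""
    scores = {
        ticker: trailing_return(prices, lookback, skip)
        for ticker, prices in price_history.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical month-end closes, oldest first.
history = {
    "AAA": [100, 110, 120, 130, 140, 150, 160, 170, 175],
    "BBB": [100, 100, 100, 100, 100, 100, 100, 100, 90],
    "CCC": [100, 95, 90, 85, 80, 75, 70, 65, 80],
}
print(rank_by_momentum(history))  # ['AAA', 'BBB', 'CCC']
```

Note that CCC’s jump in the final month is ignored because of the one-month skip.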
I was explicitly thinking about funds like those run by Jim Simons and David Shaw as key examples of arbitragers, that is, funds employing armies of Ph.D.’s and programmers to look for every exploitable market edge they can find. They have certainly looked at momentum from many angles and have the ability to raise a lot of capital to take advantage of whatever market inefficiencies they find. On the other hand, I agree with RevShark (of realmoney.com) that an individual who can buy and sell very quickly has a distinct advantage (other things being equal), so some strategies can be profitable for an individual trader and not for a large organization. So yeah, I’m guilty of the charge: I’ve been led to assume that it wouldn’t be worthwhile to look for strategies with a large edge that are easily mechanized and exploitable using liquid stocks. You’re persuading me to look more carefully.
Your point “(3)” above makes sense. What is the best reference for that empirical study?
With most strategies that are used in practice, there is an element of the formulaic and an element of discretion. The more discretion is involved, the more an individual needs to be a good fit to the system.
Here’s an example of how I combine informal systematic observations, informal observations, and intuition. In mid-December I was mostly in cash based on a combination of the overbought state of the broader market, the overbought state of the stocks I had been holding in my portfolio, a slight backup in mortgage rates, and a macro view that the severe housing downturn will represent a long-term drag on the world economy. I’m generally increasing long exposure as the market goes lower, but less aggressively than I would be if I didn’t regard the economic situation as serious (by way of contrast, I didn’t regard the dips last March or July as economically serious). Today I was buying FCX, which I regard as attractive because of a) its moderate valuation, b) the benefit it gets from rising copper and gold prices, which generally benefit from a weaker dollar and from international demand, c) positive trends in the 30-day commodity charts for copper and gold, d) a good selloff in the stock, and e) its behavior last July, when fast-money hedge funds were selling hard: FCX was particularly hard hit and then did a V reversal at the bottom as they piled back into it. So I intend to continue scaling into this stock until I reach my maximum allowable position size or the price goes up, whichever happens first. The example is intended to show a combination of factors, some of which are easy to mechanize and some of which are not.
I posted these to the MarketThoughts forum two years ago. I’m sure some of the links are still valid.
CANSLIM and Zweig are two of my favorite examples of relative strength in action. The Jan 2008 AAII Journal lists several of their guru screens that use momentum.
Keep in mind that many of the big technical short-holding-time shops are taking advantage of execution speed, doing things electronically and at much lower costs because of the number of trades they execute and their arrangements with brokers. Oh yeah, and levering the dog-doo-doo out of it! As I understand it, most of Simons’ work is trading spurious correlation pairs — not momentum. But I could have that wrong.
Test what you can mechanize, and tinker with it to improve it on a mechanical basis, then add the intuition … and set the absolute benchmark based on the mechanical test + whatever percentage you think you can add …
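A sketch of that recipe, assuming the mechanical backtest is summarized by its starting and ending equity over a known number of years. The compound annual growth rate (CAGR) of the test is the starting point, and `discretionary_edge` is a hypothetical stand-in for “whatever percentage you think you can add,” expressed as a decimal. The backtest figures in the usage line are made up.

```python
def cagr(start_equity, end_equity, years):
    """Compound annual growth rate between two equity values."""
    if start_equity <= 0 or end_equity < 0 or years <= 0:
        raise ValueError("equity and years must be positive")
    return (end_equity / start_equity) ** (1 / years) - 1


def absolute_benchmark(start_equity, end_equity, years, discretionary_edge=0.0):
    """Backtest CAGR plus the margin you expect discretion to add."""
    return cagr(start_equity, end_equity, years) + discretionary_edge


# Hypothetical 10-year mechanical backtest that grew 100,000 to 259,374,
# plus an assumed 2 percentage points from discretionary judgment.
print(f"{absolute_benchmark(100_000, 259_374, 10, 0.02):.1%}")  # 12.0%
```

Keeping the mechanical CAGR and the discretionary margin as separate inputs makes it easy to see later how much of the benchmark rested on the test and how much on judgment.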
5 Trackbacks
[…] The performance isn’t what I’d call impressive; +9.7% annualized through 12/31/07 and +6.75% annualized through 3/7/08, currently in the middle of a drawdown in the teens. On a relative basis, it’s outperformance to the major domestic U.S. stock indices, but I’m not a fan of relative benchmarks. […]
[…] fund that benchmarks itself to one of these indices, the indices are nothing more than a potential relative benchmark for your performance. By the way, if the majority of your stake is in a U.S. stock market index […]
[…] definition of the term. I would call a system “robust” if it met my predetermined absolute benchmarks, or performed better than a competing system in terms of risk-adjusted […]
[…] to Aggressive. I believe that, over the longer term, I’m more likely to achieve my personal absolute return benchmarks having made this switch. While Rotational has advantages over Aggressive, such as scalability, […]
[…] this process until a DD level is reached that you find tolerable. Now check the CAGR against your predetermined absolute benchmark, and decide if this system is worth […]