One thing I’ve tried focusing on, sometimes successfully and sometimes not, is simplification. It’s an extension of Occam’s Razor: if two explanations for a phenomenon are judged equally explanatory, the simpler one is the “best” one. This is a mental model or heuristic, and like all mental models, it ain’t perfect, but in my experience it works out for the best more often than not.
My first exposure to Occam’s Razor was in some now-forgotten science fiction novella I read as a youth, and the idea just “clicked” with me as intuitively correct. In collegiate (or “university”) settings, I was exposed to the benefits of stepwise regression and paring down multivariate models by discarding independent variables with low Student’s t-statistics. I use this paring down or pruning technique at work as well as when examining trading strategies or opportunities. My first question, when faced with complex models, has for a long time been “I wonder how many of those variables actually do most of the work?”
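For the curious, here is a minimal sketch of that pruning idea, assuming plain ordinary least squares and a simple backward-elimination rule (drop the predictor with the smallest absolute t-statistic until everything left clears a cutoff). The function names, the cutoff of 2.0, and the synthetic data are all my own assumptions for illustration; this is not any particular vendor's method, and real stepwise procedures have well-known pitfalls that this toy ignores.

```python
import numpy as np


def ols_t_stats(design, y):
    """Fit OLS and return (coefficients, t-statistics).

    `design` must already contain an intercept column.
    """
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)  # unbiased residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, beta / np.sqrt(np.diag(cov))


def backward_eliminate(X, y, names, t_min=2.0):
    """Drop the weakest predictor until all remaining have |t| >= t_min."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, t = ols_t_stats(design, y)
        t_abs = np.abs(t[1:])  # ignore the intercept's t-statistic
        weakest = int(np.argmin(t_abs))
        if t_abs[weakest] >= t_min:
            break
        keep.pop(weakest)
    return [names[i] for i in keep]


def r_squared(X, y):
    """In-sample R^2 of an OLS fit with intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, _ = ols_t_stats(design, y)
    resid = y - design @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


# Synthetic data (an assumption for the demo): 12 candidate variables,
# only the first three actually drive y; the rest are pure noise.
rng = np.random.default_rng(0)
n, k = 500, 12
X = rng.normal(size=(n, k))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(size=n)
names = [f"x{i}" for i in range(k)]

kept = backward_eliminate(X, y, names)
kept_idx = [names.index(name) for name in kept]
r2_full = r_squared(X, y)
r2_pruned = r_squared(X[:, kept_idx], y)

print("kept:", kept)
print("R^2 full vs pruned:", round(r2_full, 4), round(r2_pruned, 4))
```

The thing to look at is how little in-sample R² the pruned model gives up relative to the full one. In-sample fit flatters both models, so a fairer comparison would use held-out data.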
Most people, from what I’ve observed in my interactions, don’t think this way. They fall in love with the “optics” of complex models, because they look good, and tend to impress people who think broadly but not deeply. A common sales technique for model-makers is “model bloat”: adding complexity in order to impress the buyer, at the cost of a worse signal-to-noise ratio in the model. I am not sure if this is intentionally misleading behavior on the part of the model-maker, or if the model-maker is simply falling prey to the same mental flaw of being unduly impressed by complexity.
A case in point, which affects almost all of you, is credit scoring. I’ve seen several of FICO’s models, including one common model which uses thirty-nine variables and impacts the prices y’all pay for certain financial services. From many years of working with it, I strongly suspect that a model with only five to eight of the variables would achieve substantially all of the predictive power contained in their thirty-nine variable model. The “wildly cynical” part of me suspects that FICO made a thirty-nine variable model because they would feel embarrassed trying to sell a five or eight variable model; the less cynical part of me suspects that FICO believes their own bullshit.
Here is an extended quote from the CIA’s Center for the Study of Intelligence, where they reference an unpublished manuscript of a study from 1973. I came across this a few years back, but the entire document is now making the rounds on this “series of tubes” called the internets, and the reminder sparked this post.
Eight experienced horserace handicappers were shown a list of 88 variables found on a typical past-performance chart–for example, the weight to be carried; the percentage of races in which horse finished first, second, or third during the previous year; the jockey’s record; and the number of days since the horse’s last race. Each handicapper was asked to identify, first, what he considered to be the five most important items of information–those he would wish to use to handicap a race if he were limited to only five items of information per horse. Each was then asked to select the 10, 20, and 40 most important variables he would use if limited to those levels of information.
At this point, the handicappers were given true data (sterilized so that horses and actual races could not be identified) for 40 past races and were asked to rank the top five horses in each race in order of expected finish. Each handicapper was given the data in increments of the 5, 10, 20 and 40 variables he had judged to be most useful. Thus, he predicted each race four times–once with each of the four different levels of information. For each prediction, each handicapper assigned a value from 0 to 100 percent to indicate degree of confidence in the accuracy of his prediction.
When the handicappers’ predictions were compared with the actual outcomes of these 40 races, it was clear that average accuracy of predictions remained the same regardless of how much information the handicappers had available. Three of the handicappers actually showed less accuracy as the amount of information increased, two improved their accuracy, and three were unchanged. All, however, expressed steadily increasing confidence in their judgments as more information was received. This relationship between amount of information, accuracy of the handicappers’ prediction of the first place winners, and the handicappers’ confidence in their predictions is shown in Figure 5.
With only five items of information, the handicappers’ confidence was well calibrated with their accuracy, but they became overconfident as additional information was received.
The same relationships among amount of information, accuracy, and analyst confidence have been confirmed by similar experiments in other fields.
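One rough way to test yourself for the pattern in that quote is to log your stated confidence next to whether each call turned out right, then compare the averages. The sketch below assumes that setup; `calibration_gap` is a hypothetical helper name, and the numbers are made up for illustration and are not taken from the study.

```python
from statistics import mean


def calibration_gap(confidences, outcomes):
    """Mean stated confidence minus realized hit rate.

    Positive means overconfident, negative means underconfident.
    """
    if len(confidences) != len(outcomes):
        raise ValueError("need one outcome per confidence value")
    hit_rate = mean(1.0 if hit else 0.0 for hit in outcomes)
    return mean(confidences) - hit_rate


# Made-up example: the same five calls (one hit out of five), scored once
# with modest stated confidence and once with inflated stated confidence.
outcomes = [True, False, False, False, False]
gap_few_inputs = calibration_gap([0.20, 0.20, 0.15, 0.20, 0.20], outcomes)
gap_many_inputs = calibration_gap([0.35, 0.30, 0.30, 0.35, 0.30], outcomes)

print("gap with few inputs:", round(gap_few_inputs, 3))
print("gap with many inputs:", round(gap_many_inputs, 3))
```

A gap near zero is what “well calibrated” means here. A gap that widens as you add inputs while the hit rate stays flat is the overconfidence the quote describes.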
How many indicators and inputs are in your trading (or economic) models, and how many of them actually do the heavy lifting? What can you live without and get the same results? Is every filter on your stock screen necessary? Do you really need more information?

7 Comments
Excellent post. However, I think many model builders (Ned Davis) use numerous indicators in their models not because they believe it improves their predictive value, but because they concede many of the indicators they use will a) go through streaks of failure or b) eventually stop working.
Btw, there’s a story I read somewhere about Ned Davis speaking at a conference, where he made a very convincing case for why he believed the stock market was destined to sell off, showing numerous charts and studies that supported his case. Halfway through his presentation he stopped and announced to the audience that he had brought the wrong presentation. He then continued with an equally convincing talk demonstrating why the market was poised to go through the roof.
The point of his performance was that even the most reliable indicators can support divergent messages, and one shouldn’t fall into the trap of running to the indicators that best support one’s market view.
The edge lies within the trader and his mental models gained through experience/proper training - even for system traders. The key is to be able to zoom in on the relevant information for whatever the current conditions and circumstances require.
Only one indicator counts: the price.
Great post. Made me realize that sometimes I fall into the trap of thinking more information is better, but there really is beauty in simplicity.
Your wish is my command. Good to hear from you…cheers dk
Great post.
I use 4 indicators and a system that’s proprietary. My system is so simple, even I can use it.
Jeff
This is not an impressive observation. It’s merely a dynamic of random thinking. The inputs (the information) and the outputs (the processors) are going through an inductive process. Present a series of numbers, 1, 3, 5, 7, and ask the processor to assign number 1 to Sunday and deduce the days for numbers 2, 4, 6, etc. Keep adding fairly typical components to it. The number of components is directly related to the processor’s capability in predicting results. It depends upon the inputs. The inputs described in the CIA study are, ironically, an example of Occam’s Razor: there are too many unrelated and random points of information. It’s not the quantity of information supplied that has any impact, it’s the quality. And in the CIA study the information is bloated and doesn’t indicate anything! It’s just a random result; it doesn’t show anything because the quality, not the quantity, is poor.