One thing I’ve tried focusing on, sometimes successfully and sometimes not, is simplification. It’s an extension of Occam’s Razor: if two explanations for a phenomenon explain it equally well, the simpler explanation is the “best” one. This is a mental model or heuristic, and like all mental models, it ain’t perfect, but in my experience it works out for the best more often than not.
My first exposure to Occam’s Razor was in some now-forgotten science fiction novella I read as a youth, and the idea just “clicked” with me as intuitively correct. In collegiate (or “university”) settings, I was exposed to the benefits of stepwise regression and of paring down multivariate models by discarding those independent variables with low Student’s t-statistics. I use this paring down or pruning technique at work as well as when examining trading strategies or opportunities. My first question, when faced with complex models, has for a long time been “I wonder how many of those variables actually do most of the work?”
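For the curious, here is a minimal sketch of that pruning idea in Python, on made-up data. Everything in it is an assumption for illustration: the function names (`ols_t_stats`, `backward_eliminate`), the cutoff of |t| >= 2, and the synthetic data where only two of ten variables matter. It is backward elimination, one flavor of stepwise regression, not the exact procedure from any textbook or from my day job.

```python
import numpy as np

def ols_t_stats(design, y):
    """Ordinary least squares; returns coefficients and their t-statistics.
    `design` must already contain an intercept column."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)             # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, beta / np.sqrt(np.diag(cov))

def backward_eliminate(X, y, names, t_cutoff=2.0):
    """Drop the weakest variable, refit, and repeat until every
    remaining variable has |t| >= t_cutoff (hypothetical cutoff)."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, t = ols_t_stats(design, y)
        abs_t = np.abs(t[1:])                    # ignore the intercept
        weakest = int(np.argmin(abs_t))
        if abs_t[weakest] >= t_cutoff:
            break
        keep.pop(weakest)
    return [names[i] for i in keep]

# Synthetic data: ten candidate variables, only x0 and x1 drive y.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)
names = [f"x{i}" for i in range(10)]

kept = backward_eliminate(X, y, names)
print("variables kept:", kept)
```

One caveat worth stating: stepwise selection is known to overstate significance, so a pure-noise variable can occasionally survive the cutoff by chance. Treat the survivors as candidates to test on fresh data, not as a final answer.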
Most people, from what I’ve observed in my interactions, don’t think this way. They fall in love with the “optics” of complex models, because they look good, and tend to impress people who think broadly but not deeply. The common sales technique for model-makers is “model bloat”: adding complexity in order to impress the buyer, even though the added variables degrade the signal-to-noise ratio of the model. I am not sure if this is intentionally misleading behavior on the part of the model-maker, or if the model-maker is simply falling prey to the same mental flaw of being unduly impressed by complexity.
A case in point, which affects almost all of you, is credit scoring. I’ve seen several of FICO’s models, including one common model which uses thirty-nine variables and impacts the prices y’all pay for certain financial services. From many years of working with it, I strongly suspect that a model with only five to eight of the variables would achieve substantially all of the predictive power contained in their thirty-nine variable model. The “wildly cynical” part of me suspects that FICO made a thirty-nine variable model because they would feel embarrassed trying to sell a five or eight variable model; the less cynical part of me suspects that FICO believes their own bullshit.
Here is an extended quote from the CIA’s Center for the Study of Intelligence, where they reference an unpublished manuscript of a study from 1973. I came across this a few years back, but the entire document is now making the rounds on this “series of tubes” called the internets, and the reminder sparked this post.
Eight experienced horserace handicappers were shown a list of 88 variables found on a typical past-performance chart–for example, the weight to be carried; the percentage of races in which horse finished first, second, or third during the previous year; the jockey’s record; and the number of days since the horse’s last race. Each handicapper was asked to identify, first, what he considered to be the five most important items of information–those he would wish to use to handicap a race if he were limited to only five items of information per horse. Each was then asked to select the 10, 20, and 40 most important variables he would use if limited to those levels of information.
At this point, the handicappers were given true data (sterilized so that horses and actual races could not be identified) for 40 past races and were asked to rank the top five horses in each race in order of expected finish. Each handicapper was given the data in increments of the 5, 10, 20 and 40 variables he had judged to be most useful. Thus, he predicted each race four times–once with each of the four different levels of information. For each prediction, each handicapper assigned a value from 0 to 100 percent to indicate degree of confidence in the accuracy of his prediction.
When the handicappers’ predictions were compared with the actual outcomes of these 40 races, it was clear that average accuracy of predictions remained the same regardless of how much information the handicappers had available. Three of the handicappers actually showed less accuracy as the amount of information increased, two improved their accuracy, and three were unchanged. All, however, expressed steadily increasing confidence in their judgments as more information was received. This relationship between amount of information, accuracy of the handicappers’ prediction of the first place winners, and the handicappers’ confidence in their predictions is shown in Figure 5.
With only five items of information, the handicappers’ confidence was well calibrated with their accuracy, but they became overconfident as additional information was received.
The same relationships among amount of information, accuracy, and analyst confidence have been confirmed by similar experiments in other fields.
How many indicators and inputs are in your trading (or economic) models, and how many of them actually do the heavy lifting? What can you live without and get the same results? Is every filter on your stock screen necessary? Do you really need more information?
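If you want to put numbers on those questions, one rough way is to fit the full model and a stripped-down model on one chunk of data, then compare how each does on data it has never seen. The sketch below does that on synthetic data that I invented for illustration: 39 inputs of which only six carry signal, a 50/50 train/test split, and a "keep the six largest coefficients" rule that only makes sense because the inputs share a scale. None of it comes from any real scoring or trading model, and the helper names (`fit_ols`, `r_squared`) are my own.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def r_squared(beta, X, y):
    """R-squared of a fitted model on the given data."""
    design = np.column_stack([np.ones(len(y)), X])
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Synthetic data: 39 inputs, only the first six actually matter.
rng = np.random.default_rng(1)
n, k = 2000, 39
X = rng.normal(size=(n, k))
true_beta = np.zeros(k)
true_beta[:6] = [2.0, -1.5, 1.0, 1.0, -0.8, 0.5]
y = X @ true_beta + rng.normal(size=n)

train, test = slice(0, 1000), slice(1000, None)

# Full model on the training half.
full = fit_ols(X[train], y[train])

# Keep the six inputs with the largest absolute coefficients.
# (Assumes all inputs are on the same scale, as they are here.)
top = np.argsort(-np.abs(full[1:]))[:6]
small = fit_ols(X[train][:, top], y[train])

# Compare both models on the held-out half.
r2_full = r_squared(full, X[test], y[test])
r2_small = r_squared(small, X[test][:, top], y[test])
print(f"39-variable model, out-of-sample R^2: {r2_full:.3f}")
print(f" 6-variable model, out-of-sample R^2: {r2_small:.3f}")
```

The point of scoring on held-out data is that extra variables always look helpful in-sample. Out of sample is where you find out whether they were doing any of the heavy lifting.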





