TRPITS stands for “Turning a Research Paper Into a Trading System.” The paper in question, 130/30: The New Long-Only (PDF) by Andrew W. Lo and Pankaj N. Patel, was intended to promulgate an index, or grouping of stocks, that is transparent, liquid, and mechanically implemented, so that it can serve as a benchmark index for funds of the 130/30 long/short style.

Over the next several weeks, I will use it to construct a mechanical trading system that is suitable for retail use. To do this, I must first understand the details behind the construction of their index. Then, I shall examine the remainder of the paper to see what lessons can be learned from it. Next, I shall determine what steps are needed in order to make the system trade-able, and suitably profitable, on a retail scale, using commonly-available tools. I shall then test the system and provide analysis of its results, including equity curves, drawdown amounts, and other metrics. I suggest that readers download the paper (PDF) and familiarize themselves with it before I begin this task.

While this example will be worked with one particular research paper, the steps taken are applicable to any research paper that documents a Technical, Fundamental, or FundaTechnical stock market “anomaly” in detail. This is entirely consistent with the theme explored in Using Public Information to Find a Trading System Edge, and some of the installments may appear as columns in MarketThoughts.

Steps Needed to Make This a Retail System

(I) Set the timeframe. I will evaluate every four weekends, making changes on the next trading day. Turnover will not be constrained.
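
As a rough illustration of that schedule, here is a minimal Python sketch. It assumes the evaluation happens on a Saturday and that the "next trading day" is the following Monday unless it falls on a holiday; the function name `rebalance_schedule` and the `holidays` argument are my own placeholders, not anything from the paper or from my screening software.

```python
from datetime import date, timedelta

def rebalance_schedule(start, end, holidays=frozenset()):
    """Yield (evaluation_saturday, trade_date) pairs spaced four weeks apart."""
    # Move forward to the first Saturday on or after `start` (Saturday is weekday 5).
    saturday = start + timedelta(days=(5 - start.weekday()) % 7)
    while saturday <= end:
        trade = saturday + timedelta(days=2)  # the Monday after the weekend
        # Push the trade date past weekends and any caller-supplied holidays.
        while trade.weekday() >= 5 or trade in holidays:
            trade += timedelta(days=1)
        yield saturday, trade
        saturday += timedelta(weeks=4)

schedule = list(rebalance_schedule(date(2024, 1, 1), date(2024, 3, 31)))
for evaluated, traded in schedule:
    print(evaluated.isoformat(), "->", traded.isoformat())
```

A real exchange holiday calendar would have to be supplied by the caller; the sketch does not know about market closures on its own.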

(II) Set the number of positions and position weights. I lack the “black box” that Credit Suisse used, and desire a smaller scale for retail trading, but I’m still curious as to the impact of position count on returns. Since this is a scored algorithm, we can assume that increasing the position count might increase diversification and decrease volatility (i.e. “company risk”), but will almost certainly decrease returns (if the scoring algorithm is, indeed, predictive). The other concerns I have are related to retail traders using margin! Leverage costs will be higher, and the very real possibility of a margin call during high drawdown is harder to set up a test for. For me as a retail trader, it’s entirely possible that margin requirements will vary by security.

I am going to postpone the actual setting of position counts, and test both long-only and short-only, with equal weight to all positions, and an assortment of arbitrary position counts to start. After analyzing the results, including some variables determined later, I will determine what level of diversification is the best trade-off, and whether a long-short extension is possible for a retail trader in a mechanical system, based on this algorithm.
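
To make the equal-weight test concrete, here is a small Python sketch of how a long-only or short-only book could be built from a set of scores for several arbitrary position counts. The scores, tickers, and the helper name `equal_weight_book` are hypothetical, purely for illustration.

```python
def equal_weight_book(scores, n_positions, side="long"):
    """Return {ticker: weight} for the n best (long) or n worst (short) scores."""
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    # Longs take the highest scores first; shorts take the lowest scores first.
    ranked = sorted(scores, key=scores.get, reverse=(side == "long"))
    picks = ranked[:n_positions]
    sign = 1.0 if side == "long" else -1.0
    # Every position gets the same absolute weight; shorts carry a negative sign.
    return {ticker: sign / len(picks) for ticker in picks}

# Hypothetical composite scores, for illustration only.
scores = {"AAA": 0.9, "BBB": 0.5, "CCC": 0.1, "DDD": 0.7}

for count in (1, 2, 4):  # an assortment of arbitrary position counts
    print(count, equal_weight_book(scores, count, "long"))
```

Running the same loop with `side="short"` gives the short-only counterpart, which is what allows the two sides to be analyzed separately before deciding on any long-short extension.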

(III) Set the investable universe. This could be based on capitalization, average volume traded, or average dollar volume of shares traded. Since this is not a day-trading program, I don’t need enough liquidity to get in and out quickly, and since it’s a retail program, I don’t need a great deal of volume, either. I will set the minimum requirements as being traded on the NYSE, AMEX, or NASDAQ (no pink sheets) and having a market cap of at least $100 million.
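
Expressed as code, the universe filter is just two tests per stock. The sketch below is in Python; the `Listing` record and its field names are assumptions of mine, since the actual screener has its own way of expressing these conditions.

```python
from dataclasses import dataclass

ALLOWED_EXCHANGES = {"NYSE", "AMEX", "NASDAQ"}  # no pink sheets
MIN_MARKET_CAP = 100_000_000                    # $100 million

@dataclass
class Listing:
    ticker: str
    exchange: str
    market_cap: float  # in dollars

def investable(listings):
    """Keep only listings on an allowed exchange with a large enough market cap."""
    return [
        s for s in listings
        if s.exchange in ALLOWED_EXCHANGES and s.market_cap >= MIN_MARKET_CAP
    ]

# Made-up listings, for illustration only.
sample = [
    Listing("AAA", "NYSE", 2.5e9),
    Listing("BBB", "NASDAQ", 60e6),   # too small
    Listing("CCC", "PINK", 400e6),    # pink sheets
    Listing("DDD", "AMEX", 100e6),    # exactly at the minimum
]
universe = [s.ticker for s in investable(sample)]
print(universe)
```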

(IV) Set the scoring algorithm.

This is going to involve some difficulty and quite a few tradeoffs. Their “10 Credit Suisse composite alpha factors” are assigned equal weight, despite several of them being redundant inside the seven-category functional list I composed from their indicator definitions. I have several objectives: first, I want to make sure that the data can be defined by the backtesting engine I use, as well as defined by the regular screening software I use, and this will eliminate some items; second, I want to remove the redundancies inherent in their definitions; finally, I want to keep the weighting at least somewhat similar to theirs, to try and assure that what I come up with is close to what they were looking at.

What I defined functionally as category (1), “value” ratios of price to earnings, sales, book value, etc., were originally defined by Credit Suisse in their paper as two different categories, each with 1/10th weighting. In “traditional value” they included forward estimated P/E ratio, Price/Sales, Price/Cash Flow, Dividend Yield, and Price/Book, scored on an absolute basis versus all stocks in their universe. In “relative value” they included the following on an industry-relative basis: Price/Sales, Trailing P/E, Price/Cash Flow, Price/Sales versus 5-year average, Trailing P/E versus 5-year average, and Price/Cash Flow versus 5-year average.

I didn’t want to use the same metric twice when scaling this down. I also didn’t want to use forward estimates twice, since that’s an item that comes up later. Another difficulty was that the screener has a percentage ranking algorithm built in for industry-relative metrics, but the backtest engine has no equivalent. In the end, I settled for Price/Book and Price/Sales on an absolute basis, giving me two variables.

What I defined functionally as category (2), growth in earnings, was originally contained in three categories in the paper. The first was “historical growth,” which also contained sales and cash flow growth; the second was “expected growth,” which was all about analysts’ estimates for EPS changes; and the third was “earnings momentum,” which is really a misnomer, and applies to changes in analysts’ estimates for growth.

I will use 12 month year over year Cash Flow growth for one variable.

I will use long-term EPS analysts’ percentage growth estimate for another variable.

Because I have difficulties with the software I use, I can’t compute a ranking variable based on analysts’ upgrades of earnings. I can, however, use the presence of analysts’ upgrades to consensus earnings as a filter for the stocks the system will trade, and use the presence of downgrades as a filter on the short side.

What I defined functionally as category (3), growth in sales and cash flows on a historical basis, was initially part of two different variables in the Credit Suisse paper. It shared a category (“historical growth”) with earnings and cash flows, and there was a separate category for “sales momentum” as well. It is the latter that I will focus on here. My variable will be year over year growth in sales.

Functional category (4), assorted fundamental trends in financial statement ratios, has quite a variety of ratios available. I will pick just one, profit margin (net income divided by revenues).

Credit Suisse used 1/10th of their weighting in functional category (5), price momentum / relative strength. Every measure they used was lagged by 20 days, which is consistent with other research that I’ve read, and techniques used by some other active trader / bloggers. This is problematic, since I don’t have any screener or backtester that uses lagged momentum. Arrrggh. I could, again, use two screeners to find the relevant stocks, but then I couldn’t backtest! Arrgggh.

These are the categories that Credit Suisse averaged to develop their momentum measure: Slope of 52-Week Trendline (Calculated with 20-Day Lag), Percent Above 260-Day Low (Calculated with 20-Day Lag), 4/52-Week Price Oscillator (Calculated with 20-Day Lag), 39-Week Return (Calculated with 20-Day Lag), and 52-Week Volume Price Trend (Calculated with 20-Day Lag).

The backtesting engine I use, as well as the regular screening software I use, each have relative strength ranks in 4-, 13-, and 52-week flavors. For the long model, I considered using a strong 52-week performance as a positive factor and a strong 4-week performance as a negative factor to take the place of the lag, but I didn’t know what that would do to the weighting. It also occurred to me that if I used only 52-week performance, I could catch a high-riser that is in a strong, recent downtrend. In the end, I decided to test various combinations: simply using a strong 52-week relative strength as a variable, knowing that it wasn’t a perfect match; using relative strength as a variable with a percentage-from-52-week-high filter, to avoid recent strong downtrends; and using relative strength and percentage from 52-week highs both as filters. For the short model, I will reverse the criteria and combinations, using a model with low relative strength as a variable; low relative strength as a variable with “within X% of 52-week low” as a filter; and filters for both items.
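
The three long-side momentum treatments can be sketched in Python as follows. The cutoffs `MIN_RS` and `MAX_PCT_BELOW_HIGH` are placeholders I made up for the example (the real values are part of what gets tested), and the `Quote` record and the mode names are likewise hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Quote:
    ticker: str
    rs_52w: float          # 52-week relative strength rank, 0-100, higher is stronger
    pct_below_high: float  # percent below the 52-week high

MIN_RS = 70.0              # placeholder cutoff, not a tested value
MAX_PCT_BELOW_HIGH = 15.0  # placeholder cutoff, not a tested value

def long_momentum(quotes, mode):
    """Return {ticker: momentum score}; None means 'passed the filters, not scored'."""
    if mode not in ("variable", "variable_plus_filter", "two_filters"):
        raise ValueError(f"unknown mode: {mode}")
    result = {}
    for q in quotes:
        near_high = q.pct_below_high <= MAX_PCT_BELOW_HIGH
        if mode == "variable":
            result[q.ticker] = q.rs_52w      # scored, nothing excluded
        elif mode == "variable_plus_filter":
            if near_high:                    # drop recent strong downtrends
                result[q.ticker] = q.rs_52w
        elif near_high and q.rs_52w >= MIN_RS:
            result[q.ticker] = None          # both items act only as filters
    return result

# Made-up quotes, for illustration only.
quotes = [
    Quote("AAA", 95.0, 3.0),   # strong and near its high
    Quote("BBB", 90.0, 40.0),  # strong over 52 weeks but falling hard recently
    Quote("CCC", 40.0, 5.0),   # weak relative strength
]
for mode in ("variable", "variable_plus_filter", "two_filters"):
    print(mode, long_momentum(quotes, mode))
```

The short side would mirror this with low relative strength and a distance-from-52-week-low test.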

My technical stock screening software doesn’t have jack for FundaTechnical or Fundamental data, but it does have all the short-term reversal indicators that Credit Suisse considered. On the other hand, the backtesting engine I use, as well as the regular screening software I use, don’t have these reversal indicators at all. I could therefore do a two-step process in screening, wherein I took one set and plugged that watchlist for the reversal indicators and grades, but I wouldn’t be able to backtest that. Arrggh. I am therefore deleting (6), short-term mean-reversion technical indicators, from consideration. This eliminates 1/10th of the Credit Suisse weighting variables.

Credit Suisse used 1/10th of their weighting in functional category (7), market capitalization (smaller is better), and since I’ve removed the S&P 500 limitation from the definition of the trade-able universe, I think this is redundant, and I’m removing it.

Final Scoring Algorithm:

* 52-week price momentum tested three ways: a scored variable, a scored variable with a filter, and as two filters

* one filter (no weight applied) for changes in analysts’ consensus earnings estimates (must be downgrades for shorts, upgrades for longs)

The others are all ranked and scored variables:
* Price to Book ratio in universe of all stocks
* Price to Sales ratio in universe of all stocks
* Year over Year Cash Flow growth percentage
* Analysts’ long-term EPS growth estimates
* Year over Year Sales growth percentage
* Profit margin
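
Here is a minimal Python sketch of how the six ranked variables and the estimate-revision filter could fit together, leaving out the still-undetermined momentum piece. It assumes each variable is converted to a rank between 0 and 1 across the whole universe, that the six ranks are averaged with equal weight, and that lower is better for the two price ratios; the field names, the `estimate_revision` sign convention (+1 upgrade, -1 downgrade, 0 none), and the sample numbers are all made up for illustration.

```python
# Factor name -> True if a higher raw value is better.
FACTORS = {
    "price_to_book": False,
    "price_to_sales": False,
    "cash_flow_growth": True,
    "lt_eps_growth_est": True,
    "sales_growth": True,
    "profit_margin": True,
}

def percentile_ranks(values, higher_is_better=True):
    """Map {ticker: value} to {ticker: rank in [0, 1]}, where 1 is best."""
    # Ties keep their input order; a production version would average tied ranks.
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    if len(ordered) == 1:
        return {ordered[0]: 1.0}
    return {t: i / (len(ordered) - 1) for i, t in enumerate(ordered)}

def composite_scores(data, side="long"):
    """Equal-weight average of factor ranks, kept only for stocks passing the filter."""
    totals = {t: 0.0 for t in data}
    for name, higher_is_better in FACTORS.items():
        ranks = percentile_ranks({t: row[name] for t, row in data.items()}, higher_is_better)
        for t, rank in ranks.items():
            totals[t] += rank
    wanted = 1 if side == "long" else -1  # upgrades for longs, downgrades for shorts
    return {
        t: totals[t] / len(FACTORS)
        for t, row in data.items()
        if row["estimate_revision"] * wanted > 0
    }

# Made-up fundamentals, for illustration only.
sample = {
    "AAA": dict(price_to_book=1.0, price_to_sales=0.5, cash_flow_growth=30, lt_eps_growth_est=20,
                sales_growth=25, profit_margin=15, estimate_revision=1),
    "BBB": dict(price_to_book=2.0, price_to_sales=1.0, cash_flow_growth=10, lt_eps_growth_est=10,
                sales_growth=10, profit_margin=10, estimate_revision=1),
    "CCC": dict(price_to_book=4.0, price_to_sales=3.0, cash_flow_growth=-5, lt_eps_growth_est=5,
                sales_growth=0, profit_margin=2, estimate_revision=-1),
}
print(composite_scores(sample, "long"))
print(composite_scores(sample, "short"))
```

For longs, the highest composite scores are the candidates; for shorts, the lowest scores among the stocks with downgrades are.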

The test version has 1 undetermined factor, 6 scored variables, and 1 filter, accounting for 8/10ths of what Credit Suisse had selected for their variables. Is this the “best” possible solution? Define “best.” Assuming the variables are predictive and that variables in the same functional categories are partially redundant, this solution should be effective while allowing for a simpler calculation on screeners available to retail traders. It attempts to be faithful to the variables promulgated by Credit Suisse, but does not follow their lead 100%. “Improvements” from that perspective are left to the reader.

I envision this as an iterative process. The scoring algorithm and timeframe will be tested with arbitrary position counts, and a variety of momentum indicators, in both long and short versions. From this analysis, an interim optimization may be made.

Next: testing the new retail system