CRSM Regression
In the last post, I presented a schematic of how Cluster Random Subspace (CRSM) would work for portfolio optimization. But the concept can be extended to prediction and classification. Random Forests, for example, could incorporate this concept to build superior decision trees with fewer samples, although I would caution that decision trees tend to perform poorly for prediction of financial time series (largely due to binary thresholding and over-fitting). Standard multiple regression, by contrast, has been a workhorse in finance simply because it is more robust and less prone to over-fitting than other machine-learning approaches. One of the challenges in regression is that it breaks down when you have a lot of variables to choose from and some of them are highly correlated. There are many established ways of dealing with this problem (PCA, stepwise regression, etc.), but CRSM remains an excellent candidate since it can be used to form a robust ensemble forecast from a large group of predictors. It does not address the initial choice of variables, but it can at least automatically handle a large group of candidates that may contain some highly correlated variables. CRSM Regression is a good way to proceed when you have a lot of possible indicators but no pre-conceived ideas for constructing a good model. Here is a process diagram for one of many possible methods to apply CRSM Regression:
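For readers who want to experiment, below is a minimal Python sketch of one way a CRSM-style regression ensemble could be built: cluster correlated predictors, draw one predictor per cluster for each random subspace, fit OLS on each subspace, and average the forecasts. The specific clustering and sampling choices are illustrative assumptions rather than a prescribed recipe:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def crsm_regression_forecast(X, y, n_clusters=5, n_subspaces=100, seed=0):
    """Ensemble forecast: cluster correlated predictors, draw one predictor per
    cluster to form each random subspace, fit OLS on it, and average forecasts."""
    rng = np.random.default_rng(seed)
    # 1) cluster predictors using correlation distance (1 - |corr|)
    corr = np.corrcoef(X, rowvar=False)
    condensed = (1.0 - np.abs(corr))[np.triu_indices_from(corr, k=1)]
    labels = fcluster(linkage(condensed, method="average"), n_clusters, criterion="maxclust")
    preds = []
    for _ in range(n_subspaces):
        # 2) random subspace: one randomly chosen predictor from each cluster
        cols = [rng.choice(np.where(labels == c)[0]) for c in np.unique(labels)]
        Xs = np.column_stack([np.ones(len(y)), X[:, cols]])
        # 3) ordinary least squares on the reduced predictor set
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        preds.append(Xs @ beta)
    # 4) the ensemble forecast is the average across all subspaces
    return np.mean(preds, axis=0)
```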
Cluster Random Subspace - A Process Diagram
In the last post, I introduced a method for improving portfolio optimization called the Cluster Random Subspace Method (CRSM). The paper was written by Michael Guan for his thesis in computer science, and in case you missed it, a link can be found here: CRSM Thesis. CRSM has demonstrated considerable advantages versus conventional Markowitz tangency/max-Sharpe (MSR) portfolios, especially on large-scale or homogeneous universes. This is to be expected because CRSM is designed to reduce variance by aggregation, i.e. a statistical ensemble approach. Conventional MSR suffers from the "curse of dimensionality" in these situations and tends to act more as an "error-maximizer" than a producer of effective portfolio allocations. Since MSR is used within CRSM to maximize the portfolio Sharpe ratio in the paper, it can be compared directly against a standard MSR approach, against RSM/RSO (the original random subspace method, which also uses MSR), and against an equal weight portfolio. CRSM-R is simply a variant of CRSM that samples with replacement. The chart below, taken from the paper, aggregates the results across six different universes for each algorithm type. For a more in-depth breakdown, I would suggest reading the original paper:
In terms of the objective function (maximizing the Sharpe ratio), CRSM is by far the best algorithm and vastly superior to MSR across the universes tested in the paper. While I don't want to spoil the surprise, this does not come at the cost of a reduced CAGR, as is typically the case with, say, Michaud resampling. In fact, CRSM also has the highest CAGR, beating out equal weight, which is impressive especially for homogeneous universes. To get a better sense of how CRSM works, it is useful to look at the process diagram below:
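To make the flow concrete before the diagram, here is a rough Python sketch of the ensemble idea: cluster the assets by correlation, draw random subspaces that span the clusters, compute a max-Sharpe (tangency) portfolio on each subspace, and average the resulting weights. The clustering method, the one-asset-per-cluster sampling, and the unconstrained tangency solution are simplifying assumptions for illustration, not the exact procedure in the thesis:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def max_sharpe_weights(mu, cov):
    """Unconstrained tangency (max-Sharpe) weights, scaled to sum to one."""
    w = np.linalg.solve(cov, mu)
    return w / w.sum()

def crsm_weights(returns, n_clusters=4, n_subspaces=200, seed=0):
    """returns: (T x N) array of asset returns; output: ensemble weight vector."""
    rng = np.random.default_rng(seed)
    # cluster assets on correlation distance so each subspace spans the clusters
    corr = np.corrcoef(returns, rowvar=False)
    condensed = (1.0 - corr)[np.triu_indices_from(corr, k=1)]
    labels = fcluster(linkage(condensed, method="average"), n_clusters, criterion="maxclust")
    agg = np.zeros(returns.shape[1])
    for _ in range(n_subspaces):
        # draw one asset at random from each cluster to form the subspace
        idx = np.array([rng.choice(np.where(labels == c)[0]) for c in np.unique(labels)])
        sub = returns[:, idx]
        agg[idx] += max_sharpe_weights(sub.mean(axis=0), np.cov(sub, rowvar=False))
    return agg / n_subspaces   # average of the subspace max-Sharpe weights
```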

Cluster Random Subspace Method for Portfolio Management
(Image: https://cssanalytics.wordpress.com/wp-content/uploads/2014/11/cluster-image-3.png)
One of the many areas that I have explored in my own research is creating new methods to improve upon mean-variance optimization. A while back I wrote about the concept of applying the Random Subspace Method (RSM) as a viable alternative that improves upon some of the deficiencies in standard portfolio optimization. I called the application of RSM to optimization RSO, which showed promise versus traditional mean-variance for homogeneous universes. The original concept for random subspaces originated at the famous Bell Labs, and was designed to reduce dimensionality for prediction or classification. The most popular application of RSM is in "Random Forests", which are used for generating more robust decision trees in machine learning. RSM uses bagging to draw samples of predictors and combine their estimates together in an "ensemble." The primary advantage is that the noise created by each group of predictors tends to be somewhat unrelated to the noise of other randomly selected groups. As a consequence, the noise gets "cancelled out" and what remains is a more stable and accurate predictor ensemble.
While the RSM framework is statistically sound, it does have some obvious areas of weakness that require a more refined approach. I worked together with Michael Guan of Systematic Edge as an advisor for his computer science thesis on a superior approach called the "Cluster Random Subspace Method" (CRSM). Michael is a very smart guy, and it was a lot of fun working with him. We also received some valuable feedback from Adam Butler of GestaltU. The application we used to demonstrate the advantages of CRSM was portfolio optimization, but the concept can be applied to prediction and classification as well (including Random Forests). The thesis can be found here: CRSO Thesis. I would encourage everyone to read the thesis, but for those that want more of a simple overview, I will provide a summary in the next post.
Predicting Bonds with Stocks: A Strategy to Improve Timing in Corporate and High Yield Bonds

Alpha Architect recently posted a good summary of an academic paper that links the returns of corporate bonds to equity returns. The authors find that the equity market can lead corporate bond returns by up to one month. I have explored this concept before on the premise that there is a logical linkage between stocks and corporate bonds (and high yield), since they share vulnerability to common economic factors (growth, credit risk, etc.). Since stocks are far more liquid than corporate or high yield bonds, it makes sense that the stock market should provide a leading signal.
As a simple test to demonstrate the potential value of this leading signal, one can compare a simple moving average strategy that uses the stock market to time either corporate or high yield bonds against one that generates the signal from the underlying market itself. The hypothesis is that the equity market signal should produce superior results to a traditional moving average strategy applied to the underlying time series. Furthermore, we would hypothesize that the advantage should be relatively larger for high yield bonds than for corporate bonds, since high yield shares more economic factor risk with equities (investment grade corporates are correlated to economic growth and credit risk to some extent, but also have greater interest rate sensitivity given the ratio of their credit spread to total nominal yield). This effect is further enhanced by the fact that high yield bonds are less liquid than corporate bonds, since they tend to be issued by smaller and less diversified companies. This means the lead/lag relationship between equities and high yield should be longer and more stable than for corporate bonds.
For our simple test, we use a 20-day moving average strategy to trade either High Yield (HYG) or Corporate Bonds (LQD), using either the underlying time series or the S&P500 (SPY) to generate signals. As with most SMA strategies, we go long when the close is above the SMA, and in this case we allocate to cash (SHY) when the close is below the SMA. Here are the results extending the bond indices back to 1995:
Results are very much in line with both hypotheses: the equity market is a superior signal to the underlying time series, and the advantage is stronger for high yield bonds than for corporate bonds. The reduction in drawdown and the increase in risk-adjusted returns from using the equity market signal to trade high yield bonds are substantial. As a note of caution, this analysis does not consider relative trading costs, but one would expect a similar number of total trading signals regardless of which market is used, since the two markets are exposed to similar risk factors and the strategies use the same moving average lag. The lesson for quants is that it often helps to pay attention to the fundamental drivers of a market's underlying return rather than applying naive technical trading rules.
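For readers who want to replicate the test, here is a minimal pandas sketch of the timing rule described above; the function assumes daily (adjusted) close price series that have already been loaded and aligned on a common date index:

```python
import pandas as pd

def sma_timing(asset_px, signal_px, cash_px, window=20):
    """Hold the asset when the signal series closes above its SMA, else hold cash.
    asset_px, signal_px, cash_px: daily close price Series on a common date index."""
    long_signal = (signal_px > signal_px.rolling(window).mean()).shift(1, fill_value=False)
    asset_ret = asset_px.pct_change()
    cash_ret = cash_px.pct_change()
    # next-day implementation: yesterday's signal decides today's holding
    return asset_ret.where(long_signal, cash_ret).dropna()

# Example usage (prices is a DataFrame of adjusted closes for HYG, LQD, SPY, SHY):
# hyg_equity_signal = sma_timing(prices["HYG"], prices["SPY"], prices["SHY"])
# hyg_self_signal   = sma_timing(prices["HYG"], prices["HYG"], prices["SHY"])
```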
Flexible Asset Allocation With Conditional Correlations
Recently, I have been engaged in some research collaboration with Ilya Kipnis from QuantStrat TradeR. Ilya is a talented quant with a passion for testing new ideas. One of those ideas relates to a recent post he wrote on replicating a heuristic method for constructing portfolios called "Flexible Asset Allocation". A while back I was forwarded an interesting article by Wouter Keller on "Flexible Asset Allocation" (FAA), which gained popularity for its novel approach to blending momentum, correlations and volatility into one composite ranking scheme for tactical asset allocation. The correlation component of the ranking was apparently inspired by the Minimum Correlation Algorithm. The general ranking method is essentially a weighted average of each asset's ranking versus the universe in terms of momentum (return over a chosen window, higher is better), volatility (standard deviation, lower is better) and correlation. For the correlation component, Mr. Keller suggests ranking assets by their average correlation to all other assets, since that is a good proxy for diversification potential (lower is better). The final result is that FAA manages to demonstrate both good performance and robustness across time. For a good review of FAA, Wes Gray of Alpha Architect (which is a very good research resource) wrote a post showing the superiority of the approach over simpler methods here.
In an email to the author, I was of course flattered, but suggested as an improvement that he use "conditional" correlation rankings, since the real diversification benefit of adding the nth asset depends on what is currently selected in the portfolio. In other words, you can't rank everything all at once; it needs to be done in the sequence in which you choose assets. For example, holding all other factors constant (i.e. momentum and volatility), if I first select a bond fund (owing to its low correlation to other assets), the next lowest-correlation or best diversifying asset would be a stock index rather than, say, another bond fund. This conditional correlation ranking approach avoids redundancy and leads to superior diversification and presumably better risk-adjusted performance than the original method. However, it is important to note that it still does not solve the thorny issue of how many assets to choose for the portfolio, i.e. selecting the "top n" by composite rank. Furthermore, the choice of "top n" to hold in the portfolio is compounded by the original selection of the asset universe. These are separate problems that can be solved with a more elegant framework, and Mr. Keller has several new articles out using variations on standard MPT in a dynamic format. The drawback to these approaches (and many other viable alternatives) is that they tend to have complex mathematical implementations that are not as simple and intuitive as the original FAA.
Getting back to the concept of conditional correlations, one should replace the correlation component in FAA with a dynamic version of the average asset correlation. This means that the average correlation is measured only between assets not already included in the portfolio and the assets currently in the portfolio. After selecting the first asset, you would rank all remaining assets in terms of their correlation to that first asset (lower is better). For example, if I select a bond fund first, I would then rank all remaining assets in the universe by their correlation to the bond fund. Once you have two or more assets, you would find the average correlation of each remaining asset to the current portfolio assets, and this average correlation becomes the ranking metric for the remaining assets (lower is better). So if I have a bond fund and a stock fund in the portfolio, I would take each remaining asset in the universe (not included in the portfolio) and average its correlation to the bond fund and to the stock fund. If, for example, I were doing this calculation for a gold fund, this would be the average of the correlation of gold to the stock fund and the correlation of gold to the bond fund. This is calculated for each remaining asset, at which point you can rank them accordingly. The process continues recursively until you reach the desired "top n" number of assets in the portfolio. So if you choose, say, 5 assets from a 15-asset universe, you would compute the regular average correlation method from the original FAA to select the first asset and then calculate this conditional correlation four separate times.
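Here is a small pandas sketch of the selection sequence described above. For simplicity it isolates the correlation component; in the full FAA composite, the conditional correlation rank would be blended with the momentum and volatility ranks at each step. The returns DataFrame and the top_n parameter are assumed inputs:

```python
import pandas as pd

def select_by_conditional_correlation(returns, top_n):
    """Greedy selection: pick the first asset by its average correlation to the
    whole universe, then rank each remaining asset by its average correlation
    to the assets already selected (lower is better) and pick the best."""
    corr = returns.corr()
    # first pick: lowest average correlation to all other assets (original FAA style)
    avg_to_all = (corr.sum() - 1.0) / (len(corr) - 1)
    selected = [avg_to_all.idxmin()]
    while len(selected) < top_n:
        candidates = [c for c in corr.columns if c not in selected]
        # conditional correlation: average correlation of each candidate to the
        # current portfolio holdings, recomputed after every selection
        cond_corr = corr.loc[candidates, selected].mean(axis=1)
        selected.append(cond_corr.idxmin())
    return selected
```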
I know this sounds confusing, so Ilya at QuantStrat TradeR plans to post a spreadsheet soon showing how this is calculated, along with R code for the full implementation. Presumably, this approach will be more robust when applied to a wider range of universes and will be less sensitive to the choice of "top n". This new version of FAA is still fairly simple; it just requires a few more calculations, and it is much less complex than implementing MPT-type optimization. There are several adjustments that can be made to this new version of FAA that would make it robust to the issues mentioned above. It's always a question of how deep one is willing to go down the rabbit hole: for hands-on practitioners, this modified version of FAA will probably be practical enough with some common sense calibration. For quants (and ultimately for the end investor), it is more beneficial to consider a more nuanced and sophisticated approach, and probably a different framework altogether. In reality, there are many dimensions to creating investor portfolios that require not only precise consideration of how to blend different elements (such as momentum, correlation, and volatility), but more importantly a method for dealing with constraints and matching investor risk preferences. Of course, this brings back the necessity for a framework that computes a set of portfolio weights. There are some very interesting solutions to this problem that I will present at some point in the future.
Risk Management is Back in Vogue

As human beings, we are by design our own worst enemies- especially when it comes to navigating the ups and downs of the market. The longer a person experiences prosperity and does not suffer any drawdowns, the less risky they perceive the market to be. The reverse is true after a person experiences large losses and severe drawdowns. After the equity market goes up significantly for years without a serious correction, most investors begin to forget about the importance of risk management. Typically the only risk they think of is keeping up with the returns of their peers- the fear of missing out on the gravy train. This causes investors to push to overweight stocks beyond both their required allocation for financial planning and also their inherent comfort level. Winners bet more than they should on stocks and losers bet less. No one wants to hear about tactical asset allocation when the market is going up- they would rather listen to stock tips from a “5-star” fund manager- or from their advisor. After a bear market, investors seek out the same tactical advisors they previously ignored, and tend to especially favor advisors with compelling bearish stories that resonate strongly with investor fear and skepticism. Of course they ignore “black-box” systematic strategies that are more likely to capture returns in the next bull market.
The bottom line is that the longer an investor experiences a market state, the more comfortable they become that it will last forever. The old saying about boiling a frog rings true: if you want to trick the frog, start out with cold water and slowly bring things to a boil, and it won't jump out until it's too late. The stock market demonstrates predictable long-term mean-reversion: the longer and higher something goes up, the more likely it is to go down in the future. The result is that investors on balance badly underperform virtually any benchmark imaginable, even accounting for their chosen asset mix.
After a large and sharp market pullback, suddenly investors become alert to the realities of market risk. If you read a lot of the media these days, risk is the only topic that everyone wants to talk about. Risk management is suddenly back in vogue and no one wants to hear about the “bargain stocks” or “good deals” that their beloved fund managers are talking up on TV to rally support for the cult of equities (long equities only buy and hold). But we should all take a step back and consider that risk management is not a fashion fad, but rather should be a core staple in an investor’s portfolio. Managing risk not only protects an investor’s portfolio against market risk, it also helps to protect them from the greatest risk of all: themselves.
Error-Adjusted Momentum
In the last post I introduced a method to normalize returns using the VIX to improve upon a standard momentum or trend-following strategy. There are many possible extensions of this idea, and I would encourage readers to look at one of the comments on the previous post, which may inspire some new ideas. The motivation for this method was to provide an alternative approach that is more broadly applicable to other assets than a VIX-based strategy (which is most appropriate for equities). This method uses the standard error of the mean, rather than the VIX, as the proxy for market noise used to adjust returns. The logic is that returns should be weighted more when predictability is high, and conversely weighted less when predictability is low. In this case, the error-adjusted moving average will hopefully be more robust to market noise than a standard moving average. To calculate the standard error, I use the 10-day average return to generate a forecast and then take the 10-day mean absolute error of that forecast. To normalize returns, I divide each return by this error estimate prior to taking a 200-day average of the re-scaled returns. The rules for the ER-MOM strategy are the same as in the last post (although poorly articulated there):
Go LONG when the Error-Adjusted Momentum is > 0, Go to CASH if the Error-Adjusted Momentum is < 0
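Below is a minimal pandas sketch of the calculation as described above; the exact alignment of the forecast and its error (here, the prior 10-day mean return as the forecast for the current day) is one of several reasonable interpretations:

```python
import pandas as pd

def error_adjusted_momentum_signal(close, error_window=10, momentum_window=200):
    """Scale each daily return by a rolling mean absolute forecast error, then
    average the re-scaled returns; positive average = long, otherwise cash."""
    ret = close.pct_change()
    forecast = ret.rolling(error_window).mean().shift(1)     # prior 10-day mean return
    error_est = (ret - forecast).abs().rolling(error_window).mean()
    adj_ret = ret / error_est                                # error-adjusted returns
    signal = adj_ret.rolling(momentum_window).mean()
    position = (signal > 0).shift(1, fill_value=False)       # trade on the next day's close
    return position.astype(int)                              # 1 = long, 0 = cash
```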
Here is how this strategy compares to both the VIX-adjusted strategy and the other two baseline strategies:
The error-adjusted momentum strategy has the best returns and risk-adjusted returns- edging out the previous method that used the VIX. In either case, both adjusted momentum strategies performed better than their standard counterparts. One concept to note is that the benefit or edge of the adjusted momentum strategies tends to be more significant at longer trend-following lookbacks. This makes sense because there are likely to be a wider range of variance regimes throughout a long stretch of time than over a shorter lookback. Adjusting for these different variance regimes gives a clearer picture of the long-term trend. Using the historical standard deviation is also a viable alternative to either using the standard error or the VIX, and there are a lot of other ways to measure variability/noise that can be used as well.
VIX-Adjusted Momentum
The addition of many small details can make a big difference in seemingly simple strategies. I often like to use cooking analogies, and tomato sauce is a classic example: it contains few ingredients and is simple to make, but difficult to master without understanding the interaction between components. Trend-following strategies are no different: anyone can create a simple strategy, but few can master the nuances. One of the problems in measuring trends in financial market data is that the variance is not constant. In statistics, we know that heteroscedasticity can render traditional regression analysis meaningless. Likewise, using un-adjusted price data with a moving average strategy, or even taking the simple compound return or ROC (rate of change), can lead to poor timing decisions and increase the frequency of trading.
The good news is that it is well accepted that volatility is highly predictable in financial markets. Perhaps one of the best measures is the implied volatility reflected by market participants in the VIX. A simple idea would be to use the VIX to adjust daily returns in order to create a trend-following strategy that is more robust to non-constant variance. The method is very simple:
1) compute daily returns or log returns for the S&P500 time series
2) divide each daily return by the VIX level on the same day
3) compute the simple average of the adjusted returns over a lookback of your choosing, say 200 days in this example
Strategy: Go LONG when the VIX-Adjusted Momentum is > 0, go to CASH when it is < 0. For comparison, we also test a 200-day simple moving average strategy (go long when the close is above the SMA, cash if not) and a 200-day traditional momentum strategy (go long when the ROC > 0, cash if not).
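The three steps above map almost directly into pandas. This minimal sketch assumes aligned daily close series for the S&P500 and the VIX as inputs:

```python
import pandas as pd

def vix_adjusted_momentum_signal(spx_close, vix_close, window=200):
    """Divide each daily S&P500 return by the same day's VIX level, average the
    adjusted returns over the lookback, and go long only when the result is > 0."""
    ret = spx_close.pct_change()                        # step 1: daily returns
    adj_ret = ret / vix_close                           # step 2: scale by the VIX level
    signal = adj_ret.rolling(window).mean()             # step 3: 200-day simple average
    position = (signal > 0).shift(1, fill_value=False)  # trade on the next day's close
    return position.astype(int)                         # 1 = long S&P500, 0 = cash
```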
Here is a graph comparing the strategies:
Clearly the VIX-adjusted momentum is superior to the traditional trend-following strategies for this set of parameters. This concept can be extended in several different ways: for example, one could instead use historical volatility, or use the difference between historical and implied volatility in other creative ways. Hopefully readers will be inspired to take a fresh look at improving upon a simple and traditional strategy.

In part 1 of the series we introduced a three-factor model that decomposes momentum profitability and showed how it can be translated into a momentum score for an asset universe. In this post we will show how momentum strategies can be profitable even under conditions where the market is efficient and time series performance is not predictable.
The momentum score we introduced in the last post was comprised of: 1) time series predictability (T) 2) dispersion in mean returns (D) and 3) the existence of lead/lag relationships (L). The score is computed by adding T and D and subtracting out the value of L. More formally, we would take the average auto-covariance across time series, the variance in cross-sectional mean returns and the average cross-serial auto-covariance between asset returns.
One of the core predictions of a truly efficient market is that asset prices should follow a random walk and hence should not be predictable using past prices (or any form of technical analysis). The next period's price in this context is the current price plus a random draw with some mean (drift) and an error term. Whether this theory is in fact true based upon the empirical evidence is not a subject that I will address in this article. Instead, what I personally found more interesting was to determine whether the presence of an efficient market would still permit a momentum strategy to be successful. The answer boils down to the formula that decomposes momentum profitability:
Momentum Score = T + D - L
In more formal terms, the equation breaks down as follows:
Momentum Profitability = average asset auto-covariance (T) + cross-sectional variance in asset means (D) - average asset cross-serial auto-covariance (L)
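As a concrete reference, here is a simple Python sketch of how the three terms could be estimated from a matrix of asset returns; the one-period lag and full-sample demeaning are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

def momentum_score(returns, lag=1):
    """Momentum score = T + D - L for a universe of assets.
    returns: DataFrame of periodic returns (rows = dates, columns = assets)."""
    n = returns.shape[1]
    demeaned = returns - returns.mean()
    r_t = demeaned.iloc[lag:].values          # returns at time t
    r_lag = demeaned.iloc[:-lag].values       # returns at time t - lag
    cov = r_t.T @ r_lag / len(r_t)            # cov[i, j] = Cov(r_i,t, r_j,t-lag)
    T = np.trace(cov) / n                                # average own auto-covariance
    L = (cov.sum() - np.trace(cov)) / (n * (n - 1))      # average cross-serial auto-covariance
    D = returns.mean().var()                             # cross-sectional variance of asset means
    return T + D - L
```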
Returning to the concept of a random walk, this would imply that both the auto-covariances and the cross-serial auto-covariances are equal to zero (or close to zero). In that case the formula reduces to:
Momentum Profitability = cross-sectional variance in asset means (D)
Thus, even in the case of a true random walk or an efficient market, we can expect profits from a momentum strategy as long as there is dispersion in the asset means; in other words, we require the asset means to be heterogeneous to some degree in order to capture momentum profits. Technically, another requirement is that the asset means are fairly stationary: realized returns can vary over time, but the underlying means should stay approximately the same. From a practical perspective, many risk premiums are fairly stable over long periods of time (e.g. the return to investing in the stock market). Hence the existence of variation in asset means alone can support momentum profits even if the market were efficient. This helps reconcile how Eugene Fama, the father of the Efficient Markets Hypothesis, can still call momentum the "premier anomaly" without technically being a hypocrite (even though it sounds that way to many industry practitioners).
In the last post, we showed that a broad multi-asset class universe had a higher momentum score, using the formula presented above, than a sector equity universe. This was primarily due to the fact that the dispersion in asset means is much higher in an asset class universe than in a sector universe. To add to this result, we would expect the mean returns for asset classes to be more stationary than the means for sectors or individual stocks, since they reflect broad risk premiums rather than idiosyncratic or specific risk. As markets become more efficient over time and all assets become less predictable, cross-sectional dispersion in the means (along with mean stationarity) becomes essential to preserving momentum profits. The implication for investors is that the safest way to profit from a momentum strategy is to employ tactical asset allocation on an asset class universe in order to achieve greater consistency in returns over time.
Momentum Score Matrices
In the previous post we introduced the momentum score as a measure of the potential for momentum profits for a given investment universe. Before proceeding to part 2 of the series, I thought it would be interesting for readers to see a pairwise matrix of momentum scores to get a better feel for how they work in practice. Note that higher scores indicate higher potential for momentum profits. Below are the pairwise momentum score matrices for both sectors and asset classes:
Notice that sectors with similar macro-factor exposure have lower scores: for example, materials and energy (XLB/XLE), which tend to thrive in cyclical upturns in the economy, or consumer staples and health care (XLP/XLV), which thrive in recessions or cyclical downturns. The highest scores accrue to sectors that are likely to do well at different times in the economic cycle, such as energy and utilities (XLE/XLU). This makes logical sense: momentum strategies require the ability to rotate into assets that are doing well at different times.
Notice that pairings of equity, real estate or commodity asset classes (e.g. SPY, EEM, IEV, DBC, RWX) with TLT and GLD tend to have the highest momentum scores. Combinations of near-substitute assets, such as intermediate bonds (IEF) and long-term bonds (TLT), or the S&P500 (SPY) and European stocks (IEV), have very low scores by comparison. In general, most pairwise scores for asset classes are substantially higher than those in the sector momentum score matrix.
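A pairwise matrix like the ones shown here can be assembled by applying a momentum score function (such as the sketch in the previous post) to every two-asset sub-universe; the returns DataFrame and the score function are assumed inputs:

```python
import pandas as pd
from itertools import combinations

def pairwise_momentum_scores(returns, score_fn):
    """Symmetric matrix of momentum scores for every pair of assets in a universe.
    score_fn: a function that maps a two-column returns DataFrame to a score."""
    tickers = returns.columns
    matrix = pd.DataFrame(index=tickers, columns=tickers, dtype=float)
    for a, b in combinations(tickers, 2):
        score = score_fn(returns[[a, b]])
        matrix.loc[a, b] = matrix.loc[b, a] = score
    return matrix
```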