
The Essence of Being A Quant

January 6, 2014

quant

During the holidays, a person gets a chance to reflect on more philosophical matters. In my case, one question that stood out was how to define the essence and importance of this profession to investment management. I began to realize that the term itself, and even the profession, is poorly defined and articulated even within the industry. The first time I was asked what a “quant” was, I simply explained that they were number-crunchers who created systems for trading money. The reality is not far off from this simplistic explanation. But having read and heard a lot of different explanations (and accusations!), the essence of being a quant is to translate any possible unit of usable information into financial forecasting, algorithm development, or rules-based systems for trading; let’s call this broad class simply “trading models“. This includes (but is definitely not limited to): 1) fundamental data 2) technical data 3) economic data 4) news 5) weather 6) anything else that might be considered useful or predictive. The analytical tools of the trade have become highly cross-disciplinary and come from a wide variety of fields such as (but not limited to): 1) math 2) statistics 3) physics 4) economics 5) linguistics 6) psychology 7) biology. A lot of the common methods used across fields fall into the now-burgeoning interdisciplinary field of “data mining and analysis.”

A quant is simply interested in creating a quantifiable method for developing trading models. If a concept is not quantifiable, it is usually because it is either 1) not clearly defined or 2) simply not testable. These principles are generally respected by all scientists regardless of discipline. There is truthfully no area that should not be of interest to a quant, only areas that are more or less fruitful or simply worth prioritizing. Since financial price data for many time series are of high quality and readily available with long histories, this is a natural area of priority. Why then do quants seem to frown on or pick on technical analysis, which makes use of the same data? The answer is that most of the original technical analysis literature and media falls into the two categories identified above as being difficult to quantify. Often the concepts and ideas are highly elaborate, with a wide range of extenuating circumstances where the same conclusion would hold a different meaning. This implies a highly complex decision tree approach (assuming all of the nuances can even be identified or articulated). The downside to believing in traditional technical analysis is twofold: 1) a lack of statistical robustness, and 2) the flawed assumption that markets are stationary. We can rely on gravity, but we cannot rely on any measurable financial phenomenon to always follow the same set of laws or rules, since markets operate in a dynamic ecosystem of different players; asset managers, central banks, and governments constantly try to influence or anticipate the actions of each other. While that may sound harsh, it does not mean that we should abandon using technical indicators or ideas. It simply means that indicators or ideas represent testable inputs into a trading model that can and should be monitored or modified as conditions change.

What about the die-hard fundamental analysis approach? Its practitioners are more similar to traditional quants (thanks, for example, to value investors who often create quantitative rules of thumb that are easy to test) and tend to use statistical analysis or some form of quantitative application in portfolio management regardless of their level of discretionary judgment. However, they are also guilty of some of the same flaws as technical analysts, because they often rely on concepts that are either not observable or not testable from a practical standpoint (and hence not quantifiable). For example, if a portfolio manager talks about the importance of meeting with management and assessing their true intentions for making corporate decisions, this is not really testable for a quant. Neither is the leap of foresight an investor has about whether a product that has never been sold will be popular. The downside to believing in a purely fundamentalist approach is that the relative value of the insights claimed to be important is very difficult to assess or measure. However important these individuals claim their qualitative or intuitive insights to be (and product foresight is arguably a real skill), those insights must be separated from the quantifiable style or factor exposure that is taken on, either intentionally or unintentionally, to determine some baseline of usefulness. For example, if a portfolio manager claims to buy large cap value stocks with high quality balance sheets but uses additional “judgment factors” to narrow down the list for the portfolio, their performance should be benchmarked against a stock screen or factor model that approximates their approach. This gives some insight into how much positive or negative value has been added by their judgment. In many cases this value tends to be negative, which calls into question the utility of paying a portfolio manager such exorbitant compensation.

In truth, the quant is as much a threat to the classic portfolio manager role as the machine is to human labor. A quant can manufacture investment approaches that are far cheaper, more disciplined, more scalable, and more reliable. The more advanced and creative the quant, the more information can be quantified and the more approaches that can be replicated. A quant’s best friend is therefore the computer programmer, who performs the very important task of creating the automated model. Unlike a machine, once a model is created it can and should be frequently monitored and improved. How this process is done distinguishes the professional from the amateur quant. A professional quant will make sensible and robust improvements that improve a model’s prospects for dealing with uncertainty, regardless of whether the model’s short-term performance is good or bad. The amateur quant will make cosmetic improvements to backtests of the model, primarily by tweaking parameters in hindsight, or will simply make adjustments based on short-term performance that would have caught the latest rally or avoided the latest decline. Here are a few other key differences between the pros and the amateurs when it comes to quant work: The professional starts with a simple and clear idea and then increases complexity gradually to suit the problem within an elegant framework; the amateur tries to incorporate everything at once in an awkward framework. The professional seeks to use logic and theory that is as general and durable as possible to guide model development or improvement; the amateur strives to create exceptional backtests and is too data-driven in model development and refinement decisions. The professional takes a long time to slowly but earnestly improve a model, fully aware that it can never be perfect; the amateur either wants to complete the model and proceed to trading immediately, or in the opposite extreme is so afraid of failure that they 1) always find something that could go wrong, 2) are overly skeptical and easily persuaded by peer hyperbole over fact, and 3) are addicted to finding new avenues to test. Finally, and perhaps most importantly for performance: the professional will not give in easily to outside influences (management, clients, etc.) and make adjustments for ill-advised reasons unless there is a long-term or clear business case for doing so (for example, if the typical trades that the model takes on are difficult to execute in practice); the amateur will buckle under any pressure or negative feedback and try to please everyone.

If the last paragraph sounds arrogant, I would be happy to admit to being guilty of one or more of these amateur mistakes earlier in my own career. But this is much like the path of development in any field of expertise. In truth, almost every “professional” quant learns these lessons, whether through peer instruction or the school of hard knocks. One of the benefits of experience and honest self-assessment is that you learn how to lean in the right direction; without being honest with yourself, you can never get better. To end on a sympathetically qualitative note, it is useful to think of a professional quant as also sharing the qualities of a martial artist: a quant must have solid control of their own mind and emotions as they relate to working with trading models in order to rise to the highest level. When practiced well, this is not a frenetic and purposeful state, but rather a focused, almost detached state where the decisions are more important than the outcomes. One really good idea in a relaxed moment is superior to a hundred hours of determined exploration. The benefit of such a state tends to be good and consistent outcomes with regard to model performance. Ironically, too much focus and energy invested in the outcome (performance) has the opposite effect. The psychology and emotional maturity of a quant can be as important as, or more important than, their inherent talent or knowledge in driving investment performance. Of course, this hypothesis is (and should be) subject to quantitative examination.

Free Web-Based Cluster Application Using FTCA

December 17, 2013

One of the most useful ways to understand clusters is to work with a visual application and see different cluster groupings over time. In a previous post, I introduced the Fast Threshold Clustering Algorithm (FTCA) as a simple and intuitive method of forming asset clusters. A fellow quant researcher and long-time reader, Pierre Chretien, was kind enough to build an FTCA Shiny web application in R in collaboration with Michael Kapler of Systematic Investor. It shows the use of clustering with FTCA on a broad range of asset classes that have been back-extended using index data. Users can experiment with different parameters: the time period at cluster formation (plot date), the correlation threshold for cluster formation, and the lookback period for the correlation calculation. Another useful output is the return and volatility of each asset at the time of cluster formation, as well as a transition map of the historical cluster memberships for each asset. Two images of the application are presented below. Very cool stuff and nice work! I personally enjoyed using it to see the impact of different parameters and also the historical clusters over different time periods.

The link to the application can be found here:

cluster web application

cluster application1

cluster application2

 

 

 

 

FTCA2

December 5, 2013

The strengths of FTCA are speed and simplicity. One of the weaknesses of FTCA, however, is that cluster membership is determined by a threshold against one asset only at each step (either HC or LC). Asset relationships can be complex, and there is no assurance that all members of a cluster have a correlation to every other member that is higher than the threshold. This can lead to fewer clusters and potentially incorrect cluster membership assignments. To improve upon these weaknesses, FTCA2 uses the same baseline method but evaluates a candidate asset against ALL current cluster members rather than just against HC or LC. In this case, the average correlation to the current cluster members is always calculated to determine whether the threshold is met. It also adds assets in order of their highest average correlation to the current cluster members.

The pseudo-code is presented below:

 

While there are assets that have not been assigned to a cluster

  • If only one asset remaining then
    • Add a new cluster
    • Only member is the remaining asset
  • Else
    • Find the asset with the Highest Average Correlation (HC) to all assets that have not yet been assigned to a Cluster
    • Find the asset with the Lowest Average Correlation (LC) to all assets that have not yet been assigned to a Cluster
    • If Correlation between HC and LC > Threshold
      • Add a new Cluster made of HC and LC
      • Try adding each of the remaining assets that have not yet been assigned to a Cluster, in order of highest correlation to the current cluster, if the asset's correlation to the current cluster is > the average correlation of the current cluster
    • Else
      • Add a Cluster made of HC
        • Try adding each of the remaining assets that have not yet been assigned to a Cluster, in order of highest correlation to the current cluster, if the asset's correlation to the current cluster is > the average correlation of the current cluster
      • Add a Cluster made of LC
        • Try adding each of the remaining assets that have not yet been assigned to a Cluster, in order of highest correlation to the current cluster, if the asset's correlation to the current cluster is > the average correlation of the current cluster
    • End if
  • End if

End While
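To make the steps concrete, here is a minimal R sketch of FTCA2 based on my reading of the pseudocode above. The function and helper names are illustrative, and the interpretation of “the average correlation of the current cluster” as the average pairwise correlation among current members (falling back to the threshold for a single-member seed) is an assumption rather than a definitive implementation:

```r
# Sketch of FTCA2: names and tie-handling details are assumptions, not the original code.
ftca2 <- function(cor_mat, threshold = 0.5) {
  unassigned <- colnames(cor_mat)
  clusters <- list()

  # Average pairwise correlation among current cluster members;
  # falls back to the threshold when the cluster has a single member (assumption).
  avg_intra_cor <- function(members) {
    if (length(members) < 2) return(threshold)
    sub <- cor_mat[members, members]
    mean(sub[upper.tri(sub)])
  }

  # Greedily add remaining assets in order of highest average correlation to the cluster,
  # as long as that average exceeds the cluster's own average correlation.
  grow_cluster <- function(seed, pool) {
    cluster <- seed
    while (length(pool) > 0) {
      avg_to_cluster <- rowMeans(cor_mat[pool, cluster, drop = FALSE])
      best <- which.max(avg_to_cluster)
      if (avg_to_cluster[best] > avg_intra_cor(cluster)) {
        cluster <- c(cluster, pool[best])
        pool <- pool[-best]
      } else break
    }
    cluster
  }

  while (length(unassigned) > 0) {
    if (length(unassigned) == 1) {
      clusters[[length(clusters) + 1]] <- unassigned   # last asset forms its own cluster
      unassigned <- character(0)
    } else {
      sub <- cor_mat[unassigned, unassigned]
      avg_cor <- rowMeans(sub)                         # self-correlation shifts all averages equally
      hc <- unassigned[which.max(avg_cor)]             # Highest average Correlation (HC)
      lc <- unassigned[which.min(avg_cor)]             # Lowest average Correlation (LC)
      if (cor_mat[hc, lc] > threshold) {
        seed <- unique(c(hc, lc))                      # unique() guards against an HC == LC tie
        clusters[[length(clusters) + 1]] <- grow_cluster(seed, setdiff(unassigned, seed))
      } else {
        pool <- setdiff(unassigned, c(hc, lc))
        hc_cluster <- grow_cluster(hc, pool)
        lc_cluster <- grow_cluster(lc, setdiff(pool, hc_cluster))
        clusters[[length(clusters) + 1]] <- hc_cluster
        clusters[[length(clusters) + 1]] <- lc_cluster
      }
      unassigned <- setdiff(unassigned, unlist(clusters))
    }
  }
  clusters
}
```

A hypothetical call such as ftca2(cor(returns), threshold = 0.5) would return a list of character vectors of asset names, one per cluster.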

Fast Threshold Clustering Algorithm (FTCA)

November 26, 2013

cluster image

Often it can be surprisingly difficult to improve upon a simple and time-tested recipe. During the summer of 2011, I worked with Corey Rittenhouse to develop algorithms for grouping asset classes. At the time, I did not have any familiarity with the “clustering” algorithms that are often used in data mining research. The first algorithm that was created resulted from a desire to simplify the complex problem of grouping assets into very few steps, and also to make it computationally simple. As it turns out, ignorance was bliss. The Fast Threshold Clustering Algorithm (FTCA) has many desirable properties that traditional clustering algorithms do not: 1) it produces fairly stable clusters 2) it is fast and deterministic 3) it is easy to understand. When Michael Kapler and I conducted cluster research for our Cluster Risk Parity portfolio allocation approach using modern clustering methods, one of the biggest issues we both saw was that the resulting clusters changed too frequently, creating excessive turnover. Furthermore, highly correlated datasets such as the Dow 30 had more clusters than logic or rationale would tend to dictate. This results from the fact that most clustering algorithms function like an optimization routine that seeks to maximize inter-cluster dissimilarity and intra-cluster similarity. This can mean that clusters will change because of very small changes in the correlation matrix that are more likely to be a function of noise. Threshold clustering, by comparison, uses a logical correlation threshold to proxy “similar” versus “dissimilar.” In FTCA, I initially used a correlation threshold of .5 (approximately the level of statistical significance) to separate similar from dissimilar assets. FTCA works similarly to the Minimum Correlation Algorithm in that it uses the average correlation of each asset to all other assets as a means of determining how closely or distantly related an asset is to the chosen universe of assets. A graphic of how FTCA creates clusters is presented below:

Fast Threshold Clustering

The pseudocode for FTCA is presented below:

While there are assets that have not been assigned to a cluster

  • If only one asset remaining then
    • Add a new cluster
    • Only member is the remaining asset
  • Else
    • Find the asset with the Highest Average Correlation (HC) to all assets that have not yet been assigned to a Cluster
    • Find the asset with the Lowest Average Correlation (LC) to all assets that have not yet been assigned to a Cluster
    • If Correlation between HC and LC > Threshold
      • Add a new Cluster made of HC and LC
      • Add to the Cluster all other assets that have not yet been assigned to a Cluster and have an Average Correlation to HC and LC > Threshold
    • Else
      • Add a Cluster made of HC
        • Add to the Cluster all other assets that have not yet been assigned to a Cluster and have a Correlation to HC > Threshold
      • Add a Cluster made of LC
        • Add to the Cluster all other assets that have not yet been assigned to a Cluster and have a Correlation to LC > Threshold
    • End if
  • End if

End While
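For readers who prefer code to pseudocode, below is a minimal R sketch of FTCA as I read the steps above. The function name and arguments are illustrative; this is not the original implementation:

```r
# Sketch of FTCA: a correlation matrix in, a list of clusters (vectors of asset names) out.
ftca <- function(cor_mat, threshold = 0.5) {
  unassigned <- colnames(cor_mat)
  clusters <- list()
  while (length(unassigned) > 0) {
    if (length(unassigned) == 1) {
      clusters[[length(clusters) + 1]] <- unassigned   # last asset forms its own cluster
      unassigned <- character(0)
    } else {
      sub <- cor_mat[unassigned, unassigned]
      avg_cor <- rowMeans(sub)                         # including self shifts all averages equally
      hc <- unassigned[which.max(avg_cor)]             # Highest average Correlation (HC)
      lc <- unassigned[which.min(avg_cor)]             # Lowest average Correlation (LC)
      if (cor_mat[hc, lc] > threshold) {
        # HC and LC are similar: form one cluster seeded by both
        seed <- unique(c(hc, lc))
        rest <- setdiff(unassigned, seed)
        avg_to_seed <- rowMeans(cor_mat[rest, seed, drop = FALSE])
        new_clusters <- list(c(seed, rest[avg_to_seed > threshold]))
      } else {
        # HC and LC are dissimilar: form two clusters, one around each
        rest <- setdiff(unassigned, c(hc, lc))
        hc_cluster <- c(hc, rest[cor_mat[rest, hc] > threshold])
        rest <- setdiff(rest, hc_cluster)
        lc_cluster <- c(lc, rest[cor_mat[rest, lc] > threshold])
        new_clusters <- list(hc_cluster, lc_cluster)
      }
      clusters <- c(clusters, new_clusters)
      unassigned <- setdiff(unassigned, unlist(new_clusters))
    }
  }
  clusters
}

# Example (hypothetical data): clusters from a 252-day correlation lookback
# clusters <- ftca(cor(tail(returns, 252)), threshold = 0.5)
```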

It is interesting to look at the historical clusters using FTCA with a threshold of .5 on previously used datasets such as the 8 major liquid ETFs (GLD, TLT, SPY, IWM, QQQ, EFA, EEM, IYR) and the 9 sector Spyders (S&P 500 sectors: XLP, XLV, XLU, XLK, XLI, XLE, XLB, XLY, XLF). The database was updated through May of 2013, and the charts below show the last 12 months of historical cluster allocations generated using a 252-day lookback for correlation with monthly rebalancing. First, the 8 major liquid ETFs:

ETF8 FTCA

Notice that the clusters are very logical and do not change even once within the 12-month period. The clusters essentially represent Gold, Bonds, and Equities (note that a 60-month period shows very little change as well). Now let’s take a look at the clusters generated on the famously noisy S&P 500 sectors:

FTCA Spyders

Again, the last 12 months of clustering show very little change in cluster membership. Most of the time the sectors are intuitively considered to be one cluster, while occasionally the utilities sector shows a lack of correlation to the rest of the group. The choice of threshold will change the number and stability of the clusters, with higher thresholds showing more clusters and greater changes in membership than lower thresholds. As much as I have learned about very sophisticated clustering methods in the last year, I am often drawn back to the simplicity and practicality of FTCA. From a portfolio management standpoint, it also makes using clusters far more practical for tactical asset allocation or for implementing cluster risk parity.

Shrinkage: A Simple Composite Model Performs the Best

October 31, 2013

shrinkage

In the last two posts we discussed using an adaptive shrinkage approach and also introduced the average correlation shrinkage model. The real question is: which shrinkage method, across a wide variety of different models, works best? In backtests across multiple universes, from stocks to asset classes and even futures, Michael Kapler of Systematic Investor calculated a composite score for each shrinkage model based upon the following criteria: portfolio turnover, Sharpe ratio, volatility, and diversification using the Composite Diversification Indicator (CDI). Lower turnover was preferred, a higher Sharpe ratio was obviously better, lower volatility was better, and a higher CDI (more diversification) was considered better. The backtests and code can be found here.

The models considered were as follows:

models

 

The best performing shrinkage model can be implemented by virtually anyone with a minimum of Excel skills: it is the simple average of the sample correlation matrix, the anchored correlation matrix (all history), and the average correlation shrinkage model. This produced the best blend of characteristics that would be desirable for money managers. The logic is simple: the anchored correlation matrix provides a long-term memory of inter-asset relationships, the sample provides a short-term/current memory, and the average correlation shrinkage assumes that the average correlation of an asset to all other assets provides a more stable short-term/current estimate than the sample. This is a good example of how simple implementations can trump sophisticated ones as long as the concepts are sound. As a generality, this is my preferred approach whenever possible because it is easier to implement in real life, easier to de-bug, and easier to understand and explain. Another interesting result from the rankings is that the ensemble approaches to shrinkage performed better, which again makes sense. The adaptive shrinkage model (best Sharpe) performed poorly by comparison, especially when turnover is considered as a factor. It is possible that using only a 252-day window, or using only the Sharpe ratio as an objective criterion, was suboptimal. Readers are encouraged to experiment with other approaches (we did investigate some methods that showed a lot of promise).
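For illustration, a minimal R sketch of this simple composite is shown below. The function names are my own, and I assume the ACS pairwise estimate is the average of the two assets’ average correlations to all other assets (excluding the diagonal); the backtest code linked above remains the authoritative version:

```r
# Average Correlation Shrinkage: each pair's correlation becomes the average of the
# two assets' average correlations to all other assets (diagonal excluded by assumption).
avg_cor_matrix <- function(cor_mat) {
  n <- ncol(cor_mat)
  avg <- (rowSums(cor_mat) - 1) / (n - 1)     # average correlation of each asset to the others
  acs <- outer(avg, avg, "+") / 2
  diag(acs) <- 1
  acs
}

# Simple composite: equal-weight average of the sample, anchored (all-history),
# and average-correlation estimates of the correlation matrix.
composite_correlation <- function(recent_returns, all_returns) {
  sample_cor   <- cor(recent_returns)          # short-term / current memory
  anchored_cor <- cor(all_returns)             # long-term memory of inter-asset relationships
  acs_cor      <- avg_cor_matrix(sample_cor)   # more stable short-term estimate
  (sample_cor + anchored_cor + acs_cor) / 3
}
```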

Finally, it is important to recognize that shrinkage is not a magic bullet, regardless of which approach is used. The results are better, but not worlds apart from using just the sample correlation. There is a practical limit to what can be achieved using a lower-variance estimate of the correlation matrix via shrinkage. More accurate predictors of correlations are required to achieve greater gains in performance.

Average Correlation Shrinkage (ACS)

October 27, 2013

In the last post on Adaptive Shrinkage, the Average Correlation Shrinkage (ACS) model was introduced and compared with other standard shrinkage models. A spreadsheet demonstrating how to implement ACS can be found here: average correlation shrinkage. This method is meant to be an alternative shrinkage model that can be blended with, or used in place of, standard models. One of the most popular models is the “Constant Correlation Model”, which assumes that there is a constant correlation between all assets. The strength of this model is that it is a very simple and well-structured estimator. The weakness is that it is too rigid, and its performance depends on the number of similarly correlated versus uncorrelated assets in the universe. Average Correlation Shrinkage proposes that a good estimator for an asset’s pairwise correlations is the average of all of its pairwise correlations to other assets. For any pair of assets, their new pairwise correlation is the average of their respective average correlations to all other assets. This makes intuitive sense, and the average is less sensitive to errors than a single correlation estimate. It is also less restrictive than assuming that all correlations are the same.

The graphic below depicts the sample versus the average correlation matrix:

average correlation shrinkage

 

As you can see, the average correlation matrix tends to pull down the correlations of the assets that have high correlations and pull up the correlations of the assets that have low correlations. One can blend the average correlation matrix with the sample correlation matrix in some proportion using the following formula:

adjusted correlation matrix = w * (average correlation matrix) + (1 - w) * (sample correlation matrix)

By using this shrinkage method to adjust the correlation inputs for optimization, the resulting weights are less extreme towards the assets with low correlations, and assets with high correlations have a better chance of being included in the final portfolio. Like all shrinkage methods, this is meant to be a sort of compromise between the structured estimate and the sample.
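As a rough illustration, the adjustment above can be written in a few lines of R. The function name is mine, and the spreadsheet linked earlier remains the reference implementation:

```r
# Blend the average correlation matrix with the sample correlation matrix.
# w = 1 uses only the average correlation matrix; w = 0 keeps the sample as-is.
acs_shrink <- function(sample_cor, w = 0.5) {
  n <- ncol(sample_cor)
  avg <- (rowSums(sample_cor) - 1) / (n - 1)   # each asset's average correlation to the others
  avg_cor <- outer(avg, avg, "+") / 2          # pairwise average of the two assets' averages
  diag(avg_cor) <- 1
  w * avg_cor + (1 - w) * sample_cor           # adjusted correlation matrix per the formula above
}
```

For the graphic below, the equivalent call would be something like acs_shrink(sample_cor, w = 0.5).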

Here is a graphic depicting the final adjusted correlation matrix using a shrinkage factor (w) of .5:

adjusted correlation matrix

 

 

Adaptive Shrinkage

October 24, 2013

The covariance matrix can be quite tricky to model accurately due to the curse of dimensionality. One approach to improving estimation is to use “shrinkage” to a well-structured estimator. The general idea is that a compromise between a logical/theoretical estimator and a sample estimator will yield better results than either method. It is analogous to saying that in the summer, the temperature in many states is likely to be 80 degrees and that you will blend your weather forecast estimate with this baseline number in some proportion to reduce forecast error.

Here are two good articles worth reading as a primer:

honey i shrunk the sample covariance matrix– Ledoit/Wolf

shrinkage-simpler is better  – Benninga

Michael Kapler of Systematic Investor and I tested a wide variety of different shrinkage approaches at the beginning of the year on numerous datasets. Michael is perhaps the most talented and experienced person I have ever worked with (both from a quant and from a programming standpoint), and it is always a pleasure to get his input. Two interesting ideas evolved from the research: 1) the Average Correlation Shrinkage Model (ACS): the average correlation of each asset to all other assets, as used in the Minimum Correlation Algorithm, makes a logical shrinkage model that produces very competitive performance and is simpler to implement than most other models (spreadsheet to follow in the next post); 2) Adaptive Shrinkage: this chooses the “best” model from a number of different shrinkage models based on the version that delivered the best historical Sharpe ratio for minimum variance allocation.

Adaptive Shrinkage makes a lot of sense, since the appropriate shrinkage estimator to use differs depending on the composition of the asset universe. For example, a universe with one bond and fifty stocks will perform better with a different shrinkage estimator than one with all stocks or with multiple diverse asset classes. In addition, there may be one model, for example a composite of different estimators, that consistently performs better than others. In our testing, we chose to define success as the out-of-sample Sharpe ratio attained by minimizing variance using a given estimator. This makes more sense than minimizing volatility, which is often used in the literature to evaluate different shrinkage approaches. A higher Sharpe ratio implies that you could achieve lower volatility than sample-based minimum variance (assuming its volatility happens to be higher) by holding some proportion of your portfolio in cash. However, the objective function for Adaptive Shrinkage could be anything that you would like to achieve: minimum turnover might also be an objective, or some combination of minimum turnover with maximum Sharpe ratio. Here are some of the different shrinkage estimators that we tested. Note that “Average David” refers to ACS/Average Correlation Shrinkage:

* S = sample covariance matrix (no shrinkage)
* A50 = 50% average.david + 50% sample
* S_SA_A = 1/3 * [average + sample + sample.anchored]

* A_S = average.david and sample using Ledoit and Wolf math
* D_S = diagonal and sample using Ledoit and Wolf math
* CC_S = constant correlation and sample using Ledoit and Wolf math
* SI_S = single index and sample using Ledoit and Wolf math
* SI2_S = two.parameter covariance matrix and sample using Ledoit and Wolf math
* AVG_S = average and sample using Ledoit and Wolf math, where average = 1/5 * (average.david + diagonal + constant correlation + single index + two.parameter covariance matrix)

* A = average.david
* D = diagonal
* CC = constant correlation
* SI = single index
* SI2 = two.parameter covariance matrix
* AVG = average = 1/5 * (average.david + diagonal + constant correlation + single index + two.parameter covariance matrix)

* Best Sharpe = Adaptive Shrinkage - invest all capital into the method (S or SA or A) that has the best Sharpe ratio over the last 252 days

We used a 60-day window to compute the variance/covariance matrix, and a 252-day lookback to find the shrinkage method with the best Sharpe ratio. The report/results can be found here: all60 comprehensive. Interestingly enough, the Adaptive Shrinkage/Best Sharpe model produced the highest Sharpe ratio on almost all datasets. This demonstrates a promising method to potentially improve upon a standard shrinkage approach, and also to remove the need to determine which model is best to use as the base estimator. Readers can draw their own conclusions from this extensive report. I would generalize by saying that most shrinkage estimators produce similar performance, and that combinations of estimators seem to perform better than single estimators. Shrinkage does not deliver substantial improvements in performance versus just using the sample estimator in these tests. Finally, the Average Correlation Shrinkage model is very competitive with, if not superior to, most estimators, and it delivers lower turnover as well. This is true of many of the different variants that use ACS.
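As a conceptual illustration only, the selection step might look something like the R sketch below. It assumes you have already produced, for each candidate estimator, the realized daily returns of its minimum-variance portfolio over the trailing 252 days; this is not the code used for the report:

```r
# Unconstrained minimum-variance weights for a given covariance estimate.
min_var_weights <- function(cov_mat) {
  w <- solve(cov_mat, rep(1, ncol(cov_mat)))
  w / sum(w)
}

# Annualized Sharpe ratio of a daily return series (risk-free rate ignored for simplicity).
trailing_sharpe <- function(daily_returns) {
  mean(daily_returns) / sd(daily_returns) * sqrt(252)
}

# Adaptive selection: invest all capital with the estimator whose minimum-variance
# portfolio had the best Sharpe ratio over the last 252 days.
# estimator_returns: named list of trailing 252-day daily return vectors, one per estimator.
select_best_estimator <- function(estimator_returns) {
  names(which.max(sapply(estimator_returns, trailing_sharpe)))
}
```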

Additional RSO Backtests

October 15, 2013

Another blogger worth following is Michael Guan of Systematic Edge. Michael writes frequently about different methods for asset allocation; some of his posts, for example on Principal Component Analysis (PCA), are very comprehensive and well worth reading. Recently he wrote a post showing some additional tests of Random Subspace Optimization (RSO) with maximum Sharpe optimization (long only) on various universes with a step parameter test for “k.” The results are interesting and support the following conclusions: 1) RSO is a promising method to increase return and reduce risk 2) the choice of “k” is important for getting the best results. I will present a few more ideas on extending RSO in posts to follow.

RSO MVO vs Standard MVO Backtest Comparison

October 10, 2013

In a previous post I introduced Random Subspace Optimization (RSO) as a method to reduce dimensionality and improve performance versus standard optimization methods. The concept is theoretically sound and is traditionally applied in machine learning to improve classification accuracy, so it makes sense that it would be useful for portfolio optimization. To test the method, I used a very naive/simplistic RSO model: select “k” assets (a subspace) from the universe, run classic mean-variance optimization (MVO) on that subspace, repeat for “s” samples, and average the portfolio weights found across all of the samples to produce a final portfolio. The MVO was run unconstrained (longs and shorts permitted) to reduce computation time, since there is a closed-form solution. Two datasets were used: the first is the 8-ETF universe used in previous studies for the Minimum Correlation and Minimum Variance Algorithms, and the second is the S&P 500 sector Spyder ETFs. Here are the parameters and the results:

rso comp

 

On these two universes, with this set of parameters, RSO mean-variance was a clear winner in terms of both returns and risk-adjusted returns, and the results are even more compelling when you factor in the lower average exposure that results from averaging across 100 portfolios. Turnover is also more stable, which can be expected because of the averaging process. Results were best in these two cases when k <= 3, but virtually all values of k outperformed the baseline. The choice of k is certainly a bit clunky (as in nearest neighbour analysis), and it needs to be either optimized or considered in relation to the number of assets in the universe. The averaging process across portfolios is also naive: it doesn’t care whether the objective function is high or low for a given portfolio. There are a lot of ways to improve upon this baseline RSO version. I haven’t done extensive testing at this point, but theory and preliminary results suggest a modest improvement over baseline MVO (and other types of optimization). RSO is not a magic bullet per se, but at the very least it appears better able to handle noisy datasets, where the matrix inversion used within typical unconstrained MVO can be unstable.
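To make the naive baseline concrete, here is a rough R sketch of the procedure described above. Function and argument names are illustrative, and the scaling of each subspace portfolio to unit gross exposure is a simplifying choice of mine rather than part of the original test:

```r
# Naive RSO with unconstrained mean-variance optimization on each random subspace.
rso_mvo <- function(returns, k = 3, s = 100) {
  assets <- colnames(returns)
  weights <- matrix(0, nrow = s, ncol = length(assets), dimnames = list(NULL, assets))
  for (i in 1:s) {
    subset <- sample(assets, k)               # draw a random k-asset subspace
    mu    <- colMeans(returns[, subset])
    sigma <- cov(returns[, subset])
    w <- solve(sigma, mu)                     # closed-form unconstrained MVO weights
    weights[i, subset] <- w / sum(abs(w))     # scale to unit gross exposure (assumption)
  }
  colMeans(weights)                           # average across the s subspace portfolios
}

# Example (hypothetical data): final_weights <- rso_mvo(as.matrix(daily_returns), k = 3, s = 100)
```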

Quantum RSO

October 8, 2013

quantum

In the last post on Random Subspace Optimization (RSO) I introduced a method to reduce dimensionality for optimization to improve the robustness of the results. One concept proposed in the previous article was to weight the different subspace portfolios in some manner rather than just equally weighting their resulting portfolio weights to find the final portfolio. Theoretically this should improve the resulting performance out of sample.

One logical idea is to compound the algorithm multiple times. This idea is driven by the notion that complex problems can be solved more accurately by breaking them down into smaller sub-problems. Quantum theory is the theoretical basis of modern physics that explains the nature and behavior of matter and energy at the atomic and subatomic level. Energy, radiation, and matter can be quantized, or divided up into increasingly smaller units, which helps to better explain their properties.

By continuing to synthesize and aggregate from smaller subsamples, it may be possible to do a better job of optimizing the universe of assets than optimizing globally only once with all assets present. There is no reason why RSO can’t borrow the same concept to optimally weight subspace portfolios. Imagine taking the subspace portfolios formed at the first level and then running the same optimization (with the same objective function) using RSO on those subspace portfolios. The analogy would be RSO(RSO), where the RSO portfolios become “assets” for a new RSO. This is like the concept of generations in genetic algorithms. In theory this could proceed multiple times, i.e. RSO(RSO(RSO)). Borrowing a concept from micro-GA, one could run a small number of samples across multiple levels of RSO and then start the process over again, instead of expending computational resources on one large sample.
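To make the idea more tangible, below is a conceptual R sketch of a single RSO(RSO) pass, where the first-level subspace portfolios become synthetic assets for a second pass and the second-level weights are mapped back to the underlying assets. All names, the unconstrained closed-form MVO step, and the gross-exposure scaling are illustrative assumptions rather than a tested implementation:

```r
# One level of RSO: returns a matrix with one subspace portfolio (weight vector) per row.
rso_weights <- function(returns, k, s) {
  assets <- colnames(returns)
  w_mat <- matrix(0, nrow = s, ncol = length(assets), dimnames = list(NULL, assets))
  for (i in 1:s) {
    subset <- sample(assets, k)
    w <- solve(cov(returns[, subset]), colMeans(returns[, subset]))  # closed-form MVO
    w_mat[i, subset] <- w / sum(abs(w))                              # unit gross exposure
  }
  w_mat
}

# RSO(RSO): run RSO, treat the resulting portfolios as synthetic assets, run RSO again,
# then collapse the second-level weights back onto the original assets.
# returns is assumed to be a numeric matrix of asset returns (one column per asset).
rso_of_rso <- function(returns, k = 3, s = 50) {
  level1 <- rso_weights(returns, k, s)
  synth_returns <- returns %*% t(level1)        # return streams of the level-1 portfolios
  colnames(synth_returns) <- paste0("P", 1:s)
  level2 <- rso_weights(synth_returns, k, s)
  drop(colMeans(level2) %*% level1)             # final weights on the original assets
}
```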

 

 
