<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Auquan on Medium]]></title>
        <description><![CDATA[Stories by Auquan on Medium]]></description>
        <link>https://medium.com/@auquan?source=rss-2a45747180c6------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*JwxKaa9RCr_xigKSnfRP1Q.png</url>
            <title>Stories by Auquan on Medium</title>
            <link>https://medium.com/@auquan?source=rss-2a45747180c6------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 11 Apr 2026 02:45:02 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@auquan/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Web Dev — Windows(Native/WSL), Mac-OSX or Linux?]]></title>
            <link>https://medium.com/auquan/web-dev-windows-native-wsl-mac-osx-or-linux-775e41aa87f6?source=rss-2a45747180c6------2</link>
            <guid isPermaLink="false">https://medium.com/p/775e41aa87f6</guid>
            <category><![CDATA[engineering]]></category>
            <category><![CDATA[linux]]></category>
            <category><![CDATA[wsl]]></category>
            <category><![CDATA[web-development]]></category>
            <category><![CDATA[mac]]></category>
            <dc:creator><![CDATA[Auquan]]></dc:creator>
            <pubDate>Mon, 13 May 2019 14:43:05 GMT</pubDate>
            <atom:updated>2019-05-13T14:43:05.620Z</atom:updated>
            <content:encoded><![CDATA[<h3>Web Dev — Windows(Native/WSL), Mac-OSX or Linux?</h3><p>Recently at <a href="https://auquan.com">Auquan</a>, I had to make a choice between the Windows, OSX and Linux operating systems. It took me a few weeks to try out all the operating systems and there were quite a few factors that went into the final decision. In this article, I will break down my experiences, the decision-making process and how we finally arrived at Windows(WSL) as the operating system of choice.</p><h3>Context</h3><p>Auquan is currently a team of 5 engineers and the number is expected to grow to 10+. We have a MERN(Mongo-Express-React-Node) stack and so far, as a seed stage startup, we were able to get away with everyone working on their personal laptops. However, it was becoming clear that we needed to get our engineers work machines so that the team could have a standard dev environment. Our current state was: 2 of us were on OSX, 1 on Windows(native) and 2 on Linux. So began the long debate over which OS we should go with.</p><h3>How To Decide?</h3><p>First things first, I decided to give each one a fair shot. I was actively working in OSX and I had a native Windows environment as well that I had used on and off over the past year. For a Linux environment I fired up an AWS machine, which served well for evaluation purposes. It was around this time that I remembered that <a href="https://docs.microsoft.com/en-us/windows/wsl/faq">WSL </a>exists and had enjoyed quite a bit of hype when it first arrived. For the uninitiated, WSL is the Windows Subsystem for Linux and it gives you bash support without having to install Linux on a separate partition. I added WSL to the mix of contenders and now we had 4 environments to evaluate. The following are the factors we were considering:</p><h4>Ease of Setup</h4><p>Before you can start using an environment you have to set it up, so it is important that the setup is easy and smooth. For each of the factors I will talk about my experience.</p><p><strong>Windows(Native):</strong> I did the setup through installers and it was fairly simple to get the environment up and running. I did not install a local version of MongoDB so I cannot speak to the simplicity of that.</p><p><strong>Mac-OSX/Linux:</strong> Absolutely simple once you have xcode/brew set up. Just run a bunch of commands in the terminal, clone your repository and you are good to go.</p><p><strong>WSL: </strong>Getting WSL set up is the added cost here, but you can pick up any guide and you will be set up in ~30 minutes, 25 of which will be download time, i.e. it’s super simple to set up WSL. Afterwards, installing node is just like on Linux, but setting up MongoDB required a bit more research as WSL is still very new. It took me far more time than any of the other OS’s to set this configuration up, but once I figured out the steps the complexity becomes the same as the Windows(native) setup. I will make another post with a full rundown of how to set up a web env on WSL.</p><h4>Ease of Use / Support Issues</h4><p>This factor has to do with how good the tooling around our environment is. Will we need to repeatedly deal with unsupported libraries or minor issues that make the development process harder?</p><p><strong>Windows(Native): </strong>The support framework leaves a lot to be desired. Most new things have to be installed separately and the cmd line just feels lacking. PowerShell has improved upon that but for our use case it did not seem to deliver. 
However, considering Office/Excel is also a part of our workflow, Windows had the clear edge there, and GUI-based exploration is just better.</p><p><strong>Mac-OSX: </strong>Great command line and tools around it. iTerm has to be the best terminal app and everything you need is a command away. The rest of the OS, however, did not feel as powerful as Windows for our use case of exploring big data sets using a GUI or general tooling.</p><p><strong>Linux: </strong>Once again, great command line and tooling. The GUI has improved over the years but compared to the other two it is still lagging behind.</p><blockquote><strong>WSL: </strong>This is where WSL shone: we could get the Linux command line while keeping the Windows GUI. What’s there to lose? Well, possibly quite a bit. WSL is new and does not have extensive support documentation, so you may find yourself on wild goose chases every once in a while. However, most of the issues I dealt with had solutions that were applicable to Linux and, more importantly, there always WAS a solution even if it was not officially supported.</blockquote><h4>Learning Curve</h4><p>If people have to switch OS’s it always comes with the added cost of learning the new OS/environment. It slows you down from getting to the bits that actually matter.<strong> </strong>I have yet to meet a person who has never used a Windows machine. Possibly the younger engineers have not, but people in their 20’s have all grown up with Windows and the OS is very familiar to everyone. So Windows was a clear winner here over OSX or Linux.</p><h4>Associated Hardware</h4><p>Macs are expensive; the quality of the hardware justifies some of that cost, but an entry-level 15-inch MacBook Pro will still cost you <a href="https://www.apple.com/shop/buy-mac/macbook-pro/15-inch-silver-2.2ghz-6-core-256gb#">~$2600 with taxes.</a> To top it off, the current-generation MacBook keyboard is just lackluster and the touch bar is one of the worst gimmicks I have ever experienced. You can easily obtain outperforming, great-looking hardware from HP/Dell in the sub-$1500 range. And if you want absolutely beautiful hardware just look at the HP Spectre or Dell XPS, but they will run you a little more.</p><h3>Final Conclusion</h3><p>After working with WSL for over a month I could not find any reason not to choose it as my environment of choice. Once I installed <a href="https://cmder.net/">cmder</a> I got close to having an iTerm-like experience and all my everyday commands in WSL just seemed to work. There were a couple of gotchas here and there, mostly to do with how the file system operates, but a quick Google search always gave me the answers I needed. My experience, coupled with the Linux command line, the Windows interface and the ~$1500 saving on every machine we bought, led us to choose Windows + WSL as our environment. Next, I will be doing a detailed post on how to set up a MERN environment on WSL but in the meantime let me know your thoughts.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=775e41aa87f6" width="1" height="1" alt=""><hr><p><a href="https://medium.com/auquan/web-dev-windows-native-wsl-mac-osx-or-linux-775e41aa87f6">Web Dev — Windows(Native/WSL), Mac-OSX or Linux?</a> was originally published in <a href="https://medium.com/auquan">auquan</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Long-Short Equity Strategy using Ranking: Simple Trading Strategies Part 4]]></title>
            <link>https://medium.com/auquan/long-short-equity-trading-strategy-daa41d00a036?source=rss-2a45747180c6------2</link>
            <guid isPermaLink="false">https://medium.com/p/daa41d00a036</guid>
            <category><![CDATA[finance]]></category>
            <category><![CDATA[trading]]></category>
            <category><![CDATA[investing]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[trading-strategy]]></category>
            <dc:creator><![CDATA[Auquan]]></dc:creator>
            <pubDate>Thu, 11 Jan 2018 06:51:01 GMT</pubDate>
            <atom:updated>2018-01-11T06:51:01.454Z</atom:updated>
            <content:encoded><![CDATA[<p>In the <a href="https://medium.com/auquan/pairs-trading-data-science-7dbedafcfe5a">last post</a>, we covered the pairs trading strategy and demonstrated how to leverage data and mathematical analysis to create and automate a trading strategy.</p><p>The Long-Short Equity Strategy is a natural extension of Pairs Trading applied to a basket of stocks.</p><p>Download <a href="https://github.com/Auquan/Tutorials/blob/master/Long-Short%20Strategies%20using%20Ranking.ipynb">Ipython Notebook here</a>.</p><h3>Underlying Principle</h3><p>A long-short equity strategy is simultaneously long and short stocks in the market. Just like pairs trading identifies which stock is cheap and which is expensive in a pair, a long-short strategy will rank all stocks in a basket to identify which stocks are relatively cheap and expensive. It will then go long (buy) the top <em>n </em>equities based on the ranking, and short (sell) the bottom <em>n</em> for equal amounts of money (total value of long positions = total value of short positions).</p><p>Remember how we said that Pairs Trading is a market neutral strategy? So is a long-short strategy, as the equal dollar volume of long and short positions ensures that the strategy will remain market neutral (immune to market movements). The strategy is also statistically robust — by ranking stocks and entering multiple positions, you are making many bets on your ranking model rather than just a few risky bets. You are also betting purely on the quality of your ranking scheme.</p><h3>What is a Ranking Scheme?</h3><p>A ranking scheme is any model that can assign each stock a number based on how it is expected to perform, where a higher number indicates better expected performance. Examples could be value factors, technical indicators, pricing models, or a combination of all of the above. For example, you could use a momentum indicator to give a ranking to a basket of trend following stocks: stocks with the highest momentum are expected to continue to do well and get the highest ranks; stocks with the lowest momentum are expected to perform the worst and get the lowest ranks.</p><p>The success of this strategy lies almost entirely in the ranking scheme used — the better your ranking scheme can separate high performing stocks from low performing stocks, the better the returns of a long-short equity strategy. It automatically follows that developing a ranking scheme is nontrivial.</p><h3>What happens once you have a Ranking Scheme?</h3><p>Once we have determined a ranking scheme, we would obviously like to be able to profit from it. We do this by investing an equal amount of money into buying stocks at the top of the ranking, and selling stocks at the bottom. This ensures that the strategy will make money proportionally to the quality of the ranking only, and will be market neutral.</p><p>Let’s say you are ranking <em>m </em>equities, have <em>n </em>dollars to invest, and want to hold a total of <em>2p</em> positions (where <em>m &gt; 2p</em>). 
If the stock at rank 1 is expected to perform the worst and stock at rank <em>m</em> is expected to perform the best:</p><ul><li>You take the stocks in position <em>1,…,p</em> in the ranking, sell <em>n/2p</em> dollars worth of each stock</li><li>For each stock in position <em>m−p,…,m</em> in the ranking, buy <em>n/2p</em> dollars worth of each stock</li></ul><p><strong>Note: Friction Because of Prices</strong> Because stock prices will not always divide <em>n/2p</em> evenly, and stocks must be bought in integer amounts, there will be some imprecision and the algorithm should get as close as it can to this number. For a strategy running with n=100000 and p=500, we see that</p><p><em>n/2p</em>=100000/1000 =100</p><p>This will cause big problems for stocks with prices &gt; 100 since you can’t buy fractional stock. This is alleviated by trading fewer equities or increasing the capital.</p><h3>Let’s run through a hypothetical example</h3><p>We generate random stock names and a random factor on which to rank them. Let’s also assume our future returns are actually dependent on these factor values.</p><pre>import numpy as np<br>import statsmodels.api as sm<br>import scipy.stats as stats<br>import scipy<br>import matplotlib.pyplot as plt<br>import seaborn as sns<br>import pandas as pd</pre><pre>## PROBLEM SETUP ##</pre><pre># Generate stocks and a random factor value for them</pre><pre>stock_names = [&#39;stock &#39; + str(x) for x in range(10000)]<br>current_factor_values = np.random.normal(0, 1, 10000)</pre><pre># Generate future returns for these are dependent on our factor values<br>future_returns = current_factor_values + np.random.normal(0, 1, 10000)</pre><pre># Put both the factor values and returns into one dataframe<br>data = pd.DataFrame(index = stock_names, columns=[&#39;Factor Value&#39;,&#39;Returns&#39;])<br>data[&#39;Factor Value&#39;] = current_factor_values<br>data[&#39;Returns&#39;] = future_returns<br># Take a look<br>data.head(10)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/426/1*T5yw6o2tYc9nXLCVRE5aww.png" /></figure><p>Now that we have factor values and returns, we can see what would happen if we ranked our equities based on factor values, and then entered the long and short positions.</p><pre># Rank stocks<br>ranked_data = data.sort_values(&#39;Factor Value&#39;)</pre><pre># Compute the returns of each basket with a basket size 500, so total (10000/500) baskets<br>number_of_baskets = int(10000/500)<br>basket_returns = np.zeros(number_of_baskets)</pre><pre>for i in range(number_of_baskets):<br>    start = i * 500<br>    end = i * 500 + 500 <br>    basket_returns[i] = ranked_data[start:end][&#39;Returns&#39;].mean()</pre><pre># Plot the returns of each basket<br>plt.figure(figsize=(15,7))<br>plt.bar(range(number_of_baskets), basket_returns)<br>plt.ylabel(&#39;Returns&#39;)<br>plt.xlabel(&#39;Basket&#39;)<br>plt.legend([&#39;Returns of Each Basket&#39;])<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/1*dwRzD6iLbdLdhWVuRUOqTA.png" /></figure><p>Our strategy is to sell the basket at rank 1 and buy the basket at rank 10. 
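As a quick aside, the friction note above can be made concrete with a minimal sketch (my own illustration, with made-up prices) of how the <em>n/2p</em> dollar target gets rounded down to whole shares:</p><pre># Hypothetical illustration of the friction note above; the prices are made up.<br># With n = 100000 and p = 500, each position gets n/2p = 100 dollars,<br># so any stock priced above 100 rounds down to zero shares.<br>dollars_per_position = 100000 / (2 * 500)         # = 100<br>prices = np.array([31.5, 48.7, 122.0, 260.0])     # made-up prices<br>shares = np.floor(dollars_per_position / prices)  # -&gt; [3, 2, 0, 0]<br>unused_cash = dollars_per_position - shares * prices</pre><p>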
The returns of this strategy are:</p><pre>basket_returns[number_of_baskets-1] - basket_returns[0]</pre><blockquote>4.172</blockquote><blockquote>We’re basically putting our money on our ranking model being able to separate high performing stocks from low performing stocks.</blockquote><p>For the rest of this post, we’ll talk about how to evaluate a ranking scheme. The nice thing about making money based on the spread of the ranking is that it is unaffected by what the market does.</p><h3>Let’s consider a real world example.</h3><p>We load data for 32 stocks from different sectors in the S&amp;P 500 and try to rank them.</p><pre>from backtester.dataSource.yahoo_data_source import YahooStockDataSource<br>from datetime import datetime</pre><pre>startDateStr = &#39;2010/01/01&#39;<br>endDateStr = &#39;2017/12/31&#39;<br>cachedFolderName = &#39;/Users/chandinijain/Auquan/yahooData/&#39;<br>dataSetId = &#39;testLongShortTrading&#39;<br>instrumentIds = [&#39;ABT&#39;,&#39;AKS&#39;,&#39;AMGN&#39;,&#39;AMD&#39;,&#39;AXP&#39;,&#39;BK&#39;,&#39;BSX&#39;,<br>                &#39;CMCSA&#39;,&#39;CVS&#39;,&#39;DIS&#39;,&#39;EA&#39;,&#39;EOG&#39;,&#39;GLW&#39;,&#39;HAL&#39;,<br>                &#39;HD&#39;,&#39;LOW&#39;,&#39;KO&#39;,&#39;LLY&#39;,&#39;MCD&#39;,&#39;MET&#39;,&#39;NEM&#39;,<br>                &#39;PEP&#39;,&#39;PG&#39;,&#39;M&#39;,&#39;SWN&#39;,&#39;T&#39;,&#39;TGT&#39;,<br>                &#39;TWX&#39;,&#39;TXN&#39;,&#39;USB&#39;,&#39;VZ&#39;,&#39;WFC&#39;]<br>ds = YahooStockDataSource(cachedFolderName=cachedFolderName,<br>                            dataSetId=dataSetId,<br>                            instrumentIds=instrumentIds,<br>                            startDateStr=startDateStr,<br>                            endDateStr=endDateStr,<br>                            event=&#39;history&#39;)</pre><pre>price = &#39;adjClose&#39;</pre><p>Let’s start by using one month normalized momentum as a ranking indicator.</p><pre>## Define normalized momentum<br>def momentum(dataDf, period):<br>    return dataDf.sub(dataDf.shift(period), fill_value=0) / dataDf.iloc[-1]</pre><pre>## Load relevant prices in a dataframe<br>data = ds.getBookDataByFeature()[&#39;Adj Close&#39;]</pre><pre># Let&#39;s load momentum scores and returns into separate dataframes<br>index = data.index<br>mscores = pd.DataFrame(index=index,columns=instrumentIds)<br>mscores = momentum(data, 30)<br>returns = pd.DataFrame(index=index,columns=instrumentIds)<br>day = 30</pre><p>Now we’re going to analyze our stock behavior and see how our universe of stocks behaves with respect to our chosen ranking factor.</p><h3>Analyzing data</h3><h4>Stock behavior</h4><p>We look at how our chosen basket of stocks behaves with respect to our ranking model. To do this, let’s calculate the one week forward return for all stocks. Then we can look at the correlation of the 1 week forward return with the previous 30 day momentum for every stock. 
Stocks that exhibit positive correlation are trend following and stocks that exhibit negative correlation are mean reverting.</p><pre># Calculate Forward returns<br>forward_return_day = 5<br>returns = data.shift(-forward_return_day)/data -1<br>returns.dropna(inplace = True)</pre><pre># Calculate correlations between momentum and returns<br>correlations = pd.DataFrame(index = returns.columns, columns = [&#39;Scores&#39;, &#39;pvalues&#39;])</pre><pre>mscores = mscores[mscores.index.isin(returns.index)]</pre><pre>for i in correlations.index:<br>    score, pvalue = stats.spearmanr(mscores[i], returns[i])<br>    correlations[&#39;pvalues&#39;].loc[i] = pvalue<br>    correlations[&#39;Scores&#39;].loc[i] = score<br>correlations.dropna(inplace = True)<br>correlations.sort_values(&#39;Scores&#39;, inplace=True)</pre><pre>l = correlations.index.size<br>plt.figure(figsize=(15,7))<br>plt.bar(range(1,1+l),correlations[&#39;Scores&#39;])<br>plt.xlabel(&#39;Stocks&#39;)<br>plt.xlim((1, l+1))<br>plt.xticks(range(1,1+l), correlations.index)<br>plt.legend([&#39;Correlation over All Data&#39;])<br>plt.ylabel(&#39;Correlation between %s day Momentum Scores and %s-day forward returns by Stock&#39;%(day,forward_return_day));<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/906/1*0bmuMMI8EVDGd6AknoPTAg.png" /></figure><p>All our stocks are mean reverting to some degree! (Obviously we chose the universe to be this way :) ) This tells us that if a stock ranks high on momentum score, we should expect it to perform poorly next week.</p><h4>Correlation between Ranking due to Momentum Score and Returns</h4><p>Next, we need to look at the correlation between our ranking score and the forward returns of our universe, i.e. how predictive of forward returns is our ranking factor? Does a high relative rank predict poor relative returns or vice versa?</p><p>To do this, we calculate the daily correlation between 30 day momentum and 1 week forward returns of all stocks.</p><pre>correl_scores = pd.DataFrame(index = returns.index.intersection(mscores.index), columns = [&#39;Scores&#39;, &#39;pvalues&#39;])</pre><pre>for i in correl_scores.index:<br>    score, pvalue = stats.spearmanr(mscores.loc[i], returns.loc[i])<br>    correl_scores[&#39;pvalues&#39;].loc[i] = pvalue<br>    correl_scores[&#39;Scores&#39;].loc[i] = score<br>correl_scores.dropna(inplace = True)</pre><pre>l = correl_scores.index.size<br>plt.figure(figsize=(15,7))<br>plt.bar(range(1,1+l),correl_scores[&#39;Scores&#39;])<br>plt.hlines(np.mean(correl_scores[&#39;Scores&#39;]), 1,l+1, colors=&#39;r&#39;, linestyles=&#39;dashed&#39;)<br>plt.xlabel(&#39;Day&#39;)<br>plt.xlim((1, l+1))<br>plt.legend([&#39;Mean Correlation over All Data&#39;, &#39;Daily Rank Correlation&#39;])<br>plt.ylabel(&#39;Rank correlation between %s day Momentum Scores and %s-day forward returns&#39;%(day,forward_return_day));<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/1*FCwBrTGO6uQlxY1HvI9HhQ.png" /></figure><p>The daily correlation is quite noisy, but very slightly negative (this is expected, since we said all the stocks are mean reverting). 
Let’s also look at the average monthly correlation of scores with 1 month forward returns.</p><pre>monthly_mean_correl = correl_scores[&#39;Scores&#39;].astype(float).resample(&#39;M&#39;).mean()<br>plt.figure(figsize=(15,7))<br>plt.bar(range(1,len(monthly_mean_correl)+1), monthly_mean_correl)<br>plt.hlines(np.mean(monthly_mean_correl), 1,len(monthly_mean_correl)+1, colors=&#39;r&#39;, linestyles=&#39;dashed&#39;)<br>plt.xlabel(&#39;Month&#39;)<br>plt.xlim((1, len(monthly_mean_correl)+1))<br>plt.legend([&#39;Mean Correlation over All Data&#39;, &#39;Monthly Rank Correlation&#39;])<br>plt.ylabel(&#39;Rank correlation between %s day Momentum Scores and %s-day forward returns&#39;%(day,forward_return_day));<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/1*XUcxpNV1lKX06HkdYbd4QA.png" /></figure><p>We can see that the average correlation is slightly negative again, but it also varies a lot from month to month.</p><h4>Average Basket Return</h4><p>Now we compute the returns of baskets taken out of our ranking. If we rank all equities and then split them into n groups, what would the mean return of each group be?</p><p>The first step is to create a function that will give us the mean return in each basket for a given date and ranking factor.</p><pre>def compute_basket_returns(factor, forward_returns, number_of_baskets, index):</pre><pre>    data = pd.concat([factor.loc[index],forward_returns.loc[index]], axis=1)<br>    # Rank the equities on the factor values<br>    data.columns = [&#39;Factor Value&#39;, &#39;Forward Returns&#39;]<br>    data.sort_values(&#39;Factor Value&#39;, inplace=True)<br>    # How many equities per basket<br>    equities_per_basket = np.floor(len(data.index) / number_of_baskets)</pre><pre>    basket_returns = np.zeros(number_of_baskets)</pre><pre>    # Compute the returns of each basket<br>    for i in range(number_of_baskets):<br>        start = i * equities_per_basket<br>        if i == number_of_baskets - 1:<br>            # Handle having a few extra in the last basket when our number of equities doesn&#39;t divide well<br>            end = len(data.index) - 1<br>        else:<br>            end = i * equities_per_basket + equities_per_basket<br>        # Actually compute the mean returns for each basket<br>        #s = data.index.iloc[start]<br>        #e = data.index.iloc[end]<br>        basket_returns[i] = data.iloc[int(start):int(end)][&#39;Forward Returns&#39;].mean()<br>        <br>    return basket_returns</pre><p>We calculate the average return of each basket when equities are ranked based on this score. 
This should give us a sense of the relationship over a long timeframe.</p><pre>number_of_baskets = 8<br>mean_basket_returns = np.zeros(number_of_baskets)<br>resampled_scores = mscores.astype(float).resample(&#39;2D&#39;).last()<br>resampled_prices = data.astype(float).resample(&#39;2D&#39;).last()<br>resampled_scores.dropna(inplace=True)<br>resampled_prices.dropna(inplace=True)<br>forward_returns = resampled_prices.shift(-1)/resampled_prices -1<br>forward_returns.dropna(inplace = True)</pre><pre>for m in forward_returns.index.intersection(resampled_scores.index):<br>    basket_returns = compute_basket_returns(resampled_scores, forward_returns, number_of_baskets, m)<br>    mean_basket_returns += basket_returns</pre><pre>mean_basket_returns /= l    <br>print(mean_basket_returns)<br># Plot the returns of each basket<br>plt.figure(figsize=(15,7))<br>plt.bar(range(number_of_baskets), mean_basket_returns)<br>plt.ylabel(&#39;Returns&#39;)<br>plt.xlabel(&#39;Basket&#39;)<br>plt.legend([&#39;Returns of Each Basket&#39;])<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/910/1*sRujerR1aWBP0I81Yp64fg.png" /></figure><p>Seems like we are able to separate high performers from low performers with very small success.</p><h4>Spread Consistency</h4><p>Of course, that’s just the average relationship. To get a sense of how consistent this is, and whether or not we would want to trade on it, we should look at it over time. Here we’ll look at the monthly spreads for the first two years. We can see a lot of variation, and further analysis should be done to determine whether this momentum score is tradeable.</p><pre>total_months = mscores.resample(&#39;M&#39;).last().index<br>months_to_plot = 24<br>monthly_index = total_months[:months_to_plot+1]<br>mean_basket_returns = np.zeros(number_of_baskets)<br>strategy_returns = pd.Series(index = monthly_index)<br>f, axarr = plt.subplots(1+int(monthly_index.size/6), 6,figsize=(18, 15))<br>for month in range(1, monthly_index.size):<br>    temp_returns = forward_returns.loc[monthly_index[month-1]:monthly_index[month]]<br>    temp_scores = resampled_scores.loc[monthly_index[month-1]:monthly_index[month]]<br>    for m in temp_returns.index.intersection(temp_scores.index):<br>        basket_returns = compute_basket_returns(temp_scores, temp_returns, number_of_baskets, m)<br>        mean_basket_returns += basket_returns<br>    <br>    strategy_returns[monthly_index[month-1]] = mean_basket_returns[ number_of_baskets-1] - mean_basket_returns[0]<br>    <br>    mean_basket_returns /= temp_returns.index.intersection(temp_scores.index).size<br>    <br>    r = int(np.floor((month-1) / 6))<br>    c = (month-1) % 6<br>    axarr[r, c].bar(range(number_of_baskets), mean_basket_returns)<br>    axarr[r, c].xaxis.set_visible(False)<br>    axarr[r, c].set_title(&#39;Month &#39; + str(month))<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Tj9GZtAlEJowkRqvsKo_-A.png" /></figure><pre>plt.figure(figsize=(15,7))<br>plt.plot(strategy_returns)<br>plt.ylabel(‘Returns’)<br>plt.xlabel(‘Month’)<br>plt.plot(strategy_returns.cumsum())<br>plt.legend([‘Monthly Strategy Returns’,’Cumulative Strategy Returns’])<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/900/1*da30unH20rHE52uL1K1cIQ.png" /></figure><p>Finally, lets look at the returns if we had bought the last basket and sold the first basket every month (assuming equal capital allocation to each security)</p><pre>total_return = 
strategy_returns.sum()<br>ann_return = 100*((1 + total_return)**(12.0 /float(strategy_returns.index.size))-1)<br>print(&#39;Annual Returns: %.2f%%&#39;%ann_return)</pre><blockquote>Annual Returns: 5.03%</blockquote><p>We see that we have a very faint ranking scheme that only mildly separates high performing stocks from low performing stocks. Besides, this ranking scheme has no consistency and varies a lot month to month.</p><h3>Finding the correct ranking scheme</h3><p>To execute a long-short equity, you effectively only have to determine the ranking scheme. Everything after that is mechanical. Once you have one long-short equity strategy, you can swap in different ranking schemes and leave everything else in place. It’s a very convenient way to quickly iterate over ideas you have without having to worry about tweaking code every time.</p><p>The ranking schemes can come from pretty much any model as well. It doesn’t have to be a value based factor model, it could be a machine learning technique that predicted returns one-month ahead and ranked based on that.</p><h4>Choice and Evaluation of a Ranking Scheme</h4><blockquote><strong>The ranking scheme is where a long-short equity strategy gets its edge, and is the most crucial component.</strong> Choosing a good ranking scheme is the entire trick, and there is no easy answer.</blockquote><p>A good starting point is to pick existing known techniques, and see if you can modify them slightly to get increased returns. We’ll discuss a few starting points here:</p><ul><li><strong>Clone and Tweak: </strong>Choose one that is commonly discussed and see if you can modify it slightly to gain back an edge. Often times factors that are public will have no signal left as they have been completely arbitraged out of the market. However, sometimes they lead you in the right direction of where to go.</li><li><strong>Pricing Models: </strong>Any model that predicts future returns can be a factor. The future return predicted is now that factor, and can be used to rank your universe. You can take any complicated pricing model and transform it into a ranking.</li><li><strong>Price Based Factors (Technical Indicators): </strong>Price based factors, like we discussed today, take information about the historical price of each equity and use it to generate the factor value. Examples could be moving average measures, momentum ribbons, or volatility measures.</li><li><strong>Reversion vs. Momentum: </strong>It’s important to note that some factors bet that prices, once moving in a direction, will continue to do so. Some factors bet the opposite. Both are valid models on different time horizons and assets, and it’s important to investigate whether the underlying behavior is momentum or reversion based.</li><li><strong>Fundamental Factors (Value Based): </strong>This is using combinations of fundamental values like P.E ratio, dividend etc. Fundamental values contain information that is tied to real world facts about a company, so in many ways can be more robust than prices.</li></ul><p>Ultimately, developing predictive factors is an arms race in which you are trying to stay one step ahead. 
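To make the idea of swapping in a new ranking scheme concrete, here is a rough sketch (my own illustration, not from the original notebook) that plugs a hypothetical mean-reversion score into the same basket machinery we built above, reusing compute_basket_returns, forward_returns and number_of_baskets:</p><pre># Hypothetical illustration only: swap the momentum score for a simple<br># mean-reversion score and reuse the same basket machinery defined above.<br>def mean_reversion_score(priceDf, period=60):<br>    rolling_mean = priceDf.rolling(window=period).mean()<br>    rolling_std = priceDf.rolling(window=period).std()<br>    # stocks trading below their own recent mean rank highest<br>    return -(priceDf - rolling_mean) / rolling_std</pre><pre>alt_scores = mean_reversion_score(data).astype(float).resample(&#39;2D&#39;).last().dropna()<br>alt_basket_returns = np.zeros(number_of_baskets)<br>common_dates = forward_returns.index.intersection(alt_scores.index)<br>for m in common_dates:<br>    alt_basket_returns += compute_basket_returns(alt_scores, forward_returns,<br>                                                 number_of_baskets, m)<br>alt_basket_returns /= common_dates.size  # average return of each basket under the new ranking</pre><p>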
Factors get arbitraged out of markets and have a lifespan, so it’s important that you are constantly doing work to determine how much decay your factors are experiencing, and what new factors might be used to take their place.</p><h3>Additional Considerations</h3><ul><li><strong>Rebalancing Frequency</strong></li></ul><p>Every ranking system will be predictive of returns over a slightly different timeframe. A price-based mean reversion may be predictive over a few days, while a value-based factor model may be predictive over many months. It is important to determine the timeframe over which your model should be predictive, and statistically verify that before executing your strategy. You don’t want to overfit by trying to optimize the rebalancing frequency — you will inevitably find one that is randomly better than others, but not necessary because of anything in your model. Once you have determined the timeframe on which your ranking scheme is predictive, try to rebalance at about that frequency so you’re taking full advantage of your models.</p><ul><li><strong>Capital Capacity and Transaction Costs</strong></li></ul><p>Every strategy has a minimum and maximum amount of capital it can trade before it stops being profitable. The minimum threshold is usually set by transaction costs.</p><p>Trading many equities will result in high transaction costs. Say that you want to purchase <em>1000</em> equities, you will incur a few thousand dollars in costs per rebalance. Your capital base must be high enough that the transaction costs are a small percentage of the returns being generated by your strategy. For example, if your capital is 100,000$ and your strategy makes 1% per month(1000$) , then all of these returns will be taken up by transaction costs.. You would need to be running the strategy on millions of dollars for it to be profitable over 1000 equities.</p><p>The minimum capacity is quite high as such, and dependent largely on the number of equities traded. However, the maximum capacity is also incredibly high, with long-short equity strategies capable of trading hundreds of millions of dollars without losing their edge. This is true because the strategy rebalances relatively infrequently, and the total dollar volume is divided by the number of equities traded. Therefore dollar-volume per equity is quite low and you don’t have to worry about impacting the market by your trades. Let’s say you’re trading 1000 equities with 100,000,000$. If you rebalance your entire portfolio every month, you are only trading 100,000 dollar-volume per month for each equity, which isn’t enough to be a significant market share for most securities.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=daa41d00a036" width="1" height="1" alt=""><hr><p><a href="https://medium.com/auquan/long-short-equity-trading-strategy-daa41d00a036">Long-Short Equity Strategy using Ranking: : Simple Trading Strategies Part 4</a> was originally published in <a href="https://medium.com/auquan">auquan</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Pairs Trading using Data-Driven Techniques: Simple Trading Strategies Part 3]]></title>
            <link>https://medium.com/auquan/pairs-trading-data-science-7dbedafcfe5a?source=rss-2a45747180c6------2</link>
            <guid isPermaLink="false">https://medium.com/p/7dbedafcfe5a</guid>
            <category><![CDATA[trading-strategy]]></category>
            <category><![CDATA[investing]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[finance]]></category>
            <category><![CDATA[trading]]></category>
            <dc:creator><![CDATA[Auquan]]></dc:creator>
            <pubDate>Wed, 20 Dec 2017 03:06:02 GMT</pubDate>
            <atom:updated>2020-03-05T18:30:51.628Z</atom:updated>
            <content:encoded><![CDATA[<p>Pairs trading is a nice example of a strategy based on mathematical analysis. We’ll demonstrate how to leverage data to create and automate a pairs trading strategy.</p><p>Download <a href="https://github.com/Auquan/Tutorials/blob/master/Pairs%20Trading.ipynb">Ipython Notebook here</a>.</p><h3>Underlying Principle</h3><p>Let’s say you have a pair of securities X and Y that have some underlying economic link, for example two companies that manufacture the same product, like Pepsi and Coca Cola. You expect the ratio or difference in prices (also called the <em>spread</em>) of these two to remain constant over time. However, from time to time, there might be a divergence in the spread between the two securities, caused by temporary supply/demand changes, large buy/sell orders for one security, a reaction to important news about one of the companies, etc. In this scenario, one stock moves up while the other moves down relative to each other. <strong>If you expect this divergence to revert back to normal with time, you can make a pairs trade.</strong></p><p>When there is a temporary divergence, the pairs trade would be to sell the <em>outperforming</em> stock (the stock that moved up) and to buy the <em>underperforming</em> stock (the stock that moved down). You are making a bet that the <em>spread</em> between the two stocks would eventually converge by either the <em>outperforming</em> stock moving back down or the <em>underperforming</em> stock moving back up or both — your trade will make money in all of these scenarios. If both the stocks move up or move down together without changing the spread between them, you don’t make or lose any money.</p><p>Hence, pairs trading is a market neutral trading strategy enabling traders to profit from virtually any market conditions: uptrend, downtrend, or sideways movement.</p><h3>Explaining the Concept: We start by generating two fake securities.</h3><pre><strong>import</strong> <strong>numpy</strong> <strong>as</strong> <strong>np</strong><br><strong>import</strong> <strong>pandas</strong> <strong>as</strong> <strong>pd</strong><br><br><strong>import</strong> <strong>statsmodels</strong><br><strong>from</strong> <strong>statsmodels.tsa.stattools</strong> <strong>import</strong> coint<br><em># just set the seed for the random number generator</em><br>np.random.seed(107)<br><br><strong>import</strong> <strong>matplotlib.pyplot</strong> <strong>as</strong> <strong>plt<br></strong></pre><p>Let’s generate a fake security X and model its daily returns by drawing from a normal distribution. Then we perform a cumulative sum to get the value of X on each day.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/874/1*jXndKzDjvSpR6qt4Zf2CkA.png" /><figcaption>Fake Security X with returns drawn from a normal distribution</figcaption></figure><pre># Generate daily returns</pre><pre>Xreturns = np.random.normal(0, 1, 100) </pre><pre># sum them and shift all the prices up</pre><pre>X = pd.Series(np.cumsum(Xreturns), name=&#39;X&#39;) + 50<br>X.plot(figsize=(15,7))<br>plt.show()</pre><p>Now we generate Y, which has a deep economic link to X, so the price of Y should vary in a similar way to X. 
We model this by taking X, shifting it up and adding some random noise drawn from a normal distribution.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/884/1*9DkYsCvs-9CHsongpxJ50g.png" /><figcaption>Cointegrated Securities X and Y</figcaption></figure><pre>noise = np.random.normal(0, 1, 100)<br>Y = X + 5 + noise<br>Y.name = &#39;Y&#39;</pre><pre>pd.concat([X, Y], axis=1).plot(figsize=(15,7))</pre><pre>plt.show()</pre><h4>Cointegration</h4><p>Cointegration, very similar to correlation, means that the ratio between two series will vary around a mean. The two series, Y and X, follow the following relationship:</p><blockquote><em>Y = ⍺ X + e</em></blockquote><p>where ⍺ is the constant ratio and e is white noise. <a href="https://medium.com/auquan/cointegration-and-stationarity-f4d14e1b3aef">Read more here</a></p><p>For pairs trading to work between two timeseries, the expected value of the ratio over time must converge to the mean, i.e. they should be cointegrated.</p><p>The time series we constructed above are cointegrated. We’ll plot the ratio between the two now so we can see how this looks.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/884/1*7A8mUTis_t1X1BOFjpvGCQ.png" /><figcaption>Ratio between prices of two cointegrated stocks and its mean</figcaption></figure><pre>(Y/X).plot(figsize=(15,7)) </pre><pre>plt.axhline((Y/X).mean(), color=&#39;red&#39;, linestyle=&#39;--&#39;) </pre><pre>plt.xlabel(&#39;Time&#39;)<br>plt.legend([&#39;Price Ratio&#39;, &#39;Mean&#39;])<br>plt.show()</pre><h4>Testing for Cointegration</h4><p>There is a convenient test that lives in statsmodels.tsa.stattools. We should see a very low p-value, as we&#39;ve artificially created two series that are as cointegrated as physically possible.</p><pre><em># compute the p-value of the cointegration test</em><br><em># will inform us as to whether the ratio between the 2 timeseries is stationary</em><br><em># around its mean</em><br>score, pvalue, _ = coint(X,Y)<br><strong>print</strong>(pvalue)</pre><p>1.81864477307e-17</p><h4>Note: Correlation vs. Cointegration</h4><p>Correlation and cointegration, while theoretically similar, are not the same. Let’s look at examples of series that are correlated, but not cointegrated, and vice versa. First let&#39;s check the correlation of the series we just generated.</p><pre>X.corr(Y)</pre><p>0.951</p><p>That’s very high, as we would expect. But how would two series that are correlated but not cointegrated look? 
A simple example is two series that just diverge.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/881/1*X1dMxI3g67vsZ78l0IOsmA.png" /><figcaption>Two correlated series (that are not co-integrated)</figcaption></figure><pre>ret1 = np.random.normal(1, 1, 100)<br>ret2 = np.random.normal(2, 1, 100)<br><br>X_diverging = pd.Series(np.cumsum(ret1), name=&#39;X&#39;)<br>Y_diverging = pd.Series(np.cumsum(ret2), name=&#39;Y&#39;)<br><br>pd.concat([X_diverging, Y_diverging], axis=1).plot(figsize=(15,7))<br>plt.show()</pre><pre><strong>print</strong>(&#39;Correlation: &#39; + str(X_diverging.corr(Y_diverging)))<br>score, pvalue, _ = coint(X_diverging,Y_diverging)<br><strong>print</strong>(&#39;Cointegration test p-value: &#39; + str(pvalue))</pre><p>Correlation: 0.998<br>Cointegration test p-value: 0.258</p><p>A simple example of cointegration without correlation is a normally distributed series and a square wave.</p><pre>Y2 = pd.Series(np.random.normal(0, 1, 800), name=&#39;Y2&#39;) + 20<br>Y3 = Y2.copy()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/874/1*mI_eFQA1siU6_3K2X2JQ1g.png" /></figure><pre>Y3[0:100] = 30<br>Y3[100:200] = 10<br>Y3[200:300] = 30<br>Y3[300:400] = 10<br>Y3[400:500] = 30<br>Y3[500:600] = 10<br>Y3[600:700] = 30<br>Y3[700:800] = 10<br></pre><pre>Y2.plot(figsize=(15,7))<br>Y3.plot()<br>plt.ylim([0, 40])<br>plt.show()</pre><pre><em># correlation is nearly zero</em><br><strong>print</strong>(&#39;Correlation: &#39; + str(Y2.corr(Y3)))<br>score, pvalue, _ = coint(Y2,Y3)<br><strong>print</strong>(&#39;Cointegration test p-value: &#39; + str(pvalue))</pre><p>Correlation: 0.007546<br>Cointegration test p-value: 0.0</p><p>The correlation is incredibly low, but the p-value shows perfect cointegration!</p><h3>How to make a pairs trade?</h3><p>Because two cointegrated time series (such as X and Y above) drift towards and apart from each other, there will be times when the <em>spread</em> is high and times when the <em>spread</em> is low. We make a pairs trade by buying one security and selling another. This way, if both securities go down together or go up together, we neither make nor lose money — we are market neutral.</p><p>Going back to X and Y above that follow <em>Y = ⍺ X + e</em>, such that the ratio (Y/X) moves around its mean value <em>⍺, </em>we make money on the ratio of the two reverting to the mean. In order to do this we’ll watch for when X and Y are far apart, i.e. <em>⍺ is too high or too low:</em></p><ul><li><strong>Going Long the Ratio</strong> This is when the ratio <em>⍺ </em>is smaller than usual and we expect it to increase. In the above example, we place a bet on this by buying Y and selling X.</li><li><strong>Going Short the Ratio</strong> This is when the ratio <em>⍺ </em>is large and we expect it to become smaller. In the above example, we place a bet on this by selling Y and buying X.</li></ul><p>Note that we always have a “hedged position”: a short position makes money if the security sold loses value, and a long position will make money if a security gains value, so we’re immune to overall market movement. We only make or lose money if securities X and Y move relative to each other.</p><h3>Using Data to find securities that behave like this</h3><p>The best way to do this is to start with securities you suspect may be cointegrated and perform a statistical test. 
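Why not simply test every possible pair and keep whatever looks significant? Because with enough tests, some pairs will look cointegrated purely by chance. Here is a quick sketch (my own illustration, not from the original notebook) of coint flagging pairs of completely independent random walks:</p><pre># Quick illustration (not from the original notebook): run the cointegration<br># test on many pairs of independent random walks and count spurious hits.<br>n_tests = 100<br>spurious = 0<br>for _ in range(n_tests):<br>    a = np.cumsum(np.random.normal(0, 1, 250))   # two unrelated random walks<br>    b = np.cumsum(np.random.normal(0, 1, 250))<br>    _, p, _ = coint(a, b)<br>    if p &lt; 0.05:<br>        spurious += 1<br>print(spurious)   # expect on the order of 5 out of 100 at the 0.05 level</pre><p>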
If you just run statistical tests over all pairs, you’ll fall prey to multiple comparison bias.</p><p><strong>Multiple comparisons bias</strong> is simply the fact that there is an increased chance of incorrectly generating a significant p-value when many tests are run. If 100 tests are run on random data, we should expect to see 5 p-values below 0.05. If you are comparing <em>n </em>securities for co-integration, you will perform<em> n(n-1)/2 </em>comparisons, and you should expect to see many incorrectly significant p-values, which will increase as <em>n</em> increases. To avoid this, pick a small number of pairs you have reason to suspect might be cointegrated and test each individually. This will result in less exposure to multiple comparisons bias.</p><p>So let’s try to find some securities that display cointegration. Let’s work with a basket of US large cap tech stocks in the S&amp;P 500. These stocks operate in a similar segment and could have cointegrated prices. The function below scans through a list of securities and tests for cointegration between all pairs. It returns a cointegration test score matrix, a p-value matrix, and any pairs for which the p-value was less than 0.02. <strong>This method is prone to multiple comparison bias and in practice the securities should be subject to a second verification step</strong>. Let’s ignore this for the sake of this example.</p><pre>def find_cointegrated_pairs(data):<br>    n = data.shape[1]<br>    score_matrix = np.zeros((n, n))<br>    pvalue_matrix = np.ones((n, n))<br>    keys = data.keys()<br>    pairs = []<br>    for i in range(n):<br>        for j in range(i+1, n):<br>            S1 = data[keys[i]]<br>            S2 = data[keys[j]]<br>            result = coint(S1, S2)<br>            score = result[0]<br>            pvalue = result[1]<br>            score_matrix[i, j] = score<br>            pvalue_matrix[i, j] = pvalue<br>            if pvalue &lt; 0.02:<br>                pairs.append((keys[i], keys[j]))<br>    return score_matrix, pvalue_matrix, pairs</pre><p><strong>Note:</strong> We include the market benchmark (<em>SPX</em>) in our data — the market drives the movement of so many securities that often you might find two seemingly cointegrated securities; but in reality they are not cointegrated with each other, just both cointegrated with the market. 
This is known as a <em>confounding variable</em> and it is important to check for market involvement in any relationship you find.</p><pre>from backtester.dataSource.yahoo_data_source import YahooStockDataSource<br>from datetime import datetime</pre><pre>startDateStr = &#39;2007/12/01&#39;<br>endDateStr = &#39;2017/12/01&#39;<br>cachedFolderName = &#39;yahooData/&#39;<br>dataSetId = &#39;testPairsTrading&#39;<br>instrumentIds = [&#39;SPY&#39;,&#39;AAPL&#39;,&#39;ADBE&#39;,&#39;SYMC&#39;,&#39;EBAY&#39;,&#39;MSFT&#39;,&#39;QCOM&#39;,<br>                 &#39;HPQ&#39;,&#39;JNPR&#39;,&#39;AMD&#39;,&#39;IBM&#39;]<br>ds = YahooStockDataSource(cachedFolderName=cachedFolderName,<br>                            dataSetId=dataSetId,<br>                            instrumentIds=instrumentIds,<br>                            startDateStr=startDateStr,<br>                            endDateStr=endDateStr,<br>                            event=&#39;history&#39;)</pre><pre>data = ds.getBookDataByFeature()[&#39;Adj Close&#39;]</pre><pre>data.head(3)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vtnAcE3aIrRbCsruzSfTQw.png" /></figure><p>Now let’s try to find cointegrated pairs using our method.</p><pre># Heatmap to show the p-values of the cointegration test<br># between each pair of stocks</pre><pre>scores, pvalues, pairs = find_cointegrated_pairs(data)<br>import seaborn<br>m = [0,0.2,0.4,0.6,0.8,1]<br>seaborn.heatmap(pvalues, xticklabels=instrumentIds, <br>                yticklabels=instrumentIds, cmap=’RdYlGn_r’, <br>                mask = (pvalues &gt;= 0.98))<br>plt.show()<br>print pairs</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/370/1*3CRqGahGE42kmhmmNZ-wGA.png" /></figure><pre>[(&#39;ADBE&#39;, &#39;MSFT&#39;)]</pre><p>Looks like ‘ADBE’ and ‘MSFT’ are cointegrated. Let’s take a look at the prices to make sure this actually makes sense.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/877/1*IZfHHT7YbwLGm2mt_g7yUg.png" /><figcaption>Plot of Price Ratio between MSFT and ADBE from 2008–2017</figcaption></figure><pre>S1 = data[&#39;ADBE&#39;]<br>S2 = data[&#39;MSFT&#39;]<br>score, pvalue, _ = coint(S1, S2)<br>print(pvalue)<br>ratios = S1 / S2<br>ratios.plot()<br>plt.axhline(ratios.mean())<br>plt.legend([&#39; Ratio&#39;])<br>plt.show()</pre><p>The ratio does look like it moved around a stable mean.The absolute ratio isn’t very useful in statistical terms. It is more helpful to normalize our signal by treating it as a z-score. Z score is defined as:</p><blockquote>Z Score (Value) = (Value — Mean) / Standard Deviation</blockquote><h4>WARNING</h4><p>In practice this is usually done to try to give some scale to the data, but this assumes an underlying distribution. Usually normal. However, much financial data is not normally distributed, and we must be very careful not to simply assume normality, or any specific distribution when generating statistics. 
The true distribution of ratios could be very fat-tailed and prone to extreme values messing up our model and resulting in large losses.</p><pre><strong>def</strong> zscore(series):<br>    <strong>return</strong> (series - series.mean()) / np.std(series)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/876/1*x8Jg3ccLWlasUYsZjbH-sw.png" /><figcaption>Z Score of Price Ratio between MSFT and ADBE from 2008–2017</figcaption></figure><pre>zscore(ratios).plot()<br>plt.axhline(zscore(ratios).mean())<br>plt.axhline(1.0, color=&#39;red&#39;)<br>plt.axhline(-1.0, color=&#39;green&#39;)<br>plt.show()</pre><p>It’s easier now to observe that the ratio moves around the mean, but is sometimes prone to large divergences from the mean, which we can take advantage of.</p><p>Now that we’ve talked about the basics of a pairs trading strategy, and identified co-integrated securities based on historical prices, let’s try to develop a trading signal. First, let’s recap the steps in developing a trading signal using data techniques:</p><ul><li>Collect reliable data and clean it</li><li>Create features from the data to identify a trading signal/logic</li><li>Features can be moving averages or ratios of price data, correlations or more complex signals — combine these to create new features</li><li>Generate a trading signal using these features, i.e. which instruments are a buy, a sell or neutral</li></ul><h4>Step 1: Setup your problem</h4><p>Here we are trying to create a signal that tells us if the ratio is a buy or a sell at the next instant in time, i.e. our prediction variable Y:</p><blockquote><em>Y = Ratio is buy (1) or sell (-1)</em></blockquote><blockquote><em>Y(t)= Sign( Ratio(t+1) — Ratio(t) )</em></blockquote><p>Note we don’t need to predict actual stock prices, or even the actual value of the ratio (though we could), just the direction of the next move in the ratio.</p><h4>Step 2: Collect Reliable and Accurate Data</h4><p>Auquan Toolbox is your friend here! You only have to specify the stock you want to trade and the datasource to use, and it pulls the required data and cleans it for dividends and stock splits. So our data here is already clean.</p><p>We are using the following data from Yahoo at daily intervals for trading days over the last 10 years (~2500 data points): Open, Close, High, Low and Trading Volume.</p><h4>Step 3: Split Data</h4><p>Don’t forget this super important step to test the accuracy of your models. We’re using the following Training/Validation/Test split:</p><ul><li>Training 7 years ~ 70%</li><li>Test ~ 3 years 30%</li></ul><pre>ratios = data[&#39;ADBE&#39;] / data[&#39;MSFT&#39;]<br>print(len(ratios))<br>train = ratios[:1762]<br>test = ratios[1762:]</pre><p>Ideally we should also make a validation set but we will skip this for now.</p><h4>Step 4: Feature Engineering</h4><p>What could relevant features be? We want to predict the direction of the next ratio move. We’ve seen that our two securities are cointegrated, so the ratio tends to move around and revert back to the mean. 
It seems our features should be certain measures for the mean of the ratio, the divergence of the current value from the mean to be able to generate our trading signal.</p><p>Let’s use the following features:</p><ul><li>60 day Moving Average of Ratio: Measure of rolling mean</li><li>5 day Moving Average of Ratio: Measure of current value of mean</li><li>60 day Standard Deviation</li><li>z score: (5d MA — 60d MA) /60d SD</li></ul><pre>ratios_mavg5 = train.rolling(window=5,<br>                               center=False).mean()</pre><pre>ratios_mavg60 = train.rolling(window=60,<br>                               center=False).mean()</pre><pre>std_60 = train.rolling(window=60,<br>                        center=False).std()</pre><pre>zscore_60_5 = (ratios_mavg5 - ratios_mavg60)/std_60<br>plt.figure(figsize=(15,7))<br>plt.plot(train.index, train.values)<br>plt.plot(ratios_mavg5.index, ratios_mavg5.values)<br>plt.plot(ratios_mavg60.index, ratios_mavg60.values)</pre><pre>plt.legend([&#39;Ratio&#39;,&#39;5d Ratio MA&#39;, &#39;60d Ratio MA&#39;])</pre><pre>plt.ylabel(&#39;Ratio&#39;)<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/891/1*qtPw3JFKYZ2jPYplKXLY7g.png" /><figcaption>60d and 5d MA of Price Ratios</figcaption></figure><pre>plt.figure(figsize=(15,7))<br>zscore_60_5.plot()<br>plt.axhline(0, color=&#39;black&#39;)<br>plt.axhline(1.0, color=&#39;red&#39;, linestyle=&#39;--&#39;)<br>plt.axhline(-1.0, color=&#39;green&#39;, linestyle=&#39;--&#39;)<br>plt.legend([&#39;Rolling Ratio z-Score&#39;, &#39;Mean&#39;, &#39;+1&#39;, &#39;-1&#39;])<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/876/1*k2yRueePuewJd0KUVmqKrw.png" /><figcaption>60–5 ZScore of Price Ratio</figcaption></figure><p>The <em>Z Score </em>of the rolling means really brings out the mean reverting nature of the ratio!</p><h4>Step 5: Model Selection</h4><p>Let’s start with a really simple model. Looking at the z-score chart, we can see that whenever the z-score feature gets too high, or too low, it tends to revert back. Let’s use +1/-1 as our thresholds for too high and too low, then we can use the following model to generate a trading signal:</p><ul><li>Ratio is buy (1) whenever the z-score is below -1.0 because we expect z score to go back up to 0, hence ratio to increase</li><li>Ratio is sell(-1) when the z-score is above 1.0 because we expect z score to go back down to 0, hence ratio to decrease</li></ul><h4>Step 6: Train, Validate and Optimize</h4><p>Finally, let’s see how our model actually does on real data? Let’s see what this signal looks like on actual ratios</p><pre># Plot the ratios and buy and sell signals from z score<br>plt.figure(figsize=(15,7))</pre><pre>train[60:].plot()<br>buy = train.copy()<br>sell = train.copy()<br>buy[zscore_60_5&gt;-1] = 0<br>sell[zscore_60_5&lt;1] = 0<br>buy[60:].plot(color=’g’, linestyle=’None’, marker=’^’)<br>sell[60:].plot(color=’r’, linestyle=’None’, marker=’^’)<br>x1,x2,y1,y2 = plt.axis()<br>plt.axis((x1,x2,ratios.min(),ratios.max()))<br>plt.legend([‘Ratio’, ‘Buy Signal’, ‘Sell Signal’])<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/877/1*9JAmYCX1tsQJh84MhHtbuQ.png" /><figcaption>Buy and Sell Signal on Price Ratios</figcaption></figure><p>The signal seems reasonable, we seem to sell the ratio (red dots) when it is high or increasing and buy it back when it&#39;s low (green dots) and decreasing. What does that mean for actual stocks that we are trading? 
Let’s take a look.</p><pre># Plot the prices and buy and sell signals from z score<br>plt.figure(figsize=(18,9))<br>S1 = data[&#39;ADBE&#39;].iloc[:1762]<br>S2 = data[&#39;MSFT&#39;].iloc[:1762]</pre><pre>S1[60:].plot(color=&#39;b&#39;)<br>S2[60:].plot(color=&#39;c&#39;)<br>buyR = 0*S1.copy()<br>sellR = 0*S1.copy()</pre><pre># When buying the ratio, buy S1 and sell S2<br>buyR[buy!=0] = S1[buy!=0]<br>sellR[buy!=0] = S2[buy!=0]<br># When selling the ratio, sell S1 and buy S2 <br>buyR[sell!=0] = S2[sell!=0]<br>sellR[sell!=0] = S1[sell!=0]</pre><pre>buyR[60:].plot(color=&#39;g&#39;, linestyle=&#39;None&#39;, marker=&#39;^&#39;)<br>sellR[60:].plot(color=&#39;r&#39;, linestyle=&#39;None&#39;, marker=&#39;^&#39;)<br>x1,x2,y1,y2 = plt.axis()<br>plt.axis((x1,x2,min(S1.min(),S2.min()),max(S1.max(),S2.max())))</pre><pre>plt.legend([&#39;ADBE&#39;,&#39;MSFT&#39;, &#39;Buy Signal&#39;, &#39;Sell Signal&#39;])<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UODoaLtnubtdGafkT9noPQ.png" /><figcaption>Buy and Sell Signals for MSFT and ADBE stocks</figcaption></figure><p>Notice how we sometimes make money on the short leg and sometimes on the long leg, and sometimes both.</p><p>We’re happy with our signal on the training data. Let’s see what kind of profits this signal can generate. We can make a simple backtester which buys 1 unit of the ratio (buy 1 ADBE share and sell ratio x MSFT shares) when the ratio is low, sells 1 unit of the ratio (sell 1 ADBE share and buy ratio x MSFT shares) when it’s high, and calculates the PnL of these trades.</p><pre># Trade using a simple strategy<br>def trade(S1, S2, window1, window2):<br>    <br>    # If window length is 0, algorithm doesn&#39;t make sense, so exit<br>    if (window1 == 0) or (window2 == 0):<br>        return 0<br>    <br>    # Compute rolling mean and rolling standard deviation<br>    ratios = S1/S2<br>    ma1 = ratios.rolling(window=window1,<br>                               center=False).mean()<br>    ma2 = ratios.rolling(window=window2,<br>                               center=False).mean()<br>    std = ratios.rolling(window=window2,<br>                        center=False).std()<br>    zscore = (ma1 - ma2)/std<br>    <br>    # Simulate trading<br>    # Start with no money and no positions<br>    money = 0<br>    countS1 = 0<br>    countS2 = 0<br>    for i in range(len(ratios)):<br>        # Sell short if the z-score is &gt; 1<br>        if zscore[i] &gt; 1:<br>            money += S1[i] - S2[i] * ratios[i]<br>            countS1 -= 1<br>            countS2 += ratios[i]<br>            print(&#39;Selling Ratio %s %s %s %s&#39;%(money, ratios[i], countS1,countS2))<br>        # Buy long if the z-score is &lt; -1<br>        elif zscore[i] &lt; -1:<br>            money -= S1[i] - S2[i] * ratios[i]<br>            countS1 += 1<br>            countS2 -= ratios[i]<br>            print(&#39;Buying Ratio %s %s %s %s&#39;%(money,ratios[i], countS1,countS2))<br>        # Clear positions if the absolute z-score falls below 0.75<br>        elif abs(zscore[i]) &lt; 0.75:<br>            money += S1[i] * countS1 + S2[i] * countS2<br>            countS1 = 0<br>            countS2 = 0<br>            print(&#39;Exit pos %s %s %s %s&#39;%(money,ratios[i], countS1,countS2))<br>            <br>            <br>    return money</pre><pre>trade(data[&#39;ADBE&#39;].iloc[:1763], data[&#39;MSFT&#39;].iloc[:1763], 5, 60)</pre><p>629.71</p><p>So that strategy seems profitable! 
Now we can optimize further by changing our moving average windows, by changing the thresholds for buy/sell and exit positions, etc., and check for performance improvements on validation data.</p><p>We could also try more sophisticated models like Logistic Regression, SVM, etc. to make our 1/-1 predictions.</p><p>For now, let’s say we decide to go forward with this model. This brings us to</p><h4>Step 7: Backtest on Test Data</h4><p>Backtesting is simple: we can just use our function from above to see the PnL on test data</p><pre>trade(data[&#39;ADBE&#39;].iloc[1762:], data[&#39;MSFT&#39;].iloc[1762:], 5, 60)</pre><p>1017.61</p><p>The model does quite well! This completes our first simple pairs trading model.</p><h3>Avoid Overfitting</h3><p>Before ending the discussion, we’d like to give a special mention to overfitting. Overfitting is the most dangerous pitfall of a trading strategy. An overfit algorithm may perform wonderfully on a backtest but fail miserably on new, unseen data — this means it has not really uncovered any trend in the data and has no real predictive power. Let’s take a simple example.</p><p>In our model, we used rolling parameter estimates and may wish to optimize window length. We may decide to simply iterate over all reasonable window lengths and pick the length for which our model performs best. Below we write a simple loop to score window lengths based on the PnL of the training data and find the best one.</p><pre># Find the window length 0-254 <br># that gives the highest returns using this strategy<br>length_scores = [trade(data[&#39;ADBE&#39;].iloc[:1762], <br>                data[&#39;MSFT&#39;].iloc[:1762], l, 5) <br>                for l in range(255)]<br>best_length = np.argmax(length_scores)<br>print (&#39;Best window length:&#39;, best_length)</pre><pre>(&#39;Best window length:&#39;, 246)</pre><p>Now we check the performance of our model on test data and we find that this window length is far from optimal! This is because our original choice was clearly overfitted to the sample data.</p><pre># Find the returns for test data<br># using what we think is the best window length<br>length_scores2 = [trade(data[&#39;ADBE&#39;].iloc[1762:], <br>                  data[&#39;MSFT&#39;].iloc[1762:],l,5) <br>                  for l in range(255)]<br>print (best_length, &#39;day window:&#39;, length_scores2[best_length])</pre><pre># Find the best window length based on this dataset, <br># and the returns using this window length<br>best_length2 = np.argmax(length_scores2)<br>print (best_length2, &#39;day window:&#39;, length_scores2[best_length2])</pre><pre>(1, &#39;day window:&#39;, 10.06)<br>(218, &#39;day window:&#39;, 527.92)</pre><p>Clearly, fitting to our sample data doesn&#39;t always give good results in the future. Just for fun, let&#39;s plot the length scores computed from the two datasets.</p><pre>plt.figure(figsize=(15,7))<br>plt.plot(length_scores)<br>plt.plot(length_scores2)<br>plt.xlabel(&#39;Window length&#39;)<br>plt.ylabel(&#39;Score&#39;)<br>plt.legend([&#39;Training&#39;, &#39;Test&#39;])<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/897/1*ULK9DQwP4Bryw7vK6qTRVQ.png" /></figure><p>We can see that anything above about 90 would be a good choice for our window.</p><p>To avoid overfitting, we can use economic reasoning or the nature of our algorithm to pick our window length. 
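Another option, suggested by the shape of the plot above, is to prefer a broad and stable region of window lengths over the single highest training score. Below is a minimal, illustrative sketch (not from the original notebook; the smoothing window of 20 and the 80% threshold are arbitrary choices) that picks the smallest length in the plateau of smoothed training scores.</p><pre># Illustrative heuristic: smooth the training scores and pick a window<br># from the broad stable region rather than the single best length<br>scores = pd.Series(length_scores)<br>smooth = scores.rolling(window=20, center=True).mean()<br>plateau = smooth[smooth &gt; 0.8 * smooth.max()]<br>robust_length = int(plateau.index.min())<br>print (&#39;Robust window length:&#39;, robust_length)</pre><p>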
We can also use Kalman filters, which do not require us to specify a length; this method will be covered in another notebook later.</p><h4>Next Steps</h4><p>In this post, we presented some simple introductory approaches to demonstrate the process of developing a pairs trading strategy. In practice one should use more sophisticated statistics, some of which are listed here</p><ul><li>Hurst exponent</li><li>Half-life of mean reversion inferred from an Ornstein–Uhlenbeck process</li><li>Kalman filters</li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7dbedafcfe5a" width="1" height="1" alt=""><hr><p><a href="https://medium.com/auquan/pairs-trading-data-science-7dbedafcfe5a">Pairs Trading using Data-Driven Techniques: Simple Trading Strategies Part 3</a> was originally published in <a href="https://medium.com/auquan">auquan</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Time Series Analysis for Financial Data VI— GARCH model and predicting SPX returns]]></title>
            <link>https://medium.com/auquan/time-series-analysis-for-finance-arch-garch-models-822f87f1d755?source=rss-2a45747180c6------2</link>
            <guid isPermaLink="false">https://medium.com/p/822f87f1d755</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[math]]></category>
            <category><![CDATA[mathematics]]></category>
            <category><![CDATA[finance]]></category>
            <dc:creator><![CDATA[Auquan]]></dc:creator>
            <pubDate>Wed, 13 Dec 2017 07:06:01 GMT</pubDate>
            <atom:updated>2017-12-19T15:14:33.317Z</atom:updated>
            <content:encoded><![CDATA[<p>Download the <a href="https://github.com/Auquan/Tutorials/blob/master/Time%20Series%20Analysis%20-%204.ipynb">iPython notebook here</a></p><p>In this <a href="https://medium.com/auquan/tagged/mathematics">mini series on Time Series modelling for Financial Data</a>, so far we’ve used AR, MA and a combination of these models on asset prices to try and model how our asset behaves. We’ve found that we were able to model certain time periods well with these models and failed at other times.</p><p>This was because of volatility clustering or heteroskedasticity. In this post, we will discuss conditional heteroskedasticity, leading us to our first conditional heteroskedastic model, known as <strong><em>ARCH</em></strong>. Then we will discuss extensions to ARCH, leading us to the famous Generalised Autoregressive Conditional Heteroskedasticity model of order p,q, also known as <strong><em>GARCH(p,q)</em></strong>. GARCH is used extensively within the financial industry as many asset prices are conditionally heteroskedastic.</p><p>Let’s do a quick recap first:</p><p>We have considered the following models so far in this series (we recommend reading the series in order if you have not done so already):</p><ul><li><a href="https://medium.com/auquan/time-series-analysis-for-financial-data-part-1-stationarity-autocorrelation-and-white-noise-1a1cc2fb23f2">Discrete White Noise and Random Walks</a></li><li><a href="https://medium.com/auquan/time-series-analysis-ii-auto-regressive-models-d0cb1a8a7c43">Auto Regressive Models AR(p)</a></li><li><a href="https://medium.com/auquan/time-series-analysis-for-financial-data-iii-moving-average-models-cccf027f264e">Moving Average Models MA(q)</a></li><li><a href="https://medium.com/auquan/time-series-analysis-for-finance-arma-models-21695e14c999">Auto Regressive Moving Average Models ARMA(p,q)</a> and</li><li><a href="https://medium.com/auquan/time-series-analysis-for-finance-arima-models-acb5e39999df">Auto Regressive Integrated Moving Average Models ARIMA(p,d,q)</a></li></ul><p>Now we are at the final piece of the puzzle. We need a model to examine conditional heteroskedasticity in financial series that exhibit volatility clustering.</p><blockquote>What is conditional heteroskedasticity?</blockquote><p>Conditional heteroskedasticity exists in finance because asset returns are volatile.</p><p>A collection of random variables is <strong>heteroskedastic</strong> if there are subsets of variables within the larger set that have a different variance from the remaining variables.</p><p>Consider a day when equities markets undergo a substantial drop. The market gets into panic mode, automated risk management systems start getting rid of their long positions by selling them, and all of this leads to a further fall in prices. An increase in variance from the initial price drop leads to significant further downward volatility.</p><p>That is, an increase in variance is serially correlated to a further increase in variance in such a “sell-off” period. Or, looking at it the other way around, a period of increased variance is conditional on an initial sell-off. Thus we say that such series are <strong>conditionally heteroskedastic.</strong></p><p>Conditionally heteroskedastic (CH) series are non-stationary since their variance is not constant in time. 
One of the challenging aspects of conditional heteroskedastic series is ACF plots of a series with volatility might still appear to be a realisation of stationary discrete white noise.</p><p>How can we incorporate CH in our model? One way could be to create an AR model for the variance itself — a model that actually accounts for the changes in the variance over time using past values of the variance.</p><p>This is the basis of the Autoregressive Conditional Heteroskedastic (ARCH) model.</p><h3>Autoregressive Conditionally Heteroskedastic Models — ARCH(p)</h3><p>ARCH(p) model is simply an AR(p) model applied to the variance of a time series.</p><p>ARCH(1) is given by:</p><blockquote>Var<em>(x(t)) = </em>σ²(t) = <em>⍺*</em>σ²<em>(t-1) + </em>⍺1</blockquote><p>The actual time series is given by:</p><blockquote>x(t) = w(t)* σ(t) = w(t)* ⎷(⍺*σ²(t-1) + ⍺1)</blockquote><p>where w(t) is white noise</p><h4>When To Apply ARCH(p)?</h4><p>Let’s say we fit an AR(p) model and the residuals look almost like white noise but we are concerned about decay of the <em>p</em> lag on a ACF plot of the series. If we find that we can apply an AR(p) to the square of residuals as well, then we have an indication that an ARCH(p) process may be appropriate.</p><p><strong>Note that ARCH(p) should only ever be applied to a series that has already had an appropriate model fitted sufficient to leave the residuals looking like discrete white noise</strong>. Since we can only tell whether ARCH is appropriate or not by squaring the residuals and examining the ACF, we also need to ensure that the mean of the residuals is zero.</p><p>ARCH should only ever be applied to series that do not have any trends or seasonal effects, i.e. that has no (evident) serially correlation. ARIMA is often applied to such a series, at which point ARCH may be a good fit.</p><pre><em># Simulate ARCH(1) series</em><br><em># Var(yt) = a_0 + a_1*y{t-1}**2</em><br><em># if a_1 is between 0 and 1 then yt is white noise</em><br><br>np.random.seed(13)<br><br>a0 = 2<br>a1 = .5<br><br>y = w = np.random.normal(size=1000)<br>Y = np.empty_like(y)<br><br><strong>for</strong> t <strong>in</strong> range(len(y)):<br>    y[t] = w[t] * np.sqrt((a0 + a1*y[t-1]**2))<br><br><em># simulated ARCH(1) series, looks like white noise</em><br>tsplot(y, lags=30)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sh9NHLvF-mWw8I3Jadoz3A.png" /><figcaption>ARCH(1) series</figcaption></figure><p>Notice the time series looks just like white noise. However, let’s see what happens when we plot the square of the series.</p><pre>tsplot(y**2, lags=30)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3zSEVwOsMjap1cB1CsmDtg.png" /><figcaption>Square of ARCH(1) series</figcaption></figure><p>Now the ACF, and PACF seem to show significance at lag 1 indicating an AR(1) model for the variance may be appropriate.</p><p>An obvious question to ask at this stage is if we are going to apply an AR(p) process to the variance, why not a Moving Average MA(q) model as well? Or a mixed model such as ARMA(p,q)?</p><p>This is actually the motivation for the Generalised ARCH model, known as GARCH.</p><h3>Generalized Autoregressive Conditionally Heteroskedastic Models — GARCH(p,q)</h3><p>Just like <em>ARCH(p)</em> is <em>AR(p)</em> applied to the variance of a time series, <em>GARCH(p, q)</em> is an <em>ARMA(p,q)</em> model applied to the variance of a time series. 
The AR(p) models the variance of the residuals (squared errors) or simply our time series squared. The MA(q) portion models the variance of the process.</p><p>The GARCH(1,1) model is:</p><blockquote>σ²(t)<em> = a*</em>σ²(t-1)<em> + b*e²(t-1) + w</em></blockquote><p><em>(a+b)</em> must be less than 1 or the model is unstable. We can simulate a GARCH(1, 1) process below.</p><pre><em># Simulating a GARCH(1, 1) process</em><br><br>np.random.seed(2)<br><br>a0 = 0.2<br>a1 = 0.5<br>b1 = 0.3<br><br>n = 10000<br>w = np.random.normal(size=n)<br>eps = np.zeros_like(w)<br>sigsq = np.zeros_like(w)<br><br><strong>for</strong> i <strong>in</strong> range(1, n):<br>    sigsq[i] = a0 + a1*(eps[i-1]**2) + b1*sigsq[i-1]<br>    eps[i] = w[i] * np.sqrt(sigsq[i])<br><br>_ = tsplot(eps, lags=30)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9p2yqCl2mv8rBL9ngPOCNw.png" /><figcaption>GARCH(1,1) process</figcaption></figure><p>Again, notice that overall this process closely resembles white noise, however take a look when we view the squared eps series.</p><pre>_ = tsplot(eps**2, lags=30)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*aQLEt2zx7HeG6v7KM_j92w.png" /><figcaption>Square of GARCH(1,1) process</figcaption></figure><p>There is substantial evidence of a conditionally heteroskedastic process via the decay of successive lags. The significance of the lags in both the ACF and PACF indicate we need both AR and MA components for our model. Let’s see if we can recover our process parameters using a GARCH(1, 1) model. Here we make use of the arch_model function from the ARCH package.</p><pre><em># Fit a GARCH(1, 1) model to our simulated EPS series</em><br><em># We use the arch_model function from the ARCH package</em></pre><pre>am = arch_model(eps)<br>res = am.fit(update_freq=5)<br><strong>print</strong>(res.summary())</pre><pre>Iteration:      5,   Func. Count:     38,   Neg. LLF: 12311.7950557<br>Iteration:     10,   Func. Count:     71,   Neg. LLF: 12238.5926559<br>Optimization terminated successfully.    (Exit mode 0)<br>            Current function value: 12237.3032673<br>            Iterations: 13<br>            Function evaluations: 89<br>            Gradient evaluations: 13<br>                     Constant Mean - GARCH Model Results                      <br>====================================================================<br>Dep. Variable:                 y   R-squared:                 -0.000<br>Mean Model:        Constant Mean   Adj. R-squared:            -0.000<br>Vol Model:                 GARCH   Log-Likelihood:          -12237.3<br>Distribution:             Normal   AIC:                      24482.6<br>Method:       Maximum Likelihood   BIC:                      24511.4<br>                                   No. Observations:           10000<br>Date:           Tue, Feb 28 2017   Df Residuals:                9996<br>Time:                   20:52:48   Df Model:                       4<br>                             Mean Model                                  <br>====================================================================<br>                coef    std err      t  P&gt;|t|     95.0% Conf. 
Int.<br>--------------------------------------------------------------------<br>mu       -6.7225e-03  6.735e-03 -0.998  0.318 [-1.992e-02,6.478e-03]<br>                            Volatility Model                            <br>====================================================================<br>               coef    std err        t      P&gt;|t|  95.0% Conf. Int.<br>--------------------------------------------------------------------<br>omega        0.2021  1.043e-02   19.383  1.084e-83 [  0.182,  0.223]<br>alpha[1]     0.5162  2.016e-02   25.611 1.144e-144 [  0.477,  0.556]<br>beta[1]      0.2879  1.870e-02   15.395  1.781e-53 [  0.251,  0.325]<br>====================================================================</pre><pre>Covariance estimator: robust</pre><p>We can see that the true parameters all fall within the respective confidence intervals.</p><h3>Application to Financial Time Series</h3><p>Now apply the procedure to a financial time series. Here we’re going to use SPX returns. The process is as follows:</p><ul><li>Iterate through combinations of ARIMA(p, d, q) models to best fit our time series.</li><li>Pick the GARCH model orders according to the ARIMA model with lowest AIC.</li><li>Fit the GARCH(p, q) model to our time series.</li><li>Examine the model residuals and squared residuals for autocorrelation</li></ul><p>Here, we first try to fit SPX return to an ARIMA process and find the best order.</p><pre>import auquanToolbox.dataloader as dl</pre><pre>end = ‘2017–01–01’<br>start = ‘2010–01–01’<br>symbols = [‘SPX’]<br>data = dl.load_data_nologs(‘nasdaq’, symbols , start, end)[‘ADJ CLOSE’]<br># log returns<br>lrets = np.log(data/data.shift(1)).dropna()</pre><pre><strong>def</strong> _get_best_model(TS):<br>    best_aic = np.inf <br>    best_order = None<br>    best_mdl = None</pre><pre>    pq_rng = range(5) <em># [0,1,2,3,4]</em><br>    d_rng = range(2) <em># [0,1]</em><br>    <strong>for</strong> i <strong>in</strong> pq_rng:<br>        <strong>for</strong> d <strong>in</strong> d_rng:<br>            <strong>for</strong> j <strong>in</strong> pq_rng:<br>                <strong>try</strong>:<br>                    tmp_mdl = smt.ARIMA(TS, order=(i,d,j)).fit(<br>                        method=&#39;mle&#39;, trend=&#39;nc&#39;<br>                    )<br>                    tmp_aic = tmp_mdl.aic<br>                    <strong>if</strong> tmp_aic &lt; best_aic:<br>                        best_aic = tmp_aic<br>                        best_order = (i, d, j)<br>                        best_mdl = tmp_mdl<br>                <strong>except</strong>: <strong>continue</strong><br>    <strong>print</strong>(&#39;aic: {:6.2f} | order: {}&#39;.format(best_aic, best_order))                    <br>    <strong>return</strong> best_aic, best_order, best_mdl</pre><pre>TS = lrets.SPX<br>res_tup = _get_best_model(TS)</pre><p>aic: -11323.07 | order: (3, 0, 3)</p><pre>order = res_tup[1]<br>model = res_tup[2]</pre><p>Since we&#39;ve already taken the log of returns, we should expect our integrated component d to equal zero, which it does. We find the best model is ARIMA(3,0,3). Now we plot the residuals to decide if they possess evidence of conditional heteroskedastic behaviour</p><pre>tsplot(model.resid, lags=30)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BUPg4--BzkOwIZNADGEw3w.png" /></figure><p>We find the residuals look like white noise. 
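As a quick numerical check we can also rerun the Ljung-Box test used in the earlier posts of this series on these residuals; a minimal sketch (large p-values would support the white-noise impression from the plot):</p><pre><em># Ljung-Box test on the ARIMA residuals, as in the earlier posts</em><br><strong>from</strong> <strong>statsmodels.stats.diagnostic</strong> <strong>import</strong> acorr_ljungbox<br>acorr_ljungbox(model.resid, lags=[20], boxpierce=False)</pre><p>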
Let’s look at the square of residuals</p><pre>tsplot(model.resid**2, lags=30)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hDpMGo1ENrsLktXYVE16aQ.png" /></figure><p>We can see clear evidence of autocorrelation in squared residuals. Let’s fit a GARCH model and see how it does.</p><pre><em># Now we can fit the arch model using the best fit arima model parameters</em></pre><pre>p_ = order[0]<br>o_ = order[1]<br>q_ = order[2]</pre><pre><br>am = arch_model(model.resid, p=p_, o=o_, q=q_, dist=&#39;StudentsT&#39;)<br>res = am.fit(update_freq=5, disp=&#39;off&#39;)<br><strong>print</strong>(res.summary())</pre><pre>              Constant Mean - GARCH Model Results                         <br>====================================================================<br>Dep. Variable:                    None   R-squared:       -56917.881<br>Mean Model:              Constant Mean   Adj. R-squared:  -56917.881<br>Vol Model:                       GARCH   Log-Likelihood:    -4173.44<br>Distribution: Standardized Student&#39;s t   AIC:                8364.88<br>Method:             Maximum Likelihood   BIC:                8414.15<br>                                         No. Observations:      1764<br>Date:                 Tue, Feb 28 2017   Df Residuals:          1755<br>Time:                         20:53:30   Df Model:                 9<br>                               Mean Model                               <br>====================================================================<br>               coef    std err        t      P&gt;|t|  95.0% Conf. Int.<br>--------------------------------------------------------------------<br>mu          -2.3189  9.829e-03 -235.934      0.000 [ -2.338, -2.300]<br>                            Volatility Model                              <br>====================================================================<br>               coef  std err      t     P&gt;|t|       95.0% Conf. Int.<br>--------------------------------------------------------------------<br>omega    1.2926e-04 2.212e-04 0.584     0.559 [-3.043e-04,5.628e-04]<br>alpha[1]     0.0170 1.547e-02 1.099     0.272 [-1.332e-02,4.733e-02]<br>alpha[2]     0.4638 0.207     2.241 2.500e-02    [5.824e-02,  0.869]<br>alpha[3]     0.5190 0.213     2.437 1.482e-02      [  0.102,  0.937]<br>beta[1]  7.9655e-05 0.333 2.394e-04     1.000      [ -0.652,  0.652]<br>beta[2]  3.8056e-05 0.545 6.980e-05     1.000      [ -1.069,  1.069]<br>beta[3]  1.6184e-03 0.312 5.194e-03     0.996      [ -0.609,  0.612]<br>                              Distribution                              <br>====================================================================<br>               coef    std err        t      P&gt;|t|  95.0% Conf. Int.<br>--------------------------------------------------------------------<br>nu           7.7912      0.362   21.531 8.018e-103 [  7.082,  8.500]<br>====================================================================<br><br>Covariance estimator: robust</pre><p>Let’s plot the residuals again</p><pre>tsplot(res.resid, lags=30)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yGasCrAn3o0DhSU_f_KvYA.png" /></figure><p>The plots looks like a realisation of a discrete white noise process, indicating a good fit. 
Let’s plot a square of residuals to be sure</p><pre>tsplot(res.resid**2, lags=30)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5IIGQ3lQGirskcxrOFUZSg.png" /></figure><p>We have what looks like a realisation of a discrete white noise process, indicating that we have “explained” the serial correlation present in the squared residuals with an appropriate mixture of ARIMA(p,d,q) and GARCH(p,q).</p><h3>Next Steps — Sample Trading Strategy</h3><p>We are now at the point in our time series analysis where we have studied ARIMA and GARCH, allowing us to fit a combination of these models to a stock market index, and to determine if we have achieved a good fit or not.</p><p>The next step is to actually produce forecasts of future daily returns values from this combination and use it to create a basic trading strategy for the S&amp;P500.</p><pre>import auquanToolbox.dataloader as dl</pre><pre>end = &#39;2016-11-30&#39;<br>start = &#39;2000-01-01&#39;<br>symbols = [&#39;SPX&#39;]<br>data = dl.load_data_nologs(&#39;nasdaq&#39;, symbols ,<br>                           start, end)[&#39;ADJ CLOSE&#39;]<br># log returns<br>lrets = np.log(data/data.shift(1)).dropna()</pre><h3>Strategy Overview</h3><p>Let’s try to create a simple strategy using our knowledge so far about ARIMA and GARCH models. The idea of this strategy is as below:</p><ul><li>Fit an ARIMA and GARCH model everyday on log of S&amp;P 500 returns for previous <strong><em>T</em></strong> days</li><li>Use the combined model to make a prediction for the next day’s return</li><li>If the prediction is positive, buy the stock and if negative, short the stock at today’s close</li><li>If the prediction is the same as the previous day then do nothing</li></ul><h3>Strategy Implementation</h3><p>Let’s start by choosing an appropriate window <strong><em>T o</em></strong>f previous days we are going to use to make a prediction. We are going to use <strong><em>T = 252 (1 year)</em></strong>, but this parameter should be optimised in order to improve performance or reduce drawdown.</p><pre>windowLength = 252</pre><p>We will now attempt to generate a trading signal for <em>length(data)- T </em>days</p><pre>foreLength = len(lrets) - windowLength<br>signal = 0*lrets[-foreLength:]</pre><p>To backtest our strategy, let’s loop through every day in the trading data and fit an appropriate ARIMA and GARCH model to the rolling window of length 252. 
We’ve defined the functions to fit ARIMA and GARCH above (Given that we try 32 separate ARIMA fits and fit a GARCH model, for each day, the indicator can take a long time to generate)</p><pre>for d in range(foreLength):<br>    <br>    # create a rolling window by selecting <br>    # values between d+1 and d+T of S&amp;P500 returns<br>    <br>    TS = lrets[(1+d):(windowLength+d)] <br>    <br>    # Find the best ARIMA fit <br>    # set d = 0 since we&#39;ve already taken log return of the series<br>    res_tup = _get_best_model(TS)<br>    order = res_tup[1]<br>    model = res_tup[2]<br>    <br>    #now that we have our ARIMA fit, we feed this to GARCH model<br>    p_ = order[0]<br>    o_ = order[1]<br>    q_ = order[2]<br>    <br>    am = arch_model(model.resid, p=p_, o=o_, q=q_, dist=&#39;StudentsT&#39;)<br>    res = am.fit(update_freq=5, disp=&#39;off&#39;)<br>    <br>    # Generate a forecast of next day return using our fitted model<br>    out = res.forecast(horizon=1, start=None, align=&#39;origin&#39;)<br>    <br>    #Set trading signal equal to the sign of forecasted return<br>    # Buy if we expect positive returns, sell if negative<br>      <br>    signal.iloc[d] = np.sign(out.mean[&#39;h.1&#39;].iloc[-1])</pre><p><strong>Note: </strong><em>The backtest is doesn&#39;t take commission or slippage into account, hence the performance achieved in a real trading system would be lower than what you see here.</em></p><h3>Strategy Results</h3><p>Now that we have generated our signals, we need to compare its performance to ‘<em>Buy and Hold</em>’: what would our returns be if we simply bought the S&amp;P 500 at the start of our backtest period.</p><pre>returns = pd.DataFrame(index = signal.index, <br>                       columns=[&#39;Buy and Hold&#39;, &#39;Strategy&#39;])<br>returns[&#39;Buy and Hold&#39;] = lrets[-foreLength:]<br>returns[&#39;Strategy&#39;] = signal[&#39;SPX&#39;]*returns[&#39;Buy and Hold&#39;]</pre><pre>eqCurves = pd.DataFrame(index = signal.index, <br>                       columns=[&#39;Buy and Hold&#39;, &#39;Strategy&#39;])<br>eqCurves[&#39;Buy and Hold&#39;]=returns[&#39;Buy and Hold&#39;].cumsum()+1<br>eqCurves[&#39;Strategy&#39;] = returns[&#39;Strategy&#39;].cumsum()+1</pre><pre>eqCurves[&#39;Strategy&#39;].plot(figsize=(10,8))<br>eqCurves[&#39;Buy and Hold&#39;].plot()<br>plt.legend()<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/877/1*n2A8nArK17y8zvydEKtz7A.png" /><figcaption>Long/Short SPX strategy based GARCH + ARIMA model from 2000–2016</figcaption></figure><p>We find the model does outperform a naive Buy and Hold strategy. However, the model doesn’t perform well all the time, you can see majority of the gains have happened during short durations in 2000–2001 and 2008. It seems there are certain market conditions when the model does exceedingly well.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/591/1*fGt9yD-V4GcJXEsPo51ghg.png" /><figcaption>Long/Short SPX strategy based GARCH + ARIMA model from 2000–2003</figcaption></figure><p>In periods of high volatility, or when S&amp;P 500 had periods of ‘<em>sell-off’ , such as 2000–2002 or the crash of 2008–09, </em>the strategy does extremely well, possibly because our GARCH model captures the conditional volatility well. 
During periods of uptrend in the S&amp;P 500, such as the bull run from 2002–2007, the model performs on par with the S&amp;P 500.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/610/1*1TwYcwsFpDZODuevVWfxiQ.png" /><figcaption>Long/Short SPX strategy based on GARCH + ARIMA model from 2003–2007 bull period</figcaption></figure><p>In the current bull run from 2009, the model has performed poorly compared to the S&amp;P 500. The index has behaved more like a stochastic trend over this period, and the model’s performance has suffered as a result.</p><p>There are some caveats here: we don’t account for slippage or trading costs, which would significantly eat into profits. Also, we’ve performed the backtest on a stock market index and not a tradeable instrument. Ideally, we should perform the same modelling and backtest on S&amp;P500 futures or an Exchange Traded Fund (ETF) like SPY.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/591/1*-6ZJBDCrdJe-Q-60dl7Hgg.png" /><figcaption>Long/Short SPX strategy based on GARCH + ARIMA model during crash of 2008–09</figcaption></figure><p>This strategy can easily be applied to other stock market indices, other regions, equities or other asset classes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/591/1*QK6Wi_EtIaIxP8Bp-lfBoA.png" /><figcaption>Long/Short SPX strategy based on GARCH + ARIMA model from 2009-present</figcaption></figure><p>You should try researching other instruments, playing with the window parameters, and seeing if you can improve on the results presented here. Other improvements to the strategy could include buying/selling only if the predicted return is above or below a certain threshold, incorporating the variance of the prediction into the strategy, etc.</p><p>If you do find interesting strategies, participate in our competition, <a href="http://quant-quest.auquan.com/">QuantQuest</a>, and earn profit shares on your strategies!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=822f87f1d755" width="1" height="1" alt=""><hr><p><a href="https://medium.com/auquan/time-series-analysis-for-finance-arch-garch-models-822f87f1d755">Time Series Analysis for Financial Data VI— GARCH model and predicting SPX returns</a> was originally published in <a href="https://medium.com/auquan">auquan</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Time Series Analysis for Financial Data V — ARIMA Models]]></title>
            <link>https://medium.com/auquan/time-series-analysis-for-finance-arima-models-acb5e39999df?source=rss-2a45747180c6------2</link>
            <guid isPermaLink="false">https://medium.com/p/acb5e39999df</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[math]]></category>
            <category><![CDATA[finance]]></category>
            <category><![CDATA[mathematics]]></category>
            <dc:creator><![CDATA[Auquan]]></dc:creator>
            <pubDate>Thu, 07 Dec 2017 16:01:01 GMT</pubDate>
            <atom:updated>2017-12-13T22:15:56.486Z</atom:updated>
            <content:encoded><![CDATA[<p>Download IPython Notebook <a href="https://github.com/Auquan/Tutorials/blob/master/Time%20Series%20Analysis%20-%203.ipynb">here</a>.</p><p>In the <a href="https://medium.com/auquan/time-series-analysis-for-finance-arma-models-21695e14c999">previous posts in this series</a>, we combined the Autoregressive models and Moving Average models to produce Auto Regressive Moving Average (ARMA) models. We found that we were still unable to fully explain autocorrelation or obtain residuals that are discrete white noise.</p><p>Let’s further extend this discussion of merging AR and MA models to Auto Regressive Integrated Moving Average (ARIMA) models and see what we get.</p><h3>Autoregressive Integrated Moving Average Models — ARIMA(p, d, q)</h3><p>ARIMA is a natural extension to the class of ARMA models — they can reduce a non-stationary series to a stationary series using a sequence of differences.</p><p>We’ve seen that many of our time series are not stationary; however, they can be made stationary by differencing. We saw an example of this in <a href="https://medium.com/auquan/time-series-analysis-for-financial-data-part-1-stationarity-autocorrelation-and-white-noise-1a1cc2fb23f2">part 1 of this series</a> when we took the first difference of a non-stationary Gaussian random walk and showed that it equals stationary white noise.</p><p>ARIMA essentially performs the same function, but does so repeatedly, <em>d</em> times, in order to reduce a non-stationary series to a stationary one.</p><blockquote>A time series <strong><em>x(t)</em></strong> is integrated of order <em>d</em> if differencing the series d times results in a discrete white noise series.</blockquote><p>A time series <em>x(t)</em> is an <em>ARIMA(p,d,q)</em> model if the series is differenced <em>d</em> times, and it then follows an <em>ARMA(p,q)</em> process.</p><p>Let’s simulate an ARIMA(2,1,1) model, with alphas equal to [0.5,-0.25] and beta equal to [-0.5]. 
We will fit an ARIMA model to our simulated data, attempt to recover the parameters.</p><pre><em># Simulate an ARIMA(2,1,1) model <br># alphas=[0.5,-0.25] <br># betas=[-0.5]</em><br><br>max_lag = 30<br><br>n = int(5000)<br>burn = 2000<br><br>alphas = np.array([0.5,-0.25])<br>betas = np.array([-0.5])<br><br>ar = np.r_[1, -alphas]<br>ma = np.r_[1, betas]<br><br>arma11 = smt.arma_generate_sample(ar=ar, ma=ma, nsample=n, burnin=burn)<br>arima111 = arma11.cumsum()<br>_ = tsplot(arima111, lags=max_lag)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/712/1*Mw6AP8yk_2GGQCnz_Kzn8g.png" /><figcaption><em>ARIMA(2,1,1) model</em></figcaption></figure><pre><em># Fit ARIMA(p, d, q) model</em><br><em># pick best order and final model based on aic</em><br><br>best_aic = np.inf <br>best_order = None<br>best_mdl = None<br><br>pq_rng = range(5) <em># [0,1,2,3]</em><br>d_rng = range(2) <em># [0,1]</em><br><strong>for</strong> i <strong>in</strong> pq_rng:<br>    <strong>for</strong> d <strong>in</strong> d_rng:<br>        <strong>for</strong> j <strong>in</strong> pq_rng:<br>            <strong>try</strong>:<br>                tmp_mdl = smt.ARIMA(arima111, <br>                                    order=(i,d,j)).fit(method=&#39;mle&#39;,<br>                                    trend=&#39;nc&#39;)<br>                tmp_aic = tmp_mdl.aic<br>                <strong>if</strong> tmp_aic &lt; best_aic:<br>                    best_aic = tmp_aic<br>                    best_order = (i, d, j)<br>                    best_mdl = tmp_mdl<br>            <strong>except</strong>: <strong>continue</strong><br><br><br><strong>print</strong>(&#39;aic: <strong>%6.2f</strong> | order: <strong>%s</strong>&#39;%(best_aic, best_order))<br><br><em># ARIMA model resid plot</em><br>_ = tsplot(best_mdl.resid, lags=30)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/712/1*xHqf1rPHVJ3ch6rlkdEyzw.png" /><figcaption>Residuals of ARIMA(2,1,1) fit</figcaption></figure><p><strong>aic: 14227.34 | order: (2, 1, 1)</strong></p><p>As expected, we predict a ARIMA(2,1,1) model and the residuals looking like a realisation of discrete white noise:</p><pre>sms.diagnostic.acorr_ljungbox(best_mdl.resid, lags=[20], boxpierce=False)</pre><pre><strong>from</strong> <strong>statsmodels.stats.stattools</strong> <strong>import</strong> jarque_bera</pre><pre>score, pvalue, _, _ = jarque_bera(mdl.resid)</pre><pre><strong>if</strong> pvalue &lt; 0.10:<br>    <strong>print</strong> &#39;The residuals may not be normally distributed.&#39;<br><strong>else</strong>:<br>    <strong>print</strong> &#39;The residuals seem normally distributed.&#39;</pre><pre>(array([ 13.88378716]), array([ 0.83633895]))<br>The residuals seem normally distributed.</pre><p>We perform the Ljung-Box test and find the p-value is significantly larger than 0.05 and as such we can state that there is strong evidence for discrete white noise being a good fit to the residuals. Hence, the ARIMA(2,1,1) model is a good fit, as expected.</p><h3>Modelling SPX returns</h3><p>Let’s now iterate through a non-trivial number of combinations of (p, d, q) orders, to find the best ARIMA model to fit SPX returns. We use the AIC to evaluate each model. 
The lowest AIC wins.</p><pre><em># Fit ARIMA(p, d, q) model to SPX log returns</em><br><em># pick best order and final model based on aic</em><br><br>best_aic = np.inf <br>best_order = None<br>best_mdl = None<br><br>pq_rng = range(5) <em># [0,1,2,3]</em><br>d_rng = range(2) <em># [0,1]</em><br><strong>for</strong> i <strong>in</strong> pq_rng:<br>    <strong>for</strong> d <strong>in</strong> d_rng:<br>        <strong>for</strong> j <strong>in</strong> pq_rng:<br>            <strong>try</strong>:<br>                tmp_mdl = smt.ARIMA(lrets.SPX, <br>                          order=(i,d,j)).fit(method=&#39;mle&#39;,<br>                          trend=&#39;nc&#39;)<br>                tmp_aic = tmp_mdl.aic<br>                <strong>if</strong> tmp_aic &lt; best_aic:<br>                    best_aic = tmp_aic<br>                    best_order = (i, d, j)<br>                    best_mdl = tmp_mdl<br>            <strong>except</strong>: <strong>continue</strong><br><br><br><strong>print</strong>(&#39;aic: {:6.2f} | order: {}&#39;.format(best_aic, best_order))<br><br><em># ARIMA model resid plot</em><br>_ = tsplot(best_mdl.resid, lags=30)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/712/1*J9OUMYYRcuThtR-KFda_9g.png" /><figcaption>Residuals of Modelling SPX returns from 2007–2015 as ARIMA(3,0,2) model</figcaption></figure><pre>aic: -11515.95 | order: (3, 0, 2)</pre><p>Note that the best model has a differencing of 0. This is expected because we already took the first difference of log prices to calculate the stock returns. The result is essentially identical to the <a href="https://medium.com/auquan/time-series-analysis-for-finance-arma-models-21695e14c999">ARMA(3, 2) model we fit in the previous post.</a> Clearly this ARIMA model has not explained the conditional volatility in the series either! 
The Ljung-Box test below also shows a p-value of less than 0.05.</p><pre>sms.diagnostic.acorr_ljungbox(best_mdl.resid, lags=[20], boxpierce=False)</pre><p><strong>(array([ 39.20689685]), array([ 0.00628326]))</strong></p><h3>Excluding periods of Conditional Volatility</h3><p>Let’s now try the same model on SPX data from 2010–2016.</p><pre>end = &#39;2016-01-01&#39;<br>start = &#39;2010-01-01&#39;<br>symbols = [&#39;SPX&#39;]<br>data = dl.load_data_nologs(&#39;nasdaq&#39;, symbols , start, end)[&#39;ADJ CLOSE&#39;]<br><em># log returns</em><br>lrets = np.log(data/data.shift(1)).dropna()</pre><pre><em># Fit ARIMA(p, d, q) model to SPX log returns</em><br><em># pick best order and final model based on aic</em></pre><pre>best_aic = np.inf <br>best_order = None<br>best_mdl = None</pre><pre>pq_rng = range(5) <em># [0,1,2,3,4]</em><br>d_rng = range(2) <em># [0,1]</em><br><strong>for</strong> i <strong>in</strong> pq_rng:<br>    <strong>for</strong> d <strong>in</strong> d_rng:<br>        <strong>for</strong> j <strong>in</strong> pq_rng:<br>            <strong>try</strong>:<br>                tmp_mdl = smt.ARIMA(lrets.SPX, <br>                order=(i,d,j)).fit(method=&#39;mle&#39;, trend=&#39;nc&#39;)<br>                tmp_aic = tmp_mdl.aic<br>                <strong>if</strong> tmp_aic &lt; best_aic:<br>                    best_aic = tmp_aic<br>                    best_order = (i, d, j)<br>                    best_mdl = tmp_mdl<br>            <strong>except</strong>: <strong>continue</strong><br></pre><pre><strong>print</strong>(&#39;aic: {:6.2f} | order: {}&#39;.format(best_aic, best_order))</pre><pre><em># ARIMA model resid plot</em><br>_ = tsplot(best_mdl.resid, lags=30)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/712/1*ogotNbz5gIe-jUN0kg937Q.png" /><figcaption>Residuals of Modelling SPX returns from 2010–2016 as ARIMA(3,0,3) model</figcaption></figure><pre>aic: -9622.34 | order: (3, 0, 3)</pre><p>Our residuals look much closer to white noise! How did our model suddenly improve?</p><pre>sms.diagnostic.acorr_ljungbox(best_mdl.resid, lags=[20], boxpierce=False)</pre><pre><strong>from</strong> <strong>statsmodels.stats.stattools</strong> <strong>import</strong> jarque_bera<br><br>score, pvalue, _, _ = jarque_bera(best_mdl.resid)<br><br><strong>if</strong> pvalue &lt; 0.10:<br>    <strong>print</strong> &#39;The residuals may not be normally distributed.&#39;<br><strong>else</strong>:<br>    <strong>print</strong> &#39;The residuals seem normally distributed.&#39;</pre><pre>(array([ 18.93350068]), array([ 0.52615227]))</pre><pre>The residuals seem normally distributed.</pre><p>The p-value of our test is now greater than 0.05!</p><p>We deliberately truncated the S&amp;P500 data to start from 2010 onwards, which conveniently excludes the volatile periods around 2007–2008. Hence we have excluded a large portion of the S&amp;P500 where we had excessive volatility clustering. This impacts the serial correlation of the series and hence has the effect of making the series seem “more stationary” than it has been in the past.</p><blockquote><em>This is a very important point</em>. When analysing time series we need to be extremely careful of conditionally <a href="https://en.wikipedia.org/wiki/Heteroscedasticity">heteroscedastic</a> series, such as stock market indexes. In quantitative finance, trying to determine periods of differing volatility is often known as “regime detection”. 
It is one of the harder tasks to achieve!</blockquote><h3>Time Series Forecasting</h3><p>Finally, we are able to do what we actually set out to do! We have at least accumulated enough knowledge to make a simple forecast of future returns. We use statmodels forecast() method — we need to provide the number of time steps to predict, and a decimal for the alpha argument to specify the confidence intervals. The default setting is 95% confidence. For 99% set alpha equal to 0.01.</p><pre><em># Create a 21 day forecast of SPY returns with 95%, 99% CI</em></pre><pre>n_steps = 21<br><br>f, err95, ci95 = best_mdl.forecast(steps=n_steps) <em># 95% CI</em><br>_, err99, ci99 = best_mdl.forecast(steps=n_steps, alpha=0.01) <em># 99% </em><br><br>idx = pd.date_range(data.index[-1], periods=n_steps, freq=&#39;D&#39;)<br>fc_95 = pd.DataFrame(np.column_stack([f, ci95]), index=idx, columns=<br>                     [&#39;forecast&#39;, &#39;lower_95&#39;, &#39;upper_95&#39;])<br>fc_99 = pd.DataFrame(np.column_stack([ci99]), index=idx, columns=<br>                     [&#39;lower_99&#39;, &#39;upper_99&#39;])<br>fc_all = fc_95.combine_first(fc_99)<br>fc_all.head()</pre><pre>           |forecast | lower_95  |  lower_99 | upper_95 | upper_99 |<br>--------------------------------------------------------------------<br>2015-12-31 |-0.00079 | -0.020347 | -0.026490 | 0.018754 | 0.024897 |<br>2016-01-01 |0.000004 | -0.019563 | -0.025712 | 0.019572 | 0.025721 |<br>2016-01-02 |0.000358 | -0.019215 | -0.025365 | 0.019931 | 0.026081 |<br>2016-01-03 |0.000667 | -0.018968 | -0.025138 | 0.020302 | 0.026472 |<br>2016-01-04 |0.000586 | -0.019051 | -0.025222 | 0.020223 | 0.026394 |</pre><pre># Plot 21 day forecast for SPX returns</pre><pre>plt.style.use(&#39;bmh&#39;)<br>fig = plt.figure(figsize=(15,10))<br>ax = plt.gca()</pre><pre>ts = lrets.SPX.iloc[-500:].copy()<br>ts.plot(ax=ax, label=&#39;SPX Returns&#39;)<br># in sample prediction<br>pred = best_mdl.predict(ts.index[0], ts.index[-1])<br>pred.plot(ax=ax, style=&#39;r-&#39;, label=&#39;In-sample prediction&#39;)</pre><pre>styles = [&#39;b-&#39;, &#39;0.2&#39;, &#39;0.75&#39;, &#39;0.2&#39;, &#39;0.75&#39;]<br>fc_all.plot(ax=ax, style=styles)</pre><pre>plt.fill_between(fc_all.index, fc_all.lower_95, <br>                 fc_all.upper_95, color=&#39;gray&#39;, alpha=0.7)<br>plt.fill_between(fc_all.index, fc_all.lower_99, <br>                 fc_all.upper_99, color=&#39;gray&#39;, alpha=0.2)<br>plt.title(&#39;{} Day SPX Return Forecast\nARIMA{}&#39;.format(n_steps,<br>                 best_order))</pre><pre>plt.legend(loc=&#39;best&#39;, fontsize=10)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/885/1*23bGsJzszyQ9DswcH1K0Vg.png" /><figcaption>SPX returns forecasted by ARIMA model</figcaption></figure><p>Now that we have the ability to fit and forecast models such as ARIMA, we’re very close to being able to create strategy indicators for trading.</p><p>You can already start analysing different time series, like difference between prices of two correlated stocks, and try the above models to check for stationarity. Once you find a model that fits the series well enough to leave white noise like residuals, you can start using that model for forecasting future values. 
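For example, here is a minimal, illustrative sketch of forming the spread between two correlated stocks and running a Dickey-Fuller test, one common stationarity check, on it before fitting a model. The tickers are just examples, and it assumes a price DataFrame containing both stocks, loaded with the same dataloader used earlier in this series:</p><pre><em># Illustrative sketch: build a spread series from two correlated stocks</em><br><em># and check it for stationarity before fitting the models above</em><br><strong>from</strong> <strong>statsmodels.tsa.stattools</strong> <strong>import</strong> adfuller<br><br>spread = np.log(data[&#39;AAPL&#39;]) - np.log(data[&#39;MSFT&#39;])  <em># example pair</em><br>adf_stat, pvalue = adfuller(spread.dropna())[:2]<br><strong>print</strong>(&#39;ADF p-value: {:.4f}&#39;.format(pvalue))  <em># a small p-value suggests stationarity</em></pre><p>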
And that’s really all there is to forecasting!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=acb5e39999df" width="1" height="1" alt=""><hr><p><a href="https://medium.com/auquan/time-series-analysis-for-finance-arima-models-acb5e39999df">Time Series Analysis for Financial Data V — ARIMA Models</a> was originally published in <a href="https://medium.com/auquan">auquan</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Time Series Analysis for Financial Data IV— ARMA Models]]></title>
            <link>https://medium.com/auquan/time-series-analysis-for-finance-arma-models-21695e14c999?source=rss-2a45747180c6------2</link>
            <guid isPermaLink="false">https://medium.com/p/21695e14c999</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[math]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[finance]]></category>
            <category><![CDATA[mathematics]]></category>
            <dc:creator><![CDATA[Auquan]]></dc:creator>
            <pubDate>Thu, 07 Dec 2017 15:31:01 GMT</pubDate>
            <atom:updated>2017-12-19T15:13:24.413Z</atom:updated>
            <content:encoded><![CDATA[<p>Download IPython Notebook <a href="https://github.com/Auquan/Tutorials/blob/master/Time%20Series%20Analysis%20-%203.ipynb">here</a>.</p><p>In the <a href="https://medium.com/auquan/time-series-analysis-ii-auto-regressive-models-d0cb1a8a7c43">previous posts in this series</a>, we talked about Auto-Regressive Models and Moving Average Models and found that both these models only partially explained the log-returns of stock prices.</p><p>We now combine the Autoregressive models and Moving Average models to produce more sophisticated models — Auto Regressive Moving Average (ARMA) and Auto Regressive Integrated Moving Average (ARIMA) models.</p><h3>Auto Regressive Moving Average (ARMA) Models</h3><p>The ARMA model is simply a merger of the AR(p) and MA(q) models:</p><ul><li>AR(p) models try to explain the momentum and mean reversion effects often observed in trading markets (market participant effects).</li><li>MA(q) models try to capture the shock effects observed in the white noise terms. These shock effects could be thought of as unexpected events affecting the observation process, e.g. surprise earnings, wars, attacks, etc.</li></ul><p>The ARMA model attempts to capture both of these aspects when modelling financial time series. It does not take into account volatility clustering, a key empirical phenomenon of many financial time series, which we will discuss later.</p><p>The ARMA(1,1) model is:</p><blockquote>x(t) = a*x(t-1) + b*e(t-1) + e(t)</blockquote><p>where e(t) is white noise with E[e(t)] = 0</p><p>An ARMA model will often require fewer parameters than an AR(p) or MA(q) model alone; that is, it gives a more parsimonious representation of the same behaviour.</p><p>Let’s try to simulate an ARMA(2, 2) process with given parameters, then fit an ARMA(2, 2) model and see if it can correctly estimate those parameters. 
Set alphas equal to [0.5,-0.25] and betas equal to [0.5,-0.3].</p><pre><strong>import</strong> <strong>pandas</strong> <strong>as</strong> <strong>pd</strong><br><strong>import</strong> <strong>numpy</strong> <strong>as</strong> <strong>np</strong><br><br><strong>import</strong> <strong>statsmodels.tsa.api</strong> <strong>as</strong> <strong>smt</strong><br><strong>import</strong> <strong>statsmodels.api</strong> <strong>as</strong> <strong>sm</strong><br><strong>import</strong> <strong>scipy.stats</strong> <strong>as</strong> <strong>scs</strong><br><strong>import</strong> <strong>statsmodels.stats</strong> <strong>as</strong> <strong>sms</strong><br><br><strong>import</strong> <strong>matplotlib.pyplot</strong> <strong>as</strong> <strong>plt</strong><br>%matplotlib inline</pre><pre><strong>def</strong> tsplot(y, lags=None, figsize=(10, 8), style=&#39;bmh&#39;):<br>    <strong>if</strong> <strong>not</strong> isinstance(y, pd.Series):<br>        y = pd.Series(y)<br>    <strong>with</strong> plt.style.context(style):    <br>        fig = plt.figure(figsize=figsize)<br>        layout = (3, 2)<br>        ts_ax = plt.subplot2grid(layout, (0, 0), colspan=2)<br>        acf_ax = plt.subplot2grid(layout, (1, 0))<br>        pacf_ax = plt.subplot2grid(layout, (1, 1))<br>        qq_ax = plt.subplot2grid(layout, (2, 0))<br>        pp_ax = plt.subplot2grid(layout, (2, 1))<br>        <br>        y.plot(ax=ts_ax)<br>        ts_ax.set_title(&#39;Time Series Analysis Plots&#39;)<br>        smt.graphics.plot_acf(y, lags=lags, ax=acf_ax, alpha=0.05)<br>        smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax, alpha=0.05)<br>        sm.qqplot(y, line=&#39;s&#39;, ax=qq_ax)<br>        qq_ax.set_title(&#39;QQ Plot&#39;)        <br>        scs.probplot(y, sparams=(y.mean(), y.std()), plot=pp_ax)<br><br>        plt.tight_layout()<br>    <strong>return</strong></pre><pre><em># Simulate an ARMA(2, 2) model<br># alphas=[0.5,-0.25]<br># betas=[0.5,-0.3]</em></pre><pre>max_lag = 30<br><br>n = int(5000) <em># lots of samples to help estimates</em><br>burn = int(n/10) <em># number of samples to discard before fit</em><br><br>alphas = np.array([0.5, -0.25])<br>betas = np.array([0.5, -0.3])<br>ar = np.r_[1, -alphas]<br>ma = np.r_[1, betas]<br><br>arma22 = smt.arma_generate_sample(ar=ar, ma=ma, nsample=n, burnin=burn)<br>_ = tsplot(arma22, lags=max_lag)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/712/1*H1Gkixt8vG_3Rj9jtOmwOg.png" /><figcaption>ARMA(2,2) process</figcaption></figure><pre>mdl = smt.ARMA(arma22, order=(2, 2)).fit(<br>    maxlag=max_lag, method=&#39;mle&#39;, trend=&#39;nc&#39;, burnin=burn)<br><strong>print</strong>(mdl.summary())</pre><pre>                          ARMA Model Results                              <br>====================================================================<br>Dep. Variable:              y   No. Observations:         5000<br>Model:             ARMA(2, 2)   Log Likelihood       -7054.211<br>Method:                   mle   S.D. 
of innovations      0.992<br>Date:        Mon, 27 Feb 2017   AIC                  14118.423<br>Time:                21:27:58   BIC                  14151.009<br>Sample:                     0   HQIC                 14129.844<br>                                                                              <br>====================================================================<br>               coef  std err        z    P&gt;|z|    [0.025      0.975]<br>--------------------------------------------------------------------<br>ar.L1.y      0.5476    0.058    9.447    0.000     0.434       0.661<br>ar.L2.y     -0.2566    0.015  -17.288    0.000    -0.286      -0.228<br>ma.L1.y      0.4548    0.060    7.622    0.000     0.338       0.572<br>ma.L2.y     -0.3432    0.055   -6.284    0.000    -0.450      -0.236<br>                                    Roots                                    <br>====================================================================<br>               Real         Imaginary         Modulus      Frequency<br>--------------------------------------------------------------------<br>AR.1          1.0668         -1.6609j          1.9740        -0.1591<br>AR.2          1.0668         +1.6609j          1.9740         0.1591<br>MA.1         -1.1685         +0.0000j          1.1685         0.5000<br>MA.2          2.4939         +0.0000j          2.4939         0.0000<br>--------------------------------------------------------------------</pre><p>If you run the above code a few times, you may notice that the confidence intervals for some coefficients may not actually contain the original parameter value. This outlines the danger of attempting to fit models to data, even when we know the true parameter values!</p><p>However, for trading purposes we just need to have a predictive power that exceeds chance and produces enough profit above transaction costs, in order to be profitable in the long run.</p><p><strong>So how do we decide the values of <em>p</em> and <em>q</em> ?</strong></p><p>We exapnd on the method described in previous sheet. To fit data to an ARMA model, we use the <a href="https://en.wikipedia.org/wiki/Akaike_information_criterion">Akaike Information Criterion (AIC)</a>across a subset of values for <em>p,q</em> to find the model with minimum AIC and then apply the <a href="https://en.wikipedia.org/wiki/Ljung%E2%80%93Box_test">Ljung-Box test</a> to determine if a good fit has been achieved, for particular values of <em>p,q</em>. 
If the p-value of the test is greater the required significance, we can conclude that the residuals are independent and white noise.</p><pre><em># Simulate an ARMA(3, 2) model <br># alphas=[0.5,-0.4,0.25] <br># betas=[0.5,-0.3]</em><br><br>max_lag = 30<br><br>n = int(5000)<br>burn = 2000<br><br>alphas = np.array([0.5, -0.4, 0.25])<br>betas = np.array([0.5, -0.3])<br><br>ar = np.r_[1, -alphas]<br>ma = np.r_[1, betas]<br><br>arma32 = smt.arma_generate_sample(ar=ar, ma=ma, nsample=n, burnin=burn)<br>_ = tsplot(arma32, lags=max_lag)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/703/1*lX0PXtHx3edrzTlhFdnRsg.png" /><figcaption>ARMA(3,2) model</figcaption></figure><pre><em># pick best order by aic </em><br><em># smallest aic value wins</em><br>best_aic = np.inf <br>best_order = None<br>best_mdl = None<br><br>rng = range(5)<br><strong>for</strong> i <strong>in</strong> rng:<br>    <strong>for</strong> j <strong>in</strong> rng:<br>        <strong>try</strong>:<br>            tmp_mdl = smt.ARMA(arma32,<br>                      order=(i, j)).fit(method=&#39;mle&#39;, trend=&#39;nc&#39;)<br>            tmp_aic = tmp_mdl.aic<br>            <strong>if</strong> tmp_aic &lt; best_aic:<br>                best_aic = tmp_aic<br>                best_order = (i, j)<br>                best_mdl = tmp_mdl<br>        <strong>except</strong>: <strong>continue</strong><br><br><br><strong>print</strong>(&#39;aic: <strong>%6.2f</strong> | order: <strong>%s</strong>&#39;%(best_aic, best_order))</pre><p><strong>aic: 14110.88 | order: (3, 2)</strong></p><pre>sms.diagnostic.acorr_ljungbox(best_mdl.resid, lags=[20], boxpierce=False)</pre><p><strong>(array([ 11.602271]), array([ 0.92908567]))</strong></p><p>Notice that the p-value is greater than 0.05, which states that the residuals are independent at the 95% level and thus an ARMA(3,2) model provides a good model fit (ofcourse, we knew that).</p><p>Let’s also check if the model residuals are indeed white noise</p><pre>_ = tsplot(best_mdl.resid, lags=max_lag)</pre><pre><strong>from</strong> <strong>statsmodels.stats.stattools</strong> <strong>import</strong> jarque_bera<br><br>score, pvalue, _, _ = jarque_bera(mdl.resid)<br><br><strong>if</strong> pvalue &lt; 0.10:<br>    <strong>print</strong> &#39;The residuals may not be normally distributed.&#39;<br><strong>else</strong>:<br>    <strong>print</strong> &#39;The residuals seem normally distributed.&#39;</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/712/1*QuoyFoIbEipDoCMFiRPJ0g.png" /><figcaption>Residuals after finding best fit for ARMA(3,2)</figcaption></figure><pre>The residuals seem normally distributed.</pre><p>Finally, let’s fit an ARMA model to SPX returns.</p><pre><strong>import</strong> <strong>auquanToolbox.dataloader</strong> <strong>as</strong> <strong>dl</strong></pre><pre># download data<br>end = &#39;2015-01-01&#39;<br>start = &#39;2007-01-01&#39;<br>symbols = [&#39;SPX&#39;,&#39;DOW&#39;,&#39;AAPL&#39;,&#39;MSFT&#39;]<br>data = dl.load_data_nologs(&#39;nasdaq&#39;, symbols , start, end)[&#39;ADJ CLOSE&#39;]<br><em># log returns</em><br>lrets = np.log(data/data.shift(1)).dropna()</pre><pre><em># Fit ARMA model to SPY returns</em><br><br>best_aic = np.inf <br>best_order = None<br>best_mdl = None<br><br>rng = range(5) <em># [0,1,2,3,4,5]</em><br><strong>for</strong> i <strong>in</strong> rng:<br>    <strong>for</strong> j <strong>in</strong> rng:<br>        <strong>try</strong>:<br>            tmp_mdl = smt.ARMA(lrets.SPX, order=(i, j)).fit(<br>                      
method=&#39;mle&#39;, trend=&#39;nc&#39;)<br>            tmp_aic = tmp_mdl.aic<br>            <strong>if</strong> tmp_aic &lt; best_aic:<br>                best_aic = tmp_aic<br>                best_order = (i, j)<br>                best_mdl = tmp_mdl<br>        <strong>except</strong>: <strong>continue</strong><br><br><br><strong>print</strong>(&#39;aic: {:6.2f} | order: {}&#39;.format(best_aic, best_order))<br><br>_ = tsplot(best_mdl.resid, lags=max_lag)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/712/1*wNyHTA2w1qC0cEsM9EbA-g.png" /><figcaption>Residuals after fitting ARMA(3,2) to SPX returns from 2007–2015</figcaption></figure><pre>aic: -11515.95 | order: (3, 2)</pre><p>The best fitting model is an ARMA(3,2). Notice that there are some significant peaks, especially at higher lags. This is indicative of a poor fit. Let’s perform a Ljung-Box test to see if we have statistical evidence for this:</p><pre>sms.diagnostic.acorr_ljungbox(best_mdl.resid, lags=[20], boxpierce=False)</pre><p><strong>(array([ 39.20681465]), array([ 0.00628341]))</strong></p><p>As we suspected, the p-value is less than 0.05 and as such we cannot say that the residuals are a realisation of discrete white noise. Hence there is additional autocorrelation in the residuals that is not explained by the fitted ARMA(3,2) model. This is obvious from the plot of residuals as well: we can see areas of clear conditional volatility (heteroskedasticity) that the model has not captured.</p><p>In the <a href="https://medium.com/auquan/time-series-analysis-for-finance-arima-models-acb5e39999df">next post, we will take this concept of merging AR and MA models even further and discuss ARIMA models</a>.</p><p>We will also finally talk about how everything we’ve learned so far can be used for forecasting future values of any time series. Stay tuned!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=21695e14c999" width="1" height="1" alt=""><hr><p><a href="https://medium.com/auquan/time-series-analysis-for-finance-arma-models-21695e14c999">Time Series Analysis for Financial Data IV— ARMA Models</a> was originally published in <a href="https://medium.com/auquan">auquan</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Application of Machine Learning Techniques to Trading]]></title>
            <link>https://medium.com/auquan/https-medium-com-auquan-machine-learning-techniques-trading-b7120cee4f05?source=rss-2a45747180c6------2</link>
            <guid isPermaLink="false">https://medium.com/p/b7120cee4f05</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[trading]]></category>
            <category><![CDATA[finance]]></category>
            <category><![CDATA[learning]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Auquan]]></dc:creator>
            <pubDate>Wed, 01 Nov 2017 16:41:27 GMT</pubDate>
            <atom:updated>2018-01-12T11:52:18.015Z</atom:updated>
            <content:encoded><![CDATA[<p><a href="http://auquan.com/">Auquan</a> recently concluded another version of <a href="http://quant-quest.auquan.com/">QuantQuest</a>, and this time, we had a lot of people attempt Machine Learning with our problems. It was good learning for both us and them (hopefully!). This post is inspired by our observations of some common caveats and pitfalls during the competition when trying to apply ML techniques to trading problems.</p><p>If you haven’t read our previous posts, we recommend going through <a href="https://medium.com/auquan/beginners-guide-to-quantitative-trading-ii-developing-automated-trading-systems-4c967e544f34">our guide on building automated systems</a> and <a href="https://medium.com/auquan/developing-trading-strategies-4fc71b41d64b">A Systematic Approach to Developing Trading Strategies</a> before this post.</p><h3>Creating a Trade Strategy</h3><p>The final output of a trading strategy should answer the following questions:</p><ul><li><em>DIRECTION:</em> identify if an asset is cheap/expensive/fair value</li><li><em>ENTRY TRADE:</em> if an asset is cheap/expensive, should you buy/sell it</li><li><em>EXIT TRADE:</em> if an asset is fair priced and if we hold a position in that asset (bought or sold it earlier), should you exit that position</li><li><em>PRICE RANGE:</em> which price (or range) to make this trade at</li><li><em>QUANTITY:</em> Amount of capital to trade (for example, shares of a stock)</li></ul><p>Machine Learning can be used to answer each of these questions, but for the rest of this post, we will focus on answering the first, Direction of trade.</p><h3>Strategy Approach</h3><p>There are two broad approaches to building strategies: model based or data mining. These are essentially opposite approaches. In <em>model-based strategy building</em>, we start with a model of a market inefficiency, construct a mathematical representation (e.g. price, returns) and test its validity in the long term. This model is usually a simplified representation of the true complex model, and its long-term significance and stability need to be verified. Common trend-following, mean reversion and arbitrage strategies fall in this category.</p><p>On the other hand, in the<em> data mining approach</em> we first look for price patterns and attempt to fit an algorithm to them. What causes these patterns is not important, only that the patterns identified will continue to repeat in the future. This is a blind approach and we need rigorous checks to distinguish real patterns from random patterns. Trial-and-error TA, candle patterns and regression on a large number of features fall in this category.</p><p>Clearly, Machine Learning lends itself easily to the data mining approach. Let’s look into how we can use ML to create a trade signal by data mining.</p><p>You can follow the steps in this model using <a href="https://github.com/Auquan/Tutorials/blob/master/Simple%20ML%20Strategies%20to%20generate%20Trading%20Signal.ipynb"><strong><em>this IPython notebook</em></strong></a>. The code samples use <a href="https://bitbucket.org/auquan/auquantoolbox">Auquan’s python based free and open source toolbox</a>. You can install it via pip: `<em>pip install -U auquan_toolbox`</em>. We use scikit-learn for ML models. 
Install it using<em> `pip install -U scikit-learn`.</em></p><h3>Using ML to create a Trading Strategy Signal — Data Mining</h3><p>Before we begin, a sample ML problem setup looks like the figure below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/879/1*GMf5ff3gSLGZtXiFN-aTcQ.jpeg" /><figcaption>Sample ML problem setup</figcaption></figure><p>We create features which could have some predictive power (X), a target variable that we’d like to predict (Y) and use historical data to train an ML model that can predict Y as close as possible to the actual value. Finally, we use this model to make predictions on new data where Y is unknown. This leads to our first step:</p><h3>Step 1 — Setup your problem</h3><blockquote>What are you trying to predict? What is a good prediction? How do you evaluate it?</blockquote><p>In our framework above, what is Y?</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/908/1*A_l0s1-aR2VaPYgyZ1p2jQ.jpeg" /><figcaption>What are you trying to predict?</figcaption></figure><p>Are you predicting <em>Price at a future time, future Return/Pnl, a Buy/Sell Signal, an Optimal Portfolio Allocation, Efficient Execution etc?</em><br>Let’s say we’re trying to predict price at the next time stamp. In that case, Y(t) = Price(t+1). Now we can complete our framework with historical data.</p><blockquote><em>Note Y(t) will only be known during a backtest, but when using our model live, we won’t know Price(t+1) at time t. We make a prediction Y(Predicted,t) using our model and compare it with the actual value only at time t+1. This means </em><strong><em>you cannot use Y as a feature in your predictive model.</em></strong></blockquote><p>Once we know our target, Y, we can also decide how to evaluate our predictions. This is important for distinguishing between the different models we will try on our data. Choose a metric that is a good indicator of model efficiency for the problem we are solving. For example, if we are predicting price, we can use the <a href="https://en.wikipedia.org/wiki/Root-mean-square_deviation">Root Mean Square Error</a> as a metric. Some common metrics (RMSE, logloss, variance score etc.) are pre-coded in Auquan’s toolbox and available under features.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/930/1*Z0KPJ_0xarp4RviawBBQ5A.jpeg" /><figcaption>ML frame for predicting future price</figcaption></figure><p>For demonstration, we’re going to use a problem from <a href="http://quant-quest.auquan.com/problem/qq2p1">QuantQuest (Problem 1)</a>. We are going to create a prediction model that predicts the future expected value of basis, where:</p><blockquote>basis = Price of Stock — Price of Future</blockquote><blockquote>basis(t)=S(t)−F(t)</blockquote><blockquote>Y(t) = future expected value of basis = Average(basis(t+1),basis(t+2),basis(t+3),basis(t+4),basis(t+5))</blockquote><p>Since this is a regression problem, we will evaluate the model on RMSE. We’ll also use Total Pnl as an evaluation criterion.</p><p><em>Our Objective: Create a model so that the predicted value is as close as possible to Y</em></p><h3>Step 2: Collect Reliable Data</h3><blockquote>Collect and clean data that helps you solve the problem at hand</blockquote><p>You need to think about what data would have predictive power for the target variable Y. 
If we were predicting Price, you could use Stock Price Data, Stock Trade Volume Data, Fundamental Data, Price and Volume Data of Correlated stocks, an Overall Market indicator like Stock Index Level, Price of other correlated assets etc.</p><p>You will need to set up data access for this data, make sure your data is accurate and free of errors, and solve for missing data (quite common). Also ensure your data is unbiased and adequately represents all market conditions (for example, an equal number of winning and losing scenarios) to avoid bias in your model. You may also need to clean your data for dividends, stock splits, rolls etc.</p><p>If you’re using Auquan’s Toolbox, we provide access to free data from <a href="https://bitbucket.org/auquan/auquantoolbox/wiki/Home#markdown-header-getting-data-datasource">Google, Yahoo, NSE and Quandl</a>. We also pre-clean the data for dividends, stock splits and rolls and load it in a format that the rest of the toolbox understands.</p><p>For our demo problem, we are using the following data for a dummy stock ‘MQK’ at minute intervals for trading days over one month (~8000 data points): <em>Stock Bid Price, Ask Price, Bid Volume, Ask Volume, Future Bid Price, Ask Price, Bid Volume, Ask Volume, Stock VWAP, Future VWAP. </em>This data is already cleaned for Dividends, Splits and Rolls.</p><pre># Load the data<br>from backtester.dataSource.quant_quest_data_source import QuantQuestDataSource</pre><pre>cachedFolderName = &#39;/Users/chandinijain/Auquan/qq2solver-data/historicalData/&#39;<br>dataSetId = &#39;trainingData1&#39;</pre><pre>instrumentIds = [&#39;MQK&#39;]<br>ds = QuantQuestDataSource(cachedFolderName=cachedFolderName,<br>                                    dataSetId=dataSetId,<br>                                    instrumentIds=instrumentIds)</pre><pre>def loadData(ds):<br>    data = None<br>    for key in ds.getBookDataByFeature().keys():<br>        if data is None:<br>            data = pd.DataFrame(np.nan, index = ds.getBookDataByFeature()[key].index, columns=[])<br>        data[key] = ds.getBookDataByFeature()[key]<br>    data[&#39;Stock Price&#39;] =  (ds.getBookDataByFeature()[&#39;stockTopBidPrice&#39;] + ds.getBookDataByFeature()[&#39;stockTopAskPrice&#39;]) / 2.0<br>    data[&#39;Future Price&#39;] = (ds.getBookDataByFeature()[&#39;futureTopBidPrice&#39;] + ds.getBookDataByFeature()[&#39;futureTopAskPrice&#39;]) / 2.0<br>    data[&#39;Y(Target)&#39;] = ds.getBookDataByFeature()[&#39;basis&#39;].shift(-5)<br>    del data[&#39;benchmark_score&#39;]<br>    del data[&#39;FairValue&#39;]<br>    return data</pre><pre>data = loadData(ds)</pre><p>Auquan’s Toolbox has downloaded and loaded the data into a dictionary of dataframes for you. We now need to prepare the data in a format we like. The function ds.getBookDataByFeature() returns a dictionary of dataframes, one dataframe per feature. We create a new dataframe for the stock with all the features.</p><h3><strong>Step 3: Split Data</strong></h3><blockquote>Create Training, Cross-Validation and Test Datasets from the data</blockquote><p><strong>This is an extremely important step! </strong>Before we proceed any further, we should split our data into training data to train the model and test data to evaluate model performance. 
Recommended split: 60–70% training and 30–40% test.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2NPbRwyZTm4KHiYxja8faw.png" /><figcaption>Split Data into Training and Test Data</figcaption></figure><p>Since training data is used to estimate model parameters, your model will likely be overfit to training data and training data metrics will be misleading about model performance. If you do not keep any separate test data and use all your data to train, you will not know how well or badly your model performs on new unseen data. <strong>This is one of the major reasons why well trained ML models fail on live data </strong>— people train on all available data and get excited by training data metrics, but the model fails to make any meaningful predictions on live data that it wasn’t trained on.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mf65xh4fCTNsaYXgPIjjaQ.png" /><figcaption>Split Data into Training, Validation and Test Data</figcaption></figure><p>There is a problem with this method. If we repeatedly train on training data, evaluate performance on test data and optimise our model till we are happy with performance, we have implicitly made test data a part of training data. Eventually our model may perform well for this set of training and test data, but there is no guarantee that it will predict well on new data.</p><p>To solve for this, we can create a separate validation data set. Now you can train on training data, evaluate performance on validation data, optimise till you are happy with performance, and finally test on test data. This way the test data stays untainted and we don’t use any information from test data to improve our model.</p><p>Remember, once you do check performance on test data, don’t go back and try to optimise your model further. If you find that your model does not give good results, discard that model altogether and start fresh. 
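</p><p>As a rough sketch (not from the original notebook; it assumes a single time-ordered pandas DataFrame called <em>data</em> with the target variable already added), a simple chronological three-way split along the proportions recommended below might look like this:</p><pre>n = len(data)<br>training_data           = data.iloc[:int(0.6*n)]              # first 60%, oldest data<br>validation_data         = data.iloc[int(0.6*n):int(0.8*n)]    # next 20%<br>out_of_sample_test_data = data.iloc[int(0.8*n):]              # last 20%, kept untouched until the end</pre><p>Keeping the split chronological avoids leaking future information into training. 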
Recommended split could be 60% training data, 20% validation data and 20% test data.</p><p>For our problem, we have three datasets available; we will use one as the training set, the second as the validation set and the third as our test set.</p><pre># Training Data<br>dataSetId =  &#39;trainingData1&#39;<br>ds_training = QuantQuestDataSource(cachedFolderName=cachedFolderName,<br>                                    dataSetId=dataSetId,<br>                                    instrumentIds=instrumentIds)</pre><pre>training_data = loadData(ds_training)</pre><pre># Validation Data<br>dataSetId =  &#39;trainingData2&#39;<br>ds_validation = QuantQuestDataSource(cachedFolderName=cachedFolderName,<br>                                    dataSetId=dataSetId,<br>                                    instrumentIds=instrumentIds)<br>validation_data = loadData(ds_validation)</pre><pre># Test Data<br>dataSetId =  &#39;trainingData3&#39;<br>ds_test = QuantQuestDataSource(cachedFolderName=cachedFolderName,<br>                                    dataSetId=dataSetId,<br>                                    instrumentIds=instrumentIds)<br>out_of_sample_test_data = loadData(ds_test)</pre><p>To each of these, we add the target variable <em>Y,</em> defined as the average of the next five values of basis.</p><pre>def prepareData(data, period):<br>    data[&#39;Y(Target)&#39;] = data[&#39;basis&#39;].rolling(period).mean().shift(-period)<br>    if &#39;FairValue&#39; in data.columns:<br>        del data[&#39;FairValue&#39;]<br>    data.dropna(inplace=True)</pre><pre>period = 5<br>prepareData(training_data, period)<br>prepareData(validation_data, period)<br>prepareData(out_of_sample_test_data, period)</pre><h3><strong>Step 4: Feature Engineering</strong></h3><blockquote>Analyze the behavior of your data and create features that have predictive power</blockquote><p>Now comes the real engineering. The golden rule of feature selection is that the predictive power should come primarily from the features and not from the model. You will find that the choice of features has a far greater impact on performance than the choice of model. Some pointers for feature selection:</p><ul><li>Don’t randomly choose a very large set of features without exploring their relationship with the target variable</li><li>Little or no relationship with the target variable will likely lead to overfitting</li><li>Your features might be highly correlated with each other; in that case a smaller number of features will explain the target just as well</li><li>I generally create a few features that make intuitive sense, look at the correlation of the target variable with those features, as well as their inter-correlation, to decide what to use</li><li>You could also try ranking candidate features according to <a href="https://en.wikipedia.org/wiki/Maximal_information_coefficient">Maximal Information Coefficient (MIC)</a>, performing <a href="https://medium.com/towards-data-science/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c">Principal Component Analysis (PCA)</a> and other methods</li></ul><h4>Feature Transformation/Normalization:</h4><p>ML models tend to perform well with normalization. However, normalization is tricky when working with time series data because the future range of the data is unknown. Your data could fall out of bounds of your normalization, leading to model errors. 
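For instance, if you min-max scale a feature using a training range of [0, 10], a live value of 15 maps to 1.5, outside the [0, 1] range the model saw during training. 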
Still, you could try to enforce some degree of stationarity:</p><ul><li><em>Scaling: divide features by standard deviation or interquartile range</em></li><li><em>Centering: subtract historical mean from current value</em></li><li><em>Normalization: both of the above (x - mean)/stdev over lookback period</em></li><li><em>Regular normalization: standardize data to the range -1 to +1 over lookback period (x-min)/(max-min) and re-center</em></li></ul><p>Note that since we are using the historical rolling mean, standard deviation, max or min over a lookback period, the same normalized value of a feature will correspond to different actual values at different times. For example, if the current value of a feature is 5 with a rolling 30-period mean of 4.5, this will transform to 0.5 after centering. Later, if the rolling 30-period mean changes to 3, a value of 3.5 will transform to 0.5. This may be a cause of errors in your model; hence normalization is tricky and you have to figure out what actually improves the performance of your model (if at all).</p><p>If you are using our toolbox, it already comes with a set of<a href="https://bitbucket.org/auquan/auquantoolbox/wiki/Home#markdown-header-available-feature-guide"> pre-coded features</a> for you to explore.</p><p>For this first iteration in our problem, we create a large number of features, using a mix of parameters. Later we will try to see if we can reduce the number of features.</p><pre>def difference(dataDf, period):<br>    return dataDf.sub(dataDf.shift(period), fill_value=0)</pre><pre>def ewm(dataDf, halflife):<br>    return dataDf.ewm(halflife=halflife, ignore_na=False,<br>                      min_periods=0, adjust=True).mean()</pre><pre>def rsi(data, period):<br>    data_upside = data.sub(data.shift(1), fill_value=0)<br>    data_downside = data_upside.copy()<br>    data_downside[data_upside &gt; 0] = 0<br>    data_upside[data_upside &lt; 0] = 0<br>    avg_upside = data_upside.rolling(period).mean()<br>    avg_downside = - data_downside.rolling(period).mean()<br>    rsi = 100 - (100 * avg_downside / (avg_downside + avg_upside))<br>    rsi[avg_downside == 0] = 100<br>    rsi[(avg_downside == 0) &amp; (avg_upside == 0)] = 0<br><br>    return rsi</pre><pre>def create_features(data):<br>    basis_X = pd.DataFrame(index = data.index, columns =  [])<br>    <br>    basis_X[&#39;mom3&#39;] = difference(data[&#39;basis&#39;],4)<br>    basis_X[&#39;mom5&#39;] = difference(data[&#39;basis&#39;],6)<br>    basis_X[&#39;mom10&#39;] = difference(data[&#39;basis&#39;],11)<br>    <br>    basis_X[&#39;rsi15&#39;] = rsi(data[&#39;basis&#39;],15)<br>    basis_X[&#39;rsi10&#39;] = rsi(data[&#39;basis&#39;],10)<br>    <br>    basis_X[&#39;emabasis3&#39;] = ewm(data[&#39;basis&#39;],3)<br>    basis_X[&#39;emabasis5&#39;] = ewm(data[&#39;basis&#39;],5)<br>    basis_X[&#39;emabasis7&#39;] = ewm(data[&#39;basis&#39;],7)<br>    basis_X[&#39;emabasis10&#39;] = ewm(data[&#39;basis&#39;],10)</pre><pre>    basis_X[&#39;basis&#39;] = data[&#39;basis&#39;]<br>    basis_X[&#39;vwapbasis&#39;] = data[&#39;stockVWAP&#39;]-data[&#39;futureVWAP&#39;]<br>    <br>    basis_X[&#39;swidth&#39;] = (data[&#39;stockTopAskPrice&#39;] -<br>                        data[&#39;stockTopBidPrice&#39;])<br>    basis_X[&#39;fwidth&#39;] = (data[&#39;futureTopAskPrice&#39;] -<br>                        data[&#39;futureTopBidPrice&#39;])<br>    <br>    basis_X[&#39;btopask&#39;] = (data[&#39;stockTopAskPrice&#39;] -<br>                         data[&#39;futureTopAskPrice&#39;])<br>    basis_X[&#39;btopbid&#39;] = (
data[&#39;stockTopBidPrice&#39;] -<br>                         data[&#39;futureTopBidPrice&#39;])<br><br>    basis_X[&#39;totalaskvol&#39;] = (data[&#39;stockTotalAskVol&#39;] -<br>                             data[&#39;futureTotalAskVol&#39;])<br>    basis_X[&#39;totalbidvol&#39;] = (data[&#39;stockTotalBidVol&#39;] -<br>                             data[&#39;futureTotalBidVol&#39;])<br>    <br>    basis_X[&#39;emabasisdi7&#39;] = (basis_X[&#39;emabasis7&#39;] -<br>                             basis_X[&#39;emabasis5&#39;] + <br>                             basis_X[&#39;emabasis3&#39;])<br>    <br>    basis_X = basis_X.fillna(0)<br>    <br>    basis_y = data[&#39;Y(Target)&#39;]<br>    basis_y.dropna(inplace=True)<br>    <br>    print(&quot;Any null data in y: %s, X: %s&quot;<br>            %(basis_y.isnull().values.any(), <br>             basis_X.isnull().values.any()))<br>    print(&quot;Length y: %s, X: %s&quot;<br>            %(len(basis_y.index), len(basis_X.index)))<br>    <br>    return basis_X, basis_y</pre><pre>basis_X_train, basis_y_train = create_features(training_data)<br>basis_X_test, basis_y_test = create_features(validation_data)</pre><h3><strong>Step 5: Model Selection</strong></h3><blockquote>Choose an appropriate statistical/ML model based on the chosen problem</blockquote><p>The choice of model will depend on the way the problem is framed. Are you solving a <a href="https://medium.com/towards-data-science/machine-learning-101-supervised-unsupervised-reinforcement-beyond-f18e722069bc">supervised (every point X in the feature matrix maps to a target variable Y) or unsupervised learning problem</a> (there is no given mapping; the model tries to learn unknown patterns)? Are you solving a <a href="https://medium.com/simple-ai/classification-versus-regression-intro-to-machine-learning-5-5566efd4cb83">regression (predict the actual price at a future time) or a classification problem</a> (predict only the direction of price (increase/decrease) at a future time)?</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lSfJvjNQgg5RaqPGwfMbRA.png" /><figcaption>Supervised v/s unsupervised learning</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/949/1*OpLAEAbtAc_kUa0EGjAhDw.png" /><figcaption>Regression v/s classification</figcaption></figure><p>Some common supervised learning algorithms to get you started are:</p><ul><li><a href="https://medium.com/towards-data-science/simple-and-multiple-linear-regression-in-python-c928425168f9">Linear Regression (Parametric, regression)</a></li><li><a href="https://medium.com/towards-data-science/building-a-logistic-regression-in-python-step-by-step-becd4d56c9c8">Logistic Regression (Parametric, classification)</a></li><li><a href="https://medium.com/machine-learning-101/k-nearest-neighbors-classifier-1c1ff404d265">K Nearest Neighbors (Instance based, regression)</a></li><li><a href="https://blog.statsbot.co/support-vector-machines-tutorial-c1618e635e93">SVM, SVR (Parametric, classification and regression)</a></li><li><a href="https://medium.com/towards-data-science/decision-trees-and-random-forests-for-classification-and-regression-pt-1-dbb65a458df">Decision Trees</a></li><li>Decision Forests</li></ul><p>I recommend starting with a simple model, for example linear or logistic regression, and building up to more sophisticated models from there if needed. 
Also, I recommend reading the math behind the model instead of blindly using it as a black box.</p><h3>Step 6: Train, Validate and Optimize (Repeat steps 4–6)</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xBpjiWA-2_1gShBsyMr9QQ.png" /><figcaption>Train and Optimize your model using Training and Validation Datasets</figcaption></figure><p>Now you’re ready to finally build your model. At this stage, you really just iterate over models and model parameters. Train your model on training data, measure its performance on validation data, and go back, optimize, re-train and evaluate again. If you’re unhappy with a model’s performance, try using a different model. You loop over this stage multiple times till you finally have a model that you’re happy with.</p><blockquote><strong>Only when you have a model whose performance you like, proceed to the next step.</strong></blockquote><p>For our demo problem, let’s start with a simple linear regression.</p><pre>import matplotlib.pyplot as plt<br>from sklearn import linear_model<br>from sklearn.metrics import mean_squared_error, r2_score</pre><pre>def linear_regression(basis_X_train, basis_y_train,<br>                      basis_X_test,basis_y_test):<br>    <br>    regr = linear_model.LinearRegression()<br>    # Train the model using the training sets<br>    regr.fit(basis_X_train, basis_y_train)<br>    # Make predictions using the testing set<br>    basis_y_pred = regr.predict(basis_X_test)</pre><pre>    # The coefficients<br>    print(&#39;Coefficients: \n&#39;, regr.coef_)<br>    <br>    # The mean squared error<br>    print(&quot;Mean squared error: %.2f&quot;<br>          % mean_squared_error(basis_y_test, basis_y_pred))<br>    <br>    # Explained variance score: 1 is perfect prediction<br>    print(&#39;Variance score: %.2f&#39; % r2_score(basis_y_test,<br>                                            basis_y_pred))</pre><pre>    # Plot outputs<br>    plt.scatter(basis_y_pred, basis_y_test,  color=&#39;black&#39;)<br>    plt.plot(basis_y_test, basis_y_test, color=&#39;blue&#39;, linewidth=3)</pre><pre>    plt.xlabel(&#39;Y(actual)&#39;)<br>    plt.ylabel(&#39;Y(Predicted)&#39;)</pre><pre>    plt.show()<br>    <br>    return regr, basis_y_pred</pre><pre>_, basis_y_pred = linear_regression(basis_X_train, basis_y_train, <br>                                    basis_X_test,basis_y_test)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/402/1*8FFqumm-kXpiVNflGmClOQ.png" /><figcaption>Linear Regression with no normalization</figcaption></figure><pre>(&#39;Coefficients: \n&#39;, array([ -1.0929e+08, 4.1621e+07, 1.4755e+07, 5.6988e+06, -5.656e+01, -6.18e-04, -8.2541e-05,4.3606e-02, -3.0647e-02, 1.8826e+07, 8.3561e-02, 3.723e-03, -6.2637e-03, 1.8826e+07, 1.8826e+07, 6.4277e-02, 5.7254e-02, 3.3435e-03, 1.6376e-02, -7.3588e-03, -8.1531e-04, -3.9095e-02, 3.1418e-02, 3.3321e-03, -1.3262e-06, -1.3433e+07, 3.5821e+07, 2.6764e+07, -8.0394e+06, -2.2388e+06, -1.7096e+07]))</pre><pre>Mean squared error: 0.02<br>Variance score: 0.96</pre><p>Look at the model coefficients. We can’t really compare them or tell which ones are important since they all belong to different scales. 
Let’s try normalization to bring them to the same scale and also enforce some stationarity.</p><pre>def normalize(basis_X, basis_y, period):<br>    basis_X_norm = ((basis_X - basis_X.rolling(period).mean())/<br>                    basis_X.rolling(period).std())<br>    basis_X_norm.dropna(inplace=True)<br>    basis_y_norm = ((basis_y - <br>                    basis_X[&#39;basis&#39;].rolling(period).mean())/<br>                    basis_X[&#39;basis&#39;].rolling(period).std())<br>    basis_y_norm = basis_y_norm[basis_X_norm.index]<br>    <br>    return basis_X_norm, basis_y_norm</pre><pre>norm_period = 375<br>basis_X_norm_test, basis_y_norm_test = normalize(basis_X_test,basis_y_test, norm_period)<br>basis_X_norm_train, basis_y_norm_train = normalize(basis_X_train, basis_y_train, norm_period)</pre><pre>regr_norm, basis_y_pred = linear_regression(basis_X_norm_train, basis_y_norm_train, basis_X_norm_test, basis_y_norm_test)</pre><pre>basis_y_pred = basis_y_pred * basis_X_test[&#39;basis&#39;].rolling(period).std()[basis_y_norm_test.index] + basis_X_test[&#39;basis&#39;].rolling(period).mean()[basis_y_norm_test.index]</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/398/1*OO7bTVf9xKzgwWJ6Seh7tg.png" /><figcaption>Linear Regression with normalization</figcaption></figure><pre>Mean squared error: 0.05<br>Variance score: 0.90</pre><p>The model doesn’t improve on the previous model, but it’s not much worse either. And now we can actually compare coefficients to see which ones are actually important.</p><p>Let’s look at the coefficients:</p><pre>for i in range(len(basis_X_train.columns)):<br>    print(&#39;%.4f, %s&#39;%(regr_norm.coef_[i], basis_X_train.columns[i]))</pre><blockquote>19.8727, emabasis4<br>-9.2015, emabasis5<br>8.8981, emabasis7<br>-5.5692, emabasis10<br>-0.0036, rsi15<br>-0.0146, rsi10<br>0.0196, mom10<br>-0.0035, mom5<br>-7.9138, basis<br>0.0062, swidth<br>0.0117, fwidth<br>2.0883, btopask<br>2.0311, btopbid<br>0.0974, bavgask<br>0.0611, bavgbid<br>0.0007, topaskvolratio<br>0.0113, topbidvolratio<br>-0.0220, totalaskvolratio<br>0.0231, totalbidvolratio</blockquote><p>We can clearly see that some features have a much higher coefficient compared to others, and probably have more predictive power.</p><p>Let’s also look at the correlation between different features.</p><pre>import seaborn</pre><pre>c = basis_X_train.corr()<br>plt.figure(figsize=(10,10))<br>seaborn.heatmap(c, cmap=&#39;RdYlGn_r&#39;, mask = (np.abs(c) &lt;= 0.8))<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/617/1*9BwyXJtDl9D5Vz_kte1_8Q.png" /><figcaption>Correlation between features</figcaption></figure><p>The areas of dark red indicate highly correlated variables. 
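</p><p>If you prefer a list to a heatmap, a small sketch like this (not part of the original notebook) prints the feature pairs whose absolute correlation exceeds 0.8, reusing the correlation matrix <em>c</em> computed above:</p><pre># list feature pairs with |correlation| above 0.8<br>for i in range(len(c.columns)):<br>    for j in range(i):<br>        if abs(c.iloc[i, j]) &gt; 0.8:<br>            print(&#39;%s and %s: %.2f&#39; % (c.columns[i], c.columns[j], c.iloc[i, j]))</pre><p>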
Let’s create/modify some features again and try to improve our model.</p><p>For example, I can easily discard features like <strong><em>emabasisdi7</em></strong> that are just a linear combination of other features.</p><pre>def create_features_again(data):<br>    basis_X = pd.DataFrame(index = data.index, columns =  [])<br>    basis_X[&#39;mom10&#39;] = difference(data[&#39;basis&#39;],11)</pre><pre>    basis_X[&#39;emabasis2&#39;] = ewm(data[&#39;basis&#39;],2)<br>    basis_X[&#39;emabasis5&#39;] = ewm(data[&#39;basis&#39;],5)<br>    basis_X[&#39;emabasis10&#39;] = ewm(data[&#39;basis&#39;],10)</pre><pre>    basis_X[&#39;basis&#39;] = data[&#39;basis&#39;]<br>    basis_X[&#39;totalaskvolratio&#39;] = ((data[&#39;stockTotalAskVol&#39;]<br>                                 - data[&#39;futureTotalAskVol&#39;])/<br>                                   100000)<br>    basis_X[&#39;totalbidvolratio&#39;] = ((data[&#39;stockTotalBidVol&#39;]<br>                                 - data[&#39;futureTotalBidVol&#39;])/<br>                                   100000)</pre><pre>    basis_X = basis_X.fillna(0)<br>    <br>    basis_y = data[&#39;Y(Target)&#39;]<br>    basis_y.dropna(inplace=True)</pre><pre>    return basis_X, basis_y</pre><pre>basis_X_test, basis_y_test = create_features_again(validation_data)<br>basis_X_train, basis_y_train = create_features_again(training_data)<br>_, basis_y_pred = linear_regression(basis_X_train, basis_y_train, basis_X_test,basis_y_test)</pre><pre>basis_y_regr = basis_y_pred.copy()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/501/1*zhoocIEsasYuC0nnPgcpgQ.png" /></figure><pre>(&#39;Coefficients: &#39;, array([ 0.03246139,<br>0.49780982, -0.22367172,  0.20275786,  0.50758852,<br>-0.21510795, 0.17153884]))</pre><pre>Mean squared error: 0.02<br>Variance score: 0.96</pre><p>See, our model performance does not change, and we only need a few features to explain our target variable. I recommend playing with more features above, trying new combinations etc. to see what can improve our model.</p><p>We can also try more sophisticated models to see if a change of model improves performance.</p><h4>K Nearest Neighbours</h4><pre>from sklearn import neighbors<br>n_neighbors = 5</pre><pre>model = neighbors.KNeighborsRegressor(n_neighbors, weights=&#39;distance&#39;)<br>model.fit(basis_X_train, basis_y_train)<br>basis_y_pred = model.predict(basis_X_test)<br>basis_y_knn = basis_y_pred.copy()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/501/1*rt4FN_jWmT7sWdi6regqbQ.png" /></figure><h4>SVR</h4><pre>from sklearn.svm import SVR</pre><pre>model = SVR(kernel=&#39;rbf&#39;, C=1e3, gamma=0.1)</pre><pre>model.fit(basis_X_train, basis_y_train)<br>basis_y_pred = model.predict(basis_X_test)<br>basis_y_svr = basis_y_pred.copy()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/504/1*ESRa2h4e-4-v6NpAYwpZBA.png" /></figure><h4>Decision Trees</h4><pre>from sklearn import ensemble<br><br>model = ensemble.ExtraTreesRegressor()<br>model.fit(basis_X_train, basis_y_train)<br>basis_y_pred = model.predict(basis_X_test)<br>basis_y_trees = basis_y_pred.copy()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/501/1*tRrpfqV0VU3njXeQTw5oUQ.png" /></figure><h3>Step 7: Backtest on Test Data</h3><blockquote>Check performance on Real Out of Sample Data</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*xmVugGjonqyHr7D6woflYA.png" /><figcaption>Backtest performance on (yet untouched) Test Dataset</figcaption></figure><p>This is the moment of truth. 
We run our final, optimized model from last step on that Test Data that we had kept aside at the start and did not touch yet.</p><p>This provides you with realistic expectation of how your model is expected to perform on new and unseen data when you start trading live. Hence, it is necessary to ensure you have a clean dataset that you haven’t used to train or validate your model.</p><p>If you don’t like the results of your backtest on test data, discard the model and start again. <strong>DO NOT</strong> go back and re-optimize your model, this will lead to over fitting! (Also recommend to create a new test data set, since this one is now tainted; in discarding a model, we implicitly know something about the dataset).</p><p>For backtesting, we use Auquan’s Toolbox</p><pre>import backtester<br>from backtester.features.feature import Feature<br>from backtester.trading_system import TradingSystem<br>from backtester.sample_scripts.fair_value_params import FairValueTradingParams</pre><pre>class Problem1Solver():</pre><pre>def getTrainingDataSet(self):<br>        return &quot;trainingData1&quot;</pre><pre>def getSymbolsToTrade(self):<br>        return [&#39;MQK&#39;]</pre><pre>def getCustomFeatures(self):<br>        return {&#39;my_custom_feature&#39;: MyCustomFeature}</pre><pre>def getFeatureConfigDicts(self):<br>                            <br>        expma5dic = {&#39;featureKey&#39;: &#39;emabasis5&#39;,<br>                 &#39;featureId&#39;: &#39;exponential_moving_average&#39;,<br>                 &#39;params&#39;: {&#39;period&#39;: 5,<br>                              &#39;featureName&#39;: &#39;basis&#39;}}<br>        expma10dic = {&#39;featureKey&#39;: &#39;emabasis10&#39;,<br>                 &#39;featureId&#39;: &#39;exponential_moving_average&#39;,<br>                 &#39;params&#39;: {&#39;period&#39;: 10,<br>                              &#39;featureName&#39;: &#39;basis&#39;}}                     <br>        expma2dic = {&#39;featureKey&#39;: &#39;emabasis3&#39;,<br>                 &#39;featureId&#39;: &#39;exponential_moving_average&#39;,<br>                 &#39;params&#39;: {&#39;period&#39;: 3,<br>                              &#39;featureName&#39;: &#39;basis&#39;}}<br>        mom10dic = {&#39;featureKey&#39;: &#39;mom10&#39;,<br>                 &#39;featureId&#39;: &#39;difference&#39;,<br>                 &#39;params&#39;: {&#39;period&#39;: 11,<br>                              &#39;featureName&#39;: &#39;basis&#39;}}<br>        <br>        return [expma5dic,expma2dic,expma10dic,mom10dic]    <br>    <br>    def getFairValue(self, updateNum, time, instrumentManager):<br>        # holder for all the instrument features<br>        lbInstF = instrumentManager.getlookbackInstrumentFeatures()<br>        mom10 = lbInstF.getFeatureDf(&#39;mom10&#39;).iloc[-1]<br>        emabasis2 = lbInstF.getFeatureDf(&#39;emabasis2&#39;).iloc[-1]<br>        emabasis5 = lbInstF.getFeatureDf(&#39;emabasis5&#39;).iloc[-1]<br>        emabasis10 = lbInstF.getFeatureDf(&#39;emabasis10&#39;).iloc[-1] <br>        basis = lbInstF.getFeatureDf(&#39;basis&#39;).iloc[-1]<br>        totalaskvol = lbInstF.getFeatureDf(&#39;stockTotalAskVol&#39;).iloc[-1] - lbInstF.getFeatureDf(&#39;futureTotalAskVol&#39;).iloc[-1]<br>        totalbidvol = lbInstF.getFeatureDf(&#39;stockTotalBidVol&#39;).iloc[-1] - lbInstF.getFeatureDf(&#39;futureTotalBidVol&#39;).iloc[-1]<br>        <br>        coeff = [ 0.03249183, 0.49675487, -0.22289464, 0.2025182, 0.5080227, -0.21557005, 0.17128488]<br>        newdf[&#39;MQK&#39;] = 
coeff[0] * mom10[&#39;MQK&#39;] + coeff[1] * emabasis2[&#39;MQK&#39;] +\<br>                      coeff[2] * emabasis5[&#39;MQK&#39;] + coeff[3] * emabasis10[&#39;MQK&#39;] +\<br>                      coeff[4] * basis[&#39;MQK&#39;] + coeff[5] * totalaskvol[&#39;MQK&#39;]+\<br>                      coeff[6] * totalbidvol[&#39;MQK&#39;]<br>                    <br>        newdf.fillna(emabasis5,inplace=True)<br>        return newdf</pre><pre>problem1Solver = Problem1Solver()<br>tsParams = FairValueTradingParams(problem1Solver)<br>tradingSystem = TradingSystem(tsParams)<br>tradingSystem.startTrading(onlyAnalyze=False, <br>                           shouldPlot=True,<br>                           makeInstrumentCsvs=False)</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yr3fjVr44Xpw0AKqbQeOgg.png" /><figcaption>Backtest Results, Pnl in USD (Pnl doesn’t account for transaction costs and other fees)</figcaption></figure><h3>Step 8: Other ways to improve model</h3><blockquote>Rolling Validation, Ensemble Learning, Bagging, Boosting</blockquote><p>Besides collecting more data, creating better features or trying more models, there are a few things you can try to train your model better.</p><p>1. Rolling Validation</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YNgA39KuiMcyKnwPVa83cw.png" /><figcaption>Rolling Validation</figcaption></figure><p>Market conditions rarely stay the same. Let’s say you have data for a year and you use Jan-August to train and Sep-Dec to test your model; you might end up training over a very specific set of market conditions. Maybe there was no market volatility for the first half of the year and some extreme news caused markets to move a lot in September; your model will not learn this pattern and will give you junk results.</p><p>It might be better to try a <a href="http://scikit-learn.org/stable/modules/cross_validation.html">walk forward rolling validation</a> — train over Jan-Feb, validate over March, re-train over Apr-May, validate over June and so on.</p><p>2. Ensemble Learning</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/917/1*NTeiNKgipcm7voFSiauh0Q.jpeg" /><figcaption>Ensemble Learning</figcaption></figure><p>Some models may work well in predicting certain scenarios and others in predicting other scenarios. Or a model may be extremely overfit in a certain scenario. One way of reducing both error and overfitting is to use an ensemble of different models. Your prediction is the average of the predictions made by many models, with errors from different models likely getting cancelled out or reduced. Some common ensemble methods are Bagging and Boosting.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/916/1*IQb3FWQzfBbwpxyv-mMhcQ.jpeg" /><figcaption>Bagging</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JGNvUPjDr85Px03elh87xQ.png" /><figcaption>Boosting</figcaption></figure><p>To keep this post short, I will skip these methods, but you can read more <a href="http://scikit-learn.org/stable/modules/ensemble.html">about them here.</a></p><p>Let’s try an ensemble method for our problem:</p><pre>basis_y_pred_ensemble = (basis_y_trees + basis_y_svr +<br>                         basis_y_knn + basis_y_regr)/4</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/504/1*ESRa2h4e-4-v6NpAYwpZBA.png" /></figure><blockquote>Mean squared error: 0.02<br>Variance score: 0.95</blockquote><p>All the code for the above steps is available in this IPython notebook. 
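</p><p>As a side note on the rolling validation idea in point 1 above, scikit-learn’s TimeSeriesSplit gives a quick way to generate walk-forward folds. This is only a rough sketch (it reuses the training set from earlier and is not part of the original notebook):</p><pre>from sklearn.model_selection import TimeSeriesSplit<br><br>tscv = TimeSeriesSplit(n_splits=5)<br>for train_index, val_index in tscv.split(basis_X_train):<br>    # each fold trains on an expanding window of past data and validates on the period right after it<br>    X_tr, X_val = basis_X_train.iloc[train_index], basis_X_train.iloc[val_index]<br>    y_tr, y_val = basis_y_train.iloc[train_index], basis_y_train.iloc[val_index]<br>    model = linear_model.LinearRegression().fit(X_tr, y_tr)<br>    print(&#39;validation R^2: %.2f&#39; % model.score(X_val, y_val))</pre><p>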
You can read more below:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/38233cc1928120274e3a6b170c89d320/href">https://medium.com/media/38233cc1928120274e3a6b170c89d320/href</a></iframe><h3>That was quite a lot of information. Let’s do a quick Recap:</h3><ul><li>Frame your problem</li><li>Collect reliable Data and clean it</li><li>Split Data into Training, Validation and Test sets</li><li>Create Features and Analyze Behavior</li><li>Choose an appropriate training model based on Behavior</li><li>Use Training Data to train your model to make predictions</li><li>Check performance on validation set and re-optimize</li><li>Verify final performance on Test Set</li></ul><p>Phew! But that’s not it. You only have a solid prediction model now. Remember what we actually wanted from our strategy? You still have to:</p><ul><li>Develop a Signal to identify trade direction based on the prediction model</li><li>Develop a Strategy to identify Entry/Exit Points</li><li>Build an Execution System to identify Sizing and Price</li></ul><p>And then you can finally send this order to your broker, and make your automated trade!</p><p><strong>Important Note on Transaction Costs</strong>: Why are the next steps important? Your model tells you when your chosen asset is a buy or sell. It however doesn’t take into account fees/transaction costs/available trading volumes/stops etc. Transaction costs very often turn profitable trades into losers. For example, an asset with an expected $0.05 increase in price is a buy, but if you have to pay $0.10 to make this trade, you will end up with a net loss of -$0.05. Our own great looking profit chart above actually looks like this after you account for broker commissions, exchange fees and spreads:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*e24mumYUyLY6GAu99etG8w.png" /><figcaption>Backtest Results after transaction fees and spreads, Pnl in USD</figcaption></figure><p>Transaction fees and spreads take up more than 90% of our Pnl! We will discuss these in detail in a follow-up post.</p><p>Finally, let’s look at some common pitfalls.</p><h3>DOs and DON’Ts</h3><ul><li><strong>AVOID OVERFITTING AT ALL COSTS!</strong></li><li>Don’t retrain after every datapoint: This was a common mistake people made in QuantQuest. If your model needs re-training after every datapoint, it’s probably not a very good model. That said, it will need to be retrained periodically, just at a reasonable frequency (for example, retraining at the end of every week if making intraday predictions)</li><li>Avoid biases, especially lookahead bias: This is another reason why models don’t work — make sure you are not using any information from the future. Mostly this means, <strong>don’t use the target variable, Y, as a feature in your model. </strong>This is available to you during a backtest but won’t be available when you run your model live, making your model useless.</li><li>Be wary of data mining bias: Since we are trying a bunch of models on our data to see if anything fits, without an inherent reason behind why it fits, make sure you run rigorous tests to separate random patterns from real patterns which are likely to occur in the future. 
For example what might seem like an upward trending pattern explained well by a linear regression may turn out to be a small part of a larger random walk!</li></ul><h3>Avoid Overfitting</h3><p>This is so important, I feel the need to mention it again.</p><ul><li>Overfitting is the most dangerous pitfall of a trading strategy</li><li>A complex algorithm may perform wonderfully on a backtest but fails miserably on new unseen data —this algorithm has not really uncovered any trend in data and no real predictive power. It is just fit very well to the data it has seen</li><li>Keep your systems as simple as possible. If you find yourself needing a large number of complex features to explain your data, you are likely over fitting</li><li>Divide your available data into training and test data and always validate performance on Real Out of Sample data before using your model to trade live.</li></ul><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2Fq8iMS8dENjQ%3Ffeature%3Doembed&amp;url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dq8iMS8dENjQ&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2Fq8iMS8dENjQ%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/a66b79b7cfcbefb0da5e60a7acf71973/href">https://medium.com/media/a66b79b7cfcbefb0da5e60a7acf71973/href</a></iframe><p><strong>Webinar Video</strong>: If you prefer listening to reading and would like to see a video version of this post, you can watch this webinar link instead.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b7120cee4f05" width="1" height="1" alt=""><hr><p><a href="https://medium.com/auquan/https-medium-com-auquan-machine-learning-techniques-trading-b7120cee4f05">Application of Machine Learning Techniques to Trading</a> was originally published in <a href="https://medium.com/auquan">auquan</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Time Series Analysis for Financial Data III— Moving Average Models]]></title>
            <link>https://medium.com/auquan/time-series-analysis-for-financial-data-iii-moving-average-models-cccf027f264e?source=rss-2a45747180c6------2</link>
            <guid isPermaLink="false">https://medium.com/p/cccf027f264e</guid>
            <category><![CDATA[math]]></category>
            <category><![CDATA[mathematics]]></category>
            <category><![CDATA[finance]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Auquan]]></dc:creator>
            <pubDate>Wed, 13 Sep 2017 04:51:02 GMT</pubDate>
            <atom:updated>2017-09-13T04:52:48.116Z</atom:updated>
            <content:encoded><![CDATA[<p>Download IPython Notebook <a href="https://github.com/Auquan/Tutorials/blob/master/Time%20Series%20Analysis%20-%202.ipynb">here</a>.</p><p>In the <a href="https://medium.com/auquan/time-series-analysis-ii-auto-regressive-models-d0cb1a8a7c43">second post in this series</a>, we talked about Auto-Regressive Models — models which only depend on past data of the system. We saw that these models only partially explained the log-returns of stock prices.</p><p>We turn to another model, the Moving Average model, to see if it performs better on our data.</p><h3>Moving Average Models</h3><p>MA(q) models are very similar to AR(p) models. The MA(q) model is a linear combination of past error terms, as opposed to a linear combination of past observations like the AR(p) model. The motivation for the MA model is that we can explain “shocks” in the error process directly by fitting a model to the error terms. The first order model, MA(1), is:</p><blockquote><em>x(t) = b*e(t-1) + e(t)</em></blockquote><p>where b is the coefficient and e is the error term.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/608b024f44561493dfd5429486329043/href">https://medium.com/media/608b024f44561493dfd5429486329043/href</a></iframe><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=cccf027f264e" width="1" height="1" alt=""><hr><p><a href="https://medium.com/auquan/time-series-analysis-for-financial-data-iii-moving-average-models-cccf027f264e">Time Series Analysis for Financial Data III— Moving Average Models</a> was originally published in <a href="https://medium.com/auquan">auquan</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Time Series Analysis for Financial Data II — Auto-Regressive Models]]></title>
            <link>https://medium.com/auquan/time-series-analysis-ii-auto-regressive-models-d0cb1a8a7c43?source=rss-2a45747180c6------2</link>
            <guid isPermaLink="false">https://medium.com/p/d0cb1a8a7c43</guid>
            <category><![CDATA[math]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[mathematics]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[finance]]></category>
            <dc:creator><![CDATA[Auquan]]></dc:creator>
            <pubDate>Wed, 13 Sep 2017 04:11:50 GMT</pubDate>
            <atom:updated>2017-09-13T05:05:49.940Z</atom:updated>
            <content:encoded><![CDATA[<p>Download the iPython notebook <a href="https://github.com/Auquan/Tutorials/blob/master/Time%20Series%20Analysis%20-%202.ipynb">here</a>.</p><p>In the <a href="https://medium.com/auquan/time-series-analysis-for-financial-data-part-1-stationarity-autocorrelation-and-white-noise-1a1cc2fb23f2">first post on Time Series Analysis</a>, we talked about the basics of time series analysis - Stationarity and AutoCorrelation. We also talked about simple time series models, White Noise and Random Walks.</p><p>In this post, we take the concept forward and introduce a more sophisticated time series model, namely the Auto Regressive (AR) model.</p><h3>AutoRegressive Models</h3><p>The autoregressive model is simply an extension of the random walk. It is a regression model which depends linearly on the previous terms. An order 1 autoregressive model, AR(1), is:</p><blockquote><em>x(t) = a*x(t-1) + w</em></blockquote><p>where a is the auto-regressive coefficient and w is the white noise term. In simple words, the current value only depends on the previous value of the system. Note that an AR(1) model with <em>a</em> set equal to 1 is a random walk!</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a6040958fbc10be68994c67fb11ce144/href">https://medium.com/media/a6040958fbc10be68994c67fb11ce144/href</a></iframe><p>In the <a href="https://medium.com/auquan/time-series-analysis-for-financial-data-iii-moving-average-models-cccf027f264e">next post</a>, we will talk about another class of models, Moving Average models (not to be confused with simple moving average — a rolling measure of average).</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d0cb1a8a7c43" width="1" height="1" alt=""><hr><p><a href="https://medium.com/auquan/time-series-analysis-ii-auto-regressive-models-d0cb1a8a7c43">Time Series Analysis for Financial Data II — Auto-Regressive Models</a> was originally published in <a href="https://medium.com/auquan">auquan</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Measuring Momentum for Momentum Models: Simple Trading Strategies Part 3]]></title>
            <link>https://medium.com/auquan/measuring-momentum-for-momentum-models-simple-trading-strategies-part-3-a8d105d74729?source=rss-2a45747180c6------2</link>
            <guid isPermaLink="false">https://medium.com/p/a8d105d74729</guid>
            <category><![CDATA[trading-strategy]]></category>
            <category><![CDATA[stock-market]]></category>
            <category><![CDATA[investing]]></category>
            <category><![CDATA[trading]]></category>
            <category><![CDATA[finance]]></category>
            <dc:creator><![CDATA[Auquan]]></dc:creator>
            <pubDate>Tue, 05 Sep 2017 06:16:00 GMT</pubDate>
            <atom:updated>2017-09-05T06:16:00.549Z</atom:updated>
            <content:encoded><![CDATA[<p>In this series, we cover some basic trading strategies that can help you get started with developing your own automated trading systems.</p><p>Download IPython Notebook <a href="https://github.com/Auquan/Tutorials/blob/master/Measuring%20Momentum.ipynb">here</a>.</p><p>In <a href="https://medium.com/auquan/momentum-simple-trading-strategies-part-2-188cf464ffcf">our previous post</a>, we talked about the concept of Momentum: extrapolating from existing trends. Momentum strategies assume that stocks which are going up will continue to go up and stocks which are going down will continue going down, and buy and sell accordingly.</p><p>The obvious question is, how do we determine or measure this momentum? In this post, we will talk about different ways to measure momentum.</p><p>You will notice that some of the signals used for momentum trading could well be used as inverse signals in a mean reversion system as well. This should be expected; momentum and mean reversion are opposite strategies. This is also why it is very important to check which kind of behavior might be present in your data before you actually try developing a strategy based on it.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c2d0ac3e3cde009e5448fd1078c46eb5/href">https://medium.com/media/c2d0ac3e3cde009e5448fd1078c46eb5/href</a></iframe><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a8d105d74729" width="1" height="1" alt=""><hr><p><a href="https://medium.com/auquan/measuring-momentum-for-momentum-models-simple-trading-strategies-part-3-a8d105d74729">Measuring Momentum for Momentum Models: Simple Trading Strategies Part 3</a> was originally published in <a href="https://medium.com/auquan">auquan</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>