Found this video very helpful for doing my first commit to R-Forge…so for my own record, and for anyone else who would be interested, below is the video:
Thanks trestletech!
In this post on bettersystemtrader.com, Andrew Swanscott interviews Kevin Davey from KJ Trading Systems, who discusses why looking at your back-tested historical equity curve alone might not give you a true sense of a strategy's risk profile. Kevin Davey also writes on the topic here for futuresmag.com.

So i wrote a Monte Carlo-type simulation function (in R) to see graphically how my back-test results would have looked had the order of returns been randomized. I use 'n' to denote the number of simulations to run, and exclude the highest and lowest 5% of return streams to form the blue ribbon. For the middle red ribbon i use the middle 50% of return streams (i.e. excluding the 1st and last quartiles). The second and last parameter (default TRUE) is a boolean: if TRUE, the sampling procedure in R runs with replacement; if FALSE, without replacement. See this link if you need an intuitive explanation of what sampling with and without replacement means.

Below is an example of the output for 10k simulations, with and without replacement, for a random set of returns on a hypothetical strategy spanning 7 months. The black line is the actual historical equity curve. Whether or not the return distributions satisfy your risk appetite will come down to a more refined drawdown analysis, together with the host of other risk metrics available for consideration. At least with the function below you have the stats in a dataframe and a graph: the starting point for any statistical analysis.
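As a quick illustration of the difference between the two sampling modes (using a made-up returns vector, purely for demonstration):

# Toy illustration of sampling with vs. without replacement
set.seed(42)
r <- c(0.01, -0.02, 0.03, 0.005, -0.01)  # hypothetical daily returns
sample(r, replace = TRUE)   # with replacement: values can repeat
sample(r, replace = FALSE)  # without replacement: a pure reshuffle, so the
                            # final cumulative return is identical every time

This is why, in the without-replacement charts, every simulated path ends at the same cumulative return as the actual equity curve.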


I read the data into R using the read.csv function. You can use the same data here (simSample.xlsx); just be sure to change the file type to CSV first, as WordPress only allows for storing .xls and .xlsx Excel files. Below is the source code to add the function to your environment. I load the required packages, then create the mcsimr() function to import the data, manipulate it and build the graph using ggplot. Remember to set your working directory to the directory in which you have stored the simSample.csv file.
If you would like to run the simulation 1,000 times without replacement, then the function should read:
mcsimr(1000, FALSE)
If you would like to run with replacement, include TRUE for the second argument or leave it blank, as the default is TRUE.
mcsimr(1000)
OK, below is the full source code:
# Record script start time -------------------------------------------
t1 <- Sys.time()

# Load required packages ---------------------------------------------
library(quantmod)
library(TTR)
library(PerformanceAnalytics)
library(ggplot2)
library(timeSeries)

# Build the function ---------------------------------------------------
mcsimr <- function(n, b = TRUE){
  # Read price data and build xts object
  data <- read.csv("yourdirectory/simSample.csv", header = TRUE, stringsAsFactors = FALSE)
  s1.dates <- as.Date(data[, 2], format = "%d-%m-%Y") # beware: the format may need adjusting for your Excel settings
  s1 <- xts(data[, 3], s1.dates)
  # Calculate ROC
  ret <- ROC(s1[, 1])
  # Chart cumulative returns
  chart.CumReturns(ret)
  # Set up for sample() and replicate()
  ret_sample <- replicate(n, sample(as.vector(ret[-1, ]), replace = b)) # use ret[-1] to exclude the 1st NA value from the ROC calc
  ret_cum_sample <- apply(ret_sample, 2, function(x) cumsum(x))
  ret_cum_samplexts <- xts(ret_cum_sample, s1.dates[-1]) # use s1.dates[-1] so the length of dates matches the length of ret_sample
  # Build the 5%/95% and 25%/75% quantile datasets
  ret_5 <- as.xts(apply(ret_cum_samplexts, 1, function(x) quantile(x, .05)))
  ret_95 <- as.xts(apply(ret_cum_samplexts, 1, function(x) quantile(x, .95)))
  ret_25 <- as.xts(apply(ret_cum_samplexts, 1, function(x) quantile(x, .25)))
  ret_75 <- as.xts(apply(ret_cum_samplexts, 1, function(x) quantile(x, .75)))
  charts <- merge(ret_5, ret_95, ret_25, ret_75)
  # Draw the graph with ribbons
  h <- ggplot(charts, aes(x = index(charts))) +
    geom_ribbon(aes(ymin = ret_25, ymax = ret_75, colour = "50%"), alpha = 0.3, fill = "red3") +
    geom_ribbon(aes(ymin = ret_5, ymax = ret_95, colour = "90%"), alpha = 0.3, fill = "cornflowerblue") +
    theme(axis.text.x = element_text(angle = 0, hjust = 0),
          axis.title = element_text(face = 'bold', size = 14),
          title = element_text(face = 'bold', size = 16),
          legend.position = 'bottom',
          legend.title = element_blank(),
          legend.text = element_text(size = 12),
          legend.key.width = unit(2, 'cm'))
  h <- h + geom_line(aes(y = cumsum(ret[-1, ])), colour = "black", linetype = 1) +
    ylab(label = "Cumulative Returns") +
    xlab(label = "Time") +
    ggtitle("Returns Distribution")
  return(h)
}
# End of function ------------------------------------------------------

# Run function ----------------------------------------------------------
mcsimr(1000)

# Record and print time to run script -----------------------------------
t2 <- Sys.time()
difftime(t2, t1)
I am aiming to have the function code on GitHub soon [EDIT: The above is hosted (with love) by GitHub, however enhancements are still a work in progress], hopefully with some enhancements allowing the user to specify an xts object for the raw data (i.e. back-test results) as well as letting the user decide which percentages to use for the bands. Some drawdown stats would be useful too. I have built something which incorporates a drawdown analysis of every 'n' return stream; however, it needs some more work to speed it up, as you can imagine 10k xts objects with 500+ rows of daily return data each could take some time to analyse individually.
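One way to avoid building 10k individual xts objects is to work on the simulated cumulative-return matrix directly. Below is a minimal sketch, assuming you have the ret_cum_sample matrix from inside mcsimr() available (e.g. returned from a modified version of the function); it treats each column as one simulated path of cumulative (additive) returns:

# Max drawdown per simulated path, computed on the plain matrix
# (ret_cum_sample: rows = days, columns = simulated return streams)
max_dd <- apply(ret_cum_sample, 2, function(x) {
  peak <- cummax(x)  # running peak of the cumulative return path
  min(x - peak)      # most negative gap below the running peak
})
summary(max_dd)      # distribution of max drawdowns across all n paths

Because this skips the per-path xts construction entirely, it should scale far better to 10k+ simulations.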
Lastly, i wonder if something like this could fit into the quantstrat or TradeAnalytics packages…or if there already is something similar, do let me know. [EDIT: Since publishing this post Brian Peterson (maintainer of multiple finance R packages like ReturnAnalytics, quantmod and quantstrat) got in touch and asked if i would work with him on getting this function into the blotter package, a dependency of quantstrat. Brian has also suggested a few improvements to the function, including using a block bootstrap to account for autocorrelation, and plot.xts as opposed to ggplot for faster graphical rendering. So look out for an updated post on that.]
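For the curious, a hand-rolled circular block bootstrap can be as simple as sampling random block start points and stitching the blocks together, which preserves short-run autocorrelation within each block. The sketch below is my own illustration of the idea, not the blotter implementation, and the function name and block size are made up:

# Minimal circular block bootstrap of a return vector (illustrative only)
block_boot <- function(r, block_size = 20) {
  n <- length(r)
  n_blocks <- ceiling(n / block_size)
  starts <- sample.int(n, n_blocks, replace = TRUE)  # random block start points
  idx <- as.vector(sapply(starts, function(s)
    ((s - 1 + 0:(block_size - 1)) %% n) + 1))        # wrap blocks around the end
  r[idx[1:n]]                                        # trim to the original length
}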
Happy hacking!
I recently came across a question that required logic and coding skills to solve. It went a little something like this:
Let there be a stick of length 1. Pick 2 points uniformly at random along the stick, and break the stick at those points. What is the probability of the three resulting pieces being able to form a triangle?
Interesting, i thought. I do enjoy these types of challenges, and had fun trying to solve this problem. It took me a while to understand that the longest side (or one of the longest sides, in the case of an isosceles triangle) had to be smaller than the sum of the 2 smaller sides. In pseudo code, a triangle would be possible IF:
Length of longest side < sum of other 2 sides, which (since the 3 pieces always sum to the stick's length of 1) is true IF
Length of longest side < 0.5 (half the length of the stick)
When i pictured (in my mind's eye) breaking a stick in half and folding it so that both pieces overlap, it became obvious that i needed one of the pieces to be longer, and then to break that piece into 2 pieces to form a would-be triangle. As a side note, i have been hoping to create my first gif for a while, and this might be a useful one: animating a stick being broken in half, folded over, then one half growing in length, breaking into 2 and forming a triangle.
I turned to the statistical computing language R to solve my problem, knowing i could randomly draw values from a uniform distribution with the runif() function.
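As a teaser, a brute-force check of the answer takes only a few lines; this is a quick sketch of the runif() idea, not the 18-line solution from the post itself:

# Monte Carlo check of the Triangle Challenge
set.seed(1)
n <- 1e6
a <- runif(n); b <- runif(n)          # two uniform break points on [0, 1]
p1 <- pmin(a, b)                      # position of the first cut
p2 <- pmax(a, b)                      # position of the second cut
longest <- pmax(p1, p2 - p1, 1 - p2)  # longest of the three pieces
mean(longest < 0.5)                   # triangle possible iff longest < 1/2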
Continue reading “How i used 18 lines of R code to crack the Triangle Challenge”