Recursion: account value with distributions

Question

Update: I'm not sure this is possible without some form of loop, but np.where will not work here. If the answer is "you can't", then so be it. If it can be done, it may involve something from scipy.signal.

I'd like to vectorize the loop in the code below, but I'm unsure how, because of the recursive nature of the output.

Walk-through of my current setup:

Take a starting amount ($1 million) and a quarterly dollar distribution ($5,000):

dist = 5000.
v0 = float(1e6)

Generate some random security/account returns (decimal form) at monthly freq:

r = pd.Series(np.random.rand(12) * .01,
              index=pd.date_range('2017', freq='M', periods=12))

Create an empty Series that will hold the monthly account values:

value = pd.Series(np.empty_like(r), index=r.index)

Add a "start month" to value. This label will contain v0.

from pandas.tseries import offsets
value = (value.append(pd.Series(v0, index=[value.index[0] - offsets.MonthEnd(1)]))
              .sort_index())

The loop I'd like to get rid of is here:

for date in value.index[1:]:
    if date.is_quarter_end:
        value.loc[date] = value.loc[date - offsets.MonthEnd(1)] \
                        * (1 + r.loc[date]) - dist
    else:
        value.loc[date] = value.loc[date - offsets.MonthEnd(1)] \
                        * (1 + r.loc[date])

Combined code:

import pandas as pd
from pandas.tseries import offsets
from pandas import Series
import numpy as np

dist = 5000.
v0 = float(1e6)
r = pd.Series(np.random.rand(12) * .01, index=pd.date_range('2017', freq='M', periods=12))
value = pd.Series(np.empty_like(r), index=r.index)
value = (value.append(Series(v0, index=[value.index[0] - offsets.MonthEnd(1)])).sort_index())
for date in value.index[1:]:
    if date.is_quarter_end:
        value.loc[date] = value.loc[date - offsets.MonthEnd(1)] * (1 + r.loc[date]) - dist
    else:
        value.loc[date] = value.loc[date - offsets.MonthEnd(1)] * (1 + r.loc[date])

In pseudocode, what the loop is doing is just:

for each date in index of value:
    if the date is not a quarter end:
        multiply previous value by (1 + r) for that month
    if the date is a quarter end:
        multiply previous value by (1 + r) for that month and subtract dist

The issue is that I don't currently see how to vectorize this, since each value depends on the previous one, which in turn depends on whether a distribution was taken in an earlier month. The loop gets to the desired result, but it is pretty inefficient for higher-frequency data or longer time periods.
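
Written as a recurrence, the loop above computes the following. The symbols are notation introduced here for illustration and do not appear in the original code: $V_t$ is the account value at month $t$, $r_t$ the monthly return, $d$ the distribution (`dist`), and $q_t$ an indicator equal to 1 at quarter ends and 0 otherwise.

```latex
V_0 = v_0, \qquad V_t = V_{t-1}\,(1 + r_t) - d\,q_t, \quad t = 1, \dots, T
```

The recurrence is linear in $V_{t-1}$, which is what leaves room for a closed-form solution.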

Don't use floats for money. Ever. (Unless in your model, it's a purely theoretical construct and the resulting sums don't have to match)
– ivan_pozdeev, Commented Aug 24, 2017 at 15:05
LOL. This remark made my day =) (I just imagined your boss' face when they would hear that)
– ivan_pozdeev, Commented Aug 24, 2017 at 15:23
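
The rounding concern raised in the comment above can be illustrated with the standard-library `decimal` module. This is only a sketch: the helper name `step` and the choice to round to whole cents with `ROUND_HALF_EVEN` are assumptions made for illustration, not something the question specifies.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary floats cannot represent most decimal fractions exactly.
float_ok = (0.1 + 0.2 == 0.3)
decimal_ok = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

CENT = Decimal("0.01")


def step(prev_value, monthly_return, distribution=Decimal("0")):
    """One month of the recursion, rounded to cents (hypothetical helper)."""
    grown = prev_value * (Decimal("1") + monthly_return) - distribution
    return grown.quantize(CENT, rounding=ROUND_HALF_EVEN)


v1 = step(Decimal("1000000"), Decimal("0.005"))                # ordinary month
v2 = step(v1, Decimal("0.005"), distribution=Decimal("5000"))  # quarter end
```

Rounding at every step makes the recursion harder to vectorize, so whether this matters depends on whether the figures must reconcile to the cent or are only a model.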

JuniorCompressor · Accepted Answer · 2017-08-24 22:36:26Z

You could use the following code:

cum_r = (1 + r).cumprod()
result = cum_r * v0
for date in r.index[r.index.is_quarter_end]:
    result[date:] -= cum_r[date:] * (dist / cum_r.loc[date])

This requires:

1 cumulative product over all monthly returns
1 vector multiplication by the scalar v0
n vector multiplications by the scalar dist / cum_r.loc[date]
n vector subtractions

where n is the number of quarter ends.
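
One way to see why the per-quarter correction works is to unroll the recurrence. The notation is an assumption introduced here: $C_t$ corresponds to `cum_r`, $d$ to `dist`, $v_0$ to `v0`, and $Q$ is the set of quarter-end months.

```latex
V_t = v_0\,C_t \;-\; \sum_{\substack{q \in Q \\ q \le t}} d\,\frac{C_t}{C_q},
\qquad C_t = \prod_{s=1}^{t} (1 + r_s)
```

Each distribution $d$ taken at month $q$ forgoes the growth factor $C_t / C_q$ in every later month $t$ (and equals $d$ itself at $t = q$), which is the term `cum_r[date:] * (dist / cum_r.loc[date])` subtracted inside the loop.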

Based on this code we can optimize further:

cum_r = (1 + r).cumprod()
t = (r.index.is_quarter_end / cum_r).cumsum()
result = cum_r * (v0 - dist * t)

which takes

1 cumulative product, (1 + r).cumprod()
1 division between two series, r.index.is_quarter_end / cum_r
1 cumulative sum of the result of that division
1 multiplication of that sum by the scalar dist
1 subtraction of dist * t from the scalar v0
1 element-wise multiplication of cum_r by v0 - dist * t
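
A quick way to check the closed form is to compare it with the plain loop. The sketch below uses NumPy arrays instead of a date-indexed Series, so the quarter-end mask `is_q_end` (every third month) and the function names `loop_values` and `closed_form_values` are assumptions made for illustration.

```python
import numpy as np


def loop_values(r, is_q_end, v0, dist):
    """Reference implementation: the month-by-month recursion."""
    out = np.empty(len(r))
    prev = v0
    for i, (ret, q) in enumerate(zip(r, is_q_end)):
        prev = prev * (1 + ret) - (dist if q else 0.0)
        out[i] = prev
    return out


def closed_form_values(r, is_q_end, v0, dist):
    """Vectorized version: cum_r * (v0 - dist * cumsum(is_q_end / cum_r))."""
    cum_r = np.cumprod(1 + r)
    t = np.cumsum(is_q_end / cum_r)
    return cum_r * (v0 - dist * t)


rng = np.random.default_rng(0)
r = rng.random(12) * 0.01                # 12 monthly returns in [0, 1%)
is_q_end = (np.arange(1, 13) % 3 == 0)   # months 3, 6, 9 and 12
loop = loop_values(r, is_q_end, 1e6, 5000.0)
fast = closed_form_values(r, is_q_end, 1e6, 5000.0)
```

The two results should agree only up to floating-point rounding, so `np.allclose(loop, fast)` is a more appropriate comparison than `==`. Dividing by `cum_r` may also lose precision if the cumulative product drifts close to zero.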

edited Aug 24, 2017 at 22:36

answered Aug 24, 2017 at 22:27

3 Comments

Brad Solomon Over a year ago

To apply to DataFrames rather than Series you can use r.index.is_quarter_end.reshape((-1,1))

Brad Solomon Over a year ago

yes it does. my point is that .reshape((-1,1)) is needed if r is a DataFrame rather than Series. But I didn't specify that in my question and your response is already on the money

JuniorCompressor Over a year ago

Ah ok. Thanks for the tip!
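
The reshape mentioned in the comments can be sketched with a 2-D NumPy array standing in for the DataFrame's values, one column per account. The array shape, the every-third-month mask and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
r = rng.random((12, 3)) * 0.01            # 12 months x 3 accounts
is_q_end = (np.arange(1, 13) % 3 == 0)    # shape (12,)
v0, dist = 1e6, 5000.0

cum_r = np.cumprod(1 + r, axis=0)         # cumulative growth per column
# A (12,) mask does not broadcast against (12, 3); a (12, 1) column does.
t = np.cumsum(is_q_end.reshape((-1, 1)) / cum_r, axis=0)
result = cum_r * (v0 - dist * t)          # shape (12, 3)

# Reference: run the month-by-month recursion on all columns at once.
expected = np.empty_like(r)
prev = np.full(3, v0)
for i in range(12):
    prev = prev * (1 + r[i]) - (dist if is_q_end[i] else 0.0)
    expected[i] = prev
```

Reshaping the mask to a column lets the same quarter-end flags apply to every account without an explicit loop over columns.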

mortysporty · Accepted Answer · 2017-08-25 13:18:24Z

Ok... I'm taking a stab at this.

import numpy as np 
import pandas as pd

# Define a generator that accumulates deposits and returns
def gen(lst):
    acu = 0
    for r, v in lst:
        # Grow the running balance by the period return, then add the movement
        acu = acu * (1 + r) + v
        yield acu


dist = 5000.
v0 = float(1e6)
random_returns = np.random.rand(12) * 0.1

# Create the index.
index = pd.date_range('2016-12-31', freq='M', periods=13)
# Build the returns so that the value at i is the return from i-1 to i
r = pd.Series(np.insert(random_returns, 0, 0), index=index, name='Return')
# Build a series of deposits and withdrawals
w = [-dist if is_q_end else 0 for is_q_end in index[1:].is_quarter_end]
d = pd.Series(np.insert(w, 0, v0), index=index, name='Movements')

df = pd.concat([r, d], axis=1)
df['Value'] = list(gen(zip(df['Return'], df['Movements'])))
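
The same running accumulation can also be expressed with `itertools.accumulate` from the standard library. This is an alternative sketch rather than the answer's own code: the function name `account_values` and the sample numbers are assumptions, and the `initial` argument requires Python 3.8 or later.

```python
from itertools import accumulate, islice


def account_values(returns, movements):
    """Running value: grow the balance by (1 + r), then add the movement."""
    steps = zip(returns, movements)
    running = accumulate(
        steps,
        lambda acc, rv: acc * (1 + rv[0]) + rv[1],
        initial=0.0,
    )
    return list(islice(running, 1, None))  # drop the seed value


# The first entry carries the starting balance with a zero return,
# mirroring how the answer builds its 'Return' and 'Movements' columns.
returns = [0.0, 0.01, 0.02, 0.01]
movements = [1000.0, 0.0, 0.0, -50.0]
values = account_values(returns, movements)
```

Like the generator, this is still a sequential pass over the data; it avoids the per-step label lookups of the original loop rather than vectorizing the computation.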

now, your code

#Generate some random security/account returns (decimal form) at monthly freq:
r = pd.Series(random_returns,
          index=pd.date_range('2017', freq='M', periods=12))
#Create an empty Series that will hold the monthly account values:
value = pd.Series(np.empty_like(r), index=r.index)
#Add a "start month" to value. This label will contain v0.
from pandas.tseries import offsets
value = (value.append(pd.Series(v0, index=[value.index[0] - offsets.MonthEnd(1)])).sort_index())
#The loop I'd like to get rid of is here:

def loopy(value):
    for date in value.index[1:]:
        if date.is_quarter_end:
            value.loc[date] = value.loc[date - offsets.MonthEnd(1)] \
                            * (1 + r.loc[date]) - dist
        else:
            value.loc[date] = value.loc[date - offsets.MonthEnd(1)] \
                            * (1 + r.loc[date])

    return value

and comparing and timing

(loopy(value) == list(gen(zip(df['Return'], df['Movements'])))).all()
Out[11]: True

returns same result

%timeit list(gen(zip(df['Return'], df['Movements'])))
%timeit loopy(value)
10000 loops, best of 3: 72.4 µs per loop
100 loops, best of 3: 5.37 ms per loop

and appears to be somewhat faster. Hope it helps.

edited Aug 25, 2017 at 13:18

answered Aug 24, 2017 at 21:35

2 Comments

Brad Solomon Over a year ago

This looks to be the faster solution for Series input, but having trouble applying it to a DataFrame

mortysporty Over a year ago

Hi. Great. Edited my response to show how I would do it (assuming I understand your issue correctly). For some reason the execution slows down quite a bit when it is assigned to the dataframe. Perhaps it is faster to store an intermediate list?
