Rolling sum based on all previous dates NOT previous rows sorted by date

Question

Given the following dataframe:

+------------+--------+
|    Date    | Amount |
+------------+--------+
| 01/05/2019 |     15 |
| 27/05/2019 |     20 |
| 27/05/2019 |     15 |
| 15/06/2019 |     10 |
| 29/06/2019 |     25 |
| 01/07/2019 |     50 |
+------------+--------+

I need the 28-day rolling sum over all previous dates (not previous rows, which may share the current row's date), as follows:

+------------+--------+
|    Date    | Amount |
+------------+--------+
| 01/05/2019 | NaN    |
| 27/05/2019 | 15     |
| 27/05/2019 | 15     |
| 15/06/2019 | 35     |
| 29/06/2019 | 10     |
| 01/07/2019 | 35     |
+------------+--------+

Using:

import datetime

import pandas as pd

df = pd.DataFrame(
    {
        'Date': {
            0: datetime.datetime(2019, 5, 1),
            1: datetime.datetime(2019, 5, 27),
            2: datetime.datetime(2019, 5, 27),
            3: datetime.datetime(2019, 6, 15),
            4: datetime.datetime(2019, 6, 29),
            5: datetime.datetime(2019, 7, 1),
        },
        'Amount': {0: 15, 1: 20, 2: 15, 3: 10, 4: 25, 5: 50}
    }
)
df.sort_values("Date", inplace=True)
df_roll = df.rolling("28d", on="Date", closed="left").sum()

Gets me:

+------------+--------+
|    Date    | Amount |
+------------+--------+
| 01/05/2019 |    NaN |
| 27/05/2019 |     15 | 
| 27/05/2019 |     35 | <-- Should be 15
| 15/06/2019 |     35 |
| 29/06/2019 |     10 |
| 01/07/2019 |     35 |
+------------+--------+

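This isn't quite correct: the second 27/05/2019 row picks up the 20 from the first row, even though that row has the same date.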

How would I get the sum of all previous dates rather than all previous rows?
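
To pin down the intended result independently of how rolling treats rows with tied timestamps, the target can be written as a brute-force sum: for each row, add up every amount whose date falls within the 28 days strictly before that row's date. This is only a minimal sketch for checking results. The helper name previous_dates_sum is hypothetical, and the loop is quadratic, so it suits small frames only.

```python
import pandas as pd


def previous_dates_sum(df, window="28D"):
    """For each row, sum Amount over rows dated in [Date - window, Date)."""
    delta = pd.Timedelta(window)
    sums = []
    for current in df["Date"]:
        # Strictly earlier dates only, so rows sharing the current date are excluded.
        in_window = (df["Date"] >= current - delta) & (df["Date"] < current)
        # min_count=1 yields NaN (not 0) when no earlier date is in the window.
        sums.append(df.loc[in_window, "Amount"].sum(min_count=1))
    return pd.Series(sums, index=df.index, dtype="float64")


df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2019-05-01", "2019-05-27", "2019-05-27",
         "2019-06-15", "2019-06-29", "2019-07-01"]
    ),
    "Amount": [15, 20, 15, 10, 25, 50],
})
expected = previous_dates_sum(df)
print(expected.tolist())
```

Both 27/05/2019 rows come out as 15 here, which is the value the question asks for.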

Jossy · Accepted Answer · 2021-12-21 08:37:15Z

2

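You can sum the amounts per date first, take the rolling sum of those daily totals, and map the result back onto the original dates: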

df['new'] = df.Date.map(df.groupby('Date').Amount.sum().rolling("28d", closed="left").sum())
df
        Date  Amount   new
0 2019-05-01      15   NaN
1 2019-05-27      20  15.0
2 2019-05-27      15  15.0
3 2019-06-15      10  35.0
4 2019-06-29      25  10.0
5 2019-07-01      50  35.0
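
The one-liner above can be unpacked into named steps, which makes it easier to see that the window is applied to one total per date rather than to individual rows. This is a sketch that assumes Date is already a datetime column; the function name rolling_sum_by_date and its parameters are illustrative and not part of pandas.

```python
import pandas as pd


def rolling_sum_by_date(df, window="28D", date_col="Date", value_col="Amount"):
    # 1. Collapse duplicate dates: one total per date, indexed by date.
    daily = df.groupby(date_col)[value_col].sum()
    # 2. Time-based window over the DatetimeIndex; closed="left" leaves out
    #    the current date itself.
    rolled = daily.rolling(window, closed="left").sum()
    # 3. Look up each row's date in the per-date result.
    return df[date_col].map(rolled)


df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2019-05-01", "2019-05-27", "2019-05-27",
         "2019-06-15", "2019-06-29", "2019-07-01"]
    ),
    "Amount": [15, 20, 15, 10, 25, 50],
})
df["new"] = rolling_sum_by_date(df)
print(df)
```

Because daily is a Series with the dates as its index, the window runs over that index and no on= argument is passed. The final .sum() matters too: map needs the aggregated Series, not the intermediate Rolling object.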

edited Dec 21, 2021 at 8:37

Jossy

answered Dec 20, 2021 at 1:55

BENY

8 Comments

Jossy Over a year ago

Thanks for this - how would I incorporate the 28 day rolling period?

BENY Over a year ago

@Jossy df.groupby('Date').Amount.sum().rolling(....) here

Jossy Over a year ago

Tried df['new'] = df.Date.map(df.groupby('Date').Amount.sum().rolling("28d", on="Date")) but getting error ValueError: invalid on specified as Date, must be a column (of DataFrame), an Index or None

BENY Over a year ago

@Jossy df['new'] = df.Date.map(df.groupby('Date').Amount.sum().rolling("28d"))

Jossy Over a year ago

Sadly that gives another error :( TypeError: 'Rolling' object is not callable

cookesd · Accepted Answer · 2021-12-20 01:49:50Z

One way is to aggregate the amounts by date first, compute the rolling sum on the aggregated frame, and then join that sum back onto the original rows so that duplicate dates share the same value:

# Aggregate (sum) by date
df_agged = (df.groupby('Date')['Amount'].agg(['sum'])
            .reset_index()
            .rename(columns={'sum':'Amount'}))
# Compute rolling sum
df_agged_rolling = df_agged.rolling("28d",on="Date",closed='left').sum()

# Join on original dates to apply rolling sum to duplicate dates
df_with_rolling_agg = df.join(df_agged_rolling.set_index('Date'),on='Date',
                              lsuffix='_orig',rsuffix='_rolling_sum')
df_with_rolling_agg

#         Date  Amount_orig  Amount_rolling_sum
# 0 2019-05-01           15                 NaN
# 1 2019-05-27           20                15.0
# 2 2019-05-27           15                15.0
# 3 2019-06-15           10                35.0
# 4 2019-06-29           25                10.0
# 5 2019-07-01           50                35.0

user17242583 · Accepted Answer · 2021-12-20 01:54:23Z

1

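You could drop duplicate dates first, then do a rolling sum, then forward fill the NaNs that the duplicate removal leaves behind. Note that drop_duplicates discards the amounts on repeated dates rather than summing them, so the 15/06/2019 row comes out as 20 instead of the 35 asked for in the question: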

df = df.assign(Amount=df.drop_duplicates(subset=['Date']).rolling("28d", on="Date", closed="left")['Amount'].sum()).ffill()

Output:

>>> df
        Date  Amount
0 2019-05-01     NaN
1 2019-05-27    15.0
2 2019-05-27    15.0
3 2019-06-15    20.0
4 2019-06-29    10.0
5 2019-07-01    35.0
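
A quick way to see what drop_duplicates does to the 27/05/2019 amounts is to compare the value it keeps per date with the per-date total from groupby. This is an illustrative check on the question's data only; the column names first_row_only and date_total are made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2019-05-01", "2019-05-27", "2019-05-27",
         "2019-06-15", "2019-06-29", "2019-07-01"]
    ),
    "Amount": [15, 20, 15, 10, 25, 50],
})

# drop_duplicates keeps only the first row of each date ...
kept = df.drop_duplicates(subset=["Date"]).set_index("Date")["Amount"]
# ... whereas groupby adds up every row of each date.
totals = df.groupby("Date")["Amount"].sum()

comparison = pd.DataFrame({"first_row_only": kept, "date_total": totals})
print(comparison)
```

The two columns differ only on 27/05/2019 (20 versus 35), and that missing 15 accounts for the gap on the 15/06/2019 row. Summing per date before the rolling step avoids the loss.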

answered Dec 20, 2021 at 1:54

user17242583
