How to sum one column and average another when grouping a DataFrame into a new DataFrame

Question

After creating a DataFrame with some duplicated values in the Name column:

import pandas as pd
df = pd.DataFrame({'Name': ['Will','John','John','John','Alex'],
                   'Payment':  [15, 10, 10, 10, 15],
                   'Duration':    [30, 15, 15, 15, 20]})

I would like to create another DataFrame in which the duplicated values in the Name column are consolidated, leaving no duplicates. At the same time, I want to sum the payments each person made (John, for example, appears three times). I proceed with:

df_sum = df.groupby('Name').sum().reset_index()

But since df.groupby('Name').sum() applies the sum function to every column in the DataFrame, the Duration column (the length of the visit in minutes) is summed as well. Instead, I would like to get the average value for the Duration column. So I would need to use the mean() method, like so:

df_mean = df.groupby('Name').mean().reset_index()

But with mean(), the Payment column now shows the average payment John made rather than the sum of all his payments.

How can I create a DataFrame where the Duration column shows the average values while the Payment column shows the sums?

Please don't use embedded images; use text instead. Images can't be copied and pasted into a console, which means you're asking anyone who wants to match your example to type it in manually. (See here for more.)
– DSM, Commented Sep 3, 2016 at 17:40
@DSM although to be fair - their very first code block has the code to create the initial DataFrame :) (as well as the commands issued to create the results...)
– Jon Clements, Commented Sep 3, 2016 at 17:43
@NinjaPuppy: that's only one of the many reasons not to embed images (see the Meta post I linked, e.g.)
– DSM, Commented Sep 3, 2016 at 17:47

user2285236 · Accepted Answer · 2016-09-03 17:19:47Z

10

You can apply different functions to different columns with groupby.agg:

df.groupby('Name').agg({'Duration': 'mean', 'Payment': 'sum'})
Out: 
      Duration  Payment
Name                   
Alex      20.0       15
John      15.0       30
Will      30.0       15
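
If a flat DataFrame like the question's df_sum is wanted (Name as an ordinary column rather than the index), named aggregation followed by reset_index() is one option. This is a minimal sketch, assuming pandas 0.25 or later, where the keyword form of agg is available; the variable name df_agg is made up for illustration:

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'Name': ['Will', 'John', 'John', 'John', 'Alex'],
                   'Payment': [15, 10, 10, 10, 15],
                   'Duration': [30, 15, 15, 15, 20]})

# Named aggregation: output_column=(input_column, function).
# Each keyword becomes one column of the result, in the order given.
df_agg = (
    df.groupby('Name')
      .agg(Payment=('Payment', 'sum'),      # total paid per person
           Duration=('Duration', 'mean'))   # average visit length per person
      .reset_index()                        # turn the Name index back into a column
)

print(df_agg)
```

The keyword form also lets the output columns be renamed in the same step, for example Total=('Payment', 'sum'), which the dict form does not do directly.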

adabsurdum Over a year ago

Great answer. A less elegant approach would be to split the dataframe, apply the two functions and then combine them together: pd.concat([df.loc[:,['Duration','Name']].groupby('Name').mean(),df.loc[:,['Payment','Name']].groupby('Name').sum()], axis=1)
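
The split-and-combine approach from this comment can be written out as a runnable sketch. The names mean_part, sum_part, combined and via_agg are illustrative only, and the last lines simply check the result against the agg version from the answer:

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'Name': ['Will', 'John', 'John', 'John', 'Alex'],
                   'Payment': [15, 10, 10, 10, 15],
                   'Duration': [30, 15, 15, 15, 20]})

# Aggregate each column separately, keeping Name as the group key.
mean_part = df[['Name', 'Duration']].groupby('Name').mean()
sum_part = df[['Name', 'Payment']].groupby('Name').sum()

# Both pieces are indexed by Name, so concat along axis=1 aligns them row by row.
combined = pd.concat([mean_part, sum_part], axis=1)

# The single-call version from the answer, for comparison.
via_agg = df.groupby('Name').agg({'Duration': 'mean', 'Payment': 'sum'})

print(combined)
print(combined.equals(via_agg))
```

This runs two groupby passes instead of one, so the agg form is usually the simpler choice; the split form can still be handy when the two pieces need different preprocessing.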
