I created a DataFrame with some duplicated values in the Name column:

import pandas as pd
df = pd.DataFrame({'Name': ['Will','John','John','John','Alex'],
                   'Payment':  [15, 10, 10, 10, 15],
                   'Duration':    [30, 15, 15, 15, 20]})

Next I want to create another DataFrame in which the duplicated values in the Name column are consolidated, leaving no duplicates. At the same time I want to sum the payments John made. I proceed with:

df_sum = df.groupby('Name', axis=0).sum().reset_index()

But since df.groupby('Name', axis=0).sum() applies the sum function to every column in the DataFrame, the Duration column (the length of the visit in minutes) is summed as well. Instead I would like to get the average value for the Duration column, so I would need to use the mean() method, like so:

df_mean = df.groupby('Name', axis=0).mean().reset_index()

But with mean(), the Payment column now shows the average of the payments John made rather than their sum.

How can I create a DataFrame where the Duration column shows the average while the Payment column shows the sum?

  • Please don't use embedded images; use text instead. Images can't be copied and pasted into a console, which means you're asking anyone who wants to match your example to type it in manually. (See here for more.) Commented Sep 3, 2016 at 17:40
  • @DSM although to be fair - their very first code block has the code to create the initial DataFrame :) (as well as the commands issued to create the results...) Commented Sep 3, 2016 at 17:43
  • @NinjaPuppy: that's only one of the many reasons not to embed images (see the Meta post I linked, e.g.) Commented Sep 3, 2016 at 17:47

1 Answer

You can apply different functions to different columns with groupby.agg:

df.groupby('Name').agg({'Duration': 'mean', 'Payment': 'sum'})
Out:
      Duration  Payment
Name                   
Alex      20.0       15
John      15.0       30
Will      30.0       15
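
If you want Name back as a regular column (as in the question's reset_index() calls), one possible variant is named aggregation. This is only a sketch, assuming pandas 0.25 or later; the name df_agg is just an illustrative choice, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Will', 'John', 'John', 'John', 'Alex'],
                   'Payment': [15, 10, 10, 10, 15],
                   'Duration': [30, 15, 15, 15, 20]})

# Named aggregation: output_column=(input_column, function).
# as_index=False keeps Name as an ordinary column, so reset_index() is not needed.
# df_agg is a hypothetical name chosen for this example.
df_agg = df.groupby('Name', as_index=False).agg(
    Payment=('Payment', 'sum'),      # total paid per person
    Duration=('Duration', 'mean'),   # average visit length per person
)
print(df_agg)
```

The keyword names set both the output column names and their order, so the result does not depend on how a dict happens to be ordered.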
1 Comment

Great answer. A less elegant approach would be to split the dataframe, apply the two functions and then combine them together: pd.concat([df.loc[:,['Duration','Name']].groupby('Name').mean(),df.loc[:,['Payment','Name']].groupby('Name').sum()], axis=1)
