After creating a DataFrame with some duplicated values in the Name column:
import pandas as pd
df = pd.DataFrame({'Name': ['Will', 'John', 'John', 'John', 'Alex'],
                   'Payment': [15, 10, 10, 10, 15],
                   'Duration': [30, 15, 15, 15, 20]})
I would like to create another DataFrame in which the duplicated values in the Name column are consolidated, leaving one row per name. At the same time, I want to sum the payments John made. I proceed with:
df_sum = df.groupby('Name').sum().reset_index()
But since sum() is applied to every column of the grouped DataFrame, the Duration column (the length of the visit in minutes) is summed as well. Instead, I would like to get the average value for the Duration column. For that I would need to use the mean() method, like so:
df_mean = df.groupby('Name').mean().reset_index()
But with mean(), the Payment column now shows John's average payment rather than the sum of all his payments.
How can I create a DataFrame where the Duration column shows the average while the Payment column shows the sum?
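For reference, here is a minimal sketch of the kind of result I am after. It assumes that passing a dict of column names to aggregation names to agg() is an acceptable approach; the name df_agg is just a placeholder I made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Will', 'John', 'John', 'John', 'Alex'],
                   'Payment': [15, 10, 10, 10, 15],
                   'Duration': [30, 15, 15, 15, 20]})

# Assumption: a dict passed to agg() maps each column to its own aggregation.
# Payment is summed per name, Duration is averaged per name.
df_agg = (df.groupby('Name')
            .agg({'Payment': 'sum', 'Duration': 'mean'})
            .reset_index())  # turn the Name index back into a regular column

print(df_agg)
```

With the sample data, John's row should show a Payment of 30 (10 + 10 + 10) and a Duration of 15.0, while Will and Alex keep their single values.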


