11

I'm working with a DataFrame having the following structure:

import pandas as pd

df = pd.DataFrame({'group':[1,1,1,2,2,2,2,3,3,3],
                   'brand':['A','B','X','C','D','X','X','E','F','X']})

print(df)

   group brand
0      1     A
1      1     B
2      1     X
3      2     C
4      2     D
5      2     X
6      2     X
7      3     E
8      3     F
9      3     X

My goal is to view only the groups having exactly one brand X associated to them. Since group number 2 has two observations equal to brand X, it should be filtered out from the resulting DataFrame.

The output should look like this:

   group brand
0      1     A
1      1     B
2      1     X
3      3     E
4      3     F
5      3     X

I know I should do a groupby on the group column and then filter those groups having a count of X different than 1. The filtering part is where I struggle. Any help would be appreciated.

4 Answers 4

10

Use series.eq to check if brand is equal to X , then groupby and transform sum and filter groups in which X count is equal to 1:

df[df['brand'].eq('X').groupby(df['group']).transform('sum').eq(1)]

   group brand
0      1     A
1      1     B
2      1     X
7      3     E
8      3     F
9      3     X
Sign up to request clarification or add additional context in comments.

Comments

9

This should work as well

df[df.groupby(['group'])['brand'].transform('sum').str.count('X').eq(1)]

Output

 group  brand
0   1   A
1   1   B
2   1   X
7   3   E
8   3   F
9   3   X

Comments

7

Groupby column and apply a simple filter of count of 'X' character in the group equal to 1

df.groupby('group').filter(lambda x: x['brand'].str.count('X').sum() == 1)

Output

   group brand
0      1     A
1      1     B
2      1     X
7      3     E
8      3     F
9      3     X

Comments

4

Solution with pd.crosstab

df[df['group'].map(pd.crosstab(df['group'],df['brand'])['X'].eq(1))]

#   group brand
#0      1     A
#1      1     B
#2      1     X
#7      3     E
#8      3     F
#9      3     X

We can also use DataFrame.merge with Series.drop_duplicates

df.merge(df.loc[df.brand.eq('X'),'group'].drop_duplicates(keep = False),on='group')
#   group brand
#0      1     A
#1      1     B
#2      1     X
#3      3     E
#4      3     F
#5      3     X

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.