Filtering DataFrame on groups where count of element is different than 1

Question

I'm working with a DataFrame having the following structure:

import pandas as pd

df = pd.DataFrame({'group':[1,1,1,2,2,2,2,3,3,3],
                   'brand':['A','B','X','C','D','X','X','E','F','X']})

print(df)

   group brand
0      1     A
1      1     B
2      1     X
3      2     C
4      2     D
5      2     X
6      2     X
7      3     E
8      3     F
9      3     X

My goal is to view only the groups having exactly one brand X associated to them. Since group number 2 has two observations equal to brand X, it should be filtered out from the resulting DataFrame.

The output should look like this:

   group brand
0      1     A
1      1     B
2      1     X
3      3     E
4      3     F
5      3     X

I know I should do a groupby on the group column and then filter those groups having a count of X different than 1. The filtering part is where I struggle. Any help would be appreciated.

anky · Accepted Answer · 2020-01-16 11:23:01Z

10

Use series.eq to check if brand is equal to X , then groupby and transform sum and filter groups in which X count is equal to 1:

df[df['brand'].eq('X').groupby(df['group']).transform('sum').eq(1)]

   group brand
0      1     A
1      1     B
2      1     X
7      3     E
8      3     F
9      3     X

edited Jan 16, 2020 at 11:23

answered Jan 16, 2020 at 11:06

anky

75.3k11 gold badges46 silver badges76 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

moys · Accepted Answer · 2020-01-16 11:33:35Z

9

This should work as well

df[df.groupby(['group'])['brand'].transform('sum').str.count('X').eq(1)]

Output

 group  brand
0   1   A
1   1   B
2   1   X
7   3   E
8   3   F
9   3   X

answered Jan 16, 2020 at 11:33

moys

8,1273 gold badges20 silver badges51 bronze badges

Comments

Vishnudev Krishnadas · Accepted Answer · 2020-01-16 12:06:34Z

7

Groupby column and apply a simple filter of count of 'X' character in the group equal to 1

df.groupby('group').filter(lambda x: x['brand'].str.count('X').sum() == 1)

Output

   group brand
0      1     A
1      1     B
2      1     X
7      3     E
8      3     F
9      3     X

edited Jan 16, 2020 at 12:06

answered Jan 16, 2020 at 11:34

Vishnudev Krishnadas

11k2 gold badges29 silver badges58 bronze badges

Comments

ansev · Accepted Answer · 2020-01-16 13:21:13Z

4

Solution with pd.crosstab

df[df['group'].map(pd.crosstab(df['group'],df['brand'])['X'].eq(1))]

#   group brand
#0      1     A
#1      1     B
#2      1     X
#7      3     E
#8      3     F
#9      3     X

We can also use DataFrame.merge with Series.drop_duplicates

df.merge(df.loc[df.brand.eq('X'),'group'].drop_duplicates(keep = False),on='group')
#   group brand
#0      1     A
#1      1     B
#2      1     X
#3      3     E
#4      3     F
#5      3     X

edited Jan 16, 2020 at 13:21

answered Jan 16, 2020 at 12:54

ansev

31k5 gold badges21 silver badges33 bronze badges

Collectives™ on Stack Overflow

Filtering DataFrame on groups where count of element is different than 1

4 Answers 4

Comments

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related