I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': list('aabb'),
                   'col1': np.arange(4),
                   'col2': list('wxyz'),
                   'col3': np.nan})

    col0 col1 col2 col3
0   a    0    w    NaN
1   a    1    x    NaN
2   b    2    y    NaN
3   b    3    z    NaN

I want to assign to 'col3' the value of 'col2' corresponding to the minimum value of 'col1', grouped by 'col0'. Expected output:

    col0 col1 col2 col3
0   a    0    w    w
1   a    1    x    w
2   b    2    y    y
3   b    3    z    y

If grouping by 'col0' were not needed, this would work:

df['col3'] = df[df['col1']==df['col1'].min()]['col2'].iloc[0]

    col0 col1 col2 col3
0   a    0    w    w
1   a    1    x    w
2   b    2    y    w
3   b    3    z    w

Similarly, this is my attempt using groupby/apply, which doesn't work as expected:

df['col3'] = df.groupby('col0').apply(lambda x: x[x['col1']==x['col1'].min()]['col2'].iloc[0])

    col0 col1 col2 col3
0   a    0    w    NaN
1   a    1    x    NaN
2   b    2    y    NaN
3   b    3    z    NaN
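
The NaNs come from index alignment: the groupby/apply call returns one value per group, indexed by the group keys 'a' and 'b', while df is indexed 0 to 3, so the assignment finds no matching labels. A minimal sketch on the same data (per_group is a name introduced here only for illustration) that maps the per-group values back through 'col0':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': list('aabb'),
                   'col1': np.arange(4),
                   'col2': list('wxyz'),
                   'col3': np.nan})

# One value per group. The result is indexed by the group keys ('a', 'b'),
# not by df's row labels (0..3), which is why direct assignment yields NaN.
per_group = df.groupby('col0')[['col1', 'col2']].apply(
    lambda g: g.loc[g['col1'].idxmin(), 'col2']
)

# Broadcast back to the rows by looking up each row's group key.
df['col3'] = df['col0'].map(per_group)
```

Selecting ['col1', 'col2'] before apply is a choice made for this sketch, so that the lambda only sees the columns it uses.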

3 Answers

Another option is transform with idxmin and loc:

df["col3"] = df.groupby("col0").col1.transform(lambda x: df.loc[x.idxmin(), "col2"])

to get

  col0  col1 col2 col3
0    a     0    w    w
1    a     1    x    w
2    b     2    y    y
3    b     3    z    y

5 Comments

Ok, mine is the worst and yours is the best :)
@Stryder I strongly disagree :)
This works perfectly for the special case in which I'm looking for the index of the min value in col1. What if I have a more general condition as: "assign to 'col3' the value of 'col2' corresponding to a true condition of 'col1', grouped by 'col0'"?
@makpalan I'm not sure what you mean by a true condition, but perhaps it could also be put in loc above instead of idxmin. Can you please elaborate on the condition along with some sample input/output (by editing the question)? If that would make this current question very different, then you might opt for asking another one, but if it is close enough perhaps it is fine, thanks.
@MustafaAydın Say I wanted for col3 the (first) value of col2 where col1>0 (a different condition than the min value, even if trivial here). Then I found this: df["col3"] = df.groupby("col0").col1.transform(lambda x: df.loc[x.index[x>0].tolist()[0], 'col2']). Following your approach, it is actually possible to get something more general, which is what I was looking for, thanks.
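
For the more general "first value of col2 where a condition on col1 holds" case discussed in the comments, one possible sketch, not taken from the answers, masks 'col2' with the condition and takes the first non-null value per group. The col1 > 0 condition is the one from the comment; the variable name masked is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': list('aabb'),
                   'col1': np.arange(4),
                   'col2': list('wxyz')})

# Keep 'col2' only where the condition holds; other rows become NaN.
masked = df['col2'].where(df['col1'] > 0)

# 'first' skips missing values, so each group gets its first matching 'col2',
# broadcast to every row of that group.
df['col3'] = masked.groupby(df['col0']).transform('first')
```

Unlike indexing with tolist()[0], this should leave NaN for a group in which no row satisfies the condition instead of raising an IndexError.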

You can use groupby.apply to get a Series and then merge it into the df:

df
  col0  col1 col2
0    a     0    w
1    a     1    x
2    b     2    y
3    b     3    z

col3 = df.groupby("col0").apply(lambda x: x.loc[x["col1"].idxmin(), "col2"])
col3.name = "col3"
df = df.merge(col3, how="left", left_on="col0", right_index=True)

df
  col0  col1 col2 col3
0    a     0    w    w
1    a     1    x    w
2    b     2    y    y
3    b     3    z    y

You can use groupby with transform('idxmin') and then Series.map:

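# idxmin returns index labels, so look them up in 'col2' by index
# (a dict keyed by 'col1' values only matches here because 'col1' equals the index)
df['col3'] = df['col3'].fillna(df.groupby("col0")['col1'].transform('idxmin').map(df['col2']))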

print(df)

  col0  col1 col2 col3
0    a     0    w    w
1    a     1    x    w
2    b     2    y    y
3    b     3    z    y    
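
Note that idxmin returns index labels, not positions or 'col1' values, so the lookup has to be keyed by the index. A lookup keyed by 'col1' values would match on the sample data only because 'col1' happens to equal the default index. A small sketch with made-up data and a non-default index (the values and the name labels are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({'col0': list('aabb'),
                   'col1': [5, 3, 8, 2],
                   'col2': list('wxyz')},
                  index=[10, 11, 12, 13])

# For every row, the index label of the row holding its group's minimum 'col1'.
labels = df.groupby('col0')['col1'].transform('idxmin')

# Mapping against the Series df['col2'] looks those labels up in its index.
df['col3'] = labels.map(df['col2'])
```

Here group 'a' has its minimum at label 11 and group 'b' at label 13, so 'col3' takes the 'col2' values stored under those labels.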
