6

I am attempting to generate a dataframe (or series) based on another dataframe, selecting a different column from the first frame dependent on the row using another series. In the below simplified example, I want the frame1 values from 'a' for the first three rows, and 'b for the final two (the picked_values series).

frame1=pd.DataFrame(np.random.randn(10).reshape(5,2),index=range(5),columns=['a','b'])
picked_values=pd.Series(['a','a','a','b','b'])

Frame1

    a           b
0   0.283519    1.462209
1   -0.352342   1.254098
2   0.731701    0.236017
3   0.022217    -1.469342
4   0.386000    -0.706614

Trying to get to the series:

0   0.283519
1   -0.352342
2   0.731701
3   -1.469342
4   -0.706614

I was hoping values[picked_values] would work, but this ends up with five columns.

In the real-life example, picked_values is a lot larger and calculated.

Thank you for your time.

2 Answers 2

6

Use df.lookup

pd.Series(frame1.lookup(picked_values.index,picked_values))

0    0.283519
1   -0.352342
2    0.731701
3   -1.469342
4   -0.706614
dtype: float64
Sign up to request clarification or add additional context in comments.

1 Comment

And this is why stack overflow is the best. Thanks!
3

Here's a NumPy based approach using integer indexing and Series.searchsorted:

frame1.values[frame1.index, frame1.columns.searchsorted(picked_values.values)]
# array([0.22095278, 0.86200616, 1.88047197, 0.49816937, 0.10962954])

2 Comments

I was wondering why this has a different output, then realized that the sample is a random.
Yup :) could've stuck to the generated one to make it less confusing @anky

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.