
Suppose I have a Pandas data frame as follows:

Test  Parameter  Value
X1    0          0.033285423511615113
X1    1          0.78790279861666179
X1    2          0.79136989638378297
X1    3          0.80063190842016707
X1    4          0.7884653622402551
X1    5          0.78561849214309198
...
X1    22         0.82241991278171311
...
X2    ...

I'd like to get the row with Parameter value 3, i.e. the row with the last increasing value before the first drop. Notice that higher values may appear later (e.g. row 22). Essentially, I'm trying to get the last value before the first decrease.

Also note that there are multiple Tests, so I probably need to do something like:

myDF.groupby("Test").Something
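For reference, the truncated frame above can be sketched like this (values shortened; the X2 rows and later data are omitted, so this is only an illustrative reconstruction):

```python
import pandas as pd

# Minimal reconstruction of the frame described above
# (values copied from the listing; full data is truncated).
df = pd.DataFrame({
    "Test": ["X1"] * 7,
    "Parameter": [0, 1, 2, 3, 4, 5, 22],
    "Value": [0.033285, 0.787903, 0.791370, 0.800632,
              0.788465, 0.785618, 0.822420],
})
print(df)
```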
  • 1
    Do you want just the last peak or all such peaks? Commented Oct 22, 2017 at 5:24
  • I misunderstood the question I think Commented Oct 22, 2017 at 5:30
  • Do you want the first local maxima? Commented Oct 22, 2017 at 5:33

5 Answers


Coldspeed nearly has it; to keep only the first run of increases you can use cumprod (or similar), e.g.

In [11]: df[((df.Value.diff().fillna(1) > 0).cumprod()) == 1].tail(1)
Out[11]:
  Test  Parameter     Value
3   X1          3  0.800632

The trick being:

In [12]: (df.Value.diff().fillna(1) > 0)
Out[12]:
0     True
1     True
2     True
3     True
4    False
5    False
6     True
Name: Value, dtype: bool

In [13]: (df.Value.diff().fillna(1) > 0).cumprod()
Out[13]:
0    1
1    1
2    1
3    1
4    0
5    0
6    0
Name: Value, dtype: int64

Note: My df is this:

In [21]: df
Out[21]:
  Test  Parameter     Value
0   X1          0  0.033285
1   X1          1  0.787903
2   X1          2  0.791370
3   X1          3  0.800632
4   X1          4  0.788465
5   X1          5  0.785618
6   X1         22  0.822420
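Since the question mentions multiple Tests, the same cumprod trick can be computed per group. This is a hedged sketch on invented two-Test data (the values here are my own, not from the question):

```python
import pandas as pd

# Hypothetical two-Test frame (values invented for illustration).
df = pd.DataFrame({
    "Test": ["X1"] * 4 + ["X2"] * 4,
    "Parameter": [0, 1, 2, 3, 0, 1, 2, 3],
    "Value": [0.1, 0.5, 0.8, 0.6, 0.2, 0.7, 0.4, 0.9],
})

# diff() restarts with NaN at each group's first row, so fillna(1)
# marks every group's first row as "increasing".
rising = df.groupby("Test").Value.diff().fillna(1).gt(0).astype(int)

# cumprod within each group keeps only the leading run of increases.
keep = rising.groupby(df["Test"]).cumprod().eq(1)

# last row of that leading run, per Test
result = df[keep].groupby("Test").tail(1)
print(result)
```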

4 Comments

There's definitely a nicer way to do this, but I don't recall it.
Cumprod is a very nice trick that has eluded me for ever, gotta remember that. +1'd
@cᴏʟᴅsᴘᴇᴇᴅ you can do a similar trick with cumsum if you can make the changes (rows that "change") True... then you can groupby the result for example, I think that's what I was thinking of/trying to recall.
@AndyHayden: I think this is it -- one question though: if there are multiple tests, how do I groupby Test? Would it be: df[((df.groupby("Test").Value.diff().fillna(1) > 0).cumprod()) == 1].tail(1)

Use np.diff: it naturally shortens the array by one, and np.flatnonzero then identifies the ordinal positions just before each drop.

df.iloc[[np.flatnonzero(np.diff(df.Value) < 0)[0]]]

  Test  Parameter     Value
3   X1          3  0.800632

Note:
We can speed this up by accessing the underlying numpy array

df.iloc[[np.flatnonzero(np.diff(df.Value.values) < 0)[0]]]

Explanation

Get differences

np.diff(df.Value)

array([ 0.754618,  0.003467,  0.009262, -0.012167, -0.002847,  0.036802])

Find where differences are negative

np.flatnonzero(np.diff(df.Value) < 0)

array([3, 4])

I want the first one

np.flatnonzero(np.diff(df.Value) < 0)[0]

3

Use double brackets in an iloc

df.iloc[[3]]

  Test  Parameter     Value
3   X1          3  0.800632

The Group By Looks Like

f = lambda d: d.iloc[[np.flatnonzero(np.diff(d.Value.values) < 0)[0]]]
df.groupby('Test').apply(f)

       Test  Parameter     Value
Test                            
X1   3   X1          3  0.800632
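One caveat (my own note, not from the answer): if a group never decreases, np.flatnonzero returns an empty array and the [0] lookup raises IndexError. A hedged sketch that falls back to the group's last row in that case:

```python
import numpy as np
import pandas as pd

# Hypothetical data: X2 never decreases, so it has no drop at all.
df = pd.DataFrame({
    "Test": ["X1", "X1", "X2", "X2"],
    "Parameter": [0, 1, 0, 1],
    "Value": [0.5, 0.3, 0.2, 0.9],
})

def last_before_drop(d):
    drops = np.flatnonzero(np.diff(d.Value.values) < 0)
    # fall back to the final row when there is no drop
    pos = drops[0] if len(drops) else len(d) - 1
    return d.iloc[[pos]]

out = df.groupby('Test', group_keys=False).apply(last_before_drop)
print(out)
```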

2 Comments

Awesome, super sir. You could also add the groupby method; OP wants that for multiple test cases.
I mean df.groupby('Test', as_index=False).apply(lambda x: x.iloc[np.flatnonzero(np.diff(x.Value) < 0)[0]]). You might have better.

Use diff + tail:

df    
  Test  Parameter     Value
0   X1          0  0.033285
1   X1          1  0.787903
2   X1          2  0.791370
3   X1          3  0.800632
4   X1          4  0.788465
5   X1          5  0.785618

df[df.Value.diff().gt(0)].tail(1)    
  Test  Parameter     Value
3   X1          3  0.800632

This retrieves the last row whose value increased from the previous one (the last local maximum). If you want the first local maximum, refer to Andy Hayden's solution involving cumprod.


If you're doing this in a groupby operation, it'd be something like (borrowing from Andy):

df.groupby('Test', group_keys=False)\
      .apply(lambda x: x[((x.Value.diff().fillna(1) > 0).cumprod()) == 1].tail(1))

5 Comments

No no no not the one. I used the same thing with shift but it has to be every increasing value
@Bharathshetty yeah, I think tail should handle that. Anyway, I'll wait for OP.
There had to be two rows 3 and 22
@Bharathshetty Essentially, I'm trying to get the "last" number before the "first" decrease value. So just one.
@Bharathshetty and cᴏʟᴅsᴘᴇᴇᴅ I think OP wants the first local maxima

We can also use argrelextrema from scipy.signal (from finding local maxima):

import numpy as np
from scipy.signal import argrelextrema

maxInd = argrelextrema(df['Value'].values, np.greater)
df.iloc[maxInd[0][:1]]

  Test  Parameter     Value
3   X1          3  0.800632

A groupby solution, if you have a dataframe like:

  Test  Parameter     Value
0   X1          0  0.033285
1   X1          1  0.787903
2   X1          2  0.791370
3   X1          3  0.800632
4   X1          4  0.788465
5   X2          5  0.785618
6   X2         22  0.822420
7   X2          5  0.785618

def get_maxima(x):
    return x.iloc[argrelextrema(x['Value'].values, np.greater)[0][:1]]

df.groupby('Test').apply(get_maxima)

Output :

    Test  Parameter     Value
0 3   X1          3  0.800632
1 6   X2         22  0.822420

3 Comments

I think this needs to be df.iloc[maxInd[0][:1]], but very neat!
Also added a groupby approach for multiple maxima. Thank you, sir.

I think max can do it ...

df.sort_values('Value', ascending=False).drop_duplicates(['Test'])
Out[226]: 
  Test  Parameter     Value
3   X1          3  0.800632

Or

df[df['Value'] == df.groupby(['Test'])['Value'].transform(max)]
Out[227]: 
  Test  Parameter     Value
3   X1          3  0.800632

Seems this is what you need... anyway, using an ugly way to correct my old post:

import numpy as np

df1 = df.iloc[np.flatnonzero(df.Value.diff().fillna(1) > 0)].reset_index()
df1.groupby(df1['index'].diff().ne(1).cumsum()).last().iloc[0]
Out[289]: 
index               3
Test               X1
Parameter           3
Value        0.800632
Name: 1, dtype: object

For groupby

l = []
for _, dfs in df.groupby('Test'):
    df1 = dfs.iloc[np.flatnonzero(dfs.Value.diff().fillna(1) > 0)].reset_index()
    l.append(df1.groupby(df1['index'].diff().ne(1).cumsum()).last().iloc[0].to_frame().T)


pd.concat(l,axis=0)
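The loop above can also be folded into a single groupby-apply, reusing the cumprod idea from the accepted answer. A sketch on invented data (the helper name first_peak is my own):

```python
import pandas as pd

# Hypothetical data (values invented for illustration).
df = pd.DataFrame({
    "Test": ["X1", "X1", "X1", "X2", "X2", "X2"],
    "Parameter": [0, 1, 2, 0, 1, 2],
    "Value": [0.1, 0.5, 0.3, 0.2, 0.8, 0.9],
})

def first_peak(g):
    # keep the leading run of increases, then take its last row
    rising = g.Value.diff().fillna(1).gt(0).astype(int)
    return g[rising.cumprod().eq(1)].tail(1)

out = df.groupby("Test", group_keys=False).apply(first_peak)
print(out)
```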

6 Comments

Nice one. +1 Max should find the global maxima.
I agree. I meant to say that max is a very sensible solution!
I think 22 has the same parameter test, so don't see how this can work :/ Maybe I misunderstand OPs question !
df.iloc[df.groupby('Test')['Value'].idxmax()]?
@AndyHayden I think you are right , But All I know is about cumprod and scipy(You and Bh already posted), so I made a very ugly way to achieve this ...