Group by consecutive index numbers

Question

I was wondering if there is a way to group by consecutive index numbers and move the groups into different columns. Here is an example of the DataFrame I'm using:

                 0
0     19218.965703
1     19247.621650
2     19232.651322
9     19279.216956
10    19330.087371
11    19304.316973

My idea is to group by sequential index numbers and get something like this:

                 0             1
0     19218.965703  19279.216956    
1     19247.621650  19330.087371
2     19232.651322  19304.316973

I've been trying to split my data into blocks of 3 and then group by, but I was looking more for something that can group and rearrange sequential index numbers. Thank you!

This is a good way to transpose; however, I would like to avoid setting the boundaries (-1, 3) in case I have longer runs of consecutive index numbers to group. The @anky_91 reply is the answer to my question. Thank you!
– Gius, Commented Sep 1, 2019 at 19:44

anky · Accepted Answer · 2019-08-29 15:50:20Z

21

Here is one way:

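import pandas as pd
from more_itertools import consecutive_groups

# one column per run of consecutive index labels, each re-indexed from 0
final = pd.concat([df.loc[list(g)].reset_index(drop=True)
                   for g in consecutive_groups(df.index)], axis=1)
final.columns = range(len(final.columns))
print(final)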

              0             1
0  19218.965703  19279.216956
1  19247.621650  19330.087371
2  19232.651322  19304.316973
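
If you would rather not add the more_itertools dependency, the same run detection can be sketched with the standard library's itertools.groupby: within a run of consecutive integers, index label minus position stays constant, so that difference works as a grouping key. This is a minimal sketch, not the answer's code; the helper name `wide_by_runs` is hypothetical, and it assumes an integer index and that the values live in the first column.

```python
from itertools import groupby

import pandas as pd


def wide_by_runs(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one output column per run of consecutive index labels."""
    positions = enumerate(df.index)  # (position, label) pairs
    # Inside a run of consecutive integers, label - position is constant.
    runs = groupby(positions, key=lambda pair: pair[1] - pair[0])
    pieces = [
        # Take the first column's values for this run and renumber rows from 0.
        df.iloc[[pos for pos, _ in group], 0].reset_index(drop=True)
        for _, group in runs
    ]
    # ignore_index=True labels the new columns 0..n-1.
    return pd.concat(pieces, axis=1, ignore_index=True)


df = pd.DataFrame(
    {0: [19218.965703, 19247.621650, 19232.651322,
         19279.216956, 19330.087371, 19304.316973]},
    index=[0, 1, 2, 9, 10, 11],
)
print(wide_by_runs(df))
```

Because each group is consumed inside the list comprehension before groupby advances, the lazily evaluated groups are safe to use here.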

answered Aug 29, 2019 at 15:50

anky

1 Comment

Gius Over a year ago

I like the more_itertools solution! Thank you. With 3 answers you guys covered all the possible and elegant solutions!!

user3483203 · 2019-08-29 15:52:12Z

10

This is a groupby + pivot_table problem:

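# run number per row: increments whenever the index jumps by more than 1
m = df.index.to_series().diff().ne(1).cumsum()

# position within each run becomes the row key; the run number becomes the column
(df.assign(key=df.groupby(m).cumcount())
    .pivot_table(index='key', columns=m, values=0))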

                1             2
key
0    19218.965703  19279.216956
1    19247.621650  19330.087371
2    19232.651322  19304.316973
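
To see why m labels the runs, it helps to print each intermediate step for the question's index. diff() is 1 inside a run and something else at the first row (NaN) and after each gap, so ne(1) flags the start of every run and cumsum() turns those flags into run numbers starting at 1, which is why the output columns above are 1 and 2. A small sketch on the index alone:

```python
import pandas as pd

labels = pd.Index([0, 1, 2, 9, 10, 11]).to_series()

gaps = labels.diff()   # NaN, 1, 1, 7, 1, 1
starts = gaps.ne(1)    # True at the first row and right after every gap
m = starts.cumsum()    # run number per row: 1, 1, 1, 2, 2, 2

print(pd.DataFrame({"diff": gaps, "ne(1)": starts, "run": m}))
```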

answered Aug 29, 2019 at 15:52

user3483203

piRSquared · 2019-08-29 16:06:43Z

10

Create a new `pandas.Series` with a new `pandas.MultiIndex`

import numpy as np
import pandas as pd

# a: run id per row (0, 1, ...); b: position within the run
a = pd.factorize(df.index - np.arange(len(df)))[0]
b = df.groupby(a).cumcount()

pd.Series(df['0'].to_numpy(), [b, a]).unstack()

              0             1
0  19218.965703  19279.216956
1  19247.621650  19330.087371
2  19232.651322  19304.316973

Similar, but with more NumPy:

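a = pd.factorize(df.index - np.arange(len(df)))[0]
b = df.groupby(a).cumcount()

# rows = longest run, columns = number of runs; NaN where a run is shorter
c = np.empty((b.max() + 1, a.max() + 1), float)
c.fill(np.nan)
c[b, a] = np.ravel(df)  # assumes df has a single value column
pd.DataFrame(c)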

              0             1
0  19218.965703  19279.216956
1  19247.621650  19330.087371
2  19232.651322  19304.316973

edited Aug 29, 2019 at 16:06

answered Aug 29, 2019 at 15:57

piRSquared

1 Comment

shadowtalker Over a year ago

This solution is very clever! The offset relative to np.arange should increase after every "gap" in the sequence, so the offset value should uniquely identify runs of consecutive values. Then pd.factorize will create unique indicators for those unique offsets.
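
The comment's reasoning can be checked directly on the question's index: subtracting the position np.arange(len(index)) leaves a constant offset within each run, the offset jumps at every gap, and pd.factorize maps the offsets to 0-based codes in order of first appearance. A minimal sketch, assuming the index is sorted ascending:

```python
import numpy as np
import pandas as pd

index = pd.Index([0, 1, 2, 9, 10, 11])

# Label minus position is constant inside a run of consecutive integers.
offsets = index - np.arange(len(index))   # 0, 0, 0, 6, 6, 6

# factorize assigns codes by first appearance: 0, 0, 0, 1, 1, 1
codes, uniques = pd.factorize(offsets)

print(list(offsets), codes.tolist(), list(uniques))
```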

BENY · 2019-08-29 15:52:07Z

7

One way, using pandas groupby:

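# run number per row: increments at every gap in the index
s = df.index.to_series().diff().ne(1).cumsum()
# dict keys (the run numbers) become the column labels
pd.concat({x: y.reset_index(drop=True) for x, y in df['0'].groupby(s)}, axis=1)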

Out[786]: 
              1             2
0  19218.965703  19279.216956
1  19247.621650  19330.087371
2  19232.651322  19304.316973

answered Aug 29, 2019 at 15:52

BENY

George Pipis · 2019-09-03 15:38:09Z

I think that you have assumed that the number of observations within each consecutive group will be the same. My approach is:

Prepare the data:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    data={'data': [19218.965703, 19247.621650, 19232.651322,
                   19279.216956, 19330.087371, 19304.316973]},
    index=[0, 1, 2, 9, 10, 11],
)

And the solution:

df['Group'] = (df.index.to_series()-np.arange(df.shape[0])).rank(method='dense')
df.reset_index(inplace=True)
df['Observations'] = df.groupby(['Group'])['index'].rank()
df.pivot(index='Observations',columns='Group', values='data')

Which returns:

Group                  1.0           2.0
Observations                            
1.0           19218.965703  19279.216956
2.0           19247.621650  19330.087371
3.0           19232.651322  19304.316973
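
To illustrate the point about unequal group sizes, here is a sketch with runs of length 3 and 2; the data values are made up for illustration. Combining the diff().ne(1).cumsum() run key used in the earlier answers with a within-run counter and pivot still works, and the shorter run is padded with NaN.

```python
import pandas as pd

# Hypothetical data: the second run (index 9, 10) is shorter than the first.
df = pd.DataFrame({"data": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=[0, 1, 2, 9, 10])

run = df.index.to_series().diff().ne(1).cumsum()   # 1, 1, 1, 2, 2
row = df.groupby(run).cumcount()                   # 0, 1, 2, 0, 1

# Rows = position within the run, columns = run number; missing cells become NaN.
wide = df.assign(run=run, row=row).pivot(index="row", columns="run", values="data")
print(wide)
```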

Billy Bonaros · 2019-09-03 15:02:42Z

1

My way:

df['groups']=list(df.reset_index()['index']-range(0,len(df)))
pd.concat([df[df['groups']==i][['0']].reset_index(drop=True) for i in df['groups'].unique()],axis=1)

              0             0
0  19218.965703  19279.216956
1  19247.621650  19330.087371
2  19232.651322  19304.316973
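
The result above has two columns both labelled 0, which makes later selection by label ambiguous, and the approach also adds a 'groups' column to df. A small variant, sketched below under the assumption that the value column is named "0" as in this answer, keeps the group key in a separate Series and passes ignore_index=True to pd.concat so the new columns are numbered 0..n-1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"0": [19218.965703, 19247.621650, 19232.651322,
           19279.216956, 19330.087371, 19304.316973]},
    index=[0, 1, 2, 9, 10, 11],
)

# Same key as above (index minus position), kept out of df itself.
groups = pd.Series(df.index - np.arange(len(df)), index=df.index)

out = pd.concat(
    [df.loc[groups == g, "0"].reset_index(drop=True) for g in groups.unique()],
    axis=1,
    ignore_index=True,  # relabel columns 0..n-1 instead of repeating "0"
)
print(out)
```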

answered Sep 3, 2019 at 15:02

Billy Bonaros
