Calculate average of every n rows from a csv file

Question

I have a csv file that has 25000 rows. I want to put the average of every 30 rows in another csv file.

Here is an example with 9 rows; the new csv file should have 3 rows (3, 1, 2):

|   H    |
 ========
|   1    |---\
|   3    |   |--->| 3 |
|   5    |---/
|  -1    |---\
|   3    |   |--->| 1 |
|   1    |---/
|   0    |---\
|   5    |   |--->| 2 |
|   1    |---/
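Stated as a formula (a restatement for clarity, not part of the original post): with column values $x_0, \dots, x_{N-1}$ and block size $n$, output row $k$ is the mean of block $B_k$:

$$
\bar{x}_k = \frac{1}{|B_k|} \sum_{i \in B_k} x_i, \qquad B_k = \{\, kn, \dots, \min(kn + n, N) - 1 \,\}, \qquad k = 0, \dots, \lceil N/n \rceil - 1
$$

For the example, $N = 9$ and $n = 3$ give $\bar{x}_0 = (1 + 3 + 5)/3 = 3$, $\bar{x}_1 = (-1 + 3 + 1)/3 = 1$ and $\bar{x}_2 = (0 + 5 + 1)/3 = 2$. For the real file, $25000 = 833 \cdot 30 + 10$, so if the leftover rows are kept as a short final block there are 834 output rows, and the last one averages only 10 rows.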

What I did:

import numpy as np
import pandas as pd

m_path = "file.csv"

m_df = pd.read_csv(m_path, usecols=['Col-01'])
temp = m_df.to_numpy()
step = 30
m_arr = np.array([])
# Walk the rows in blocks of `step`, starting at row 0
for i in range(0, len(temp), step):
    # np.append returns a new array, so the result must be assigned back to m_arr
    m_arr = np.append(m_arr, np.average(temp[i:i + step]))

m_df = pd.DataFrame({'Column1': m_arr})
m_df.to_csv('AVG.csv')

This works well, but is there any other way to do this?
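For comparison, the block-by-block loop can also be delegated to the CSV reader: `pd.read_csv` accepts a `chunksize` argument and then yields DataFrames of at most that many rows. The sketch below assumes an in-memory stand-in for `file.csv` holding the 9-row example column `H`; in the real case you would pass `m_path`, `usecols=['Col-01']` and `chunksize=30`.

```python
import io

import pandas as pd

# Hypothetical stand-in for file.csv: the 9-row example column 'H'
csv_text = "H\n1\n3\n5\n-1\n3\n1\n0\n5\n1\n"
step = 3  # 30 for the real file

# With chunksize, read_csv returns an iterator of DataFrames with at most `step` rows each
reader = pd.read_csv(io.StringIO(csv_text), usecols=['H'], chunksize=step)
averages = [chunk['H'].mean() for chunk in reader]

out = pd.DataFrame({'Column1': averages})
print(out)
# out.to_csv('AVG.csv') would write the result as in the question
```

The last chunk is simply shorter when the row count is not a multiple of `step`, so with 25000 rows the final average covers the remaining 10 rows.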

That doesn't answer my question. Opinion-based questions like "is there a better solution" are off-topic on Stack Overflow. "Better" is subjective.
– TylerH, commented Apr 1, 2020 at 21:05
"Summarized" is no more clear or objective than "better". If there's a specific problem you have with the code or a specific target, elaborate on that. Otherwise you should ask on Code Review, which allows for more opinion-based reviews of code.
– TylerH, commented Apr 1, 2020 at 21:29

jezrael · Accepted Answer · 2020-03-25 14:22:43Z

9

You can use integer division of the index by step to label consecutive groups, then pass those labels to groupby and aggregate with mean:

step = 30
m_df = pd.read_csv(m_path, usecols=['Col-01']) 
df = m_df.groupby(m_df.index // step).mean()

Or:

df = m_df.groupby(np.arange(len(m_df)) // step).mean()

Sample data:

m_df = pd.DataFrame({'H': [1, 3, 5, -1, 3, 1, 0, 5, 1]})
step = 3
df = m_df.groupby(m_df.index // step).mean()
print(df)
     H
0  3.0
1  1.0
2  2.0
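The two variants differ when the DataFrame does not have the default 0, 1, 2, ... index. `m_df.index // step` groups by index values, while `np.arange(len(m_df)) // step` groups by row position. The sketch below uses a hypothetical shifted index (not from the original answer) to show the difference.

```python
import numpy as np
import pandas as pd

# Example column from the question, but with a hypothetical non-default index
# (e.g. after filtering rows or reading the file with index_col)
m_df = pd.DataFrame({'H': [1, 3, 5, -1, 3, 1, 0, 5, 1]},
                    index=[10, 11, 12, 13, 14, 15, 16, 17, 18])
step = 3

# Labels based on row position: always consecutive blocks of `step` rows
labels = np.arange(len(m_df)) // step
print(labels)  # [0 0 0 1 1 1 2 2 2]
by_position = m_df.groupby(labels).mean()

# Labels based on index values: 10 // 3 == 3, 12 // 3 == 4, ... so blocks are misaligned
by_index = m_df.groupby(m_df.index // step).mean()

print(by_position)
print(by_index)
```

Here `by_position` gives 3.0, 1.0, 2.0 as expected, while `by_index` ends up with four uneven groups, so the positional version is the safer default.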

edited Mar 25, 2020 at 14:22

answered Mar 25, 2020 at 14:12


Dishin Goyani · Answer · 2020-03-25 14:18:21Z

2

You can compute a rolling mean with DataFrame.rolling and then keep every third row (the rows where each block of 3 ends) by slicing:

df.rolling(3).mean()[2::3].reset_index(drop=True)
     a
0  3.0
1  1.0
2  2.0
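A quick comparison on a hypothetical 10-row column (the nine example values plus an extra 4, not from the original post) shows where this differs from the groupby answer: the slice only keeps rows where a full window ends, so a trailing partial block is dropped, while groupby still averages it. With 25000 rows and a window of 30, the last 10 rows would be ignored.

```python
import pandas as pd

# Hypothetical 10-row column: the 9 example values plus one extra row (4)
df = pd.DataFrame({'a': [1, 3, 5, -1, 3, 1, 0, 5, 1, 4]})
step = 3

# Rolling mean over `step` rows, keeping only rows where a complete block ends
rolled = df.rolling(step).mean()[step - 1::step].reset_index(drop=True)

# Group by consecutive block number (the integer-division approach)
grouped = df.groupby(df.index // step).mean()

print(rolled['a'].tolist())   # [3.0, 1.0, 2.0]
print(grouped['a'].tolist())  # [3.0, 1.0, 2.0, 4.0]
```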

answered Mar 25, 2020 at 14:18


rajah9 · Answer · 2020-03-25 15:13:12Z

1

It might be simpler to do it all in numpy.

import numpy as np
x = np.array([1, 3, 5, -1, 3, 1, 0, 5, 1 ])
steps = 3
for i in range(0, len(x), steps):
    avg = np.average(x[i:i+steps])
    print (f'average starting at el {i} is {avg}')

This prints:

average starting at el 0 is 3.0
average starting at el 3 is 1.0
average starting at el 6 is 2.0
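If the loop becomes a bottleneck, the same averages can be computed without a Python loop by reshaping into rows of `steps` values and taking `mean(axis=1)`. This reshape approach is a swapped-in technique, not part of the original answer, and `block_means` is a hypothetical helper name. Because 25000 is not a multiple of 30, the sketch handles the leftover rows as a shorter final block.

```python
import numpy as np

def block_means(x, steps):
    """Mean of each consecutive block of `steps` values; the last block may be shorter."""
    x = np.asarray(x, dtype=float)
    full = len(x) // steps * steps  # number of values that fill complete blocks
    means = x[:full].reshape(-1, steps).mean(axis=1)
    if full < len(x):  # leftover values form a shorter final block
        means = np.append(means, x[full:].mean())
    return means

x = np.array([1, 3, 5, -1, 3, 1, 0, 5, 1])
print(block_means(x, 3))  # [3. 1. 2.]

# With 25000 rows and steps=30: 833 full blocks plus one block of 10 rows
print(block_means(np.arange(25000), 30).shape)  # (834,)
```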

answered Mar 25, 2020 at 15:13
