I have a CSV file with 25000 rows. I want to write the average of every 30 rows to another CSV file.

Here is an example with 9 rows, averaged in groups of 3, so the new CSV file has 3 rows (3, 1, 2):

|   H    |
 ========
|   1    |---\
|   3    |   |--->| 3 |
|   5    |---/
|  -1    |---\
|   3    |   |--->| 1 |
|   1    |---/
|   0    |---\
|   5    |   |--->| 2 |
|   1    |---/

What I did:

import numpy as np
import pandas as pd

m_path = "file.csv"

m_df = pd.read_csv(m_path, usecols=['Col-01'])
m_arr = np.array([])
temp = m_df.to_numpy()
step = 30
# Start at row 0 and stop at the end of the data instead of a hard-coded 25000
for i in range(0, len(temp), step):
    # np.append returns a new array, so assign the result back to m_arr
    m_arr = np.append(m_arr, np.average(temp[i:i + step]))

m_df = pd.DataFrame({'Column1': m_arr})
m_df.to_csv('AVG.csv')

This works, but is there another way to do this?

  • What do you mean when you say "better", in objective terms? Commented Apr 1, 2020 at 19:47
  • That doesn't answer my question. Opinion-based questions like "is there a better solution" are off-topic on Stack Overflow. "Better" is subjective. Commented Apr 1, 2020 at 21:05
  • "summarized" is not clear or objective any more than "better". If there's a specific problem you have with the code or a specific target, elaborate on that. Otherwise you should ask on Code Review, which allows for more opinion-based reviews of code. Commented Apr 1, 2020 at 21:29

3 Answers

You can integer-divide the index by step to label consecutive groups of rows, then pass those labels to groupby and aggregate with mean:

step = 30
m_df = pd.read_csv(m_path, usecols=['Col-01']) 
df = m_df.groupby(m_df.index // step).mean()

Or, if the index is not the default RangeIndex, group by row position instead:

df = m_df.groupby(np.arange(len(m_df)) // step).mean()

Sample data:

step = 3
df = m_df.groupby(m_df.index // step).mean()
print (df)
   H
0  3
1  1
2  2
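
As a self-contained sketch of the positional variant, the snippet below wraps it in a helper and runs it on the 9-row sample from the question. The function name block_means and the letter index are made up for illustration; the letter index is only there to show that grouping by np.arange does not depend on the index labels. If the row count is not a multiple of step, the last group is simply the mean of the leftover rows.

```python
import numpy as np
import pandas as pd


def block_means(df: pd.DataFrame, step: int) -> pd.DataFrame:
    """Average every `step` consecutive rows by position (hypothetical helper)."""
    # Row positions 0..step-1 get label 0, the next `step` rows get label 1, etc.
    groups = np.arange(len(df)) // step
    return df.groupby(groups).mean()


# Sample data from the question, with a non-numeric index on purpose
sample = pd.DataFrame({"H": [1, 3, 5, -1, 3, 1, 0, 5, 1]}, index=list("abcdefghi"))
result = block_means(sample, step=3)
print(result["H"].tolist())  # [3.0, 1.0, 2.0]
```

To write the result out, result.to_csv('AVG.csv', index=False) should do; index=False leaves out the group labels.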

You can compute a rolling mean with DataFrame.rolling and then keep only the last row of each window by slicing:

df.rolling(3).mean()[2::3].reset_index(drop=True)
     a
0  3.0
1  1.0
2  2.0
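
One thing to keep in mind with the rolling approach: it only produces a value for complete windows, so leftover rows at the end are dropped. The sketch below uses a made-up 10-row column (the 9 sample values plus an extra 7) to compare it with grouping on the index; .iloc is used so the slice is explicitly positional.

```python
import pandas as pd

# 9 sample values plus one extra row, so the last block is incomplete
df = pd.DataFrame({"a": [1, 3, 5, -1, 3, 1, 0, 5, 1, 7]})
step = 3

# Rolling mean, then keep the row that closes each full window (positions 2, 5, 8)
rolled = df["a"].rolling(step).mean().iloc[step - 1::step].reset_index(drop=True)

# Grouping by index // step keeps the trailing partial block as its own group
grouped = df["a"].groupby(df.index // step).mean()

print(len(rolled), len(grouped))  # 3 4
```

With 25000 rows and a step of 30, the rolling version would return 833 averages and skip the last 10 rows, while the grouped version would return 834.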

It might be simpler to do it all in numpy.

import numpy as np
x = np.array([1, 3, 5, -1, 3, 1, 0, 5, 1 ])
steps = 3
for i in range(0, len(x), steps):
    avg = np.average(x[i:i+steps])
    print (f'average starting at el {i} is {avg}')

This prints:

average starting at el 0 is 3.0
average starting at el 3 is 1.0
average starting at el 6 is 2.0
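
If you want to avoid the Python loop, one possible variation is to reshape the array into rows of length steps and take the mean along each row. This is only a sketch: block_means is a hypothetical helper name, and the leftover rows (when the length is not a multiple of the step) are averaged separately, which is an assumption about what you want for the last block.

```python
import numpy as np


def block_means(x, step):
    """Mean of every `step` consecutive values (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n_full = (len(x) // step) * step  # number of values that fill whole blocks
    # One row per full block, then average across each row
    means = x[:n_full].reshape(-1, step).mean(axis=1)
    if n_full < len(x):
        # Assumption: average the leftover values as a final, shorter block
        means = np.append(means, x[n_full:].mean())
    return means


print(block_means([1, 3, 5, -1, 3, 1, 0, 5, 1], 3))  # [3. 1. 2.]
```

For the 25000-row file with a step of 30, this would give 833 full-block averages plus one average over the last 10 rows.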
