Write Pandas DataFrame to String Buffer with Chunking

Question

I have a 10k-row CSV that I want to write to S3 in chunks of 1k rows.

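from io import StringIO

import boto3
import pandas as pd

# df (the 10k-row DataFrame) and bucket (the S3 bucket name) are defined elsewhere.
csv_buffer = StringIO()
df.to_csv(csv_buffer, chunksize=1000)  # write the DataFrame into an in-memory CSV buffer
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())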

This gives me the first 1k rows in a string buffer to write to S3, but the CSV buffer doesn't seem to be an iterator that I can loop over.

Does anyone know how to achieve this?

Did the answer help you find a solution? I have a similar problem and am curious how you resolved it in the end. – Ruthger Righart, commented Jun 26, 2019 at 12:48

Brad Solomon · Accepted Answer · 2018-03-20 01:03:22Z

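It looks like the StringIO buffer isn't really heeding the chunksize: in to_csv, chunksize only controls how many rows pandas writes at a time, and the buffer ends up holding the whole CSV. Iterating over the buffer (or calling .readline()) always gives you one line at a time, never a chunk of lines.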
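To see what the buffer actually contains, here is a minimal check on a small illustrative frame (the 100 x 3 frame is an assumption made for the demo, not the asker's data). With chunksize set, to_csv still writes the header plus every row into the buffer, and iterating over the buffer returns a single line per step.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Illustrative 100 x 3 frame (not the asker's data).
df = pd.DataFrame(np.arange(300).reshape(100, -1))

csv_buffer = StringIO()
# chunksize only sets how many rows pandas writes per internal batch;
# it does not split the output into separate pieces.
df.to_csv(csv_buffer, chunksize=10)

# The buffer holds the header line plus all 100 data rows.
n_lines = len(csv_buffer.getvalue().splitlines())
print(n_lines)  # 101

# Iterating over the buffer yields exactly one line per step.
csv_buffer.seek(0)
first_line = next(csv_buffer)   # the header ",0,1,2" plus a line terminator
second_line = next(csv_buffer)  # the first data row "0,0,1,2" plus a line terminator
```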

I'm not too familiar with boto3, but itertools.islice may work for you here, since it lets you slice an iterable (such as the buffer's lines) without creating an intermediate data structure.

If this looks like it may suit your needs, I can add some explanation alongside the code:

>>> from io import StringIO
... from itertools import islice
... import sys
... 
... import numpy as np
... import pandas as pd
... 
... df = pd.DataFrame(np.arange(300).reshape(100, -1))
... csv_buffer = StringIO()
... df.to_csv(csv_buffer)
... csv_buffer.seek(0)
... 
... # Account for indivisibility (scoop up a remainder on the final slice).
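... chunksize = 33
... # Base the number of chunks on the row count (df.shape[0]), not the column count.
... nchunks = max(len(df) // chunksize, 1)
... slices = [(0, chunksize)] * (nchunks - 1) + [(0, sys.maxsize)]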
... chunks = (tuple(islice(csv_buffer, i, j)) for i, j in slices)
... 

>>> next(chunks)
(',0,1,2\n',
 '0,0,1,2\n',
 '1,3,4,5\n',
 '2,6,7,8\n',
 '3,9,10,11\n',
 '4,12,13,14\n',
 '5,15,16,17\n',
 '6,18,19,20\n',
 '7,21,22,23\n',
 '8,24,25,26\n',
 '9,27,28,29\n',
 '10,30,31,32\n',
 '11,33,34,35\n',
 '12,36,37,38\n',
 '13,39,40,41\n',
 '14,42,43,44\n',
 '15,45,46,47\n',
 '16,48,49,50\n',
 '17,51,52,53\n',
 '18,54,55,56\n',
 '19,57,58,59\n',
 '20,60,61,62\n',
 '21,63,64,65\n',
 '22,66,67,68\n',
 '23,69,70,71\n',
 '24,72,73,74\n',
 '25,75,76,77\n',
 '26,78,79,80\n',
 '27,81,82,83\n',
 '28,84,85,86\n',
 '29,87,88,89\n',
 '30,90,91,92\n',
 '31,93,94,95\n')
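Building on the same islice idea, here is a minimal sketch that turns the buffer into a generator of CSV strings, one per chunk, and repeats the header line at the top of every chunk so each piece parses as a standalone CSV (in the answer's version the header only appears in the first chunk). The names `iter_csv_chunks` and `upload_chunks`, the `part-XXXXX.csv` key scheme, and the `bucket`/`prefix` arguments are hypothetical. The upload step only reuses the `boto3.resource('s3').Object(...).put(Body=...)` call from the question and is an untested sketch.

```python
from io import StringIO
from itertools import islice

import pandas as pd


def iter_csv_chunks(df, chunksize):
    """Yield CSV strings holding at most `chunksize` data rows each.

    Hypothetical helper: writes the whole frame to a StringIO once, then
    uses itertools.islice to pull `chunksize` lines at a time from it.
    """
    buffer = StringIO()
    df.to_csv(buffer)
    buffer.seek(0)
    header = buffer.readline()  # first line holds the column names
    while True:
        rows = list(islice(buffer, chunksize))
        if not rows:  # buffer exhausted
            break
        # Repeat the header so each chunk parses as a standalone CSV.
        yield header + "".join(rows)


def upload_chunks(df, bucket, prefix, chunksize=1000):
    """Hypothetical upload loop: one S3 object per chunk (untested sketch)."""
    import boto3  # imported here so iter_csv_chunks works without boto3

    s3_resource = boto3.resource("s3")
    for i, chunk in enumerate(iter_csv_chunks(df, chunksize)):
        key = f"{prefix}/part-{i:05d}.csv"  # hypothetical key naming scheme
        s3_resource.Object(bucket, key).put(Body=chunk)


# Small demo without S3: 5 rows in chunks of 2 gives chunks of 2, 2 and 1 rows.
demo = pd.DataFrame({"a": range(5), "b": range(5, 10)})
demo_chunks = list(iter_csv_chunks(demo, 2))
print(len(demo_chunks))  # 3
```

With 10k rows and chunksize=1000 this would produce ten pieces of 1,000 rows each. Unlike the answer's version, a remainder becomes a smaller final chunk instead of being folded into the last one. Splitting on lines assumes no quoted field contains an embedded newline; if one might, slicing the DataFrame itself (for example `df.iloc[start:start + chunksize].to_csv()`, which returns a string when no path is given) is a safer alternative.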
