I have a 10k-row CSV that I want to write to S3 in chunks of 1k rows.

from io import StringIO

import boto3
import pandas as pd

# df (a DataFrame) and bucket (a bucket name) are defined elsewhere.
csv_buffer = StringIO()
df.to_csv(csv_buffer, chunksize=1000)
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())

This gives me the first 1k rows in a string buffer to write to S3, but it doesn't seem like csv_buffer is an iterator that I can loop over.

Does anyone know how to achieve this?

  • Did the answer help you find the solution? I have a similar problem and am curious how you resolved it in the end. Commented Jun 26, 2019 at 12:48

1 Answer

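It looks like StringIO isn't really heeding the chunksize: to_csv writes every row into the buffer, and chunksize only controls how many rows are written at a time. (Iterating over the buffer, or calling .readline(), always returns one line, never a chunk of lines.)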

I'm not too familiar with boto3, but itertools.islice may work for you here: it slices an iterable lazily, without creating an intermediate data structure.
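To illustrate why islice fits, here is a minimal sketch using only the standard library. The six-line buffer is made-up sample data, not the asker's CSV. The point is that a text buffer is consumed as it is sliced, so each new islice call picks up where the previous one stopped.

```python
from io import StringIO
from itertools import islice

# A small in-memory "file": made-up sample data with six lines.
buffer = StringIO("h\na\nb\nc\nd\ne\n")

first = list(islice(buffer, 0, 2))     # consumes the first two lines
second = list(islice(buffer, 0, 2))    # continues where the first slice stopped
rest = list(islice(buffer, 0, None))   # None means "until the buffer is exhausted"

print(first, second, rest)
```

That is why every slice in the code further down starts at 0 instead of at an increasing offset.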

If this looks like it may suit your needs, I can add some explanation alongside the code:

>>> from io import StringIO
... from itertools import islice
... import sys
... 
... import numpy as np
... import pandas as pd
... 
... df = pd.DataFrame(np.arange(300).reshape(100, -1))
... csv_buffer = StringIO()
... df.to_csv(csv_buffer)
... csv_buffer.seek(0)
... 
... # Account for indivisibility (scoop up a remainder on the final slice).
... chunksize = 33
... n_chunks = df.shape[0] // chunksize  # count rows (shape[0]), not columns (shape[1])
... slices = [(0, chunksize)] * (n_chunks - 1) + [(0, sys.maxsize)]
... chunks = (tuple(islice(csv_buffer, i, j)) for i, j in slices)
... 

>>> next(chunks)
(',0,1,2\n',
 '0,0,1,2\n',
 '1,3,4,5\n',
 '2,6,7,8\n',
 '3,9,10,11\n',
 '4,12,13,14\n',
 '5,15,16,17\n',
 '6,18,19,20\n',
 '7,21,22,23\n',
 '8,24,25,26\n',
 '9,27,28,29\n',
 '10,30,31,32\n',
 '11,33,34,35\n',
 '12,36,37,38\n',
 '13,39,40,41\n',
 '14,42,43,44\n',
 '15,45,46,47\n',
 '16,48,49,50\n',
 '17,51,52,53\n',
 '18,54,55,56\n',
 '19,57,58,59\n',
 '20,60,61,62\n',
 '21,63,64,65\n',
 '22,66,67,68\n',
 '23,69,70,71\n',
 '24,72,73,74\n',
 '25,75,76,77\n',
 '26,78,79,80\n',
 '27,81,82,83\n',
 '28,84,85,86\n',
 '29,87,88,89\n',
 '30,90,91,92\n',
 '31,93,94,95\n')
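As for getting the chunks to S3, the same idea can be wrapped in a generator. What follows is only a sketch under some assumptions: each chunk goes to its own object, every chunk repeats the header line, and no CSV field contains an embedded newline (the splitting is line-based). The names iter_csv_chunks, upload_chunks, put and the df_partNNNN.csv key scheme are all hypothetical, and a plain dict stands in for S3 so it runs without credentials.

```python
from io import StringIO
from itertools import islice


def iter_csv_chunks(buffer, chunksize):
    """Yield CSV strings of at most `chunksize` data rows, each with the header."""
    header = next(buffer, None)  # the first line of the CSV is the header
    if header is None:
        return  # empty buffer: nothing to yield
    while True:
        rows = list(islice(buffer, chunksize))
        if not rows:
            break
        yield header + "".join(rows)


def upload_chunks(buffer, chunksize, put):
    """Call `put(key, body)` once per chunk; `put` is a hypothetical callable."""
    keys = []
    for n, body in enumerate(iter_csv_chunks(buffer, chunksize)):
        key = f"df_part{n:04d}.csv"  # hypothetical naming scheme
        put(key, body)
        keys.append(key)
    return keys


# A dict stands in for S3 here; sample data has five rows plus a header.
uploaded = {}
csv_buffer = StringIO("a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n")
keys = upload_chunks(csv_buffer, 2, uploaded.__setitem__)
print(keys)
```

With boto3, put could presumably be something like lambda key, body: s3_resource.Object(bucket, key).put(Body=body), reusing the call from the question. I haven't tested that part against a real bucket.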