Jacob Perkins: Python Async Gather in Batches
Python’s `asyncio.gather` function is great for I/O-bound parallel processing. There’s a simple utility function I like to use that I call `gather_in_batches`:
```python
import asyncio

async def gather_in_batches(tasks, batch_size=100, return_exceptions=False):
    for i in range(0, len(tasks), batch_size):
        batch = tasks[i:i+batch_size]
        for result in await asyncio.gather(*batch, return_exceptions=return_exceptions):
            yield result
```
The way you use it is:
- Generate a list of tasks
- Gather your results
Here’s some simple sample code to demonstrate:
```python
tasks = [process_async(obj) for obj in objects]
return [result async for result in gather_in_batches(tasks)]
```
`objects` could be all sorts of things:
- records from a database
- urls to scrape
- filenames to read
And `process_async` is an `async` function that does whatever processing you need on each object. Assuming the work is mostly I/O bound, this is a very simple and effective way to process data in parallel, without getting into threads, multi-processing, greenlets, or any other method.
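Putting the pieces together, here’s a complete runnable sketch; the sleep-based `process_async` below is just a hypothetical stand-in for real I/O work like a database query, an HTTP request, or a file read:

```python
import asyncio


async def gather_in_batches(tasks, batch_size=100, return_exceptions=False):
    for i in range(0, len(tasks), batch_size):
        batch = tasks[i:i+batch_size]
        for result in await asyncio.gather(*batch, return_exceptions=return_exceptions):
            yield result


async def process_async(obj):
    # stand-in for real I/O-bound work
    await asyncio.sleep(0.01)
    return obj * 2


async def main():
    objects = list(range(250))
    # create one coroutine per object; they only run once gathered
    tasks = [process_async(obj) for obj in objects]
    # await at most batch_size of them concurrently
    return [result async for result in gather_in_batches(tasks, batch_size=100)]


results = asyncio.run(main())
print(results[:5])  # [0, 2, 4, 6, 8]
```

Because `gather_in_batches` is an async generator, the results are consumed with `async for` (here via an async list comprehension), and results arrive in the same order as the input tasks.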
You’ll need to experiment to figure out what the optimal `batch_size` is for your use case. And unless you don’t care about errors, you should set `return_exceptions=True`, then check `if isinstance(result, Exception)` to do proper error handling.
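With `return_exceptions=True`, failed tasks show up in the results as exception instances instead of aborting the whole batch. A minimal sketch of that filtering, using a hypothetical `flaky` task that fails on multiples of 5:

```python
import asyncio


async def gather_in_batches(tasks, batch_size=100, return_exceptions=False):
    for i in range(0, len(tasks), batch_size):
        batch = tasks[i:i+batch_size]
        for result in await asyncio.gather(*batch, return_exceptions=return_exceptions):
            yield result


async def flaky(n):
    # hypothetical task: fails on multiples of 5
    await asyncio.sleep(0)
    if n % 5 == 0:
        raise ValueError(f"bad input: {n}")
    return n


async def main():
    tasks = [flaky(n) for n in range(20)]
    successes, failures = [], []
    async for result in gather_in_batches(tasks, batch_size=8, return_exceptions=True):
        if isinstance(result, Exception):
            failures.append(result)  # log or retry in real code
        else:
            successes.append(result)
    return successes, failures


successes, failures = asyncio.run(main())
print(len(successes), len(failures))  # 16 4
```

Note that without `return_exceptions=True`, the first exception raised in a batch would propagate out of `asyncio.gather` and stop the generator.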
https://streamhacker.com/2025/09/15/python-async-gather-in-batches/