Jacob Perkins: Python Async Gather in Batches

Python's `asyncio.gather` function is great for I/O-bound parallel processing. There's a simple utility function I like to use that I call `gather_in_batches`:

```python
import asyncio

async def gather_in_batches(tasks, batch_size=100, return_exceptions=False):
    # Run the awaitables in fixed-size batches to limit concurrency
    for i in range(0, len(tasks), batch_size):
        batch = tasks[i:i + batch_size]
        for result in await asyncio.gather(*batch, return_exceptions=return_exceptions):
            yield result
```

The way you use it is:

  1. Generate a list of tasks
  2. Gather your results

Here’s some simple sample code to demonstrate:

```python
tasks = [process_async(obj) for obj in objects]
return [result async for result in gather_in_batches(tasks)]
```

`objects` could be all sorts of things:

  • records from a database
  • urls to scrape
  • filenames to read

And `process_async` is an `async` function that does whatever processing you need on that object. Assuming it is mostly I/O-bound, this is a very simple and effective way to process data in parallel, without getting into threads, multi-processing, greenlets, or any other method.
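To make the pattern concrete, here's a minimal end-to-end sketch. The `process_async` body is a hypothetical stand-in (a short sleep for the I/O wait, then doubling the input), and the helper is repeated so the snippet runs on its own:

```python
import asyncio

async def gather_in_batches(tasks, batch_size=100, return_exceptions=False):
    # Run the awaitables in fixed-size batches to limit concurrency
    for i in range(0, len(tasks), batch_size):
        batch = tasks[i:i + batch_size]
        for result in await asyncio.gather(*batch, return_exceptions=return_exceptions):
            yield result

async def process_async(obj):
    # Hypothetical I/O-bound work: the sleep stands in for a network call
    await asyncio.sleep(0.01)
    return obj * 2

async def main():
    objects = range(10)
    tasks = [process_async(obj) for obj in objects]
    return [result async for result in gather_in_batches(tasks, batch_size=4)]

print(asyncio.run(main()))  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Because `asyncio.gather` preserves the order of its arguments and the batches are processed sequentially, the results come back in the same order as `objects`.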

You'll need to experiment to figure out what the optimal `batch_size` is for your use case. And unless you don't care about errors, you should set `return_exceptions=True`, then check `if isinstance(result, Exception)` to do proper error handling.

https://streamhacker.com/2025/09/15/python-async-gather-in-batches/