Memory leak when using set_type_codec and Python 3.10 #874

@roman-g

Description

  • asyncpg version: 0.25.0
  • PostgreSQL version: 13, 14
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
    the issue with a local PostgreSQL install?: Reproduced locally
  • Python version: 3.10.1
  • Platform: Linux
  • Do you use pgbouncer?: no
  • Did you install asyncpg with pip?: yes
  • If you built asyncpg locally, which version of Cython did you use?:
  • Can the issue be reproduced under both asyncio and uvloop?:

The test I'm running: https://github.com/roman-g/asyncpg-memory-leak

Basically, there's a table with a JSONB column. If I call await connection.set_type_codec("jsonb", encoder=json.dumps, decoder=json.loads, schema="pg_catalog"), every SELECT that includes that column leaks memory.

The test fetches all rows 2000 times, then calls set_type_codec, then repeats the reads, reporting memory usage in MB between the stages.
The reported usage looks like

24
24
89

indicating massive growth after the final round of reads.
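For what it's worth, a stdlib-only way to see where the extra allocations accumulate is tracemalloc. A minimal sketch — the json.loads loop below merely stands in for the decode path; with asyncpg you would run the read() loop between the two snapshots instead:

```python
import json
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for the decode-heavy read loop; with asyncpg,
# await read(connection) here instead.
decoded = [json.loads('{"key": "value"}') for _ in range(10_000)]

after = tracemalloc.take_snapshot()

# Top allocation sites by net size growth between the two snapshots.
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)
```

If the leak is real, the offending allocation site should dominate the snapshot diff and keep growing across repeated read rounds.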

SQLAlchemy always calls set_type_codec, so its asyncpg dialect is affected by default.

import asyncio
import os
import psutil
import asyncpg
import json


async def main():
    connection = await asyncpg.connect('postgresql://postgres:some_secret@postgresql:5432/postgres')
    await prepare_data(connection)

    # Baseline reads without the custom codec.
    print_memory_usage_in_mb()
    await read(connection)
    print_memory_usage_in_mb()

    # Register the custom JSONB codec; reads after this point leak.
    await connection.set_type_codec("jsonb", encoder=json.dumps, decoder=json.loads, schema="pg_catalog")

    await read(connection)
    print_memory_usage_in_mb()

    await connection.close()


async def prepare_data(connection):
    await connection.execute("DROP TABLE IF EXISTS some_table")
    await connection.execute("""
        CREATE TABLE some_table(
            id serial PRIMARY KEY,
            data jsonb
        )
    """)
    for i in range(1000):
        await connection.execute("INSERT INTO some_table(data) VALUES($1)", '{"key":"value"}')


async def read(connection):
    for i in range(2000):
        result = await connection.fetch("SELECT data FROM some_table")
        assert len(result) > 0


def print_memory_usage_in_mb():
    # RSS of the current process, in decimal megabytes.
    process = psutil.Process(os.getpid())
    print(round(process.memory_info().rss / 1000000))


asyncio.run(main())
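One caveat when interpreting RSS numbers like the ones above: memory that is collectable but not yet collected can look like a leak. A variant of the measurement that forces a full GC pass first, using only the stdlib — note that resource.getrusage reports *peak* RSS (kilobytes on Linux, bytes on macOS), so this is a rougher signal than psutil's current RSS:

```python
import gc
import resource  # Unix-only stdlib module


def peak_rss_mb() -> float:
    # Rule out collectable-but-uncollected cycles before sampling.
    gc.collect()
    # ru_maxrss is the peak resident set size: kilobytes on Linux,
    # bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000


print(round(peak_rss_mb()))
```

If the numbers still grow monotonically across read rounds after a forced collection, the growth is genuinely unreachable (or cached) memory rather than GC lag.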
