Skip to content

Poor performance when running a large statement. sqlalchemy + oracledb #172

@kryvokhyzha

Description

@kryvokhyzha

Hi,
I have the following code:

import pandas as pd
from sqlalchemy.orm.session import Session


def compute(engine: Session) -> pd.DataFrame:
        # some steps
        # ....
        df = engine.query(...)

        with engine.get_bind().connect() as conn:
                df = pd.read_sql(sql=df.statement, con=conn)

        return df
  1. If I compile SQL code from the df variable and then run it directly in DBeaver - this statement will complete in 17-20 seconds. And it's okay. Compiled SQL contains 347k characters.
  2. If I run the calculation through pd.read_sql, then this code will run for about 16 minutes. I noticed that the delay happens in the _prepare method (see trace below).
File "/opt/homebrew/Caskroom/miniforge/base/envs/venv/lib/python3.10/site-packages/oracledb/cursor.py", line 137, in _prepare
    self._impl.prepare(statement, tag, cache_statement)
File "src/oracledb/impl/thin/cursor.pyx", line 213, in oracledb.thin_impl.ThinCursorImpl.prepare
File "src/oracledb/impl/thin/connection.pyx", line 235, in oracledb.thin_impl.ThinConnImpl._get_statement
File "src/oracledb/impl/thin/connection.pyx", line 239, in oracledb.thin_impl.ThinConnImpl._get_statement
File "src/oracledb/impl/thin/statement.pyx", line 184, in oracledb.thin_impl.Statement._prepare
File "/opt/homebrew/Caskroom/miniforge/base/envs/venv/lib/python3.10/re.py", line 209, in sub
  return _compile(pattern, flags).sub(repl, string, count)
  1. At the same time, I can run simpler scripts via pd.read_sql and they execute normally.
  2. Also, I build the same script like in df variable, but for PostgreSQL. And it run for about 35 seconds.

Issue

  1. Is this expected behavior when working with large scripts?
  2. Is there any way I can bypass this prepare step?

requirements.txt

# Apple M1 macOS Ventura 13.0
# python 3.10
SQLAlchemy==2.0.9
numpy==1.23.5
pandas==1.5.2
oracledb==1.3.0

Thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions