-
Notifications
You must be signed in to change notification settings - Fork 96
Closed
Labels
Description
Hi,
I have the following code:
import pandas as pd
from sqlalchemy.orm.session import Session
def compute(engine: Session) -> pd.DataFrame:
# some steps
# ....
df = engine.query(...)
with engine.get_bind().connect() as conn:
df = pd.read_sql(sql=df.statement, con=conn)
return df- If I compile SQL code from the
dfvariable and then run it directly in DBeaver - this statement will complete in 17-20 seconds. And it's okay. Compiled SQL contains 347k characters. - If I run the calculation through
pd.read_sql, then this code will run for about 16 minutes. I noticed that the delay happens in the_preparemethod (see trace below).
File "/opt/homebrew/Caskroom/miniforge/base/envs/venv/lib/python3.10/site-packages/oracledb/cursor.py", line 137, in _prepare
self._impl.prepare(statement, tag, cache_statement)
File "src/oracledb/impl/thin/cursor.pyx", line 213, in oracledb.thin_impl.ThinCursorImpl.prepare
File "src/oracledb/impl/thin/connection.pyx", line 235, in oracledb.thin_impl.ThinConnImpl._get_statement
File "src/oracledb/impl/thin/connection.pyx", line 239, in oracledb.thin_impl.ThinConnImpl._get_statement
File "src/oracledb/impl/thin/statement.pyx", line 184, in oracledb.thin_impl.Statement._prepare
File "/opt/homebrew/Caskroom/miniforge/base/envs/venv/lib/python3.10/re.py", line 209, in sub
return _compile(pattern, flags).sub(repl, string, count)- At the same time, I can run simpler scripts via
pd.read_sqland they execute normally. - Also, I build the same script like in
dfvariable, but for PostgreSQL. And it run for about 35 seconds.
Issue
- Is this expected behavior when working with large scripts?
- Is there any way I can bypass this
preparestep?
requirements.txt
# Apple M1 macOS Ventura 13.0
# python 3.10
SQLAlchemy==2.0.9
numpy==1.23.5
pandas==1.5.2
oracledb==1.3.0Thank you!