all 47 comments

[–]Photo-Josh 47 points48 points  (25 children)

Not sure I’m following what the issue was here?

You were using around 10.5 GB and that was too much?

You then moved some things from RAM to Disk, which can only slow things down - not speed up.

Why was 63% RAM usage an issue? It’s there to be used.

[–]Birnenmacht 15 points16 points  (4 children)

Have you measured any improvements through point 4? Imports are cached and importing them locally only delays the point at which you pay their cost, unless you actively prune sys.modules at the end of the function (not recommended, a great way to shoot yourself in the foot)
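A minimal stdlib sketch of that point, using `colorsys` as a stand-in for a heavy library like pandas or boto3:

```python
import sys

def convert():
    # Local import: the cost is only delayed until the first call...
    import colorsys  # stdlib stand-in for a heavy dependency
    return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)

convert()
# ...but the module is now cached in sys.modules, and its memory stays
# resident for the life of the process unless you actively prune it.
print("colorsys" in sys.modules)  # True
```

So a function-level import helps startup time, not steady-state memory, unless the function is never called at all.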

[–]ofyellow 3 points4 points  (0 children)

If your app needs x GB and you rewrite it so it uses y GB less except for short bursts, the effect is that you still need x GB during those bursts.

In that way, lazy imports can bite you. You'd better know your worst-case memory requirement up front, when you start your app.


[–]vaibeslop 5 points6 points  (8 children)

Check out chdb: https://github.com/chdb-io/chdb

Fully pandas compatible API, but lazy loading, much more performant, less memory.

Not affiliated, just a fan of the project.

[–]mikeckennedy[S] 0 points1 point  (0 children)

Very cool, thanks for the heads up u/vaibeslop

[–]ofyellow 0 points1 point  (6 children)

Lazy loading is for optimizing startup time: you load modules as they are needed, so the load cost is spread over multiple requests until every loadable module has been hit at least once. But it's not a memory optimization strategy.

[–]vaibeslop 0 points1 point  (5 children)

I'm talking about lazily loading data into memory for operations.

The author of chDB goes into more detail in the v4 announcement post: https://clickhouse.com/blog/chdb.4-0-pandas-hex

I'm neither affiliated with chDB nor Clickhouse.

EDIT: I see they now even talk about this in the GH readme.

[–]ofyellow -1 points0 points  (4 children)

Point 4 mentions local imports.

Yes, keeping data out of memory is smart, but it's hardly the invention of sliced bread.

[–]vaibeslop 0 points1 point  (3 children)

Well, seeing how not everybody does it, it seems the ease of dismissively commenting on it is far greater than the ease of implementing it in a real application.

chDB 1 : arm chair CTOs 0

[–]ofyellow 0 points1 point  (2 children)

What click house does is how c# has been doing it for decades.

Of course you first collect the .filter() and join logic all the way down the chain before you fetch. I'm surprised this is presented as anything new.

I guess with "armchair" you'd rather drag this into personal insults? Man... the sadness dripping from dragging a technical discussion into a weak attempt to insult. What are you, 16?

[–]vaibeslop 0 points1 point  (1 child)

If it were a technical discussion.

All I'm seeing is someone dismissing a very cool, pandas-compatible Python project, highly relevant to OP's post, by going off on completely irrelevant, tangential technical details about C#.

It's boring, arrogant, off-topic whataboutism in its purest form.

[–]ofyellow 0 points1 point  (0 children)

Point 4, local imports, does not contribute to lower memory needs for a web application.

You can call that dismissive but it's a technical fact.

The fact that you pin it on a later remark concerning C# makes your remarks insincere.

[–]0x256 2 points3 points  (0 children)

Switched to a single async Granian worker: Rewrote the app in Quart (async Flask) and replaced the multi-worker web garden with one fully async worker. Saved 542 MB right there.

I would have started by reducing the workers to 1 and increasing the thread count instead of rewriting the entire app, but okay. If you have lots of long-running connections (websockets or slow requests), then that's a brave but sensible move.
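The memory argument for a single async worker can be sketched with plain asyncio (not Quart or Granian specifically): one event loop in one process keeps many requests in flight at once, instead of duplicating the app's memory per worker process.

```python
import asyncio

async def handle(i):
    # Stand-in for an I/O-bound request (websocket, slow upstream call, ...)
    await asyncio.sleep(0.01)
    return i

async def main():
    # A single event loop services 100 concurrent "requests" in one process;
    # no per-worker copy of the application's memory is needed.
    return await asyncio.gather(*(handle(i) for i in range(100)))

results = asyncio.run(main())
print(len(results))  # 100
```

This only pays off for I/O-bound workloads; CPU-bound work still wants multiple processes.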

Raw + DC database pattern: Dropped MongoEngine for raw queries + slotted dataclasses. 100 MB saved per worker and nearly doubled requests/sec.

For a small app with good test coverage and a mature db schema, that's fine.

Subprocess isolation for a search indexer: The daemon was burning 708 MB mostly from import chains pulling in the entire app. Moved the indexing into a subprocess so imports only live for ~30 seconds during re-indexing. Went from 708 MB to 22 MB. 32x reduction.

You reduced the time this memory is used, but not the peak memory consumption. You also added a lot of process start overhead and latency. That's a trade-off, not necessarily a win.
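The subprocess pattern itself is simple; a hedged stdlib sketch, where the `-c` payload stands in for something like `-m app.indexer` (hypothetical module name):

```python
import subprocess
import sys

# The child interpreter pays the full import cost of the heavy chain,
# and the OS reclaims all of that memory when the process exits.
result = subprocess.run(
    [sys.executable, "-c", "import json; print(json.dumps({'indexed': True}))"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # {"indexed": true}
```

As noted above, this shortens how long the memory is held, not the peak while the child runs, and each invocation pays interpreter startup plus the import chain again.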

Local imports for heavy libs: import boto3 alone costs 25 MB, pandas is 44 MB. If you only use them in a rarely-called function, just import them there instead of at module level. (PEP 810 lazy imports in 3.15 should make this automatic.)

That's not how imports work. You delayed the import, but once imported, the module will live in sys.modules and stay there.

Moved caches to diskcache: Small-to-medium in-memory caches shifted to disk. Modest savings but it adds up.

So instead of a single memory access, you now create an async task that outsources its blocking disk access to a thread pool, wait for the OS to read from disk, then wait for the async task to get its turn in the event loop again to return the result? Caches should be fast. If that much overhead per cache access is okay for you, then I wonder what extremely expensive things you stored in those caches that they're still worth caching at all.
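The latency gap being argued here is easy to measure with the stdlib alone; `shelve` stands in for diskcache (both pickle values to disk-backed storage), before any event-loop overhead is even added:

```python
import os
import shelve
import tempfile
import timeit

mem_cache = {"key": "value"}  # plain in-process cache: one dict lookup

path = os.path.join(tempfile.mkdtemp(), "cache")
with shelve.open(path) as disk_cache:  # stdlib stand-in for diskcache
    disk_cache["key"] = "value"

with shelve.open(path) as disk_cache:
    t_disk = timeit.timeit(lambda: disk_cache["key"], number=1000)
t_mem = timeit.timeit(lambda: mem_cache["key"], number=1000)
# A dict lookup skips pickling, the dbm layer, and the filesystem entirely.
print(t_mem < t_disk)
```

Whether the trade is worth it depends on hit rate and how expensive the cached computation is, which is exactly the question being raised.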

[–]Full-Definition6215 1 point2 points  (0 children)

Running FastAPI + SQLite on a mini PC (31GB RAM, i9-9880H) and memory management matters when you're self-hosting everything on one box.

Biggest wins I found:

  • SQLite instead of Postgres eliminated an entire process worth of memory. WAL mode handles concurrent reads fine, and the total memory footprint for the DB is basically the page cache.
  • uvicorn with --workers 1 for a side project that doesn't need multi-process. Each additional worker duplicates the entire app's memory.
  • Lazy imports for heavy libraries. If Stripe SDK is only used in payment endpoints, don't import it at module level.
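The WAL point in the first bullet is a one-line pragma; a minimal sketch (database path and table are illustrative):

```python
import os
import sqlite3
import tempfile

# WAL mode lets concurrent readers proceed while a writer appends to the
# write-ahead log, instead of readers blocking on the rollback journal.
path = os.path.join(tempfile.mkdtemp(), "app.db")
con = sqlite3.connect(path)
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
con.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
con.execute("INSERT INTO kv VALUES ('a', '1')")
con.commit()
rows = con.execute("SELECT v FROM kv WHERE k = 'a'").fetchall()
con.close()
print(mode, rows)  # wal [('1',)]
```

WAL is persistent per database file, so the pragma only needs to run once at setup.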

The 23 containers on 16GB stat is impressive. I'm at about 5GB usage across all my services on 31GB — plenty of headroom, but that's because I went with SQLite over Postgres for everything that doesn't need a full RDBMS.

[–]bladeofwinds 1 point2 points  (1 child)

I’ve learned about a lot of cool projects from your show! Currently trying out datastar in one of my (non-python) projects

[–]mikeckennedy[S] 3 points4 points  (0 children)

Awesome, great to hear u/bladeofwinds :) Datastar is neat for sure.

[–]Substantial-Bed8167 0 points1 point  (1 child)

Did you use any memory profiling, or just observe with htop?

[–]mikeckennedy[S] 1 point2 points  (0 children)

No memory profiling, though that would have been interesting. Just process monitoring tools like btop and docker stats.