I'm dealing with a serious SQL performance problem in Spark 3.2.2. My job runs a left join between a large fact table (~100M rows) and a dimension table (~5M rows, ~200MB). During the join, some tasks take far longer than others due to extreme key skew, and the job sometimes fails with an OOM error.
I already increased executor memory to 16GB, which helped only temporarily. I enabled AQE (spark.sql.adaptive.enabled = true), but the skew-join optimization never triggers. I also tried broadcast join hints, yet Spark still chooses a shuffle join. Salting the key with random suffixes (which means exploding the dimension table once per suffix) inflated the data roughly 10x and made the memory problems worse.
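
For context, here is a minimal sketch of the current job. The table names (`fact_events`, `dim_customers`), the join key (`customer_id`), and the output path are placeholders, not my real schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder()
  .appName("skewed-left-join")
  .config("spark.sql.adaptive.enabled", "true")
  .getOrCreate()

val fact = spark.table("fact_events")    // ~100M rows
val dim  = spark.table("dim_customers")  // ~5M rows, ~200MB

// The broadcast hint that Spark appears to ignore:
val joined = fact.join(broadcast(dim), Seq("customer_id"), "left")

joined.write.mode("overwrite").parquet("/tmp/joined_out")
```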
My questions:
Why would Spark refuse to apply a broadcast join when the table looks small enough? Could data types, nulls, or missing statistics prevent it?
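
This is how I'm trying to diagnose it; running `ANALYZE TABLE` and raising the threshold are guesses at what might be missing, since the default 10MB `autoBroadcastJoinThreshold` is far below this table's ~200MB:

```scala
// Refresh table-level statistics so the optimizer's size estimate is current.
spark.sql("ANALYZE TABLE dim_customers COMPUTE STATISTICS")

// Raise the broadcast threshold above the dimension table's on-disk size.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", (256L * 1024 * 1024).toString)

// Inspect which join strategy the planner actually picks.
fact.join(broadcast(dim), Seq("customer_id"), "left").explain("formatted")
```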
Why does AQE not detect such a clear skew, and what exact conditions are needed for it to activate?
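
My reading of the docs is that the skew-join rewrite only fires when all of the following are enabled and a shuffle partition exceeds both thresholds; please correct me if that's wrong:

```scala
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
// A shuffle partition is treated as skewed only if it exceeds BOTH:
//   - skewedPartitionFactor * the median partition size, and
//   - skewedPartitionThresholdInBytes.
// The values below are the defaults.
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256m")
```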
Beyond adding memory and random-suffix hacks, what SQL-level optimization strategies actually help here: repartitioning, bucketing, custom partitioning, or specific Spark SQL configs?
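
For example, is pre-bucketing both tables on the join key a sensible way to avoid the shuffle entirely, or does skew defeat that too? A sketch of what I mean (the bucket count of 512 is a guess):

```scala
// Write both sides bucketed and sorted on the join key so a later join
// between the two tables can skip the shuffle (bucket counts must match).
fact.write
  .bucketBy(512, "customer_id")
  .sortBy("customer_id")
  .mode("overwrite")
  .saveAsTable("fact_events_bucketed")

dim.write
  .bucketBy(512, "customer_id")
  .sortBy("customer_id")
  .mode("overwrite")
  .saveAsTable("dim_customers_bucketed")

val joinedNoShuffle = spark.table("fact_events_bucketed")
  .join(spark.table("dim_customers_bucketed"), Seq("customer_id"), "left")
```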
Any practical experience or insights with large skewed left joins in Spark SQL would be very helpful.