SQL optimization advice for large skewed left joins in Spark SQL
Spark SQL/Databricks

I'm dealing with a serious SQL performance problem in Spark 3.2.2. My job runs a left join between a large fact table (~100M rows) and a dimension table (~5M rows, ~200MB). During the join, a few tasks take far longer than the rest due to extreme key skew, and the job sometimes fails with OOM.
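For context, the query is shaped roughly like this (table and column names are placeholders, not my real schema):

```sql
-- Illustrative shape of the job: a ~100M-row fact table left joins a
-- ~5M-row (~200MB) dimension table on a heavily skewed key.
SELECT f.*, d.segment
FROM fact_events f
LEFT JOIN dim_customers d
  ON f.customer_id = d.customer_id;
```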

I already increased executor memory to 16GB, which only helped temporarily. I enabled AQE (spark.sql.adaptive.enabled = true), but the skew-join optimization never triggers. I also tried broadcast join hints, but Spark still chooses a shuffle join. Salting the keys with random suffixes (which requires replicating the dimension rows across all suffixes) inflated the data roughly 10x and caused worse memory issues.
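For reference, this is roughly what I tried (same placeholder names as above):

```sql
-- AQE enabled, but the skew-join optimization never kicks in:
SET spark.sql.adaptive.enabled = true;

-- Broadcast hint on the dimension side; Spark still picks a shuffle join:
SELECT /*+ BROADCAST(d) */ f.*, d.segment
FROM fact_events f
LEFT JOIN dim_customers d
  ON f.customer_id = d.customer_id;
```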

My questions:

  • Why would Spark refuse to apply a broadcast join when the table looks small enough? Could data types, nulls, or statistics prevent it?

  • Why does AQE not detect such a clear skew, and what exact conditions are needed for it to activate? (I've listed the skew settings I'm aware of after these questions.)

  • Beyond memory increases and random suffix hacks, what real SQL-level optimization strategies could help, like repartitioning, bucketing, custom partitioning, or specific Spark SQL configs?

  • Any practical experience or insights with large skewed left joins in SQL / Spark SQL would be very helpful.
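For the AQE question above, these are the skew-related settings I'm aware of, with what I believe are the Spark 3.2 defaults (please correct me if I've misread the docs):

```sql
-- My understanding: AQE only treats a shuffle partition as skewed if it
-- is both larger than skewedPartitionFactor times the median partition
-- size AND larger than skewedPartitionThresholdInBytes.
SET spark.sql.adaptive.enabled = true;
SET spark.sql.adaptive.skewJoin.enabled = true;
SET spark.sql.adaptive.skewJoin.skewedPartitionFactor = 5;
SET spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes = 256MB;

-- The default broadcast threshold is 10MB, well below my ~200MB dimension
-- table; raising it is one thing I could try, though I'm unsure whether
-- the BROADCAST hint should already override it.
SET spark.sql.autoBroadcastJoinThreshold = 256MB;
```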