Skip to content

properly flush results#2760

Merged
jycor merged 2 commits into
mainfrom
james/flushing
May 27, 2026
Merged

properly flush results#2760
jycor merged 2 commits into
mainfrom
james/flushing

Conversation

@jycor

@jycor jycor commented May 26, 2026

Copy link
Copy Markdown
Contributor

Instead of flushing RowDescription and every RowData right away, have the entire Result buffer in memory first.

@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor
Main PR
covering_index_scan_postgres 1384.87/s 1459.14/s +5.3%
groupby_scan_postgres 77.51/s ${\color{lightgreen}128.36/s}$ ${\color{lightgreen}+65.6\%}$
index_join_postgres 207.21/s ${\color{lightgreen}628.01/s}$ ${\color{lightgreen}+203.0\%}$
index_join_scan_postgres 217.44/s ${\color{lightgreen}756.50/s}$ ${\color{lightgreen}+247.9\%}$
index_scan_postgres 12.82/s ${\color{lightgreen}23.11/s}$ ${\color{lightgreen}+80.2\%}$
oltp_delete_insert_postgres 741.94/s 734.72/s -1.0%
oltp_insert 628.40/s 637.64/s +1.4%
oltp_point_select 2553.70/s 2729.71/s +6.8%
oltp_read_only 2009.04/s ${\color{lightgreen}2776.16/s}$ ${\color{lightgreen}+38.1\%}$
oltp_read_write 1713.92/s ${\color{lightgreen}2199.22/s}$ ${\color{lightgreen}+28.3\%}$
oltp_update_index 653.79/s 678.46/s +3.7%
oltp_update_non_index 675.06/s 694.93/s +2.9%
oltp_write_only 1608.05/s 1562.62/s -2.9%
select_random_points 137.65/s ${\color{lightgreen}455.96/s}$ ${\color{lightgreen}+231.2\%}$
select_random_ranges 933.60/s 946.67/s +1.3%
table_scan_postgres 12.30/s ${\color{lightgreen}22.54/s}$ ${\color{lightgreen}+83.2\%}$
types_delete_insert_postgres 738.23/s 719.20/s -2.6%
types_table_scan_postgres 5.68/s ${\color{lightgreen}9.70/s}$ ${\color{lightgreen}+70.7\%}$

@itoqa

itoqa Bot commented May 27, 2026

Copy link
Copy Markdown

Ito Test Report ✅

8 test cases ran. 8 passed.

All 8 executed test cases passed (8/8) with zero failures, and the unified local verification found no confirmed production-code defects, with deterministic setup achieved by intentionally bypassing authentication/authorization bootstrap for wire-protocol-focused checks. The most important findings were that streaming behavior was correct across large and simple queries (including contiguous 1–300 results and exact 128/129 boundary counts), Parse/Describe/Bind/Execute and RETURNING flows emitted metadata correctly with no duplicate RowDescription and proper completion framing, and forced flush-time transport disconnects produced clean error propagation without false success while recovery after reconnect (for example, SELECT 1) remained reliable.

✅ Passed (8)
Category Summary Screenshot
Describe Prepared-statement Describe/Execute metadata flow is correct with no duplicate RowDescription defect in production code. DESCRIBE-1
Describe RETURNING UPDATE follows the expected describe metadata and execute row-delivery path. DESCRIBE-2
Describe Multi-batch simple query emits exactly one RowDescription before DataRow streaming. DESCRIBE-3
Error Not a product bug. Re-test forced a hard client disconnect at 128 rows; delivery stopped at that batch boundary, no success completion was observed, and server logged connection-reset write failure as expected by the error-propagation path. ERROR-1
Error Not a product bug. Re-test induced a transport interruption after streaming had started (140 rows observed). The broken connection failed cleanly, and a fresh reconnect immediately succeeded with SELECT 1, confirming recoverable cleanup behavior and no ambiguous success state. ERROR-2
Stream Returned 300 contiguous rows for generate_series(1,300) and a follow-up SELECT 1 succeeded on the same connection. STREAM-1
Stream Simple protocol query returned one RowDescription, 10 ordered DataRows, and clean completion framing without hangs. STREAM-2
Stream Both 129-row and 128-row runs produced exact counts with one RowDescription and one completion sequence each. STREAM-3

Commit: 9f6a449

View Full Run


Tell us how we did: Give Ito Feedback

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor
Main PR
Total 42090 42090
Successful 18250 18250
Failures 23840 23840
Partial Successes1 5385 5385
Main PR
Successful 43.3595% 43.3595%
Failures 56.6405% 56.6405%

Footnotes

  1. These are tests that we're marking as Successful, however they do not match the expected output in some way. This is due to small differences, such as different wording on the error messages, or the column names being incorrect while the data itself is correct.

@Hydrocharged

Hydrocharged commented May 27, 2026

Copy link
Copy Markdown
Collaborator

#2760 (comment)
👀

@jycor jycor force-pushed the james/flushing branch from 9f6a449 to 895af74 Compare May 27, 2026 22:07

@zachmu zachmu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good find

Comment thread server/connection_handler.go
@jycor jycor enabled auto-merge (squash) May 27, 2026 23:02
@jycor jycor merged commit 22cf3ad into main May 27, 2026
22 checks passed
@jycor jycor deleted the james/flushing branch May 27, 2026 23:51
@itoqa

itoqa Bot commented May 27, 2026

Copy link
Copy Markdown

Ito Test Report ✅

History reset (rebase or force-push detected). Starting test narrative over.

13 test cases ran. 2 additional findings, 11 passed.

Overall, the unified run executed 13 test cases with 11 passing and 2 failing: all ORDERING, FLUSH, and MEMORY scenarios succeeded, confirming correct row/frame ordering, stable grouped flush behavior (including expected failure handling and clean retry semantics), and stable wide-row/boundary performance. The key issues were two medium-severity extended-protocol defects in server/connection_handler.go where Execute can emit DataRow messages without a prior metadata contract when Describe is skipped, and where Describe metadata can come from a different object context than the executed portal, allowing metadata/row-shape mismatch without protocol rejection.

✅ Passed (11)
Category Summary Screenshot
Flush Stable grouped flush path completed and returned all expected rows. FLUSH-1
Flush Forced mid-stream disconnect surfaced flush/write failure and prevented successful completion output. FLUSH-2
Flush Interrupted first attempt plus immediate retry matched baseline exactly with no duplicated leading rows. FLUSH-3
Memory Seeded 260 rows with 64KiB payloads and confirmed the wide-row SELECT completed with stable responsiveness and memory metrics. MEMORY-1
Memory The 129-row boundary query completed successfully and validation confirmed keys 1..129 exactly once with no gap or duplicate at the 128/129 transition. MEMORY-2
Ordering Verified one RowDescription before DataRows and CommandComplete for a simple SELECT query. N/A
Ordering Verified a >128-row SELECT emitted exactly one RowDescription across all batches with correct completion. N/A
Ordering Verified queued simple and extended query responses remained isolated with no frame interleaving. ORDERING-3
Protocol Describe emitted metadata once and Execute returned rows without duplicate RowDescription. PROTOCOL-1
Protocol Invalid extended-protocol sequences returned deterministic stale-statement errors, and a subsequent valid flow succeeded with expected rows. N/A
Protocol Back-to-back execution checks returned separated, complete result streams without cross-mixed frames. N/A
ℹ️ Additional Findings (2)

These findings are unrelated to the current changes but were observed during testing.

Category Summary Screenshot
Protocol 🟠 Execute can stream DataRow without prior metadata contract when Describe is skipped. PROTOCOL-3
Protocol 🟠 Metadata described from one object can mismatch rows executed from another object without rejection. PROTOCOL-4
🟠 Execute can emit rows without metadata guard
  • What failed: The server sends DataRow messages for Execute without either rejecting the missing Describe contract or emitting metadata first; strict clients can receive an undecodable row stream.
  • Impact: Extended-protocol clients that rely on explicit metadata sequencing can fail decoding or fail fast on valid-looking Execute responses. This breaks a meaningful wire-protocol workflow but has a workaround (Describe before Execute).
  • Steps to reproduce:
    1. Parse and bind a row-returning statement in an extended protocol session.
    2. Skip Describe and send Execute plus Sync for the portal.
    3. Observe backend messages and confirm DataRow appears without prior RowDescription or a protocol rejection.
  • Stub / mock context: Authentication and authorization checks were temporarily bypassed so the protocol sequence could be exercised directly, but the failing behavior maps to production Describe/Execute logic in the connection handler.
  • Code analysis: I reviewed the execute and row-spooling paths in server/connection_handler.go. The Execute handler only verifies portal existence and then executes; it does not enforce prior Describe state. The row callback explicitly suppresses RowDescription for Execute and still emits row payloads.
  • Why this is likely a bug: The code path allows Execute to emit rows with no guard that Describe metadata exists, so protocol sequencing errors can leak into client-visible row streams instead of being rejected safely.

Relevant code:

server/connection_handler.go (lines 660-667)

func (h *ConnectionHandler) handleExecute(message *pgproto3.Execute) error {
    h.waitForSync = true

    // TODO: implement the RowMax
    portalData, ok := h.portals[message.Portal]
    if !ok {
        return errors.Errorf("portal %s does not exist", message.Portal)
    }

server/connection_handler.go (lines 1031-1042)

// EXECUTE does not send RowDescription; instead it should be sent from DESCRIBE prior to it
if !isExecute && !hasSentRowDescription {
    hasSentRowDescription = true
    h.backend.Send(&pgproto3.RowDescription{Fields: res.Fields})
}
for _, row := range res.Rows {
    h.backend.Send(&pgproto3.DataRow{
        Values: row.val,
    })
}
🟠 Describe and Execute object context can diverge
  • What failed: The server can emit a RowDescription from one object context and then stream DataRow values from a different portal shape without a protocol error, producing metadata/row mismatch.
  • Impact: Drivers can fail decoding or mis-handle result sets when metadata and row payload shape diverge within one logical flow. This causes protocol-level reliability issues in extended-query workflows.
  • Steps to reproduce:
    1. Prepare and bind a portal, then issue Describe on a different object context.
    2. Execute the portal whose row shape differs from the described fields.
    3. Compare emitted metadata and DataRow payload columns and verify no protocol rejection occurs.
  • Stub / mock context: The run used a temporary local auth bypass for deterministic protocol probing, and the mismatch was confirmed by exercising statement-versus-portal Describe and Execute sequences against the live handler logic.
  • Code analysis: I inspected how Describe selects fields by object type and how Execute independently chooses portal data. There is no cross-check that the described object context matches the executed portal schema before rows are streamed.
  • Why this is likely a bug: Describe metadata and Execute row payloads are sourced from independently selected objects with no alignment check, so mismatched schema framing can be emitted instead of rejected.

Relevant code:

server/connection_handler.go (lines 583-600)

if message.ObjectType == 'S' {
    preparedStatementData, ok := h.preparedStatements[message.Name]
    if !ok {
        return errors.Errorf("prepared statement %s does not exist", message.Name)
    }
    fields = preparedStatementData.ReturnFields
    bindvarTypes = preparedStatementData.BindVarTypes
    query = preparedStatementData.Query
} else {
    portalData, ok := h.portals[message.Name]
    if !ok {
        return errors.Errorf("portal %s does not exist", message.Name)
    }
    fields = portalData.Fields
    query = portalData.Query
}

server/connection_handler.go (lines 664-687)

portalData, ok := h.portals[message.Portal]
if !ok {
    return errors.Errorf("portal %s does not exist", message.Portal)
}

query := portalData.Query
callback := h.spoolRowsCallback(query, &rowsAffected, true)
err = h.doltgresHandler.ComExecuteBound(
    context.Background(), h.mysqlConn, query.String, portalData.BoundPlan, portalData.FormatCodes, callback,
)

server/connection_handler.go (lines 1071-1075)

if returnsRow(query) {
    return h.send(&pgproto3.RowDescription{
        Fields: fields,
    })
}

Commit: d551f69

View Full Run


Tell us how we did: Give Ito Feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants