Skip to content

Bug fix: don't round JSON numeric values when outside of float64 range#2692

Merged
fulghum merged 1 commit into
mainfrom
fulghum/bugfix-3
May 5, 2026
Merged

Bug fix: don't round JSON numeric values when outside of float64 range#2692
fulghum merged 1 commit into
mainfrom
fulghum/bugfix-3

Conversation

@fulghum

@fulghum fulghum commented May 5, 2026

Copy link
Copy Markdown
Contributor

Fixes: #2686

@github-actions

github-actions Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor
Main PR
covering_index_scan_postgres 1304.28/s 1299.41/s -0.4%
index_join_postgres 197.30/s 196.96/s -0.2%
index_join_scan_postgres 206.88/s 205.61/s -0.7%
index_scan_postgres 12.12/s 12.09/s -0.3%
oltp_point_select 2326.00/s 2322.97/s -0.2%
oltp_read_only 1847.46/s 1861.58/s +0.7%
select_random_points 128.50/s 129.64/s +0.8%
select_random_ranges 1015.63/s 1024.15/s +0.8%
table_scan_postgres 11.87/s 11.80/s -0.6%
types_table_scan_postgres 5.43/s 5.45/s +0.3%

@github-actions

github-actions Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor
Main PR
Total 42090 42090
Successful 18014 18014
Failures 24076 24076
Partial Successes1 5373 5372
Main PR
Successful 42.7988% 42.7988%
Failures 57.2012% 57.2012%

${\color{lightgreen}Progressions (1)}$

jsonb

QUERY: select '12345.0000000000000000000000000000000000000000000005'::jsonb::numeric;

Footnotes

  1. These are tests that we're marking as Successful, however they do not match the expected output in some way. This is due to small differences, such as different wording on the error messages, or the column names being incorrect while the data itself is correct.

@fulghum fulghum force-pushed the fulghum/bugfix-3 branch from e5d7a64 to c6c5a8c Compare May 5, 2026 17:25
@itoqa

itoqa Bot commented May 5, 2026

Copy link
Copy Markdown

Ito Test Report ✅

History reset (rebase or force-push detected). Starting test narrative over.

16 test cases ran. 2 additional findings, 14 passed.

Across 16 executed JSONB tests, 14 passed and 2 failed, confirming strong behavior for precision and safety checks including exact int64 boundary round-trips, rejection of malformed/trailing or extreme numeric inputs without crashes, responsive sessions after failures, and correct transaction atomicity/rollback handling. Both failures point to the same real defect (not introduced by this PR): jsonb_build_object can serialize equivalent numeric inputs as JSON strings while ::jsonb cast normalizes them as numbers, causing parity breaks that split GROUP BY/equality keys and can yield incorrect aggregations in mixed-write datasets.

✅ Passed (14)
Category Summary Screenshot
Constructor Constructor round-trip preserved max bigint boundary value exactly. CONSTRUCTOR-1
Constructor Trailing-token payloads were rejected on both cast and builder-adjacent paths. CONSTRUCTOR-2
Constructor Multi-row insert with one bad JSONB numeric token rolled back atomically. CONSTRUCTOR-3
Constructor Failed JSONB operation inside transaction did not leak partial writes after rollback. CONSTRUCTOR-4
Grouping Grouping query produced one canonical bucket for 25 and 25.0 (count 2) and a separate bucket for 30 (count 1). N/A
Grouping Cast-inserted and jsonb_build_object-inserted age=25 rows grouped into a single bucket with count 2. N/A
Grouping After recreating second-generation rows via CREATE+INSERT, distinct cardinality remained identical across generations (3 vs 3), showing stable equality behavior in this pathway. N/A
Precision Extreme scientific JSONB token was rejected with parse error and immediate follow-up SELECT 1 succeeded, showing no crash/hang. PRECISION-1
Precision Round-trip of 9223372036854775807 through JSONB returned exact bigint without drift. PRECISION-2
Precision JSONB round-trip returned -9223372036854775808 exactly; text-verified equality succeeded for the parsed bigint boundary value. PRECISION-3
Precision Malformed literal was rejected with invalid JSONB syntax error and no JSONB value returned. PRECISION-4
Precision Casting a valid JSON prefix with trailing injected token ({"n":1}DROP) failed with JSONB invalid syntax error and returned no JSONB value. PRECISION-5
Precision A 10,000-digit numeric literal cast to JSONB failed explicitly without runaway behavior, and immediate post-query probe (SELECT 1) confirmed the session stayed responsive. PRECISION-6
Precision Grouping preserved exact int64 boundary distinctions while coalescing equivalent max-value entries inserted through cast and constructor pathways. PRECISION-7
ℹ️ Additional Findings (2)

These findings are unrelated to the current changes but were observed during testing.

Category Summary Screenshot
Constructor 🟠 Cast and builder paths normalize equivalent numeric lexemes to different JSONB values. CONSTRUCTOR-5
Grouping ⚠️ Equivalent numeric lexemes split into two GROUP BY keys because constructor-decimal input is stored as a JSON string ({"age":"25"}) while cast-path values are stored as numeric ({"age":25}). GROUPING-3
🟠 Builder path serializes numeric inputs as JSON strings
  • What failed: Equivalent numeric values are represented differently by entrypoint: cast path yields numeric JSON ({"n":25}) while builder path yields string JSON ({"n":"25"}), violating parity expectations for canonical JSONB numeric handling.
  • Impact: Workflows that mix cast-based and constructor-based JSONB writes can produce inconsistent stored representations for the same numeric value. This can break grouping/equality expectations and cause query results to differ by write path.
  • Steps to reproduce:
    1. Run a cast path query such as SELECT '{"n":25.0}'::jsonb; and inspect normalized output.
    2. Run builder path queries such as SELECT jsonb_build_object('n', 25.0), jsonb_build_object('n', 25.00), jsonb_build_object('n', 2.5e1);.
    3. Compare resulting JSONB values for equivalent numeric semantics.
  • Stub / mock context: SCRAM authentication was intentionally disabled in server/authentication_scram.go so these SQL checks could run locally without real credential gating. No API route mocks or response stubs were used for the JSONB behavior under test.
  • Code analysis: I inspected JSONB cast input and constructor implementation paths. The cast path (jsonb_in + UnmarshalToJsonDocument) intentionally decodes numeric tokens as numbers and canonicalizes them through decimal parsing. The builder path first marshals generic SQL values with json.Marshal, then reparses the produced JSON, which allows numeric SQL inputs in this path to arrive as JSON strings rather than JSON numbers.
  • Why this is likely a bug: Production code takes two officially supported JSONB entrypoints that should be semantically equivalent for numeric values, but the builder path serializes some numeric inputs as strings, creating deterministic value divergence.

Relevant code:

server/functions/jsonb.go (lines 41-58)

// jsonb_in represents the PostgreSQL function of jsonb type IO input.
var jsonb_in = framework.Function1{
	Name:       "jsonb_in",
	Return:     pgtypes.JsonB,
	Parameters: [1]*pgtypes.DoltgresType{pgtypes.Cstring},
	Strict:     true,
	Callable: func(ctx *sql.Context, _ [2]*pgtypes.DoltgresType, val any) (any, error) {
		input := val.(string)
		inputBytes := unsafe.Slice(unsafe.StringData(input), len(input))
		if json.Valid(inputBytes) {
			doc, err := pgtypes.UnmarshalToJsonDocument(inputBytes)
			return doc, err
		}

server/types/json_document.go (lines 349-423)

decoder := json.NewDecoder(bytes.NewReader(val))
decoder.UseNumber()
if err := decoder.Decode(&decoded); err != nil {
	return JsonDocument{}, err
}
...
case json.Number:
	d, err := decimal.NewFromString(string(val))
	if err != nil {
		return nil, err
	}
	d, err = decimal.NewFromString(d.String())
	if err != nil {
		return nil, err
	}
	return JsonValueNumber(d), nil

server/functions/json.go (lines 134-159)

func buildJsonObject(fnName string, _ [2]*pgtypes.DoltgresType, val1 any) ([]byte, error) {
	inputArray := val1.([]any)
	...
	jsonObject := make(map[string]any)
	...
		} else {
			jsonObject[key] = e
			key = ""
		}
	}

	return json.Marshal(jsonObject)
}
⚠️ Equivalent numeric lexemes split grouping key
  • What failed: Equivalent numeric values should normalize to one canonical JSONB key, but constructor decimal input produced a string JSON value, splitting grouping/hashing identity.
  • Impact: Core query semantics are incorrect for JSONB grouping when data is written through mixed entry paths. Users can receive wrong aggregation results without an obvious workaround in affected datasets.
  • Steps to reproduce:
    1. Insert equivalent values (25, 25.0, 25.00, 2.5e1) into a JSONB column through both cast and jsonb_build_object paths.
    2. Run GROUP BY doc on the combined dataset.
    3. Observe separate grouping keys for numeric and stringified numeric values.
  • Stub / mock context: Real SCRAM authentication was bypassed by turning off authentication checks in server/authentication_scram.go so the local SQL run could proceed deterministically; no route or API response mocking was used, and the observed mismatch came from production JSONB constructor/cast logic.
  • Code analysis: I traced both write paths and found inconsistent canonicalization behavior: constructor/object building stores raw SQL values and marshals directly, while cast input is normalized through UnmarshalToJsonDocument and json.Number canonicalization. This divergence allows numerically equivalent values to persist with different JSONB types.
  • Why this is likely a bug: Production code applies different type-normalization behavior between constructor and cast JSONB entrypoints, which directly explains the observed split group keys for semantically equal numbers.

Relevant code:

server/functions/json.go (lines 140-158)

jsonObject := make(map[string]any)
var key string
for i, e := range inputArray {
	if i%2 == 0 {
		var ok bool
		key, ok = e.(string)
		if !ok {
			key = fmt.Sprintf("%v", e)
		}
	} else {
		jsonObject[key] = e
		key = ""
	}
}

return json.Marshal(jsonObject)

server/types/json_document.go (lines 402-423)

case string:
	val = strings.ReplaceAll(val, "\\", `\\`)
	val = strings.ReplaceAll(val, "\n", `\n`)
	val = strings.ReplaceAll(val, "\t", `\t`)
	val = strings.ReplaceAll(val, "\r", `\r`)
	val = jsonDocumentStringUnicodeRegex.ReplaceAllString(val, `\u$1`)
	return JsonValueString(val), nil
case json.Number:
	d, err := decimal.NewFromString(string(val))
	if err != nil {
		return nil, err
	}
	d, err = decimal.NewFromString(d.String())
	if err != nil {
		return nil, err
	}
	return JsonValueNumber(d), nil

server/functions/jsonb.go (lines 167-173)

json, err := buildJsonObject("jsonb_build_object", argTypes, val1)
if err != nil {
	return nil, err
}

jsonDoc, err := pgtypes.UnmarshalToJsonDocument(json)

Commit: c6c5a8c

View Full Run


Tell us how we did: Give Ito Feedback

@fulghum fulghum requested a review from zachmu May 5, 2026 19:30

@zachmu zachmu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@fulghum fulghum merged commit 51d5912 into main May 5, 2026
22 checks passed
@fulghum fulghum deleted the fulghum/bugfix-3 branch May 5, 2026 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

High precision JSONB numbers lose numeric cast fidelity

2 participants