
Turbo87 (Member) commented Mar 6, 2025

We currently rely on the Rust semver crate to implement our "sort by semantic versioning" functionality, which is used by the web interface and also to determine the "default version". The downside is that we need to load the full list of version numbers for a crate from the database into the API server, sort it, and then throw away the ones we don't need.

Ideally, we would be using https://pgxn.org/dist/semver/ or https://github.com/pgcentralfoundation/pgrx to solve this, but unfortunately most PostgreSQL hosting providers don't allow or support custom extensions, and it would also make local development a bit more challenging.

As a workaround, this PR implements a semver_ord(num) pgSQL function that returns a JSONB array, which has the same ordering precedence as the Semantic Versioning spec (https://semver.org/#spec-item-11), with the small caveat that it only supports up to 15 prerelease parts. The maximum number of prerelease parts in our current dataset is 7, so 15 should be plenty.

Update: After discussion in this PR we changed the maximum number of prerelease specifiers to 10, but the remaining specifiers are now appended to the array as a string, to at least ensure uniqueness and a stable sort order.
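
To make the encoding concrete, here is a rough Rust model of the key shape described above. It is only a sketch, not the PL/pgSQL from this PR: `sort_key`, `classify`, `Part`, `Pre` and `MAX_PARTS` are made-up names, the enum variant order stands in for the JSONB type ordering, build metadata is simply dropped (an assumption), and numeric identifiers that overflow `u64` fall back to text.

```rust
// Hypothetical model of the `semver_ord` key; all names here are illustrative.
const MAX_PARTS: usize = 10;

// Variant order matters: the derived `Ord` sorts `Pad < Numeric < Text`,
// mirroring `null < false, n < true, "s"` in the JSONB array.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Part {
    Pad,
    Numeric(u64),
    Text(String),
}

// `Release` is declared last so it sorts above every prerelease,
// like the `{}` object sorting above an array.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Pre {
    Prerelease { parts: Vec<Part>, rest: String },
    Release,
}

fn classify(id: &str) -> Part {
    match id.parse::<u64>() {
        Ok(n) => Part::Numeric(n),
        // Non-numeric (or too large for u64 in this sketch): treat as text.
        Err(_) => Part::Text(id.to_string()),
    }
}

fn sort_key(num: &str) -> Option<(u64, u64, u64, Pre)> {
    // Assumption: build metadata (after `+`) does not affect the key.
    let core = num.split('+').next()?;
    let (triple, pre) = match core.split_once('-') {
        Some((t, p)) => (t, Some(p)),
        None => (core, None),
    };
    let mut nums = triple.split('.').map(|s| s.parse::<u64>().ok());
    let (major, minor, patch) = (nums.next()??, nums.next()??, nums.next()??);
    if nums.next().is_some() {
        return None;
    }
    let pre = match pre {
        None => Pre::Release,
        Some(p) => {
            let ids: Vec<&str> = p.split('.').collect();
            // The first MAX_PARTS specifiers are classified, then padded
            // so every prerelease key has the same length.
            let mut parts: Vec<Part> =
                ids.iter().take(MAX_PARTS).map(|id| classify(id)).collect();
            parts.resize(MAX_PARTS, Part::Pad);
            // Remaining specifiers are kept as one string for a stable order.
            let rest = ids.iter().skip(MAX_PARTS).copied().collect::<Vec<_>>().join(".");
            Pre::Prerelease { parts, rest }
        }
    };
    Some((major, minor, patch, pre))
}
```

With this model, `sort_key("1.0.0-alpha.1") < sort_key("1.0.0-alpha.beta")` and `sort_key("1.0.0-rc.1") < sort_key("1.0.0")` both hold, which is the behaviour the real column is meant to provide.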

The database migration in this commit also adds a new semver_ord column to the versions table, and an on-insert trigger function that automatically derives the semver_ord column from the num column value:

[Screenshot 2025-03-06 at 08:46:02]

Once this migration has run, the existing versions can be backfilled by running the following SQL script, until all versions are processed:

with versions_to_update as (
    select id, num
    from versions
    where semver_ord = 'null'::jsonb
    limit 1000
)
update versions
    set semver_ord = semver_ord(num)
    where id in (select id from versions_to_update);

This PR does not yet implement any code to actually use this new column, since it will need to be backfilled first. Once that has happened and the results have been verified to be correct, we can start to migrate our codebase to move the semver sorting into the database and potentially even calculate the "default version" directly in the database too.

Turbo87 added the C-enhancement ✨ and A-backend ⚙️ labels Mar 6, 2025
Turbo87 requested a review from a team March 6, 2025 07:45
Turbo87 requested a review from eth3lbert March 6, 2025 15:46
eth3lbert (Contributor) left a comment

Thank you for this excellent PR. This will significantly simplify the work involving version sorting. 💪 💪 💪
I've left some non-blocking nitpicks, but everything is currently great! 👍

Comment on lines +36 to +42
-- In JSONB a number has higher precedence than a string but in
-- semver it is the other way around, so we use true/false to
-- work around this.
Contributor

Here are some references to facilitate the review. The ordering for jsonb, as excerpted from the official docs [1], is as follows:

Object > Array > Boolean > Number > String > null
Object with n pairs > object with n - 1 pairs
Array with n elements > array with n - 1 elements
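
The ordering quoted above can be modelled in a few lines of Rust to see why the true/false tags help. This is a toy model based only on the excerpt, not on PostgreSQL's actual comparison code: `Json` and `rank` are made-up names, only the empty object is represented, numbers are limited to `u64`, and strings compare byte-wise.

```rust
use std::cmp::Ordering;

// Toy JSON value; just enough variants to model the sort key.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Json {
    Null,
    Str(String),
    Num(u64),
    Bool(bool),
    Array(Vec<Json>),
    EmptyObject,
}

impl Json {
    // Type precedence from the excerpt:
    // Object > Array > Boolean > Number > String > null.
    fn rank(&self) -> u8 {
        match self {
            Json::Null => 0,
            Json::Str(_) => 1,
            Json::Num(_) => 2,
            Json::Bool(_) => 3,
            Json::Array(_) => 4,
            Json::EmptyObject => 5,
        }
    }
}

impl Ord for Json {
    fn cmp(&self, other: &Self) -> Ordering {
        match (self, other) {
            (Json::Str(a), Json::Str(b)) => a.cmp(b),
            (Json::Num(a), Json::Num(b)) => a.cmp(b),
            (Json::Bool(a), Json::Bool(b)) => a.cmp(b),
            // Longer arrays sort higher; equal lengths compare element-wise.
            (Json::Array(a), Json::Array(b)) => {
                a.len().cmp(&b.len()).then_with(|| a.cmp(b))
            }
            // Different types (or two values of a unit variant): use the rank.
            _ => self.rank().cmp(&other.rank()),
        }
    }
}

impl PartialOrd for Json {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```

In this model a bare `Json::Num(1)` sorts above `Json::Str("alpha".into())`, which is the wrong way around for semver, while `[false, 1]` sorts below `[true, "alpha"]` because the boolean tag is compared first.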

SemVer's pre-release ordering, as excerpted from the spec [2], is as follows:

Precedence for two pre-release versions with the same major, minor, and patch version MUST be determined by comparing each dot separated identifier from left to right until a difference is found as follows:
    1. Identifiers consisting of only digits are compared numerically.
    2. Identifiers with letters or hyphens are compared lexically in ASCII sort order.
    3. Numeric identifiers always have lower precedence than non-numeric identifiers.
    4. A larger set of pre-release fields has a higher precedence than a smaller set, if all of the preceding identifiers are equal.
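
For comparison, the four rules can be written directly as a comparator. This is a minimal Rust sketch, assuming well-formed identifiers (non-empty, no leading zeros on numeric ones); `is_numeric`, `cmp_identifier` and `cmp_prerelease` are illustrative names, not code from this PR.

```rust
use std::cmp::Ordering;

fn is_numeric(id: &str) -> bool {
    !id.is_empty() && id.bytes().all(|b| b.is_ascii_digit())
}

fn cmp_identifier(a: &str, b: &str) -> Ordering {
    match (is_numeric(a), is_numeric(b)) {
        // Rule 1: without leading zeros, a longer digit string is a larger
        // number, and equal lengths compare digit by digit.
        (true, true) => a.len().cmp(&b.len()).then_with(|| a.cmp(b)),
        // Rule 3: numeric identifiers sort below non-numeric ones.
        (true, false) => Ordering::Less,
        (false, true) => Ordering::Greater,
        // Rule 2: byte-wise comparison, which is ASCII order here.
        (false, false) => a.cmp(b),
    }
}

// Compares two prerelease strings such as "alpha.1" and "alpha.beta".
fn cmp_prerelease(a: &str, b: &str) -> Ordering {
    let mut xs = a.split('.');
    let mut ys = b.split('.');
    loop {
        match (xs.next(), ys.next()) {
            (None, None) => return Ordering::Equal,
            // Rule 4: with an equal prefix, the shorter list sorts lower.
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
            (Some(x), Some(y)) => match cmp_identifier(x, y) {
                Ordering::Equal => continue,
                other => return other,
            },
        }
    }
}
```

For example, `cmp_prerelease("beta.2", "beta.11")` is `Less` under rule 1, and `cmp_prerelease("alpha", "alpha.1")` is `Less` under rule 4.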

Footnotes

  1. https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING

  2. https://semver.org/#spec-item-11

Turbo87 (Member, Author) commented Mar 7, 2025

@Gankra you know a lot about semver edge cases. do you want to try and break this before we actually start using the new column? :D

Turbo87 moved this to "For next meeting" in crates.io team meetings Mar 7, 2025
Gankra (Contributor) commented Mar 7, 2025

ahahahaha

I think as long as it handles
https://crates.io/crates/cursed-trying-to-break-cargo/
ok there's not much else for me to throw at it?

Gankra (Contributor) commented Mar 7, 2025

Also dang I know that page induces NaNs (improved from "bootloops crates.io's client") but what a chaotic sort haha.

Turbo87 (Member, Author) commented Mar 7, 2025

> I think as long as it handles
> https://crates.io/crates/cursed-trying-to-break-cargo/
> ok there's not much else for me to throw at it?

it's looking good so far :)

Turbo87 (Member, Author) commented Mar 7, 2025

> Also dang I know that page induces NaNs (improved from "bootloops crates.io's client") but what a chaotic sort haha.

are you sure you were in semver sorting mode? https://crates.io/crates/cursed-trying-to-break-cargo/versions?sort=semver looks okayish to me 😅

Gankra (Contributor) commented Mar 7, 2025

Oh you're totally right, it's so rare to see date and semver not be 1:1 for crates I forgot date is the default :)

LawnGnome (Contributor) left a comment

My PL/pgSQL is very rusty, but this LGTM. 👍

insta::assert_snapshot!(check("0.0.0").await, @r#"[0, 0, 0, {}]"#);
insta::assert_snapshot!(check("1.0.0-alpha.1").await, @r#"[1, 0, 0, [true, "alpha", false, 1, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, ""]]"#);

// see https://crates.io/crates/cursed-trying-to-break-cargo/1.0.0-0.HDTV-BluRay.1020p.YTSUB.L33TRip.mkv – thanks @Gankra!
Contributor

I love that this is represented as-is. 😆

Comment on lines +26 to +27
-- A JSONB object has higher precedence than an array, and versions with
-- prerelease specifiers should have lower precedence than those without.
Contributor

I went to check that we're not relying on any undocumented behaviour, and found this gem:

> The btree ordering for jsonb datums is seldom of great interest[...]

Little did they know!

Member Author

yep, see #10763 (comment) :)

Turbo87 added 5 commits March 10, 2025 11:31
We currently rely on the Rust `semver` crate to implement our "sort by semantic versioning" functionality, which is used by the web interface and also to determine the "default version". The downside is that we need to load the full list of version numbers for a crate from the database into the API server, sort it, and then throw away the ones we don't need.

This commit implements a `semver_ord(num)` pgSQL function that returns a JSONB array, which has the same ordering precedence as the Semantic Versioning spec (https://semver.org/#spec-item-11), with the small caveat that it only supports up to 15 prerelease parts. The maximum number of prerelease parts in our current dataset is 7, so 15 should be plenty.

The database migration in this commit also adds a new `semver_ord` column to the `versions` table, and an on-insert trigger function that automatically derives the `semver_ord` column from the `num` column value.

Once this migration has run, the existing versions can be backfilled by running the following SQL script, until all versions are processed:

```sql
with versions_to_update as (
    select id, num
    from versions
    where semver_ord = 'null'::jsonb
    limit 1000
)
update versions
    set semver_ord = semver_ord(num)
    where id in (select id from versions_to_update);
```
Turbo87 merged commit 7a11278 into rust-lang:main Mar 10, 2025
10 checks passed
Turbo87 deleted the semver-ord branch March 10, 2025 11:23
-- with versions_to_update as (
-- select id, num
-- from versions
-- where semver_ord = 'null'::jsonb
Member Author

I just noticed that I didn't update this correctly. This line should now be `where semver_ord is null`.
