Fix /merges endpoint sql injection risk#12460
Merged
mekarpeles merged 2 commits intoApr 28, 2026
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR mitigates a SQL injection risk in the /merges (community edits queue) endpoint by preventing untrusted input from being interpolated into the SQL ORDER BY clause.
Changes:
- Add request-level validation for the
orderquery param on/merges(allow onlyasc/desc). - Replace the generic
orderstring argument inCommunityEditsQueue.get_requests()with a constrainedorder_by_updatedparameter and build theORDER BY updated …clause internally. - Add a defensive runtime check for
order_by_updatedvalues in core.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
openlibrary/plugins/upstream/edits.py |
Validates the order query parameter and passes a safe sort direction to core. |
openlibrary/core/edits.py |
Constrains ordering to `updated asc |
Comment on lines
58
to
65
| merge_requests = CommunityEditsQueue.get_requests( | ||
| page=int(i.page), | ||
| limit=int(i.limit), | ||
| mode=i.mode, | ||
| submitter=i.submitter, | ||
| reviewer=i.reviewer, | ||
| order=f"updated {i.order}", | ||
| order_by_updated=order, | ||
| status=i.status, |
Comment on lines
+85
to
+90
| if order_by_updated: | ||
| # Verify the type to prevent sql injection | ||
| if order_by_updated not in ("asc", "desc"): | ||
| raise ValueError(f'Invalid order_by_updated value: "{order_by_updated}". Must be "asc" or "desc".') | ||
|
|
||
| query_kwargs["order"] = f"updated {order_by_updated}" |
Sadashii
pushed a commit
to Sadashii/openlibrary
that referenced
this pull request
May 11, 2026
* Fix /merges endpoint sql injection risk * Make /merges mode parameter strictly validated
mekarpeles
added a commit
that referenced
this pull request
Jun 14, 2026
* security: parameterize CommonExtras.update_work_ids_individually
WHAT
----
`CommonExtras.update_work_ids_individually` in `openlibrary/core/db.py`
is the conflict-fallback path for `update_work_id`, which is invoked
when an admin merges two works (e.g. via `/admin/resolve_redirects`).
For each row whose `work_id` matched the source, the previous code
built the WHERE clause with f-string-interpolated *values*:
where = " AND ".join(
[f"{k}='{v}'" for k, v in row.items() if k in cls.PRIMARY_KEY]
)
oldb.query(f"UPDATE {cls.TABLENAME} set work_id={new_work_id} where {where}")
oldb.query(f"DELETE FROM {cls.TABLENAME} WHERE {where}")
`v` here is the value of one of the table's primary-key columns
(`username`, `work_id`, `edition_id`, `bookshelf_id`,
`observation_value`, ...). Most of those are integer columns, and
`username` is currently regex-validated at signup
(`[A-Za-z0-9\-_]{3,20}`), so today there is no second-order injection
trail an attacker can ride from a row insert into this UPDATE/DELETE.
**This commit is defense-in-depth, not a fix for an actively
exploitable bug.**
WHY PATCH ANYWAY
----------------
This is the same shape as the bug fixed for `/merges` in
commit 39a2688 (PR #12460): `oldb.select(..., where=...)` with
user-influenced data interpolated into the SQL string instead of bound
via `vars=`. That fix landed *after* the bug had been written, not
before -- the latent footgun pattern is the long-tail risk. If the
username regex ever loosens, if a future admin tool inserts a row with
an unvalidated username (e.g. via an `infogami` write API or a
migration script), or if someone adds a string-typed primary-key column
to one of the inheriting classes, the existing `f"{k}='{v}'"` form
silently becomes a SQL-injection sink in an admin-only UPDATE/DELETE
path -- including the ability to UPDATE `account.password`.
EXPLOIT (theoretical, blocked today)
------------------------------------
If a row in `bookshelves_books` had username `' OR 1=1; UPDATE store
SET json='owned' WHERE key='account/admin' --` (impossible today via
the signup flow), then triggering a work merge against that user's
work_id would have run the malformed UPDATE through Postgres's
multi-statement parser. Demonstrating the vulnerability requires a
secondary write path that bypasses username validation; I could not
find one in the current codebase.
FIX
* security: allowlist `order=` kwarg in YearlyReadingGoals and ImportStats
WHAT
----
Two helpers accept an `order` parameter that is forwarded directly to
`oldb.select(... order=order ...)`. `web.db`'s `order=` keyword is
not parameterized -- it is interpolated raw into the SQL string, the
exact shape behind the /merges SQL injection fixed in PR #12460
(commit 39a2688).
- `openlibrary/core/yearly_reading_goals.py:select_by_username(
username, order: str = "year ASC")`
- `openlibrary/core/imports.py:Stats.get_items(date, order, limit)`
EXPLOIT (theoretical, blocked today)
------------------------------------
Neither function currently has a caller that forwards user input as
`order`:
- `select_by_username` is called from `fastapi/yearly_reading_goals.py`
without an `order` argument.
- `Stats.get_items` is called from `templates/admin/imports.html`
with a hard-coded `order="import_time desc"`, and from
`imports_by_date.html` without `order`.
If a future caller plumbs `web.input().order` (or a JSON-derived
`order` field) through to these functions -- exactly what happened
historically on `/merges` -- the result is in-band SQL injection on a
DB connection that has access to the entire `store` table (where
account JSON, including `enc_password`, is kept).
IMPACT
------
The DB connection these helpers use is the same `web.config.db_parameters`
connection that reaches `store`, `account`, and the full Infogami
schema. A successful injection here would let an attacker
extract or overwrite password hashes from `store` (key
`account/<username>` → JSON containing `enc_password`).
FIX
* security: allowlist column-list and table-name interpolation
WHAT
----
Two helpers compose SQL strings that interpolate a non-value SQL
fragment from a Python parameter:
- `Bestbook.prepare_query(select="*"|"count(*)", ...)` in
`openlibrary/core/bestbook.py` builds
`f"SELECT {select} FROM {cls.TABLENAME}"`.
- `count(table, type, key, value)` in
`openlibrary/plugins/ol_infobase.py` builds
`"SELECT count(*) FROM " + table + " WHERE ..."`.
In both cases the value is a SQL identifier list (column expression /
table name) that `web.db` cannot parameterize -- if user input ever
reaches them, it is a textbook SQL injection.
EXPLOIT (theoretical, blocked today)
------------------------------------
Today every call site passes a hard-coded literal:
- `Bestbook.get_count` and `Bestbook.get_awards` pass `"count(*)"`
or `"*"`.
- `count(...)` is invoked from `count_editions_by_author` and
`count_editions_by_work` with `count("edition_ref", ...)`.
So neither function is reachable from a web request as-written. The
risk is the same regression vector as the /merges fix in PR #12460
(commit 39a2688): a footgun that becomes a vulnerability the moment
a future caller forwards `web.input()` data to `select=` or `table=`.
If reached, an attacker could append a UNION SELECT against the
`store` or `account` tables on the same DB connection (i.e., dump
account password hashes) or execute arbitrary writes via stacked
statements.
FIX
* security: parameterize admin_range__covers query
WHAT
----
`admin_range__covers` in `openlibrary/admin/numbers.py` formats a
date-range SQL query with f-string interpolation:
q1 = f"SELECT count(*) as count from cover where created>= '{start}' and created < '{end}'"
result = db.query(q1)
Sibling helpers in the same file (e.g. `admin_range__bot_edits`,
`admin_range__loans`) use the parameterized form:
db.query("... created >= $start ...", vars=locals())
EXPLOIT (theoretical, blocked today)
------------------------------------
The values come from `kargs["start"].strftime(...)` /
`kargs["end"].strftime(...)`. Both arguments are `datetime` objects
inside the cron-time stats gathering pipeline (`admin/stats.py` calls
`run_gathering_functions(infobase_db, coverstore_db, start, end, ...)`
with internally-generated `datetime` instances). So the format string
output is always shaped `YYYY-MM-DD` / `YYYY-MM-DD HH:MM:SS` and
cannot be injected today.
The risk is shape-regression: if anyone ever invokes this with a
string `start` (e.g. forwarding a query parameter from a future admin
endpoint), the string falls through `strftime` only because Python's
`str.strftime` doesn't exist -- it would throw `AttributeError`. But
the f-string concatenation pattern is the same one fixed in
PR #12460 for /merges, and a forgotten footgun.
FIX
* security(critical): fix /api/versions SQL injection — full credential dump
Update infogami submodule to include sort-parameter allowlist that
prevents unauthenticated extraction of all account data (password
hashes, emails, S3 keys) via a single crafted GET request to
/api/versions.
See vendor/infogami commit for full exploit recipe and impact details.
* security: fix broken LIKE parameterization in booknotes + infogami hardening
Fix '%$search%' which does not substitute in web.db (the $ only works
outside of string literals). Build the search pattern explicitly so
LIKE parameterization works correctly.
Also bump infogami submodule to include error-redaction and condition-
validation hardening (defense-in-depth against future SQL injection).
# Conflicts:
# vendor/infogami
* chore: add ClassVar annotation to _ALLOWED_ORDERS (ruff RUF012)
---------
Co-authored-by: Scott Barnes (sec audit) <scott.barnes@archive.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Can allow for non-valid sql to be input in the ORDER BY clause. Note this has been blocked on production, so is not usable there.
Technical
Testing
Confirm https://testing.openlibrary.org/merges.json?order=foo errors.
Screenshot
Stakeholders