Skip to content

Fix /merges endpoint sql injection risk#12460

Merged
mekarpeles merged 2 commits into
internetarchive:masterfrom
cdrini:security/fix-merges-sql-injection
Apr 28, 2026
Merged

Fix /merges endpoint sql injection risk#12460
mekarpeles merged 2 commits into
internetarchive:masterfrom
cdrini:security/fix-merges-sql-injection

Conversation

@cdrini

@cdrini cdrini commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator

Can allow for non-valid sql to be input in the ORDER BY clause. Note this has been blocked on production, so is not usable there.

Technical

Testing

Confirm https://testing.openlibrary.org/merges.json?order=foo errors.

Screenshot

Stakeholders

Copilot AI review requested due to automatic review settings April 28, 2026 18:04

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR mitigates a SQL injection risk in the /merges (community edits queue) endpoint by preventing untrusted input from being interpolated into the SQL ORDER BY clause.

Changes:

  • Add request-level validation for the order query param on /merges (allow only asc/desc).
  • Replace the generic order string argument in CommunityEditsQueue.get_requests() with a constrained order_by_updated parameter and build the ORDER BY updated … clause internally.
  • Add a defensive runtime check for order_by_updated values in core.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
openlibrary/plugins/upstream/edits.py Validates the order query parameter and passes a safe sort direction to core.
openlibrary/core/edits.py Constrains ordering to `updated asc

Comment on lines 58 to 65
merge_requests = CommunityEditsQueue.get_requests(
page=int(i.page),
limit=int(i.limit),
mode=i.mode,
submitter=i.submitter,
reviewer=i.reviewer,
order=f"updated {i.order}",
order_by_updated=order,
status=i.status,
Comment thread openlibrary/core/edits.py
Comment on lines +85 to +90
if order_by_updated:
# Verify the type to prevent sql injection
if order_by_updated not in ("asc", "desc"):
raise ValueError(f'Invalid order_by_updated value: "{order_by_updated}". Must be "asc" or "desc".')

query_kwargs["order"] = f"updated {order_by_updated}"
Comment thread openlibrary/core/edits.py
@mekarpeles mekarpeles self-assigned this Apr 28, 2026
@mekarpeles mekarpeles merged commit 39a2688 into internetarchive:master Apr 28, 2026
3 checks passed
@cdrini cdrini deleted the security/fix-merges-sql-injection branch April 28, 2026 20:05
Sadashii pushed a commit to Sadashii/openlibrary that referenced this pull request May 11, 2026
* Fix /merges endpoint sql injection risk
* Make /merges mode parameter strictly validated
mekarpeles added a commit that referenced this pull request Jun 14, 2026
* security: parameterize CommonExtras.update_work_ids_individually

WHAT
----
`CommonExtras.update_work_ids_individually` in `openlibrary/core/db.py`
is the conflict-fallback path for `update_work_id`, which is invoked
when an admin merges two works (e.g. via `/admin/resolve_redirects`).
For each row whose `work_id` matched the source, the previous code
built the WHERE clause with f-string-interpolated *values*:

    where = " AND ".join(
        [f"{k}='{v}'" for k, v in row.items() if k in cls.PRIMARY_KEY]
    )
    oldb.query(f"UPDATE {cls.TABLENAME} set work_id={new_work_id} where {where}")
    oldb.query(f"DELETE FROM {cls.TABLENAME} WHERE {where}")

`v` here is the value of one of the table's primary-key columns
(`username`, `work_id`, `edition_id`, `bookshelf_id`,
`observation_value`, ...). Most of those are integer columns, and
`username` is currently regex-validated at signup
(`[A-Za-z0-9\-_]{3,20}`), so today there is no second-order injection
trail an attacker can ride from a row insert into this UPDATE/DELETE.
**This commit is defense-in-depth, not a fix for an actively
exploitable bug.**

WHY PATCH ANYWAY
----------------
This is the same shape as the bug fixed for `/merges` in
commit 39a2688 (PR #12460): `oldb.select(..., where=...)` with
user-influenced data interpolated into the SQL string instead of bound
via `vars=`. That fix landed *after* the bug had been written, not
before -- the latent footgun pattern is the long-tail risk. If the
username regex ever loosens, if a future admin tool inserts a row with
an unvalidated username (e.g. via an `infogami` write API or a
migration script), or if someone adds a string-typed primary-key column
to one of the inheriting classes, the existing `f"{k}='{v}'"` form
silently becomes a SQL-injection sink in an admin-only UPDATE/DELETE
path -- including the ability to UPDATE `account.password`.

EXPLOIT (theoretical, blocked today)
------------------------------------
If a row in `bookshelves_books` had username `' OR 1=1; UPDATE store
SET json='owned' WHERE key='account/admin' --` (impossible today via
the signup flow), then triggering a work merge against that user's
work_id would have run the malformed UPDATE through Postgres's
multi-statement parser. Demonstrating the vulnerability requires a
secondary write path that bypasses username validation; I could not
find one in the current codebase.

FIX

* security: allowlist `order=` kwarg in YearlyReadingGoals and ImportStats

WHAT
----
Two helpers accept an `order` parameter that is forwarded directly to
`oldb.select(... order=order ...)`. `web.db`'s `order=` keyword is
not parameterized -- it is interpolated raw into the SQL string, the
exact shape behind the /merges SQL injection fixed in PR #12460
(commit 39a2688).

  - `openlibrary/core/yearly_reading_goals.py:select_by_username(
        username, order: str = "year ASC")`
  - `openlibrary/core/imports.py:Stats.get_items(date, order, limit)`

EXPLOIT (theoretical, blocked today)
------------------------------------
Neither function currently has a caller that forwards user input as
`order`:
  - `select_by_username` is called from `fastapi/yearly_reading_goals.py`
    without an `order` argument.
  - `Stats.get_items` is called from `templates/admin/imports.html`
    with a hard-coded `order="import_time desc"`, and from
    `imports_by_date.html` without `order`.

If a future caller plumbs `web.input().order` (or a JSON-derived
`order` field) through to these functions -- exactly what happened
historically on `/merges` -- the result is in-band SQL injection on a
DB connection that has access to the entire `store` table (where
account JSON, including `enc_password`, is kept).

IMPACT
------
The DB connection these helpers use is the same `web.config.db_parameters`
connection that reaches `store`, `account`, and the full Infogami
schema. A successful injection here would let an attacker
extract or overwrite password hashes from `store` (key
`account/<username>` → JSON containing `enc_password`).

FIX

* security: allowlist column-list and table-name interpolation

WHAT
----
Two helpers compose SQL strings that interpolate a non-value SQL
fragment from a Python parameter:

  - `Bestbook.prepare_query(select="*"|"count(*)", ...)` in
    `openlibrary/core/bestbook.py` builds
    `f"SELECT {select} FROM {cls.TABLENAME}"`.
  - `count(table, type, key, value)` in
    `openlibrary/plugins/ol_infobase.py` builds
    `"SELECT count(*) FROM " + table + " WHERE ..."`.

In both cases the value is a SQL identifier list (column expression /
table name) that `web.db` cannot parameterize -- if user input ever
reaches them, it is a textbook SQL injection.

EXPLOIT (theoretical, blocked today)
------------------------------------
Today every call site passes a hard-coded literal:
  - `Bestbook.get_count` and `Bestbook.get_awards` pass `"count(*)"`
    or `"*"`.
  - `count(...)` is invoked from `count_editions_by_author` and
    `count_editions_by_work` with `count("edition_ref", ...)`.

So neither function is reachable from a web request as-written. The
risk is the same regression vector as the /merges fix in PR #12460
(commit 39a2688): a footgun that becomes a vulnerability the moment
a future caller forwards `web.input()` data to `select=` or `table=`.

If reached, an attacker could append a UNION SELECT against the
`store` or `account` tables on the same DB connection (i.e., dump
account password hashes) or execute arbitrary writes via stacked
statements.

FIX

* security: parameterize admin_range__covers query

WHAT
----
`admin_range__covers` in `openlibrary/admin/numbers.py` formats a
date-range SQL query with f-string interpolation:

    q1 = f"SELECT count(*) as count from cover where created>= '{start}' and created < '{end}'"
    result = db.query(q1)

Sibling helpers in the same file (e.g. `admin_range__bot_edits`,
`admin_range__loans`) use the parameterized form:

    db.query("... created >= $start ...", vars=locals())

EXPLOIT (theoretical, blocked today)
------------------------------------
The values come from `kargs["start"].strftime(...)` /
`kargs["end"].strftime(...)`. Both arguments are `datetime` objects
inside the cron-time stats gathering pipeline (`admin/stats.py` calls
`run_gathering_functions(infobase_db, coverstore_db, start, end, ...)`
with internally-generated `datetime` instances). So the format string
output is always shaped `YYYY-MM-DD` / `YYYY-MM-DD HH:MM:SS` and
cannot be injected today.

The risk is shape-regression: if anyone ever invokes this with a
string `start` (e.g. forwarding a query parameter from a future admin
endpoint), the string falls through `strftime` only because Python's
`str.strftime` doesn't exist -- it would throw `AttributeError`. But
the f-string concatenation pattern is the same one fixed in
PR #12460 for /merges, and a forgotten footgun.

FIX

* security(critical): fix /api/versions SQL injection — full credential dump

Update infogami submodule to include sort-parameter allowlist that
prevents unauthenticated extraction of all account data (password
hashes, emails, S3 keys) via a single crafted GET request to
/api/versions.

See vendor/infogami commit for full exploit recipe and impact details.

* security: fix broken LIKE parameterization in booknotes + infogami hardening

Fix '%$search%' which does not substitute in web.db (the $ only works
outside of string literals). Build the search pattern explicitly so
LIKE parameterization works correctly.

Also bump infogami submodule to include error-redaction and condition-
validation hardening (defense-in-depth against future SQL injection).

# Conflicts:
#	vendor/infogami

* chore: add ClassVar annotation to _ALLOWED_ORDERS (ruff RUF012)

---------

Co-authored-by: Scott Barnes (sec audit) <scott.barnes@archive.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants