Skip to content

Migrate existing trending_books_api endpoint to FastAPI and added tests#12000

Merged
RayBB merged 19 commits intointernetarchive:masterfrom
bhardwajparth51:trending-fastapi-migration
Mar 24, 2026
Merged

Migrate existing trending_books_api endpoint to FastAPI and added tests#12000
RayBB merged 19 commits intointernetarchive:masterfrom
bhardwajparth51:trending-fastapi-migration

Conversation

@bhardwajparth51
Copy link
Copy Markdown
Contributor

@bhardwajparth51 bhardwajparth51 commented Mar 3, 2026

Part of #11999

Migrates the /trending/{period}.json endpoint from the legacy web.py to FastAPI at /trending/{period}.json.

Technical

  1. Routing: Wired up the endpoint at /trending/{period}.json in openlibrary/fastapi/internal/api.py
  2. Validation: Switched to Pydantic and FastAPI's Query annotations so params like hours, page, and limit are bounds-checked and cast to the right types automatically — no manual parsing needed
  3. Response Modeling: Added a SolrWork Pydantic model and used response_model_exclude_none=True to match the legacy output faithfully, including intentionally dropping days from the forever response
  4. Error Handling: Wrapped the asyncio.to_thread(get_trending_books, ...) call in a try/except so any database failure comes back as a clean 502 Bad Gateway instead of a raw 500
  5. Testing: Moved to the shared fastapi_client fixture from conftest.py and leaned heavily on @pytest.mark.parametrize to cover all periods and query param combinations without repetition. Also added a test_trending_comparison.sh integration script to verify 1:1 parity with the legacy endpoint

Testing

  1. Run the unit tests locally via Docker:
docker compose exec web pytest openlibrary/tests/fastapi/test_trending.py -v
  1. Optionally, run the parity script against both servers to confirm the responses match:
./tests/integration/temporary/test_trending_comparison.sh http://localhost:8080 http://localhost:18080

Stakeholders

@RayBB

@github-project-automation github-project-automation bot moved this to Waiting Review/Merge from Staff in Ray's Project Mar 3, 2026
@RayBB
Copy link
Copy Markdown
Collaborator

RayBB commented Mar 3, 2026

It should be the exact same url don't prepend internal.

@bhardwajparth51 bhardwajparth51 force-pushed the trending-fastapi-migration branch from f4d7874 to 988f6fe Compare March 3, 2026 20:38
@bhardwajparth51
Copy link
Copy Markdown
Contributor Author

Fixed - removed the /internal prefix from the router. URL is now /trending/{period}.json

@github-actions github-actions bot added the Needs: Response Issues which require feedback from lead label Mar 4, 2026
@bhardwajparth51 bhardwajparth51 force-pushed the trending-fastapi-migration branch 2 times, most recently from 8891257 to 89bf6b2 Compare March 4, 2026 16:19
Copy link
Copy Markdown
Collaborator

@RayBB RayBB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good start. Left some initial feedback.

@RayBB
Copy link
Copy Markdown
Collaborator

RayBB commented Mar 4, 2026

As a note to myself here's a list of endpoints to test when I test this in the future

Trending Endpoint Test URLs

Trending API (JSON)

Basic Period Tests

Limit Parameter Tests

Page Parameter Tests

Sort By Count Tests

Minimum Count Filter Tests

Custom Days Override Tests

Custom Hours Override Tests

Days and Hours Combined Tests

Fields Parameter Tests

Minimal Fields

Title and Author Fields

Cover Image Fields

Edition Fields

Publishing Information

Language Fields

eBook and Availability Fields

Series Information

Ratings Fields

External ID Fields

Comprehensive Field Set

Combined Parameter Tests

Edge Case Tests

Special Field Combinations

Testing Notes

Available Period Values

  • now - 0 days (uses hours if specified)
  • daily - 1 day
  • weekly - 7 days
  • monthly - 30 days
  • yearly - 365 days
  • forever - all time

Query Parameters

  • page - Page number for pagination (default: 1)
  • limit - Number of results per page (default: 100, max: 1000)
  • days - Override the default days for the period
  • hours - Number of hours for "now" period (default: 0)
  • sort_by_count - Sort by count (true/false, default: false)
  • minimum - Minimum count threshold (default: 0)
  • fields - Comma-separated list of Solr fields to return

Common Fields Available

  • key, title, subtitle
  • author_name, author_key
  • cover_i, cover_edition_key
  • edition_count, first_publish_year
  • language, has_fulltext, ebook_access
  • ratings_average, ratings_count, want_to_read_count
  • series_key, series_name, series_position
  • ia, ia_collection, public_scan_b
  • lending_edition_s, lending_identifier_s
  • id_project_gutenberg, id_librivox, id_standard_ebooks, id_openstax, id_cita_press, id_wikisource

Response Format (JSON)

{
  "query": "/trending/daily",
  "works": [...],
  "days": 1,
  "hours": 0
}

@bhardwajparth51 bhardwajparth51 force-pushed the trending-fastapi-migration branch 2 times, most recently from 8a68b89 to 8891257 Compare March 5, 2026 01:18
Copy link
Copy Markdown
Contributor Author

@bhardwajparth51 bhardwajparth51 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @RayBB, thanks for the feedback! I've pushed an update addressing your points: I switched to the shared Pagination model, removed the deprecated days param, and cleaned up the logger, extra docstrings, and responses block. Let me know if there's anything else

@bhardwajparth51 bhardwajparth51 force-pushed the trending-fastapi-migration branch 2 times, most recently from b846e9a to 363ca83 Compare March 5, 2026 06:50
@bhardwajparth51
Copy link
Copy Markdown
Contributor Author

Two quick notes for your review/testing:

  1. Testing Notes Update: Per your suggestion ("If it has no effect lets remove it"), I entirely removed the days parameter. Just a heads up that the URLs in your testing notes that use it (like ?days=3) won't have an effect anymore — turns out it was never actually doing anything in the original web.py either, since SINCE_DAYS.get(period) always overrode it.

  2. Future Cleanup: I noticed the parse_comma_separated_list logic we added is identical to a list comprehension currently used in search.py. I left it local to api.py for now to keep this PR's scope tight, but it's a good candidate for a shared fastapi/utils.py in a follow-up.

@bhardwajparth51
Copy link
Copy Markdown
Contributor Author

Hey @RayBB, just checking in on this when you get a chance!

Copy link
Copy Markdown
Collaborator

@RayBB RayBB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start with this. In general it looks close.

There's a bug thought.

Steps to reproduce:

  1. Visit: http://localhost:8080/books/OL24152177M/Robinson_Crusoe
  2. Click want to read
  3. Visit http://localhost:18080/trending/now.json
  4. See error

Unfortunately, in this case, we should probably just have it keep running on the main thead. One day we'll be able to have an async postgres driver but we are not there yet.

- Switched trending_books_api to synchronous execution to preserve web.ctx
- Centralized parameters into TrendingRequestParams pydantic model
- Moved parse_fields_string to shared fastapi/models.py
- Expanded test suite to 45 cases covering exhaustive field combinations
- Enforced 1000 limit constraint
@bhardwajparth51 bhardwajparth51 force-pushed the trending-fastapi-migration branch from f7b21e7 to 3416df0 Compare March 14, 2026 06:11
…migration

# Conflicts:
#	openlibrary/fastapi/models.py
#	openlibrary/fastapi/search.py
@bhardwajparth51
Copy link
Copy Markdown
Contributor Author

Hey @RayBB, I've addressed all the requested changes:

  1. Created TrendingRequestParams(Pagination) to centralize hours, sort_by_count, minimum, and fields into a single request object.
  2. Moved parse_fields_string to models.py and removed the local static method from search.py so both APIs share the same implementation.
  3. Removed asyncio.to_threadget_trending_books now runs on the main thread as a standard def endpoint.

All 45 tests passing. Ready for your review!

@bhardwajparth51 bhardwajparth51 requested a review from RayBB March 14, 2026 06:31
Copy link
Copy Markdown
Collaborator

@RayBB RayBB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide a video of this working on
http://localhost:8080/trending/now.json
and
http://localhost:18080/trending/now.json

It doesn't work at all when I open it.
Be sure to add at least one book to a want to read list or rating first so there's data.

@RayBB RayBB removed the On Testing label Mar 15, 2026
@bhardwajparth51
Copy link
Copy Markdown
Contributor Author

Note on web.ctx.site: To get this working I had to bridge the ContextVar to web.ctx.site via web.ctx.site = site_ctx.get() since get_trending_books calls deep into legacy code (add_non_solr_fieldsget_solr_works) that still uses web.ctx.site. I noticed search.py line 306 has the same pattern. Since we're moving away from web.ctx, fully removing this dependency would require an architectural change across multiple files (loanstats.py, bookshelves.py, worksearch/code.py, schemes/works.py) — should that be tracked as a separate issue?

@bhardwajparth51
Copy link
Copy Markdown
Contributor Author

Here is the screen recording showing parity between legacy (8080) and FastAPI (18080) — added 3 books to Want to Read (Alice's Adventures in Wonderland, Gulliver's Travels, Robinson Crusoe) and both endpoints return identical results.
Screencast from 15-03-26 05:25:41 PM IST.webm

@bhardwajparth51 bhardwajparth51 requested a review from RayBB March 15, 2026 12:37
@bhardwajparth51
Copy link
Copy Markdown
Contributor Author

Hey @RayBB, just checking in — all 45 tests passing and the screen recording shows parity between legacy and FastAPI. Let me know if there's anything else to address!

Copy link
Copy Markdown
Collaborator

@RayBB RayBB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks very close but two questions for you!

Comment on lines +64 to +65
True,
description="Sort results by total log count (most-logged first). Defaults to True, matching the legacy endpoint behaviour.",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Issue: The FastAPI version defaults sort_by_count to True, but the legacy endpoint defaults it to False. The comment even incorrectly claims "Defaults to True, matching the legacy endpoint behaviour."

Impact: Users who don't explicitly specify the sort_by_count parameter will get different sorting behavior between the legacy and FastAPI endpoints, breaking backward compatibility.

"""Fetch trending books for the given period."""
# ``period`` is always a key in SINCE_DAYS — guaranteed by the Literal type above.
since_days: int | None = SINCE_DAYS[period]
web.ctx.site = site_ctx.get()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me figure out why we need this here? In general we want to avoid using web.ctx.site in fastapi. However, it seems that without this we get a bug. Why is that? Is there any way we can avoid this? It would be very helfpul to understand this

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it seems this is in schemes/works.py line 651
things = cast(list[Work | Edition], web.ctx.site.get_many(keys))

We could change that to site.get now but it does seem to touch quite a bit of code so maybe we should do it in a separate PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah exactly, traced it to schemes/works.py line 651 — web.ctx.site.get_many(keys) inside add_non_solr_fields. FastAPI never sets up web.ctx.site so without the bridge it just blows up. Makes sense to keep the refactor in a separate PR

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened an issue here #12178

Copy link
Copy Markdown
Collaborator

@RayBB RayBB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RayBB RayBB merged commit 88000ee into internetarchive:master Mar 24, 2026
2 of 3 checks passed
@github-project-automation github-project-automation bot moved this from Waiting Review/Merge from Staff to Done in Ray's Project Mar 24, 2026
@RayBB
Copy link
Copy Markdown
Collaborator

RayBB commented Mar 24, 2026

@bhardwajparth51 congrats on this one. It was a littler trickier than it looked at first. Please closely review the changes I added to your branch before merging so you can learn a bit more!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Needs: Response Issues which require feedback from lead On Testing

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants