Skip to content

Switch books page lists to solr from db#11187

Merged
cdrini merged 6 commits intointernetarchive:masterfrom
benbdeitch:Switch-Books-Page-Lists-to-Solr-from-DB
Dec 11, 2025
Merged

Switch books page lists to solr from db#11187
cdrini merged 6 commits intointernetarchive:masterfrom
benbdeitch:Switch-Books-Page-Lists-to-Solr-from-DB

Conversation

@benbdeitch
Copy link
Copy Markdown
Collaborator

Closes #11020

This PR enables Edition pages to fetch the lists that a given edition is from the Solr database in a single query, rather than having to perform multiple queries on the infogami database. This should improve the loading times. It also allows us to only display non-trivial lists on these pages (i.e: lists that contain more than a single entry.)

Technical

Testing

Step 1: Create two new lists on your local environment. One with two books, and one with only one book.
Step 2: Navigate to the page of the book they share in common, and examine the Lists section.
Step 3: If only one list is shown on that page, then the code is behaving properly.

Screenshot

image

Stakeholders

@cdrini

@github-actions github-actions bot added the Priority: 2 Important, as time permits. [managed] label Aug 24, 2025
@mekarpeles mekarpeles added this to the Sprint 2025-08 milestone Sep 5, 2025
@mekarpeles mekarpeles added Priority: 1 Do this week, receiving emails, time sensitive, . [managed] and removed Priority: 2 Important, as time permits. [managed] labels Sep 8, 2025
Copy link
Copy Markdown
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @benbdeitch ! A few comments, but one core blocker. One of the goals of the issue is to fix "We have to query twice against the DB and show 2 rows base on lists including the Work v. Edition". Looking at the code, I think we might need to rename/tweak _get_lists_solr_uncached and friends, since this requires take advantage of a new type of solr query to search for both work and edition keys. Eg something like:

@public
def get_book_lists(work_key: str, edition_key: str) -> list[List]:
    q = f'seed:("{work_key}' OR "{edition_key}"')
    ... the query you have now in `_get_lists_solr_uncached`

That would let us achieve the stated goal, and change the partials.py code to use this method, and only call render_template("lists/widget", ...) once with the results!

@cdrini cdrini added the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Sep 16, 2025
@mekarpeles mekarpeles added Priority: 2 Important, as time permits. [managed] and removed Priority: 1 Do this week, receiving emails, time sensitive, . [managed] labels Sep 22, 2025
@github-actions github-actions bot removed the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Oct 9, 2025
$ seed_info = get_seed_info(page)
$ user_lists = [] if async_load or not ctx.user else get_user_lists(seed_info)
$ page_lists = get_page_lists(page, seed_info)
$ page_lists = get_book_lists(page.key.split('/')[-1], None) if work else get_book_lists(None, page.key.split('/')[-1])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I think this is the issue! We don't need the split the keys, the entire key is stored in solr, eg /works/OL3261646W

$ seed_info = get_seed_info(page)
$ user_lists = [] if async_load or not ctx.user else get_user_lists(seed_info)
$ page_lists = get_page_lists(page, seed_info)
$ page_lists = get_book_lists(page.key.split('/')[-1], None) if work else get_book_lists(None, page.key.split('/')[-1])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and we'll want to forward along both, eg get_book_lists(work and work.key, edition and edition.key), and update the method to handle None. That'll let it do the magic of combining both!

from openlibrary.plugins.worksearch.code import run_solr_query
from openlibrary.plugins.worksearch.schemes.lists import ListSearchScheme

filter_query = "seed_count:[2 TO *] OR (NOT seed_count:*)"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the solr reindex finished, this is no longer necessary!

Suggested change
filter_query = "seed_count:[2 TO *] OR (NOT seed_count:*)"
filter_query = "seed_count:[2 TO *]"

@benbdeitch benbdeitch force-pushed the Switch-Books-Page-Lists-to-Solr-from-DB branch from 1f466fa to 7f5593f Compare October 20, 2025 17:52
@cdrini cdrini force-pushed the Switch-Books-Page-Lists-to-Solr-from-DB branch from 5c5c8ff to 56f7292 Compare December 11, 2025 17:05
Copy link
Copy Markdown
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! Did some refactoring of the lists widget html since it was kind of tricky to work through. The lists look much better now with the seed_count filter, too!

@cdrini cdrini merged commit 0c475be into internetarchive:master Dec 11, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Priority: 2 Important, as time permits. [managed]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Switch Books Page "Lists" to use Solr instead of DB

3 participants