Plugins

Built-in plugins

This is a list of built-in plugins that are considered stable.

See the Plugins section of the user guide for details on how built-in plugins are loaded.
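
For example, a built-in plugin can be enabled by passing its name to make_reader():

>>> from reader import make_reader
>>> reader = make_reader("db.sqlite", plugins=["reader.enclosure_dedupe"])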

reader.enclosure_dedupe

Deduplicate the enclosures of an entry by enclosure URL.

reader.entry_dedupe

Deduplicate the entries of a feed.

Sometimes, the id of some or all the entries in a feed changes (e.g. from example.com/123 to example.com/entry-title), causing each entry to appear twice. entry_dedupe fixes this by copying user attributes to the new entry and deleting the old one.

User attributes

Entry user attributes are set as follows:

read / important

If any of the entries is read/important, make the new entry read/important.

read_modified / important_modified

Use the oldest read_modified / important_modified of the entries whose status matches the new read / important value.

entry tags

Copy tags to the new entry; duplicate tags are named .reader.duplicate.N.of.TAG.
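
For illustration, if both the old and the new entry had a (hypothetical) tag named category, the new entry would end up with keys along these lines:

>>> sorted(reader.get_tag_keys(new_entry))
['.reader.duplicate.1.of.category', 'category']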

Existing duplicates

By default, the plugin runs only for new entries; to have it run for all the entries of a feed on the next update, add the .reader.dedupe.once tag to the feed.
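
For example, using a made-up feed URL:

>>> reader.set_tag("https://example.com/feed.xml", ".reader.dedupe.once")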

To avoid false positives, the heuristics used to detect duplicate entries are fairly conservative. However, this can cause some duplicates to be missed (e.g. if the content changes significantly, or is too short); as an escape hatch for such cases, it is possible to ignore entry content once by adding one of the following feed tags:

Warning

This mechanism makes the plugin ignore entry content entirely, significantly increasing the chance of false positives (i.e. deleting entries that shouldn’t be deleted). Use it only if .once doesn’t work, and you are sure there are no non-duplicate entries with the same title / link.

.reader.dedupe.once.title

Use only the title for comparisons.

.reader.dedupe.once.link

Use only the link for comparisons.

Changelog

Added in version 3.20.

.reader.dedupe.once.title.prefix

Use only the title for comparisons, removing common prefixes.

Changelog

Added in version 3.20.

How duplicates are discovered

At a high level, duplicates are entries with the same title / link / published timestamp and the same summary / content.

When matching entries:

  • Remove common title prefixes of new entries.

  • Use case-insensitive comparison, and ignore whitespace and punctuation.

  • Ignore HTML tags (with the exception of a few text attributes like alt and title).

  • Use approximate content matching.

  • For entries with content of different lengths, trim the longer one to the length of the shorter one (useful when one entry has only the first paragraph, but the other the whole article).

To reduce false positives:

  • Titles / links / published timestamps must match exactly.

  • If there are too many entries with the same title/…, ignore them.

  • The entries must have both title/… and content.

  • Content must be at least ~48 words long.

  • Similarity thresholds are set relatively high, and higher for shorter content.
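
As an illustration only (not the plugin's actual implementation), approximate content matching along these lines might look like this sketch:

import re
from difflib import SequenceMatcher

def similar_content(one, two, threshold=0.9):
    # case-insensitive comparison, ignoring punctuation and whitespace runs
    def words(text):
        return re.sub(r"[^\w\s]", " ", text.lower()).split()
    one, two = words(one), words(two)
    # trim the longer content to the length of the shorter one
    length = min(len(one), len(two))
    one, two = one[:length], two[:length]
    # require a minimum content length to reduce false positives
    if length < 48:
        return False
    # approximate matching with a relatively high similarity threshold
    return SequenceMatcher(None, one, two).ratio() >= threshold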

Changelog

Changed in version 3.20: Use more heuristics to find potential duplicates (in addition to title matching): match link, match published timestamp, strip common title prefixes for new entries.

Changed in version 3.20: When comparing entries, include the alt and title HTML attributes, strip accents, and treat dates and versions as single tokens.

Changed in version 3.20: Increase required minimum content length from 32 to 48 words.

Changed in version 2.3: Delete old duplicates instead of marking them as read / unimportant.

Changed in version 2.2: Reduce false negatives by using approximate content matching.

Changed in version 2.2: Make it possible to re-run the plugin for existing entries.

reader.mark_as_read

Mark added entries of specific feeds as read + unimportant if their title matches a regex.

To configure, set the make_reader_reserved_name('mark-as-read') (by default, .reader.mark-as-read) tag to something like:

{
    "title": ["first-regex", "second-regex"]
}
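
For example, the tag can be set from Python like this (feed URL and regex are made up):

>>> reader.set_tag(
...     "https://example.com/feed.xml",
...     ".reader.mark-as-read",
...     {"title": ["^Sponsored:"]},
... )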

By default, this plugin runs only for newly-added entries. To run it for the existing entries of a feed, add the .reader.mark-as-read.once tag to the feed; the plugin will run on the next feed update, and remove the tag afterwards.

Changelog

Changed in version 3.13: Make it possible to re-run the plugin for existing entries.

Changed in version 3.5: Don’t set read_modified and important_modified anymore; because important is now optional, important = False is enough to mark an entry as unimportant. Old unimportant entries will be migrated automatically.

Changed in version 2.7: Use the .reader.mark-as-read metadata for configuration. Feeds using the old metadata, .reader.mark_as_read, will be migrated automatically on update until reader 3.0.

Changed in version 2.4: Explicitly mark matching entries as unimportant.

reader.readtime

Calculate the read time for new/updated entries, and store it as the .reader.readtime entry tag, with the format:

{'seconds': 1234}

The content used is that returned by get_content().
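
The stored value can then be read back with get_tag() (entry below is a placeholder):

>>> reader.get_tag(entry, ".reader.readtime")
{'seconds': 1234}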

The read time for existing entries is backfilled as follows:

  • On the first update_feeds() / update_feeds_iter() call:

    • all feeds with updates_disabled false are scheduled to be backfilled

      • the feeds selected to be updated are backfilled then

      • the feeds not selected to be updated will be backfilled the next time they are updated

    • all feeds with updates_disabled true are backfilled, regardless of which feeds are selected to be updated

  • To prevent any feeds from being backfilled, set the .reader.readtime global tag to {'backfill': 'done'}.

  • To schedule a feed to be backfilled on its next update, set the .reader.readtime feed tag to {'backfill': 'pending'}.
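
For example (the feed URL is made up; global tags use () as the resource):

>>> reader.set_tag((), ".reader.readtime", {'backfill': 'done'})
>>> reader.set_tag("https://example.com/feed.xml", ".reader.readtime", {'backfill': 'pending'})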

Changelog

Changed in version 3.1: Do not require additional dependencies. Deprecate the readtime extra.

Added in version 2.12.

reader.ua_fallback

Retry feed requests that get 403 Forbidden with a different user agent.

Sometimes, servers block requests coming from reader based on the user agent. This plugin retries the request with feedparser’s user agent, which seems to be more widely accepted.

Servers/CDNs known to not accept the reader UA: Cloudflare, WP Engine.

Experimental plugins

reader also ships with a number of experimental plugins.

For these, the full entry point must be specified.

To use them from within Python code, use the entry point as a custom plugin:

>>> from reader._plugins import sqlite_releases
>>> reader = make_reader("db.sqlite", plugins=[sqlite_releases.init])

cli_status

Capture the output of a CLI command and add it as an entry to a special feed.

The feed URL is reader:status; if it does not exist, it is created.

The entry id is the command, without options or arguments:

('reader:status', 'command: update')
('reader:status', 'command: search update')

Entries contain the output of all the runs in the past 24 hours. Entries are marked as read.
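
For example, recent runs can be inspected from Python by iterating over the feed's entries (entry ids follow the format above):

>>> for entry in reader.get_entries(feed="reader:status"):
...     print(entry.id)
...
command: update
command: search update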

To load:

READER_CLI_PLUGIN='reader._plugins.cli_status.init_cli' \
python -m reader ...

preview_feed_list

If the feed to be previewed is not actually a feed, show a list of feeds linked from that URL (if any).

This plugin needs additional dependencies, use the unstable-plugins extra to install them:

pip install reader[unstable-plugins]

To load:

READER_APP_PLUGIN='reader._plugins.preview_feed_list:init' \
python -m reader serve

Implemented for https://github.com/lemon24/reader/issues/150.

enclosure_tags

Fix tags for MP3 enclosures (e.g. podcasts).

Adds a “with tags” link to a version of the file with tags set as follows:

  • the entry title as title

  • the feed (user) title as album and artist

  • Podcast as genre, if the feed has any tag containing “podcast”
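
As an illustration of this mapping only (the plugin serves a tagged copy of the file rather than editing it in place), setting such tags with mutagen might look like:

from mutagen.easyid3 import EasyID3

def tag_mp3(path, entry_title, feed_title, is_podcast):
    # hypothetical helper, not part of the plugin
    audio = EasyID3(path)
    audio["title"] = entry_title
    audio["album"] = feed_title
    audio["artist"] = feed_title
    if is_podcast:
        audio["genre"] = "Podcast"
    audio.save()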

This plugin needs additional dependencies, use the unstable-plugins extra to install them:

pip install reader[unstable-plugins]

To load:

READER_APP_PLUGIN='reader._plugins.enclosure_tags:init' \
python -m reader serve

Implemented for #50. Became a plugin in #52. Streaming added in #344.

sqlite_releases

Create a feed out of the SQLite release history pages at:

  • https://www.sqlite.org/changes.html

  • https://www.sqlite.org/chronology.html

Also serves as an example of how to write custom parsers.

This plugin needs additional dependencies, use the unstable-plugins extra to install them:

pip install reader[unstable-plugins]

To load:

READER_PLUGIN='reader._plugins.sqlite_releases:init' \
python -m reader ...

timer

Measure Reader, Storage, and search method calls, including time spent in iterables.

If loaded, the Web application will show per-request method statistics in the footer.

Once reader.timer.enable() is called, the timing of each method call is collected in reader.timer.calls; disable() clears the list of calls and stops collection:

>>> reader = make_reader('db.sqlite', plugins=[
...     'reader._plugins.timer:init_reader'
... ])
>>> reader.timer.enable()
>>> for _ in reader.get_entries(limit=500): pass
>>> for call in reader.timer.calls:
...     print(f"{call.name:30} {call.time:9.6f}")
...
Reader.get_entries              0.304127
Storage.get_entries             0.290139
Storage.get_entries_page        0.159803
Storage.get_db                  0.000008
Storage.get_entries_page        0.128641
Storage.get_db                  0.000826
>>> print(reader.timer.format_stats())
                            len    sum    min    avg    max
Reader.get_entries            1  0.304  0.304  0.304  0.304
Storage.get_db                2  0.001  0.000  0.000  0.001
Storage.get_entries           1  0.290  0.290  0.290  0.290
Storage.get_entries_page      2  0.288  0.129  0.144  0.160

This plugin needs additional dependencies, use the unstable-plugins extra to install them:

pip install reader[unstable-plugins]

share

Add social sharing links at the end of the entry page.

To load:

READER_APP_PLUGIN='reader._plugins.share:init' \
python -m reader serve

Discontinued plugins

The following are experimental plugins that are not very useful anymore.

twitter

Prior to version 3.7, reader had a Twitter plugin; it was removed because it’s not possible to get tweets using the free API tier anymore.

However, the plugin used the internal Parser API in new and interesting ways – it mapped the multiple tweets in a thread to a single entry, and stored old tweets alongside the rendered HTML content to avoid retrieving them again when updating the thread/entry.

You can still find the code on GitHub: twitter.py.

tumblr_gdpr

Prior to version 3.7, reader had a plugin to accept Tumblr GDPR terms (between 2018 and 2020, Tumblr would redirect all new sessions to an “accept the terms of service” page, including machine-readable RSS feeds).

This plugin is a good example of how to set cookies on the Requests session used to retrieve feeds.

You can still find the code on GitHub: tumblr_gdpr.py.

Loading plugins from the CLI and the web application

There is experimental support for plugins in the CLI and the web application.

Warning

The plugin system/hooks are not stable yet and may change without any notice.

To load plugins, set the READER_PLUGIN environment variable to the plugin entry point (e.g. package.module:entry_point); multiple entry points should be separated by one space:

READER_PLUGIN='first.plugin:entry_point second_plugin:main' \
python -m reader some-command

For built-in plugins, it is enough to use the plugin name (reader.XYZ).
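
For example:

READER_PLUGIN='reader.enclosure_dedupe' \
python -m reader update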

Note

make_reader() ignores the plugin environment variables.

To load web application plugins, set the READER_APP_PLUGIN environment variable. To load CLI plugins (that customize the CLI), set the READER_CLI_PLUGIN environment variable.

Recipes

I currently don’t need this functionality, but if you’d be interested in maintaining any of these as an experimental or even built-in plugin, please submit a pull request.

Feed slugs

This is a recipe for what a “get feed by slug” plugin may look like (e.g. for user-defined short URLs).

Usage:

>>> from reader import make_reader
>>> import feed_slugs
>>> reader = make_reader('db.sqlite', plugins=[feed_slugs.init_reader])
>>> reader.set_feed_slug('https://death.andgravity.com/_feed/index.xml', 'andgravity')
>>> reader.get_feed_by_slug('andgravity')
Feed(url='https://death.andgravity.com/_feed/index.xml', ...)
>>> reader.get_feed_slug(_.url)
'andgravity'

def init_reader(reader):
    # __get__() allows help(reader.get_feed_by_slug) to work
    reader.get_feed_by_slug = get_feed_by_slug.__get__(reader)
    reader.get_feed_slug = get_feed_slug.__get__(reader)
    reader.set_feed_slug = set_feed_slug.__get__(reader)

def get_feed_by_slug(reader, slug):
    tag = _make_tag(reader, slug)
    return next(reader.get_feeds(tags=[tag], limit=1), None)

def get_feed_slug(reader, feed):
    if tag := next(_get_tags(reader, feed), None):
        return tag.removeprefix(_make_tag(reader, ''))
    return None

def set_feed_slug(reader, feed, slug: str | None):
    feed = reader.get_feed(feed)
    tag = _make_tag(reader, slug)

    if not slug:
        reader.delete_tag(feed, tag, missing_ok=True)
        return

    reader.set_tag(feed, tag)

    # ensure only one feed has the slug; technically a race condition,
    # when it happens no feed will have the tag
    for other_feed in reader.get_feeds(tags=[tag]):
        if feed.url != other_feed.url:
            reader.delete_tag(other_feed, tag, missing_ok=True)

    # ensure feed has only one slug; technically a race condition,
    # when it happens the feed will have no slug
    for other_tag in _get_tags(reader, feed):
        if tag != other_tag:
            reader.delete_tag(feed, other_tag, missing_ok=True)

def _make_tag(reader, slug):
    return reader.make_plugin_reserved_name('slug', slug)

def _get_tags(reader, resource):
    prefix = _make_tag(reader, '')
    # filter tags by prefix would make this faster,
    # https://github.com/lemon24/reader/issues/309
    return (t for t in reader.get_tag_keys(resource) if t.startswith(prefix))