Here on my blog I have a site search. Before you search, there's autocomplete. The autocomplete is implemented with downshift in React, and on the backend there's an API: /api/v1/typeahead?q=bla. Up until today, that backend was powered by Elasticsearch. Now it's powered by PostgreSQL. Here's how I implemented it.
Indexing
A cron job loops over the titles of all blog posts and extracts combinations of the words in each title as singles, doubles, and triples. For each extracted keyword and combo, the popularity of the blog post is accumulated onto it.
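Roughly speaking, the extraction works like this. The sketch below is only to illustrate the idea; the helper name and the example popularity value are made up, not the actual cron job code:

from collections import defaultdict

def extract_terms(title, popularity, accumulator):
    # Break the title into word combinations of length 1, 2, and 3
    words = title.lower().split()
    for n in (1, 2, 3):
        for i in range(len(words) - n + 1):
            combo = " ".join(words[i : i + n])
            # The blog post's popularity is added to every extracted combo
            accumulator[combo] += popularity

popularities = defaultdict(float)
extract_terms("Frank Zappa Biography", 3.7, popularities)
# popularities now maps e.g. "zappa" and "zappa biography" to 3.7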
These are then inserted into a Django ORM model that looks like this:
from django.contrib.postgres.indexes import GinIndex
from django.db import models


class SearchTerm(models.Model):
    term = models.CharField(max_length=100, db_index=True)
    popularity = models.FloatField(default=0.0)
    add_date = models.DateTimeField(auto_now=True)
    index_version = models.IntegerField(default=0)

    class Meta:
        unique_together = ("term", "index_version")
        indexes = [
            # GIN trigram index to speed up LIKE and % (similarity) queries
            GinIndex(
                name="plog_searchterm_term_gin_idx",
                fields=["term"],
                opclasses=["gin_trgm_ops"],
            ),
        ]
The index_version is used like this, in the indexing code:
from django.db.models import Max

# Each run of the indexing task creates a new, higher index_version
current_index_version = (
    SearchTerm.objects.aggregate(Max("index_version"))["index_version__max"]
    or 0
)
index_version = current_index_version + 1
...
SearchTerm.objects.bulk_create(bulk)
SearchTerm.objects.filter(index_version__lt=index_version).delete()
That means I don't delete the previous entries until the new ones have been created. So if something goes wrong during the indexing, it doesn't break the API.
In total, there are about 13k entries in that model. For a very brief moment there are 2x13k entries, and then it's back to 13k entries when the whole task is done.
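For illustration, here's a rough sketch of the whole swap, including how the bulk list might be built from the accumulated popularities. The function name and structure are my guesses, not the actual indexing code:

from django.db.models import Max

def reindex(popularities):
    # popularities: dict mapping each extracted term/combo to its accumulated popularity
    current_index_version = (
        SearchTerm.objects.aggregate(Max("index_version"))["index_version__max"] or 0
    )
    index_version = current_index_version + 1

    bulk = [
        SearchTerm(term=term, popularity=popularity, index_version=index_version)
        for term, popularity in popularities.items()
    ]
    SearchTerm.objects.bulk_create(bulk)

    # The old generation is only deleted once the new one is safely in place
    SearchTerm.objects.filter(index_version__lt=index_version).delete()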
Search
The search is done with the LIKE operator.
peterbecom=# select term from plog_searchterm where term like 'za%';
term
-----------------------------
zahid
zappa
zappa biography
zappa biography barry
zappa biography barry miles
zappa blog
(6 rows)
In Python, it's as simple as:
base_qs = SearchTerm.objects.all()
qs = base_qs.filter(term__startswith=term.lower())
But suppose someone searches for bio; we want it to match things like frank zappa biography, so what it actually does is:
from django.db.models import Q

qs = base_qs.filter(
    Q(term__startswith=term.lower()) | Q(term__contains=f" {term.lower()}")
)
Typo tolerance
This is done with PostgreSQL's % (trigram similarity) operator.
peterbecom=# select term from plog_searchterm where term % 'frenk';
term
--------
free
frank
freeze
french
(4 rows)
In the Django ORM it looks like this:
base_qs = SearchTerm.objects.all()
qs = base_qs.filter(term__trigram_similar=term.lower())
And if that doesn't work, it gets even more desperate and uses the similarity() function. In SQL it looks like this:
peterbecom=# select term from plog_searchterm where similarity(term, 'zuppa') > 0.14;
term
-------------------
frank zappa
zappa
zappa biography
radio frank zappa
frank zappa blog
zappa blog
zurich
(7 rows)
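In the Django ORM, one way to express that last-resort query is with TrigramSimilarity from django.contrib.postgres.search. This is a sketch of an equivalent query using the same 0.14 threshold as above, not necessarily the exact code:

from django.contrib.postgres.search import TrigramSimilarity

qs = (
    SearchTerm.objects.annotate(similarity=TrigramSimilarity("term", term.lower()))
    .filter(similarity__gt=0.14)
    .order_by("-similarity")
)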
Note on typo tolerance
Most of the time, the most basic query, i.e. the .filter(term__startswith=term.lower()) query, works and yields results.
It's only when that yields fewer results than the pagination size that the typo tolerance kicks in. That's why the typo tolerance query is only-if-needed. This means it might send 2 SQL SELECT queries from Python to PostgreSQL. In Elasticsearch, you usually don't do this; you send multiple queries in a single request and boost them differently.
It can be done with PostgreSQL too, using a UNION operator, so that you send one, but more complex, query.
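To make that only-if-needed flow concrete, here's a minimal sketch. The function name, the page size, and the ordering by popularity are my assumptions, not the actual view code:

from django.db.models import Q

def typeahead(term, page_size=10):
    term = term.lower()
    base = SearchTerm.objects.filter(
        Q(term__startswith=term) | Q(term__contains=f" {term}")
    ).order_by("-popularity")
    results = list(base[:page_size])

    if len(results) < page_size:
        # Only when the basic query comes up short, run the typo-tolerant query
        fuzzy = (
            SearchTerm.objects.filter(term__trigram_similar=term)
            .exclude(term__in=[r.term for r in results])
            .order_by("-popularity")
        )
        results += list(fuzzy[: page_size - len(results)])

    return results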
Speed
It's hard to measure the true performance of these things because they're so fast that it's more about the network speed.
On my fast MacBook Pro M4, I ran about 50 realistic queries and measured how long each took with this new PostgreSQL-based solution versus the previous Elasticsearch solution. They both take about 4ms per query. I suspect that 90% of that 4ms is serialization & transmission, and not much time inside the database itself.
The number of rows it searches is only, at the time of writing, 13,000+, so it's hard to get a feel for how much faster Elasticsearch would be than PostgreSQL. But with a GIN index in PostgreSQL, it would have to scale much, much larger before it starts to feel slow.
About Elasticsearch
Elasticsearch is better than PostgreSQL at full-text search, including n-grams. Elasticsearch is highly optimized for these kinds of things and has powerful ways to make a query's score a product of how well it matched and each entry's popularity. With PostgreSQL, that gets difficult.
But PostgreSQL is simple. It's solid and it doesn't take up nearly as much memory as Elasticsearch.