Search tool for PDF content with verbatim text including special characters
I'm looking for a free (and, if possible, open-source) tool to search through the content of PDFs.
Requirements:
- search full text of all PDFs in one folder (the PDFs are "plain text", not scans)
- show search result in the context of the lines around it
- allow special characters in search like \begin{frame}<1-> or \defbeamertemplate* without having to escape them. I'm looking for exact matches and don't need fuzzy search etc.
- works on macOS15
So far I've tried
DocFetcher
✅ search full text of all PDFs in one folder. Search index needs to be manually updated
✅ show search result in the context of the lines around it. Shows the full context including line breaks.
❌ allow special characters in search like \begin{frame}<1-> or \defbeamertemplate* without having to escape them
- searching for \begin{frame}<1-> will cause an error, searching for "\begin{frame}<1->" will find false results like begin{frame} $1
- searching for \defbeamertemplate* will give false results like \defbeamertemplate{block}
✅ works on macOS15
Recoll
✅ search full text of all PDFs in one folder. Update of search index can be automated, e.g. with a cron job
✅ show search result in the context of the lines around it. Shows unformatted context; line breaks are missing
❌ allow special characters in search like \begin{frame}<1-> or \defbeamertemplate* without having to escape them
- searching for \begin{frame}<1-> will find false results like \begin{frame} 1
- searching for \defbeamertemplate* will give false results like \defbeamertemplate{block}
✅ works on macOS15
rga
✅ search full text of all PDFs in one folder.
❌ show search result in the context of the lines around it. Only shows one line
❌ allow special characters in search like \begin{frame}<1-> or \defbeamertemplate* without having to escape them
- searching for rga \begin{frame}<1-> /path/to/my/folder does not give any matches, don't know how I would need to escape this...
- searching for rga defbeamertemplate\* /path/to/my/folder will give false results like \defbeamertemplate{block}
✅ works on macOS15
2 answers
The following users marked this post as Works for me:
| User | Comment | Date |
|---|---|---|
| samcarter | (no comment) | Oct 15, 2025 at 23:26 |
The big sticking point (you obviously know that, since you kept running into the same issue of needing to escape certain characters) is that too many of these systems want to have their own regular expression (or similar) engine. Poking around for something where that could be turned off, I found the uninspiringly but usefully named pdfgrep.
At first, it doesn't look like it'll work, but digging through the full man page, we find four options relevant to the question.
-F, --fixed-strings
    Interpret PATTERN as a list of fixed strings separated by newlines, any of which is to be matched.

-A NUM, --after-context=NUM
    Print NUM lines of context after matching lines. Contiguous groups of matches are separated by a line containing --. With -o, this option has no effect.

-B NUM, --before-context=NUM
    Print NUM lines of context before matching lines. Contiguous groups of matches are separated by a line containing --. With -o, this option has no effect.

-C NUM, --context=NUM
    Print NUM lines of context before and after matching lines. Contiguous groups of matches are separated by a line containing --. With -o, this option has no effect.
Then, at least in the Bash shell, single-quote the search string, and that should do it.
pdfgrep --fixed-strings --context=2 '\begin{frame}<1->' target-file.pdf
Adjust the context value to whatever you actually want. You'll still need to escape any apostrophes in the search string, sadly, since an unescaped one would break the quoting. But at least that's more predictable, and it has a recognizable failure mode: the command doesn't run when you hit Enter or Return, because the shell is still waiting for the closing quote.
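For the apostrophe case, one common Bash idiom is to close the single quote, add an escaped apostrophe, and reopen the quote. The following is only a sketch: the search string \title{John's talk} is a made-up example, and printf is used just to show what the shell would hand over as the pattern.

```shell
# Hypothetical search string containing an apostrophe: \title{John's talk}
# Close the single quote, add an escaped apostrophe (\'), then reopen the quote.
pattern='\title{John'\''s talk}'

# Show the exact string the shell would pass on as the search pattern.
printf '%s\n' "$pattern"

# It could then be used like the command above (same target-file.pdf):
# pdfgrep --fixed-strings --context=2 "$pattern" target-file.pdf
```

Putting the pattern in a variable and passing it as "$pattern" keeps the awkward quoting in one place.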
However, I see one bigger caveat in playing around: The target-string needs to appear on a single line in the document. I happen to have a PDF file handy that I generated from Markdown files, and it breaks words across lines when necessary; pdfgrep can't find those split words, because they technically sit in separate boxes, the way that most PDF output routines work.
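The single-line limitation can be reproduced without any PDF at all. This sketch uses plain grep -F on two throwaway text files as a stand-in for extracted PDF text; the file names and contents are invented for illustration, and it only assumes that matching happens line by line, as described above.

```shell
# Stand-in for text extracted from PDFs: in the second sample the phrase
# "fixed string" is broken across two lines.
sample_dir=$(mktemp -d)
printf 'a fixed string on one line\n' > "$sample_dir/one-line.txt"
printf 'a fixed\nstring split in two\n' > "$sample_dir/split.txt"

# grep -F matches line by line, so only the unsplit file is listed.
matches=$(grep -F -l 'fixed string' "$sample_dir"/*.txt)
printf '%s\n' "$matches"

# Clean up the temporary files.
rm -r "$sample_dir"
```

If a phrase might be wrapped, searching for a shorter piece of it that is unlikely to straddle a line break is a possible workaround.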
@JohnC's great answer made me realise that rga can actually use the same options (it passes them on to ripgrep):
rga --fixed-strings --context=2 '\begin{frame}<' /path/to/my/folder
which makes it
- show as many additional lines of context as I would like
- use a fixed search expression, which eliminates almost all problems with special characters (see @JohnC's answer for the exceptions)
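To see why the fixed-string option matters, here is a small sketch that uses plain grep as a stand-in for the regex engine (an assumption made for illustration; ripgrep's regex syntax differs in details, but the asterisk means the same thing). The two sample lines are made up. Note also that in an unquoted defbeamertemplate\* the shell removes the backslash, so the tool receives defbeamertemplate* and reads the asterisk as "zero or more e".

```shell
# Two sample lines standing in for text extracted from a PDF.
sample=$(mktemp)
printf '%s\n' '\defbeamertemplate*{block begin}' '\defbeamertemplate{block}' > "$sample"

# As a regular expression, "e*" means "zero or more e", so both lines match.
regex_hits=$(grep -c 'defbeamertemplate*' "$sample")

# As a fixed string, the backslash and asterisk are literal: one line matches.
fixed_hits=$(grep -c -F '\defbeamertemplate*' "$sample")

printf 'regex: %s, fixed: %s\n' "$regex_hits" "$fixed_hits"

# Clean up the temporary file.
rm "$sample"
```

The regex search counts both lines, while the fixed-string search counts only the starred one, which is the behaviour the question asked for.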
