How we implemented verifier-friendly regex engine in eBPF LSM

We are implementing a Sigma rules engine, so we need to support regex matching. I checked how other projects like Tetragon and KubeArmor do it. Turns out no open source project had done it. So we built it ourselves. Thus, owLSM is the first open source project to implement a verifier-friendly regex engine in implemented eBPF.

The problem: We needed regex matching for our rules, but it had to be efficient and verifier-friendly. Its runtime had to be O(n). However, modern regex engines are too complex for eBPF and can reach O( 2ⁿ ) due to backtracking.

First approach — kfuncs

I read on how he solved the issue. He moved the regex logic into a kernel module that exposes an API eBPF programs call via kfuncs. That way the heavy work bypasses the verifier because the real regex logic lives in the kernel module.

I treated that as a last resort. I did not want to introduce a kernel module into owLSM, because it creates many new issues: crashes, Secure Boot, maintenance, etc.

Second approach: regex → AST → NFA → DFA

As specified before, modern regex engines are too complex. With backtracking, evaluation can reach O( 2ⁿ ). That is not acceptable for our use case or for the verifier.
I looked for other implementations of regex engines and found that older engines like the one used in grep avoid backtracking, so they can be implemented in O(n).
I dug into the subject and found Russ Cox’s write-up:

This fit well because the project already uses DFAs for substring matching, as I described in a different post:
Runtime DFA evaluation is almost the same for substring matching and for regex. I only needed to convert the regex pattern to a DFA.
So first I converted the regex pattern to an AST.
Then the AST is converted to an NFA.
Then the NFA is converted to a DFA.
Then the DFA is flattened to a DFA table plus accepting states.
See the regex-to-DFA code:
That regex compilation runs in userspace.

At runtime we evaluate the regex in O(n) like this:

# Pseudo code
for (i = 0; i < haystack_length && i < PATH_MAX; i++) {
    state = dfa->value[(state * 256) + haystack[i]];
    if (is_accepting(state)) return TRUE;
}

Single bounded loop, one map lookup per character, O(n) time. The verifier stays happy.

Regex features we support

Feature	Syntax	Example
Literals	normal characters	`"abc"`
Character ranges	`[a-z]`, `[0-9A-F]`	`"[a-z]+"`
Negated ranges	`[^a-z]`, `[^/]`	`"[^/]+"`
Character classes	`d`, `w`, `s`	`"\d+"`
Negated classes	`D`, `W`, `S`	`"\D+"`
Dot (printable ASCII)	`.`	`"a.c"`
Alternation	`a	b`
Grouping	`(...)`	`"(ab)+"`
Repetition	`*`, `+`, `?`	`"ab*c"`, `"a+"`
Bounded repetition	`{n}`, `{n,m}`, `{n,}`	`"a{2,4}"`
Non-greedy	`*?`, `+?`, `??`	`"a+?"`
Case insensitive	`(?i)`	`"(?i)hello"`
Escape sequences	`n`, `t`, `\`, `.`	`"a\.b"`

Regex state limit: Regex patterns are compiled to deterministic finite automata (DFAs) for efficient O(n) matching in the kernel. Each DFA is limited to 32 states. Complex patterns with many branches or long literals may exceed this limit—if so, simplify the pattern. In practice, you are unlikely to hit this limit. This limit can easily be increased to any other number, but because our infra already used DFA's with 32 state limits, we just kept it.

I got tired of iptables crashing my server during HTTP floods, so I built an eBPF/XDP firewall in Rust with zero CPU overhead 🦀

u/xmull1gan

I got tired of iptables crashing my server during HTTP floods, so I built an eBPF/XDP firewall in Rust with zero CPU overhead 🦀

r/Hosting •

I got tired of iptables crashing my server during HTTP floods, so I built an eBPF/XDP firewall in Rust with zero CPU overhead 🦀

Hey everyone!

Whenever my small VPS was hit by L7 HTTP botnets or simple DDoS attacks, traditional tools like Fail2ban + iptables would actually make things worse. The sheer overhead of the Linux kernel allocating sk_buff memory for 100,000 packets per second created an Interrupt Storm that crashed my databases and locked me out of SSH.

So, I spent some time building CrabShield — a hybrid firewall written entirely in Rust.

How it works: It uses an asynchronous Tokio daemon in user-space to instantly analyze Nginx/Traefik logs (detecting 404 floods, brute-forcers, scrappers). But instead of adding iptables rules, it dynamically updates an eBPF BPF-map. The actual penalty (XDP_DROP) happens natively at the Network Interface Card (NIC) driver level.

The result? The malicious packets are dropped before the heavy Linux TCP/IP stack even knows they exist. The CPU stays under 5%, and Nginx never wakes up.

I just open-sourced it, put together proper documentation on it, and added cross-compilation support so you can just drop a static binary on your Linux box (x86_64 or ARM) and be protected.

Check out the repo and the architecture here:

Would love to hear your feedback, issues, or code-review if anyone is into eBPF!

upvotes

comments

r/eBPF

Community highlights

r/eBPF Lounge

First approach — kfuncs

Second approach: regex → AST → NFA → DFA

Regex features we support