@@ -10,6 +10,13 @@ based regular expression engine for V.
 - **Bitmap Lookups**: ASCII character classes use a 128-bit bitset for $O(1)$ matching.
 - **Instruction Merging**: Consecutive character matches are merged
 into string blocks for faster execution.
+- **Bitmap lookups**: ASCII character classes use a 128-bit bitset for O(1) matching.
+- **NFA Virtual Machine**: Executes bytecode instructions to simulate pattern matching.
+- **Dynamic Stack Growth**: Automatically expands the backtracking stack to prevent false negatives.
+- **Zero-Allocation Search**: Reuses a pre-allocated Machine workspace for search operations.
+- **Anchored Optimization**: Patterns starting with '^' skip the scanning loop.
+- **Prefix Skipping**: Uses Boyer-Moore-like skipping for literal prefixes.
+
 
 ## Supported Syntax
 
@@ -144,7 +151,22 @@ if m := r.match_str('hello world', 0, 0) {
 ```
 
 ## Performance Note
-The engine automatically detects literal prefixes (e.g., in `abc.*`) and uses
-a fast-skip optimization to bypass the VM until the prefix is found in the
-input string.
-This makes it extremely fast for searching specific patterns in large files.
+Here is a clear summary of the optimizations implemented in the code:
+
+* **Raw Pointer Access:** The VM bypasses standard array bounds checking by using `unsafe`
+pointer arithmetic for both the instruction set and the string text, significantly speeding up
+the hot loop.
+* **Zero-Allocation Search:** The `Machine` struct pre-allocates the backtracking stack and
+capture arrays, ensuring that running a search (finding a match) creates no new heap allocations
+(garbage collection pressure is zero).
+* **Fast ASCII Path:** The code checks if a byte is `< 128` before decoding. If it is ASCII, it
+skips the expensive UTF-8 decoding logic entirely.
+* **Bitmap Class Lookups:** Character classes (like `\w`, `\d`, `[a-z]`) use a 128-bit bitset.
+Checking if an ASCII character matches a class is a single O(1) bitwise operation.
+* **Instruction Merging:** The compiler groups consecutive literal characters into a single
+`string` instruction (e.g., `a`, `b`, `c` becomes `"abc"`), reducing the number of VM cycles
+required.
+* **Prefix Skipping:** If a pattern starts with a literal string, the engine scans ahead for
+that substring (Boyer-Moore style) before initializing the VM, avoiding useless execution.
+* **Anchored Optimization:** If the pattern starts with `^`, the engine only attempts a match at
+the start of the string (or line), skipping the character-by-character scan of the rest of the text.
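
The "Bitmap Class Lookups" item added in this commit can be illustrated with a minimal V sketch. It is not taken from the engine's source: the names `AsciiSet`, `add_range`, and `has` are hypothetical, and the layout (two `u64` words covering bytes 0 to 127) is only one plausible way to hold a 128-bit bitset.

```v
// Hypothetical sketch: a 128-bit ASCII bitset stored as two u64 words.
// None of these names come from the engine itself.
struct AsciiSet {
mut:
	bits [2]u64
}

// add_range sets one bit per byte in [lo, hi]; bytes >= 128 are ignored.
fn (mut s AsciiSet) add_range(lo u8, hi u8) {
	for c := int(lo); c <= int(hi) && c < 128; c++ {
		// c >> 6 selects the word (0 or 1), c & 63 selects the bit in it.
		s.bits[c >> 6] |= u64(1) << u64(c & 63)
	}
}

// has tests membership with one shift and one mask,
// independent of how many characters the class contains.
fn (s AsciiSet) has(c u8) bool {
	if c >= 128 {
		return false
	}
	return ((s.bits[c >> 6] >> u64(c & 63)) & u64(1)) == u64(1)
}

fn main() {
	mut lower := AsciiSet{}
	lower.add_range(u8(`a`), u8(`z`)) // roughly what [a-z] would compile to
	assert lower.has(u8(`m`))
	assert !lower.has(u8(`M`))
	assert !lower.has(u8(200)) // non-ASCII bytes never match in this sketch
}
```

Building the set costs time proportional to the size of the class, but that happens once at compile time of the pattern; each lookup during matching is a constant number of bitwise operations, which is what the O(1) claim refers to.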