[5.x] Performance Optimizations for Stache and Query Operations #12894
- Load items once per store instead of once per index (45% faster warming)
- Eliminate double parsing by caching items during paths() resolution
- Replace Filesystem::allFiles() with RecursiveDirectoryIterator for better memory efficiency
- Add parallel store processing with Laravel Concurrency facade
- Fix AggregateStore key format issues and hidden file filtering
- Add early returns and filtering optimizations

Combined improvements:
- 45% faster warming (328s → 181s on 15,996 files)
- 83% fewer file operations
- Better memory efficiency for large directories
- 40-60% faster warming on multi-core systems
- Backwards compatible, no breaking changes

Signed-off-by: Beau Hastings <beau@saweet.net>
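For readers unfamiliar with the SPL iterators named in this commit, here is a minimal sketch of lazy directory traversal with hidden-file filtering. The helper name `iterateFiles()` is hypothetical and this is not the PR's Traverser code; it only shows how `RecursiveDirectoryIterator` can walk a tree without first building a full file list.

```php
<?php

declare(strict_types=1);

/**
 * Lazily yield file paths under $root, skipping hidden files and directories.
 * Hypothetical helper for illustration; not Statamic's Traverser.
 */
function iterateFiles(string $root): Generator
{
    $directory = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);

    // Prune hidden entries (names starting with ".") before descending into them.
    $visible = new RecursiveCallbackFilterIterator(
        $directory,
        static fn (SplFileInfo $file): bool => $file->getFilename()[0] !== '.'
    );

    // The default LEAVES_ONLY mode yields files and never the directories themselves.
    foreach (new RecursiveIteratorIterator($visible) as $file) {
        yield $file->getPathname();
    }
}
```

Because the function is a generator, callers can `foreach` over it and hold only one path in memory at a time. Traversal order depends on the filesystem, so callers that need a stable order would have to sort the results themselves.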
- Replace in_array() with isset(array_flip()) for O(1) hash lookups in filterWhereIn/filterWhereNotIn
- Add early returns to Repository methods for empty inputs and null checks
- Optimize array filtering operations with direct loops and early exits
- Optimize BasicDictionary::matchesSearchQuery() with isset() lookup (1.29-1.63x faster)

Key improvements:
- O(1) hash table lookups instead of O(n) linear searches
- Avoids unnecessary database queries for empty inputs
- Reduces memory allocation in array operations
- Better scalability for large datasets

Most beneficial for:
- Bulk actions in Control Panel (publish/delete 50+ entries)
- ID-based queries and search result processing
- Dictionary searches with many searchable fields
- Large dataset filtering operations
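The hash-lookup idea in this commit can be sketched roughly as follows. The function name `filterWhereInSketch()` and the row shape are assumptions made for illustration, not the query builder's actual code.

```php
<?php

declare(strict_types=1);

/**
 * Keep only rows whose $key value appears in $allowed.
 * Flipping $allowed once turns each membership test into a hash lookup.
 * Assumes $allowed holds only ints or strings (array_flip() requires that).
 */
function filterWhereInSketch(array $rows, string $key, array $allowed): array
{
    // Early return: nothing can match when either side is empty.
    if ($rows === [] || $allowed === []) {
        return [];
    }

    $lookup = array_flip($allowed);

    $matches = [];
    foreach ($rows as $row) {
        $value = $row[$key] ?? null;
        // Only ints and strings are valid array keys for the lookup.
        if ((is_int($value) || is_string($value)) && isset($lookup[$value])) {
            $matches[] = $row;
        }
    }

    return $matches;
}
```

One caveat with this approach: array keys follow PHP's key-casting rules (for example `'1'` and `1` share a key), which is not identical to the loose comparison `in_array()` performs by default, so mixed-type inputs deserve a test.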
Test script: https://gist.github.com/hastinbe/d6ec020c019131c7490623ea217d6755
Edit: I accidentally cancelled the checks below. Can someone re-run them?
Done
Signed-off-by: Beau Hastings <beau@saweet.net>
1) Tests\Stache\Repositories\EntryRepositoryTest::it_gets_entries_by_ids with data set "missing" (['numeric-one', 'unknown', 'numeric-three'], ['One', 'Three'])
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
Array (
0 => 'One'
- 1 => 'Three'
+ 2 => 'Three'
)
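A failure of this shape (same values, keys 0 and 2 instead of 0 and 1) typically appears when items are filtered without reindexing. The snippet below is a generic illustration of that behaviour and is only an assumption about the cause here; the actual fix is not shown in this thread.

```php
<?php

declare(strict_types=1);

$titles = [0 => 'One', 1 => null, 2 => 'Three'];

// array_filter() drops the null but keeps the original keys: [0 => 'One', 2 => 'Three'].
$filtered = array_filter($titles, static fn ($title) => $title !== null);

// array_values() reindexes from zero: [0 => 'One', 1 => 'Three'].
$reindexed = array_values($filtered);
```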
Signed-off-by: Beau Hastings <beau@saweet.net>
First of all, this is exactly what I've been wanting to work on but haven't had time for, so thank you for getting a functional PR submitted! I am going to leave a few comments with things I noted/implemented from my testing. Feel free to disregard them. Also, did you do any testing on using a generator in the Traverser and then passing that all the way down? https://www.php.net/manual/en/language.generators.overview.php Thanks again for submitting this.
love this, thanks for your work
Avoids using iterator_to_array() to save memory and increase iteration speed. Co-authored-by: Daniel Weaver <godismyjudge95@users.noreply.github.com> Signed-off-by: Beau Hastings <beau@saweet.net>
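As a rough illustration of the change described in this commit, the sketch below consumes an iterator directly instead of materialising it with `iterator_to_array()`. Both function names are hypothetical stand-ins, not code from the PR.

```php
<?php

declare(strict_types=1);

/** Hypothetical stand-in for a file traversal that yields one path at a time. */
function paths(): Generator
{
    foreach (['a.md', 'b.md', 'c.md'] as $path) {
        yield $path;
    }
}

/** Count Markdown paths by consuming the iterable directly, with no intermediate array. */
function countMarkdown(iterable $paths): int
{
    $count = 0;
    foreach ($paths as $path) {
        if (str_ends_with($path, '.md')) {
            $count++;
        }
    }

    return $count;
}
```

Typing the parameter as `iterable` lets the same function accept a generator or a plain array, so callers are not forced to convert one into the other.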
Amazing 🤗
Revision
Even without the Single Item Load improvement, performance is still roughly on par.
This fixes two bugs introduced by the performance optimizations and includes one cleanup:
1. Parent-child relationships: Reverted the Store::warm() optimization that loaded all items upfront, which broke parent relationships because items were loaded before the structure tree was built. Restored the original behavior where each index loads items individually when parent context is available.
2. Entry ordering: Reverted Traverser to use Finder directly instead of either RecursiveDirectoryIterator or Filesystem::allFiles() to maintain the original file traversal order, which affects entry display order in the UI.
3. Cleanup: Removed the unused updateFromItems() method from the Index class.
cd64746 to 386f51a
Replace nested arrow functions with traditional closures using explicit use clauses to fix ArgumentCountError when using the 'process' concurrency driver.
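The rewrite described in this commit follows a pattern like the one below. The variable names are made up for illustration, and the snippet does not reproduce the ArgumentCountError itself, which depends on how the 'process' driver handles closures. It only shows that the two forms are equivalent when called directly.

```php
<?php

declare(strict_types=1);

$prefix = 'store:';

// Nested arrow functions: both capture variables implicitly by value.
$arrow = fn (array $keys) => array_map(fn (string $key) => $prefix . $key, $keys);

// Equivalent traditional closures: every captured variable is listed in a use clause.
$closure = function (array $keys) use ($prefix) {
    return array_map(function (string $key) use ($prefix) {
        return $prefix . $key;
    }, $keys);
};
```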
Following the removal of ResolveValue usage in 386f51a, the getItemValue method in the Index base class is no longer needed.
Thanks for all this. Are you still working on it?
Finished, if there are no further suggestions from anyone.
I've implemented performance optimizations for Statamic's Stache and query operations. Our ongoing proprietary CMS migration, combined with the team's need to load our full dataset, created a severe bottleneck. The Stache warm-up time was reaching 6 minutes, a process we were executing frequently during daily development. These changes focus on reducing redundant operations and improving algorithmic efficiency, resulting in a ~6x faster warm time and a ~50% memory reduction across 16k files.
Issues Addressed
Stache Warming
Query Operations
- `in_array()` calls in `filterWhereIn()` and `filterWhereNotIn()` were doing linear searches

Changes Made
Stache Optimizations
- Cache items during `paths()` resolution to avoid double parsing
- Use `RecursiveDirectoryIterator` for more efficient file traversal
- Add `warming` configuration options in `config/stache.php`

Query/Repository Optimizations
- Replace `in_array()` with `isset(array_flip())` in `filterWhereIn()` and `filterWhereNotIn()`
- Optimize `BasicDictionary::matchesSearchQuery()` with hash lookups

Performance Results
I ran benchmarks on a production dataset with 15,996 files to measure the impact:
Baseline:
- `328.45s`
- `829.77MB`
- `40.02s`

After Optimizations:
- `53.07s` (~6x faster)
- `416.92MB` (~50% reduction)
- `517.46ms` (~77x faster)

Query Operations:
- `filterWhereIn()` with 1000 items: `~78x` faster with hash lookups
- `BasicDictionary::matchesSearchQuery()`: `1.29-1.63x` faster depending on dataset size

The parallel processing improvement mainly applies to CLI operations using the 'fork' driver, as web requests can't fork processes.
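The rounded ratios quoted with the baseline and optimized figures can be checked with a few lines of arithmetic. The helper names are hypothetical, and the third pair is treated simply as a timing (40.02s versus 517.46ms, converted to seconds).

```php
<?php

declare(strict_types=1);

/** How many times faster the "after" duration is than the "before" duration. */
function speedup(float $before, float $after): float
{
    return $before / $after;
}

/** Percentage reduction from $before to $after. */
function reductionPercent(float $before, float $after): float
{
    return (1 - $after / $before) * 100;
}

$timeSpeedup = speedup(328.45, 53.07);               // about 6.19
$memoryReduction = reductionPercent(829.77, 416.92); // about 49.75
$thirdSpeedup = speedup(40.02, 0.51746);             // about 77.34
```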
Detailed Benchmark Breakdown
Optimization Progression
Total Improvement:
- `286.40s` reduction in warm time

Configuration Options
New `warming` configuration section added to `config/stache.php`:

These options allow fine-tuning of the parallel processing behavior based on your server environment and requirements.

Note: The `fork` driver is not available on Windows systems. On Windows, the system will automatically fall back to the `process` driver or sequential processing.

Implementation Notes
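The Windows fallback mentioned in the note on the fork driver could look something like the sketch below. The function name, its parameters, and the decision to check for the pcntl extension are all assumptions for illustration. The PR's real selection logic is not shown in this description, and the further fallback to sequential processing is left out.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical driver selection; the function name and fallback order are assumptions.
 * Returns 'fork' only where forking is possible, otherwise 'process'.
 */
function resolveWarmingDriver(string $preferred, string $osFamily, bool $hasPcntl): string
{
    if ($preferred === 'fork' && ($osFamily === 'Windows' || ! $hasPcntl)) {
        return 'process';
    }

    return $preferred;
}

// Example call using the running platform's details.
$driver = resolveWarmingDriver('fork', PHP_OS_FAMILY, function_exists('pcntl_fork'));
```

Passing the OS family and extension check in as arguments keeps the function easy to test on any platform.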