libtest: use binary search for --exact test filtering#154865
libtest: use binary search for --exact test filtering#154865sunshowers wants to merge 1 commit intorust-lang:mainfrom
Conversation
|
rustbot has assigned @Mark-Simulacrum. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
The test array is sorted by name at compile time. When `--exact` is passed in, use binary search for O(f log n) lookups instead of an O(n) linear scan, under the assumption that f << n (which is true for the most relevant cases). This is important for Miri, where the interpreted execution makes the linear scan very expensive. I measured this against a repo with 1000 empty tests, running `cargo +stage1 miri nextest run test_00` (100 tests) under hyperfine: * Before (linear scan): 49.7s ± 0.6s * After (binary search): 41.9s ± 0.2s (-15.7%) I also tried a few other variations (particularly swapping matching tests to the front of the list + truncating the list), but the index + swap_remove approach proved to be the fastest. Questions: - [ ] To be conservative, I've assumed that test_main can potentially receive an unsorted list of tests. Is this assumption correct?
1e41231 to
bbb9e3b
Compare
| // invalidate indexes we haven't visited yet. | ||
| let mut result = Vec::with_capacity(indexes.len()); | ||
| for &idx in indexes.iter().rev() { | ||
| result.push(tests.swap_remove(idx)); |
There was a problem hiding this comment.
I have no feedback here, but I just wanted to say this is the first time I've seen swap_remove being used 👀
And it's really cool how you avoid an allocation/making a copy of the TestDescAndFn (not that you could since I don't see TestDescAndFn implement the Clone trait) through this method since the index order is sorted and deduplicated, so moving the TestDescAndFn items in backward index order would always give you your filtered TestDescAndFn in reverse order. You don't have to worry about the wrong item being taken since it gets replaced by the last item in the Vec, which is not something you are taking anyways.
I'll keep a note of this in mind if I need to do something like this in the future.
| let owned_tests: Vec<_> = tests.iter().map(make_owned_test).collect(); | ||
| test_main(&args, owned_tests, None) | ||
| // Tests are sorted by name at compile time by mk_tests_slice. | ||
| let tests = TestList::new(owned_tests, TestListOrder::Sorted); |
There was a problem hiding this comment.
I think we can keep the test_main(&args, owned_tests, None) invocation here, and just construct the TestList inside test_main(), where test_main() would call test_main_inner(). This would reduce code duplication a bit since both test_main_static() and test_main_static_abort() seem to be the only functions that invoke test_main() and they both have a TestListOrder::Sorted. I might be incorrect though since I notice that test_main() is calling on test_main_with_exit_callback(), which denotes the test list order as unsorted, but I'm not sure why it needs to be marked as unsorted for test_main() when the test_main_static_* functions are seemingly only one using it.
Ideally, I would have prefer test_main() to keep with calling on test_main_with_exit_callback() and not need to introduce a test_main_inner() through, making it take a TestList instead of Vec<TestDescAndFn>. However, I can see that librustdoc/doctest.rs does utilize test_main_with_exit_callback(), though I don't see why a TestList object can not be constructed there.
There was a problem hiding this comment.
Yeah, I definitely thought of that, as well as the related approach of making TestList be a newtype wrapper which always ensures sortedness. Happy to hear opinions before I go about making these bigger changes.
At the moment, I believe that within this repo, merged doctests pass in a potentially-unsorted list. But I wasn't sure how much these methods are callable outside the repo.
There was a problem hiding this comment.
At the moment, I believe that within this repo, merged doctests pass in a potentially-unsorted list. But I wasn't sure how much these methods are callable outside the repo.
Looking at librustdoc/doctest.rs, which is the only file (as far as my Go to references are telling me and a quick text search on codebase) outside this file that calls test_main_with_exit_callback, it looks like it sorts the test list as well?
// We need to call `test_main` even if there is no doctest to run to get the output
// `running 0 tests...`.
if ran_edition_tests == 0 || !standalone_tests.is_empty() {
standalone_tests.sort_by(|a, b| a.desc.name.as_slice().cmp(b.desc.name.as_slice()));
test::test_main_with_exit_callback(&test_args, standalone_tests, None, || {
let times = times.times_in_secs();
// We ensure temp dir destructor is called.
std::mem::drop(temp_dir.take());
if let Some((total_time, compilation_time)) = times {
test::print_merged_doctests_times(&test_args, total_time, compilation_time);
}
});
}But also since these are public methods accessible in the test crate, I'm not sure if people would need to use these methods and expect the passed in test array to handle both sorted/unsorted (which I think making TestList be a newtype wrapper that ensures sortedness is a good idea). The test module docs:
//! Support code for rustc's built in unit-test and micro-benchmarking
//! framework.
//!
//! Almost all user code will only be interested in `Bencher` and
//! `black_box`. All other interactions (such as writing tests and
//! benchmarks themselves) should be done via the `#[test]` and
//! `#[bench]` attributes.
//!
//! See the [Testing Chapter](../book/ch11-00-testing.html) of the book for more
//! details.
feels like it implies that people won't really use the test_main_* methods. Are these methods used in nextest?
There was a problem hiding this comment.
Never mind, I do see it in librust/doctest/runner.rs here
There was a problem hiding this comment.
Are these methods used in nextest?
No, nextest doesn't insert itself into the build process in any way.
How do you think I should proceed with this?
There was a problem hiding this comment.
At the moment, I believe that within this repo, merged doctests pass in a potentially-unsorted list.
I was reading through, librustdoc/lib.rs, librustdoc/doctest/markdown , and librustdoc/doctest.rs, librustdoc/doctest/runner, and I think it does sort the merged doctests already.
In librustdoc/doctest.rs, that's what I'm assuming the iterative loop for run_tests(...) (lines 352-371) does for us (correct me if I'm wrong though). If that's true, then it seems like all usage of test_main* functions pass in a sorted test list.
But I wasn't sure how much these methods are callable outside the repo.
At least withinlibrustdoc, the test list seems to be in sorted order, but becausetest_main*are publicly accessible functions via the test crate, it's possible for people to pass in tests that are unsorted.
How do you think I should proceed with this?
To answer this question, I prefer what you mentioned earlier here:
as well as the related approach of making TestList be a newtype wrapper which always ensures sortedness
We wouldn't need the TestListOrder enum with this approach, and can construct a TestList type in each of the test_main* functions. test_main_static* would invoke test_main underneath the hood like before. The test list being sorted internally would have to be documented for these functions as well.
Moreover, I feel like the TestList newtype should not be a publicly accessible struct, and should remain as an internal helper struct within the rust repo (pub(crate) visibility?); I can't see in what situations people would create a TestList struct and use it when all the test crate function uses TestDescAndFn (and we can't change the signature of test_main* since it would cause a breaking change -- though it is marked as experimental so not sure what policy is on that -- + means that more librustdoc code needs to be modified).
The consequence to this change is that some changes in librustdoc might need to be made (probably separately) to prevent redundant sorting, or there should be a note somewhere that there's some redundant sorting occurring in librustdoc.
Those are my thoughts and how I would probably proceed with this.
When
--exactis passed in, use binary search for O(f log n) lookups instead of an O(n) linear scan, under the assumption that f << n (which is true for the most relevant cases).This is important for Miri, where the interpreted execution makes the linear scan very expensive.
I measured this against a repo with 1000 empty tests, running
cargo +stage1 miri nextest run test_00(100 tests) under hyperfine:I also tried a few other variations (particularly swapping matching tests to the front of the list + truncating the list), but the index + swap_remove approach proved to be the fastest.
Questions: