Improve bulk quantiles#5
Conversation
This improves performance, especially for small arrays.
This has slightly lower overhead.
| self[i].clone() | ||
| } else { | ||
| self.slice_mut(s![partition_index + 1..]) | ||
| self.slice_axis_mut(Axis(0), Slice::from(partition_index + 1..)) |
There was a problem hiding this comment.
I can see the rationale for all the changes you have proposed, but I feel I am missing something here: why is slice_axis_mut preferable to slice_mut?
There was a problem hiding this comment.
It's just slightly lower overhead, that's all. .slice_mut() goes through all the n-dimensional slicing infrastructure and in the end calls .slice_axis_inplace(); .slice_axis_mut() jumps there directly. [Edit: In theory, the compiler should be able to eliminate the extra slicing infrastructure through constant propagation and inlining, but in practice, some overhead usually remains.] The performance difference is quite small in this case (~2% in the small array tests when I tried it earlier), so if .slice_mut() seems cleaner, that's fine too.
There was a problem hiding this comment.
Readibility-wise, I'd say they are the same. I was just curious to understand the rationale, your laser-focused optimization eye is something I am trying to learn from your PR reviews 😛
This is a bit cleaner.
For me, the tests are easier to understand when they don't collect into a `Vec`.
This has a few advantages: * It's now possible to save the interpolation strategy in a variable and easily re-use it. * We can now freely add more type parameters to the `quantile*` methods as needed without making them more difficult to call. * We now have the flexibility to add more advanced interpolation strategies in the future (e.g. one that wraps a closure). * Calling the `quantile*` methods is now slightly more compact because a turbofish isn't necessary.
This is slightly more versatile because `ArrayBase` allows arbitrary strides.
See rust-ndarray#26.