string_extract_if: initial implementation#154583
GrigorenkoPV wants to merge 1 commit into rust-lang:main
rustbot has assigned @Mark-Simulacrum.
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
/// During the iteration, the underlying vector's consists of:
/// - A valid UTF-8 prefix (`valid_prefix.len()` bytes)
///   of characters that we iterated over and didn't extract.
/// - A middle portion of `bytes_removed` initialized bytes that might not be valid UTF-8.
Since we only ever remove a full `char`, how can we break the UTF-8 property? AFAICT, the `Vec` should always be initialized and a valid `String`. The removal operations this performs should correspond directly to calling `String::drain(start..end)` after a sequence of `false` returns. That is a tiny bit less efficient since it'll re-check that the start/end characters are at a UTF-8 boundary, but that check is O(1), so it shouldn't be that much slower.
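To make the `String::drain` suggestion concrete, here is a minimal sketch of an eager, drain-based extraction. It is not the PR's code: the function name `extract_if_via_drain` and the eager (non-iterator) shape are assumptions made for illustration, and the real `extract_if` signature may differ. Each run of consecutive matching characters is removed with one `drain` call, so the `String` is valid UTF-8 at every step.

```rust
/// Hypothetical helper (not part of the PR): removes every char for which
/// `pred` returns true and returns the removed chars, using only the safe
/// `String::drain` API. `pred` is called exactly once per char.
fn extract_if_via_drain(s: &mut String, mut pred: impl FnMut(char) -> bool) -> String {
    let mut extracted = String::new();
    let mut idx = 0; // byte offset of the next char to examine

    while idx < s.len() {
        // Find the end of the run of matching chars that starts at `idx`.
        let mut end = idx;
        for ch in s[idx..].chars() {
            if pred(ch) {
                end += ch.len_utf8();
            } else {
                break;
            }
        }

        if end > idx {
            // `drain` re-checks that both offsets are char boundaries (O(1))
            // and shifts the tail down when the iterator is dropped.
            extracted.extend(s.drain(idx..end));
        }

        // The char now at `idx` (if any) already returned false; step over it.
        if let Some(ch) = s[idx..].chars().next() {
            idx += ch.len_utf8();
        }
    }

    extracted
}
```

Each `drain` call shifts the remaining tail, so this version favors simplicity over throughput when there are many separate runs.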
I think the current implementation doesn't actually optimize to copy larger chunks only when needed; it looks like we copy each char (likely via a call to `memmove`), which seems likely to be pretty inefficient?
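One way to copy larger chunks, sketched under stated assumptions: track the pending run of retained bytes and move it with a single `copy_within` (a `memmove`) only when a removed char or the end of the string is reached. The names `extract_if_chunked` and `utf8_len` are hypothetical, the function is eager rather than an iterator, and it rebuilds the `String` with a checked `String::from_utf8` to stay in safe code, which a real implementation would likely avoid.

```rust
/// Hypothetical helper: length in bytes of a UTF-8 sequence, given its lead
/// byte. Only meaningful for lead bytes taken from valid UTF-8.
fn utf8_len(lead: u8) -> usize {
    match lead {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        _ => 4,
    }
}

/// Hypothetical sketch (not the PR's code): extracts matching chars while
/// moving each retained run with one `copy_within` instead of char by char.
/// If `pred` panics, `s` is left empty.
fn extract_if_chunked(s: &mut String, mut pred: impl FnMut(char) -> bool) -> String {
    let mut bytes = std::mem::take(s).into_bytes();
    let mut extracted = String::new();
    let mut write = 0; // end of the compacted, retained prefix
    let mut run_start = 0; // start of the retained run not yet moved
    let mut read = 0; // next unexamined byte; always a char boundary

    while read < bytes.len() {
        let len = utf8_len(bytes[read]);
        // Bytes at and after `read` are untouched original data, so this
        // slice is exactly one valid char.
        let ch = std::str::from_utf8(&bytes[read..read + len])
            .expect("input was a valid String")
            .chars()
            .next()
            .expect("slice holds one char");

        if pred(ch) {
            // Flush the whole retained run in one move, then skip `ch`.
            bytes.copy_within(run_start..read, write);
            write += read - run_start;
            extracted.push(ch);
            run_start = read + len;
        }
        read += len;
    }

    // Flush the final retained run and drop the leftover tail.
    bytes.copy_within(run_start..read, write);
    write += read - run_start;
    bytes.truncate(write);

    *s = String::from_utf8(bytes).expect("only whole chars were removed");
    extracted
}
```

With this layout the number of moves is proportional to the number of removed runs rather than the number of retained chars.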
Tracking issue: #154318