Skip to content

string_extract_if: initial implementation#154583

Open
GrigorenkoPV wants to merge 1 commit intorust-lang:mainfrom
GrigorenkoPV:string-extract-if
Open

string_extract_if: initial implementation#154583
GrigorenkoPV wants to merge 1 commit intorust-lang:mainfrom
GrigorenkoPV:string-extract-if

Conversation

@GrigorenkoPV
Copy link
Copy Markdown
Contributor

@GrigorenkoPV GrigorenkoPV commented Mar 30, 2026

Tracking issue: #154318

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 30, 2026
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@GrigorenkoPV GrigorenkoPV marked this pull request as ready for review March 30, 2026 22:55
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 30, 2026
@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Mar 30, 2026

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: libs
  • libs expanded to 7 candidates

@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Mar 30, 2026
@rustbot

This comment has been minimized.

@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Apr 1, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

/// During the iteration, the underlying vector's consists of:
/// - A valid UTF-8 prefix (`valid_prefix.len()` bytes)
/// of characters that we iterated over and didn't extract.
/// - A middle portion of `bytes_removed` initialized bytes that might not be valid UTF-8.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we only ever remove a full char, how can we break the UTF-8 property? AFAICT, the Vec should always be initialized and a valid String. The removal operations this performs should correspond directly to calling String::drain(start..end) after a sequence of false returns. That is a tiny bit less efficient since it'll re-check the start/end characters are at a UTF-8 boundary, but that check is O(1) so it shouldn't be that much slower.

I think the current implementation doesn't actually optimize to copy larger chunks only when needed, it looks like we copy each char (likely via call to memmove) which seems likely to be pretty inefficient?

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants