perf: remove loop from `str::floor_char_boundary` #149466

overlookmotel · 2025-11-30T00:47:11Z

#144472 made str::floor_char_boundary a const function, but in doing so introduced a loop. This is unnecessary because the next UTF-8 character boundary can only be within a 4-byte range.

This produces excessive code for e.g. str.floor_char_boundary(20), because the loop is unrolled.

https://godbolt.org/z/5f3YsM6oK

This PR replaces the loop with 3 checks in series.

In addition to reducing code size in some cases, it also removes bounds checks from calling code when following floor_char_boundary with a call to String::truncate (which I assume might be a common pattern).

Notes

I tried using index.unchecked_sub(1), but found it doesn't remove bounds checks, whereas index.checked_sub(1).unwrap_unchecked() does. Surprising!
The assert_uncheckeds are required to elide checks from code following the floor_char_boundary call e.g.:

let index = string.floor_char_boundary(20);
string.truncate(index);

If this PR is accepted, I'll follow up with a similar PR for ceil_char_boundary.

Very happy to receive feedback and amend.

rustbot · 2025-11-30T00:47:15Z

r? @scottmcm

rustbot has assigned @scottmcm.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

clarfonthey · 2025-11-30T06:09:16Z

Rather than manually unrolling the code here, it might be worth considering if the original implementation can be made const instead:

rust/library/core/src/str/mod.rs

Lines 244 to 247 in 03b17b1

    
           let lower_bound = index.saturating_sub(3); 
        
           let new_index = self.as_bytes()[lower_bound..=index] 
        
               .iter() 
        
               .rposition(|b| b.is_utf8_char_boundary());

I had this exact thought in the original implementation, which is why I explicitly only checked the four following bytes for a character boundary. Note that I haven't fully scrutinised the code and yours may be better regardless, but I figured I'd mention this for some extra context. I would expect LLVM to automatically unroll a 4-max-iteration loop in most contexts.

quaternic · 2025-11-30T08:13:19Z

library/core/src/str/mod.rs

+        let bytes = self.as_bytes();
+        let boundary_index = if bytes[index].is_utf8_char_boundary() {
+            index
        } else {
-            let mut i = index;
-            while i > 0 {
-                if self.as_bytes()[i].is_utf8_char_boundary() {
-                    break;
+            // SAFETY: `bytes[index]` is a UTF-8 continuation byte, therefore there must be a byte before it.
+            // Note: `index.unchecked_sub(1)` would be preferable, but it doesn't the remove bounds check
+            // from `bytes[previous_index]`.
+            let previous_index = unsafe { index.checked_sub(1).unwrap_unchecked() };
+            if bytes[previous_index].is_utf8_char_boundary() {
+                previous_index
+            } else {
+                // SAFETY: `bytes[index - 1]` is a UTF-8 continuation byte, therefore there must be a byte before it
+                let previous_previous_index = unsafe { index.checked_sub(2).unwrap_unchecked() };
+                if bytes[previous_previous_index].is_utf8_char_boundary() {
+                    previous_previous_index
+                } else {
+                    // UTF-8 character sequences are at most 4 bytes long.
+                    // `bytes[index - 2]`, `bytes[index - 1]`, and `bytes[index]` are all continuation bytes,
+                    // therefore `bytes[index - 3]` must be the 1st byte in a 4-byte UTF-8 character sequence.
+                    debug_assert!(bytes[index - 3].is_utf8_char_boundary());
+                    index - 3
                }
-                i -= 1;
            }
+        };


The deep nesting isn't ideal. You could try matching slice patterns:

let bytes = &self.as_bytes()[..=index]; let boundary_index = match bytes { [.., b] if b.is_utf8_char_boundary() => index, [.., b, _] if b.is_utf8_char_boundary() => index - 1, [.., b, _, _] if b.is_utf8_char_boundary() => index - 2, [.., b, _, _, _] if b.is_utf8_char_boundary() => index - 3, // SAFETY: TODO _ => unsafe { debug_assert!(bytes[index.saturating_sub(3)].is_utf8_char_boundary()); unreachable_unchecked() } };

This is much nicer. I'll check if it optimizes equally as well.

That's definitely more elegant, but it's definitely non-obvious to me that it could optimize well, since each arm has bounds check requirements that cannot be deduced from the previous.

However, a failed bounds check implies the following ones also fail. That means that any bounds check that fails leads to the fallback arm, which is unreachable_unchecked. So, yes, it's extra steps compared to get_unchecked, but LLVM seems to deduce that fine: https://godbolt.org/z/jr48rjh9h (foo and bar get merged)

...oh, no, I was wrong. The unreachable_unchecked nukes any other checks. Cool!

(The scratch pad I was using: https://godbolt.org/z/46Msq6G85)

okaneco · 2025-11-30T19:01:28Z

You might want to add a codegen test under tests/codegen-llvm to make sure the panics/slice fail checks that you're targeting are optimized out.

At the time, I tried to add a counter variable to limit the number of iterations but I wasn't able to remove the panics like the current implementation does.

slice_new and truncate_new still have panic paths when the index is variable. Variable index slice doesn't have a panic in the current implementation. Variable truncate has a panic for both implementations.
https://godbolt.org/z/T1TexYbE7

I wrote an example codegen test to help with testing this out (others may be able to improve it further).
You can run the tests with ./x test [path to file or directory] (save time by iterating over the IR/assembly in godbolt first).
Again, the variable truncate may not be possible to enable currently.
https://rustc-dev-guide.rust-lang.org/tests/compiletest.html#codegen-tests

Example codegen test file

// Test that panics and slice errors are elided from `floor_char_boundary` implementation.

//@ compile-flags: -Copt-level=3

#![crate_type = "lib"]

// CHECK-LABEL: @floor_fixed_index
#[no_mangle]
pub fn floor_fixed_index(s: &str) -> usize {
    // CHECK-NOT: panic
    s.floor_char_boundary(20)
}

// CHECK-LABEL: @floor_variable_index
#[no_mangle]
pub fn floor_variable_index(s: &str, i: usize) -> usize {
    // CHECK-NOT: panic
    s.floor_char_boundary(i)
}

// CHECK-LABEL: @floor_slice_fixed_index
#[no_mangle]
pub fn floor_slice_fixed_index(s: &str) -> &str {
    // CHECK-NOT: panic
    // CHECK-NOT: slice_error_fail
    let index = s.floor_char_boundary(20);
    &s[..index]
}

// CHECK-LABEL: @floor_slice_variable_index
#[no_mangle]
pub fn floor_slice_variable_index(s: &str, i: usize) -> &str {
    // CHECK-NOT: panic
    // CHECK-NOT: slice_error_fail
    let index = s.floor_char_boundary(i);
    &s[..index]
}

// CHECK-LABEL: @truncate_from_floor_fixed_index
#[no_mangle]
pub fn truncate_from_floor_fixed_index(s: &mut String) {
    // CHECK-NOT: panic
    let index = s.floor_char_boundary_new(20);
    s.truncate(index)
}

// CHECK-LABEL: @truncate_from_floor_variable_index
#[no_mangle]
pub fn truncate_from_floor_variable_index(s: &mut String, i: usize) {
    // CHECK-NOT: panic
    let index = s.floor_char_boundary_new(i);
    s.truncate(index)
}

overlookmotel · 2025-11-30T19:05:10Z

slice_new and truncate_new still have panic paths when the index is variable. Variable index slice doesn't have a panic in the current implementation. Variable truncate has a panic for both implementations.

Yes, I noticed this today. Am hoping to eliminate all these panics. I feel like it should be possible, but... mysteries of the compiler...

scottmcm · 2025-12-06T08:38:24Z

I tried using index.unchecked_sub(1), but found it doesn't remove bounds checks, whereas index.checked_sub(1).unwrap_unchecked() does. Surprising!

This is the long-frustrating problem that sub nuw i32 %1, 1 gets turned into add i32 %1, -1 by LLVM and thus completely loses the "it's not going to overflow" information. (cc @nikic in case this real-life example is helpful)

Whereas x.check_sub(1).unwrap_unchecked() ends up with an llvm.assume(x >= 1) which doesn't get removed.

scottmcm · 2025-12-06T08:45:46Z

library/core/src/str/mod.rs

+            // Note: `index.unchecked_sub(1)` would be preferable, but it doesn't the remove bounds check
+            // from `bytes[previous_index]`.


It's not obvious to me that it's better to do an unwrap_unchecked on the subtraction than just to do a get_unchecked on the array, if the goal is to remove the bounds checking.

For example, this https://godbolt.org/z/d155x5hqj seems to work too:

impl StrExt for str { #[inline] fn floor_char_boundary_new(&self, index: usize) -> usize { if index >= self.len() { return self.len(); } let bytes = self.as_bytes(); let boundary_index = unsafe { if bytes.get_unchecked(index).is_utf8_char_boundary() { index } else if bytes.get_unchecked(index - 1).is_utf8_char_boundary() { index - 1 } else if bytes.get_unchecked(index - 2).is_utf8_char_boundary() { index - 2 } else if bytes.get_unchecked(index - 3).is_utf8_char_boundary() { index - 3 } else { unreachable_unchecked() } }; // Inform compiler that returned index is `<= index` and on a char boundary. // This removes bounds check from e.g. `String::truncate` call after this. // While it looks redundant, it seems to make a difference as load of // `bytes[index - 3]` is actually removed above, not kept for the assume. // SAFETY: Calculations above only deduct from `index`, and cannot wrap around. // `boundary_index` is a char boundary. unsafe { assert_unchecked(bytes.get_unchecked(boundary_index).is_utf8_char_boundary()); } boundary_index } }

scottmcm

That godbolt demo is quite persuasive to me that this should work differently, but I think the critical thing missing is a codegen-llvm test that confirms that the new implementation doesn't have those problems -- because without that, it's just going to get broken again like it did before.

View changes since this review

rustbot · 2025-12-06T09:16:52Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

perf: remove loop from str::floor_char_boundary

3c96dfa

rustbot assigned scottmcm Nov 30, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 30, 2025

quaternic reviewed Nov 30, 2025

View reviewed changes

scottmcm reviewed Dec 6, 2025

View reviewed changes

scottmcm requested changes Dec 6, 2025

View reviewed changes

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 6, 2025

		// Note: `index.unchecked_sub(1)` would be preferable, but it doesn't the remove bounds check
		// from `bytes[previous_index]`.

Uh oh!

perf: remove loop from str::floor_char_boundary #149466

Are you sure you want to change the base?

perf: remove loop from str::floor_char_boundary #149466

Conversation

overlookmotel commented Nov 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Notes

Uh oh!

rustbot commented Nov 30, 2025

Uh oh!

clarfonthey commented Nov 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

quaternic Nov 30, 2025

Choose a reason for hiding this comment

Uh oh!

overlookmotel Nov 30, 2025

Choose a reason for hiding this comment

Uh oh!

scottmcm Dec 6, 2025

Choose a reason for hiding this comment

Uh oh!

quaternic Dec 6, 2025

Choose a reason for hiding this comment

Uh oh!

scottmcm Dec 6, 2025

Choose a reason for hiding this comment

Uh oh!

okaneco commented Nov 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

overlookmotel commented Nov 30, 2025

Uh oh!

scottmcm commented Dec 6, 2025

Uh oh!

scottmcm Dec 6, 2025

Choose a reason for hiding this comment

Uh oh!

scottmcm Dec 6, 2025

Choose a reason for hiding this comment

Uh oh!

scottmcm left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rustbot commented Dec 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

perf: remove loop from `str::floor_char_boundary` #149466

perf: remove loop from `str::floor_char_boundary` #149466

overlookmotel commented Nov 30, 2025 •

edited

Loading

clarfonthey commented Nov 30, 2025 •

edited

Loading

okaneco commented Nov 30, 2025 •

edited

Loading

scottmcm left a comment •

edited

Loading