Proposal
Problem statement
As of today, changing the case of a String is done by using methods defined on str, which implies that a new buffer is allocated each time.
Yet, in most use cases, upper- or lower-casing a UTF-8 string can be done in place. Indeed, the set of code points whose UTF-8 encoding needs strictly more or fewer bytes after a case change is small, and smaller still when restricted to commonly used languages (see https://raw.githubusercontent.com/krtab/utf8caseinplace/master/stats/out.txt).
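To get a feel for how rare length-changing case mappings are, the sketch below counts, using only the standard library's `char::to_uppercase` and `char::to_lowercase`, how many Unicode scalar values change their UTF-8 byte length when case-mapped. It is not part of the proposal; the helper names are hypothetical, and the counts depend on the Unicode version shipped with the toolchain. Note that a mapping to several characters does not necessarily change the length: 'ß' (2 bytes) uppercases to "SS" (2 bytes).

```rust
/// Hypothetical helper: does uppercasing `c` change its UTF-8 byte length?
fn uppercase_changes_len(c: char) -> bool {
    // `to_uppercase` may yield up to three chars; sum their encoded lengths.
    let upper_len: usize = c.to_uppercase().map(char::len_utf8).sum();
    upper_len != c.len_utf8()
}

/// Hypothetical helper: does lowercasing `c` change its UTF-8 byte length?
fn lowercase_changes_len(c: char) -> bool {
    let lower_len: usize = c.to_lowercase().map(char::len_utf8).sum();
    lower_len != c.len_utf8()
}

fn main() {
    // A RangeInclusive<char> iterates over all scalar values, skipping surrogates.
    let upper = ('\0'..=char::MAX).filter(|&c| uppercase_changes_len(c)).count();
    let lower = ('\0'..=char::MAX).filter(|&c| lowercase_changes_len(c)).count();
    println!("uppercasing changes the byte length of {upper} scalar values");
    println!("lowercasing changes the byte length of {lower} scalar values");

    // ASCII case mapping never changes the byte length.
    assert!(('\0'..='\x7f').all(|c| !uppercase_changes_len(c) && !lowercase_changes_len(c)));
}
```

Examples of characters that do change length include 'ı' (U+0131, 2 bytes), which uppercases to 'I' (1 byte), and the Kelvin sign 'K' (U+212A, 3 bytes), which lowercases to 'k' (1 byte).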
This ACP therefore proposes adding a new API to String that changes case efficiently by consuming self and reusing its buffer, so that no allocation is needed in most cases.
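The simplest instance of "consume self and reuse the buffer" can already be approximated on stable Rust for pure-ASCII strings. The hypothetical helper below (not part of the proposal) uses `str::is_ascii` and `str::make_ascii_uppercase`, which work in place, and falls back to the allocating `str::to_uppercase` otherwise; the proposed API would extend the in-place path to non-ASCII text.

```rust
/// Hypothetical helper, not a std API: uppercases `s`,
/// reusing its buffer when the string is pure ASCII.
fn into_uppercase_ascii_fast_path(mut s: String) -> String {
    if s.is_ascii() {
        // ASCII case mapping never changes the byte length, so edit in place.
        s.make_ascii_uppercase();
        s
    } else {
        // Non-ASCII input: fall back to the allocating str method.
        s.to_uppercase()
    }
}

fn main() {
    let s = String::from("hello");
    let ptr = s.as_ptr();
    let upper = into_uppercase_ascii_fast_path(s);
    // The result lives in the same allocation as the input.
    assert_eq!(upper.as_ptr(), ptr);
    assert_eq!(upper, "HELLO");
}
```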
Motivating examples or use cases
Case changes that can consume the String they are working on include:
- `to_string().to_uppercase()`: https://github.com/search?q=lang%3ARust+to_string%28%29.to_uppercase%28%29&type=code (also sometimes found as `format!(...).to_uppercase()`: https://github.com/facebook/hhvm/blob/9004eeeb255e06f2459ea3b5c40e1dc558f3b136/hphp/hack/src/hackc/print_expr/print.rs#L318-L324)
- `env::var("FOO")?.to_uppercase() == "SOMETHING"` (https://github.com/ddisqq/qiskit-terra/blob/3095955244ace26ed38d1af25fe5f0033246bfd7/src/lib.rs#L31-L41)
And likely others I could not search for.
Solution sketch
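This would add the following methods (names to be determined):

```rust
// Proposed signatures only; bodies are left to the implementation discussion.
impl String {
    pub fn into_uppercase(self) -> String;
    pub fn into_lowercase(self) -> String;
}
```

The exact implementation remains to be discussed, but the idea is that the case change is done in place wherever possible. When it is not (because a character's new encoding is longer than the old one and would overwrite bytes that have not been read yet), an auxiliary double-ended queue (deque) can be used to store those bytes temporarily.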
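One way the deque idea could work is sketched below. It is a hypothetical free function (`into_uppercase_sketch` and `next_input_byte` are illustrative names, not std APIs), because inherent methods cannot be added to `String` outside the standard library. It reuses the `String`'s allocation, writes uppercase bytes over the input in place, and moves unread input bytes into a `VecDeque` only when a longer encoding is about to overwrite them. It is a correctness sketch, not a tuned implementation, and only uppercasing is shown: a lowercase variant would also need to reproduce the final-sigma rule that `str::to_lowercase` applies, which a plain char-by-char mapping does not.

```rust
use std::collections::VecDeque;

/// Returns the next unread input byte: stashed bytes first, then `buf[*read]`.
fn next_input_byte(stash: &mut VecDeque<u8>, buf: &[u8], read: &mut usize) -> u8 {
    if let Some(b) = stash.pop_front() {
        return b;
    }
    let b = buf[*read];
    *read += 1;
    b
}

/// Hypothetical sketch: uppercases `s` while reusing its buffer.
fn into_uppercase_sketch(s: String) -> String {
    let mut buf = s.into_bytes(); // take over the String's allocation
    let orig_len = buf.len();
    let mut stash: VecDeque<u8> = VecDeque::new(); // displaced, still-unread input
    let mut read = 0; // next input byte in `buf` that is neither read nor stashed
    let mut write = 0; // next output position in `buf`

    while !stash.is_empty() || read < orig_len {
        // Decode one char; its width comes from the UTF-8 lead byte.
        let first = next_input_byte(&mut stash, &buf, &mut read);
        let width = match first {
            0x00..=0x7F => 1,
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            _ => 4, // 0xF0..=0xF4 in valid UTF-8
        };
        let mut encoded = [first, 0, 0, 0];
        for slot in &mut encoded[1..width] {
            *slot = next_input_byte(&mut stash, &buf, &mut read);
        }
        let c = std::str::from_utf8(&encoded[..width])
            .expect("input was valid UTF-8")
            .chars()
            .next()
            .expect("exactly one char was decoded");

        // `to_uppercase` may yield several chars (e.g. 'ß' -> "SS").
        for upper in c.to_uppercase() {
            let mut tmp = [0u8; 4];
            let bytes = upper.encode_utf8(&mut tmp).as_bytes();
            let end = write + bytes.len();
            // Stash unread input bytes that this write would overwrite.
            while read < orig_len && read < end {
                stash.push_back(buf[read]);
                read += 1;
            }
            // Grow only when the output has outrun the current length.
            if end > buf.len() {
                buf.resize(end, 0);
            }
            buf[write..end].copy_from_slice(bytes);
            write = end;
        }
    }

    buf.truncate(write); // the output may be shorter than the input
    String::from_utf8(buf).expect("output is valid UTF-8")
}

fn main() {
    let input = "ȿtraße ŉ";
    let expected = input.to_uppercase();
    assert_eq!(into_uppercase_sketch(input.to_string()), expected);
    println!("{expected}");
}
```

In this sketch the deque stays empty as long as the output never gets ahead of the input, and `Vec::resize` only reallocates if the output exceeds the buffer's existing capacity.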
Alternatives
This could be done in a crate, but people would use it much less and would resort to str::to_uppercase/str::to_lowercase instead. Moreover, there is no guarantee that an up-to-date version of the Unicode database, allowing correct case changes in all situations, is available on crates.io, and the core::unicode functions used by str::to_uppercase/str::to_lowercase are not public. Finally, having both implementations in std helps keep them in sync and behaving identically.
Links and related work
- Zulip thread: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/In.20place.20String.20case.20change
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.