Windows: set main thread name without re-encoding #123534
bors merged 3 commits into rust-lang:master
rustbot has assigned @Mark-Simulacrum.
    /// Const convert UTF-8 to UTF-16, for use in the wide_str macro.
    ///
    /// Note that this is designed for use in const contexts so is not optimized.
    pub const fn to_utf16<const UTF16_LEN: usize>(s: &str) -> [u16; UTF16_LEN] {
        let mut output = [0_u16; UTF16_LEN];
        let mut pos = 0;
        let s = s.as_bytes();
        let mut i = 0;
        while i < s.len() {
            match s[i].leading_ones() {
                // Decode UTF-8 based on its length.
                // See https://en.wikipedia.org/wiki/UTF-8
                0 => {
                    // ASCII is the same in both encodings
                    output[pos] = s[i] as u16;
                    i += 1;
                    pos += 1;
                }
                2 => {
                    // Bits: 110xxxxx 10xxxxxx
                    output[pos] = ((s[i] as u16 & 0b11111) << 6) | (s[i + 1] as u16 & 0b111111);
                    i += 2;
                    pos += 1;
                }
                3 => {
                    // Bits: 1110xxxx 10xxxxxx 10xxxxxx
                    output[pos] = ((s[i] as u16 & 0b1111) << 12)
                        | ((s[i + 1] as u16 & 0b111111) << 6)
                        | (s[i + 2] as u16 & 0b111111);
                    i += 3;
                    pos += 1;
                }
                4 => {
                    // Bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                    let mut c = ((s[i] as u32 & 0b111) << 18)
                        | ((s[i + 1] as u32 & 0b111111) << 12)
                        | ((s[i + 2] as u32 & 0b111111) << 6)
                        | (s[i + 3] as u32 & 0b111111);
                    // re-encode as UTF-16 (see https://en.wikipedia.org/wiki/UTF-16)
                    // - Subtract 0x10000 from the code point
                    // - For the high surrogate, shift right by 10 then add 0xD800
                    // - For the low surrogate, take the low 10 bits then add 0xDC00
                    c -= 0x10000;
                    output[pos] = ((c >> 10) + 0xD800) as u16;
                    output[pos + 1] = ((c & 0b1111111111) + 0xDC00) as u16;
                    i += 4;
                    pos += 2;
                }
                // valid UTF-8 cannot have any other values
                _ => unreachable!(),
            }
        }
        output
    }
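
To make the surrogate-pair arithmetic in the 4-byte branch concrete, here is a minimal sketch that isolates those three steps in a standalone helper. The name `surrogate_pair` and the constant `GRINNING_FACE` are hypothetical and not part of the PR; the helper assumes its input is already a valid code point in the range U+10000..=U+10FFFF.

```rust
/// Hypothetical helper (not from the PR): split a supplementary-plane code
/// point into a UTF-16 surrogate pair using the same steps as the 4-byte
/// branch of `to_utf16`.
///
/// Assumes `code_point` is in U+10000..=U+10FFFF; smaller values would
/// underflow the subtraction.
const fn surrogate_pair(code_point: u32) -> (u16, u16) {
    // Step 1: subtract 0x10000, leaving a 20-bit value.
    let c = code_point - 0x10000;
    // Step 2: the top 10 bits, offset by 0xD800, form the high surrogate.
    let high = ((c >> 10) + 0xD800) as u16;
    // Step 3: the low 10 bits, offset by 0xDC00, form the low surrogate.
    let low = ((c & 0x3FF) + 0xDC00) as u16;
    (high, low)
}

// Evaluated at compile time: U+1F600 becomes (0xD83D, 0xDE00).
const GRINNING_FACE: (u16, u16) = surrogate_pair(0x1F600);
```

For U+1F600 the intermediate value is 0xF600; its top 10 bits are 0x3D and its low 10 bits are 0x200, which gives the pair 0xD83D, 0xDE00. This matches what `char::encode_utf16` produces at runtime.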
nice work. I feel like at least some of this should be using more public std API instead of a bunch of sorcerous isopsephia, but I looked for equivalents and couldn't find any in the stdlib, so this will do for now.

@bors r=workingjubilee

Okay, following fmease's explanation I think using a decl macro would be fine since we're std and get to use nightly features when we want:
`wide_str!` creates a null terminated UTF-16 string whereas `utf16!` just creates a UTF-16 string without adding a null.
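
The difference between the two macros can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: it uses stable `macro_rules!` rather than the nightly decl macros (macros 2.0) discussed here, the length helper `utf16_len` is an assumed name, and `to_utf16` is a condensed variant of the function reviewed above so that the block compiles on its own.

```rust
/// Assumed helper: count the UTF-16 code units needed for `s`.
const fn utf16_len(s: &str) -> usize {
    let b = s.as_bytes();
    let (mut i, mut len) = (0, 0);
    while i < b.len() {
        match b[i].leading_ones() {
            1 => {}        // continuation byte: part of an earlier code point
            4 => len += 2, // 4-byte sequence: needs a surrogate pair
            _ => len += 1, // ASCII, 2-byte or 3-byte sequence: one unit
        }
        i += 1;
    }
    len
}

/// Condensed sketch of the const UTF-8 to UTF-16 conversion.
const fn to_utf16<const N: usize>(s: &str) -> [u16; N] {
    let b = s.as_bytes();
    let mut out = [0u16; N];
    let (mut i, mut pos) = (0, 0);
    while i < b.len() {
        // `&str` is always valid UTF-8, so a lead byte has 0, 2, 3 or 4 leading ones.
        let ones = b[i].leading_ones();
        let len = if ones == 0 { 1 } else { ones as usize };
        // Keep only the payload bits of the lead byte.
        let mut c = (b[i] as u32) & (0xFF_u32 >> (ones + 1));
        let mut k = 1;
        while k < len {
            c = (c << 6) | (b[i + k] as u32 & 0x3F);
            k += 1;
        }
        if c < 0x10000 {
            out[pos] = c as u16;
            pos += 1;
        } else {
            let c = c - 0x10000;
            out[pos] = ((c >> 10) + 0xD800) as u16;
            out[pos + 1] = ((c & 0x3FF) + 0xDC00) as u16;
            pos += 2;
        }
        i += len;
    }
    out
}

/// UTF-16 string with no terminator added.
macro_rules! utf16 {
    ($s:expr) => {{
        const UTF8: &str = $s;
        const LEN: usize = utf16_len(UTF8);
        const UTF16: [u16; LEN] = to_utf16::<LEN>(UTF8);
        &UTF16
    }};
}

/// Null-terminated UTF-16 string: append "\0" first, then convert.
macro_rules! wide_str {
    ($s:literal) => {
        utf16!(concat!($s, "\0"))
    };
}
```

With these definitions, `wide_str!("main")` evaluates to a `&'static [u16; 5]` ending in `0`, while `utf16!("main")` evaluates to a `&'static [u16; 4]`. This sketch does not reject interior nulls in the argument to `wide_str!`; a real implementation would probably want that check.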

Ok, I've rewritten it to use macros 2.0. I did the same for both macros for the sake of consistency.

Yay! (I don't mean to be annoying, I just don't think we should embrace a fragile proliferation of underscores if we don't have to.) @bors r+
…llaumeGomez

Rollup of 7 pull requests

Successful merges:

- rust-lang#118391 (Add `REDUNDANT_LIFETIMES` lint to detect lifetimes which are semantically redundant)
- rust-lang#123534 (Windows: set main thread name without re-encoding)
- rust-lang#123659 (Add support to intrinsics fallback body)
- rust-lang#123689 (Add const generics support for pattern types)
- rust-lang#123701 (Only assert for child/parent projection compatibility AFTER checking that theyre coming from the same place)
- rust-lang#123702 (Further cleanup cfgs in the UI test suite)
- rust-lang#123706 (rustdoc: reduce per-page HTML overhead)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of rust-lang#123534 - ChrisDenton:name, r=workingjubilee

Windows: set main thread name without re-encoding

As a minor optimization, we can skip the runtime UTF-8 to UTF-16 conversion.
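
As a rough illustration of what skipping the runtime conversion means, the sketch below contrasts the two approaches. Both names are hypothetical, and the Windows call that would consume the string is not shown. It assumes the thread name is the ASCII literal "main", so each byte widens directly to one UTF-16 unit.

```rust
/// Hypothetical runtime approach: re-encode the name on every call,
/// allocating a new buffer and appending the null terminator.
fn wide_at_runtime(name: &str) -> Vec<u16> {
    name.encode_utf16().chain(std::iter::once(0)).collect()
}

/// Hypothetical compile-time approach: the null-terminated UTF-16 data is
/// baked into the binary, so nothing is encoded or allocated at startup.
static MAIN_WIDE: [u16; 5] = [b'm' as u16, b'a' as u16, b'i' as u16, b'n' as u16, 0];
```

Both produce the same five code units; the static version simply moves the work to compile time, which is what the `wide_str!` macro discussed in this PR automates.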