Tracking issue: UTF-8 decoder in libcore #33906

@strake

Update (@SimonSapin): this is now the tracking issue for these items in both core::char and std::char:

  • decode_utf8() which takes an iterable of u8 and returns a DecodeUtf8
  • DecodeUtf8 which implements Iterator<Item=Result<char, InvalidSequence>>
  • InvalidSequence which is opaque
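
To make the shape of these three items concrete, here is a minimal sketch written against stable Rust only. It is not the libcore implementation: the names `decode_utf8`, `DecodeUtf8`, and `InvalidSequence` are taken from the list above, but the bodies, the `Peekable` wrapper, and the `decode_lossy` helper are assumptions made for illustration. The byte ranges follow the well-formed UTF-8 table of the Unicode Standard, and a byte that cannot continue the current sequence is left unconsumed so that it starts the next item.

```rust
use std::iter::Peekable;

/// Opaque error type, as described in the issue (illustrative definition).
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidSequence(());

/// Illustrative iterator adapter; the real libcore type may differ.
pub struct DecodeUtf8<I: Iterator<Item = u8>>(Peekable<I>);

pub fn decode_utf8<I: IntoIterator<Item = u8>>(bytes: I) -> DecodeUtf8<I::IntoIter> {
    DecodeUtf8(bytes.into_iter().peekable())
}

impl<I: Iterator<Item = u8>> Iterator for DecodeUtf8<I> {
    type Item = Result<char, InvalidSequence>;

    fn next(&mut self) -> Option<Self::Item> {
        let first = self.0.next()?;
        // (continuation bytes expected, payload bits of the lead byte,
        //  allowed range of the second byte)
        let (extra, init, lo, hi) = match first {
            0x00..=0x7F => return Some(Ok(first as char)),
            0xC2..=0xDF => (1, (first & 0x1F) as u32, 0x80, 0xBF),
            0xE0 => (2, 0, 0xA0, 0xBF), // rejects overlong 3-byte forms
            0xE1..=0xEC | 0xEE..=0xEF => (2, (first & 0x0F) as u32, 0x80, 0xBF),
            0xED => (2, 0x0D, 0x80, 0x9F), // rejects surrogates
            0xF0 => (3, 0, 0x90, 0xBF), // rejects overlong 4-byte forms
            0xF1..=0xF3 => (3, (first & 0x07) as u32, 0x80, 0xBF),
            0xF4 => (3, 4, 0x80, 0x8F), // rejects values above U+10FFFF
            _ => return Some(Err(InvalidSequence(()))),
        };
        let mut code = init;
        for i in 0..extra {
            // Only the second byte has a restricted range.
            let (lo, hi) = if i == 0 { (lo, hi) } else { (0x80, 0xBF) };
            match self.0.peek() {
                Some(&b) if lo <= b && b <= hi => {
                    code = (code << 6) | (b & 0x3F) as u32;
                    self.0.next();
                }
                // Leave the offending byte in place for the next call.
                _ => return Some(Err(InvalidSequence(()))),
            }
        }
        Some(char::from_u32(code).ok_or(InvalidSequence(())))
    }
}

/// Hypothetical helper: maps every error to U+FFFD, similar in spirit to
/// String::from_utf8_lossy.
pub fn decode_lossy<I: IntoIterator<Item = u8>>(bytes: I) -> String {
    decode_utf8(bytes).map(|r| r.unwrap_or('\u{FFFD}')).collect()
}
```

A caller that wants the lossy behaviour requested in the original issue can map each `Err` to U+FFFD, as `decode_lossy` does; a caller that wants strict decoding can stop at the first `Err` instead.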

Original issue:

libcore has a facility to encode a character as UTF-8 (char::EncodeUtf8), but none to decode a character from potentially invalid UTF-8, returning U+FFFD (0xFFFD) when it reads an invalid sequence. As a libcore user I find this a surprising omission, given that libstd has string::String::from_utf8_lossy.

These options came to mind:

  • A function str::next_code_point_lossy (or similar), which behaves like str::next_code_point but checks whether its input is valid and returns 0xFFFD if it is not
  • An iterator DecodeUtf8 which one can make from an arbitrary iterator of bytes, which decodes them
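
As a rough illustration of the first option, the sketch below decodes one code point from the front of a byte slice using only the stable `std::str::from_utf8` and `Utf8Error` APIs. The function name is borrowed from the proposal above, but its signature (a byte slice in, the character and the number of bytes consumed out) and the `decode_all_lossy` helper are assumptions, not an existing libcore API.

```rust
/// Hypothetical helper: decodes one code point from the front of `bytes`.
/// Returns the character (U+FFFD for invalid input) and the number of
/// bytes consumed, or None if `bytes` is empty.
fn next_code_point_lossy(bytes: &[u8]) -> Option<(char, usize)> {
    if bytes.is_empty() {
        return None;
    }
    // A UTF-8 sequence is at most 4 bytes long, so a 4-byte window suffices.
    let window = &bytes[..bytes.len().min(4)];
    let valid = match std::str::from_utf8(window) {
        Ok(s) => s,
        // The window starts with valid text; keep only that prefix.
        Err(e) if e.valid_up_to() > 0 => {
            std::str::from_utf8(&window[..e.valid_up_to()]).unwrap()
        }
        // Invalid at the very start. error_len() is None when the input
        // ends in the middle of a sequence; in that case skip what is left.
        Err(e) => {
            let skipped = e.error_len().unwrap_or(window.len());
            return Some(('\u{FFFD}', skipped));
        }
    };
    let c = valid.chars().next()?;
    Some((c, c.len_utf8()))
}

/// Hypothetical helper: repeatedly applies next_code_point_lossy.
fn decode_all_lossy(mut bytes: &[u8]) -> String {
    let mut out = String::new();
    while let Some((c, n)) = next_code_point_lossy(bytes) {
        out.push(c);
        bytes = &bytes[n..];
    }
    out
}
```

This variant needs the whole input as a slice, whereas the iterator option works on any stream of bytes; that difference is the main trade-off between the two proposals.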

Metadata

    Labels

      • B-unstable: Blocker: Implemented in the nightly compiler and unstable.
      • C-tracking-issue: Category: An issue tracking the progress of sth. like the implementation of an RFC
      • T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.
      • final-comment-period: In the final comment period and will be merged soon unless new substantive objections are raised.