Strings::normalize() should normalize into UTF-8 NFC #149

@jkuchar

  • bug report? no
  • feature request? yes
  • version: 2.4

Description

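The same text can be encoded in UTF-8 in more than one way. A character such as Å can be stored either precomposed, as a single code point, or decomposed, as a base letter followed by a combining mark. As a result, comparing two strings is not as simple or intuitive as one might expect.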

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes nette/utils is installed via Composer

use Nette\Utils\Strings;

$char_A_ring = "\xC3\x85";                // Å 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_combining_ring_above = "A\xCC\x8A"; // Å as 'A' (U+0041) + 'COMBINING RING ABOVE' (U+030A)

assert(Strings::compare($char_A_ring, $char_combining_ring_above)); // fails
assert(Strings::normalize($char_A_ring) === Strings::normalize($char_combining_ring_above)); // fails

assert($char_A_ring === $char_combining_ring_above); // fails: the byte sequences differ
```
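To see why all three checks fail, it helps to look at the raw bytes: the precomposed Å is one code point (2 bytes in UTF-8), while the decomposed form is two code points (3 bytes). PHP's intl extension provides `Normalizer::normalize()`, which maps both forms to the same canonical byte sequence. A minimal check, assuming the intl extension is installed (the variable names are illustrative):

```php
<?php
// Requires PHP's intl extension, which provides the Normalizer class.
$precomposed = "\xC3\x85";  // U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
$decomposed  = "A\xCC\x8A"; // U+0041 'A' followed by U+030A COMBINING RING ABOVE

// Same visible character, different bytes: 2 bytes vs 3 bytes.
var_dump(strlen($precomposed), strlen($decomposed)); // int(2) int(3)

// NFC composes A + U+030A into U+00C5; NFD decomposes U+00C5 into A + U+030A.
$nfcA = Normalizer::normalize($precomposed, Normalizer::FORM_C);
$nfcB = Normalizer::normalize($decomposed, Normalizer::FORM_C);
$nfdA = Normalizer::normalize($precomposed, Normalizer::FORM_D);
$nfdB = Normalizer::normalize($decomposed, Normalizer::FORM_D);

var_dump($nfcA === $nfcB, bin2hex($nfcA)); // bool(true) string(4) "c385"
var_dump($nfdA === $nfdB, bin2hex($nfdA)); // bool(true) string(6) "41cc8a"
```

Either form makes the two strings byte-identical, which is why the choice between NFC and NFD matters less than applying one of them consistently.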

I would expect the comparison after Strings::normalize() to succeed, but it does not. I would expect Strings::normalize() to convert everything into a single Unicode normalization form, NFC or NFD (which one does not matter much). Currently there is no guarantee that two canonically equivalent strings will be treated as equal.
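One way to get the requested behaviour is to run an NFC pass and compare the results. The sketch below is hypothetical: `normalizeNfc()` and `unicodeEquals()` are not part of Nette, they assume the intl extension, and they do not reproduce the other clean-up that `Strings::normalize()` performs.

```php
<?php
/**
 * Hypothetical helper (not part of Nette): converts a UTF-8 string to NFC so that
 * canonically equivalent strings become byte-identical.
 * Returns the input unchanged if intl is missing or the input is not valid UTF-8.
 */
function normalizeNfc(string $s): string
{
    if (!class_exists(Normalizer::class)) {
        return $s; // intl extension not installed: normalization is not possible
    }
    $n = Normalizer::normalize($s, Normalizer::FORM_C);
    return $n === false ? $s : $n; // Normalizer::normalize() returns false on error
}

// Hypothetical equality check built on top of the helper.
function unicodeEquals(string $a, string $b): bool
{
    return normalizeNfc($a) === normalizeNfc($b);
}

var_dump(unicodeEquals("\xC3\x85", "A\xCC\x8A")); // bool(true)
```

NFC is used here because it is usually the more compact form; NFD would work equally well for equality checks. The fallback keeps the helper from failing when intl is unavailable, at the cost of silently skipping normalization in that case.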

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
