Schrödinger's 😻 and outside-the-box naming
What's in a string? That depends on whom you ask, apparently, as Fedora recently learned when it unexpectedly ran into a problem with the release name for the upcoming Fedora 19, "Schrödinger's Cat", and the unusual characters it contains. Typographic oddities might seem like a trivial reason to upend the distribution release process, but a validation tool in the bug reporting system objected to the name, so Fedora developers found themselves asking whether it was more practical to stop and fix all of the utilities, or to change the release name itself.
The problem, of course, is that unlike previous Fedora release names,
"Schrödinger's Cat" contains some characters outside of the
basic Latin alphabet: an o with an umlaut and an apostrophe.
But the issue encountered in the wild was narrower than that: the
"apostrophe" in question is frequently typed as the similar-looking but
different single-quote character, and quotes can wreak havoc when the
release name is processed by a shell script.
On March 16, Adam Williamson reported a bug in
the Fedora bug reporting tool: when reporting a bug against Fedora 19,
the server side threw an error when it tried to validate the name of
the release, complaining of "illegal characters".
The root of the bug was quickly traced to libreport, which contains an is_text_file() function. The function decides whether a given file is text by checking whether more than 2% of its bytes are greater than 0x80. Two percent is a rather arbitrary limit, and in this case the file triggering the error was /etc/os-release, which consisted of a single line:
Fedora release 19 (Schrödinger's Cat)
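To make the 2% rule concrete, here is a minimal Python sketch of a byte-ratio heuristic of the kind described above, set beside the stricter test of whether the bytes decode as UTF-8 at all. It is not libreport's code (libreport is written in C); the function names and the threshold parameter are hypothetical, and the sketch assumes every byte of 0x80 or above counts as non-ASCII.

```python
# Minimal sketch, NOT libreport's implementation: a byte-ratio "is this
# text?" heuristic versus an actual UTF-8 validity check. Function names
# and the threshold parameter are hypothetical.

RELEASE_LINE = "Fedora release 19 (Schrödinger's Cat)\n"


def non_ascii_ratio(data: bytes) -> float:
    """Fraction of bytes that are 0x80 or above (assumed 'non-ASCII' test)."""
    if not data:
        return 0.0
    high = sum(1 for b in data if b >= 0x80)
    return high / len(data)


def looks_like_text(data: bytes, threshold: float) -> bool:
    """Heuristic: call it text if the non-ASCII share is at most `threshold`."""
    return non_ascii_ratio(data) <= threshold


def is_valid_utf8(data: bytes) -> bool:
    """Stricter check: the bytes must decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    utf8 = RELEASE_LINE.encode("utf-8")      # 'ö' becomes 0xC3 0xB6
    latin1 = RELEASE_LINE.encode("latin-1")  # 'ö' becomes the lone byte 0xF6

    print(f"{non_ascii_ratio(utf8):.1%}")    # 5.1% (2 of 39 bytes)
    print(looks_like_text(utf8, 0.02))       # False: trips the 2% limit
    print(looks_like_text(utf8, 0.10))       # True: passes at 10%
    print(is_valid_utf8(utf8))               # True

    # Passes the ratio test, yet is not UTF-8 at all.
    print(looks_like_text(latin1, 0.10))     # True
    print(is_valid_utf8(latin1))             # False
```

With a 2% limit, the UTF-8 release line (two non-ASCII bytes out of 39, about 5.1%) is rejected; at 10% it passes. The Latin-1 encoding of the same line also passes the ratio test even though it is not valid UTF-8, which is the gap a real decoding check closes.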
Dave Malcolm pointed
out that the /etc/os-release manual page says non-alphanumeric
characters should be escaped "with backslashes, following shell
style", and Denys Vlasenko patched is_text_file() to bump the
threshold for non-ASCII bytes from 2% to 10%. But that fix was a simple
workaround; as others in the bug comments pointed out, the function
should test whether the contents of the file are really valid UTF-8
text, which the 0x80 test does not do.
Vlasenko did commit a more substantive patch a few days later, but
libreport was not the only utility to stumble when it
encountered the new release name. Another bug
opened by Williamson reported that grub2 also broke when it
encountered /etc/os-release, due to the unescaped
single-quote character.
Schrödinger, Schmodinger
On the Fedora development list, Sérgio Basto proposed one change that would solve both
problems (and, hopefully, any others stemming from the unusual release
name): formally change the release name from "Schrödinger's
Cat" to "Schrodingers Cat" or some similar variation that stuck to
pure ASCII characters. After all, as Chris Murphy commented, there are likely to be many
more utilities that cannot handle the release name, and the project
will continue to encounter them as the development cycle progresses.
But, to others, simply changing the release name amounts to
"papering over" the real issue, which is ensuring that the build and
QA tools can handle arbitrary UTF-8 text. Surely it is better to
spend a little time now fixing the issues than to avoid them, the
thinking went. Williamson, however, disagreed, calling it "a question
of priorities" in light of Fedora's human resources and release
schedule. Later, he elaborated that fixing UTF-8 support in the
problematic tools in separate branches would be acceptable, if it did
not slow down the release:
If we have to compromise on just papering it over for Alpha, I mean,
_fine_. But seriously: sometimes papering it over is just the right
thing to do.
Similarly, Chris Adams pointed out
that the deadline for adding new features for Fedora 19 had already
passed; adding UTF-8 support to a variety of tools may be important,
but there is no doubt that it amounts to a feature. But G. Wolfe
Woodbury contended that the real issue
was proper internationalization, and that "not defensively
programming for such cases is short-sighted."
Solutions and open questions
Jaroslav Reznik opened a Fedora
Engineering Steering Committee (FESCo) ticket on the
subject, offering two alternatives: fixing UTF-8 and
character-handling issues as they arise, or changing the release name
to something similar but less problematic (perhaps "Cat of
Schroedinger" or the proper German "Schroedinger Katze").
The discussion on the mailing list continued, including mention of
the very real risk that, after Fedora 18's lengthy delays, holding up
Fedora 19's release to fix a character string would amount to a
terrible public relations blunder. But Peter Jones found
a compromise solution and posted a
patch changing Schrödinger's Cat to Schrödinger’s
Cat in the affected files. The two strings may not look too
different (in fact, depending on one's font, they may look identical),
but the second replaces the "typewriter apostrophe" character at
code point U+0027 with the "punctuation apostrophe" at U+2019
(formally named RIGHT SINGLE QUOTATION MARK). The typewriter
apostrophe is interpreted as a shell quote character, but the
punctuation apostrophe is not. Rarely do the distinctions among
Unicode's byzantine slate of similar code points solve more problems
than they create (just look at curly versus straight quotes in
HTML, for example), but in this case, the change allowed
/etc/os-release to work once again. FESCo voted to approve
the apostrophe change and to fix any other UTF-8 support issues
encountered during the development cycle.
Of course, the apostrophe compromise leaves the potential for other
UTF-8 support issues to be encountered, and sidesteps the
quote-character issue. That bodes well for Fedora 19's release date not
getting pushed back due to a last-minute "umlaut bug," but it means less
rigorous testing of the release build tools. FESCo subsequently ruled
that future release names shall not include "shell metacharacters".
That is a practical trade-off; as several list members pointed out, by
changing the problematic string, an unknown number of character-handling
bugs may go undetected by Fedora, but they could still bite other
projects that use the Fedora tools. In the long run, the tools will
still need fixing.
In fact, some participants in the mailing list discussion proposed
adding non-alphanumeric characters to future release names just to see
what happens. Paul Flo Williams predicted that someone would propose
"Motörhead's Moshpit" as the Fedora 20 release name because
of the non-ASCII characters, while Richard M. Jones suggested ☃ (the Unicode "snowman"
character U+2603, also written as the HTML numeric character reference &#9731; or &#x2603;). Peter Robinson proposed the project go right for the
goal and choose "DROP table *;".
On the other side of the debate, some developers were less than
amused. Fedora has had its share of project members who object to
release names altogether; Jóhann B. Guðmundsson said:
They also get the benefit of fixing what breaks in the process.
Anti-release-name comments did not elicit much further debate, so it seems
likely that release names will cling on for at least one
more release cycle. But it is true that "Schrödinger's Cat" caused
some problems due to its unpredictable effect on development
and release tools. On the whole, though, the problems it revealed are
problems worth solving: there is no telling what characters
downstream spins and Fedora derivatives might put into a
string. The distribution will be better for catching and correcting
assumptions about character encodings and non-alphanumeric strings.
Robinson noted that Fedora 19's
release name was chosen roughly six months ago during the Fedora 18
Alpha period, yet it took that long for anyone to encounter a bug
related to it, a measure of how deeply buried the problem
was. A release name might be a lowly string, primarily chosen for
amusement value, but the issue should remind all distributions how
subtle such bugs can be, and Fedora clearly stands to benefit now that
the cat is out of the bag.
[Special hat tip to Don Marti for proposing
"Schrödinger's 😻" as an alternative name. "If you're going to do
Unicode, do Unicode."]
