Displaying tricky UTF-8 filenames with sane escape sequences
Suppose, in a UTF-8 locale, I do something like:
touch $'\u200d.txt'
The resulting file's name starts with a zero-width joiner (U+200D), which is apparently considered "printable" but, as its name suggests, is invisible and non-spacing in a terminal window. It is rather difficult to interact with: for example, the name seems to tab-complete only if I paste in an actual zero-width joiner (or use a custom input method to produce one), not if I use any kind of escape syntax for it.
This question, however, is specifically about displaying this file with ls. Normally the name just shows up as .txt, as if there were nothing before the dot. The -b and --quoting-style options don't help, either:
$ ls -b $'\u200d.txt'
.txt
$ ls --quoting-style=shell-escape-always $'\u200d.txt'
'.txt'
The only thing I found that gives any visual indication of the zero-width joiner is to change the locale first, but this is still unsatisfactory:
$ LC_ALL=C ls $'\u200d.txt'
''$'\342\200\215''.txt'
(That is: the individual bytes of the UTF-8-encoded filename are then escaped as octal sequences, and then the whole thing goes through more layers of quoting.)
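To check that those three octal escapes really are the UTF-8 encoding of U+200D, the raw bytes of the name can be dumped with od. This is only a small sketch; name_octal is a hypothetical helper name, and the test name is built with printf octal escapes so that the result does not depend on the current locale.

```shell
# name_octal: hypothetical helper that prints every byte of its argument
# as a three-digit octal number, separated by single spaces.
name_octal() (
    # od emits the bytes in octal; word splitting drops its padding.
    set -- $(printf '%s' "$1" | od -An -v -to1)
    printf '%s\n' "$*"
)

# Same bytes as $'\u200d.txt' in a UTF-8 locale.
name_octal "$(printf '\342\200\215.txt')"
# prints: 342 200 215 056 164 170 164
```

The first three values match the \342\200\215 that ls prints under LC_ALL=C; the remaining four are the ASCII bytes of ".txt".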
Is there any way I can get ls to show something like the original syntax that I used to create the file?
Ideally, and more generally, I would like to see \u escape sequences for bytes that form valid non-ASCII UTF-8 sequences, and \x hexadecimal escape sequences for bytes that are not valid UTF-8.
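To make the desired output concrete, the rules can be sketched as a POSIX shell function. This is not an ls feature: escape_name is a hypothetical helper, and a few details are assumptions that go beyond the question, namely that a literal backslash is doubled, that ASCII control characters are shown as \x escapes, and that code points above U+FFFF use the eight-digit \U form.

```shell
# escape_name: hypothetical helper, not part of ls or coreutils.
# Printable ASCII is kept, a backslash is doubled (assumption), valid
# non-ASCII UTF-8 becomes \uXXXX (or \UXXXXXXXX above U+FFFF), and every
# other byte, including ASCII control characters (assumption), becomes \xHH.
escape_name() (
    # od lists the raw bytes as decimal numbers, independent of the locale.
    set -- $(printf '%s' "$1" | od -An -v -tu1)
    out=
    while [ "$#" -gt 0 ]; do
        c=$1 len=0 cp=0 min=0
        if [ "$c" -ge 32 ] && [ "$c" -le 126 ]; then
            if [ "$c" -eq 92 ]; then
                out="$out\\\\"
            else
                oct=$(printf '%o' "$c")
                out=$out$(printf '%b' "\\0$oct")
            fi
            shift
            continue
        fi
        # The lead byte fixes the sequence length and the smallest code
        # point that is allowed to use that length.
        if [ "$c" -ge 194 ] && [ "$c" -le 223 ]; then
            len=2 cp=$(( c & 31 )) min=128
        elif [ "$c" -ge 224 ] && [ "$c" -le 239 ]; then
            len=3 cp=$(( c & 15 )) min=2048
        elif [ "$c" -ge 240 ] && [ "$c" -le 244 ]; then
            len=4 cp=$(( c & 7 )) min=65536
        fi
        ok=0
        if [ "$len" -gt 0 ] && [ "$#" -ge "$len" ]; then
            ok=1 k=1
            for t in "${2-}" "${3-}" "${4-}"; do
                [ "$k" -ge "$len" ] && break
                # Continuation bytes must look like 10xxxxxx.
                if [ $(( t & 192 )) -ne 128 ]; then ok=0; break; fi
                cp=$(( (cp << 6) | (t & 63) ))
                k=$(( k + 1 ))
            done
            # Reject overlong forms, surrogates, and values past U+10FFFF.
            if [ "$cp" -lt "$min" ] || [ "$cp" -gt 1114111 ] ||
               { [ "$cp" -ge 55296 ] && [ "$cp" -le 57343 ]; }; then
                ok=0
            fi
        fi
        if [ "$ok" -eq 1 ]; then
            if [ "$cp" -le 65535 ]; then
                out=$out$(printf '\\u%04x' "$cp")
            else
                out=$out$(printf '\\U%08x' "$cp")
            fi
            shift "$len"
        else
            out=$out$(printf '\\x%02x' "$c")
            shift
        fi
    done
    printf '%s\n' "$out"
)

escape_name "$(printf '\342\200\215.txt')"   # prints: \u200d.txt
escape_name "$(printf 'a\377b')"             # prints: a\xffb
```

Something like this could be applied to each name in a directory (for example from a for f in * loop), but it only post-processes names and does not change what ls itself prints.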
