Displaying tricky UTF-8 filenames with sane escape sequences

Suppose, in a UTF-8 locale, I do something like:

touch $'\u200d.txt'

The resulting filename starts with a zero-width joiner (U+200D), which is apparently considered "printable" but, as its name suggests, is invisible and non-spacing in a terminal window. The file is rather difficult to interact with; for example, the name seems to tab-complete only if I paste in an actual zero-width joiner (or use a custom input method to produce one), not if I use any kind of escape syntax for it.

This question, however, is specifically about displaying this file with ls. Normally the name just shows up as .txt, as if there were nothing before the dot. The -b and --quoting-style options don't help, either:

$ ls -b $'\u200d.txt'
.txt
$ ls --quoting-style=shell-escape-always $'\u200d.txt'
'‍.txt'

The only thing I found that gives any visual indication of the zero-width joiner is to change the locale first, but this is still unsatisfactory:

$ LC_ALL=C ls $'\u200d.txt'
''$'\342\200\215''.txt'

(That is: each byte of the UTF-8-encoded filename that is not plain ASCII is escaped as an octal sequence, and the whole thing then goes through more layers of quoting.)
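To make the octal output easier to read, here is a small Python sketch that reproduces those three bytes from the code point. The helper name `octal_bytes` is made up for illustration; it only shows the UTF-8 encoding step and says nothing about how ls produces its output internally.

```python
def octal_bytes(text: str) -> list[str]:
    """Return each UTF-8 byte of `text` as a three-digit octal string."""
    return [f"{byte:03o}" for byte in text.encode("utf-8")]


if __name__ == "__main__":
    # U+200D (zero-width joiner) encodes to the bytes E2 80 8D.
    print(octal_bytes("\u200d"))  # prints ['342', '200', '215']
```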

Is there any way I can get ls to show something like the original syntax that I used to create the file?

Ideally, I would like to see \u escape sequences for bytes that form valid non-ASCII UTF-8 sequences, and \x hexadecimal escape sequences for bytes that are invalid as UTF-8.
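For concreteness, the desired output could be sketched in Python as follows. This is not an ls option, just an illustration of the mapping under some assumptions of my own: the names `escape_name` and `list_escaped` are hypothetical, ASCII control characters are rendered as \x escapes, backslashes are doubled so the result stays unambiguous, and code points above U+FFFF use the eight-digit \U form.

```python
import os


def escape_name(raw: bytes) -> str:
    """Render a raw filename with \\u escapes for valid non-ASCII UTF-8
    and \\x escapes for bytes that are not valid UTF-8."""
    # surrogateescape maps each undecodable byte 0xNN to U+DCNN,
    # so invalid bytes can be told apart from real characters.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:      # byte that was invalid as UTF-8
            out.append(f"\\x{cp - 0xDC00:02x}")
        elif ch == "\\":                # assumption: double literal backslashes
            out.append("\\\\")
        elif 0x20 <= cp <= 0x7E:        # printable ASCII is shown as-is
            out.append(ch)
        elif cp < 0x80:                 # assumption: ASCII controls as \xNN
            out.append(f"\\x{cp:02x}")
        elif cp <= 0xFFFF:              # valid non-ASCII, four hex digits
            out.append(f"\\u{cp:04x}")
        else:                           # valid non-ASCII beyond the BMP
            out.append(f"\\U{cp:08x}")
    return "".join(out)


def list_escaped(directory: bytes = b".") -> list[str]:
    """List a directory's entries in escaped form (a bytes path makes
    os.listdir return raw bytes names)."""
    return sorted(escape_name(name) for name in os.listdir(directory))


if __name__ == "__main__":
    for shown in list_escaped():
        print(shown)
```

With this sketch, the file created above would be shown as \u200d.txt, and a name containing a stray 0xFF byte would show that byte as \xff.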
