
Safe handling and display of untrusted data in the terminal

+3
−0

Suppose I have a binary file with some "readable" content in it that I want to display and inspect in the terminal. It's not a problem if cat produces a mess of strange symbols, but it could be a problem (or even a security risk) if the output contains, in particular, ANSI control codes. At best they just mess up the terminal state (some of these effects are harder to recover from than others), but they can also be actively malicious.

More generally, even something that I expect to be an ordinary "text file" might not be; sometimes one will have to deal with arbitrary, untrusted data. (Aside from the above terminal vulnerabilities, plain text can also exploit various quirks of Unicode to misrepresent what is written. This is especially important if, for example, I'm trying to audit a shell script before running it.)

Are there any standard tools that mitigate the risks of trying to display an arbitrary file in the terminal? What about paging the file (more, less, etc.) or processing it to view partial contents (head, tail, cut, etc.)?

2 answers

+3
−0

xxd

xxd produces a hex dump of the input fed to it. For each byte of input, it prints two hex digits and a symbol: the ASCII character itself if the byte is printable, or a . as a placeholder for any other byte. This means it can't render Unicode, but it also can't emit anything potentially dangerous. (It does have options for colour output, but it fully controls which ANSI codes are produced.)

Processing file contents with tools like cut is safe in itself; the risk only arises when the result is actually written to the terminal. So, for example, one can cut a file and pipe the result to xxd.

xxd does not page the output, but its output can in turn be piped to a pager. Piping xxd output to head or tail will get a few lines of xxd's formatted output, which may have no correspondence to "lines" in the original file.
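A sketch of the pipelines described above (the file name untrusted.bin is a made-up stand-in for whatever untrusted data you have):

```shell
# Create a small stand-in for an untrusted file containing an ANSI colour code
printf 'one:\033[31mred\033[0m\ntwo:plain\n' > untrusted.bin

# Hex-dump it; for long files, pipe the dump through a pager,
# e.g.  xxd untrusted.bin | less
xxd untrusted.bin

# Select part of each line first, then hex-dump only that part
cut -d: -f1 untrusted.bin | xxd

# head/tail act on xxd's formatted lines (16 bytes each), not the file's lines:
# this shows only the first 16 bytes of the file
xxd untrusted.bin | head -n 1
```

In every pipeline the terminal only ever sees xxd's sanitised output, so the embedded escape sequence is displayed as hex and dots rather than interpreted.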

As an example:

$ echo -e '\e[1;31;103mHello\b\b\b\b\bこんにちは\e[0m' | xxd
00000000: 1b5b 313b 3331 3b31 3033 6d48 656c 6c6f  .[1;31;103mHello
00000010: 0808 0808 08e3 8193 e382 93e3 81ab e381  ................
00000020: a1e3 81af 1b5b 306d 0a                   .....[0m.

Without xxd, in a Unicode-aware terminal, the echo displays the Japanese text (overwriting and hiding the English), in an annoying colour combination.


+3
−0

You can use strings (part of GNU binutils, so it is usually available) to extract readable strings from binary files.

Summary from the manual:

For each file given, GNU strings prints the printable character sequences that are at least 4 characters long (or the number given with the options below) and are followed by an unprintable character.

Using it on your example:

$ echo -e '\e[1;31;103mHello\b\b\b\b\bこんにちは\e[0m' | strings
[1;31;103mHello
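The minimum sequence length mentioned in the manual can be adjusted with the -n option. A short sketch (the input bytes here are made up for illustration):

```shell
# With the default minimum of 4, only "abcd" qualifies;
# -n 2 lowers the threshold so the two-character run "ab" is reported too
printf 'ab\0abcd\0' | strings -n 2
```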