DomTerm: A web-based rich terminal emulator [LWN.net]

January 6, 2016

This article was contributed by Per Bothner

Many LWN readers likely still use their trusty terminal emulators a lot: a command console (read-eval-print loop or REPL) is a handy and useful way to explore a system, try out code, chat, and start programs with many options. Terminal emulators are also a useful platform for running interactive programs (Midnight Commander, Emacs, top, etc.), in a way that is economical of screen real estate without clutter. They are also keyboard-oriented, available for multiple platforms, and network-friendly.

However, terminal user interfaces (UIs) are rather minimalist, both as a REPL and as a platform, and make poor use of modern technology. You might like to use different fonts (including variable-width fonts) or embed images, icons, graphs, and tables. A terminal UI can provide input fields, and mouse-clickable buttons and menus, but interactive elements are otherwise limited, and the look-and-feel is sparse and restricted to working at the character-cell level (i.e. rows and columns). Anything beyond a simple REPL is tedious to program, requiring working with low-level character abstractions and libraries that few programmers know.

At the other extreme, the web platform (HTML5 and friends) is quite rich, flexible, and multi-platform. It also has lots of tooling and experienced programmers with multiple high-quality competing implementations. Many applications these days are web applications, which can now run standalone or disconnected. The web platform provides a nice toolkit and is especially strong for applications that are heavily text-oriented — like terminal emulators. Also note that Cascading Style Sheets (CSS) is a flexible way to model user preferences, such as fonts, colors, and sizes.

In other words: When implementing a terminal emulator, it makes a lot of sense to use the web platform as a rich-text toolkit even if all you want is a plain xterm replacement. Of course that means you would have the option to use all of the goodies of the web: graphics, fonts, and fancy event handlers. Another benefit is that you get a high-level data model and programming interface that works at the level of elements and events rather than characters.

Now it is nice to be able to run text-mode Emacs in a browser (as seen in a later screenshot). However, those of us who use terminals day-to-day want something that looks and acts like a general-purpose standalone terminal emulator. That means one without a URL bar or back button, for example, and that can be installed in /usr/bin. Luckily there are multiple embedded browsers available, which implement the web-technology stack as a library callable from a standalone program written in C, C++, Java, or even JavaScript. We also need it to be more-or-less xterm-compatible. Then we can add features to make use of the web platform. This is the path blazed by xterm: first implement a VTxxx/ANSI-compatible terminal emulator, and then add features (such as mouse handling) to take advantage of X11. The updated plan: start with a full-featured xterm clone, and then add functionality that takes advantage of the web platform.

DomTerm is my attempt to modernize and extend the terminal emulator using the web platform. The core of DomTerm is a JavaScript class that handles the terminal escapes and UI. The class can run as-is in a browser, or it can be used with an embedded browser to create a standalone application (with menus and chrome suited to a terminal emulator rather than a browser). DomTerm is so-named because it works by manipulating the Document Object Model (DOM) of a browser: the nested structure of elements, text, and attributes. (Also, most of the other appropriate names were already taken it seems.)

Beyond plain text

The previous screenshot shows some simple possibilities of DomTerm. Most obviously, you see printing of an SVG circle. The underlined text in "DomTerm is cool!" is a clickable link, printed out by the echo command above it. In general, if you send these characters to DomTerm (where \e is the ASCII escape character and \a is the alarm/bell character):

    \e]72;html-text\a

it will parse html-text as raw HTML and insert it in the display at the cursor position. The same screenshot has hide/show buttons — the output from ls -l is hidden. More about that later.
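As a minimal sketch of the application side, the following Python wraps an HTML fragment in that escape sequence. The function name wrap_html and the rejection of an embedded bell character are assumptions made for illustration; they are not part of DomTerm.

```python
ESC = "\x1b"  # \e in the notation used above
BEL = "\x07"  # \a in the notation used above


def wrap_html(html_text: str) -> str:
    """Wrap an HTML fragment in the sequence \\e]72;html-text\\a."""
    # A BEL inside the payload would terminate the sequence early, so this
    # sketch refuses it (an assumption; the article does not cover this case).
    if BEL in html_text:
        raise ValueError("payload must not contain the BEL character")
    return f"{ESC}]72;{html_text}{BEL}"
```

A program would write the returned string to its standard output, for example with sys.stdout.write(wrap_html("<b>DomTerm is cool!</b>")).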

The text colors you see (prompt text and input text) are not hardwired by the application or by DomTerm, but can be controlled by stylesheets (even after the fact). The style indicators and the hide/show buttons were enabled by adding some escape sequences to the PS1 prompt.

The second screenshot shows how a REPL can adapt to the richer DomTerm functionality. We see Kawa started from the shell running on DomTerm. The expressions evaluate Paintable values, which are graphical concepts like filled polygons. When a Paintable is printed, Kawa checks for a DOMTERM environment variable. If it is set, Kawa converts the Paintable to SVG, and sends the SVG (wrapped in an escape sequence) to DomTerm, which displays the image inline.

Kawa also has XML literals, which are expressions starting with #< that evaluate to Element objects. If you print an Element to a DomTerm console, it inserts its HTML representation. The following prints three values: a string, then an Element, then a string:

    #|kawa:2|# "Read " #<a href='http://lwn.net'>LWN</a> "."
    Read LWN.

Going further, IPython (Jupyter) shows us an enhanced REPL based on the web platform. It would be nice if you could run IPython directly in your terminal without having to fire up a browser window. One reason is seamless usage: imagine starting up a language REPL in your terminal emulator because you just want to do some simple calculations, and then you realize you want to see a graph visualization. Oops, better start over. It is also nice to have control over when you create a new window or tab, and when you continue in an existing window. Finally it would be nice to have the kind of notebook-saving, smart history, and line editing in all your console applications, such as the shell.

Structured output

A traditional terminal manages a fixed-size two-dimensional array of characters. A rich-text console, instead, works on a list of variable-length lines. Things get interesting when you add additional hierarchical structure. For example, lines associated with a single command can be grouped together. This is what enables hide/show buttons to control multiple lines. A planned extension to this concept would enable inspectors of tree-structured data. Details would be initially hidden, but when you click to show details, DomTerm would send a request to the application.

An application could optionally delimit each field in an output line in a separate <span> element, using the class attribute to mark the field's type. This can be used for styling (like syntax coloring, but more powerful) with styles that can be changed even after an application finishes. Semantic structuring of the output is useful for anything that wants to analyze or replay the session, and therefore makes a good base for a notebook (in the IPython sense). A plausible notebook document format is an HTML file (for simple documents) or a zip archive of an HTML file with images and other resources (for more complex documents).
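To make the idea of per-field markup concrete, here is a small Python sketch that renders one output line as a series of <span> elements. The helper name mark_fields and the class names in the usage example are hypothetical; DomTerm does not define a vocabulary of field types in this article.

```python
from html import escape


def mark_fields(fields):
    """Render (css_class, text) pairs as one line of <span> elements."""
    parts = []
    for css_class, text in fields:
        # Escape both the class name and the text so that field contents
        # can never be mistaken for markup.
        parts.append(
            f'<span class="{escape(css_class, quote=True)}">{escape(text)}</span>'
        )
    return " ".join(parts)
```

For example, mark_fields([("size", "4096"), ("name", "notes.txt")]) yields two spans that a stylesheet could color or align independently, even after the program has exited.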

In its day, XMLterm attempted to implement a similar concept, but it was before its time, and depended too much on Mozilla internals. The archived website suggests some ideas (such as pagelets) that could easily be handled by the DomTerm design. The author of XMLterm, Ramalingam Saravanan, more recently implemented GraphTerm.

Microsoft's PowerShell lets commands produce structured objects. When output is printed on a regular terminal, it is converted to plain text. We can do much better when printing to DomTerm. For example, we can create an HTML <table>. We can also change the output style dynamically; for example, we can re-layout old output when the window size changes.
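As an illustration of turning structured objects into a table rather than flat text, the following Python sketch renders a list of dictionaries as an HTML <table>. The function name to_html_table and the choice of dictionaries as the record type are assumptions for this example only.

```python
from html import escape


def to_html_table(rows):
    """Render a list of dicts (all with the same keys) as an HTML table."""
    if not rows:
        return "<table></table>"
    columns = list(rows[0].keys())
    header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(row[c]))}</td>" for c in columns) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"
```

Because the result is real markup, the browser engine takes care of column alignment and of laying the table out again when the window is resized.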

DomTerm architecture

The core of DomTerm is terminal.js, which is a class written in JavaScript. It receives characters from an application, interprets escape sequences, and updates the DOM structure of elements, attributes, and text. Those changes cause the browser engine to update the display, as controlled by the stylesheets. The DomTerm class also handles keyboard and mouse events, and sends them to the application. The public API for this class is extremely simple: you need to supply functions for the input and output character streams. There are functions for reporting special events, but the default implementation just encodes the events in the character streams.

This JavaScript can run in a general-purpose web browser such as Firefox or Chrome. Alternatively, an embedded browser library can be used. The library provides a basic GUI that wraps the guts of a browser, which lets the programmer add their own menus and default actions. In that case, the browser and the application can all reside in a single process.

The application can be any program that reads and writes character streams — by default a shell. On Unix-like systems you can optionally run the application attached to a PTY (pseudoterminal). Windows doesn't have true PTYs, so people use various hacks instead. If any application knows it is talking to DomTerm, it doesn't actually need PTYs, but could instead communicate directly.

The browser and the application could be separated by a network. Any character-stream connection can be used. The WebSocket protocol is attractive because it is built into modern web browsers. All you need to run DomTerm in a browser is a small HTML wrapper.

The DomTerm distribution includes various utility classes and programs to run an application (such as a shell) either integrated with an embedded browser, or started by a WebSocket server. In the latter case, you open a terminal by pointing your favorite browser to the server's address. These are just sample utilities; better ones will hopefully be written.

Keystrokes and line-editing

DomTerm handles keyboard input in two basic modes:

Character mode is like the terminal driver's raw mode: Each keystroke is sent directly to the backend. The front-end does not otherwise process the keystroke. Some keystroke events may send multiple characters. For example, the left-arrow key sends "\e[D" while shift-F5 sends "\e[15;2~". (It is unclear why that is; I just do what xterm does.)
Line mode is like the terminal driver's canonical mode: DomTerm maintains a current input line. This is a <span> element with the contenteditable="true" attribute. A keystroke will cause insertion, cursor motion, or other updates to the input line <span>. This can use either the browser's default action, or it can be overridden with custom editing actions. When the user types Enter, the contents of the input line are sent to the back-end, and a fresh empty input line is created. (I'm ignoring complications related to echoing.)
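The difference between the two modes can be modeled in a few lines of Python. This is a toy sketch, not DomTerm's JavaScript: the class KeyboardInput, its send callback, and the use of "\r" for the Enter key are all assumptions, and echoing and cursor motion are left out.

```python
class KeyboardInput:
    """Toy model of character mode versus line mode."""

    def __init__(self, send, mode="line"):
        self.send = send   # callback that delivers text to the back-end
        self.mode = mode   # "character" or "line"
        self.line = ""     # current input line (used in line mode only)

    def key(self, chars):
        if self.mode == "character":
            # Forward the keystroke's characters untouched.
            self.send(chars)
        elif chars == "\r":
            # Enter: ship the whole line, then start a fresh empty one.
            self.send(self.line + "\n")
            self.line = ""
        else:
            # Ordinary key: only the local input line changes.
            self.line += chars
```

In line mode the back-end sees nothing until Enter is pressed, which is what leaves room for local editing and history.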

Line mode provides the basic functionality of GNU readline or Emacs shell mode, including history, but no tab-completion support yet.

In addition, the PTY back-end supports automatic mode. This switches automatically between line and character mode depending on the inferior terminal: if it is in raw mode, the keystroke is handled in character mode; if it is in canonical mode, the keystroke is handled in line mode. (This requires a somewhat complicated dance, because the terminal driver doesn't notify us about its state — we have to ask it, after we get a keystroke event.)
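The "ask the terminal driver" step can be sketched with Python's standard termios module on a Unix-like system. The function name pick_mode is hypothetical, and DomTerm's own back-end is not written in Python; the sketch only shows the query itself.

```python
import termios


def pick_mode(fd):
    """Return "line" if the terminal on fd is canonical, else "character"."""
    # tcgetattr returns [iflag, oflag, cflag, lflag, ispeed, ospeed, cc];
    # index 3 holds the local-mode flags, which include ICANON.
    lflag = termios.tcgetattr(fd)[3]
    return "line" if lflag & termios.ICANON else "character"
```

A back-end would call this on the PTY's file descriptor each time a keystroke arrives, since the driver offers no notification when an application switches modes.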

Finally, pipe mode is like line mode, but the back-end is fed the data in a pipe, and does no echoing.

Other features — current and planned

DomTerm already has most of the functionality of a terminal emulator. It implements a large subset of base xterm: the basic functionality of emulators that set TERM=xterm-256color. Much (but not all) of the vttest torture test works. It runs Emacs, mc (Midnight Commander), and of course readline. The most important missing piece is probably xterm-style mouse-event handling, plus lots of testing and bug-fixing.

Many terminals, including xterm, remember line-wrapping (the difference between an explicit newline and a soft newline caused by wrapping around when writing past the right margin), which allows correct copy-and-paste. DomTerm also re-wraps old lines when the window is resized. This is highly desirable for a true console, but terminal emulators don't always do this. DomTerm also indicates line wraps with a special marker (controllable by a stylesheet), similar to the way Emacs does it.
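A short Python sketch shows why remembering soft versus hard line ends matters. It assumes fixed-width characters and a hypothetical representation of the display as (text, soft) pairs; DomTerm itself keeps this information in the DOM.

```python
def rewrap(logical_lines, width):
    """Split each logical line into rows of at most `width` characters.

    Returns (text, soft) pairs; soft is True when the row ends because of
    wrapping rather than because of an explicit newline.
    """
    rows = []
    for line in logical_lines:
        chunks = [line[i:i + width] for i in range(0, len(line), width)] or [""]
        for i, chunk in enumerate(chunks):
            rows.append((chunk, i < len(chunks) - 1))
    return rows


def copy_text(rows):
    """Rebuild the original text: only hard line ends become newlines."""
    return "".join(text + ("" if soft else "\n") for text, soft in rows)
```

Calling rewrap again with a new width models a window resize, and copy_text returns the same text at every width, which is the property that makes copy-and-paste come out right.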

In the future, I plan to do intelligent line-breaking based on the structure of the line contents: pretty-printing a complex value like a nested list means line-breaks in allowable places (indicated with escape sequences), so that each sub-value fits on as few lines as possible, with appropriate indentation and nesting. This too will automatically reflow on window resize.

A builtin paginator like the more program would be convenient. This is like old-fashioned XON/XOFF flow control: when there is more than a screenful of output since the last user interaction, further output is paused until the user asks for more. A problem with traditional flow control is that users don't expect it, and think the system is frozen. One solution is to write a button at the end of the output.
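The pausing behavior could look roughly like the following Python sketch. The class Paginator and its method names are hypothetical, and the built-in paginator is described here only as a wish, so this shows one possible policy rather than DomTerm's behavior.

```python
class Paginator:
    """Hold back output once a screenful has been shown since the last input."""

    def __init__(self, screen_rows):
        self.screen_rows = screen_rows
        self.shown = 0      # lines displayed since the last user interaction
        self.pending = []   # lines held back while paused

    def write(self, line):
        """Return the lines to display now (possibly none)."""
        if self.shown < self.screen_rows:
            self.shown += 1
            return [line]
        self.pending.append(line)
        return []

    def user_continue(self):
        """The user asked for more: release up to one further screenful."""
        release = self.pending[:self.screen_rows]
        self.pending = self.pending[self.screen_rows:]
        self.shown = len(release)
        return release

    @property
    def paused(self):
        return bool(self.pending)
```

While paused is true, the front-end would display its "more" indicator so that the user can tell the session is waiting and not frozen.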

An application may also want to update part of the previous output. A plain terminal does that using row/column addressing, but for complex structures it is better to identify a location logically. The plan is to support moving the cursor to an Element with a specified id attribute. We also want to allow replacing the contents of a named node. (Though there are some namespace and lifetime management issues of id attributes that would need to be resolved.)

There is also a need to be able to print buttons and other interactive elements. The plan is to define a protocol to associate an event handler with an element in the output. That would be done by associating a JavaScript event handler with the element. When the handler is triggered, it encodes the interesting properties of the event (probably using JSON), and sends it to the input of the user process.
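Since the protocol is still only planned, the following Python sketch simply shows what "encode the interesting properties of the event using JSON" might amount to. The field names id and type, the one-event-per-line framing, and both function names are assumptions, not a DomTerm format.

```python
import json


def encode_event(element_id, event_type, **details):
    """Encode a UI event as one line of JSON for the application's input."""
    payload = {"id": element_id, "type": event_type, **details}
    # sort_keys keeps the output deterministic; the newline ends the record.
    return json.dumps(payload, sort_keys=True) + "\n"


def decode_event(line):
    """Application side: parse one event line back into a dict."""
    return json.loads(line)
```

An application that printed a button with a known id would read such lines from its input and dispatch on the id and type fields.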

Application ideas

A terminal-using application that wishes to benefit from DomTerm should just continue doing normal terminal things, but in addition check that the DOMTERM environment variable is non-empty. When it makes sense it could then output images, links, interactive buttons, or switch to a non-monospace font.
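In Python, that check and the fallback to plain output could be sketched as below. The helper names in_domterm and render_link are hypothetical; the only things taken from the article are the DOMTERM environment variable and the \e]72;...\a sequence for inserting HTML.

```python
import os
from html import escape


def in_domterm(environ=os.environ):
    """True when the DOMTERM environment variable is set and non-empty."""
    return bool(environ.get("DOMTERM"))


def render_link(url, label, environ=os.environ):
    """Return a clickable link under DomTerm, plain text elsewhere."""
    if in_domterm(environ):
        # Escape both values so they cannot break out of the markup.
        html = f"<a href='{escape(url, quote=True)}'>{escape(label)}</a>"
        return f"\x1b]72;{html}\x07"
    return f"{label} <{url}>"
```

Passing the environment as a parameter keeps the functions easy to test; a real program would rely on the default.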

Consider info, the venerable GNU documentation browser, or the man command. If either detects DOMTERM, it could emit the HTML version of the documentation page in a much more readable format. It can still use the same key-bindings and user interface. (This would probably require some enhancements to the DomTerm byte-stream protocol, such as to communicate scrolling appropriately.)

Emacs in a DomTerm window could be nicer than using just a plain terminal emulator. For example, it could use a scrollbar. It could use images, icons, and variable-width fonts. It might want to use the DomTerm line-wrap indicator rather than its own, as the former would work better with selections and copying. The DomTerm protocol could be extended to support sub-windows, each with its own line-wrapping and scrollbar. Emacs would create a DomTerm sub-window for each Emacs window in the same top-level window (frame in Emacs-speak).

The Atom text editor framework is written in JavaScript. There is a terminal emulator component for it, but DomTerm would provide more functionality.

Status and future

This project is young. The code is quite functional, but you wouldn't want to use it for your day-to-day terminal emulator, unless you're adventurous and like to help and contribute. If so, please check out the source code and the home page with documentation.

DomTerm: A web-based rich terminal emulator

Posted Jan 7, 2016 10:45 UTC (Thu) by Seegras (guest, #20463) [Link] (12 responses)

Two questions:
- Performance. Speed? Memory-Footprint? Load?
- Security. Can we now become victims of cross-site scripting in our shells? ;)

DomTerm: A web-based rich terminal emulator

Posted Jan 7, 2016 18:16 UTC (Thu) by Per_Bothner (subscriber, #7375) [Link] (11 responses)

It's pretty zippy on Chrome. Doing ls -lR is pretty instantaneous. Profiling suggests the line-breaking calculation is a significant part of the execution time, because it forces the browser to prematurely calculate layout. That can be improved by doing it more lazily (for example at each line change rather than each style change). However, things are fast enough that it hasn't been a priority.

Doing ls -lR in Firefox may cause an annoying pause. Profiling so far hasn't given any clues.

Memory-footprint: Who cares? There are no big images, videos, or bloated JavaScript frameworks, so it's nothing compared to regular browsing. Likewise load is not a problem. (There is a bug in the Java WebSocket server that causes it to spin in certain situations, but that shouldn't affect normal usage. Regardless, I plan to replace it with a C-based server.)

Security: One new threat (besides those for any terminal emulator) is if you ssh into a compromised machine. You don't want it to be able to inject malicious JavaScript into your DomTerm session. A simple option is for DomTerm to not allow user-supplied JavaScript at all. Instead, the user program can register a call-back, and get an event notification. GraphTerm puts a cookie string in the environment. Without the cookie (which wouldn't be set if you ssh to another machine), it won't accept JavaScript.

DomTerm doesn't use any browser-side extensions, so you do get the regular JavaScript sandbox.

DomTerm: A web-based rich terminal emulator

Posted Jan 8, 2016 1:19 UTC (Fri) by joey (guest, #328) [Link] (10 responses)

No need to ssh into a compromised machine, catting a text file would be enough for an XSS attack, or even a PNG decoder attack in this terminal.

DomTerm: A web-based rich terminal emulator

Posted Jan 8, 2016 4:53 UTC (Fri) by Per_Bothner (subscriber, #7375) [Link] (9 responses)

"catting a text file would be enough for an XSS attack"
That is an argument for not allowing "printed" JavaScript to be imported to the browser - or at least being very cautious about it. I don't think it's an argument for not allowing "printed" HTML in the browser, since I don't think there is a fundamental difference (security-wise) between catting text and catting (JavaScript-free) HTML.

"even a PNG decoder attack"
I don't understand your point. You seem to recommend not running any programs that decode PNG images because the program might have an exploitable bug. Good luck with that.

It's called attack surface

Posted Jan 8, 2016 14:06 UTC (Fri) by hummassa (guest, #307) [Link] (8 responses)

> "even a PNG decoder attack"
> I don't understand your point. You seem to recommend not running any programs that decode PNG images because the program might have an exploitable bug. Good luck with that.

I think they recommend not to use any TERMINAL EMULATORS that decode PNG in the process of cat'ting a random text file.

It's called attack surface

Posted Jan 8, 2016 16:17 UTC (Fri) by Per_Bothner (subscriber, #7375) [Link] (7 responses)

"I think they recommend not to use any TERMINAL EMULATORS that decode PNG in the process of cat'ting a random text file."
First of all: Don't cat random text files - that can screw up any terminal emulator. Second: If you're worried about a terminal emulator that decodes PNGs, then don't use a web browser, because DomTerm is just a web browser application. DomTerm itself doesn't decode images. (Though I'm considering adding a decoder for Sixel graphics.)

It's called attack surface

Posted Jan 8, 2016 17:51 UTC (Fri) by hummassa (guest, #307) [Link] (6 responses)

That was a big non sequitur.

The problem is that, by using DomTerm, you are effectively using a web browser (and hence image decoding libraries, etc, with a huge attack surface) during the time you would be using a terminal emulator (that should have a much smaller attack surface, and that is -- usually -- used with greater confidence for that exact reason).

Anyway, I do cat/less/view README* files all the time. Don't you?

It's called attack surface

Posted Jan 8, 2016 18:33 UTC (Fri) by nybble41 (subscriber, #55106) [Link] (1 responses)

> Anyway, I do cat/less/view README* files all the time. Don't you?

The `less` and `view` commands, at least, will suppress escape codes by default, so there shouldn't be any issue there. Raw `cat` of untrusted files can be a problem, even for ordinary terminals. There have been security issues with escape codes in the past, e.g. "log to file" or "print" escapes which accepted a pathname parameter, allowing remote applications (or `cat`ed text files) to write to local paths they wouldn't otherwise have access to. However, I'm not worried about malicious image files so much as embedded scripts. All the input appears to the browser to be coming from the same local server, so there is no XSS protection, and scrubbing arbitrary HTML of anything that might contain executable code sounds like a lot of work, as code can appear in many places: <script> tags, event attributes, javascript: URLs, scripts embedded in SVG files, CSS expressions, etc. If any of that is missed, malicious code could run arbitrary commands as the user logged in to the terminal. The only secure solution is probably a whitelist of acceptable tags and attributes, and in many cases attribute values as well (e.g. URLs).

It's called attack surface

Posted Jan 8, 2016 19:05 UTC (Fri) by Per_Bothner (subscriber, #7375) [Link]

GraphTerm requires a magic cookie to be passed along with HTML (except certain limited exceptions). That seems a reasonable change that should substantially reduce the "attack surface": For example if you cat a file it won't have the magic cookie. I have not implemented the GraphTerm mechanism, but I'm hoping to, both for security reasons, and (longer-term) to port GraphTerm to use DomTerm (which has the blessing of GraphTerm's author).

It's called attack surface

Posted Jan 8, 2016 18:50 UTC (Fri) by Per_Bothner (subscriber, #7375) [Link] (3 responses)

"The problem is that, by using DomTerm, you are effectively using a web browser ... during the time you would be using a terminal emulator ... that is -- usually -- used with greater confidence."

Nobody (to a rough approximation) compartmentalizes their computer use that way.

"Anyway, I do cat/less/view README* files all the time. Don't you?"

Using less converts escape sequences to a harmless "quoted" form: Escape characters in the file do not get sent to the terminal - whether it's xterm or DomTerm. You should never cat a random file to the terminal — you never know what escape sequences might be in those files, and what they might do to your terminal.

It's called attack surface

Posted Jan 9, 2016 13:20 UTC (Sat) by hummassa (guest, #307) [Link]

> Nobody (to a rough approximation) compartmentalizes their computer use that way.

Should I consider myself insulted by the "Nobody"? ;)

> Using less converts escape sequences to a harmless "quoted" form

Somewhere in some .dotfiles one can have LESS=-r

> You should never cat a random file to the terminal — you never know what escape sequences might be in those files, and what they might do to your terminal.

That's my point. Before DomTerm, only a very devilishly (and somewhat visibly [*]) crafted text file would wreak such havoc in my terminal that it could find a security vulnerability and exploit it. After, a file with an img tag pointing to a (seemingly innocent) image can hide a serious exploit.

[*] if file(1) returns something different than ASCII text, ASCII English text, etc, for instance, it could be a clue; my file manager usually displays a different icon for each file.

It's called attack surface

Posted Jan 12, 2016 21:22 UTC (Tue) by ballombe (subscriber, #9523) [Link] (1 responses)

I run all my terminals in separated linux console and iceweasel in tty12, so this is pretty well compartmentalized.

It's called attack surface

Posted Jan 12, 2016 22:15 UTC (Tue) by Per_Bothner (subscriber, #7375) [Link]

Good for you. However, I stand by my claim: "Nobody (to a rough approximation) compartmentalizes their computer use that way." I'm pretty confident that well under 1% of those who actually use terminal emulators do what you do.

FWIW: I'm in the process of implementing an HTML scrubber. The parser and error framework seem to work. On error, the invalid text is inserted as plain text with a pink background, rather than silently failing. The policy is handled by two overridable functions (one to vet elements, one to vet attributes). Still to do is defining the default policy: which elements and which attributes to allow.

The parser and checker are pretty simple: 130 lines, not counting the policy functions. It is simple because I require well-formed XML, I disallow HTML comments or processing directives, and I don't need to interpret entity references. It's also very efficient: a single scan of the string, without use of regexps.
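The shape of such a scrubber can be sketched in Python. This is not the single-scan parser described above: it swaps in the standard xml.etree.ElementTree parser, and the whitelists, the http/https rule for href, and all function names are assumptions, since the default policy is explicitly still undefined.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical default policy; the real lists are still to be decided.
ALLOWED_ELEMENTS = {"a", "b", "i", "span", "div", "table", "tr", "td", "th"}
ALLOWED_ATTRIBUTES = {"class", "href"}


def allow_element(tag):
    """Policy hook: vet an element name."""
    return tag in ALLOWED_ELEMENTS


def allow_attribute(tag, name, value):
    """Policy hook: vet one attribute of an element."""
    if name not in ALLOWED_ATTRIBUTES:
        return False
    if name == "href":
        return value.startswith(("http://", "https://"))
    return True


def _check(element):
    if not allow_element(element.tag):
        return False
    for name, value in element.attrib.items():
        if not allow_attribute(element.tag, name, value):
            return False
    return all(_check(child) for child in element)


def scrub(fragment):
    """Return vetted markup, or the input as pink-highlighted plain text."""
    try:
        # Wrapping in <div> lets a fragment have several top-level nodes.
        root = ET.fromstring(f"<div>{fragment}</div>")
        ok = _check(root)
    except ET.ParseError:
        ok = False  # not well-formed XML
    if not ok:
        return f'<span style="background: pink">{escape(fragment)}</span>'
    # Re-serialize from the tree instead of echoing the input, so comments
    # are dropped and CDATA content comes out as escaped text.
    out = ET.tostring(root, encoding="unicode", short_empty_elements=False)
    return out[len("<div>"):-len("</div>")]
```

As in the description above, rejected input is shown rather than silently dropped, and the two allow_ functions are the only place where policy lives.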

DomTerm: A web-based rich terminal emulator

Posted Jan 14, 2016 10:03 UTC (Thu) by hitmark (guest, #34609) [Link] (3 responses)

I have in the past pondered a terminal/console that could do inline graphics, but going whole hog *ML/CSS seems like overkill to say the least.

DomTerm: A web-based rich terminal emulator

Posted Jan 14, 2016 16:22 UTC (Thu) by raven667 (subscriber, #5198) [Link]

It may be overkill compared to something purpose built for the requirements but it also uses existing technology that has more resources pushing its maturity than some bespoke system will ever have.

DomTerm: A web-based rich terminal emulator

Posted Feb 19, 2016 21:18 UTC (Fri) by ThomasBellman (guest, #67902) [Link] (1 responses)

> I have in the past pondered a terminal/console that could do
> inline graphics

You mean something like https://en.wikipedia.org/wiki/Tektronix_4010 ?

Also, the MGR windowing system (https://en.wikipedia.org/wiki/ManaGeR) used escape codes to do graphics within normal terminal windows.

DomTerm: A web-based rich terminal emulator

Posted Feb 20, 2016 3:16 UTC (Sat) by nybble41 (subscriber, #55106) [Link]

>> I have in the past pondered a terminal/console that could do inline graphics
> You mean something like https://en.wikipedia.org/wiki/Tektronix_4010 ?

Fun fact: xterm has a mode where it supports Tektronix escape codes, so most Linux users probably already have a terminal installed with support for inline graphics. Try installing the plotutils package to get some sample files, run "xterm -t", and send one of the samples directly to the terminal with:

    $ gzip -dc /usr/share/doc/plotutils/tek2plot/dmerc.tek.gz

(Debian here, but the same should work in other distros.) This should display a celestial tracking plot superimposed on a map of the globe.

Tektronix mode can also be enabled through the menus, but the -t option has the additional side-effect of setting TERM properly. There is no scrolling, so the window gradually fills up with superimposed graphics; press Ctrl-L or run the "clear" command to clear the window.

DomTerm: A web-based rich terminal emulator

Posted Jan 25, 2016 18:42 UTC (Mon) by b7j0c (guest, #27559) [Link]

i love this and have always wanted something "userspace" to augment the javascript console etc

if i understand correctly, the chromeos ssh extension also uses web technologies (possibly also NaCl)

DomTerm: A web-based rich terminal emulator

Posted Jan 26, 2016 21:31 UTC (Tue) by seanMcGrath (guest, #1563) [Link]

related: http://xiki.org/