DomTerm: A web-based rich terminal emulator
Many LWN readers likely still use their trusty terminal emulators a lot: a command console (read-eval-print loop or REPL) is a handy and useful way to explore a system, try out code, chat, and start programs with many options. Terminal emulators are also a useful platform for running interactive programs (Midnight Commander, Emacs, top, etc.), in a way that is economical of screen real estate without clutter. They are also keyboard-oriented, available for multiple platforms, and network-friendly.
However, terminal user interfaces (UIs) are rather minimalist, both as a REPL and as a platform, and make poor use of modern technology. You might like to use different fonts (including variable-width fonts) or embed images, icons, graphs, and tables. A terminal UI can provide input fields, and mouse-clickable buttons and menus, but interactive elements are otherwise limited, and the look-and-feel is sparse and restricted to working at the character-cell level (i.e. rows and columns). Anything beyond a simple REPL is tedious to program, requiring working with low-level character abstractions and libraries that few programmers know.
At the other extreme, the web platform
(HTML5 and friends) is quite rich, flexible,
and multi-platform.
It also has lots of tooling, many experienced programmers, and
multiple high-quality competing implementations.
Many applications these days are web applications,
which can now run standalone or disconnected.
The web platform provides a nice toolkit and is especially strong for
applications that are heavily text-oriented, like terminal emulators.
Also note that Cascading Style Sheets (CSS) is a flexible way to model user
preferences,
such as fonts, colors, and sizes.
In other words: When implementing a terminal emulator, it makes a lot of sense to use the web platform as a rich-text toolkit even if all you want is a plain xterm replacement. Of course that means you would have the option to use all of the goodies of the web: graphics, fonts, and fancy event handlers. Another benefit is that you get a high-level data model and programming interface that works at the level of elements and events rather than characters.
Now it is nice to be able to run text-mode Emacs in a browser
(as seen in a later screenshot).
However, those of us who use terminals day-to-day want something
that looks and acts like a general-purpose standalone terminal emulator.
That means one without a URL bar or back button, for example,
and that can be installed in /usr/bin.
Luckily there are multiple embedded
browsers available,
which implement the web-technology stack
as a library callable from a standalone program written
in C, C++, Java — or even JavaScript.
We also need it to be more-or-less xterm-compatible.
Then we can add features to make use of the web platform.
This is the path blazed by xterm: first implement a VTxxx/ANSI-compatible
terminal emulator, and then add features (such as mouse handling) to take
advantage of X11.
The updated plan: start with a full-featured xterm clone, and then
add functionality that takes advantage of the web platform.
DomTerm is my attempt to
modernize and extend the terminal emulator using the web platform.
The core of DomTerm is a JavaScript class that
handles the terminal escapes and UI.
The class can run as-is in a browser,
or it can be used with an embedded browser to create
a standalone application (with menus and chrome
suited to a terminal emulator rather than a browser).
DomTerm is so-named because it works
by manipulating the Document Object Model (DOM) of a browser:
the nested structure of elements, text, and attributes.
(Also, most of the other appropriate names were already taken, it seems.)
Beyond plain text
The previous screenshot shows some simple possibilities of DomTerm.
Most obviously, you see printing of an SVG circle.
The underlined text "DomTerm is cool!"
is a clickable link, printed out by the echo command above it.
In general, if you send these characters to DomTerm (where \e is the ASCII
escape character and \a is the alarm/bell character):
\e]72;html-text\a
it will parse html-text
as raw HTML and insert it into the display at the cursor position.
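As a minimal sketch (in Python, assuming the application simply writes to its standard output), wrapping an HTML fragment in this escape sequence takes one line; the function name `domterm_html` is made up for illustration:

```python
ESC = "\x1b"  # the ASCII escape character, written \e above
BEL = "\x07"  # the ASCII alarm/bell character, written \a above


def domterm_html(html_text: str) -> str:
    """Wrap raw HTML in the escape sequence DomTerm inserts at the cursor."""
    # The sequence is terminated by BEL, so a BEL inside the payload
    # would end it early; this sketch simply refuses such input.
    if BEL in html_text:
        raise ValueError("HTML payload must not contain a BEL character")
    return f"{ESC}]72;{html_text}{BEL}"
```

For example, `sys.stdout.write(domterm_html("<b>bold</b>"))` would ask DomTerm to insert a bold element at the cursor position.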
The same screenshot has hide/show buttons —
the output from ls -l is hidden.
More about that later.
The text colors you see (prompt text and input text) are
not hardwired by the application or by DomTerm,
but can be controlled by stylesheets (even after the fact).
The style indicators and the hide/show buttons were enabled
by adding some escape sequences to the PS1 prompt.
The second screenshot shows how a REPL
can adapt to the richer DomTerm functionality.
We see Kawa started from
the shell running on DomTerm.
The expressions evaluate to Paintable values,
which are graphical concepts like filled polygons.
When a Paintable is printed,
Kawa checks for a DOMTERM environment variable.
If it is set, Kawa converts the Paintable to SVG,
and sends the SVG (wrapped in an escape sequence) to DomTerm,
which displays the image inline.
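The same pattern can be sketched in Python for a hypothetical Paintable-like value, here just a list of polygon vertices. This is not Kawa's code: the helper names, the plain-text fallback, and the choice to send the SVG through the OSC 72 HTML-insertion sequence are all assumptions (inline SVG is valid inside HTML5, but Kawa's actual escape sequence may differ).

```python
import os


def polygon_svg(points, fill="blue"):
    """Render a filled polygon, given as (x, y) pairs, as an SVG element."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Size the canvas from the origin to the polygon's largest coordinates.
    width, height = max(xs), max(ys)
    coords = " ".join(f"{x},{y}" for x, y in points)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
            f'<polygon points="{coords}" fill="{fill}"/></svg>')


def format_paintable(points, env=os.environ):
    """Return SVG for DomTerm when DOMTERM is set, else a plain-text stand-in."""
    if env.get("DOMTERM"):
        # Assumption: the SVG travels in the OSC 72 HTML-insertion sequence.
        return "\x1b]72;" + polygon_svg(points) + "\x07"
    # Hypothetical fallback text for an ordinary terminal.
    return f"#<polygon with {len(points)} points>"
```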
Kawa also has XML literals, which are expressions starting with #<
that evaluate to Element objects. If you print
an Element to a DomTerm console, it
inserts its HTML representation. The following prints three values:
a string, then an Element, then a string:
#|kawa:2|# "Read " #<a href='http://lwn.net'>LWN</a> "."
Read LWN.
Going further, IPython (Jupyter) shows us an enhanced REPL based on the web platform. It would be nice if you could run IPython directly in your terminal without having to fire up a browser window. One reason is seamless usage: imagine starting up a language REPL in your terminal emulator because you just want to do some simple calculations, and then you realize you want to see a graph visualization. Oops, better start over. It is also nice to have control over when you create a new window or tab, and when you continue in an existing window. Finally, it would be nice to have notebook saving, smart history, and line editing in all your console applications, such as the shell.
Structured output
A traditional terminal manages a fixed-size two-dimensional array of
characters.
A rich-text console, instead, works on a list of variable-length lines.
Things get interesting when you add additional hierarchical structure.
For example, lines associated with a single command can
be grouped together. This is what enables hide/show buttons to
control multiple lines.
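One way to picture this grouping is a small data model in which each command owns its output lines and a hidden flag. DomTerm itself keeps this structure in DOM elements; the Python classes and the `[+]`/`[-]` markers below are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CommandGroup:
    """The output lines of one command, grouped so they hide or show together."""
    prompt: str
    lines: list = field(default_factory=list)
    hidden: bool = False

    def toggle(self):
        self.hidden = not self.hidden


def visible_lines(groups):
    """Flatten the groups into the rows a user would currently see."""
    rows = []
    for g in groups:
        # The prompt line, which carries the hide/show button, always stays visible.
        marker = "[+]" if g.hidden else "[-]"
        rows.append(f"{marker} {g.prompt}")
        if not g.hidden:
            rows.extend(g.lines)
    return rows
```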
A planned extension to this concept would enable inspectors
of tree-structured data.
Details would be initially hidden, but when you click to show details,
DomTerm would send a request to the application.
An application could optionally delimit
each field in an output line with a separate <span>
element, using the class attribute to mark the field's type.
This can be used for styling (like syntax coloring, but more powerful)
with styles that can be changed even after an application finishes.
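A sketch of what such an application might emit for one line of `ls -l`-style output; the class names (`permissions`, `size`, `filename`) are invented, since no naming convention is defined here:

```python
from html import escape


def fields_to_html(fields):
    """Wrap each (field_type, text) pair in a <span> whose class names the type."""
    spans = (f'<span class="{escape(kind)}">{escape(text)}</span>'
             for kind, text in fields)
    # Fields are separated by single spaces, as in plain columnar output.
    return " ".join(spans)
```

A stylesheet rule such as `span.filename { color: blue; }` could then restyle every file name already on the screen.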
Semantic structuring of the output is useful for anything that
wants to analyze or replay the session, and therefore
makes a good base for a notebook
(in the IPython sense).
A plausible notebook document format is an HTML file (for
simple documents) or a zip archive of an HTML file with
images and other resources (for more complex documents).
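Saving in either format is straightforward with Python's standard `zipfile` module. In this sketch the entry name `index.html` and the rule "zip only when there are extra resources" are assumptions, not a defined DomTerm format:

```python
import zipfile


def save_notebook(path, html, resources=None):
    """Save a session as plain HTML, or as a zip archive if it has resources."""
    if not resources:
        # Simple document: a single HTML file.
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)
        return
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("index.html", html)  # hypothetical name for the main document
        for name, data in resources.items():
            zf.writestr(name, data)      # images and other resources
```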
In its day, XMLterm attempted to implement a similar concept, but it was before its time, and depended too much on Mozilla internals. The archived website suggests some ideas (such as pagelets) that could easily be handled by the DomTerm design. The author of XMLterm, Ramalingam Saravanan, more recently implemented GraphTerm.
Microsoft's PowerShell lets commands produce structured objects. When output
is printed on a regular terminal, it is converted to plain text.
We can do much better when printing to DomTerm. For example, we
can create an HTML <table>. We can also change the
output style dynamically; for example, we can re-layout old output
when the window size changes.
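As a rough illustration, structured results modelled as Python dictionaries (standing in for PowerShell objects) could be turned into an HTML table like this; the resulting markup would then be sent to DomTerm with the HTML-insertion escape sequence. The function name is hypothetical.

```python
from html import escape


def records_to_table(records):
    """Render a list of dicts (structured command output) as an HTML <table>."""
    if not records:
        return "<table></table>"
    columns = list(records[0])  # column order taken from the first record
    head = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(r.get(c, '')))}</td>" for c in columns) + "</tr>"
        for r in records
    )
    return f"<table><tr>{head}</tr>{body}</table>"
```

Because the browser lays out the `<table>`, its columns can adapt when the window is resized without the application printing anything again.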
DomTerm architecture
The core of DomTerm is terminal.js,
which is a class written in JavaScript.
It receives characters from an application, interprets
escape sequences,
and updates the DOM structure of elements, attributes, and text.
Those changes cause the browser engine to update the display,
as controlled by the stylesheets.
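The first step, splitting the incoming characters into plain text and escape sequences, can be sketched with a regular expression. This handles only complete OSC (`ESC ]` ... `BEL`) and CSI (`ESC [` parameters, final byte) sequences and is not how terminal.js is written; a real emulator needs a state machine, because a sequence can be split across two reads.

```python
import re

# Either ESC ] code ; payload BEL (OSC) or ESC [ params final-byte (CSI).
_TOKEN = re.compile(r"\x1b\](\d+);(.*?)\x07|\x1b\[([0-9;?]*)([@-~])", re.DOTALL)


def tokenize(stream):
    """Split output into ('text', s), ('osc', code, payload), ('csi', params, final)."""
    tokens, pos = [], 0
    for m in _TOKEN.finditer(stream):
        if m.start() > pos:
            tokens.append(("text", stream[pos:m.start()]))
        if m.group(1) is not None:
            tokens.append(("osc", int(m.group(1)), m.group(2)))
        else:
            tokens.append(("csi", m.group(3), m.group(4)))
        pos = m.end()
    if pos < len(stream):
        tokens.append(("text", stream[pos:]))
    return tokens
```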
The DomTerm class also handles keyboard and mouse events,
and sends them to the application.
The public API for this class is extremely simple:
you need to supply functions for the input and output character streams.
There are functions for reporting special events,
but the default implementation just encodes the events in the
character streams.
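A Python analogue of that shape of API might look like the following. The class, its method names, and the `ESC ] 99 ; ...` encoding used for reporting events are all hypothetical; the point is only that everything, including special events, flows through a single character-stream callback supplied by the host.

```python
from typing import Callable


class TerminalFrontEnd:
    """Hypothetical front-end whose host supplies the stream to the application."""

    def __init__(self, send_to_app: Callable[[str], None]):
        self.send_to_app = send_to_app
        self.screen = []  # text received from the application (unparsed here)

    def receive_from_app(self, text: str) -> None:
        """Output stream: text the application wrote (escape handling omitted)."""
        self.screen.append(text)

    def key_typed(self, chars: str) -> None:
        """Input stream: keystrokes go straight to the application."""
        self.send_to_app(chars)

    def report_event(self, name: str, value: str) -> None:
        """Default event reporting: encode the event into the same stream.
        The ESC ] 99 ; name ; value BEL format is invented for this sketch."""
        self.send_to_app(f"\x1b]99;{name};{value}\x07")
```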
This JavaScript can run in a general-purpose web browser such as Firefox or Chrome. Alternatively, an embedded browser library can be used. The library provides a basic GUI that wraps the guts of a browser, which lets the programmer add their own menus and default actions. In that case, the browser and the application can both reside in a single process.
The application can be any program that reads and writes character streams (by default, a shell). On Unix-like systems you can optionally run the application attached to a PTY (pseudoterminal). Windows doesn't have true PTYs, so people use various hacks instead. If an application knows it is talking to DomTerm, it doesn't actually need a PTY; it could instead communicate with DomTerm directly.
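On a Unix-like system, Python's standard `pty` module is enough to sketch the PTY arrangement: the child process sees a terminal on its standard streams, while the parent reads everything it writes from the master side. This is a minimal illustration (it only collects output and never writes input), not DomTerm's back-end:

```python
import os
import pty


def run_on_pty(argv):
    """Run argv attached to a new pseudoterminal and return what it printed."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: stdin, stdout, and stderr are now the PTY's slave side.
        os.execvp(argv[0], argv)
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:
            # On Linux, reading the master raises EIO once the child is gone.
            break
        if not data:
            break
        chunks.append(data)
    os.close(master_fd)
    os.waitpid(pid, 0)  # reap the child process
    return b"".join(chunks).decode(errors="replace")
```

Because of the PTY's default output processing, `run_on_pty(["echo", "hello"])` typically returns `"hello\r\n"`, the carriage return having been added by the terminal driver.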
The browser and the application could be separated by a network. Any character-stream connection can be used. The WebSocket protocol is attractive because it is built into modern web browsers. All you need to run DomTerm in a browser is a small HTML wrapper.
The DomTerm distribution includes
various utility classes
and programs to run an application (such as a shell) either integrated
with an embedded browser, or started by a WebSocket server.
In the latter case, you open a terminal
by pointing your
favorite browser to the server's address. These are just sample utilities;
better ones will hopefully be written.
Keystrokes and line-editing
DomTerm handles keyboard input in two basic modes:
- Character mode is like the terminal driver's raw mode:
Each keystroke is sent directly to the back-end.
The front-end does not otherwise process the keystroke.
Some keystroke events send multiple characters.
For example, the left-arrow key sends "\e[D" while Ctrl-F5 sends "\e[15;5~". (It is unclear why that is; I just do what xterm does.)
- Line mode is like the terminal driver's canonical mode:
DomTerm maintains a current input line. This is a
<span> element with the contenteditable="true" attribute. A keystroke will cause insertion, cursor motion, or other updates to the input line <span>. This can use either the browser's default action, or it can be overridden with custom editing actions. When the user types Enter, the contents of the input line are sent to the back-end, and a fresh empty input line is created. (I'm ignoring complications related to echoing.)
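The character-mode encoding can be sketched as a lookup table. This covers only a few keys and assumes xterm's normal cursor-key mode (in application cursor-key mode the unmodified arrows send `ESC O` sequences instead); when a modifier is held, xterm adds a parameter equal to 1 plus a bitmask (Shift=1, Alt=2, Ctrl=4):

```python
# Final characters and codes xterm uses in normal cursor-key mode.
_CURSOR = {"Up": "A", "Down": "B", "Right": "C", "Left": "D"}
_FUNCTION = {"F5": 15, "F6": 17, "F7": 18, "F8": 19}


def encode_key(key, shift=False, alt=False, ctrl=False):
    """Return the characters sent to the back-end for one keystroke."""
    # Modifier parameter: 1 + (Shift=1) + (Alt=2) + (Ctrl=4).
    mod = 1 + (1 if shift else 0) + (2 if alt else 0) + (4 if ctrl else 0)
    if key in _CURSOR:
        final = _CURSOR[key]
        return f"\x1b[{final}" if mod == 1 else f"\x1b[1;{mod}{final}"
    if key in _FUNCTION:
        code = _FUNCTION[key]
        return f"\x1b[{code}~" if mod == 1 else f"\x1b[{code};{mod}~"
    if len(key) == 1:
        return key  # ordinary characters pass through unchanged in this sketch
    raise KeyError(f"no encoding for {key!r} in this sketch")
```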
Line mode provides the basic functionality of GNU readline or Emacs shell mode, including history, but no tab-completion support yet.
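A toy Python model of line mode may make the division of labour clearer: editing happens locally, and only a finished line is delivered to the back-end. The class and its methods are hypothetical and far simpler than a `contenteditable` element (no echo handling, and history only goes backwards):

```python
class LineEditor:
    """Toy line mode: edit locally, send the line on Enter, keep a history."""

    def __init__(self, send):
        self.send = send      # callback delivering a finished line to the back-end
        self.line = ""
        self.cursor = 0
        self.history = []
        self.hist_pos = 0     # equals len(history) when not browsing history

    def insert(self, text):
        self.line = self.line[:self.cursor] + text + self.line[self.cursor:]
        self.cursor += len(text)

    def left(self):
        self.cursor = max(0, self.cursor - 1)

    def history_prev(self):
        if self.hist_pos > 0:
            self.hist_pos -= 1
            self.line = self.history[self.hist_pos]
            self.cursor = len(self.line)

    def enter(self):
        # Only now does anything reach the back-end.
        self.send(self.line + "\n")
        if self.line:
            self.history.append(self.line)
        self.line, self.cursor = "", 0
        self.hist_pos = len(self.history)
```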
In addition, the PTY back-end supports automatic mode. This switches automatically between line and character mode depending on the inferior terminal: if it is in raw mode, the keystroke is handled in character mode; if it is in canonical mode, the keystroke is handled in line mode. (This requires a somewhat complicated dance, because the terminal driver doesn't notify us about its state — we have to ask it, after we get a keystroke event.)
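Asking the driver is a single `termios` call. The sketch below, which assumes a Unix-like system and a file descriptor for the PTY, checks the `ICANON` local flag after each keystroke to decide which mode to use; the function name and its return values are illustrative:

```python
import termios


def input_mode(fd):
    """Decide how automatic mode should treat the next keystroke for this PTY."""
    lflag = termios.tcgetattr(fd)[3]  # index 3 holds the local-mode flags (c_lflag)
    # Canonical mode means the application expects whole lines.
    return "line" if lflag & termios.ICANON else "char"
```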
Finally, pipe mode is like line mode, but the back-end is fed the data in a pipe, and does no echoing.
Other features — current and planned
DomTerm already has most of the functionality of a terminal emulator.
It implements a large subset of base xterm
— the basic
functionality of emulators that set TERM=xterm-256color.
Much (but not all) of the vttest torture test works.
It runs Emacs; mc (Midnight Commander) (pictured at right);
and of course readline.
The most important missing piece is probably xterm-style mouse-event
handling, plus lots of testing and bug-fixing.
Many terminals, including xterm, remember line-wrapping
(the difference between an explicit newline and a soft newline caused
by wrapping around when writing past the right margin),
which allows correct copy-and-paste.
DomTerm also re-wraps old lines when the window is resized.
This is highly desirable for a true console,
but terminal
emulators don't always do this. DomTerm also indicates line wraps with
a special marker (controllable by a stylesheet), similar to
the way Emacs does it.
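The key to re-wrapping is to keep the logical lines, and to record for each displayed row whether it ends in a soft wrap. A minimal sketch, assuming one character per cell (no wide or combining characters); the function names are hypothetical:

```python
def wrap_rows(logical_lines, width):
    """Lay out logical lines as rows of at most `width` cells.
    Each row records whether it ends in a soft wrap."""
    rows = []
    for line in logical_lines:
        chunks = [line[i:i + width] for i in range(0, len(line), width)] or [""]
        for i, chunk in enumerate(chunks):
            rows.append((chunk, i < len(chunks) - 1))  # (text, soft_wrapped)
    return rows


def copy_text(rows):
    """Rejoin rows, adding a newline only where there was a hard newline."""
    return "".join(text if soft else text + "\n" for text, soft in rows)
```

On a resize, the display is simply recomputed with `wrap_rows(lines, new_width)`, and `copy_text` joins soft-wrapped rows without inserting spurious newlines.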
In the future, I plan to do intelligent line-breaking
based on the structure of the line contents:
pretty-printing
a complex value like a nested list means
line-breaks in allowable places (indicated with escape sequences),
so that each sub-value
fits on as few lines as possible, with appropriate indentation and nesting.
This too will automatically reflow on window resize.
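The layout decision can be illustrated with a greedy rule: print a nested list on one line if it fits in the remaining width, otherwise put each element on its own line, indented past the opening parenthesis. This ignores the escape sequences that would mark break points and is a simplification of real pretty-printing algorithms (for instance, it does not count the closing parentheses that follow an element):

```python
def flat(value):
    """Single-line form: nested lists print as (a b c)."""
    if isinstance(value, list):
        return "(" + " ".join(flat(v) for v in value) + ")"
    return str(value)


def pretty(value, width, column=0):
    """Break a nested list across lines only where needed to fit in `width`."""
    text = flat(value)
    if not isinstance(value, list) or column + len(text) <= width:
        return text
    # Too wide: one element per line, indented one column past the "(".
    parts = [pretty(v, width, column + 1) for v in value]
    return "(" + ("\n" + " " * (column + 1)).join(parts) + ")"
```

Calling `pretty` again with the new width is all that reflowing on a resize would need in this model.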
A built-in paginator like the more program would
be convenient. This is like old-fashioned XON/XOFF flow control:
when there is more than a screenful of output since the last user
interaction, further output is paused until the user asks for more.
A problem with traditional flow control is that users don't expect it,
and think the system is frozen.
One solution is to write a button at the end of the output that the user can click to continue.
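A sketch of the pausing logic, counting lines since the last interaction; the class name, the `rows` parameter, and the idea that clicking the button calls `more()` are assumptions for illustration:

```python
class Paginator:
    """Hold back output once a screenful has arrived since the last interaction."""

    def __init__(self, rows):
        self.rows = rows
        self.since_interaction = 0
        self.pending = []  # lines held back while paused

    @property
    def paused(self):
        return bool(self.pending)

    def feed(self, line):
        """Return the lines that may be displayed now."""
        if self.paused or self.since_interaction >= self.rows:
            self.pending.append(line)
            return []
        self.since_interaction += 1
        return [line]

    def more(self):
        """The user asked for more: release up to another screenful."""
        self.since_interaction = 0
        released = []
        while self.pending and self.since_interaction < self.rows:
            released.append(self.pending.pop(0))
            self.since_interaction += 1
        return released
```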
An application may also want to update part of the previous output.
A plain terminal does that using row/column addressing,
but for complex structures it is better to identify a location logically.
The plan is to support moving the cursor to an Element
with a specified id attribute.
We also want to allow replacing the contents of a named node.
(Though there are some namespace and lifetime management issues of
id attributes that would need to be resolved.)
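On the terminal side, the replacement itself is a small tree operation. This sketch uses Python's `xml.etree.ElementTree` as a stand-in for the browser DOM (where `document.getElementById` would do the lookup); the escape sequence that would trigger it is not defined yet, so only the tree edit is shown. It assumes simple id values without quote characters:

```python
import xml.etree.ElementTree as ET


def replace_by_id(root, element_id, new_text):
    """Replace the contents of the element whose id attribute is element_id."""
    target = root.find(f".//*[@id='{element_id}']")
    if target is None:
        raise KeyError(element_id)
    for child in list(target):  # drop the old child elements
        target.remove(child)
    target.text = new_text
    return target
```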
There is also a need to be able to print
buttons and other
interactive elements. The plan is to define a protocol to associate
an event handler with an element in the output.
That would be done by associating a JavaScript event handler with the element.
When the handler is triggered, it encodes the interesting properties
of the event (probably using JSON), and sends it to the input of the
user process.
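If the encoding is JSON, both ends are short. In this sketch the event's fields (`id`, `type`, plus any extra properties) and the one-event-per-line framing are assumptions; how an application would distinguish such lines from ordinary typed input is left open, as it is in the plan described above.

```python
import json


def encode_event(element_id, event_type, **props):
    """Front-end side: serialize the interesting parts of an event as one JSON line."""
    return json.dumps({"id": element_id, "type": event_type, **props}, sort_keys=True) + "\n"


def handle_input_line(line, handlers):
    """Application side: decode an event line and dispatch it by element id."""
    event = json.loads(line)
    return handlers[event["id"]](event)
```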
Application ideas
A terminal-using application that wishes to benefit from DomTerm
should just continue doing normal terminal things, but in addition
check that the DOMTERM environment variable is non-empty.
When it makes sense it could then output images, links, interactive buttons,
or switch to a non-monospace font.
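For example, a program could emit a clickable link when running under DomTerm and fall back to plain text elsewhere. A sketch, assuming the HTML-insertion sequence `\e]72;...\a` described earlier; the function name and the fallback format are hypothetical, and `env` is a parameter only to make the function easy to test:

```python
import os
from html import escape


def format_link(url, text, env=None):
    """Return a clickable link for DomTerm, or plain text for other terminals."""
    env = os.environ if env is None else env
    if env.get("DOMTERM"):  # set and non-empty
        html = f'<a href="{escape(url)}">{escape(text)}</a>'
        return f"\x1b]72;{html}\x07"
    return f"{text} <{url}>"
```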
Consider info, the venerable GNU documentation browser,
or the man command.
If either detects DOMTERM, it could emit the HTML version of
the documentation page in a much more readable format.
It can still use the same key-bindings and user interface.
(This would probably require some enhancements to the DomTerm
byte-stream protocol, such as to communicate scrolling appropriately.)
Emacs in a DomTerm window could be nicer than using just a plain terminal emulator.
For example, it could use a scrollbar. It could use
images, icons, and variable-width fonts. It might want to use the
DomTerm line-wrap indicator rather than its own, as the former
would work better with selections and copying.
The DomTerm protocol could be extended to
support sub-windows, each with its own line-wrapping and scrollbar.
Emacs would create a DomTerm sub-window for each Emacs window in the same
top-level window (frame
in Emacs-speak).
The Atom text editor framework is written in JavaScript. There is a terminal emulator component for it, but DomTerm would provide more functionality.
Status and future
This project is young. The code is quite functional,
but you wouldn't want to use it for your day-to-day terminal emulator
— unless you're adventurous and like to help and contribute.
If so, please check out the source code
and the home page with documentation.
| Index entries for this article | |
|---|---|
| GuestArticles | Bothner, Per |
