<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://spidermonkey.dev/feed.xml" rel="self" type="application/atom+xml" /><link href="https://spidermonkey.dev/" rel="alternate" type="text/html" /><updated>2026-03-20T15:18:22+00:00</updated><id>https://spidermonkey.dev/feed.xml</id><title type="html">SpiderMonkey JavaScript/WebAssembly Engine</title><subtitle>SpiderMonkey is Mozilla&apos;s JavaScript and WebAssembly Engine, used in Firefox, Servo and various other projects. It is written in C++ and Rust.</subtitle><entry><title type="html">Flipping Responsibility for Jobs in SpiderMonkey</title><link href="https://spidermonkey.dev/blog/2026/01/15/job-responsibility.html" rel="alternate" type="text/html" title="Flipping Responsibility for Jobs in SpiderMonkey" /><published>2026-01-15T17:00:00+00:00</published><updated>2026-01-15T17:00:00+00:00</updated><id>https://spidermonkey.dev/blog/2026/01/15/job-responsibility</id><content type="html" xml:base="https://spidermonkey.dev/blog/2026/01/15/job-responsibility.html"><![CDATA[<p><em>This blog post is written both as a heads-up to embedders of SpiderMonkey, and an explanation of why the changes are coming</em></p>

<p>As an embedder of SpiderMonkey, one of the decisions you have to make is whether or not to provide your own implementation of the job queue.</p>

<p>The responsibility of the job queue is to hold pending jobs for <code class="language-plaintext highlighter-rouge">Promise</code>s, which in the HTML spec are called ‘microtasks’. For embedders, the status quo as of 2025 offered two options:</p>

<ol>
  <li>Call <a href="https://searchfox.org/firefox-main/rev/5917a9f2af3294b27a325371c5c499e7dd9554fd/js/src/jsfriendapi.h#203"><code class="language-plaintext highlighter-rouge">JS::UseInternalJobQueues</code></a>, and then at the appropriate point for your embedding, call <a href="https://searchfox.org/firefox-main/rev/5917a9f2af3294b27a325371c5c499e7dd9554fd/js/src/jsfriendapi.h#203"><code class="language-plaintext highlighter-rouge">JS::RunJobs</code></a>. This uses an internal job queue and drain function.</li>
  <li>Subclass and implement the <a href="https://searchfox.org/firefox-main/rev/5917a9f2af3294b27a325371c5c499e7dd9554fd/js/public/Promise.h#34"><code class="language-plaintext highlighter-rouge">JS::JobQueue</code> type</a>, storing and invoking your own jobs. An embedding might want to do this if they wanted to add their own jobs, or had particular needs for the shape of jobs and data carried alongside them.</li>
</ol>
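
<p>As a rough sketch (not a drop-in implementation; the exact set of virtual methods and their signatures should be checked against <code class="language-plaintext highlighter-rouge">Promise.h</code> for your SpiderMonkey revision), option 2 looks something like this:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Sketch only: verify method names and signatures against JS::JobQueue
// in js/public/Promise.h for your revision.
class MyJobQueue : public JS::JobQueue {
 public:
  JSObject* getIncumbentGlobal(JSContext* cx) override {
    return JS::CurrentGlobalOrNull(cx);  // embedding-specific in practice
  }

  bool enqueuePromiseJob(JSContext* cx, JS::HandleObject promise,
                         JS::HandleObject job, JS::HandleObject allocationSite,
                         JS::HandleObject incumbentGlobal) override {
    // Store the job (plus any embedding-specific data) in your own queue.
    jobs_.push_back(JS::Heap&lt;JSObject*&gt;(job));
    return true;
  }

  bool empty() const override { return jobs_.empty(); }

  void runJobs(JSContext* cx) override {
    // Drain the queue, invoking each stored job with JS::Call.
  }

 private:
  // The embedding owns the storage, so it must also trace these
  // pointers from its GC tracing hook.
  std::deque&lt;JS::Heap&lt;JSObject*&gt;&gt; jobs_;
};
</code></pre></div></div>

<p>The key point for what follows is that both the storage (<code class="language-plaintext highlighter-rouge">jobs_</code>) and the tracing responsibility sit with the embedding in this model.</p>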

<p>The goal of this blog post is to signal that SpiderMonkey’s handling of <code class="language-plaintext highlighter-rouge">Promise</code> jobs is changing over the next little while, and to explain some of the reasoning behind the change.</p>

<p>If you’ve chosen to use the internal job queue, almost nothing should change for your embedding. If you’ve provided your own job queue, read on:</p>

<h1 id="whats-changing">What’s Changing</h1>

<ol>
  <li>The actual type of a job from the JS engine is changing to be opaque.</li>
  <li>The responsibility for actually <em>storing</em> the <code class="language-plaintext highlighter-rouge">Promise</code> jobs is moving from the embedding into the engine, even in the case of an embedding-provided JobQueue.</li>
  <li>As a result of (1), the interface to <em>run</em> a job from the queue is also changing.</li>
</ol>

<p>I’ll cover this in a bit more detail, but a good chunk of the interface discussed is in <a href="https://searchfox.org/firefox-main/rev/8e6b6cb1dd0fdd9838e2359219e2b8d3b84490b2/js/public/friend/MicroTask.h"><code class="language-plaintext highlighter-rouge">MicroTask.h</code></a> (this link is to a specific revision because I expect the header to move).</p>

<p>For most embeddings the changes turn out to be very mechanical. If you have specific challenges with your embedding please reach out.</p>

<h2 id="job-type">Job Type</h2>

<p>The type of a JS <code class="language-plaintext highlighter-rouge">Promise</code> job has been a <code class="language-plaintext highlighter-rouge">JSFunction</code>, and thus invoked with <code class="language-plaintext highlighter-rouge">JS::Call</code>. The job type is changing to an opaque type. The external interface to this type will be <code class="language-plaintext highlighter-rouge">JS::Value</code> (<code class="language-plaintext highlighter-rouge">typedef</code>’d as <code class="language-plaintext highlighter-rouge">JS::GenericMicroTask</code>).</p>

<p>This means that if you’re an embedder who had been storing your own tasks in the same queue as JS tasks, you’ll still be able to, but you’ll need to use the queue access APIs in <code class="language-plaintext highlighter-rouge">MicroTask.h</code>. A queue entry is simply a <code class="language-plaintext highlighter-rouge">JS::Value</code>, and so an arbitrary pointer can be stored in it as a <code class="language-plaintext highlighter-rouge">JS::PrivateValue</code>.</p>
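
<p>For example (a sketch, using a hypothetical <code class="language-plaintext highlighter-rouge">MyTask</code> type; only <code class="language-plaintext highlighter-rouge">JS::PrivateValue</code> and <code class="language-plaintext highlighter-rouge">toPrivate</code> are real API), an embedding-defined task can round-trip through the shared queue like so:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Sketch: wrap an embedder task pointer in a JS::Value for the shared queue.
// The pointed-to task must remain valid until it is dequeued and run.
struct MyTask {  // hypothetical embedder type
  void run();
};

JS::Value EncodeTask(MyTask* task) {
  return JS::PrivateValue(task);  // stores the raw address in a Value
}

MyTask* DecodeTask(JS::Value v) {
  return static_cast&lt;MyTask*&gt;(v.toPrivate());
}
</code></pre></div></div>

<p>On dequeue, a value that is not a <code class="language-plaintext highlighter-rouge">JSMicroTask</code> can be handed back to the embedding’s own dispatch logic, as in the run loop shown below.</p>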

<p>Jobs are now split into two types: <code class="language-plaintext highlighter-rouge">JSMicroTask</code>s (enqueued by the JS engine) and <code class="language-plaintext highlighter-rouge">GenericMicroTask</code>s (possibly JS engine provided, possibly embedding provided).</p>

<h2 id="storage-responsibility">Storage Responsibility</h2>

<p>It used to be that if an embedding provided its own JobQueue, we expected it to store the jobs and trace the queue. Now that the queue lives inside the engine, the model is changing: an embedding that wants to share the job queue must ask the JS engine to store any jobs it produces outside of promises.</p>

<h2 id="running-micro-tasks">Running Micro Tasks</h2>

<p>The basic loop of microtask execution now looks like this:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="n">JS</span><span class="o">::</span><span class="n">Rooted</span><span class="o">&lt;</span><span class="n">JSObject</span><span class="o">*&gt;</span> <span class="n">executionGlobal</span><span class="p">(</span><span class="n">cx</span><span class="p">);</span>
<span class="n">JS</span><span class="o">::</span><span class="n">Rooted</span><span class="o">&lt;</span><span class="n">JS</span><span class="o">::</span><span class="n">GenericMicroTask</span><span class="o">&gt;</span> <span class="n">genericTask</span><span class="p">(</span><span class="n">cx</span><span class="p">);</span>
<span class="n">JS</span><span class="o">::</span><span class="n">Rooted</span><span class="o">&lt;</span><span class="n">JS</span><span class="o">::</span><span class="n">JSMicroTask</span><span class="o">&gt;</span> <span class="n">jsTask</span><span class="p">(</span><span class="n">cx</span><span class="p">);</span>

<span class="k">while</span> <span class="p">(</span><span class="n">JS</span><span class="o">::</span><span class="n">HasAnyMicroTasks</span><span class="p">(</span><span class="n">cx</span><span class="p">))</span> <span class="p">{</span>
  <span class="n">genericTask</span> <span class="o">=</span> <span class="n">JS</span><span class="o">::</span><span class="n">DequeueNextMicroTask</span><span class="p">(</span><span class="n">cx</span><span class="p">);</span> 

  <span class="k">if</span> <span class="p">(</span><span class="n">JS</span><span class="o">::</span><span class="n">IsJSMicroTask</span><span class="p">(</span><span class="n">genericTask</span><span class="p">))</span> <span class="p">{</span>
    <span class="n">jsTask</span> <span class="o">=</span> <span class="n">JS</span><span class="o">::</span><span class="n">ToMaybeWrappedJSMicroTask</span><span class="p">(</span><span class="n">genericTask</span><span class="p">);</span>
    <span class="n">executionGlobal</span> <span class="o">=</span> <span class="n">JS</span><span class="o">::</span><span class="n">GetExecutionGlobalFromJSMicroTask</span><span class="p">(</span><span class="n">jsTask</span><span class="p">);</span>

    <span class="p">{</span>
      <span class="n">AutoRealm</span> <span class="n">ar</span><span class="p">(</span><span class="n">cx</span><span class="p">,</span> <span class="n">executionGlobal</span><span class="p">);</span>
      <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">JS</span><span class="o">::</span><span class="n">RunJSMicroTask</span><span class="p">(</span><span class="n">cx</span><span class="p">,</span> <span class="n">jsTask</span><span class="p">))</span> <span class="p">{</span>
        <span class="c1">// Handle job execution failure in the </span>
        <span class="c1">// same way JS::Call failure would have been</span>
        <span class="c1">// handled</span>
      <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">continue</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="c1">// Handle embedding jobs as appropriate. </span>
<span class="p">}</span>
</code></pre></div></div>

<p>The abstract separation of the execution global is required to handle cases with many compartments and complicated realm semantics (aka a web browser).</p>

<h2 id="an-example">An example</h2>

<p>In order to see roughly what the changes would look like, I attempted to patch <a href="https://gitlab.gnome.org/GNOME/gjs/">GJS</a>, the GNOME JS embedding that uses SpiderMonkey.</p>

<p>The patch is <a href="https://gist.github.com/mgaudet/ae38a457d7d26b07f599e3f13f3b57e0">here</a>. It doesn’t build due to other incompatibilities I found, but this is the rough shape of a patch for an embedding. As you can see, it’s fairly self-contained, with not too much work to be done.</p>

<h1 id="why-change">Why Change?</h1>

<p>In a word: performance. The previous form of <code class="language-plaintext highlighter-rouge">Promise</code> job management was very heavyweight, imposing overhead on every job.</p>

<p>The changes made here allow us to make SpiderMonkey quite a bit faster for dealing with <code class="language-plaintext highlighter-rouge">Promise</code>s, and unlock the potential to get even faster.</p>

<h2 id="how-do-the-changes-help">How do the changes help?</h2>

<p>Perhaps the most important change here is making the job representation opaque. This allows us to use pre-existing objects as stand-ins for jobs. Rather than having to allocate a new object for every job (which is costly), we can sometimes allocate nothing at all, simply enqueuing an existing object with enough information to run the job.</p>

<p>Owning the queue will also allow us to choose the most efficient data structure for JS execution, potentially changing opaquely in the future as we find better choices.</p>

<p>Empirically, changing from the old microtask queue system to the new in Firefox led to an improvement of up to 45% on <code class="language-plaintext highlighter-rouge">Promise</code> heavy microbenchmarks.</p>

<h1 id="is-this-it">Is this it?</h1>

<p>I do not think this is the end of the story for changes in this area, and I plan further investment. Aspirationally, I would like this all to be stabilized by the next ESR release, <a href="https://whattrainisitnow.com/calendar/">Firefox 153, which will ship to beta in June</a>, but only time will tell what we can get done.</p>

<p>Future changes I can predict are things like</p>

<ol>
  <li>Renaming <code class="language-plaintext highlighter-rouge">JS::JobQueue</code> which is now more of a ‘jobs interface’</li>
  <li>Renaming the MicroTask header to be less HTML specific</li>
</ol>

<p>However, I can also imagine making more changes in the pursuit of performance.</p>

<h2 id="whats-the-bug-for-this-work">What’s the bug for this work?</h2>

<p>You can find most of the work related to this under <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1983153">Bug 1983153 (sm-µ-task)</a>.</p>

<h1 id="an-apology">An Apology</h1>

<p>My apologies to those embedders who will have to do some work during this transition period. Thank you for sticking with SpiderMonkey!</p>]]></content><author><name>Matthew Gaudet</name></author><summary type="html"><![CDATA[Changes are coming to job handling in SpiderMonkey]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" /><media:content medium="image" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Who needs Graphviz when you can build it yourself?</title><link href="https://spidermonkey.dev/blog/2025/10/28/iongraph-web.html" rel="alternate" type="text/html" title="Who needs Graphviz when you can build it yourself?" /><published>2025-10-28T17:00:00+00:00</published><updated>2025-10-28T17:00:00+00:00</updated><id>https://spidermonkey.dev/blog/2025/10/28/iongraph-web</id><content type="html" xml:base="https://spidermonkey.dev/blog/2025/10/28/iongraph-web.html"><![CDATA[<link rel="stylesheet" href="/assets/js/iongraph/main.css" />

<style>
  .full-width {
    width: calc(min(100vw, 1280px) - (45px * 2));
  }

  .flex {
    display: flex;
  }

  .g2 {
    gap: 0.5rem;
  }

  .ba {
    border: 1px solid var(--color-primary);
  }

  .flex-column {
    flex-direction: column;
  }

  details {
    contain: inline-size;
  }

  @media (min-width: 920px) {
    .flex-row-ns {
      flex-direction: row;
    }
  }

  #livegraph-container {
    display: none;
    align-items: stretch;
    height: 75vh;
  }

  #livegraph-available {
    display: none;
  }

  @media (min-width: 920px) {
    #livegraph-container {
      display: flex;
    }

    #livegraph-available {
      display: block;
    }

    #livegraph-unavailable {
      display: none;
    }

    #livegraph-preview {
      display: none;
    }
  }

  #js-input {
    width: 34em;
    font-size: 0.8em;
    overflow: auto;
  }

  #js-input .prism-code-editor {
    height: 100%;
  }

  #graph-container-container {
    flex-grow: 1;
    background-color: white;
    overflow: hidden;
    position: relative;
    font-size: 0.8em;
    color: black;
  }

  #graph-container {
    position: absolute;
    left: 0;
    top: 0;
    right: 0;
    bottom: 0;
    padding: 1em;
  }

  #legend {
    position: absolute;
    left: 0;
    right: 0;
    bottom: 0;
    padding: 0.5em 1em 1em;
    background-color: white;
    border-top: 1px solid black;
    display: flex;
    flex-direction: column;
  }

  #legend input {
    outline: none;
  }

  #pass-slider {
    flex-grow: 1;
  }

  .demoblock {
    width: 64px;
    height: 48px;
    border: 1px solid black;
    position: absolute;
    left: 0;
    top: 0;
    background-color: white;
    display: flex;
    justify-content: center;
    align-items: center;
    padding-top: 0.2rem;

    &.stack {
      background-color: #ddd;
    }

    &.current {
      background-color: #dfd;
    }

    &::after {
      content: "";
      position: absolute;
      background-color: black;
      top: 0;
      left: 0;
      right: 0;
      height: 0.2rem;
    }

    &.loopheader::after {
      background-color: #1fa411;
    }
  }

  .demodummy {
    width: 11px;
    height: 11px;
    border: 1px solid black;
    background-color: white;
    border-radius: 100px;
    position: absolute;
    left: 0;
    top: 0;
  }
</style>

<p>We recently overhauled our internal tools for visualizing the compilation of JavaScript and WebAssembly. When SpiderMonkey’s optimizing compiler, Ion, is active, we can now produce interactive graphs showing exactly how functions are processed and optimized.</p>

<div id="livegraph-available">
  <p>You can play with these graphs right here on this page. Simply write some JavaScript code in the <code>test</code> function and see what graph is produced. You can click and drag to navigate, ctrl-scroll to zoom, and drag the slider at the bottom to scrub through the optimization process.</p>
  <p>As you experiment, take note of how stable the graph layout is, even as the sizes of blocks change or new structures are added. Try clicking a block's title to select it, then drag the slider and watch the graph change while the block remains in place. Or, click an instruction's number to highlight it so you can keep an eye on it across passes.</p>
</div>

<div style="contain: inline-size">
  <div id="livegraph-container" class="full-width ba">
    <div id="js-input"></div>
    <div id="graph-container-container">
      <div id="graph-container"></div>
      <div id="legend">
        <span id="pass-name">&nbsp;</span>
        <div class="flex g2">
          <button id="pass-prev" disabled="">Prev</button>
          <input id="pass-slider" type="range" list="pass-slider-markers" value="0" disabled="" />
          <datalist id="pass-slider-markers"></datalist>
          <button id="pass-next" disabled="">Next</button>
        </div>
      </div>
    </div>
  </div>
</div>

<p><img id="livegraph-preview" alt="Example iongraph output" src="/assets/img/iongraph-preview.png" style="max-width: min(100%, 871px)" /></p>

<script async="" type="module">
  import { run } from "/assets/js/iongraph/main.js";
  let alreadyLoaded = false;

  function tryLoad() {
    if (window.innerWidth >= 920 && !alreadyLoaded) {
      run(`
        .prism-code-editor {
          height: 100%;
        }
      `);
      alreadyLoaded = true;
    }
  }
  tryLoad();

  window.addEventListener("resize", () => {
    tryLoad();
  });
</script>

<p>We are not the first to visualize our compiler’s internal graphs, of course, nor the first to make them interactive. But I was not satisfied with the output of common tools like <a href="https://graphviz.org/">Graphviz</a> or <a href="https://mermaid.js.org/">Mermaid</a>, so I decided to create a layout algorithm specifically tailored to our needs. The resulting algorithm is simple, fast, produces surprisingly high-quality output, and can be implemented in less than a thousand lines of code. The purpose of this article is to walk you through this algorithm and the design concepts behind it.</p>

<div id="livegraph-unavailable">
  <p><i>Read this post on desktop to see an interactive demo of iongraph.</i></p>
</div>

<h2 id="background">Background</h2>

<p>As readers of this blog already know, SpiderMonkey has several tiers of execution for JavaScript and WebAssembly code. The highest tier is known as Ion, an optimizing SSA compiler that takes the most time to compile but produces the highest-quality output.</p>

<p>Working with Ion frequently requires us to visualize and debug the SSA graph. Since 2011 we have used a tool for this purpose called <a href="https://github.com/sstangl/iongraph">iongraph</a>, built by Sean Stangl. It is a simple Python script that takes a JSON dump of our compiler graphs and uses Graphviz to produce a PDF. It is perfectly adequate, and very much the status quo for compiler authors, but unfortunately the Graphviz output has many problems that make our work tedious and frustrating.</p>

<p>The first problem is that the Graphviz output rarely bears any resemblance to the source code that produced it. Graphviz will place nodes wherever it feels will minimize error, resulting in a graph that snakes left and right seemingly at random. There is no visual intuition for how deeply nested a block of code is, nor is it easy to determine which blocks are inside or outside of loops. Consider the following function, and its Graphviz graph:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span> <span class="nx">foo</span><span class="p">(</span><span class="nx">n</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">let</span> <span class="nx">result</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="nx">n</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!!</span><span class="p">(</span><span class="nx">i</span> <span class="o">%</span> <span class="mi">2</span><span class="p">))</span> <span class="p">{</span>
      <span class="nx">result</span> <span class="o">=</span> <span class="mh">0x600DBEEF</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="nx">result</span> <span class="o">=</span> <span class="mh">0xBADBEEF</span><span class="p">;</span>
    <span class="p">}</span>
  <span class="p">}</span>

  <span class="k">return</span> <span class="nx">result</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="ba" style="overflow-y: auto; max-height: 80vh; background-color: white; padding: 1rem; display: flex; justify-content: center; align-items: flex-start">
  <img src="/assets/img/iongraph-example1-orig.svg" style="width: min(36rem, 100%)" />
</div>

<p>Counterintuitively, the <code class="language-plaintext highlighter-rouge">return</code> appears <em>before</em> the two assignments in the body of the loop. Since this graph mirrors JavaScript control flow, we’d expect to see the return at the bottom. This problem only gets worse as graphs grow larger and more complex.</p>

<p>The second, related problem is that Graphviz’s output is unstable. Small changes to the input can result in large changes to the output. As you page through the graphs of each pass within Ion, nodes will jump left and right, true and false branches will swap, loops will run up the right side instead of the left, and so on. This makes it very hard to understand the actual effect of any given pass. Consider the following before and after, and notice how the second graph is almost—but not quite—a mirror image of the first, despite very minimal changes to the graph’s structure:</p>

<div style="contain: inline-size">
  <div class="full-width ba flex-column flex-row-ns" style="overflow: auto; background-color: white; padding: 1rem; display: flex; gap: 1rem; min-width: 100%; max-height: 80vh">
    <img src="/assets/img/iongraph-example2-before.svg" style="max-width: initial; height: 40rem" />
    <img src="/assets/img/iongraph-example2-after.svg" style="max-width: initial; height: 40rem" />
  </div>
</div>

<p>None of this felt right to me. Control flow graphs should be able to follow the structure of the program that produced them. After all, a control flow graph has many restrictions that a general-purpose tool would not be aware of: they have very few cycles, all of which are well-defined because they come from loops; furthermore, both JavaScript and WebAssembly have reducible control flow, meaning all loops have only one entry, and it is not possible to jump directly into the middle of a loop. This information could be used to our advantage.</p>

<p>Beyond that, a static PDF is far from ideal when exploring complicated graphs. Finding the inputs or uses of a given instruction is a tedious and frustrating exercise, as is following arrows from block to block. Even just zooming in and out is difficult. I eventually concluded that we ought to just build an interactive tool to overcome these limitations.</p>

<h2 id="how-hard-could-layout-be">How hard could layout be?</h2>

<p>I had one false start with graph layout, with an algorithm that attempted to sort blocks into vertical “tracks”. This broke down quickly on a variety of programs and I was forced to go back to the drawing board—in fact, back to the source of the very tool I was trying to replace.</p>

<p>The algorithm used by <code class="language-plaintext highlighter-rouge">dot</code>, the typical hierarchical layout mode for Graphviz, is known as the Sugiyama layout algorithm, from a 1981 paper by Sugiyama et al. As an introduction, I found a short series of <a href="https://www.youtube.com/watch?v=3_FbSCWLC3A&amp;list=PLubYOWSl9mIvoXDwf_Wqcrvlg15N_AWQE&amp;index=38">lectures</a> that broke down the Sugiyama algorithm into 5 steps:</p>

<ol>
  <li><strong>Cycle breaking</strong>, where the direction of some edges is flipped in order to produce a <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">DAG</a>.</li>
  <li><strong>Leveling</strong>, where vertices are assigned into horizontal layers according to their depth in the graph, and dummy vertices are added to any edge that crosses multiple layers.</li>
  <li><strong>Crossing minimization</strong>, where vertices on a layer are reordered in order to minimize the number of edge crossings.</li>
  <li><strong>Vertex positioning</strong>, where vertices are horizontally positioned in order to make the edges as straight as possible.</li>
  <li><strong>Drawing</strong>, where the final graph is rendered to the screen.</li>
</ol>

<p><img src="/assets/img/kindermann.png" alt="A screenshot from the lectures, showing the five steps above" /></p>
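
<p>As a concrete illustration of step 2 (leveling), here is a minimal, generic longest-path sketch (not iongraph’s actual code) that assigns each vertex a layer and records the dummy vertices a long edge would need:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Generic sketch of Sugiyama step 2 (leveling), assuming an acyclic
// graph given as { id: [successorIds] }. Not iongraph's implementation.
function levelize(succs) {
  const layer = {};
  const visit = (v, l) =&gt; {
    if ((layer[v] ?? -1) &gt;= l) return;  // early out: already deep enough
    layer[v] = l;
    for (const s of succs[v]) visit(s, l + 1);
  };
  visit(Object.keys(succs)[0], 0);

  // Insert a dummy vertex on each intermediate layer a long edge crosses.
  const dummies = [];
  for (const [v, ss] of Object.entries(succs)) {
    for (const s of ss) {
      for (let l = layer[v] + 1; l &lt; layer[s]; l++) {
        dummies.push({ edge: [v, s], layer: l });
      }
    }
  }
  return { layer, dummies };
}
</code></pre></div></div>

<p>For a diamond with a shortcut edge (A to B to D, plus A directly to D), D lands on layer 2 and the direct edge gets one dummy vertex on layer 1.</p>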

<p>These steps struck me as surprisingly straightforward, and provided useful opportunities to insert our own knowledge of the problem:</p>

<ul>
  <li>Cycle breaking would be trivial for us, since the only cycles in our data are loops, and loop backedges are explicitly labeled. We could simply ignore backedges when laying out the graph.</li>
  <li>Leveling would be straightforward, and could easily be modified to better mimic the source code. Specifically, any blocks coming after a loop in the source code could be artificially pushed down in the layout, solving the confusing early-exit problem.</li>
  <li>Permuting vertices to reduce edge crossings was actually just a bad idea, since our goal was stability from graph to graph. The true and false branches of a condition should always appear in the same order, for example, and a few edge crossings is a small price to pay for this stability.</li>
  <li>Since reducible control flow ensures that a program’s loops form a tree, vertex positioning could ensure that loops are always well-nested in the final graph.</li>
</ul>

<p>Taken all together, these simplifications resulted in a remarkably straightforward algorithm, with the <a href="https://github.com/mozilla-spidermonkey/iongraph/blob/fc27ee3e8f3bd3c020aaf2498de9a260da089bc1/src/Graph.ts">initial implementation</a> being just 1000 lines of JavaScript. (See this <a href="https://x.com/its_bvisness/status/1957565307809329465?s=46">demo</a> for what it looked like at the time.) It also proved to be very efficient, since it avoided the most computationally complex parts of the Sugiyama algorithm.</p>

<h2 id="iongraph-from-start-to-finish">iongraph from start to finish</h2>

<p>We will now go through the entire iongraph layout algorithm. Each section contains explanatory diagrams, in which rectangles are basic blocks and circles are dummy nodes. Loop header blocks (the single entry point to each loop) are additionally colored green.</p>

<p>Be aware that the block positions in these diagrams are not representative of the actual computed layout position at each point in the process. For example, vertical positions are not calculated until the very end, but it would be hard to communicate what the algorithm was doing if all blocks were drawn on a single line!</p>

<h3 id="step-1-layering">Step 1: Layering</h3>

<p>We first sort the basic blocks into horizontal tracks called “layers”. This is very simple; we just start at layer 0 and recursively walk the graph, incrementing the layer number as we go. As we go, we track the “height” of each loop, not in pixels, but in layers.</p>

<p>We also take this opportunity to vertically position nodes “inside” and “outside” of loops. Whenever we see an edge that exits a loop, we defer the layering of the destination block until we are done layering the loop contents, at which point we know the loop’s height.</p>

<p>A note on implementation: nodes are visited multiple times throughout the process, not just once. This can produce a quadratic explosion for large graphs, but I find that an early-out is sufficient to avoid this problem in practice.</p>

<p>The animation below shows the layering algorithm in action. Notice how the final block in the graph is visited twice, once after each loop that branches to it, and in each case, the block is deferred until the entire loop has been layered, rather than processed immediately after its predecessor block. The final position of the block is below the entirety of both loops, rather than directly below one of its predecessors as Graphviz would do. (Remember, horizontal and vertical positions have not yet been computed; the positions of the blocks in this diagram are hardcoded for demonstration purposes.)</p>

<details>
<summary>Implementation pseudocode</summary>
<div data-codeblock="layering"></div>
</details>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*CODEBLOCK=layering*/</span><span class="kd">function</span> <span class="nx">layerBlock</span><span class="p">(</span><span class="nx">block</span><span class="p">,</span> <span class="nx">layer</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// Omitted for clarity: special handling of our "backedge blocks"</span>

  <span class="c1">// Early out if the block would not be updated</span>
  <span class="k">if</span> <span class="p">(</span><span class="nx">layer</span> <span class="o">&lt;=</span> <span class="nx">block</span><span class="p">.</span><span class="nx">layer</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="c1">// Update the layer of the current block</span>
  <span class="nx">block</span><span class="p">.</span><span class="nx">layer</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">max</span><span class="p">(</span><span class="nx">block</span><span class="p">.</span><span class="nx">layer</span><span class="p">,</span> <span class="nx">layer</span><span class="p">);</span>

  <span class="c1">// Update the heights of all loops containing the current block</span>
  <span class="kd">let</span> <span class="nx">header</span> <span class="o">=</span> <span class="nx">block</span><span class="p">.</span><span class="nx">loopHeader</span><span class="p">;</span>
  <span class="k">while</span> <span class="p">(</span><span class="nx">header</span><span class="p">)</span> <span class="p">{</span>
    <span class="nx">header</span><span class="p">.</span><span class="nx">loopHeight</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">max</span><span class="p">(</span><span class="nx">header</span><span class="p">.</span><span class="nx">loopHeight</span><span class="p">,</span> <span class="nx">block</span><span class="p">.</span><span class="nx">layer</span> <span class="o">-</span> <span class="nx">header</span><span class="p">.</span><span class="nx">layer</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
    <span class="nx">header</span> <span class="o">=</span> <span class="nx">header</span><span class="p">.</span><span class="nx">parentLoopHeader</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="c1">// Recursively layer successors</span>
  <span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">succ</span> <span class="k">of</span> <span class="nx">block</span><span class="p">.</span><span class="nx">successors</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="nx">succ</span><span class="p">.</span><span class="nx">loopDepth</span> <span class="o">&lt;</span> <span class="nx">block</span><span class="p">.</span><span class="nx">loopDepth</span><span class="p">)</span> <span class="p">{</span>
      <span class="c1">// Outgoing edges from the current loop will be layered later</span>
      <span class="nx">block</span><span class="p">.</span><span class="nx">loopHeader</span><span class="p">.</span><span class="nx">outgoingEdges</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">succ</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="nx">layerBlock</span><span class="p">(</span><span class="nx">succ</span><span class="p">,</span> <span class="nx">layer</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
    <span class="p">}</span>
  <span class="p">}</span>

  <span class="c1">// Layer any outgoing edges only after the contents of the loop have</span>
  <span class="c1">// been processed</span>
  <span class="k">if</span> <span class="p">(</span><span class="nx">block</span><span class="p">.</span><span class="nx">isLoopHeader</span><span class="p">())</span> <span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">succ</span> <span class="k">of</span> <span class="nx">block</span><span class="p">.</span><span class="nx">outgoingEdges</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">layerBlock</span><span class="p">(</span><span class="nx">succ</span><span class="p">,</span> <span class="nx">layer</span> <span class="o">+</span> <span class="nx">block</span><span class="p">.</span><span class="nx">loopHeight</span><span class="p">);</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<style>
  @media (max-width: 525px) {
    #layeranim {
      transform: scale(0.7);
      transform-origin: top left;
    }
  }
</style>

<div id="layeranim" class="ba" style="background-color: white; width: 476px; height: 410px; position: relative; margin: 1rem auto">
  <svg id="layerarrows" style="position: absolute; left: 0; top: 0; width: 100%; height: 100%"></svg>
</div>

<script type="module">
  import {
    downwardArrow,
    arrowFromBlockToBackedgeDummy,
    upwardArrow,
    arrowToBackedge,
    loopHeaderArrow,
    filerp,
  } from "/assets/js/iongraph/main.js";

  const blocks = [
    { id: 0, x: 20, y: 20, loopDepth: 0, loopHeader: null, successors: [1, 4] },
    { id: 1, x: 20, y: 20, loopDepth: 0, loopHeader: null, successors: [2] },
    { id: 2, x: 60, y: 20, loopDepth: 1, loopHeader: 2, successors: [3, 9], isLoopHeader: true },
    { id: 3, x: 160, y: 20, loopDepth: 1, loopHeader: 2, successors: [2], isBackedge: true },
    { id: 4, x: 280, y: 20, loopDepth: 0, loopHeader: null, successors: [5] },
    { id: 5, x: 280, y: 20, loopDepth: 1, loopHeader: 5, successors: [6, 9], isLoopHeader: true },
    { id: 6, x: 310, y: 20, loopDepth: 1, loopHeader: 5, successors: [7, 8] },
    { id: 7, x: 310, y: 20, loopDepth: 1, loopHeader: 5, successors: [8] },
    { id: 8, x: 380, y: 20, loopDepth: 1, loopHeader: 5, successors: [5], isBackedge: true },
    { id: 9, x: 70, y: 20, loopDepth: 0, loopHeader: null, successors: [] },
  ];


  const container = document.querySelector("#layeranim");
  for (let i = blocks.length - 1; i >= 0; i--) {
    const el = document.createElement("div");
    el.classList.add("demoblock");
    el.classList.toggle("loopheader", !!blocks[i].isLoopHeader);
    el.setAttribute("data-blockid", blocks[i].id);
    container.appendChild(el);
  }

  let gas = 0;
  function reset(newGas) {
    gas = newGas;
    for (const block of blocks) {
      block.layer = -1;
      block.targetY = 0;
      if (block.isLoopHeader) {
        block.loopHeight = 0;
        block.outgoingEdges = [];
      }
    }

    for (const el of container.querySelectorAll(".demoblock")) {
      el.classList.remove("stack", "current");
    }
  }
  function layerBlock(block, layer = 0) {
    function markCurrent() {
      for (const other of container.querySelectorAll(".demoblock")) {
        other.classList.remove("current");
      }
      el.classList.add("current");
    }

    gas -= 1;
    if (gas <= 0) {
      throw new Error("out of gas");
    }

    const el = container.querySelector(`.demoblock[data-blockid="${block.id}"]`);
    el.classList.add("stack");

    if (block.isBackedge) {
      block.layer = blocks[block.successors[0]].layer;
      el.classList.remove("stack");
      markCurrent();
      return;
    }

    // Early out if the block would not be updated
    if (layer <= block.layer) {
      el.classList.remove("stack");
      return;
    }

    // Update the layer of the current block
    block.layer = Math.max(block.layer, layer);
    markCurrent();

    // Update the heights of all loops containing the current block
    let header = blocks[block.loopHeader];
    while (header) {
      header.loopHeight = Math.max(header.loopHeight, block.layer - header.layer + 1);
      header = blocks[header.parentLoopHeader];
    }

    // Recursively layer successors
    for (const succ of block.successors) {
      if (blocks[succ].loopDepth < block.loopDepth) {
        // Outgoing edges from the current loop will be layered later
        blocks[block.loopHeader].outgoingEdges.push(succ);
      } else {
        layerBlock(blocks[succ], layer + 1);
      }
    }

    // Layer any outgoing edges only after the contents of the loop have
    // been processed
    if (block.isLoopHeader) {
      for (const succ of block.outgoingEdges) {
        layerBlock(blocks[succ], layer + block.loopHeight);
      }
    }

    el.classList.remove("stack");
  }

  (async function() {
    while (true) {
      for (let i = 0; i < 100; i++) {
        reset(i);

        try {
          layerBlock(blocks[0]);
        } catch (e) {
          if (e.message !== "out of gas") {
            throw e;
          }
        }

        for (const block of blocks) {
          // Leave block.x untouched for this demo.
          block.targetY = Math.max(block.layer, 0) * 64 + 20;
        }

        if (gas > 0) {
          break;
        }
        await new Promise(res => setTimeout(res, 600));
      }

      for (const other of container.querySelectorAll(".demoblock")) {
        other.classList.remove("current");
      }
      await new Promise(res => setTimeout(res, 3000));
    }
  })();

  (async function() {
    const svg = container.querySelector("#layerarrows");
    
    let lastTime = performance.now();
    while (true) {
      const now = await new Promise(res => requestAnimationFrame(res));
      const dt = (now - lastTime) / 1000;
      lastTime = now;

      // Check if animation is on screen
      const rect = svg.getBoundingClientRect();
      if (rect.bottom < 0 || rect.top > window.innerHeight) {
        continue;
      }

      // Lerp block positions
      for (const block of blocks) {
        const R = 0.000001; // fraction remaining after one second: smaller = faster

        const el = container.querySelector(`.demoblock[data-blockid="${block.id}"]`);
        block.y = filerp(block.y, block.targetY, R, dt);
        el.style.transform = `translate(${block.x}px, ${block.y}px)`;
      }

      svg.innerHTML = "";
      for (const block of blocks) {
        if (block.layer > -1) {
          for (const [i, succ] of block.successors.entries()) {
            const x1 = block.x + 5 + i * 10;
            const y1 = block.y + 48;
            if (blocks[succ].isBackedge) {
              const x2 = blocks[succ].x + 64;
              const y2 = blocks[succ].y + 5;
              svg.appendChild(arrowFromBlockToBackedgeDummy(x1, y1, x2 + 10, y1, y1 + 8, 5));
              svg.appendChild(upwardArrow(x2 + 10, y1, x2 + 10, y2 + 16, (y1 + y2) / 2, 5));
              svg.appendChild(arrowToBackedge(x2 + 10, y2 + 16, x2, y2, 5, 2));
            } else if (block.isBackedge && blocks[succ].isLoopHeader) {
              const x1 = block.x;
              const y1 = block.y + 5;
              const x2 = blocks[succ].x + 64;
              const y2 = blocks[succ].y + 5;
              svg.appendChild(loopHeaderArrow(x1, y1, x2, y2, 5, 2));
            } else {
              const x2 = blocks[succ].x + 5;
              const y2 = blocks[succ].y;
              svg.appendChild(downwardArrow(x1, y1, x2, y2, y2 - 8, 5, true, 2));
            }
          }
        }
      }
    }
  })();
</script>

<h3 id="step-2-create-dummy-nodes">Step 2: Create dummy nodes</h3>

<p>Any time an edge passes through a layer, we create a dummy node on that layer. This allows edges to be routed across layers without overlapping any blocks. Unlike in traditional Sugiyama, we always put downward dummies on the left and upward dummies on the right, producing a consistent “counter-clockwise” flow. This also makes it easy to read long vertical edges, whose direction would otherwise be ambiguous. (Recall how the loop backedge flipped from the right to the left in the “unstable layout” Graphviz example from before.)</p>

<p>In addition, we coalesce any edges going to the same destination by merging their dummy nodes. This substantially reduces visual noise.</p>
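<p>As a rough sketch of how this can work (using hypothetical node shapes — <code>layer</code>, <code>successors</code> as direct references — not the actual iongraph implementation), dummy creation can walk each long edge, insert one dummy per intermediate layer, and key the dummies by destination so that coalescing falls out naturally:</p>

```javascript
// Hypothetical sketch, not the real iongraph code. For every edge spanning
// more than one layer, insert a chain of dummy nodes, one per intermediate
// layer. Dummies are keyed by (layer, destination) so edges that share a
// destination share dummies, coalescing into a single drawn line.
function insertDummies(blocks) {
  const dummiesByDest = new Map(); // "layer:destId" -> dummy node
  for (const block of blocks) {
    block.successors = block.successors.map(succ => {
      if (Math.abs(succ.layer - block.layer) <= 1) {
        return succ; // adjacent or same-layer edge: no dummies needed
      }
      const down = succ.layer > block.layer;
      const step = down ? -1 : 1;
      // Walk from the layer adjacent to the destination back toward the
      // source, creating (or reusing) one dummy per intermediate layer.
      let prev = succ;
      for (let layer = succ.layer + step; layer !== block.layer; layer += step) {
        const key = `${layer}:${succ.id}`;
        let dummy = dummiesByDest.get(key);
        if (!dummy) {
          dummy = { id: key, dummy: true, layer, upward: !down, successors: [prev] };
          dummiesByDest.set(key, dummy);
        }
        prev = dummy;
      }
      return prev; // the edge now points at the nearest dummy in the chain
    });
  }
  return [...dummiesByDest.values()];
}
```

<p>Because the map is shared across all edges, a second edge to the same destination simply reuses the existing chain, which is exactly the merging behavior described above.</p>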


<div id="dummydiagram" class="ba" style="background-color: white; width: 344px; height: 478px; position: relative; margin: 1rem auto">
  <svg id="dummyarrows" style="position: absolute; left: 0; top: 0; width: 100%; height: 100%"></svg>
</div>
<script type="module">
  import {
    downwardArrow,
    arrowFromBlockToBackedgeDummy,
    upwardArrow,
    arrowToBackedge,
    loopHeaderArrow,
  } from "/assets/js/iongraph/main.js";
  const blocks = [
    // Blocks
    { id: 0,   layer: 1, successors: [1] },
    { id: 1,   layer: 2, successors: [2, 300], isLoopHeader: true },
    { id: 2,   layer: 3, successors: [3, 4] },
    { id: 3,   layer: 4, successors: [501, 5, 6, 7] },
    { id: 4,   layer: 4, successors: [500] },
    { id: 5,   layer: 5, successors: [8] },
    { id: 6,   layer: 5, successors: [8] },
    { id: 7,   layer: 5, successors: [8] },
    { id: 8,   layer: 6, successors: [601] },
    { id: 9,   layer: 2, successors: [1] },
    { id: 10,  layer: 7, successors: [] },
    // Dummies
    { id: 200, layer: 2, successors: [9],   upward: true },
    { id: 300, layer: 3, successors: [400], upward: false },
    { id: 301, layer: 3, successors: [200], upward: true },
    { id: 400, layer: 4, successors: [500], upward: false },
    { id: 401, layer: 4, successors: [301], upward: true },
    { id: 500, layer: 5, successors: [600], upward: false },
    { id: 501, layer: 5, successors: [8],   upward: false },
    { id: 502, layer: 5, successors: [401], upward: true },
    { id: 600, layer: 6, successors: [10],  upward: false },
    { id: 601, layer: 6, successors: [502], upward: true },
  ];
  const rows = [];
  for (let layer = 1; layer < 10; layer++) {
    rows.push([
      ...blocks.filter(b => b.layer === layer && b.upward === false),
      ...blocks.filter(b => b.layer === layer && b.upward === undefined),
      ...blocks.filter(b => b.layer === layer && b.upward === true),
    ]);
  }
  const container = document.querySelector("#dummydiagram");
  let x = 10, y = 10;
  for (const row of rows) {
    x = 10;
    for (const block of row) {
      block.x = x;
      block.y = y;
      const el = document.createElement("div");
      const isDummy = block.upward !== undefined;
      el.classList.add(isDummy ? "demodummy" : "demoblock");
      el.classList.toggle("loopheader", !!block.isLoopHeader);
      el.style.left = `${x}px`;
      el.style.top = `${y}px`;
      container.appendChild(el);
      x += (isDummy ? 10 : 64) + 20;
    }
    y += 48 + 20;
  }
  const svg = document.querySelector("#dummyarrows");
  for (const block of blocks) {
    for (const [i, succID] of block.successors.entries()) {
      const succ = blocks.find(b => b.id === succID);
      const x1 = block.x + 5 + i * 10;
      const y1 = block.upward ? block.y : block.y + (block.upward === undefined ? 48 : 10);
      if (succ.upward) {
        const x2 = succ.x + 5;
        const y2 = succ.y + 10;
        if (block.upward) {
          svg.appendChild(upwardArrow(x1, y1, x2, y2, y1 - 8, 5));
        } else {
          svg.appendChild(arrowFromBlockToBackedgeDummy(x1, y1, x2, y2, y1 + 8, 5));
        }
      } else if (block.upward && succ.upward === undefined) {
        const x2 = succ.x + 64;
        const y2 = succ.y + 5;
        svg.appendChild(arrowToBackedge(x1, y1 + 10, x2, y2, 5, 2));
      } else if (block.layer === succ.layer) {
        const x1 = block.x;
        const y1 = block.y + 5;
        const x2 = succ.x + 64;
        const y2 = succ.y + 5;
        svg.appendChild(loopHeaderArrow(x1, y1, x2, y2, 5, 2));
      } else {
        const x2 = succ.x + 5;
        const y2 = succ.y;
        svg.appendChild(downwardArrow(x1, y1, x2, y2, y2 - 8, 5, true, 2));
      }
    }
  }
</script>

<h3 id="step-3-straighten-edges">Step 3: Straighten edges</h3>

<p>This is the fuzziest, most ad-hoc part of the process: we run many small passes that walk up and down the graph, aligning layout nodes with one another. Our edge-straightening passes include:</p>

<ul>
  <li>Pushing nodes to the right of their loop header to “indent” them.</li>
  <li>Walking a layer left to right, moving children to the right to line up with their parents. If any nodes overlap as a result, they are pushed further to the right.</li>
  <li>Walking a layer right to left, moving parents to the right to line up with their children. This version is more conservative and will not move a node if it would overlap with another. This cleans up most issues from the first pass.</li>
  <li>Straightening runs of dummy nodes so we have clean vertical lines.</li>
  <li>“Sucking in” dummy runs on the left side of the graph if there is room for them to move to the right.</li>
  <li>Straightening out any edges that are “nearly straight”, according to a chosen threshold. This makes the graph appear less wobbly. We do this by repeatedly “combing” the graph upward and downward, aligning parents with children, then children with parents, and so on.</li>
</ul>

<p>It is important to note that dummy nodes participate fully in this system. If for example you have two side-by-side loops, straightening the left loop’s backedge will push the right loop to the side, avoiding overlaps and preserving the graph’s visual structure.</p>

<p>We do not reach a fixed point with this strategy, nor do we attempt to. I find that if you keep applying these particular layout passes, nodes wander to the right forever. Instead, the layout passes are hand-tuned to produce decent-looking results for most of the graphs we look at on a regular basis. That said, this could certainly be improved, especially for larger graphs, which do benefit from more iterations.</p>

<p>At the end of this step, all nodes have a fixed X-coordinate and will not be modified further.</p>
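<p>For a flavor of what these passes look like, here is a sketch of the left-to-right pass from the list above, using hypothetical node shapes (<code>x</code>, <code>width</code>, <code>parents</code>) rather than the actual iongraph data structures:</p>

```javascript
// Hypothetical sketch of one left-to-right "comb" pass, not the real
// iongraph code. Walk each layer left to right, moving every node right
// to line up with a parent, and push it further right if it would
// overlap the node placed before it on the same layer.
const NODE_SPACING = 20;

function combDownward(layersOfNodes) {
  for (const layer of layersOfNodes) {
    let minX = 0; // left edge still available on this layer
    for (const node of layer) {
      // Prefer lining up under a parent, but only ever move rightward.
      const parent = node.parents[0];
      let x = parent ? Math.max(node.x, parent.x) : node.x;
      // Never move left past the previously placed node.
      x = Math.max(x, minX);
      node.x = x;
      minX = x + node.width + NODE_SPACING;
    }
  }
}
```

<p>The right-to-left cleanup pass is the mirror image of this, with the extra rule that a node is left alone if moving it would overlap a neighbor.</p>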

<style>
  @media (max-width: 440px) {
    #edgediagram {
      transform: scale(0.87);
      transform-origin: top left;
    }
  }
</style>

<div id="edgediagram" class="ba" style="background-color: white; width: 384px; height: 498px; position: relative; margin: 1rem auto">
  <svg id="edgearrows" style="position: absolute; left: 0; top: 0; width: 100%; height: 100%"></svg>
</div>

<script type="module">
  import {
    downwardArrow,
    arrowFromBlockToBackedgeDummy,
    upwardArrow,
    arrowToBackedge,
    loopHeaderArrow,
    straightenEdges,
    filerp,
  } from "/assets/js/iongraph/main.js";

  const blocks = [
    { id: 0,   layer: 1, lh: null, succs: [1] },
    { id: 1,   layer: 2, lh: 1,    succs: [2, 300], isLoopHeader: true },
    { id: 200, layer: 2,           succs: [9],   dummy: true, upward: true,  dst: 9 },
    { id: 300, layer: 3,           succs: [400], dummy: true, upward: false, dst: 10 },
    { id: 2,   layer: 3, lh: 1,    succs: [3, 4] },
    { id: 301, layer: 3,           succs: [200], dummy: true, upward: true,  dst: 9 },
    { id: 400, layer: 4,           succs: [500], dummy: true, upward: false, dst: 10 },
    { id: 3,   layer: 4, lh: 1,    succs: [501, 5, 6, 7] },
    { id: 4,   layer: 4, lh: 1,    succs: [500] },
    { id: 401, layer: 4,           succs: [301], dummy: true, upward: true,  dst: 9 },
    { id: 500, layer: 5,           succs: [600], dummy: true, upward: false, dst: 10 },
    { id: 501, layer: 5,           succs: [8],   dummy: true, upward: false, dst: 8 },
    { id: 5,   layer: 5, lh: 1,    succs: [8] },
    { id: 6,   layer: 5, lh: 1,    succs: [8] },
    { id: 7,   layer: 5, lh: 1,    succs: [8] },
    { id: 502, layer: 5,           succs: [401], dummy: true, upward: true,  dst: 9 },
    { id: 600, layer: 6,           succs: [10],  dummy: true, upward: false, dst: 10 },
    { id: 8,   layer: 6, lh: 1,    succs: [601] },
    { id: 601, layer: 6,           succs: [502], dummy: true, upward: true,  dst: 9 },
    { id: 9,   layer: 2, lh: 1,    succs: [1] },
    { id: 10,  layer: 7, lh: null, succs: [] },
  ];

  let numLayers = 0;
  for (const block of blocks) {
    numLayers = Math.max(numLayers, block.layer);
    block.srcNodes = blocks.filter(b => b.succs.includes(block.id));
    block.dstNodes = block.succs.map(s => blocks.find(b => b.id === s));
    block.loop = block.lh ? blocks.find(b => b.id === block.lh) : null;
    block.dstNode = block.dst ? blocks.find(b => b.id === block.dst) : null;
  }
  const layoutNodesByLayer = [];
  for (let i = 1; i <= numLayers; i++) {
    layoutNodesByLayer.push([
      ...blocks.filter(b => b.layer === i && b.upward === false),
      ...blocks.filter(b => b.layer === i && !b.dummy),
      ...blocks.filter(b => b.layer === i && b.upward === true),
    ]);
  }
  for (let i = 0; i < layoutNodesByLayer.length; i++) {
    for (const node of layoutNodesByLayer[i]) {
      node.x = 20;
      node.y = 20 + (48 + 20) * i;
    }
  }

  const container = document.querySelector("#edgediagram");
  for (const layer of layoutNodesByLayer) {
    for (const node of layer) {
      const el = document.createElement("div");
      el.classList.add(node.dummy ? "demodummy" : "demoblock");
      el.classList.toggle("loopheader", !!node.isLoopHeader);
      el.setAttribute("data-blockid", node.id);
      container.appendChild(el);
    }
  }

  (async function() {
    while (true) {
      for (const [numPasses, delay] of [
        [0, 2000],
        [1, 1000],
        [2, 1000],
        [3, 3000],
      ]) {
        for (const block of blocks) {
          block.x = 20;
        }

        straightenEdges(layoutNodesByLayer, numPasses);
        await new Promise(res => setTimeout(res, delay));
      }
    }
  })();

  (async function() {
    const svg = document.querySelector("#edgearrows");

    let lastTime = performance.now();
    while (true) {
      const now = await new Promise(res => requestAnimationFrame(res));
      const dt = (now - lastTime) / 1000;
      lastTime = now;

      // Check if animation is on screen
      const rect = svg.getBoundingClientRect();
      if (rect.bottom < 0 || rect.top > window.innerHeight) {
        continue;
      }

      // Lerp block positions
      for (const block of blocks) {
        const R = 0.000001; // fraction remaining after one second: smaller = faster

        const el = container.querySelector(`[data-blockid="${block.id}"]`);
        block.xx = filerp(block.xx ?? block.x, block.x, R, dt);
        block.yy = filerp(block.yy ?? block.y, block.y, R, dt);
        el.style.transform = `translate(${block.xx}px, ${block.yy}px)`;
      }

      svg.innerHTML = "";
      for (const block of blocks) {
        for (const [i, succID] of block.succs.entries()) {
          const succ = blocks.find(b => b.id === succID);
          const x1 = block.xx + 5 + i * 10;
          const y1 = block.upward ? block.yy : block.yy + (block.upward === undefined ? 48 : 10);
          if (succ.upward) {
            const x2 = succ.xx + 5;
            const y2 = succ.yy + 10;
            if (block.upward) {
              svg.appendChild(upwardArrow(x1, y1, x2, y2, y1 - 8, 5));
            } else {
              svg.appendChild(arrowFromBlockToBackedgeDummy(x1, y1, x2, y2, y1 + 8, 5));
            }
          } else if (block.upward && succ.upward === undefined) {
            const x2 = succ.xx + 64;
            const y2 = succ.yy + 5;
            svg.appendChild(arrowToBackedge(x1, y1 + 10, x2, y2, 5, 2));
          } else if (block.layer === succ.layer) {
            const x1 = block.xx;
            const y1 = block.yy + 5;
            const x2 = succ.xx + 64;
            const y2 = succ.yy + 5;
            svg.appendChild(loopHeaderArrow(x1, y1, x2, y2, 5, 2));
          } else {
            const x2 = succ.xx + 5;
            const y2 = succ.yy;
            svg.appendChild(downwardArrow(x1, y1, x2, y2, y2 - 8, 5, true, 2));
          }
        }
      }
    }
  })();
</script>

<h3 id="step-4-track-horizontal-edges">Step 4: Track horizontal edges</h3>

<p>Edges may overlap visually as they run horizontally between layers. To resolve this, we sort edges into parallel “tracks”, giving each a vertical offset. After tracking all the edges, we record the total height of the tracks and store it on the preceding layer as its “track height”. This allows us to leave room for the edges in the final layout step.</p>

<p>We first sort edges by their starting position, left to right. This produces a consistent arrangement of edges that has few vertical crossings in practice. Edges are then placed into tracks from the “outside in”, stacking rightward edges on top and leftward edges on the bottom, creating a new track if the edge would overlap with or cross any other edge.</p>

<p>The diagram below is interactive. Click and drag the blocks to see how the horizontal edges get assigned to tracks.</p>

<details>
<summary>Implementation pseudocode</summary>
<div data-codeblock="tracks"></div>
</details>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*CODEBLOCK=tracks*/</span><span class="kd">function</span> <span class="nx">trackHorizontalEdges</span><span class="p">(</span><span class="nx">layer</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">const</span> <span class="nx">TRACK_SPACING</span> <span class="o">=</span> <span class="mi">20</span><span class="p">;</span>

  <span class="c1">// Gather all edges on the layer, and sort left to right by starting coordinate</span>
  <span class="kd">const</span> <span class="nx">layerEdges</span> <span class="o">=</span> <span class="p">[];</span>
  <span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">node</span> <span class="k">of</span> <span class="nx">layer</span><span class="p">.</span><span class="nx">nodes</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">edge</span> <span class="k">of</span> <span class="nx">node</span><span class="p">.</span><span class="nx">edges</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">layerEdges</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">edge</span><span class="p">);</span>
    <span class="p">}</span>
  <span class="p">}</span>
  <span class="nx">layerEdges</span><span class="p">.</span><span class="nx">sort</span><span class="p">((</span><span class="nx">a</span><span class="p">,</span> <span class="nx">b</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="nx">a</span><span class="p">.</span><span class="nx">startX</span> <span class="o">-</span> <span class="nx">b</span><span class="p">.</span><span class="nx">startX</span><span class="p">);</span>

  <span class="c1">// Assign edges to "tracks" based on whether they overlap horizontally with</span>
  <span class="c1">// each other. We walk the tracks from the outside in and stop if we ever</span>
  <span class="c1">// overlap with any other edge.</span>
  <span class="kd">const</span> <span class="nx">rightwardTracks</span> <span class="o">=</span> <span class="p">[];</span> <span class="c1">// [][]Edge</span>
  <span class="kd">const</span> <span class="nx">leftwardTracks</span> <span class="o">=</span> <span class="p">[];</span>  <span class="c1">// [][]Edge</span>
  <span class="nl">nextEdge</span><span class="p">:</span>
  <span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">edge</span> <span class="k">of</span> <span class="nx">layerEdges</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">const</span> <span class="nx">trackSet</span> <span class="o">=</span> <span class="nx">edge</span><span class="p">.</span><span class="nx">endX</span> <span class="o">-</span> <span class="nx">edge</span><span class="p">.</span><span class="nx">startX</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="p">?</span> <span class="nx">rightwardTracks</span> <span class="p">:</span> <span class="nx">leftwardTracks</span><span class="p">;</span>
    <span class="kd">let</span> <span class="nx">lastValidTrack</span> <span class="o">=</span> <span class="kc">null</span><span class="p">;</span> <span class="c1">// []Edge | null</span>

    <span class="c1">// Iterate through the tracks in reverse order (outside in)</span>
    <span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="nx">trackSet</span><span class="p">.</span><span class="nx">length</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span><span class="o">--</span><span class="p">)</span> <span class="p">{</span>
      <span class="kd">const</span> <span class="nx">track</span> <span class="o">=</span> <span class="nx">trackSet</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span>
      <span class="kd">let</span> <span class="nx">overlapsWithAnyInThisTrack</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
      <span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">otherEdge</span> <span class="k">of</span> <span class="nx">track</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="nx">edge</span><span class="p">.</span><span class="nx">dst</span> <span class="o">===</span> <span class="nx">otherEdge</span><span class="p">.</span><span class="nx">dst</span><span class="p">)</span> <span class="p">{</span>
          <span class="c1">// Assign the edge to this track to merge arrows</span>
          <span class="nx">track</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">edge</span><span class="p">);</span>
          <span class="k">continue</span> <span class="nx">nextEdge</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="kd">const</span> <span class="nx">al</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">min</span><span class="p">(</span><span class="nx">edge</span><span class="p">.</span><span class="nx">startX</span><span class="p">,</span> <span class="nx">edge</span><span class="p">.</span><span class="nx">endX</span><span class="p">);</span>
        <span class="kd">const</span> <span class="nx">ar</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">max</span><span class="p">(</span><span class="nx">edge</span><span class="p">.</span><span class="nx">startX</span><span class="p">,</span> <span class="nx">edge</span><span class="p">.</span><span class="nx">endX</span><span class="p">);</span>
        <span class="kd">const</span> <span class="nx">bl</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">min</span><span class="p">(</span><span class="nx">otherEdge</span><span class="p">.</span><span class="nx">startX</span><span class="p">,</span> <span class="nx">otherEdge</span><span class="p">.</span><span class="nx">endX</span><span class="p">);</span>
        <span class="kd">const</span> <span class="nx">br</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">max</span><span class="p">(</span><span class="nx">otherEdge</span><span class="p">.</span><span class="nx">startX</span><span class="p">,</span> <span class="nx">otherEdge</span><span class="p">.</span><span class="nx">endX</span><span class="p">);</span>
        <span class="kd">const</span> <span class="nx">overlaps</span> <span class="o">=</span> <span class="nx">ar</span> <span class="o">&gt;=</span> <span class="nx">bl</span> <span class="o">&amp;&amp;</span> <span class="nx">al</span> <span class="o">&lt;=</span> <span class="nx">br</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="nx">overlaps</span><span class="p">)</span> <span class="p">{</span>
          <span class="nx">overlapsWithAnyInThisTrack</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
          <span class="k">break</span><span class="p">;</span>
        <span class="p">}</span>
      <span class="p">}</span>

      <span class="k">if</span> <span class="p">(</span><span class="nx">overlapsWithAnyInThisTrack</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">break</span><span class="p">;</span>
      <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="nx">lastValidTrack</span> <span class="o">=</span> <span class="nx">track</span><span class="p">;</span>
      <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">if</span> <span class="p">(</span><span class="nx">lastValidTrack</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">lastValidTrack</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">edge</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="nx">trackSet</span><span class="p">.</span><span class="nx">push</span><span class="p">([</span><span class="nx">edge</span><span class="p">]);</span>
    <span class="p">}</span>
  <span class="p">}</span>

  <span class="c1">// Use track info to apply offsets to each edge for rendering.</span>
  <span class="kd">const</span> <span class="nx">tracksHeight</span> <span class="o">=</span> <span class="nx">TRACK_SPACING</span> <span class="o">*</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">max</span><span class="p">(</span>
    <span class="mi">0</span><span class="p">,</span>
    <span class="nx">rightwardTracks</span><span class="p">.</span><span class="nx">length</span> <span class="o">+</span> <span class="nx">leftwardTracks</span><span class="p">.</span><span class="nx">length</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span>
  <span class="p">);</span>
  <span class="kd">let</span> <span class="nx">trackOffset</span> <span class="o">=</span> <span class="o">-</span><span class="nx">tracksHeight</span> <span class="o">/</span> <span class="mi">2</span><span class="p">;</span>
  <span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">track</span> <span class="k">of</span> <span class="p">[...</span><span class="nx">rightwardTracks</span><span class="p">.</span><span class="nx">toReversed</span><span class="p">(),</span> <span class="p">...</span><span class="nx">leftwardTracks</span><span class="p">])</span> <span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">edge</span> <span class="k">of</span> <span class="nx">track</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">edge</span><span class="p">.</span><span class="nx">offset</span> <span class="o">=</span> <span class="nx">trackOffset</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="nx">trackOffset</span> <span class="o">+=</span> <span class="nx">TRACK_SPACING</span><span class="p">;</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
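<p>The overlap condition used above is the standard closed-interval intersection test; stripped of the surrounding track logic, it is just:</p>

```javascript
// Two closed intervals [al, ar] and [bl, br] overlap iff each one starts
// before the other one ends.
function intervalsOverlap(al, ar, bl, br) {
  return ar >= bl && al <= br;
}
```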

<style>
  #trackdiagram {
    .demoblock {
      cursor: move;
      touch-action: none;
    }
  }
</style>

<div id="trackdiagram" class="ba" style="background-color: white; width: 303px; height: 180px; position: relative; margin: 1rem auto">
  <svg id="trackarrows" style="position: absolute; left: 0; top: 0; width: 100%; height: 100%"></svg>
</div>

<script type="module">
  import {
    downwardArrow,
  } from "/assets/js/iongraph/main.js";

  const blocks = [
    { id: 0, x: 20,  y: 20, succs: [3] },
    { id: 1, x: 104, y: 20, succs: [3, 4] },
    { id: 2, x: 198, y: 20, succs: [5] },

    { id: 3, x: 20,  y: 88, succs: [] },
    { id: 4, x: 114, y: 88, succs: [] },
    { id: 5, x: 198, y: 88, succs: [] },
  ];

  const container = document.querySelector("#trackdiagram");
  const svg = document.querySelector("#trackarrows");
  let startMousePos = { x: 0, y: 0 };
  let lastMousePos = { x: 0, y: 0 };
  let draggingBlock = null;
  for (const block of blocks) {
    block.trackOffsets = new Array(block.succs.length).fill(0);

    const el = document.createElement("div");
    el.classList.add(block.dummy ? "demodummy" : "demoblock");
    el.setAttribute("data-blockid", block.id);
    container.appendChild(el);

    const icon = document.createElement("img");
    icon.src = "/assets/img/drag-lr.svg";
    icon.style.width = "24px";
    icon.style.opacity = 0.25;
    el.appendChild(icon);

    el.addEventListener("pointerdown", e => {
      if (e.pointerType === "mouse" && !(e.button === 0 || e.button === 1)) {
        return;
      }

      e.preventDefault();
      container.setPointerCapture(e.pointerId);
      startMousePos = { x: e.clientX, y: e.clientY };
      lastMousePos = { x: e.clientX, y: e.clientY };
      draggingBlock = block;
    });
  }
  container.addEventListener("pointermove", e => {
    if (!container.hasPointerCapture(e.pointerId)) {
      return;
    }

    const dx = e.clientX - lastMousePos.x;
    draggingBlock.x = Math.max(1, Math.min(236, draggingBlock.x + dx));
    lastMousePos = { x: e.clientX, y: e.clientY };

    renderTracks();
  });
  container.addEventListener("pointerup", e => {
    container.releasePointerCapture(e.pointerId);
  });

  function renderTracks() {
    const PORT_START = 5;
    const PORT_SPACING = 10;
    const TRACK_SPACING = 6;
    const ARROW_RADIUS = 5;

    // Gather all edges on the layer, and sort left to right by starting coordinate
    const layerEdges = [];
    for (const block of blocks) {
      for (const [srcPort, dstID] of block.succs.entries()) {
        const dst = blocks.find(b => b.id === dstID);
        const x1 = block.x + PORT_START + PORT_SPACING * srcPort;
        const x2 = dst.x + PORT_START;
        if (Math.abs(x2 - x1) < 2 * ARROW_RADIUS) {
          // Ignore edges that are narrow enough not to render with a joint.
          continue;
        }
        layerEdges.push({ x1, x2, src: block, srcPort, dst });
      }
    }
    layerEdges.sort((a, b) => a.x1 - b.x1);

    // Assign edges to "tracks" based on whether they overlap horizontally with
    // each other. We walk the tracks from the outside in and stop if we ever
    // overlap with any other edge.
    const rightwardTracks = []; // [][]Edge
    const leftwardTracks = [];  // [][]Edge
    nextEdge:
    for (const edge of layerEdges) {
      const trackSet = edge.x2 - edge.x1 >= 0 ? rightwardTracks : leftwardTracks;
      let lastValidTrack = null; // []Edge | null

      // Iterate through the tracks in reverse order (outside in)
      for (let i = trackSet.length - 1; i >= 0; i--) {
        const track = trackSet[i];
        let overlapsWithAnyInThisTrack = false;
        for (const otherEdge of track) {
          if (edge.dst === otherEdge.dst) {
            // Assign the edge to this track to merge arrows
            track.push(edge);
            continue nextEdge;
          }

          const al = Math.min(edge.x1, edge.x2);
          const ar = Math.max(edge.x1, edge.x2);
          const bl = Math.min(otherEdge.x1, otherEdge.x2);
          const br = Math.max(otherEdge.x1, otherEdge.x2);
          const overlaps = ar >= bl && al <= br;
          if (overlaps) {
            overlapsWithAnyInThisTrack = true;
            break;
          }
        }

        if (overlapsWithAnyInThisTrack) {
          break;
        } else {
          lastValidTrack = track;
        }
      }

      if (lastValidTrack) {
        lastValidTrack.push(edge);
      } else {
        trackSet.push([edge]);
      }
    }

    // Use track info to apply offsets to each edge for rendering.
    const tracksHeight = TRACK_SPACING * Math.max(
      0,
      rightwardTracks.length + leftwardTracks.length - 1,
    );
    let trackOffset = -tracksHeight / 2;
    for (const track of [...rightwardTracks.toReversed(), ...leftwardTracks]) {
      for (const edge of track) {
        edge.src.trackOffsets[edge.srcPort] = trackOffset;
      }
      trackOffset += TRACK_SPACING;
    }

    // Demo-only hack: push the bottom row of blocks down to make room for the tracks.
    for (const block of blocks) {
      if (block.id >= 3) {
        block.y = 20 + 48 + 20 + tracksHeight;
      }
    }

    // Render
    svg.innerHTML = "";
    for (const block of blocks) {
      const el = container.querySelector(`[data-blockid="${block.id}"]`);
      el.style.transform = `translate(${block.x}px, ${block.y}px)`;

      for (const [i, succID] of block.succs.entries()) {
        const succ = blocks.find(b => b.id === succID);
        const x1 = block.x + 5 + i * 10;
        const y1 = block.y + 48;
        const x2 = succ.x + 5;
        const y2 = succ.y;
        svg.appendChild(downwardArrow(x1, y1, x2, y2, (y1 + y2) / 2 + block.trackOffsets[i], 5, true, 2));
      }
    }
  }

  renderTracks();
</script>

<h3 id="step-5-verticalize">Step 5: Verticalize</h3>

<p>Finally, we assign each node a Y-coordinate. Starting at a Y-coordinate of zero, we iterate through the layers, repeatedly adding the layer’s height and its track height, where the layer height is the maximum height of any node in the layer. All nodes within a layer receive the same Y-coordinate; this is simple and easier to read than Graphviz’s default of vertically centering nodes within a layer.</p>

<p>Now that every node has both an X and Y coordinate, the layout process is complete.</p>

<details>
<summary>Implementation pseudocode</summary>
<div data-codeblock="verticalize"></div>
</details>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*CODEBLOCK=verticalize*/</span><span class="kd">function</span> <span class="nx">verticalize</span><span class="p">(</span><span class="nx">layers</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">let</span> <span class="nx">layerY</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">layer</span> <span class="k">of</span> <span class="nx">layers</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">let</span> <span class="nx">layerHeight</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">node</span> <span class="k">of</span> <span class="nx">layer</span><span class="p">.</span><span class="nx">nodes</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">node</span><span class="p">.</span><span class="nx">y</span> <span class="o">=</span> <span class="nx">layerY</span><span class="p">;</span>
      <span class="nx">layerHeight</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">max</span><span class="p">(</span><span class="nx">layerHeight</span><span class="p">,</span> <span class="nx">node</span><span class="p">.</span><span class="nx">height</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="nx">layerY</span> <span class="o">+=</span> <span class="nx">layerHeight</span><span class="p">;</span>
    <span class="nx">layerY</span> <span class="o">+=</span> <span class="nx">layer</span><span class="p">.</span><span class="nx">trackHeight</span><span class="p">;</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<style>
  #verticalizediagram {
    .demodummy {
      display: none;
    }
  }

  @media (max-width: 440px) {
    #verticalizediagram {
      transform: scale(0.87);
      transform-origin: top left;
    }
  }
</style>

<div id="verticalizediagram" class="ba" style="background-color: white; width: 384px; height: 518px; position: relative; margin: 1rem auto">
  <svg id="verticalizearrows" style="position: absolute; left: 0; top: 0; width: 100%; height: 100%"></svg>
</div>

<script type="module">
  import {
    downwardArrow,
    arrowFromBlockToBackedgeDummy,
    upwardArrow,
    arrowToBackedge,
    loopHeaderArrow,
    straightenEdges,
    filerp,
  } from "/assets/js/iongraph/main.js";

  const PORT_START = 5;
  const PORT_SPACING = 10;
  const TRACK_SPACING = 4;
  const TRACK_PADDING = 10;
  const ARROW_RADIUS = 5;

  const blocks = [
    { id: 0,   layer: 1, lh: null, succs: [1] },
    { id: 1,   layer: 2, lh: 1,    succs: [2, 300], isLoopHeader: true },
    { id: 200, layer: 2,           succs: [9],   dummy: true, upward: true,  dst: 9 },
    { id: 300, layer: 3,           succs: [400], dummy: true, upward: false, dst: 10 },
    { id: 2,   layer: 3, lh: 1,    succs: [3, 4] },
    { id: 301, layer: 3,           succs: [200], dummy: true, upward: true,  dst: 9 },
    { id: 400, layer: 4,           succs: [500], dummy: true, upward: false, dst: 10 },
    { id: 3,   layer: 4, lh: 1,    succs: [501, 5, 6, 7] },
    { id: 4,   layer: 4, lh: 1,    succs: [500] },
    { id: 401, layer: 4,           succs: [301], dummy: true, upward: true,  dst: 9 },
    { id: 500, layer: 5,           succs: [600], dummy: true, upward: false, dst: 10 },
    { id: 501, layer: 5,           succs: [8],   dummy: true, upward: false, dst: 8 },
    { id: 5,   layer: 5, lh: 1,    succs: [8] },
    { id: 6,   layer: 5, lh: 1,    succs: [8] },
    { id: 7,   layer: 5, lh: 1,    succs: [8] },
    { id: 502, layer: 5,           succs: [401], dummy: true, upward: true,  dst: 9 },
    { id: 600, layer: 6,           succs: [10],  dummy: true, upward: false, dst: 10 },
    { id: 8,   layer: 6, lh: 1,    succs: [601] },
    { id: 601, layer: 6,           succs: [502], dummy: true, upward: true,  dst: 9 },
    { id: 9,   layer: 2, lh: 1,    succs: [1] },
    { id: 10,  layer: 7, lh: null, succs: [] },
  ];

  let numLayers = 0;
  for (const block of blocks) {
    numLayers = Math.max(numLayers, block.layer);
    block.srcNodes = blocks.filter(b => b.succs.includes(block.id));
    block.dstNodes = block.succs.map(s => blocks.find(b => b.id === s));
    block.loop = block.lh ? blocks.find(b => b.id === block.lh) : null;
    block.dstNode = block.dst ? blocks.find(b => b.id === block.dst) : null;
  }
  const layoutNodesByLayer = [];
  for (let i = 1; i <= numLayers; i++) {
    layoutNodesByLayer.push([
      ...blocks.filter(b => b.layer === i && b.upward === false),
      ...blocks.filter(b => b.layer === i && !b.dummy),
      ...blocks.filter(b => b.layer === i && b.upward === true),
    ]);
  }
  for (let i = 0; i < layoutNodesByLayer.length; i++) {
    for (const node of layoutNodesByLayer[i]) {
      node.x = 20;
    }
  }

  const container = document.querySelector("#verticalizediagram");
  for (const layer of layoutNodesByLayer) {
    for (const node of layer) {
      const el = document.createElement("div");
      el.classList.add(node.dummy ? "demodummy" : "demoblock");
      el.classList.toggle("loopheader", !!node.isLoopHeader);
      el.setAttribute("data-blockid", node.id);
      container.appendChild(el);
    }
  }

  // Layout
  for (const block of blocks) {
    block.x = 20;
    block.trackOffsets = new Array(block.succs.length).fill(0);
  }
  straightenEdges(layoutNodesByLayer, 100);

  // Track edges
  const layerTrackHeights = [];
  {
    // Gather all edges on the layer, and sort left to right by starting coordinate
    for (let i = 0; i < layoutNodesByLayer.length; i++) {
      const layerEdges = [];
      for (const block of layoutNodesByLayer[i]) {
        for (const [srcPort, dstID] of block.succs.entries()) {
          const dst = blocks.find(b => b.id === dstID);
          const x1 = block.x + PORT_START + PORT_SPACING * srcPort;
          const x2 = dst.x + PORT_START;
          if (Math.abs(x2 - x1) < 2 * ARROW_RADIUS) {
            // Ignore edges that are narrow enough not to render with a joint.
            continue;
          }
          layerEdges.push({ x1, x2, src: block, srcPort, dst });
        }
      }
      layerEdges.sort((a, b) => a.x1 - b.x1);

      // Assign edges to "tracks" based on whether they overlap horizontally with
      // each other. We walk the tracks from the outside in and stop if we ever
      // overlap with any other edge.
      const rightwardTracks = []; // [][]Edge
      const leftwardTracks = [];  // [][]Edge
      nextEdge:
      for (const edge of layerEdges) {
        const trackSet = edge.x2 - edge.x1 >= 0 ? rightwardTracks : leftwardTracks;
        let lastValidTrack = null; // []Edge | null

        // Iterate through the tracks in reverse order (outside in)
        for (let i = trackSet.length - 1; i >= 0; i--) {
          const track = trackSet[i];
          let overlapsWithAnyInThisTrack = false;
          for (const otherEdge of track) {
            if (edge.dst === otherEdge.dst) {
              // Assign the edge to this track to merge arrows
              track.push(edge);
              continue nextEdge;
            }

            const al = Math.min(edge.x1, edge.x2);
            const ar = Math.max(edge.x1, edge.x2);
            const bl = Math.min(otherEdge.x1, otherEdge.x2);
            const br = Math.max(otherEdge.x1, otherEdge.x2);
            const overlaps = ar >= bl && al <= br;
            if (overlaps) {
              overlapsWithAnyInThisTrack = true;
              break;
            }
          }

          if (overlapsWithAnyInThisTrack) {
            break;
          } else {
            lastValidTrack = track;
          }
        }

        if (lastValidTrack) {
          lastValidTrack.push(edge);
        } else {
          trackSet.push([edge]);
        }
      }

      // Use track info to apply offsets to each edge for rendering.
      const tracksHeight = TRACK_SPACING * Math.max(
        0,
        rightwardTracks.length + leftwardTracks.length - 1,
      );
      let trackOffset = -tracksHeight / 2;
      for (const track of [...rightwardTracks.toReversed(), ...leftwardTracks]) {
        for (const edge of track) {
          edge.src.trackOffsets[edge.srcPort] = trackOffset;
        }
        trackOffset += TRACK_SPACING;
      }

      layerTrackHeights.push(tracksHeight);
    }
  }

  // Verticalize
  let layerY = 20;
  for (let i = 0; i < layoutNodesByLayer.length; i++) {
    let layerHeight = 0;
    for (const node of layoutNodesByLayer[i]) {
      node.layer = i;
      node.y = layerY;
      layerHeight = Math.max(layerHeight, 48);
    }
    layerY += layerHeight;
    layerY += TRACK_PADDING + layerTrackHeights[i] + TRACK_PADDING;
  }

  // Apply layout
  for (const block of blocks) {
    const el = container.querySelector(`[data-blockid="${block.id}"]`);
    el.style.transform = `translate(${block.x}px, ${block.y}px)`;
  }

  // Render
  const svg = document.querySelector("#verticalizearrows");
  svg.innerHTML = "";
  for (const block of blocks) {
    for (const [i, succID] of block.succs.entries()) {
      const succ = blocks.find(b => b.id === succID);
      const x1 = block.x + 5 + i * 10;
      const y1 = block.upward ? block.y : block.y + (block.upward === undefined ? 48 : 0);
      if (succ.upward) {
        const x2 = succ.x + 5;
        const y2 = succ.y;
        if (block.upward) {
          const succsucc = blocks.find(b => b.id === succ.succs[0]);
          svg.appendChild(upwardArrow(x1, y1, x2, y2 + (succsucc.dummy ? 0 : 10), y1 - 8, 5));
        } else {
          const ym = y1 + TRACK_PADDING + layerTrackHeights[block.layer] / 2 + block.trackOffsets[i];
          svg.appendChild(arrowFromBlockToBackedgeDummy(x1, y1, x2, y2, ym, 5));
        }
      } else if (block.upward && succ.upward === undefined) {
        const x2 = succ.x + 64;
        const y2 = succ.y + 5;
        svg.appendChild(arrowToBackedge(x1, y1 + 10, x2, y2, 5, 2));
      } else if (block.layer === succ.layer) {
        const x1 = block.x;
        const y1 = block.y + 5;
        const x2 = succ.x + 64;
        const y2 = succ.y + 5;
        svg.appendChild(loopHeaderArrow(x1, y1, x2, y2, 5, 2));
      } else {
        const x2 = succ.x + 5;
        const y2 = succ.y;
        const ym = y1 + TRACK_PADDING + layerTrackHeights[block.layer] / 2 + block.trackOffsets[i];
        svg.appendChild(downwardArrow(x1, y1, x2, y2, ym, 5, !succ.dummy, 2));
      }
    }
  }
</script>

<h3 id="step-6-render">Step 6: Render</h3>

<p>The details of rendering are out of scope for this article, and depend on the specific application. However, I wish to highlight a stylistic decision that I feel makes our graphs more readable.</p>

<p>When rendering edges, we use a style inspired by <a href="https://en.wikipedia.org/wiki/Syntax_diagram">railroad diagrams</a>. These have many advantages over the Bézier curves employed by Graphviz. First, straight lines feel more organized and are easier to follow when scrolling up and down. Second, they are easy to route (vertical when crossing layers, horizontal between layers). Third, they are easy to coalesce when they share a destination, and the junctions provide a clear indication of the edge’s direction. Fourth, they always cross at right angles, improving clarity and reducing the need to avoid edge crossings in the first place.</p>
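<p>As a rough standalone sketch of the idea (the function name and signature here are illustrative, not iongraph's actual <code class="language-plaintext highlighter-rouge">downwardArrow</code>), a railroad-style edge is a vertical drop, a horizontal jog at the track's Y-coordinate, and a final vertical drop, with small rounded corners hinting at the direction of flow:</p>

```javascript
// Hypothetical railroad-style SVG path: drop vertically from (x1, y1), jog
// horizontally at ym, then continue vertically to (x2, y2), with rounded
// corners of radius r. A sketch only; the real iongraph code differs.
function railroadPath(x1, y1, x2, y2, ym, r) {
  const dir = Math.sign(x2 - x1); // +1 rightward jog, -1 leftward
  if (dir === 0) {
    return `M ${x1} ${y1} L ${x2} ${y2}`; // straight drop, no joint needed
  }
  return [
    `M ${x1} ${y1}`,
    `L ${x1} ${ym - r}`,                   // vertical segment down to the jog
    `Q ${x1} ${ym} ${x1 + dir * r} ${ym}`, // rounded corner into the jog
    `L ${x2 - dir * r} ${ym}`,             // horizontal segment between layers
    `Q ${x2} ${ym} ${x2} ${ym + r}`,       // rounded corner out of the jog
    `L ${x2} ${y2}`,                       // vertical segment to the destination
  ].join(" ");
}
```

<p>Because every edge is built from axis-aligned segments like these, crossings are always at right angles, and edges sharing a destination can simply merge their horizontal segments.</p>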

<p>Consider the following example. There are several edge crossings that may traditionally be considered undesirable—yet the edges and their directions remain clear. Of particular note is the vertical junction highlighted in red on the left: not only is it immediately clear that these edges share a destination, but the junction itself signals that the edges are flowing downward. I find this much more pleasant than the “rat’s nest” that Graphviz tends to produce.</p>

<p><img alt="Examples of railroad-diagram edges" src="/assets/img/iongraph-edge-examples-highlighted.png" width="716" /></p>

<h2 id="why-does-this-work">Why does this work?</h2>

<p>It may seem surprising that such a simple (and stupid) layout algorithm could produce such readable graphs when far more sophisticated approaches struggle. However, I feel that the algorithm succeeds <em>because</em> of its simplicity.</p>

<p>Most graph layout algorithms are optimization problems, where error is minimized on some chosen metrics. However, these metrics seem to correlate poorly to readability in practice. For example, it seems good in theory to rearrange nodes to minimize edge crossings. But a predictable order of nodes seems to produce more sensible results overall, and simple rules for edge routing are sufficient to keep things tidy. (As a bonus, this also gives us layout stability from pass to pass.) Similarly, layout rules like “align parents with their children” produce more readable results than “minimize the lengths of edges”.</p>

<p>Furthermore, by rejecting the optimization problem, a human author gains more control over the layout. We are able to position nodes “inside” of loops, and push post-loop content down in the graph, <em>because</em> we reject this global constraint-solver approach. Minimizing “error” is meaningless compared to a human <em>maximizing</em> meaning through thoughtful design.</p>

<p>And finally, the resulting algorithm is simply more efficient. All the layout passes in iongraph are easy to program and scale gracefully to large graphs because they run in roughly linear time. It is better, in my view, to run a fixed number of layout iterations according to your graph complexity and time budget, rather than to run a complex constraint solver until it is “done”.</p>

<p>By following this philosophy, even the worst graphs become tractable. Below is a screenshot of a zlib function, compiled to WebAssembly, and rendered using the old tool.</p>

<p><img alt="spaghetti nightmare!!" src="/assets/img/iongraph-spaghetti-nightmare.png" /></p>

<p>It took about <strong>ten minutes</strong> for Graphviz to produce this spaghetti nightmare. By comparison, iongraph can now lay out this function in <strong>20 milliseconds</strong>. The result is still not particularly beautiful, but it renders tens of thousands of times faster <em>and</em> is much easier to navigate.</p>

<p><img alt="better spaghetti" src="/assets/img/iongraph-zlib.png" /></p>

<p>Perhaps programmers ought to put less trust into magic optimizing systems, especially when a human-friendly result is the goal. Simple (and stupid) algorithms can be very effective when applied with discretion and taste.</p>

<h2 id="future-work">Future work</h2>

<p>We have already integrated iongraph into the Firefox profiler, making it easy for us to view the graphs of the most expensive or impactful functions we find in our performance work. Unfortunately, this is only available in specific builds of the SpiderMonkey shell, and is not available in full browser builds. This is due to architectural differences in how profiling data is captured and the flags with which the browser and shell are built. I would love for Firefox users to someday be able to view these graphs themselves, but at the moment we have no plans to expose this to the browser. However, one bug tracking some related work can be found <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1987005">here</a>.</p>

<p>We will continue to sporadically update iongraph with more features to aid us in our work. We have several ideas for new features, including <a href="https://github.com/mozilla-spidermonkey/iongraph/issues/9">richer navigation</a>, search, and visualization of <a href="https://github.com/mozilla-spidermonkey/iongraph/issues/4">register allocation info</a>. However, we have no explicit roadmap for when these features may be released.</p>

<p>To experiment with iongraph locally, you can run a debug build of the SpiderMonkey shell with <code class="language-plaintext highlighter-rouge">IONFLAGS=logs</code>; this will dump information to <code class="language-plaintext highlighter-rouge">/tmp/ion.json</code>. This file can then be loaded into the <a href="https://mozilla-spidermonkey.github.io/iongraph/">standalone deployment of iongraph</a>. Please be aware that the user experience is rough and unpolished in its current state.</p>

<p>The source code for iongraph can be found on <a href="https://github.com/mozilla-spidermonkey/iongraph">GitHub</a>. If this subject interests you, we would welcome contributions to iongraph and its integration into the browser. The best place to reach us is our <a href="https://chat.mozilla.org/#/room/#spidermonkey:mozilla.org">Matrix chat</a>.</p>

<hr />

<p><em>Thanks to Matthew Gaudet, Asaf Gartner, and Colin Davidson for their feedback on this article.</em></p>

<script>
  // Terrible code to put code blocks inside HTML tags, because our markdown
  // renderer cannot do that.

  const codeblocks = {};
  for (const codeblock of document.querySelectorAll(".highlighter-rouge")) {
    for (const comment of codeblock.querySelectorAll(".cm")) {
      const matches = comment.innerText.match(/CODEBLOCK=([a-zA-Z0-9_]+)/);
      if (matches) {
        comment.remove();
        codeblock.remove();
        codeblocks[matches[1]] = codeblock;
      }
    }
  }
  for (const placeholder of document.querySelectorAll("[data-codeblock]")) {
    const codeblockName = placeholder.getAttribute("data-codeblock");
    placeholder.replaceWith(codeblocks[codeblockName]);
  }
</script>]]></content><author><name>Ben Visness</name></author><summary type="html"><![CDATA[Exploring a new layout algorithm for control flow graphs.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://spidermonkey.dev/assets/img/iongraph-opengraph.png" /><media:content medium="image" url="https://spidermonkey.dev/assets/img/iongraph-opengraph.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">5 Things You Might Not Know about Developing Self-Hosted Code</title><link href="https://spidermonkey.dev/blog/2025/04/23/self-hosted-development.html" rel="alternate" type="text/html" title="5 Things You Might Not Know about Developing Self-Hosted Code" /><published>2025-04-23T00:00:00+00:00</published><updated>2025-04-23T00:00:00+00:00</updated><id>https://spidermonkey.dev/blog/2025/04/23/self-hosted-development</id><content type="html" xml:base="https://spidermonkey.dev/blog/2025/04/23/self-hosted-development.html"><![CDATA[<p>Self-hosted code is JavaScript code that SpiderMonkey uses to implement some of its intrinsic functions for JavaScript. Because it is written in JavaScript, it gets all the benefits of our JITs, like inlining and inline caches.</p>

<p>Even if you are just getting started with self-hosted code, you probably already know that it isn’t quite the same as your typical, day-to-day JavaScript. You’ve probably already been pointed at the <a href="https://searchfox.org/mozilla-central/rev/d602f8558872d133dc9240a01cd25d0898c58e5a/js/src/vm/SelfHosting.h#16">SMDOC</a>, but here are a couple tips to make developing self-hosted code a little easier.</p>

<h1 id="1-when-you-change-self-hosted-code-you-need-to-build">1. When you change self-hosted code, you need to build</h1>

<p>When you make changes to SpiderMonkey’s self-hosted JavaScript code, you will not automatically see your changes take effect in Firefox or the JS Shell.</p>

<p>SpiderMonkey’s self-hosted code is split up into multiple files and functions to make it easier for developers to understand, but at runtime, SpiderMonkey loads it all from a single, compressed data stream. This means that all those files are gathered together into a single script file and compressed at build time.</p>

<p>To see your changes take effect, you must remember to build!</p>

<h1 id="2-dbg">2. dbg()</h1>

<p>Self-hosted JavaScript code is hidden from the JS Debugger, and it can be challenging to debug JS using a C++ debugger. You might be tempted to log messages with <code class="language-plaintext highlighter-rouge">console.log()</code> to help debug your code, but that is not available in self-hosted code!</p>

<p>In debug builds, you can print out messages and objects using <a href="https://searchfox.org/mozilla-central/rev/d602f8558872d133dc9240a01cd25d0898c58e5a/js/src/builtin/Utilities.js#16-21">dbg()</a>, which takes a single argument to print to stderr.</p>
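<p>To illustrate, here is a standalone sketch, not engine code: the <code class="language-plaintext highlighter-rouge">dbg</code> stub below merely mimics the debug-build intrinsic so the snippet runs anywhere, and <code class="language-plaintext highlighter-rouge">CountTruthy</code> is a made-up function.</p>

```javascript
// Stub standing in for SpiderMonkey's debug-build intrinsic; inside the
// engine, dbg() is provided automatically and writes to stderr.
const dbg = globalThis.dbg ?? (arg => console.error(arg));

// Hypothetical self-hosted-style helper instrumented with dbg() calls.
function CountTruthy(list) {
  dbg("CountTruthy called with " + list.length + " elements");
  let count = 0;
  for (let i = 0; i < list.length; i++) {
    if (list[i]) {
      count++;
    }
  }
  dbg(count); // dbg accepts values and objects, not just strings
  return count;
}
```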

<h1 id="3-specification-step-comments">3. Specification step comments</h1>

<p>If you are stuck trying to figure out how to implement a step in the JS specification or a proposal, you can see if SpiderMonkey has implemented a similar step elsewhere and base your implementation off that. We try to diligently comment our implementations with references to the specification, so there’s a good chance you can find what you are looking for.</p>

<p>For example, if you need to use the specification function <code class="language-plaintext highlighter-rouge">CreateDataPropertyOrThrow()</code>, you can search for it (<a href="https://searchfox.org/mozilla-central/search?q=CreateDataPropertyOrThrow&amp;path=js%2Fsrc%2Fbuiltin&amp;case=false&amp;regexp=false">SearchFox is a great tool for this</a>) and discover that it is implemented in self-hosted code using <code class="language-plaintext highlighter-rouge">DefineDataProperty()</code>.</p>
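<p>As a rough standalone sketch of that correspondence (<code class="language-plaintext highlighter-rouge">Object.defineProperty</code> stands in here for the real <code class="language-plaintext highlighter-rouge">DefineDataProperty</code> intrinsic, which only exists inside SpiderMonkey), the spec step <code class="language-plaintext highlighter-rouge">CreateDataPropertyOrThrow(O, P, V)</code> boils down to defining an ordinary writable, enumerable, configurable data property:</p>

```javascript
// Stand-in for the self-hosted DefineDataProperty intrinsic: define an
// ordinary data property (writable, enumerable, configurable) on obj.
function DefineDataProperty(obj, key, value) {
  Object.defineProperty(obj, key, {
    value,
    writable: true,
    enumerable: true,
    configurable: true,
  });
}

// Spec step: CreateDataPropertyOrThrow(result, "count", 3).
const result = {};
DefineDataProperty(result, "count", 3);
```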

<h1 id="4-getselfhostedvalue">4. getSelfHostedValue()</h1>

<p>If you want to explore how a self-hosted function works directly, you can use the JS Shell helper function <a href="https://searchfox.org/mozilla-central/rev/40da66b801b7dee3bdc77a06ac7de77bed1de3fc/js/src/shell/js.cpp#10406-10409">getSelfHostedValue()</a>.</p>

<p>We use this method to write many of our tests. For example, <a href="https://searchfox.org/mozilla-central/rev/40da66b801b7dee3bdc77a06ac7de77bed1de3fc/js/src/tests/non262/Intl/extensions/unicode-extension-sequences.js">unicode-extension-sequences.js</a> checks the implementation of the self-hosted functions <code class="language-plaintext highlighter-rouge">startOfUnicodeExtensions()</code> and <code class="language-plaintext highlighter-rouge">endOfUnicodeExtensions()</code>.</p>

<p>You can also use <code class="language-plaintext highlighter-rouge">getSelfHostedValue()</code> to get C++ intrinsic functions, like how <a href="https://searchfox.org/mozilla-central/rev/40da66b801b7dee3bdc77a06ac7de77bed1de3fc/js/src/tests/non262/extensions/toLength.js">toLength.js</a> tests <a href="https://searchfox.org/mozilla-central/rev/40da66b801b7dee3bdc77a06ac7de77bed1de3fc/js/src/vm/SelfHosting.cpp#2247">ToLength()</a>.</p>

<h1 id="5-you-can-define-your-own-self-hosted-functions">5. You can define your own self-hosted functions</h1>

<p>You can write your own self-hosted functions and make them available in the JS Shell and XPC shell. For example, you could write a self-hosted function to print a formatted error message:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="kd">function</span> <span class="nx">report</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span> <span class="p">{</span>
      <span class="nx">dbg</span><span class="p">(</span><span class="dl">"</span><span class="s2">|ERROR| </span><span class="dl">"</span> <span class="o">+</span> <span class="nx">msg</span> <span class="o">+</span> <span class="dl">"</span><span class="s2">|</span><span class="dl">"</span><span class="p">);</span>
  <span class="p">}</span>
</code></pre></div></div>
<p>Then, while you are setting up globals for your JS runtime, call <code class="language-plaintext highlighter-rouge">JS_DefineFunctions(cx, obj, funcs)</code>:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="k">static</span> <span class="k">const</span> <span class="n">JSFunctionSpec</span> <span class="n">funcs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
      <span class="n">JS_SELF_HOSTED_FN</span><span class="p">(</span><span class="s">"report"</span><span class="p">,</span> <span class="s">"report"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
      <span class="n">JS_FS_END</span><span class="p">,</span>
  <span class="p">};</span>

  <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">JS_DefineFunctions</span><span class="p">(</span><span class="n">cx</span><span class="p">,</span> <span class="n">globalObject</span><span class="p">,</span> <span class="n">funcs</span><span class="p">))</span> <span class="p">{</span>
    <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
  <span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">JS_SELF_HOSTED_FN()</code> macro takes the following parameters:</p>
<ol>
  <li><code class="language-plaintext highlighter-rouge">name</code> - The name you want your function to have in JS.</li>
  <li><code class="language-plaintext highlighter-rouge">selfHostedName</code> - The name of the self-hosted function.</li>
  <li><code class="language-plaintext highlighter-rouge">nargs</code> - Number of formal JS arguments to the self-hosted function.</li>
  <li><code class="language-plaintext highlighter-rouge">flags</code> - This is almost always 0, but could be any combination of <a href="https://searchfox.org/mozilla-central/rev/3b95c8dbe724b10390c96c1b9dd0f12c873e2f2e/js/public/PropertyDescriptor.h#28-61">JSPROP_*</a>.</li>
</ol>

<p>Now, when you build the JS Shell or XPC Shell, you can call your function:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">js</span><span class="o">&gt;</span> <span class="nx">report</span><span class="p">(</span><span class="dl">"</span><span class="s2">BOOM!</span><span class="dl">"</span><span class="p">);</span>          
<span class="nx">Iterator</span><span class="p">.</span><span class="nx">js</span><span class="err">#</span><span class="mi">6</span><span class="p">:</span> <span class="o">|</span><span class="nx">ERROR</span><span class="o">|</span> <span class="nx">BOOM</span><span class="o">!|</span>
</code></pre></div></div>]]></content><author><name>Bryan Thrall</name></author><summary type="html"><![CDATA[Self-hosted code is JavaScript code that SpiderMonkey uses to implement some of its intrinsic functions for JavaScript. Because it is written in JavaScript, it gets all the benefits of our JITs, like inlining and inline caches.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" /><media:content medium="image" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Shipping Temporal</title><link href="https://spidermonkey.dev/blog/2025/04/11/shipping-temporal.html" rel="alternate" type="text/html" title="Shipping Temporal" /><published>2025-04-11T18:00:00+00:00</published><updated>2025-04-11T18:00:00+00:00</updated><id>https://spidermonkey.dev/blog/2025/04/11/shipping-temporal</id><content type="html" xml:base="https://spidermonkey.dev/blog/2025/04/11/shipping-temporal.html"><![CDATA[<p>The <a href="https://github.com/tc39/proposal-temporal">Temporal proposal</a> provides a
replacement for <code class="language-plaintext highlighter-rouge">Date</code>, a long-standing pain point in the JavaScript language.
This <a href="https://maggiepint.com/2017/04/09/fixing-javascript-date-getting-started/">blog post</a>
describes some of the history and motivation behind the proposal. The
Temporal API itself is well documented on
<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Temporal">MDN</a>.</p>
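<p>As a quick taste of the API, here is a small sketch. It is guarded because Temporal is only present in hosts that ship it (such as Firefox 139+); elsewhere it falls back to a message.</p>

```javascript
// Temporal is only present in hosts that ship it (e.g. Firefox 139+),
// so this sketch falls back gracefully elsewhere.
let result;
if (typeof Temporal !== "undefined") {
  // Unlike Date, PlainDate is immutable and carries no hidden time zone.
  const date = Temporal.PlainDate.from("2025-04-11");
  result = date.add({ days: 30 }).toString(); // "2025-05-11"
} else {
  result = "Temporal is not available in this host";
}
console.log(result);
```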

<p>Temporal reached Stage 3 of the <a href="https://tc39.es/process-document/">TC39 process</a> in March 2021.
Reaching Stage 3 means that the specification is considered complete, and that the proposal is
ready for implementation.</p>

<p>SpiderMonkey began our implementation that same month, with the initial work
tracked in <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1519167">Bug 1519167</a>.
Incredibly, our implementation was not developed by Mozilla employees, but was contributed
entirely by a single volunteer, André Bargull. That initial bug consisted of 99 patches, but
the work did not stop there, as the specification continued to evolve as problems were found
during implementation. Beyond contributing to SpiderMonkey, André filed close to
<a href="https://github.com/tc39/proposal-temporal/issues?q=is%3Aissue%20state%3Aclosed%20author%3Aanba">200 issues</a>
against the specification. <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1840374">Bug 1840374</a>
is just one example of the massive amount of work required to keep up to date with the
specification.</p>

<p>As of Firefox 139, we’ve enabled our Temporal implementation by default, making
us the first browser to ship it. Sometimes it can seem like the ideas
of open source, community, and volunteer contributors are a thing of the past,
but the example of Temporal shows that volunteers can still have a meaningful impact
both on Firefox and on the JavaScript language as a whole.</p>

<h2 id="interested-in-contributing">Interested in contributing?</h2>

<p>Not every proposal is as large as Temporal, and we welcome contributions of
all shapes and sizes. If you’re interested in contributing to SpiderMonkey,
please have a look at our
<a href="https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&amp;emailbug_mentor2=1&amp;emailtype2=regexp&amp;resolution=---&amp;email2=.*&amp;component=JavaScript%20Engine&amp;component=JavaScript%20Engine%3A%20JIT&amp;component=JavaScript%3A%20GC&amp;component=JavaScript%3A%20Internationalization%20API&amp;component=JavaScript%3A%20Standard%20Library&amp;list_id=17451567&amp;classification=Client%20Software&amp;classification=Developer%20Infrastructure&amp;classification=Components&amp;classification=Server%20Software&amp;classification=Other&amp;product=Core">mentored bugs</a>.
You don’t have to be an expert :). If your interests are more on the specification side,
you can also check out how to
<a href="https://github.com/tc39/ecma262/blob/HEAD/CONTRIBUTING.md">contribute to TC39</a>.</p>]]></content><author><name>Daniel Minor</name></author><summary type="html"><![CDATA[The Temporal proposal provides a replacement for Date, a long-standing pain point in the JavaScript language. This blog post describes some of the history and motivation behind the proposal. The Temporal API itself is well documented on MDN.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" /><media:content medium="image" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">SpiderMonkey Newsletter (Firefox 135-137)</title><link href="https://spidermonkey.dev/blog/2025/03/17/newsletter-firefox-135-137.html" rel="alternate" type="text/html" title="SpiderMonkey Newsletter (Firefox 135-137)" /><published>2025-03-17T20:00:00+00:00</published><updated>2025-03-17T20:00:00+00:00</updated><id>https://spidermonkey.dev/blog/2025/03/17/newsletter-firefox-135-137</id><content type="html" xml:base="https://spidermonkey.dev/blog/2025/03/17/newsletter-firefox-135-137.html"><![CDATA[<p>Hello everyone,</p>

<p>Matthew here from the SpiderMonkey team. As the weather whipsaws from cold to hot to
cold, I have elected to spend some time whipping together a too brief newsletter,
which will almost certainly not capture the best of what we’ve done these last few
months. Nevertheless, onwards!</p>

<h3 id="outreachy">🧑‍🎓Outreachy</h3>

<p>We hosted an <a href="https://www.outreachy.org/">Outreachy</a> intern, <a href="https://github.com/MundiaNderi">Serah
Nderi</a>, for the most recent Outreachy cycle, with Dan
as her mentor. Serah worked on implementing the Iterator.range proposal as well as a
few other things. We were happy to host her, and grateful to her for joining. Read
about <a href="https://spidermonkey.dev/blog/2025/03/05/iterator-range.html">her internship project
here</a>.</p>

<h3 id="hytradboi-have-you-tried-rubbing-a-database-on-it">🥯HYTRADBOI: Have You Tried Rubbing a Database On It</h3>

<p><a href="https://www.hytradboi.com/2025">HYTRADBOI</a> is an interesting independent, online-only
conference, which this year had a strong programming languages track.
<a href="https://mstdn.ca/@iainireland">Iain</a> from the SpiderMonkey team was able to produce
a stellar video talk called <a href="https://www.hytradboi.com/2025/0a4d08fd-149e-4174-a752-20e9c4d965c5-a-quick-ramp-up-on-ramping-up-quickly">A quick ramp-up on ramping up
quickly</a>,
where he helps the audience reinvent our baseline interpreter in 10 minutes. The talk
is fun and short, so go forth and watch it!</p>

<h3 id="️-new-features--in-progress-standards-work">👷🏽‍♀️ New features &amp; In Progress Standards Work</h3>

<p>We have done a whole bunch of shipping work this cycle. By far the most important
thing is that <a href="https://tc39.es/proposal-temporal/">Temporal</a> has now <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1946823">been shipped on
Nightly</a>. We must extend our
enormous gratitude to André Bargull, who has been implementing this proposal for
years, providing reams of feedback to champions, and making it possible for us to
ship so early. We’ve also been working on improving error messages reported to
developers, and have a list of “<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1839676">good first
bugs</a>” available for people
interested in getting started contributing to SpiderMonkey or Firefox.</p>

<p>In addition to Temporal, Dan has worked on shipping a number of our complete proposal
implementations:</p>

<ul>
  <li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1943120">Math.sumPrecise</a></li>
  <li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1933303">Intl.DurationFormat</a></li>
</ul>

<p>and <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1937805">Atomics.pause</a>.</p>

<h3 id="-performance">🚀 Performance</h3>

<ul>
  <li>New contributor abdoatef.ab <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1941446">got a nice speedup (2.3x on a micro-benchmark!) by
hinting our object allocator about the final size of an object
literal</a>.</li>
  <li>Jon added a <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1934856">slots-and-elements
allocator</a> which should
reduce contention on the system allocator which is used for many other things.</li>
  <li>Jan added code to <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1913757">recycle
LifoAllocs</a> for
IonCompilations, which reduces the amount of contention on the memory allocator
where possible.</li>
  <li>Jan continued work on register allocation tuning, continuing on from where <a href="https://spidermonkey.dev/blog/2024/10/16/75x-faster-optimizing-the-ion-compiler-backend.html">we left
it last with Jan’s blog
post</a>.</li>
  <li>Jan’s been <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1947767">doing some work with
fuses</a> to take better advantage of
what the VM knows about its current state.</li>
</ul>

<h3 id="-spidermonkey-platform-improvements">🚉 SpiderMonkey Platform Improvements</h3>

<ul>
  <li>Iain landed the infrastructure for <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1935289">off-thread baseline
compilation</a> and <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1822650">batched
baseline compilation</a>. The
hope is that this will eventually lead to some performance improvements, but it’s
disabled for now while it is tuned.</li>
  <li>We now <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1618391">share the parsed version of our self-hosted code from parent process to
child process on Android</a>,
leading to a small improvement in child process startup time.</li>
  <li>Ryan added <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1943696">JitDump support for Wasm
compilation</a>, which means
that now it shows up beautifully in <a href="https://github.com/mstange/samply">Samply</a>.</li>
</ul>]]></content><author><name>Matthew Gaudet</name></author><summary type="html"><![CDATA[Hello everyone,]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" /><media:content medium="image" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Implementing Iterator.range in SpiderMonkey</title><link href="https://spidermonkey.dev/blog/2025/03/05/iterator-range.html" rel="alternate" type="text/html" title="Implementing Iterator.range in SpiderMonkey" /><published>2025-03-05T07:00:00+00:00</published><updated>2025-03-05T07:00:00+00:00</updated><id>https://spidermonkey.dev/blog/2025/03/05/iterator-range</id><content type="html" xml:base="https://spidermonkey.dev/blog/2025/03/05/iterator-range.html"><![CDATA[<p>In October 2024, I joined Outreachy as an Open Source contributor and in December 2024, I joined Outreachy as an intern working with Mozilla. My role was to implement the <a href="https://tc39.es/proposal-iterator.range/#sec-iteration">TC39 Range Proposal</a> in the SpiderMonkey JavaScript engine.
<code class="language-plaintext highlighter-rouge">Iterator.range</code> is a new built-in method proposed for JavaScript iterators that allows generating a sequence of numbers within a specified range. It functions similarly to Python’s range, providing an easy and efficient way to iterate over a series of values:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">i</span> <span class="k">of</span> <span class="nx">Iterator</span><span class="p">.</span><span class="nx">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">43</span><span class="p">))</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">i</span><span class="p">);</span> <span class="c1">// 0 to 42</span>
</code></pre></div></div>

<p>But also things like:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">function</span><span class="o">*</span> <span class="nx">even</span><span class="p">()</span> <span class="p">{</span>
  <span class="k">for</span> <span class="p">(</span><span class="kd">const</span> <span class="nx">i</span> <span class="k">of</span> <span class="nx">Iterator</span><span class="p">.</span><span class="nx">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="kc">Infinity</span><span class="p">))</span> <span class="k">if</span> <span class="p">(</span><span class="nx">i</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">===</span> <span class="mi">0</span><span class="p">)</span> <span class="k">yield</span> <span class="nx">i</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In this blog post, we will explore the implementation of Iterator.range in the SpiderMonkey JavaScript engine.</p>

<h2 id="understanding-the-implementation">Understanding the Implementation</h2>

<p>When I started working on <code class="language-plaintext highlighter-rouge">Iterator.range</code>, the initial implementation had already been done, i.e., adding a preference for the proposal and making the built-in accessible in the JavaScript shell.</p>

<p>The <code class="language-plaintext highlighter-rouge">Iterator.range</code> simply returned <code class="language-plaintext highlighter-rouge">false</code>, a stub indicating that the actual implementation of <code class="language-plaintext highlighter-rouge">Iterator.range</code> was under development or not fully implemented, which is where I came in. As a start, I created a <code class="language-plaintext highlighter-rouge">CreateNumericRangeIterator</code> function that delegates to the <code class="language-plaintext highlighter-rouge">Iterator.range</code> function. Following that, I implemented the first three steps within the Iterator.range function. Next, I initialised variables and parameters for the <code class="language-plaintext highlighter-rouge">NUMBER-RANGE</code> data type in the <code class="language-plaintext highlighter-rouge">CreateNumericRangeIterator</code> function.</p>

<p>I focused on implementing sequences that increase by one, such as <code class="language-plaintext highlighter-rouge">Iterator.range(0, 10)</code>. Next, I created an <code class="language-plaintext highlighter-rouge">IteratorRangeGenerator*</code> function (i.e., step 18 of the Range proposal), which, when called, doesn’t execute immediately but returns a generator object that follows the iterator protocol. Inside the generator function you have <code class="language-plaintext highlighter-rouge">yield</code> statements which represent the points where the function suspends its execution and provides a value back to the caller. Additionally, I updated the <code class="language-plaintext highlighter-rouge">CreateNumericRangeIterator</code> function to invoke <code class="language-plaintext highlighter-rouge">IteratorRangeGenerator*</code> with the appropriate arguments, aligning with Step 19 of the specification, and added tests to verify its functionality.</p>

<p>The generator will pause at each <code class="language-plaintext highlighter-rouge">yield</code>, and will not continue until the <code class="language-plaintext highlighter-rouge">next</code> method is called on the generator object that is created.
The <code class="language-plaintext highlighter-rouge">NumericRangeIteratorPrototype</code> (Step 27.1.4.2 of the proposal) is the object that holds the <code class="language-plaintext highlighter-rouge">iterator prototype</code> for the Numeric range iterator. The <code class="language-plaintext highlighter-rouge">next()</code> method is added to the <code class="language-plaintext highlighter-rouge">NumericRangeIteratorPrototype</code>. When you call the <code class="language-plaintext highlighter-rouge">next()</code> method on an object created from <code class="language-plaintext highlighter-rouge">NumericRangeIteratorPrototype</code>, it doesn’t directly return a value, but it makes the generator <code class="language-plaintext highlighter-rouge">yield</code> the <code class="language-plaintext highlighter-rouge">next</code> value in the series, effectively resuming the suspended generator.</p>

<p>The first time you invoke <code class="language-plaintext highlighter-rouge">next()</code> on the generator object created via <code class="language-plaintext highlighter-rouge">IteratorRangeGenerator*</code>, the generator will run up to the first <code class="language-plaintext highlighter-rouge">yield</code> statement and return the first value. When you invoke <code class="language-plaintext highlighter-rouge">next()</code> again, the <code class="language-plaintext highlighter-rouge">NumericRangeIteratorNext()</code> will be called.</p>

<p>This method uses <code class="language-plaintext highlighter-rouge">GeneratorResume(this)</code>, which means the generator will pick up right where it left off, continuing to iterate the next <code class="language-plaintext highlighter-rouge">yield</code> statement or until iteration ends.</p>
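<p>The generator-based flow can be illustrated in plain JavaScript (a simplified sketch, not SpiderMonkey’s actual self-hosted code):</p>

```javascript
// Simplified sketch of the generator-based approach: execution suspends
// at each `yield` and resumes when next() is called on the generator.
function* rangeGenerator(start, end, step = 1) {
  for (let value = start; value < end; value += step) {
    yield value; // suspended here until the next next() call
  }
}

const it = rangeGenerator(0, 3);
const first = it.next(); // runs up to the first yield
const rest = [...it];    // resumes repeatedly until iteration ends
console.log(first, rest); // { value: 0, done: false } [ 1, 2 ]
```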

<h2 id="generator-alternative">Generator Alternative</h2>

<p>After discussions with my mentors Daniel and Arai, I transitioned from a generator-based implementation to a more efficient slot-based approach. This change involved defining <code class="language-plaintext highlighter-rouge">slots</code> to store the state necessary for computing the next value. The reasons included:</p>

<ul>
  <li>Efficiency: Directly managing iteration state is faster than relying on generator functions.</li>
  <li>Simplified Implementation: A slot-based approach eliminates the need for generator-specific handling, making the code more maintainable.</li>
  <li>Better Alignment with Other Iterators: Existing built-in iterators such as <code class="language-plaintext highlighter-rouge">StringIteratorPrototype</code> and <code class="language-plaintext highlighter-rouge">ArrayIteratorPrototype</code> do not use generators in their implementations.</li>
</ul>
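<p>In plain JavaScript, the slot-based idea looks roughly like this (a hypothetical sketch; the real implementation stores its state in internal slots on the iterator object rather than ordinary properties):</p>

```javascript
// Hypothetical slot-based sketch: iteration state lives in plain fields
// (standing in for internal slots) and next() computes each value
// directly, with no generator machinery involved.
class NumericRangeIterator {
  constructor(start, end, step) {
    this.current = start;
    this.end = end;
    this.step = step;
  }
  next() {
    if (this.current >= this.end) {
      return { value: undefined, done: true };
    }
    const value = this.current;
    this.current += this.step;
    return { value, done: false };
  }
  [Symbol.iterator]() {
    return this; // iterators are their own iterable
  }
}

const values = [...new NumericRangeIterator(0, 10, 2)];
console.log(values); // [ 0, 2, 4, 6, 8 ]
```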

<h2 id="perfomance-and-benchmarks">Performance and Benchmarks</h2>

<p>To quantify the performance improvements gained by transitioning from a generator-based implementation to a slot-based approach, I conducted comparative benchmarks using a test in current mozilla-central, and in the revision that used the generator-based approach. My benchmark tested two key scenarios:</p>

<ul>
  <li>Floating-point range iteration: Iterating through 100,000 numbers with a step of 0.1</li>
  <li>BigInt range iteration: Iterating through 1,000,000 BigInts with a step of 2</li>
</ul>

<p>Each test was run 100 times to eliminate anomalies. The benchmark code was structured as follows:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Benchmark for Number iteration</span>
<span class="kd">var</span> <span class="nx">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="mi">100</span><span class="p">;</span> <span class="o">++</span><span class="nx">i</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">for</span> <span class="p">(</span><span class="nx">num</span> <span class="k">of</span> <span class="nx">Iterator</span><span class="p">.</span><span class="nx">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100000</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">))</span> <span class="p">{</span>
    <span class="nx">sum</span> <span class="o">+=</span> <span class="nx">num</span><span class="p">;</span>
  <span class="p">}</span>
<span class="p">}</span>
<span class="nx">print</span><span class="p">(</span><span class="nx">sum</span><span class="p">);</span>

<span class="c1">// Benchmark for BigInt iteration</span>
<span class="kd">var</span> <span class="nx">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="nx">n</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="mi">100</span><span class="p">;</span> <span class="o">++</span><span class="nx">i</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">for</span> <span class="p">(</span><span class="nx">num</span> <span class="k">of</span> <span class="nx">Iterator</span><span class="p">.</span><span class="nx">range</span><span class="p">(</span><span class="mi">0</span><span class="nx">n</span><span class="p">,</span> <span class="mi">1000000</span><span class="nx">n</span><span class="p">,</span> <span class="mi">2</span><span class="nx">n</span><span class="p">))</span> <span class="p">{</span>
    <span class="nx">sum</span> <span class="o">+=</span> <span class="nx">num</span><span class="p">;</span>
  <span class="p">}</span>
<span class="p">}</span>
<span class="nx">print</span><span class="p">(</span><span class="nx">sum</span><span class="p">);</span>
</code></pre></div></div>

<h2 id="results">Results</h2>

<table>
  <thead>
    <tr>
      <th>Implementation</th>
      <th>Execution Time (ms)</th>
      <th>Improvement</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Generator-based</td>
      <td>8,174.60</td>
      <td>-</td>
    </tr>
    <tr>
      <td>Slot-based</td>
      <td>2,725.33</td>
      <td>66.70%</td>
    </tr>
  </tbody>
</table>

<p>The slot-based implementation completed the benchmark in just 2.7 seconds compared to 8.2 seconds for the generator-based approach. This represents a 66.7% reduction in execution time, or in other words, the optimized implementation is approximately 3 times faster.</p>

<h2 id="challenges">Challenges</h2>

<p>Implementing BigInt support was straightforward from a specification perspective, but I encountered two blockers:</p>

<h3 id="1-handling-infinity-checks-correctly">1. Handling Infinity Checks Correctly</h3>

<p>The specification ensures that start is either a Number or a BigInt in steps 3.a and 4.a. However, step 5 states:</p>

<ul>
  <li>If start is +∞ or -∞, throw a RangeError.</li>
</ul>

<p>Despite following this, my implementation still threw an error stating that start must be finite. After investigating, I found that the issue stemmed from using a self-hosted isFinite function.</p>

<p>The specification requires isFinite to throw a TypeError for BigInt, but the self-hosted Number_isFinite returns false instead. This turned out to be more of an implementation issue than a specification issue.</p>

<p>See Github discussion <a href="https://github.com/tc39/proposal-iterator.range/issues/74">here</a>.</p>

<ul>
  <li>Fix: Explicitly check that start is a number before calling isFinite:</li>
</ul>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Step 5: If start is +∞ or -∞, throw a RangeError.</span>
<span class="k">if</span> <span class="p">(</span><span class="k">typeof</span> <span class="nx">start</span> <span class="o">===</span> <span class="dl">"</span><span class="s2">number</span><span class="dl">"</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="nx">Number_isFinite</span><span class="p">(</span><span class="nx">start</span><span class="p">))</span> <span class="p">{</span>
  <span class="nx">ThrowRangeError</span><span class="p">(</span><span class="nx">JSMSG_ITERATOR_RANGE_START_INFINITY</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="2-floating-point-precision-errors">2. Floating Point Precision Errors</h3>

<p>When testing floating-point sequences, I encountered an issue where some decimal values were not represented exactly due to JavaScript’s floating-point precision limitations. This caused incorrect test results.</p>

<p>There’s a <a href="https://github.com/tc39/proposal-iterator.range/issues/64#issuecomment-1477257423">GitHub issue</a> discussing this in depth. I implemented an approximatelyEqual function to compare values within a small margin of error.</p>

<ul>
  <li>Fix: Using approximatelyEqual in tests:</li>
</ul>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="nx">resultFloat2</span> <span class="o">=</span> <span class="nb">Array</span><span class="p">.</span><span class="k">from</span><span class="p">(</span><span class="nx">Iterator</span><span class="p">.</span><span class="nx">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">));</span>
<span class="nx">approximatelyEqual</span><span class="p">(</span><span class="nx">resultFloat2</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">,</span> <span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.8</span><span class="p">]);</span>
</code></pre></div></div>

<p>This function ensures that minor precision errors do not cause test failures, improving floating-point range calculations.</p>
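<p>A possible shape for that helper (an assumed sketch, not necessarily the exact test-suite code) is:</p>

```javascript
// Assumed sketch of approximatelyEqual: element-wise comparison within a
// small epsilon, so accumulated floating-point error doesn't fail tests.
function approximatelyEqual(actual, expected, epsilon = 1e-10) {
  if (actual.length !== expected.length) {
    return false;
  }
  return actual.every((value, i) => Math.abs(value - expected[i]) < epsilon);
}

// 0.2 + 0.2 + 0.2 is not exactly 0.6, but it is approximately equal.
const ok = approximatelyEqual([0, 0.2, 0.4, 0.6000000000000001],
                              [0, 0.2, 0.4, 0.6]);
console.log(ok); // true
```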

<h2 id="next-steps-and-future-improvements">Next Steps and Future Improvements</h2>

<p>There are different stages a TC39 proposal goes through before it can be shipped. This <a href="https://tc39.es/process-document/">document</a> shows the different stages that a proposal goes through from ideation to consumption. The Iterator.range proposal is currently at Stage 1, the Proposal stage. Ideally, the proposal should advance to Stage 3, which means that the specification is stable and no changes to the proposal are expected, but some necessary changes may still occur due to web incompatibilities or feedback from production-grade implementations.</p>

<p>Currently, this implementation is in its early stages. It’s only built in Nightly and disabled by default until the proposal reaches Stage 3 or 4 and no further revisions to the specification are expected.</p>

<h2 id="final-thoughts">Final Thoughts</h2>

<p>Working on the <code class="language-plaintext highlighter-rouge">Iterator.range</code> implementation in SpiderMonkey has been a deeply rewarding experience. I learned how to navigate a large and complex codebase, collaborate with experienced engineers, and translate a formal specification into an optimized, real-world implementation. The transition from a generator-based approach to a slot-based one was a significant learning moment, reinforcing the importance of efficiency in JavaScript engine internals.</p>

<p>Beyond technical skills, I gained a deeper appreciation for the standardization process in JavaScript. The experience highlighted how proposals evolve through real-world feedback, and how early-stage implementations help shape their final form.</p>

<p>As <code class="language-plaintext highlighter-rouge">Iterator.range</code> continues its journey through the TC39 proposal stages, I look forward to seeing its adoption in JavaScript engines and the impact it will have on developers. I hope this post provides useful insights into SpiderMonkey development and encourages others to contribute to open-source projects and JavaScript standardization efforts.</p>

<p>If you’d like to read more, here are my blog posts that I made during the project:</p>

<ul>
  <li><a href="https://dev.to/mundianderi/decoding-open-source-vocabulary-ive-learned-on-my-outreachy-journey-34o4">Decoding Open Source: Vocabulary I’ve Learned on My Outreachy Journey</a></li>
  <li><a href="https://dev.to/mundianderi/mid-internship-progress-report-achievements-and-goals-ahead-4jif">Mid-Internship Progress Report: Achievements and Goals Ahead</a></li>
  <li><a href="https://dev.to/mundianderi/navigating-tc39-proposals-from-error-handling-to-iteratorrange-306a">Navigating TC39 Proposals: From Error Handling to Iterator.range</a></li>
</ul>]]></content><author><name>Serah Nderi</name></author><summary type="html"><![CDATA[In October 2024, I joined Outreachy as an Open Source contributor and in December 2024, I joined Outreachy as an intern working with Mozilla. My role was to implement the TC39 Range Proposal in the SpiderMonkey JavaScript engine. Iterator.range is a new built-in method proposed for JavaScript iterators that allows generating a sequence of numbers within a specified range. It functions similarly to Python’s range, providing an easy and efficient way to iterate over a series of values:]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" /><media:content medium="image" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Making Teleporting Smarter</title><link href="https://spidermonkey.dev/blog/2025/02/19/Making-Teleporting-Smarter.html" rel="alternate" type="text/html" title="Making Teleporting Smarter" /><published>2025-02-19T18:00:00+00:00</published><updated>2025-02-19T18:00:00+00:00</updated><id>https://spidermonkey.dev/blog/2025/02/19/Making-Teleporting-Smarter</id><content type="html" xml:base="https://spidermonkey.dev/blog/2025/02/19/Making-Teleporting-Smarter.html"><![CDATA[<p><em>Recently I got to land a patch which touches a cool optimization, that I had to really make sure I understood deeply. As a result, I wrote a <a href="https://phabricator.services.mozilla.com/D236961">huge commit message</a>. I’d like to expand that message a touch here and turn it into a nice blog post.</em></p>

<p><em>This post assumes a rough understanding of how Shapes work in the JavaScript object model, and how prototype-based property lookup works in JavaScript. If you don’t understand that just yet, <a href="https://mathiasbynens.be/notes/shapes-ics">this blog post by Mathias Bynens is a good start</a>.</em></p>

<p>This patch aims to mitigate a performance cliff that occurs when we have applications which shadow properties on the prototype chain or which mutate the prototype chain.</p>

<p>The problem is that these actions currently break a property lookup optimization called “Shape Teleportation”.</p>

<h3 id="what-is-shape-teleporting">What is Shape Teleporting?</h3>

<p>Suppose you’re looking up some property <code class="language-plaintext highlighter-rouge">y</code> on an object <code class="language-plaintext highlighter-rouge">obj</code>, which has a prototype chain with 4 elements. Suppose <code class="language-plaintext highlighter-rouge">y</code> isn’t stored on <code class="language-plaintext highlighter-rouge">obj</code>, but instead is stored on some prototype object <code class="language-plaintext highlighter-rouge">B</code>, in slot 1.</p>

<p><img src="/assets/img/teleport-1.png" alt="A diagram of shape teleporting" style="width: 45%; display: block; margin: auto;" /></p>

<p>In order to get the value of this property, officially you have to walk from <code class="language-plaintext highlighter-rouge">obj</code> up to <code class="language-plaintext highlighter-rouge">B</code> to find the value of <code class="language-plaintext highlighter-rouge">y</code>. Of course, this would be inefficient, so what we do instead is attach an <a href="https://www.mgaudet.ca/technical/2023/10/16/cacheir-the-benefits-of-a-structured-representation-for-inline-caches">inline cache</a> to make this lookup more efficient.</p>

<p>Now we have to guard against future mutation when creating an inline cache. A basic version of a cache for this lookup might look like:</p>

<ul>
  <li>Check <code class="language-plaintext highlighter-rouge">obj</code> still has the same shape.</li>
  <li>Check <code class="language-plaintext highlighter-rouge">obj</code>’s prototype (<code class="language-plaintext highlighter-rouge">D</code>) still has the same shape.</li>
  <li>Check <code class="language-plaintext highlighter-rouge">D</code>’s prototype (<code class="language-plaintext highlighter-rouge">C</code>) still has the same shape.</li>
  <li>Check <code class="language-plaintext highlighter-rouge">C</code>’s prototype (<code class="language-plaintext highlighter-rouge">B</code>) still has the same shape.</li>
  <li>Load slot 1 out of B.</li>
</ul>

<p>This is less efficient than we would like though. Imagine if instead of having 3 intermediate prototypes, there were 13 or 30? You’d have this long chain of prototype shape checking, which takes a long time!</p>

<p>Ideally, what you’d like is to be able to simply say:</p>
<ul>
  <li>Check <code class="language-plaintext highlighter-rouge">obj</code> still has the same shape.</li>
  <li>Check <code class="language-plaintext highlighter-rouge">B</code> still has the same shape.</li>
  <li>Load slot 1 out of B.</li>
</ul>

<p>The problem with doing this naively is: what if someone adds <code class="language-plaintext highlighter-rouge">y</code> as a property to <code class="language-plaintext highlighter-rouge">C</code>? With the faster guards, you’d totally miss that value, and as a result compute the wrong result. We don’t like wrong results.</p>
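<p>For a concrete feel for the hazard, here’s a plain JavaScript sketch (the object names <code class="language-plaintext highlighter-rouge">B</code>, <code class="language-plaintext highlighter-rouge">C</code>, and <code class="language-plaintext highlighter-rouge">D</code> are hypothetical, mirroring the diagram above, not engine internals): shadowing a property on an intermediate prototype changes what the lookup must return, so any cache that skipped straight from <code class="language-plaintext highlighter-rouge">obj</code> to <code class="language-plaintext highlighter-rouge">B</code> has to be invalidated.</p>

```javascript
// A prototype chain obj -> D -> C -> B, with `y` held on B
// (slot 1 in the diagram). Names are illustrative only.
const B = { y: 42 };
const C = Object.create(B);
const D = Object.create(C);
const obj = Object.create(D);

console.log(obj.y); // 42, found on B by walking the chain

// Shadowing: C now holds its own `y`, so a cache that only checked the
// shapes of `obj` and `B` would wrongly keep returning 42.
C.y = 99;
console.log(obj.y); // 99
```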

<p>Shape Teleporting is the existing optimization which says that, so long as you actively force a change of shape on objects in the prototype chain when certain modifications occur, you <strong>can</strong> guard in inline caches only on the shape of the receiver object and the shape of the holder object.</p>

<p>By forcing each shape to be changed, inline caches which have baked in assumptions about these objects will no longer succeed, and we’ll take a slow path, potentially attaching a new IC if possible.</p>

<p>We must reshape in the following situations:</p>

<ul>
  <li>Adding a property to a prototype which shadows a property further up the prototype chain. In this circumstance, the object getting the new property will naturally reshape to account for it, but the old holder needs to be explicitly reshaped at this point, to avoid an inline cache jumping over the newly defined shadowing property.</li>
</ul>

<p><img src="/assets/img/teleport-2.png" alt="A diagram of shape teleporting" style="width: 45%; display: block; margin: auto;" /></p>

<ul>
  <li>Modifying the prototype of an object which exists on the prototype chain. For this case we need to invalidate the shape of the object being mutated (natural reshape due to changed prototype), as well as the shapes of all objects on the mutated object’s prototype chain. This is to invalidate all stubs which have teleported over the mutated object.</li>
</ul>

<p><img src="/assets/img/teleport-3.png" alt="A diagram of shape teleporting" style="width: 80%; display: block; margin: auto;" /></p>

<p>Furthermore, we must avoid an “A-B-A” problem, where an object returns to a shape prior to prototype modification: for example, even if we re-shape <code class="language-plaintext highlighter-rouge">B</code>, what if code deleted and then re-added <code class="language-plaintext highlighter-rouge">y</code>, causing <code class="language-plaintext highlighter-rouge">B</code> to take on its old shape? Then the IC would start working again, even though the prototype chain may have been mutated!</p>
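<p>The A-B-A sequence is easy to write down in JavaScript (again with a hypothetical object; shapes themselves aren’t observable from script):</p>

```javascript
// B starts with `y`; an IC could be attached guarding on B's shape.
const B = { y: 1 };

delete B.y; // B's shape changes; the IC's shape guard now fails
B.y = 2;    // without unique dictionary-mode shapes, B could regain its
            // old shape here, and the stale IC would validate again
console.log(B.y); // 2
```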

<p>Prior to this patch, Watchtower watched for prototype mutation and shadowing, and marked the shapes of the prototype objects involved in these operations as <code class="language-plaintext highlighter-rouge">InvalidatedTeleporting</code>. This means that property accesses involving these objects can no longer rely on the shape teleporting optimization. It also avoids the A-B-A problem, as new shapes always carry along the <code class="language-plaintext highlighter-rouge">InvalidatedTeleporting</code> flag.</p>

<p>This patch instead chooses to migrate an object shape to dictionary mode, or generate a new dictionary shape if it’s already in dictionary mode. Using dictionary mode shapes works because all dictionary mode shapes are unique and never recycled. This ensures the ICs are no longer valid as expected, as well as handily avoiding the A-B-A problem.</p>

<p>The patch does keep the <code class="language-plaintext highlighter-rouge">InvalidatedTeleporting</code> flag to catch potentially ill-behaved sites that do lots of mutation and shadowing, avoiding having to reshape proto objects forever.</p>

<p>The patch also provides a preference to allow cross-comparison between the old and new behaviour; however, it defaults to dictionary-mode teleportation.</p>

<p>Performance testing on micro-benchmarks shows a large impact, as ICs can now attach where they couldn’t before; however, Speedometer3 shows no real movement.</p>]]></content><author><name>Matthew Gaudet</name></author><summary type="html"><![CDATA[Recently I got to land a patch which touches a cool optimization, that I had to really make sure I understood deeply. As a result, I wrote a huge commit message. I’d like to expand that message a touch here and turn it into a nice blog post.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" /><media:content medium="image" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Is Memory64 actually worth using?</title><link href="https://spidermonkey.dev/blog/2025/01/15/is-memory64-actually-worth-using.html" rel="alternate" type="text/html" title="Is Memory64 actually worth using?" /><published>2025-01-15T18:00:00+00:00</published><updated>2025-01-15T18:00:00+00:00</updated><id>https://spidermonkey.dev/blog/2025/01/15/is-memory64-actually-worth-using</id><content type="html" xml:base="https://spidermonkey.dev/blog/2025/01/15/is-memory64-actually-worth-using.html"><![CDATA[<p>After many long years, the <a href="https://github.com/WebAssembly/memory64/">Memory64 proposal</a> for WebAssembly has finally been <a href="https://webassembly.org/features/#table-row-memory64">released</a> in both Firefox 134 and Chrome 133. In short, this proposal adds 64-bit pointers to WebAssembly.</p>

<p>If you are like most readers, you may be wondering: “Why wasn’t WebAssembly 64-bit to begin with?” Yes, it’s the year 2025 and WebAssembly has only just added 64-bit pointers. Why did it take so long, when 64-bit devices are the majority and 8GB of RAM is considered the bare minimum?</p>

<p>It’s easy to think that 64-bit WebAssembly would run better on 64-bit hardware, but unfortunately that’s simply not the case. WebAssembly apps tend to run slower in 64-bit mode than they do in 32-bit mode. This performance penalty depends on the workload, but it can range from just 10% to over 100%—a 2x slowdown just from changing your pointer size.</p>

<p>This is not simply due to a lack of optimization. Instead, the performance of Memory64 is restricted by hardware, operating systems, and the design of WebAssembly itself.</p>

<h2 id="what-is-memory64-actually">What is Memory64, actually?</h2>

<p>To understand why Memory64 is slower, we first must understand how WebAssembly represents memory.</p>

<p>When you compile a program to WebAssembly, the result is a WebAssembly module. A module is analogous to an executable file, and contains all the information needed to bootstrap and run a program, including:</p>

<ul>
  <li>A description of how much memory will be necessary (the <em>memory section</em>)</li>
  <li>Static data to be copied into memory (the <em>data section</em>)</li>
  <li>The actual WebAssembly bytecode to execute (the <em>code section</em>)</li>
</ul>

<p>These are encoded in an efficient binary format, but WebAssembly also has an official text syntax used for debugging and direct authoring. This article will use the text syntax. You can convert any WebAssembly module to the text syntax using tools like <a href="https://github.com/WebAssembly/wabt">WABT</a> (<code class="language-plaintext highlighter-rouge">wasm2wat</code>) or <a href="https://github.com/bytecodealliance/wasm-tools/">wasm-tools</a> (<code class="language-plaintext highlighter-rouge">wasm-tools print</code>).</p>

<p>Here’s a simple but complete WebAssembly module that allows you to store and load an <code class="language-plaintext highlighter-rouge">i32</code> at address 16 of its memory.</p>

<pre><code class="language-wasm">(module
  ;; Declare a memory with a size of 1 page (64KiB, or 65536 bytes)
  (memory 1)

  ;; Declare, and export, our store function
  (func (export "storeAt16") (param i32)
    i32.const 16  ;; push address 16 to the stack
    local.get 0   ;; get the i32 param and push it to the stack
    i32.store     ;; store the value to the address
  )

  ;; Declare, and export, our load function
  (func (export "loadFrom16") (result i32)
    i32.const 16  ;; push address 16 to the stack
    i32.load      ;; load from the address
  )
)
</code></pre>

<p>Now let’s modify the program to use Memory64:</p>

<pre><code class="language-wasm">(module
  ;; Declare an i64 memory with a size of 1 page (64KiB, or 65536 bytes)
  (memory i64 1)

  ;; Declare, and export, our store function
  (func (export "storeAt16") (param i32)
    i64.const 16  ;; push address 16 to the stack
    local.get 0   ;; get the i32 param and push it to the stack
    i32.store     ;; store the value to the address
  )

  ;; Declare, and export, our load function
  (func (export "loadFrom16") (result i32)
    i64.const 16  ;; push address 16 to the stack
    i32.load      ;; load from the address
  )
)
</code></pre>

<p>You can see that our memory declaration now includes <code class="language-plaintext highlighter-rouge">i64</code>, indicating that it uses 64-bit addresses. We therefore also change <code class="language-plaintext highlighter-rouge">i32.const 16</code> to <code class="language-plaintext highlighter-rouge">i64.const 16</code>. That’s it. This is pretty much the entirety of the Memory64 proposal<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>

<h2 id="how-is-memory-implemented">How is memory implemented?</h2>

<p>So why does this tiny change make a difference for performance? We need to understand how WebAssembly engines actually implement memories.</p>

<p>Thankfully, this is very simple. The host (in this case, a browser) simply allocates memory for the WebAssembly module using a system call like <a href="https://man7.org/linux/man-pages/man2/mmap.2.html"><code class="language-plaintext highlighter-rouge">mmap</code></a> or <a href="https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc"><code class="language-plaintext highlighter-rouge">VirtualAlloc</code></a>. WebAssembly code is then free to read and write within that region, and the host (the browser) ensures that WebAssembly addresses (like <code class="language-plaintext highlighter-rouge">16</code>) are translated to the correct address within the allocated memory.</p>

<p>However, WebAssembly has an important constraint: accessing memory out of bounds will <em>trap</em>, analogous to a segmentation fault (segfault). It is the host’s job to ensure that this happens, and in general it does so with <em>bounds checks</em>. These are simply extra instructions inserted into the machine code on each memory access—the equivalent of writing <code class="language-plaintext highlighter-rouge">if (address &gt;= memory.length) { trap(); }</code> before every single load<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. You can see this in the actual x64 machine code <a href="https://searchfox.org/mozilla-central/rev/29e186485fe1b835f05bde01f650e371545de98e/js/src/jit/x64/MacroAssembler-x64.cpp#1718-1725">generated</a> by SpiderMonkey for an <code class="language-plaintext highlighter-rouge">i32.load</code><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>:</p>

<pre><code class="language-asm">  movq 0x08(%r14), %rax       ;; load the size of memory from the instance (%r14)
  cmp %rax, %rdi              ;; compare the address (%rdi) to the limit
  jb .load                    ;; if the address is ok, jump to the load
  ud2                         ;; trap
.load:
  movl (%r15,%rdi,1), %eax    ;; load an i32 from memory (%r15 + %rdi)
</code></pre>

<p>These instructions have several costs! Besides taking up CPU cycles, they require an extra load from memory, they increase the size of machine code, and they take up branch predictor resources. But they are critical for ensuring the security and correctness of WebAssembly code.</p>

<p>Unless…we could come up with a way to remove them entirely.</p>

<h2 id="how-is-memory-really-implemented">How is memory <em>really</em> implemented?</h2>

<p>The maximum possible value for a 32-bit integer is about 4 billion. 32-bit pointers therefore allow you to use up to 4GB of memory. The maximum possible value for a 64-bit integer, on the other hand, is about 18 sextillion, allowing you to use up to 18 exabytes of memory. This is truly enormous, tens of millions of times bigger than the memory in even the most advanced consumer machines today. In fact, because this difference is so great, most “64-bit” devices are actually 48-bit in practice, using just 48 bits of the memory address to map from virtual to physical addresses<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>

<p>Even a 48-bit memory is enormous: 65,536 times larger than the largest possible 32-bit memory. This gives every process 281 terabytes of <em>address space</em> to work with, even if the device has only a few gigabytes of physical memory.</p>
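<p>The arithmetic above is easy to verify:</p>

```javascript
// Address-space sizes from the discussion above.
const space32 = 2 ** 32;        // 4 GiB of addressable memory
const space48 = 2 ** 48;        // 281,474,976,710,656 bytes

console.log(space48 / space32); // 65536: a 48-bit space is 65,536x larger
console.log(space48 / 1e12);    // ~281 terabytes of address space per process
```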

<p>This means that address space is cheap on 64-bit devices. If you like, you can <em>reserve</em> 4GB of address space from the operating system to ensure that it remains free for later use. Even if most of that memory is never used, this will have little to no impact on most systems.</p>

<p>How do browsers take advantage of this fact? <strong>By reserving 4GB of memory for every single WebAssembly module.</strong></p>

<p>In our first example, we declared a 32-bit memory with a size of 64KB. But if you run this example on a 64-bit operating system, the browser will actually reserve 4GB of memory. The first 64KB of this 4GB block will be read-write, and the remaining 3.9999GB will be reserved but inaccessible.</p>

<p>By reserving 4GB of memory for all 32-bit WebAssembly modules, <strong>it is impossible to go out of bounds.</strong> The largest possible pointer value, 2^32-1, will simply land inside the reserved region of memory and trap. This means that, when running 32-bit wasm on a 64-bit system, <strong>we can omit all bounds checks entirely<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.</strong></p>

<p>This optimization is impossible for Memory64. The size of the WebAssembly address space is the same as the size of the host address space. Therefore, we must pay the cost of bounds checks on every access, and as a result, Memory64 is slower.</p>

<h2 id="so-why-use-memory64">So why use Memory64?</h2>

<p>The only reason to use Memory64 is if you actually need more than 4GB of memory.</p>

<p>Memory64 won’t make your code faster or more “modern”. 64-bit pointers in WebAssembly simply allow you to address more memory, at the cost of slower loads and stores.</p>

<p>The performance penalty may diminish over time as engines make optimizations. Bounds checking strategies can be improved, and WebAssembly compilers may be able to <a href="https://en.wikipedia.org/wiki/Bounds-checking_elimination">eliminate</a> some bounds checks at compile time. But it is impossible to beat the absolute removal of all bounds checks found in 32-bit WebAssembly.</p>

<p>Furthermore, the WebAssembly JS API constrains memories to a maximum size of 16GB. This may be quite disappointing for developers used to native memory limits. Unfortunately, because WebAssembly makes no distinction between “reserved” and “committed” memory, browsers cannot freely allocate large quantities of memory without running into system commit limits.</p>

<p>Still, being able to access 16GB is very useful for some applications. If you need more memory, and can tolerate worse performance, then Memory64 might be the right choice for you.</p>

<p>Where can WebAssembly go from here? Memory64 may be of limited use today, but there are some exciting possibilities for the future:</p>

<ul>
  <li>
    <p>Bounds checks could be better supported in hardware in the future. There has already been some research in this direction—for example, see <a href="https://dl.acm.org/doi/10.1145/3582016.3582023">this 2023 paper</a> by Narayan et al. With the growing popularity of WebAssembly and other sandboxed VMs, this could be a very impactful change that improves performance while also eliminating the wasted address space from large reservations. (Not all WebAssembly hosts can spend their address space as freely as browsers.)</p>
  </li>
  <li>
    <p>The <a href="https://github.com/WebAssembly/memory-control/">memory control proposal</a> for WebAssembly, which I co-champion, is exploring new features for WebAssembly memory. While none of the current ideas would remove the need for bounds checks, they could take advantage of virtual memory hardware to enable larger memories, more efficient use of large address spaces (such as reduced fragmentation for memory allocators), or alternative memory allocation techniques.</p>
  </li>
</ul>

<p>Memory64 may not matter for most developers today, but we think it is an important stepping stone to an exciting future for memory in WebAssembly.</p>

<hr />
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>The rest of the proposal fleshes out the <code class="language-plaintext highlighter-rouge">i64</code> mode, for example by modifying instructions like <code class="language-plaintext highlighter-rouge">memory.fill</code> to accept either <code class="language-plaintext highlighter-rouge">i32</code> or <code class="language-plaintext highlighter-rouge">i64</code> depending on the memory’s address type. The proposal also adds an <code class="language-plaintext highlighter-rouge">i64</code> mode to <em>tables</em>, which are the primary mechanism used for function pointers and indirect calls. For simplicity, they are omitted from this post. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>In practice the instructions may actually be more complicated, as they also need to account for integer overflow, <a href="https://webassembly.github.io/spec/core/syntax/instructions.html#syntax-memarg"><code class="language-plaintext highlighter-rouge">offset</code></a>, and <a href="https://webassembly.github.io/spec/core/syntax/instructions.html#syntax-memarg"><code class="language-plaintext highlighter-rouge">align</code></a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>If you’re using the SpiderMonkey JS shell, you can try this yourself by using <code class="language-plaintext highlighter-rouge">wasmDis(func)</code> on any exported WebAssembly function. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p>Some hardware now also supports addresses larger than 48 bits, such as Intel processors with 57-bit addresses and <a href="https://en.wikipedia.org/wiki/Intel_5-level_paging">5-level paging</a>, but this is not yet commonplace. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5" role="doc-endnote">
      <p>In practice, a few extra pages beyond 4GB will be reserved to account for <code class="language-plaintext highlighter-rouge">offset</code> and <code class="language-plaintext highlighter-rouge">align</code>, called “guard pages”. We could reserve another 4GB of memory (8GB in total) to account for every possible offset on every possible pointer, but in SpiderMonkey we instead choose to reserve just 32MiB + 64KiB for guard pages and fall back to explicit bounds checks for any offsets larger than this. (In practice, large offsets are very uncommon.) For more information about how we handle bounds checks on each supported platform, see <a href="https://searchfox.org/mozilla-central/rev/d788991012a1a8ec862787f9799db4954a33045f/js/src/wasm/WasmMemory.cpp#70">this SMDOC comment</a> (which seems to be slightly out of date), <a href="https://searchfox.org/mozilla-central/rev/d788991012a1a8ec862787f9799db4954a33045f/js/src/wasm/WasmMemory.h#198">these constants</a>, and <a href="https://searchfox.org/mozilla-central/rev/d788991012a1a8ec862787f9799db4954a33045f/js/src/wasm/WasmIonCompile.cpp#1581-1590">this Ion code</a>. It is also worth noting that we fall back to explicit bounds checks whenever we cannot use this allocation scheme, such as on 32-bit devices or resource-constrained mobile phones. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Ben Visness</name></author><summary type="html"><![CDATA[After many long years, the Memory64 proposal for WebAssembly has finally been released in both Firefox 134 and Chrome 133. In short, this proposal adds 64-bit pointers to WebAssembly.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" /><media:content medium="image" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">SpiderMonkey Newsletter (Firefox 132-134)</title><link href="https://spidermonkey.dev/blog/2024/11/27/newsletter-firefox-132-134.html" rel="alternate" type="text/html" title="SpiderMonkey Newsletter (Firefox 132-134)" /><published>2024-11-27T17:00:00+00:00</published><updated>2024-11-27T17:00:00+00:00</updated><id>https://spidermonkey.dev/blog/2024/11/27/newsletter-firefox-132-134</id><content type="html" xml:base="https://spidermonkey.dev/blog/2024/11/27/newsletter-firefox-132-134.html"><![CDATA[<p>Hello! Welcome to another episode of the SpiderMonkey Newsletter. I’m your host,
Matthew Gaudet.</p>

<p>In the spirit of the <a href="https://en.wikipedia.org/wiki/Thanksgiving_(United_States)">upcoming
season</a>, let’s <a href="https://en.wiktionary.org/wiki/talk_turkey">talk
turkey</a>. I mean, monkeys. I mean
SpiderMonkey.</p>

<p>Today we’ll cover a little more ground than the normal newsletter.</p>

<p>If you haven’t already read Jan’s wonderful blog about how he managed to improve Wasm
compilation speed by 75x on large modules, <a href="https://spidermonkey.dev/blog/2024/10/16/75x-faster-optimizing-the-ion-compiler-backend.html">please take a
peek</a>.
It’s a great story of how O(n^2) is the worst complexity – fast enough to seem OK in
small cases, and slow enough to blow up horrendously when things get big.</p>

<h3 id="-performance">🚀 Performance</h3>

<ul>
  <li>
    <p>Awesome contributor Debadree <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1490441">has added IC support for addition and subtraction on
Date Objects</a></p>
  </li>
  <li>
    <p>Jan <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1919217">worked on saving fewer registers when calling out to C++
code</a>, providing a 5%
improvement on some speedometer subtests.</p>
  </li>
  <li>
    <p>Jon <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1920761">improved the performance of store buffer
iteration</a> by avoiding
linked-list traversal where possible.</p>
  </li>
  <li>
    <p>Jon <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1925581">improved the performance of
sweeping</a> by doing more work
without holding a lock</p>
  </li>
  <li>
    <p>Jan <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1920430">continued improving register allocation by choosing a better representation
for sparse bitsets</a>, improving
compilation time on a wasm module by 40%, and improving some PDF test cases by 5%.</p>
  </li>
  <li>
    <p>Jan worked on <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1851662">Map and Set
Optimizations,</a> with
improvements across a number of benchmarks.</p>
  </li>
</ul>

<h3 id="️-new-features--in-progress-standards-work">👷🏽‍♀️ New features &amp; In Progress Standards Work</h3>

<ul>
  <li>
    <p>The WebAssembly Memory64 proposal has reached stage 4 and is enabled by default in
Firefox 134. This proposal finally adds 64-bit pointers to WebAssembly—although
this comes with some downsides, so stay tuned for a blog post exploring this
subject further.</p>
  </li>
  <li>
    <p>Prolific contributor André has worked on over 30 (‼) bugs in this newsletter time
frame. Feature-wise, he’s continued his incredible stewardship of the Temporal
proposal, as well as <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1648139">implementing
Intl.DurationFormat</a> and providing an <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1930952">initial
implementation of
Atomics.pause</a>.</p>
  </li>
  <li>
    <p>Speaking of great contributors: Debadree has continued work on the <a href="https://github.com/tc39/proposal-explicit-resource-management">Explicit
Resource Management
proposal</a>, with <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1569081">the
implementation looking like it’s in great
shape</a>.</p>
  </li>
  <li>
    <p>Dan has been on a roll shipping proposals:
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1918235">Promise.try</a>,
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1918235">RegExp.escape</a>, and <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1913752">Regular
Expression Pattern
Modifiers</a>.</p>
  </li>
</ul>

<h3 id="-spidermonkey-platform-improvements"><strong>🚉</strong> SpiderMonkey Platform Improvements</h3>

<ul>
  <li>
    <p>Jan has killed a whole class of errors that have <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=552007">bedevilled SpiderMonkey
developers for at least 15
years.</a> The root cause of the
issue is the overloading of error checking inside of SpiderMonkey. The general case
is “Report an Exception” and return false.</p>

    <p>The problem is that returning false without setting an exception <em>also</em> has a
meaning: it’s throwing an uncatchable exception, which is used for things like
unwinding when you stop a script due to the slow script dialog.</p>

    <p>So what happens if you accidentally return false without setting an
exception? Did you mean to do that, or did you forget? In <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1921215">Bug
1921215</a>, <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1921780">bug
1921780</a>, and <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1921963">bug
1921963</a> Jan has tidied this
whole story up with a new helper <code class="language-plaintext highlighter-rouge">JS::ReportUncatchableException</code> and assertions
that validate you’ve used this correctly!</p>
  </li>
  <li>
    <p>Debadree <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1916359">added support for generating LIR ops where they are pretty
straightforward boilerplate</a>.
This will reduce the amount of work we have to do when improving the JITs. Previous
work like this has been appreciated hugely, and I expect we’ll find lots of value
in this quickly.</p>
  </li>
  <li>
    <p>Denis <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1918139">has made <code class="language-plaintext highlighter-rouge">--enable-perf</code> the default for nightly
builds</a>, which will make
profiling Firefox more effective.</p>
  </li>
  <li>
    <p>A few years ago our OS integration team deployed a neat trick on Windows to reduce
out-of-memory errors in Firefox, <a href="https://hacks.mozilla.org/2022/11/improving-firefox-stability-with-this-one-weird-trick/">written up in this wonderful blog
post</a>.
Recently <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1786451">Jon went through the effort of applying this same weird trick to our own
memory allocator</a>. Our hope
is that this will reduce the number of out-of-memory failures.</p>
  </li>
</ul>]]></content><author><name>Matthew Gaudet</name></author><summary type="html"><![CDATA[Hello! Welcome to another episode of the SpiderMonkey Newsletter. I’m your host, Matthew Gaudet.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" /><media:content medium="image" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">75x faster: optimizing the Ion compiler backend</title><link href="https://spidermonkey.dev/blog/2024/10/16/75x-faster-optimizing-the-ion-compiler-backend.html" rel="alternate" type="text/html" title="75x faster: optimizing the Ion compiler backend" /><published>2024-10-16T17:00:00+00:00</published><updated>2024-10-16T17:00:00+00:00</updated><id>https://spidermonkey.dev/blog/2024/10/16/75x-faster-optimizing-the-ion-compiler-backend</id><content type="html" xml:base="https://spidermonkey.dev/blog/2024/10/16/75x-faster-optimizing-the-ion-compiler-backend.html"><![CDATA[<p>In September, machine learning engineers at Mozilla filed <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1916442">a bug report</a> indicating that Firefox was consuming excessive memory and CPU resources while running Microsoft’s <a href="https://github.com/microsoft/onnxruntime">ONNX Runtime</a> (a machine learning library) compiled to WebAssembly.</p>

<p>This post describes how we addressed this and some of our longer-term plans for improving WebAssembly performance in the future.</p>

<h2 id="the-problem">The problem</h2>

<p>SpiderMonkey has two compilers for WebAssembly code. First, a Wasm module is compiled with the Wasm Baseline compiler, a compiler that generates decent machine code very quickly. This is good for startup time because we can start executing Wasm code almost immediately after downloading it. Andy Wingo wrote a nice <a href="https://wingolog.org/archives/2020/03/25/firefoxs-low-latency-webassembly-compiler">blog post</a> about this Baseline compiler.</p>

<p>When Baseline compilation is finished, we compile the Wasm module with our more advanced Ion compiler. This backend produces faster machine code, but compilation time is a lot higher.</p>

<p>The issue with the ONNX module was that the Ion compiler backend took a long time and used a lot of memory to compile it. On my Linux x64 machine, Ion-compiling this module took about 5 minutes and used more than 4 GB of memory. Even though this work happens on background threads, this was still too much overhead.</p>

<h2 id="optimizing-the-ion-backend">Optimizing the Ion backend</h2>

<p>When we investigated this, we noticed that this Wasm module had some extremely large functions. For the largest one, Ion’s MIR control flow graph contained 132856 <a href="https://en.wikipedia.org/wiki/Basic_block">basic blocks</a>. This uncovered some performance cliffs in our compiler backend.</p>

<h3 id="virtualregister-live-ranges">VirtualRegister live ranges</h3>

<p>In Ion’s register allocator, each <code class="language-plaintext highlighter-rouge">VirtualRegister</code> has a list of <code class="language-plaintext highlighter-rouge">LiveRange</code> objects. We were using a linked list for this, sorted by start position. This caused quadratic behavior when allocating registers: the allocator often splits live ranges into smaller ranges and we’d have to iterate over the list for each new range to insert it at the correct position to keep the list sorted. This was very slow for virtual registers with thousands of live ranges.</p>

<p>To address this, I tried a few different data structures. The <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1916442#c17">first attempt</a> was to use an AVL tree instead of a linked list and that was a big improvement, but the performance was still not ideal and we were also worried about memory usage increasing even more.</p>

<p>After this we <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1918970">realized</a> we could store live ranges in a vector (instead of linked list) that’s optionally sorted by decreasing start position. We also <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1917817">made some changes</a> to ensure the initial live ranges are sorted when we create them, so that we could just append ranges to the end of the vector.</p>

<p>The observation here was that the core of the register allocator, where it assigns registers or stack slots to live ranges, doesn’t actually require the live ranges to be sorted. We therefore now just append new ranges to the end of the vector and mark the vector unsorted. Right before the final phase of the allocator, where we again rely on the live ranges being sorted, we do a single <code class="language-plaintext highlighter-rouge">std::sort</code> operation on the vector for each virtual register with unsorted live ranges. Debug assertions are used to ensure that functions that require the vector to be sorted are not called when it’s marked unsorted.</p>

<p>Vectors are also better for cache locality and they let us use binary search in a few places. When I was discussing this with Julian Seward, he pointed out that Chris Fallin also <a href="https://cfallin.org/blog/2022/06/09/cranelift-regalloc2/">moved away</a> from linked lists to vectors in Cranelift’s port of Ion’s register allocator. It’s always good to see convergent evolution :)</p>

<p>This change from sorted linked lists to optionally-sorted vectors made Ion compilation of this Wasm module about 20 times faster, down to 14 seconds.</p>

<h3 id="semi-nca">Semi-NCA</h3>

<p>The next problem that stood out in performance profiles was the Dominator Tree Building compiler pass, in particular a function called <code class="language-plaintext highlighter-rouge">ComputeImmediateDominators</code>. This function determines the <a href="https://en.wikipedia.org/wiki/Dominator_(graph_theory)">immediate dominator</a> block for each basic block in the MIR graph.</p>

<p>The algorithm we used for this (based on <em>A Simple, Fast Dominance Algorithm</em> by Cooper et al) is relatively simple but didn’t scale well to very large graphs.</p>

<p>Semi-NCA (from <em>Linear-Time Algorithms for Dominators and Related Problems</em> by Loukas Georgiadis) is a different algorithm that’s also used by LLVM and the Julia compiler. I prototyped this and was surprised to see how much faster it was: it got our total compilation time down from 14 seconds to less than 8 seconds. For a single-threaded compilation, it reduced the time under <code class="language-plaintext highlighter-rouge">ComputeImmediateDominators</code> from 7.1 seconds to 0.15 seconds.</p>

<p>Fortunately it was easy to run both algorithms in debug builds and assert they computed the same immediate dominator for each basic block. After a week of fuzz-testing, no problems were found and we <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1919025">landed a patch</a> that removed the old implementation and enabled the <a href="https://searchfox.org/mozilla-central/rev/d56687458d4e6e8882c4b740e78413a0f0a69d59/js/src/jit/DominatorTree.cpp#19">Semi-NCA code</a>.</p>

<h3 id="sparse-bitsets">Sparse BitSets</h3>

<p>For each basic block, the register allocator allocated a (dense) <a href="https://en.wikipedia.org/wiki/Bit_array">bit set</a> with a bit for each virtual register. These bit sets are used to check which virtual registers are live at the start of a block.</p>

<p>For the largest function in the ONNX Wasm module, this used a lot of memory: 199477 virtual registers x 132856 basic blocks is at least 3.1 GB just for these bit sets! Because most virtual registers have short live ranges, these bit sets had relatively few bits set to 1.</p>

<p>We <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1920430">replaced</a> these dense bit sets with a new <a href="https://searchfox.org/mozilla-central/source/js/src/jit/SparseBitSet.h"><code class="language-plaintext highlighter-rouge">SparseBitSet</code></a> data structure that uses a hashmap to store 32 bits per entry. Because most of these hashmaps contain a small number of entries, it uses an <code class="language-plaintext highlighter-rouge">InlineMap</code> to optimize for this: it’s a data structure that stores entries either in a small inline array or (when the array is full) in a hashmap. We also <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1920433">optimized</a> <code class="language-plaintext highlighter-rouge">InlineMap</code> to use a variant (a union type) for these two representations to save memory.</p>

<p>This saved at least 3 GB of memory but also improved the compilation time for the Wasm module to 5.4 seconds.</p>

<h3 id="faster-move-resolution">Faster move resolution</h3>

<p>The last issue that showed up in profiles was a function in the register allocator called <code class="language-plaintext highlighter-rouge">createMoveGroupsFromLiveRangeTransitions</code>. After the register allocator assigns a register or stack slot to each live range, this function is responsible for connecting pairs of live ranges by inserting <em>moves</em>.</p>

<p>For example, if a value is stored in a register but is later spilled to memory, there will be two live ranges for its virtual register. This function then inserts a move instruction to copy the value from the register to the stack slot at the start of the second live range.</p>

<p>This function was slow because it had a number of loops with quadratic behavior: for a move’s destination range, it would do a linear lookup to find the best source range. We <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1920951">optimized</a> the main two loops to run in linear time instead of being quadratic, by taking more advantage of the fact that live ranges are sorted.</p>

<p>With these changes, Ion can compile the ONNX Wasm module in less than 3.9 seconds on my machine, more than 75x faster than before these changes.</p>

<h2 id="adobe-photoshop">Adobe Photoshop</h2>

<p>These changes not only improved performance for the ONNX Runtime module, but also for a number of other WebAssembly modules. A large Wasm module downloaded from the free online <a href="https://photoshop.adobe.com/discover">Adobe Photoshop demo</a> can now be Ion-compiled in 14 seconds instead of 4 minutes.</p>

<p>The JetStream 2 benchmark has a HashSet module that was <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1918970#c14">affected</a> by the quadratic move resolution code. Ion compilation time for it improved from 2.8 seconds to 0.2 seconds.</p>

<h2 id="new-wasm-compilation-pipeline">New Wasm compilation pipeline</h2>

<p>Even though these are great improvements, spending at least 14 seconds (on a fast machine!) to fully compile Adobe Photoshop on background threads still isn’t an <em>amazing</em> user experience. We expect this to only get worse as more large applications are compiled to WebAssembly.</p>

<p>To address this, our WebAssembly team is making great progress rearchitecting the Wasm compiler pipeline. This work will make it possible to Ion-compile individual Wasm functions as they warm up instead of compiling everything immediately. It will also unlock exciting new capabilities such as (speculative) inlining.</p>

<p>Stay tuned for updates on this as we start rolling out these changes in Firefox.</p>

<p>- Jan de Mooij, engineer on the SpiderMonkey team</p>]]></content><author><name>Jan de Mooij</name></author><summary type="html"><![CDATA[In September, machine learning engineers at Mozilla filed a bug report indicating that Firefox was consuming excessive memory and CPU resources while running Microsoft’s ONNX Runtime (a machine learning library) compiled to WebAssembly.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" /><media:content medium="image" url="https://spidermonkey.dev/assets/img/twitter-dark-large.png?1" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>