<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
    <channel>
        <title><![CDATA[HTTP Toolkit]]></title>
        <description><![CDATA[HTTP Toolkit]]></description>
        <link>https://httptoolkit.com/blog/</link>
        <generator>RSS for Node</generator>
        <lastBuildDate>Fri, 17 Apr 2026 09:34:54 GMT</lastBuildDate>
        <atom:link href="https://httptoolkit.com/rss.xml" rel="self" type="application/rss+xml"/>
        <item>
            <title><![CDATA[Dictionary Compression is finally here, and it's ridiculously good]]></title>
            <description><![CDATA[<p>Dictionary compression could completely change how applications send data over the web. It's recently gained broad support, and offers absurd real-world traffic reductions: initial testing shows YouTube JS download size for returning desktop users <a href="https://github.com/WICG/compression-dictionary-transport/blob/main/examples.md#youtube-desktop-player">shrinking up to 90%</a> (!!!) compared to existing best-practice compression, while the Google search results HTML (arguably the most optimized content on the internet) <a href="https://developer.chrome.com/blog/search-compression-dictionaries#the_results">shrinks nearly 50%</a>.</p> <p>This works by initializing the (de)compression algorithm with a dictionary of data known in advance to both compressor &amp; decompressor, so that the compressed data can just be references to that directly ("insert bytes 1 - 10,000 from the dictionary") without having to include the original data at all. This is applicable in a surprising number of scenarios, because most data we send (especially on the web) isn't completely novel or unpredictable. Today's JavaScript bundle shares 99% of its content with yesterday's JavaScript bundle - if the browser already has the old one, using that as a dictionary means you can compress down to (approximately) just the differences.</p> <p>This can work either using a previous response as the dictionary for the next response, or using an explicit custom dictionary; for many kinds of dynamic response, you do know large chunks of the data in advance, like all the keys in your API's JSON response, and many common values that might be included, and you can generate &amp; preload a dictionary defining exactly this to efficiently cover those. 
In either case, this can drastically shrink JS bundles, WebAssembly files, known-structure API responses, or many other kinds of incrementally updated &amp; diffable content - a lot of the worst offenders for bandwidth usage that have become very common on the modern web.</p> <p>This is now widely usable, safe to deploy without compatibility concerns, and surprisingly easy to set up.</p> <p>Here's a quick low-level demo for Node.js (v24.6+ or v22.19+) so you can play with the raw compression directly for yourself:</p> <pre><code class="js language-js">const zlib = require('zlib');

// A very basic dictionary - a previous API response
const dictionary = Buffer.from(
  '{"type":"event","source":"server-2","status":"active"}'
);

// A new response we want to compress:
const dataToCompress = Buffer.from(
  '{"type":"event","source":"server-1","status":"inactive"}'
);

console.log(
    "Compressed data size without dictionary",
    zlib.zstdCompressSync(dataToCompress).length
);

console.log(
    "Compressed data size with dictionary",
    zlib.zstdCompressSync(dataToCompress, { dictionary }).length
);
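
// To prove this round-trips, decompression just needs the same dictionary
// (the receiver must already have it - that's the one requirement):
const compressed = zlib.zstdCompressSync(dataToCompress, { dictionary });
const restored = zlib.zstdDecompressSync(compressed, { dictionary });
console.log("Round-trips correctly:", restored.equals(dataToCompress));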
</code></pre> <p>Even in a toy example like this, that comes out to 65 bytes with normal Zstandard compression, vs 28 bytes when using the past response as a dictionary - <strong>57% smaller</strong>.</p> <p>You're welcome to <a href="#putting-compression-dictionaries-into-practice">skip to the meat</a> to get this set up right now, but before we do, let's talk about how this works under the hood, the history and where this is supported today, and then pull back and look at practical setup.</p> <h2 id="underthehood">Under the hood</h2> <p>We're going to focus on Zstandard here, just to keep things simple &amp; focused (and because it's great). When we do the compression from the example above, the output is:</p> <pre><code>&gt; console.log(zlib.zstdCompressSync(dataToCompress, { dictionary }));
&lt;Buffer 28 b5 2f fd 20 38 9d 00 00 58 31 69 6e 61 63 74 69 76 65 22 7d 02 00 80 93 3c 2a 20&gt;
</code></pre> <p>What does this little string of hex actually mean?</p> <ul> <li><code>28 b5 2f fd</code> - this is a Zstandard <a href="https://en.wikipedia.org/wiki/List_of_file_signatures">magic number</a>, so we know this is Zstandard data.</li> <li><code>20</code> - Frame header description, with no dictionary id (we didn't include any dictionary name metadata) and the single-segment flag set.</li> <li><code>38</code> - Hex 0x38 = 56 in decimal. This is the final size of the decompressed data.</li> <li><code>9d 00 00</code> - Data block header, telling us we're about to read the last (and only) block of data, and it's 19 bytes long.</li> <li><code>58</code> - Start an 11 byte 'literals' section (raw content, which the decompression process will read from to build the output).</li> <li><code>31 69 6e 61 63 74 69 76 65 22 7d</code> - in ASCII, this decodes as <code>1inactive"}</code>. <strong>This is the only actual data from the input included in the output.</strong></li> <li><code>02</code> - Start the sequences section (<a href="https://en.wikipedia.org/wiki/LZ77_and_LZ78">LZ77</a> decompression instructions).</li> <li><code>00</code> - The compression mode is FSE - this is how the following instructions are encoded.</li> <li><code>80 93 3c 2a 20</code> - The decompression instructions. 
These are very complicated and tightly packed, but roughly work out as:<ul> <li>Copy 33 bytes from the dictionary at offset 0: everything from the start to <code>...server-</code></li> <li>Copy 1 byte from literals: <code>1</code></li> <li>Copy 12 bytes from dictionary at offset 34: <code>","status":"</code></li> <li>Copy 10 bytes from literals: <code>inactive"}</code></li></ul></li> </ul> <p>(Somewhat simplified, but you get the gist)</p> <p>I think there are two notable things here: firstly, compressed data comes with a disproportionately large amount of overhead in small examples like this, and secondly, very very little of the original data is included here, so despite that overhead it ends up tiny. By pulling data directly from a dictionary, the vast majority of the original content we're compressing never actually appears in the output at all.</p> <p>As you might imagine, as data gets larger the proportional overhead reduces drastically, and you get asymptotically closer to just distributing a diff between your data and the dictionary. In this kind of scenario, this is effectively a mechanism to efficiently deliver deltas between data versions, one that's already tightly optimized &amp; built into browsers and backends you already use. Neat!</p> <h2 id="howdidwegethere">How did we get here?</h2> <p>Compressing data with custom dictionaries like this isn't especially new as a concept. It's existed at least as far back as <a href="https://www.rfc-editor.org/rfc/rfc1950">the zlib RFC</a> in 1996. However, until now use cases were relatively limited, as DEFLATE (the compression algorithm that zlib wraps) comes with quite a few limitations, like a tiny 32KB maximum sliding window, meaning you could only use a very small dictionary, and once you've processed another 32KB of data the original dictionary is out of the window &amp; unusable. Maybe OK back in 1996, but not practical for much today.</p> <p>The larger problem though was that zlib lost the HTTP encoding war. 
Both <code>gzip</code> (meaning gzip-wrapped deflate) and <code>deflate</code> (meaning zlib-wrapped deflate) were standardized as options for the <code>content-encoding</code> header in HTTP, but <code>deflate</code> was incorrectly implemented by Internet Explorer and IIS (thanks Bill), creating a compatibility mess, so everybody stuck with <code>gzip</code>, which actually worked reliably everywhere (but didn't support custom dictionaries).</p> <p>In 2008, Google took a shot at custom dictionaries on the web anyway, introducing <a href="https://en.wikipedia.org/wiki/SDCH">Shared Dictionary Compression for HTTP</a> (SDCH) powered by the <a href="https://en.wikipedia.org/wiki/VCDIFF">VCDIFF delta algorithm</a>, including it in the very first version of Chromium, and using it on their own sites. This didn't really go anywhere, with no other browser implementations and little other usage on the web. The main issues here were privacy &amp; security concerns, such as dictionary ids being used as a global cross-site tracking vector, and the uncertainty &amp; caution around new compression options at the time, as attacks like <a href="https://en.wikipedia.org/wiki/CRIME">CRIME</a> were showing how compression could leak secrets in surprising ways. SDCH was also much more specialized, as VCDIFF is an algorithm for file deltas specifically, not a general purpose compression tool, and the lack of HTTPS usage meant middleboxes messing with headers &amp; recompressing content could cause enormous problems as well.</p> <p>SDCH was <a href="https://groups.google.com/a/chromium.org/g/blink-dev/c/nQl0ORHy7sw">removed entirely</a> with minimal fanfare in 2017.</p> <p>In addition to all those good technical reasons, the real killer for SDCH was the rise of <a href="https://en.wikipedia.org/wiki/Brotli">Brotli</a>. 
The Brotli RFC was published in 2016, and included a fixed dictionary specifically designed to cover many core web use cases, blowing gzip performance out of the water by compressing common web content 10-20% better (although slower to do it, so generally used for static pre-compressed content). My impression is this took the last gasps of energy away from SDCH, shifting the performance focus in Chromium fully onto Brotli instead, and nailing that coffin for good.</p> <p>Lastly, bringing us up to the present day, a new competitor emerged in the form of <a href="https://en.wikipedia.org/wiki/Zstd">Zstandard</a>. Zstandard offers different state-of-the-art tradeoffs (almost as effective compression as Brotli, but much faster to do it) and with custom dictionary support from day 1, standardized in <a href="https://www.rfc-editor.org/rfc/rfc8478">2018</a>. Brotli added its own official custom dictionary support in 2023 as well, and both algorithms now have standardized &amp; reasonably widespread browser support.</p> <p>That means we suddenly have a great pair of compression algorithms (very-efficient but slow Brotli, and pretty-efficient but super-fast Zstandard) which are widely supported across the modern web, and most importantly: both support custom dictionaries.</p> <h2 id="wherecaniusedictionarycompressiontoday">Where can I use dictionary compression today?</h2> <p>To actually use this, you need two things:</p> <ul> <li>An implementation that supports custom dictionaries on both sides.</li> <li>A way to coordinate both sides on the dictionary you're going to use.</li> </ul> <p>If you want to use this entirely within your own codebase, coordination is generally fairly simple, so you just need implementations. There's been some great progress there recently. 
My understanding of the current state of things is:</p> <ul> <li>In Node, Zstandard with custom dictionaries comes built-in as part of <code>zlib</code> in Node v24.6+ and v22.19+. Basic Brotli support has been around longer, but custom dictionary support was <a href="https://github.com/nodejs/node/pull/61763">just merged</a> recently, so should be landing in the next releases.</li> <li>In Python, Zstandard with custom dictionaries comes built-in as <a href="https://docs.python.org/3/library/compression.zstd.html"><code>compression.zstd</code></a> as of Python 3.14, and there's a popular <a href="https://pypi.org/project/brotli/">Brotli</a> package on PyPI as well.</li> <li>Rust has mature, popular packages for both <a href="https://crates.io/crates/zstd">Zstandard</a> and <a href="https://crates.io/crates/brotli">Brotli</a>, both including custom dictionary support.</li> <li>The JVM has mature packages like <a href="https://github.com/luben/zstd-jni">zstd-jni</a> and <a href="https://github.com/hyperxpro/Brotli4j">Brotli4j</a>.</li> <li>Go, .NET and others have less clear options, but plenty of libraries in the space, and often bindings to the native zstd/brotli libraries that can be used directly.</li> </ul> <p>You do need support on the decompression side as well. If that's elsewhere within your systems, great; however, if it's in a browser then for now this is only available in Chrome 130+ (and related browsers: Edge, Brave, etc). That said, both Safari &amp; Firefox have public plans (<a href="https://github.com/WebKit/standards-positions/issues/160">here</a> and <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1882979">here</a> respectively) to support this as well, so hopefully this will be universally supported soon.</p> <p>Fortunately, you can start using it today even just for your Chrome users, because the browser proposal for this is designed around automatic negotiation of the dictionary to use. 
The standard for this is known as:</p> <h2 id="compressiondictionarytransport">Compression dictionary transport</h2> <p>This is an <a href="https://www.rfc-editor.org/rfc/rfc9842">IETF standard</a> defining how clients &amp; servers should distribute and use custom dictionaries, with Zstandard and Brotli, over HTTP. In the minimal case, the key step looks like this:</p> <p>Client sends:</p> <pre><code>GET /some/content HTTP/1.1
[...other headers...]
Available-Dictionary: :abcdefabcdefabcdef:
Accept-Encoding: br, zstd, dcb, dcz
</code></pre> <p>That means:</p> <ul> <li>Here's the SHA-256 hash of the best dictionary I have for this request (encoded as base64, enclosed in colons - this is <a href="https://www.rfc-editor.org/rfc/rfc9651.html#name-byte-sequences">structured field byte sequence format</a>).</li> <li>Here are the encodings I support, e.g. dictionary-compressed Brotli (<code>dcb</code>) and dictionary-compressed Zstandard (<code>dcz</code>).</li> </ul> <p>Then, if the server agrees to use the requested dictionary, it might send:</p> <pre><code>HTTP/1.1 200 OK
Content-Encoding: dcb
Vary: Accept-Encoding, Available-Dictionary

...a stream of data compressed with Brotli
using the abcdefabcdefabcdef dictionary...
</code></pre> <p>If the server doesn't have or doesn't want to use that dictionary, it can reply in any other normal way, just like today. It's entirely opt-in on both sides, so it's safe to deploy now.</p> <p>Note though that the <code>Vary</code> header here is important - that is an existing standard that tells any caches en route that this response depends on the request headers listed, and so any future requests with different values there (e.g. any requests asking for a different dictionary) should not be given this response from the cache.</p> <p>This leaves one open question though: how does the client get the dictionary? There are two options:</p> <ol> <li>The server can add a <code>Use-As-Dictionary: match="/file/pattern/*"</code> header to any existing response. This tells the client it should save this response as a dictionary, and offer it later for matching requests.</li> <li>You can add link relations (e.g. a <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Link"><code>Link: ...</code> HTTP header</a> or a <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/link"><code>&lt;link ...&gt;</code> HTML element</a>) with <code>rel="compression-dictionary"</code> to tell the client to actively fetch a separate dictionary file. That file can then be served with <code>Use-As-Dictionary</code> to configure it.</li> </ol> <p>The latter is mainly relevant if you're planning to use a custom dictionary (building a custom file to maximize dictionary applicability &amp; efficiency, instead of reusing existing content). See the <a href="#building-your-own-custom-dictionary">Building your own custom dictionary</a> section below for more details.</p> <p>That's it! 
There are a few bonus things to note:</p> <ul> <li>You can add ids to dictionaries, in addition to just using the cache, with <code>id=...</code> in <code>Use-As-Dictionary</code>, in which case the client will send it back to you in a <code>Dictionary-ID</code> header with the request.</li> <li>This is all only usable on the same origin. This solves the privacy concerns with SDCH: you can't share dictionaries across origins in any way, so in terms of tracking they're only as capable as a first-party cookie.</li> </ul> <h2 id="puttingcompressiondictionariesintopractice">Putting compression dictionaries into practice</h2> <p>OK, the important bit: how do you actually implement this right now?</p> <p>Let's assume you're interested in the most obvious use case: JavaScript bundles. For simplicity, let's say you have one JavaScript bundle at <code>https://website.example/js/bundle.js</code> which frequently changes in small ways, and you'd like to use dictionary compression to avoid resending every single byte from scratch every time, reducing this download size by 80% or so for returning users. Here's an outline of the setup steps:</p> <ol> <li>Store your old bundles somewhere your backend can reach them. You need to organize them either by SHA-256 hash, or by some tightly linked id (e.g. git commit). This could be a folder on disk, an S3 bucket, or an internal cache service. You could keep the last few months, every version ever, or just the last few days depending on how often users generally return to your site.</li> <li>Have your backend serve your JS bundle with <code>Use-As-Dictionary: match="/js/bundle.js"</code>. Insert wildcards (<code>/js/bundle.*.js</code>) here if the name can vary (e.g. if you use a hash or version or similar in the filename). 
Append <code>, id="your-id"</code> if you want a distinct id for each dictionary for easier reference.</li> <li>If you receive a request for this path with an <code>Available-Dictionary</code> header, see if you have the matching bundle available (looking it up by hash, or using the id from the <code>Dictionary-ID</code> header).</li> <li>If you find a matching bundle, and you support a dictionary-compression encoding (<code>dcb</code> or <code>dcz</code>) that the client has sent in their <code>Accept-Encoding</code> header, then compress the content using this dictionary and send them the resulting tiny response.</li> </ol> <p>Here's a rough outline for Express &amp; Node.js, using Zstandard (<code>dcz</code>):</p> <pre><code class="js language-js">const express = require('express');
const fs = require('fs/promises');
const zlib = require('node:zlib');
const { promisify } = require('node:util');
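const crypto = require('node:crypto');

// Clients identify dictionaries by the base64-encoded SHA-256 hash of the
// dictionary bytes (the Available-Dictionary value), so storing old bundles
// keyed by this same hash makes lookups direct. Helper name is illustrative:
function dictionaryKey(bundle) {
  return crypto.createHash('sha256').update(bundle).digest('base64');
}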

const zstdCompress = promisify(zlib.zstdCompress);
const app = express();

// Loaded synchronously at startup (top-level await isn't available in
// CommonJS modules like this one):
const currentBundle = require('node:fs').readFileSync('./dist/current/bundle.js');

async function getPreviousBundle(base64Hash) {
  // ...Lookup past bundle version from the hash somehow...
}
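
// One illustrative implementation sketch of the lookup above (pure
// assumption: past bundles stored on disk under their hex-encoded SHA-256
// hash). The hash comes from a client-controlled header, so validate it
// before any file access:
async function getPreviousBundleFromDisk(base64Hash) {
  // A 32-byte SHA-256 hash base64-encodes to 43 chars plus '=' padding:
  if (!/^[A-Za-z0-9+\/]{43}=$/.test(base64Hash)) return undefined;
  const hexHash = Buffer.from(base64Hash, 'base64').toString('hex');
  try {
    return await fs.readFile(`./previous-bundles/${hexHash}.js`);
  } catch {
    return undefined; // Unknown dictionary - fall back to plain compression
  }
}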

app.get('/js/bundle.js', async (req, res) =&gt; {
  const rawAvailableDict = req.get('Available-Dictionary') || '';
  const acceptEncoding = req.get('Accept-Encoding') || '';

  // Extract the base64 hash from the structured field (e.g. :hash:)
  const hashMatch = rawAvailableDict.match(/^:(.+):$/);
  const dictionaryHash = hashMatch ? hashMatch[1] : null;

  let dictionary = null;
  if (dictionaryHash) {
    dictionary = await getPreviousBundle(dictionaryHash);
  }

  if (dictionary &amp;&amp; acceptEncoding.includes('dcz')) {
    // If we have a matching dictionary, and the client supports it, use it to
    // compress the content:
    const compressedBundle = await zstdCompress(currentBundle, { dictionary });

    res.set({
        // Confirm that you're using the dictionary:
        'Content-Encoding': 'dcz',
        // Tell caches not to reuse this for requests without this dictionary:
        'Vary': 'Available-Dictionary, Accept-Encoding',
        // Tell the client it can use this as a dictionary as well later on:
        'Use-As-Dictionary': 'match="/js/bundle.js"'
    });

    return res.send(compressedBundle);
  } else {
    // No dictionary - just send as is. You probably want to do some other
    // non-dictionary compression here depending on what the client supports.

    // But still, tell the client they can use this as a dictionary in later
    // requests for the same path:
    res.set('Use-As-Dictionary', 'match="/js/bundle.js"');

    res.send(currentBundle);
  }
});

// Port choice is arbitrary for this outline:
app.listen(3000);
</code></pre> <p>This should immediately and dramatically reduce traffic for returning users on modern Chrome versions (<a href="https://caniuse.com/wf-compression-dictionary-transport">currently</a> about 70% of web clients), improving loading times client-side and reducing any bandwidth costs or constraints on the server side.</p> <p>The open question here of course is how to store &amp; access your old bundles. The easiest option is likely adding "push the bundle to S3, keyed by hash" to your deploy step, and then querying S3 for the hash here, with some limited caching in memory to skip the lookup entirely where possible. In time I expect this will become more standard practice with a clearly trodden path, but in the meantime that style of approach seems like a good starting point. Remember of course that the hash is a user-controlled value - don't just stick it in a URL and load the data without validation!</p> <h2 id="buildingyourowncustomdictionary">Building your own custom dictionary</h2> <p>For delta cases, where you're repeatedly delivering changing content and you really want to just transmit the changes, the easiest option is to use your past content as your dictionary as above. Simple and effective. I'm expecting CDNs will start to support this automatically in the not too distant future, since it's a quick win that they're very well positioned to enable (and charge for), offering big performance boosts.</p> <p>For other cases though, you may be able to do better than a simple delta: producing a smaller custom dictionary that's relevant to more requests. Building the right dictionary however can be complex. Fundamentally it's just a bag of data that compressed output can reference without having to repeat it directly ("insert data from dictionary bytes 500 - 10,000 here"), but there are open questions around the most efficient dictionary size and how to find and pack the relevant values for each use case. 
There's a few options for actually building this dataset:</p> <ul> <li>Generate a dictionary explicitly, using training functionality built into the <code>zstd</code> CLI tool with a large set of example values. This is the best option, if you have a good example dataset of values on hand. Install zstd, then run <code>zstd --train TrainingData/* -o dictionaryName</code>. Brotli doesn't appear to have an official equivalent, but you can reuse a Zstd dictionary (although there are some Zstd-specific tweaks, so it's a bit less efficient) or there are plenty of unofficial implementations floating around.</li> <li>Use a known template or example value - if you have a lot of content all related to a single base value (many HTML pages sharing some core content, API responses which all have the same structure) you can use any fixed example of the output or empty template of the structure as the dictionary. The best example is one that contains as much as possible of the data of the other responses, but nothing else, and without internal duplication.</li> <li>Write a custom dictionary manually. It's just raw data, no structure required, so if you know lots of values that are likely to appear in your data (e.g. JSON keys &amp; common repeated values) then you can just fill up a file with those directly and call it a day.</li> </ul> <p>In all cases, this is an advanced manoeuvre, and it's very important to test the results in practice and tweak and tune to optimize this. 
Use the general case Node example from the intro above to quickly compare the performance with &amp; without your dictionary, and test different examples of your data to confirm the dictionary really helps.</p> <h2 id="realworldresults">Real-world results</h2> <p>This is all early days (the RFC was officially finished in September 2025) but production rollouts and initial data are starting to appear, along with lots of published numbers from external testing of existing sites.</p> <p>Digging into the <a href="https://httparchive.org/">httparchive</a> data from February 2026, despite the early experimental status there's now real-world high-profile use including:</p> <ul> <li>Google.com, using a custom dictionary file covering all content on the origin (<code>match="/*"</code>).</li> <li>Pinterest, applying <code>Use-As-Dictionary</code> to all JS on their <code>s.pinimg.com</code> CDN domain.</li> <li>Notion, applying <code>Use-As-Dictionary</code> to all JS within the Notion app itself.</li> <li>Speedkit, a "website acceleration" product used by people like Swarovski and Hyundai, generating &amp; publishing a custom dictionary file for each of their customers which covers all their assets collectively.</li> <li>Connatix, a widely-used embedded video platform, in sites like the Huffington Post and El Tiempo, applying <code>Use-As-Dictionary</code> to each JS file.</li> <li>Shopify, embedded in sites across the web under <code>/cdn/shopifycloud</code> paths, using both their JS &amp; CSS files directly as dictionaries.</li> <li>Doubleclick and similar 3rd-party ad services, using each of their embedded JS scripts directly as a dictionary.</li> </ul> <p>Most of these don't seem to have published much detailed info on how well it's working for them, except:</p> <ul> <li>Google, who <a href="https://developer.chrome.com/blog/search-compression-dictionaries#the_results">say</a> this results in a 23% drop in average HTML traffic for Chrome users on the search results 
page, even when including first-time users and the overhead of downloading the custom dictionary, increasing to a 50% reduction for returning users.</li> <li>Speed Kit, who are <a href="https://www.speedkit.com/blog/speed-kit-2025-the-shift-from-optimization-to-prediction">reporting</a> up to 95% compression ratios through their custom trained dictionary approach on their customer sites.</li> </ul> <p>Beyond production deployments, there are plenty of public test results, where people have externally downloaded assets from a site over a period (e.g. two versions a week apart), and then tested the dictionary compression that this provides. Lots of these are listed in the original spec proposal <a href="https://github.com/WICG/compression-dictionary-transport/blob/main/examples.md">here</a>. Some notable examples include:</p> <ul> <li><a href="https://github.com/WICG/compression-dictionary-transport/blob/main/examples.md#youtube-desktop-player">YouTube's desktop video player's JavaScript bundle</a>: This is normally 10MB of JS, compressed with Brotli down to 1.8MB for transfer. Testing this with dictionary compression, assuming a user visited once and then again 2 months later, reduces that down to just 384KB (78% smaller than plain Brotli). Testing versions only a week apart reduced this even further down to 172KB (90% smaller than Brotli).</li> <li><a href="https://github.com/WICG/compression-dictionary-transport/blob/main/examples.md#amazon-product-listing-pages">Amazon product listing pages</a>: with a large custom dictionary, these shrink 60-70% compared to plain Brotli (e.g. 539KB uncompressed HTML = 84KB of Brotli = 10KB of Brotli with a custom dictionary).</li> <li>Yoni Feng ran <a href="https://yonifeng.com/blog/shared-compression-dictionaries/">a broad set of external tests</a> on various popular sites, and found multi-megabyte (!) 
reductions for WASM-based apps like Figma &amp; Google Earth, which often need to deliver large WASM bundles that frequently change in small ways, along with compression improvements of up to 95% for popular JS-heavy sites like Reddit and Excel Online. That said, it did show much smaller benefits for text-heavy minimal sites like Wikipedia, down to just a 28% improvement over plain Brotli.</li> <li>Loveholidays developed a <a href="https://tech.loveholidays.com/when-stroopwafels-meet-cutting-edge-tech-rum-and-compression-dictionaries-from-performance-now-eb05de8fff67">proof of concept</a> using the technique early on (before official browser support) showing up to 57% reductions in their JS bundle data transfer size using a custom dictionary - training a single dictionary on all past versions of their bundle, rather than using past bundles.</li> </ul> <p>On the flip side however: Discord <a href="https://discord.com/blog/how-discord-reduced-websocket-traffic-by-40-percent">explored</a> using custom dictionaries with Zstandard to compress websocket messages within their client, manually coordinating the dictionary configurations involved (not using the HTTP headers above, since those don't apply to WebSockets). They found reductions of up to 60% on some messages, but less than 1% on others, and that manual coordination and distribution of dictionaries added too much complexity &amp; overhead to be worthwhile - eventually rolling out plain Zstandard and tweaking their underlying protocol to communicate in deltas natively instead.</p> <h2 id="caveats">Caveats</h2> <p>Hopefully that's all very interesting and exciting for the future of data transfer. There are a few important things to note here:</p> <ul> <li>In browsers, this is usable same-origin only. For tracking &amp; security protection, you can't share dictionaries between origins, and you can't load one from elsewhere. 
If you're hosting widely embedded content, this is still useful, but won't magically get reused across the web in the way you might want (in much the same way that loading your website's JS libraries from a public CDN is <a href="https://httptoolkit.com/blog/public-cdn-risks/">no longer helpful</a> either).</li> <li>Caches can be tricky - be very careful that you don't accidentally cache dictionary-compressed data and use it in other cases. Recipients without access to the required dictionary won't be able to read the compressed data at all. When using the HTTP headers here, <code>Vary: Available-Dictionary</code> (meaning: only reuse this response for matching requests with the same <code>Available-Dictionary</code> header) is your friend.</li> <li>Although this is unlikely to make your compression worse, it does add complexity &amp; server processing time, and using your own custom dictionary has a bandwidth cost itself, since it needs to be downloaded separately. This isn't a free lunch, so you'll need to actually test the end results and compare the real bandwidth upsides to the extra complexity &amp; processing required to see if it's worthwhile for your scenario.</li> <li>These compression algorithms can be <em>very</em> efficient - if you're decompressing data with dictionaries yourself, don't forget to add maximum size limits to the output to ensure an attacker can't send you some small data that expands to become truly enormous. That risk already exists with standard compression, but this only makes it worse.</li> <li>This allows you to frequently deliver incrementally changing application bundles like JavaScript and WASM much more efficiently. That's great, but remember it only affects the amount of data on the network. It'll still unwrap to the same size at the other end, and the time to actually parse &amp; execute your enormous JavaScript bundle client-side won't change. 
Please please don't treat this as a license to deliver even bigger piles of JavaScript.</li> </ul> <h2 id="wrappingup">Wrapping up</h2> <p>Dictionary compression is potentially going to drive a huge change in network traffic, on the web and elsewhere. Our systems have effectively spent years sending the same bytes between the same computers over and over again, and this might just let us stop doing a very significant portion of that. It's very exciting!</p> <p>Test it out for yourself and see how it works for you, and please do share any feedback or fixes back to this article (<a href="https://github.com/httptoolkit/httptoolkit-website/">PRs welcome</a>).</p> <p>And of course, if you're working on this and you need great tools to debug and test HTTP up close, give <strong><a href="https://httptoolkit.com">HTTP Toolkit</a></strong> a go - fully open-source, one-click setup HTTP interception for browsers, Node, Docker and more, so you can see every header and byte that you're actually sending.</p>]]></description>
            <link>https://httptoolkit.com/blog/dictionary-compression-performance-zstd-brotli/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/dictionary-compression-performance-zstd-brotli/</guid>
            <pubDate>Mon, 23 Feb 2026 13:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Funding the OSS Stack: HTTP Toolkit & Open Source in 2025]]></title>
            <description><![CDATA[<p>HTTP Toolkit, like effectively all software businesses, depends on a huge quantity of open-source code for much of its fundamental functionality &amp; infrastructure. Most of this is tirelessly maintained by volunteers, completely for free! This is honestly a great deal for the businesses, but it would be even better if these maintainers were actually rewarded for their hard work.</p> <p>As part of HTTP Toolkit's commitment to giving back to open source under the <a href="https://opensourcepledge.com/">Open Source Pledge</a>, a substantial chunk of revenue goes back into these open-source projects, to keep them healthy, reward maintainers for their tireless efforts, and encourage development on projects I care about to keep them moving forwards.</p> <p>What does that mean in practice?</p> <p><strong>In 2025, HTTP Toolkit paid $7,820 to open-source maintainers</strong>, taking the project's total past $20,000! Not bad for one person.</p> <ul> <li>$1200 to <a href="https://github.com/frida">Frida</a></li> <li>$1100 to <a href="https://github.com/nodejs">Node.js</a></li> <li>$320 to <a href="https://opencollective.com/electron">Electron</a></li> <li>$320 to <a href="https://opencollective.com/mobx">MobX</a></li> <li>$240 to <a href="https://github.com/blakeembrey">blakeembrey</a></li> <li>$240 to <a href="https://github.com/szmarczak">szmarczak</a></li> <li>$240 to <a href="https://github.com/lpinca">lpinca</a></li> <li>$240 to <a href="https://github.com/apocas">apocas</a></li> <li>$240 to <a href="https://github.com/anonrig">anonrig</a></li> <li>$240 to <a href="https://github.com/phosphor-icons">Phosphor Icons</a></li> <li>$240 to <a href="https://github.com/mmaietta">mmaietta</a></li> <li>$240 to <a href="https://opencollective.com/regolith">Regolith</a></li> <li>$240 to <a href="https://opencollective.com/endoflife-date">endoflife.date</a></li> <li>$240 to <a href="https://opencollective.com/react-window">react-window</a></li> 
<li>$240 to <a href="https://opencollective.com/fast-xml-parser">fast-xml-parser</a></li> <li>$240 to <a href="https://opencollective.com/node-datachannel">node-datachannel</a></li> <li>$240 to <a href="https://opencollective.com/ajv">Ajv</a></li> <li>$240 to <a href="https://opencollective.com/mochajs">Mocha</a></li> <li>$220 to <a href="https://github.com/louislam">louislam</a></li> <li>$220 to <a href="https://opencollective.com/openapi-directory">openapi-directory</a></li> <li>$200 to <a href="https://github.com/johnnyreilly">johnnyreilly</a></li> <li>$180 to <a href="https://opencollective.com/nodejs-mobile">Node.js Mobile</a></li> <li>$180 to <a href="https://opencollective.com/open-web-docs">Open Web Docs</a></li> <li>$160 to <a href="https://opencollective.com/tauri">Tauri</a></li> <li>$140 to <a href="https://opencollective.com/servo">Servo</a></li> <li>$140 to <a href="https://opencollective.com/styled-components">Styled Components</a></li> <li>$40 to <a href="https://github.com/roderickvd">roderickvd</a></li> <li>$40 to <a href="https://opencollective.com/waydroid">Waydroid</a></li> </ul> <p>Open source has done a lot to power my tech education &amp; career, and literally powers the foundations of HTTP Toolkit, so I'm more than happy to be able to support the upstream projects those depend on. None of these are earth-shaking amounts individually but cumulatively they add up, especially as other people and organizations add their own contributions in turn: Open Source Pledge companies are now collectively donating nearly $3 million a year in total back to maintainers! If contributions like the above were expected behaviour from everybody building on open-source work, OSS would be a very different place.</p> <p>Of course, this purely covers the <em>financial</em> contributions to open source. 
On top of that, all of HTTP Toolkit's <a href="https://github.com/httptoolkit/">own code</a> is 100% open source, I personally maintain plenty of other projects (I'm one of the maintainers of Node.js, plus a handful of smaller but popular libraries like <a href="https://www.npmjs.com/package/loglevel">loglevel</a>) and there's been a long series of code contributions from HTTP Toolkit back to upstream projects along the way too.</p> <p><strong>If you or your employer are building software on top of open source, please give back to the projects you depend on and sign the <a href="https://opensourcepledge.com">Open Source Pledge</a></strong>. Even small amounts like this can make a huge difference to maintainers, and as more orgs get involved these collectively snowball remarkably quickly. Excited to see how much Open Source Pledge can achieve in 2026!</p>]]></description>
            <link>https://httptoolkit.com/blog/open-source-funding-in-2025/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/open-source-funding-in-2025/</guid>
            <pubDate>Sat, 21 Feb 2026 18:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[HTTP/3 is everywhere but nowhere]]></title>
            <description><![CDATA[<p>HTTP/3 has been in development since at least 2016, while QUIC (the protocol beneath it) was first introduced by Google way back in 2013. Both are now standardized, <a href="https://caniuse.com/http3"><strong>supported in 95% of users' browsers</strong></a>, already <strong>used in <a href="https://radar.cloudflare.com/adoption-and-usage">32% of HTTP requests to Cloudflare</a></strong>, and <strong>support is advertised by <a href="https://almanac.httparchive.org/en/2024/http#discovering-http3-support">35% of websites</a></strong> (through alt-svc or DNS) in the HTTP Archive dataset.</p> <p>We've developed a totally new version of HTTP, and we're on track to migrate more than 1/3 of web traffic to it already! This is astonishing progress.</p> <p>At the same time, neither QUIC nor HTTP/3 is included in the standard library of any major language, including Node.js, Go, Rust, Python or Ruby. Curl recently <a href="https://curl.se/docs/http3.html">gained support</a> but it's experimental and disabled in most distributions. There are a rare few external libraries for some languages, but all are experimental and/or independent of other core networking APIs. Despite mobile networking being a key use case for HTTP/3, Android's most popular HTTP library <a href="https://github.com/square/okhttp/blob/59cbf64f6ba98e2c8f95bf9db41dc47ad2232f94/okhttp/src/commonJvmAndroid/kotlin/okhttp3/Protocol.kt#L86-L94">has no support</a>. 
Popular servers like Nginx have only <a href="https://nginx.org/en/docs/quic.html">experimental support</a>, disabled by default, Apache has no support or published plan for support, and Ingress-Nginx (arguably the most popular Kubernetes reverse proxy) has <a href="https://github.com/kubernetes/ingress-nginx/issues/4760">dropped all plans for HTTP/3 support</a>, punting everything to a totally new (as yet unreleased) successor project instead.</p> <p>Really it's hard to point to any popular open-source tools that fully support HTTP/3: rollout has barely even started.</p> <p>This seems contradictory. What's going on?</p> <blockquote> <p><em>I'm going to assume a basic familiarity with the differences between HTTP/1.1 (et al), HTTP/2 and HTTP/3 here. If you're looking for more context, <a href="https://http2-explained.haxx.se/">http2-explained</a> and <a href="https://http3-explained.haxx.se/">http3-explained</a> from Daniel Stenberg (founder & lead developer of curl) are excellent guides.</em></p> </blockquote> <h2 id="whydoweneedmorethanhttp11">Why do we need more than HTTP/1.1?</h2> <p>Let's step back briefly. Why does this matter? Who cares about whether HTTP/3 is being rolled out successfully or not? If browser traffic and the big CDNs support HTTP/3, do we even need it in other client or server implementations?</p> <p>For example, one recent post argued <a href="https://byroot.github.io/ruby/performance/2025/02/24/http2-past-the-load-balancer.html">there isn't much point to HTTP/2 past the load balancer</a>. 
To roughly summarize, their pitch is that HTTP/2's big benefits revolve around multiplexing to avoid latency issues &amp; <a href="https://en.wikipedia.org/wiki/Head-of-line_blocking">head-of-line blocking</a>, and these aren't relevant within a LAN or datacenter where round-trip time (RTT) is low and you can keep long-lived TCP connections open indefinitely.</p> <p>You could make much the same argument for HTTP/3: this is useful for the high-latency many-requests traffic of web browsers &amp; CDNs, but irrelevant elsewhere.</p> <p>Even just considering HTTP/1.1 vs HTTP/2 though, the reality of multiplexing benefits is more complicated:</p> <ul> <li>Latency of responses isn't just network RTT: a slow server response because of server processing will block your TCP connection just as hard as network latency.</li> <li>Your load balancer is often <em>not</em> co-located with your backend, e.g. if you serve everything through a geographically distributed CDN, which serves most content from its cache, but falls back to a separate application server backend for dynamic content &amp; cache misses.</li> <li>Long-lived TCP connections die. Networking can fail in a million ways, even within data centers, and 'keep-alive' is a desperate plea at best. Even HTTP itself will force this: there are cases like a response body failing half-way through that are unrecoverable in HTTP/1.1 without killing the connection entirely.</li> <li>Any spikes in traffic mean you'll end up with the wrong number of TCP connections one way or the other: either you need an enormous unused pool available at all times, or you'll need to open a flood of new connections as traffic spikes come in, and so deal with TCP slow start, RTT &amp; extra TLS handshakes as you do so.</li> <li>There's a lot of traffic that's not websites <em>and</em> not just within datacenters. 
Mobile apps definitely do have network latency issues, API servers absolutely do have slow endpoints that can block up open connections, and IoT is a world built almost exclusively of unreliable networks and performance challenges. All of these cases get value from HTTP/2 &amp; HTTP/3.</li> </ul> <p>Moving beyond multiplexing, there's plenty of other HTTP/2 benefits that apply beyond just load balancers &amp; browsers:</p> <ul> <li>HTTP header compression (<a href="https://blog.cloudflare.com/hpack-the-silent-killer-feature-of-http-2/">HPACK</a> and <a href="https://datatracker.ietf.org/doc/rfc9204/">QPACK</a> in HTTP/2 &amp; HTTP/3 respectively) makes for a significant reduction in traffic in many cases, <em>especially</em> on long-lived connections such as within internal infrastructure. This is useful on the web, but can be an even bigger boost in mobile &amp; IoT scenarios where networks are limited and unreliable.</li> <li>Bidirectional HTTP request &amp; response streaming (only possible in HTTP/2 &amp; HTTP/3) enables entirely different communication patterns. Most notably used in gRPC (which <em>requires</em> HTTP/2 for most usage), this means the client and server can exchange continuous data within the same 'request' at the same time, acting similarly to a websocket but within existing HTTP semantics.</li> <li>Both protocols include advanced prioritization support, allowing clients to indicate priority of requests to servers, to more efficiently allocate processing &amp; receive the data they need most urgently. This is valuable for clients, but also between the load balancer and server: a cache miss for a waiting client has a very different priority to an optional background cache revalidation.</li> </ul> <p>All that is just for HTTP/2. HTTP/3 improves on this yet further, with:</p> <ul> <li>Significantly increased resilience to unreliable networks. 
By moving away from TCP's strict packet ordering, HTTP/3 makes each stream within the connection fully independent, so a missed packet on one stream doesn't slow down another stream.</li> <li>Zero round-trip connection initialization. TLS1.3 allows zero round-trip connection setup when resuming a connection to a known server, so you don't need to wait for the TLS handshake before sending application data. For HTTP/1 &amp; HTTP/2 though, you still need a TCP handshake first. With QUIC, you can do <a href="https://blog.cloudflare.com/even-faster-connection-establishment-with-quic-0-rtt-resumption/">0RTT TLS handshakes</a>, meaning you can connect to a server and send an HTTP request immediately, without waiting for a single packet in response, so there's no unnecessary RTT delay whatsoever.</li> <li>Reductions in traffic size, connection count, and network round trips that can result in reduced battery use for clients and reduced processing, latency &amp; bandwidth for servers.</li> <li>Support for <a href="https://pulse.internetsociety.org/blog/how-quic-helps-you-seamlessly-connect-to-different-networks">connection migration</a> allowing a client to continue the same connection even as its IP address changes, and in theory even supporting multi-path connections using multiple addresses (e.g. 
a single connection to a server using both WiFi &amp; cellular at the same time for extra bandwidth/reliability) in future.</li> <li>Improved network congestion handling by moving away from TCP: QUIC can use <a href="https://research.google/pubs/bbr-congestion-based-congestion-control-2/">Bottleneck Bandwidth and RTT</a> (BBR) for improved congestion control via active detection of network conditions, includes timestamps in each packet to help measure RTT, has improved detection of and recovery from packet loss, has better support for explicit congestion notifications (ECN) to actively manage congestion before packet loss, and may gain forward-error correction (FEC) support <a href="https://datatracker.ietf.org/doc/draft-michel-quic-fec/">in future</a> too.</li> <li>Support for <a href="https://github.com/w3c/webtransport/blob/main/explainer.md">WebTransport</a>, a new protocol providing bidirectional full-duplex connections (similar to WebSockets) but supporting multiplexed streams to avoid head-of-line blocking, fixing various legacy WebSocket limitations (like incompatibility with CORS), and allowing streams to be unreliable and unordered - effectively providing UDP-style guarantees and lower latency within web-compatible stream connections.</li> </ul> <p>In addition to the theory, there are some concrete measurable benefits being reported already. 
Request Metrics ran some <a href="https://requestmetrics.com/web-performance/http3-is-fast/">detailed benchmarks</a> showing some astonishing performance improvements, for example:</p> <p><img src="https://httptoolkit.com/images/posts/requestmetrics-http3.webp" alt="A comparison of small site, content site & SPA loading time with HTTP/1.1, 2, and 3 showing major speedups"></p> <p>And Fastly shared the major improvements in time-to-first-byte they're seeing in the real world:</p> <p><img src="https://httptoolkit.com/images/posts/fastly-http3.png" alt="A tweet showing Fastly getting 18% time-to-first-byte improvements with HTTP/3"></p> <p>This is all very clearly Good Stuff.</p> <p>Now that the technology is standardized, widely supported in browsers &amp; CDNs and thoroughly battle-tested, I think it's clear that <em>all</em> developers should be able to get these benefits built into their languages, servers &amp; frameworks.</p> <h2 id="thetwotierweb">The two-tier web</h2> <p>That's not what's happened though: despite its benefits and frequent use in network traffic, most developers can't easily start using HTTP/3 end-to-end today. In this, HTTP/3 has thrown a long-growing divide on the internet into sharp relief. 
Nowadays, there's two very different kinds of web traffic:</p> <ul> <li>Major browsers plus some very specific mobile app traffic, where a small set of finely tuned &amp; automatically-updating clients talk to a small set of very big servers, with as much traffic as possible handled by an enormous CDN (Cloudflare, Akamai, Fastly, CloudFront) and/or significant in-house infra (Google, Meta, Amazon, Microsoft).</li> <li>Everybody else: backend API clients &amp; servers, every other mobile app, every smaller CDN, websites without a CDN, desktop apps, IoT, bots &amp; indexers &amp; scrapers, niche web browsers, self-hosted homelabs, CLI tools &amp; scripts, students learning about network protocols, you name it.</li> </ul> <p>Let's simplify a bit, and describe these two cases as 'hyperscale web' and 'long-tail web'. Both groups are building on the same basic standards, but they have very different focuses and needs, and increasingly different tools &amp; platforms. This has been true for quite a while, but the reality of HTTP/3 makes it painfully clear.</p> <p>There's a few notable differences in these groups:</p> <ul> <li>Long-tail traffic is bigger than the hyperscale traffic. <a href="https://almanac.httparchive.org/en/2024/cdn#cdn-adoption">67%</a> of web page requests are served directly without a CDN, and of CDN traffic to Cloudflare in 2024, <a href="https://blog.cloudflare.com/application-security-report-2024-update/">30% is automated and 60% is API clients</a>.</li> <li>The long-tail world is fragmented into different implementations, almost by definition. 
Most of the biggest implementations are open-source organizations with relatively little direct funding or full-time engineering power available, and much work is done by volunteers with no mandated central direction or clear focus.</li> <li>The hyperscale world is controlled by a relatively small number of key stakeholders on both client &amp; server side (you can count the relevant companies without taking your socks off). This lets them agree standards to fit their needs quickly &amp; effectively - literally putting a representative of every implementation in the same room.</li> <li>The hyperscale ecosystem has far more concentrated cash &amp; motivations. It's a small number of players comprising some of the most valuable companies in the world, with business models that tie milliseconds of web performance directly to their bottom line.</li> <li>The long-tail is completely dependent on open-source implementations and shared code. If you want to build a new mobile app tomorrow, you obviously should not start by building an HTTP parser.</li> <li>The hyperscale ecosystem isn't worried about access to open-source implementations at all. They have sufficient engineering resources and complicated use cases that building their own custom implementation of a protocol from scratch can make perfect sense. Google.com is not going to be served by an Apache module with default settings, and Instagram is not sending requests with the same HTTP library as a Hello-World app.</li> <li>The combination of hyperscale's evergreen clients plus money &amp; motivation plus tight links between implementers and the business using the tools means they can move fast to quickly build, ship &amp; iterate new approaches.</li> <li>Long-tail implementations are only updated relatively rarely (how many outdated Apache servers are there on the web?) and the maintainers are a tiny subset of the users, who care significantly about stability and avoiding breaking changes. 
Long-tail tool maintainers <em>cannot</em> just move fast and break things.</li> </ul> <p>You can see the picture I'm painting. These two groups exist on the same web, but in very different ways.</p> <p>Some of this might sound like the hyperscale gang are the nefarious baddies. That is not what I mean (fine, yes, there's an interesting conversation there more broadly, but talking strictly about network protocols here). Regarding HTTP/3 specifically, this is some <em>superb</em> engineering work that is solving real problems on the web, through some astonishingly tidy cooperation on open standards across different massive organizations. That's great!</p> <p>There are many <em>many</em> people using services built by these companies, and their obsession with web performance is improving the quality of those services for large numbers of real people every day. This is very cool.</p> <p>However, this would be much cooler if it was accessible to every other server, client &amp; developer too. Most notably, this means the next generation of web technology is being defined &amp; built by one minority group, and the larger majority have effectively no way to access this technology right now (despite years of standardization and development) other than paying money to the CDNs of that first minority group to help. Not cool.</p> <h2 id="opensslquic">OpenSSL + QUIC</h2> <p>I think the hyperscaler/long-tail divide is the fundamental cause here, but that's created quite a few more concrete issues downstream, the most notable of which is OpenSSL's approach to QUIC.</p> <p>OpenSSL is easily the most used TLS library, and it's the foundational library for a large number of the open-source tools discussed above, so their support for this is essential to bringing QUIC &amp; HTTP/3 to the wider ecosystem. 
There's already been quite a bit <a href="https://daniel.haxx.se/blog/2021/10/25/the-quic-api-openssl-will-not-provide/">of</a> <a href="https://daniel.haxx.se/blog/2025/02/16/openssl-does-a-quic-api/">extensive</a> <a href="https://deadinsi.de/@cybertailor/109226380700850932">public</a> <a href="https://daniel.haxx.se/blog/2024/06/10/http-3-in-curl-mid-2024/">discussion</a> <a href="https://github.com/nodejs/node/issues/57281">about</a> OpenSSL's approach to QUIC, but as a quick summary:</p> <ul> <li>BoringSSL shipped a usable API for QUIC implementations back in 2018.</li> <li>OpenSSL did not, so various forks like QuicTLS appeared, providing OpenSSL plus BoringSSL's QUIC API.</li> <li>An ecosystem of QUIC &amp; HTTP/3 implementations (most notably Quiche, msh3/msquic, and nghttp3/ngtcp2) were built on top of BoringSSL and these forks over the many subsequent years.</li> <li>OpenSSL has since slowly implemented an incompatible approach that this ecosystem can't directly use, with client support released in OpenSSL 3.2 (2023), and server support landing imminently in OpenSSL 3.5 (2025).</li> </ul> <p>Some would argue this is a major mistake by OpenSSL, while I think OpenSSL would argue that BoringSSL's design is flawed and/or unsuitable for OpenSSL, and it was worth taking the time to do it right.</p> <p>Regardless of who's actually 'right', this has created a significant schism in the entire ecosystem. Curl has a good overview of the state of play <em>excluding</em> OpenSSL:</p> <p><img src="https://httptoolkit.com/images/posts/http3-components-in-curl.jpg" alt="HTTP/3 components supported by curl: the columns for quiche vs msh3 vs nghttp3, all working on top of Quictls, BoringSSL and others"></p> <p>OpenSSL's approach doesn't work easily in the TLS section for any of the existing QUIC &amp; HTTP/3 implementations. 
In effect, they've started another column, but with no compatible implementations currently available in the HTTP/3 &amp; QUIC spots.</p> <p>This is a notable issue because for most major projects it would be an enormous &amp; problematic undertaking to drop support for OpenSSL, which effectively means they still cannot ship built-in QUIC support today. Node.js recently <a href="https://github.com/nodejs/node/issues/57379">briefly discussed</a> even dropping OpenSSL entirely because of this, in favour of BoringSSL or similar, but it's clear that it's not practical: it would be an enormous breaking change, no alternative offers the same levels of long-term support guarantees, and Node and other languages are often shipped in environments like Linux distributions where they use the system's shared OpenSSL implementation, so this would create big headaches downstream too.</p> <p>This is one example of the difference in fundamental pressures of the two tiers of organizations on the web here: open-source tools can't break things like this, and the libraries available to the long-tail are fragmented and uncoordinated. Meanwhile hyperscalers can make decisions quickly and near-unilaterally to set up implementations that work for their environments today, allowing them to get the benefits of new technologies without worrying too much about the state of the open-source common ecosystem for everybody else.</p> <h2 id="whathappensnext">What happens next?</h2> <p>I hope this makes it clear there's a big problem here: underlying organizational differences are turning into a fundamental split in technologies on the Internet. 
There's an argument that despite the benefits, the long-tail web doesn't <em>need</em> HTTP/3, so they can just ignore it, or use a CDN with built-in support if they really care, and there is no real obligation as such for the hyperscalers to provide convenient implementations to the rest of us just because they want to use some neat new tech between themselves.</p> <p>The problem here though is that there are real concrete benefits to these technologies. QUIC is a significant improvement on alternatives, especially on slow or unreliable mobile internet (e.g. everywhere outside the well-connected offices of the developed world) and when roaming between connections. There are technologies built on top of it, like WebTransport which provides additional significant new features &amp; performance to replace WebSockets. There will be more features that depend on HTTP/3 in future (see how gRPC built on HTTP/2 for example). Again: the technology here is great! But it's a challenge if those benefits are not evenly distributed, and only accessible to a small set of organizations and their customers.</p> <p>Continuing down this road has some plausible serious consequences:</p> <ul> <li>In the short term, the long-tail web gains a concrete disadvantage vs the hyperscale web, as HTTP/3 and QUIC make hyperscale sites faster &amp; more reliable (especially on slow &amp; mobile internet).</li> <li>Other web tools and components (React et al) used by developers either working for hyperscale organizations or building on top of their tools &amp; infra will increase in complexity to match, taking HTTP/3's benefits for granted and moving forwards on that basis, making them less and less relevant to other use cases.</li> <li>If we're not careful, the split between the long-tail &amp; hyperscale cases will widen. 
New features &amp; tools for each use case will emerge, and won't be implemented by the other, and tooling will increasingly stratify.</li> <li>If hyperscale-only tech is widespread but implementations are not, it becomes increasingly difficult to build tools to integrate with these. Building a Postman-like client for WebTransport is a whole lot harder if you're implementing the protocol from scratch instead of just a UI.</li> <li>You'll start to see lack of HTTP/3 support used as a signal to trigger captchas &amp; CDN blocks, just as TLS fingerprinting is already today. HTTP/3 support could very quickly &amp; easily become a way to detect many non-browser clients, cutting long-tail clients off from the modern web entirely.</li> <li>As all this escalates and self-reinforces, it becomes less &amp; less sensible for the hyperscale side to worry about the long-tail's needs at all, and the ecosystem could stratify completely.</li> </ul> <p>All of that is a way away, and quite hypothetical! I suspect <em>some</em> of this will happen to some degree, but there's a wide spectrum of possibility. It's notable though that this doesn't just apply to HTTP/3: the centralization and coordination of a few CDN &amp; web clients like this could easily play out similarly in many other kinds of technological improvements too.</p> <p>For HTTP/3 at least, I'm hopeful that there will be a happy resolution here to improve on this split in time, although I don't know if it will come soon enough to avoid notable consequences. Many of the external libraries and experimental implementations of QUIC &amp; HTTP/3 will mature with time, and I think eventually (I really really hope) the OpenSSL QUIC API schism will get resolved to open the door to QUIC support in the <em>many</em> OpenSSL-based environments, either with adapters to support both approaches or via a new HTTP/3 &amp; QUIC stack that supports the OpenSSL model directly. 
If you're interested in working on either, and there's anything I can do to help directly or to help fund that work, please <a href="https://httptoolkit.com/contact/">get in touch</a>.</p> <p>None of that will happen today though, so unfortunately if you want to use HTTP/3 end-to-end in your application, you may be in for a hard time for a while yet. Watch this space.</p> <p><em>Want to debug HTTP/1 and HTTP/2 in the meantime? Test out <strong><a href="https://httptoolkit.com/">HTTP Toolkit</a></strong> now. Open-source one-click HTTP(S) interception & debugging for web, Android, terminals, Docker & more.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/http3-quic-open-source-support-nowhere/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/http3-quic-open-source-support-nowhere/</guid>
            <pubDate>Wed, 12 Mar 2025 16:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[HTTP Toolkit is joining the Open Source Pledge]]></title>
            <description><![CDATA[<p>The <a href="https://osspledge.com/">Open Source Pledge</a> is a new push to make companies commit to funding the maintainers of the open-source software they depend on, and to publicly recognize the ones that do.</p> <p>HTTP Toolkit has donated back to maintainers for a few years now, but joining the Open Source Pledge today means formally committing to that, and to doing so publicly with a sustainable minimum level ($2000 per full-time developer, or higher) indefinitely into the future.</p> <h2 id="whatistheopensourcepledge">What is the Open Source Pledge?</h2> <p>HTTP Toolkit (and effectively 100% of other software businesses) depends on a huge quantity of open-source code for much of its fundamental functionality &amp; infrastructure. Most of this is tirelessly maintained by volunteers, completely for free.</p> <p>Astonishingly, that basically works, and we've built an entire software industry on top of it.</p> <p>But it's not a fair deal, and over the years it has become increasingly clear that businesses relying on people maintaining their critical dependencies for free is not a good or sustainable approach (for either the businesses or the maintainers' mental health). Plenty of important open-source projects have been abandoned as maintainers moved on with their lives, while some have been actively <a href="https://wikipedia.org/wiki/Npm_left-pad_incident">removed entirely</a> or even replaced with <a href="https://wikipedia.org/wiki/Peacenotwar">malicious content</a>. 
For businesses these are big problems, but for maintainers it's not reasonable to expect that they'll continue actively supporting their projects for free indefinitely just because your business chose to use it.</p> <p>There have been many attempts to fix this, from the organizational (internally forking key dependencies, avoiding dependencies entirely where possible) to the purely technical (dependency locking &amp; mirroring, security scanning of new releases). At the end of the day though, <strong>the only real solution to open source sustainability is to fund the maintainers you depend on</strong>.</p> <p>The Open Source Pledge aims to commit businesses to doing this, and to build a wider culture where that becomes the norm.</p> <p>The actual mechanism for this is that businesses must donate at least $2,000 per full-time developer employee per year, and must publicly self-report the payments they're making and where that goes. In return, the business gets:</p> <ul> <li>Public recognition, as a well-behaved business that supports the maintainers it depends on (good marketing for anybody who either sells to developers, or recruits them, or both).</li> <li>Healthier dependencies, which are significantly more likely to continue to develop and stay actively maintained if they're well funded.</li> <li>Better engagement with the maintainers they depend on, making it far more likely they can get the support they need, if required.</li> </ul> <p>This is <em>very</em> new! It's being driven by <a href="https://sentry.io/">Sentry</a>, and it's not formally launching until October this year (2024), but HTTP Toolkit along with a selection of other open source-focused businesses like <a href="https://astral.sh/">Astral</a>, <a href="https://scalar.com/">Scalar</a> and <a href="https://val.town/">Val Town</a> are signing up in advance as part of the first wave.</p> <p>(Does this sound like something your organization might be interested in doing? 
Is it possible you already fulfill the pledge requirements and your business just needs to be recognized for it? You can <a href="https://osspledge.com/join/">join the pledge</a> too!)</p> <h2 id="httptoolkitscontributions">HTTP Toolkit's contributions</h2> <p>Let's talk about the money. <strong>In total, so far HTTP Toolkit has paid $11,030 to open source maintainers</strong>.</p> <p>Of course, $11k doesn't compete with the total impact of some notably committed larger organizations (Google, Microsoft, Sentry, et al) but HTTP Toolkit is a tiny project with literally one full-time employee (<a href="https://tim.fyi/">me</a>). For a single individual, I think this is not too shabby.</p> <p>Absurdly, I suspect that this total is still significantly more overall contribution than the majority of larger organizations. As a supporting datapoint, these contributions make HTTP Toolkit one of the top 10 funding organizations for <a href="https://opencollective.com/electron">Electron on Open Collective</a> (with $620 USD in total). How many Electron projects are there out there, and how much profit have they made from its existence? All but 10 of them have donated less than $620 back to the project. Now think what that looks like further down the long tail of smaller projects.</p> <p>Open-source projects provide huge value to us as developers and to software businesses everywhere. We can do better than this.</p> <p>All numbers here are purely <em>financial</em> contributions to open source. 
On top of that, all of HTTP Toolkit's own code is 100% open source, I personally maintain plenty of other projects (I'm one of the maintainers of Node.js, plus various smaller libraries like <a href="https://www.npmjs.com/package/loglevel">loglevel</a>) and there's been a long series of code contributions from HTTP Toolkit back to upstream projects along the way too.</p> <p>Let's break down these financial contributions further and get into the details:</p> <h3 id="2024">2024</h3> <p>In 2024, HTTP Toolkit paid $5740 to open-source maintainers.</p> <p>Payment delivery is split almost equally between GitHub sponsors ($2900) and Open Collective ($2840), and spread out into a long tail of different projects:</p> <ul> <li>$1240 to <a href="https://github.com/anonrig">anonrig</a></li> <li>$240 to <a href="https://github.com/johnnyreilly">johnnyreilly</a></li> <li>$240 to <a href="https://github.com/blakeembrey">blakeembrey</a></li> <li>$240 to <a href="https://github.com/szmarczak">szmarczak</a></li> <li>$240 to <a href="https://github.com/lpinca">lpinca</a></li> <li>$240 to <a href="https://github.com/apocas">apocas</a></li> <li>$220 to <a href="https://github.com/phosphor-icons">phosphor-icons</a></li> <li>$220 to <a href="https://opencollective.com/endoflife-date">endoflife.date</a></li> <li>$220 to <a href="https://opencollective.com/regolith">Regolith</a></li> <li>$220 to <a href="https://opencollective.com/react-window">react-window</a></li> <li>$220 to <a href="https://opencollective.com/fast-xml-parser">fast-xml-parser</a></li> <li>$220 to <a href="https://opencollective.com/tauri">Tauri</a></li> <li>$220 to <a href="https://opencollective.com/node-datachannel">node-datachannel</a></li> <li>$220 to <a href="https://opencollective.com/ajv">Ajv</a></li> <li>$220 to <a href="https://opencollective.com/mochajs">Mocha</a></li> <li>$220 to <a href="https://opencollective.com/electron">electron</a></li> <li>$220 to <a 
href="https://opencollective.com/styled-components">styled-components</a></li> <li>$220 to <a href="https://opencollective.com/mobx">mobx</a></li> <li>$220 to <a href="https://opencollective.com/openapi-directory">openapi-directory</a></li> <li>$200 to <a href="https://github.com/frida">frida</a></li> <li>$120 to <a href="https://opencollective.com/servo">Servo</a></li> <li>$40 to <a href="https://github.com/mmaietta">mmaietta</a></li> <li>$40 to <a href="https://opencollective.com/chaijs">Chai</a></li> <li>€20 to <a href="https://opencollective.com/gotosocial">GoToSocial</a></li> <li>$20 to <a href="https://opencollective.com/ua-parser-js">UAParser.js</a></li> </ul> <h3 id="2023">2023</h3> <p>In 2023, httptoolkit paid $4063 to open-source maintainers:</p> <ul> <li>$240 to <a href="https://github.com/johnnyreilly">johnnyreilly</a></li> <li>$240 to <a href="https://github.com/blakeembrey">blakeembrey</a></li> <li>$240 to <a href="https://github.com/szmarczak">szmarczak</a></li> <li>$240 to <a href="https://github.com/lpinca">lpinca</a></li> <li>€233 to <a href="https://opencollective.com/gotosocial">GoToSocial</a></li> <li>$226 to <a href="https://opencollective.com/openapi-directory">openapi-directory</a></li> <li>$226 to <a href="https://opencollective.com/node-datachannel">node-datachannel</a></li> <li>$226 to <a href="https://opencollective.com/chaijs">Chai</a></li> <li>$226 to <a href="https://opencollective.com/ajv">Ajv</a></li> <li>$226 to <a href="https://opencollective.com/ua-parser-js">UAParser.js</a></li> <li>$226 to <a href="https://opencollective.com/styled-components">styled-components</a></li> <li>$226 to <a href="https://opencollective.com/mochajs">Mocha</a></li> <li>$226 to <a href="https://opencollective.com/mobx">mobx</a></li> <li>$226 to <a href="https://opencollective.com/electron">electron</a></li> <li>$207 to <a href="https://opencollective.com/tauri">Tauri Apps</a></li> <li>$180 to <a href="https://github.com/ardatan">ardatan</a></li> 
<li>$160 to <a href="https://github.com/apocas">apocas</a></li> <li>$151 to <a href="https://opencollective.com/fast-xml-parser">fast-xml-parser</a></li> <li>$60 to <a href="https://github.com/anonrig">anonrig</a></li> <li>$56 to <a href="https://opencollective.com/react-window">react-window</a></li> <li>€19 to <a href="https://opencollective.com/exchangeratehost">exchangerate.host</a></li> </ul> <h3 id="2022">2022</h3> <p>In 2022, httptoolkit paid $2474 to open-source maintainers:</p> <ul> <li>$227 to <a href="https://opencollective.com/openapi-directory">openapi-directory</a></li> <li>$208 to <a href="https://opencollective.com/chaijs">Chai</a></li> <li>$208 to <a href="https://opencollective.com/ajv">Ajv</a></li> <li>$208 to <a href="https://opencollective.com/mochajs">Mocha</a></li> <li>$208 to <a href="https://opencollective.com/ua-parser-js">UAParser.js</a></li> <li>$208 to <a href="https://opencollective.com/electron">electron</a></li> <li>$208 to <a href="https://opencollective.com/mobx">mobx</a></li> <li>$208 to <a href="https://opencollective.com/styled-components">styled-components</a></li> <li>$170 to <a href="https://opencollective.com/node-datachannel">node-datachannel</a></li> <li>$120 to <a href="https://github.com/johnnyreilly">johnnyreilly</a></li> <li>$120 to <a href="https://github.com/ardatan">ardatan</a></li> <li>$120 to <a href="https://github.com/blakeembrey">blakeembrey</a></li> <li>$120 to <a href="https://github.com/szmarczak">szmarczak</a></li> <li>$120 to <a href="https://github.com/lpinca">lpinca</a></li> <li>€19 to <a href="https://opencollective.com/gotosocial">GoToSocial</a></li> </ul> <h3 id="2021">2021</h3> <p>In 2021, the project was just getting proper traction, and these donations were just starting off, so there was just one first donation right at the end of the year: $20 to the <a href="https://opencollective.com/openapi-directory">openapi-directory</a>.</p> <h2 id="whatsnext">What's next?</h2> <p>As you can see from the 
numbers above, this has been steadily ticking upwards, and I'm intending to continue that as far as possible.</p> <p>So far I've focused on steady monthly donations towards core dependencies (projects that HTTP Toolkit uses directly, for essential functionality) rather than a broader trickle down approach that would cover subdependencies or less-notable minor packages. There's an interesting debate about these kinds of approaches, but in HTTP Toolkit's case there's quite a few smaller projects and individuals in that list who are important to HTTP Toolkit and would otherwise receive very little funding indeed (e.g. Node-Datachannel, Fast-XML-Parser and UAParser.js all directly power specific product functionality, and AFAICT HTTP Toolkit is their #1 funding source). This will likely continue, but I may explore other tools like <a href="https://thanks.dev/">thanks.dev</a> who aim to spread donations far more widely across the entire upstream project base in time.</p> <p>Even in future, it's very unlikely that funding from HTTP Toolkit alone is going to fund any of these maintainers enough to quit their day jobs. That said, payments like this really could snowball into sufficient funding for maintainers to work on open source full time if just a small percentage of other organizations did the same, and even small payments help show maintainers that their work is valuable &amp; appreciated.</p> <p>Joining the Open Source Pledge is an important step to take this further. It's a way to publicly commit to doing this both now and in future, and to help redefine industry norms: companies that build on top of open source should fund maintainers.</p> <p>If that sounds like something your organization might be interested in, I'd encourage you to sign up too! You can find out more at <a href="https://osspledge.com">osspledge.com</a>, or check out the steps to join <a href="https://osspledge.com/join/">here</a>. 
Organizations joining before September 15th will be part of the first launch group, and included in an outdoor advertising launch campaign that Sentry will be running later this year.</p> <p>Alternatively, if you just want to hear more and get involved, you can follow the project <a href="https://github.com/opensourcepledge/osspledge.com/">on GitHub</a> or join the discussion <a href="https://discord.gg/svH5XzDsBd">on Discord</a>.</p> <hr> <p><em>This post was updated at the end of 2024, to provide the final donation totals for the full year, and again at the end of 2025 to fix small calculation inconsistencies.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/open-source-funding-pledge/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/open-source-funding-pledge/</guid>
            <pubDate>Tue, 31 Dec 2024 15:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[ERR_PROXY_CONNECTION_FAILED errors with HTTP proxies]]></title>
            <description><![CDATA[<p>If you're using a local debugging proxy tool like <a href="https://httptoolkit.com/">HTTP Toolkit</a>, you might run into the dreaded <code>ERR_PROXY_CONNECTION_FAILED</code> error in Chrome and other similar apps.</p> <p>This can be a very frustrating and unhelpful error! There's only a few possible causes though, and it's usually easy to fix.</p> <h2 id="thesimplecase">The Simple Case</h2> <p>The simplest explanation is exactly what it says: the browser can't connect to your proxy.</p> <p><strong>In the simple case this may be caused by a basic connection issue</strong>: you have the address, port or some authentication details wrong, or the details are correct but the proxy is just not reachable on your connection, and so the browser can't talk to it.</p> <p>The easiest way to confirm this is to try connecting to the details directly using an HTTP client tool like curl, Postman, or the HTTP Toolkit Send page, to manually check if the server is listening on the port you expect. If you see a TCP error then the details are wrong, but if you see any kind of HTTP error or similar then you know that the server does exist and it's reachable with the given details.</p> <p>If the failure comes from a browser intercepted by a tool like HTTP Toolkit though, where intercepted clients are preconfigured &amp; launched for you, this is not what's happening - the server is definitely reachable (it's running locally) and the config is correct (it's been done automatically). 
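</p> <p>As a concrete version of that manual check, here's a minimal Node sketch (the proxy address used here, 127.0.0.1:8000, is a hypothetical example - substitute your own proxy details). Any successful TCP connection proves something is listening there, while a TCP-level error tells you the address, port or network path is wrong:</p> <pre><code class="javascript language-javascript">// Quick manual check: can we open a TCP connection to the proxy at all?
const net = require('net');

// Translate the common Node.js TCP error codes into a diagnosis:
function explainTcpError(code) {
  if (code === 'ECONNREFUSED') return 'Reachable host, but nothing listening on that port';
  if (code === 'ETIMEDOUT') return 'No response - host unreachable or traffic blocked';
  if (code === 'ENOTFOUND') return 'The proxy hostname does not resolve';
  return 'Connection failed: ' + code;
}

const socket = net.connect({ host: '127.0.0.1', port: 8000 });
socket.on('connect', () => {
  console.log('TCP connection OK - the proxy is reachable');
  socket.end();
});
socket.on('error', (err) => {
  console.log(explainTcpError(err.code));
});
</code></pre> <p>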
Instead, this might be caused by something more complicated…</p> <h2 id="beyondthesimplecase">Beyond the Simple Case</h2> <p>If you know the settings are correct, and you know the proxy is reachable, what could cause this <code>ERR_PROXY_CONNECTION_FAILED</code> error?</p> <p><strong>Once you've ruled out basic connection issues, the most likely cause is antivirus software on your computer that is intercepting your connections.</strong> This is becoming a common feature of some antivirus &amp; security software, most notably <a href="https://en.wikipedia.org/wiki/ESET">ESET</a>. It's not directly related to viruses at all, but is a separate security feature that's often enabled automatically for additional protection.</p> <p>What happens in this case is that your browser is trying to talk to the local proxy on your machine, but the antivirus intercepts that connection (getting in between your browser and your proxy) so that it can scan all the browser traffic.</p> <p>When it then tries to talk to the proxy server to forward the traffic, it discovers it doesn't recognize the TLS certificate (because HTTP debugging tools like HTTP Toolkit, Charles, Fiddler, Mitmproxy, Burp Suite, Proxyman, etc. all use a custom personal CA certificate that's unique to your machine, allowing them to intercept and let you view &amp; modify your TLS traffic). The certificate has to be manually trusted, and it's trusted in the browser, but because the browser is no longer talking directly to the proxy server that doesn't help.</p> <p>Because of this, the antivirus rejects the proxy server's certificate, and then closes the connection the browser was trying to make. 
From the browser's point of view, it just made a connection to the proxy, it didn't work, and so we see <code>ERR_PROXY_CONNECTION_FAILED</code> with no more information.</p> <p>This isn't great behaviour, and ESET particularly do seem to cause widespread issues with this TLS interception, but still enable it automatically for all users.</p> <p>Checking this certificate makes little sense (malicious traffic on a connection to a server on the same local machine doesn't seem a real concern for ESET customers - they don't need to intercept this connection at all) and I've tried to talk to them about this with no luck - if you're a customer, do please let them know this is affecting you!</p> <h2 id="thesolution">The Solution</h2> <p><strong>If you're affected by this, the solution is to disable the TLS interception &amp; scanning features of your antivirus</strong> (no need to disable your antivirus completely). It will generally be called "TLS scanning", "web traffic scanning", "HTTPS scanning" or similar. If your antivirus supports a way to skip this for certain cases, that might be helpful, but I haven't seen this working in practice.</p> <p>In the specific common ESET case, the steps to do this are:</p> <ul> <li>Open ESET</li> <li>Go to the advanced settings (press F5)</li> <li>Click 'Protections' then 'SSL/TLS'</li> <li>Disable the 'Enable SSL/TLS' option</li> <li>Click OK to save your settings</li> </ul> <p>This doesn't affect the rest of your security settings, it simply stops ESET from intercepting &amp; monitoring your network traffic, which interferes with debugging tools like HTTP Toolkit that are trying to do the same.</p> <p>If you run into this issue and disabling TLS scanning is still not working, or if you've found separate steps that are helpful and should be included here, please <a href="https://httptoolkit.com/contact">get in touch</a> and let me know.</p>]]></description>
            <link>https://httptoolkit.com/blog/proxy_connection_failed/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/proxy_connection_failed/</guid>
            <pubDate>Wed, 11 Dec 2024 17:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Designing API Errors]]></title>
            <description><![CDATA[<p>When everything goes smoothly with an API, life is pretty straightforward: you request a resource, and voilà, you get it. You trigger a procedure, and the API politely informs you it’s all gone to plan. But what happens when something goes pear-shaped? Well, that’s where things can get a bit tricky.</p> <h2 id="httpstatuscodes">HTTP Status Codes</h2> <p>HTTP status codes are like a first aid kit: they’re handy, but they won’t fix everything. They give you a broad idea of what’s gone wrong, which can help plenty of tools and developers make reasonable assumptions, like:</p> <ul> <li><strong>400 Bad Request:</strong> Report error to developers, something is broken.</li> <li><strong>401 Unauthorized:</strong> Might need to refresh a token, don't try again until you have.</li> <li><strong>404 Not Found:</strong> If accepting user input to look up a resource then this isn't a problem, so don't worry about it. Just tell the user the thing they're looking for isn't there.</li> <li><strong>405 Method Not Allowed:</strong> Ahhhh panic, the API has changed or the client was built wrong.</li> <li><strong>429 Too Many Requests:</strong> Do not retry this request until after the rate limit is over or you'll DDoS the server and get banned.</li> <li><strong>501 Not Implemented:</strong> Oh heck you've gone live relying on an endpoint which isn't ready in production, alert everyone.</li> <li><strong>504 Gateway Timeout:</strong> Probably retry that one straight away as it's likely a network blip.</li> </ul> <p>HTTP status codes can convey a lot of assumptions, but they cannot possibly cover all situations, so it's important to add something for the human developers to see what's wrong.</p> <h2 id="writtendescriptionoftheproblem">Written Description of the Problem</h2> <p>Let’s say you’re building a carpooling app and you need to plan a trip between two places to find more riders. 
If the coordinates you provide are too close together, the API might respond with something like:</p> <pre><code class="http language-http">HTTP/1.1 400 Bad Request
{
  "error": "Too close for a carpool to be organized, suggest get out and walk."
}
</code></pre> <p>This is a 400 Bad Request, but that's a pretty common error and a little more information needs to be conveyed, so a string has been added explaining the problem.</p> <p>Next a user tries to plan a road trip from London to Iceland, which is logistically problematic. The API might come back with:</p> <pre><code class="http language-http">HTTP/1.1 400 Bad Request
{
  "error": "Invalid geopoints for possible trip."
}
</code></pre> <p>Here is another 400, and a very different problem. This human message could be passed on to the user so they can figure out what to do next, but the application will not be able to determine the difference between these two errors programmatically, so cannot update the interface differently for either problem.</p> <p>You could try and find another status code, and people get pretty deep in the weeds trying to find specific codes for every situation ever, but that generally leads to bending conventions beyond their purpose. People do weird things like throwing a "417 Expectation Failed" for something because the client's expectation could not be met… when that code is explicitly tied to the <code>Expect</code> header, something you'll likely never use in your API.</p> <p>Think of HTTP status codes like exceptions in your programming language of choice: if all you saw was <code>Exception</code> with no other information, you'd have no clue what was going wrong. Status codes provide a category of error (<code>RuntimeError</code>, <code>TypeError</code>, <code>ArgumentCountError</code>), and the error description is like the string passed to an exception: <code>RuntimeError("This particular problem occurred.")</code>.</p> <p>This is better, but does not help applications programmatically differentiate between two different types of problem with the same status code, without doing something horrible like substring matching on text which might change.</p> <h2 id="helpingmachineswitherrorcodes">Helping Machines with Error Codes</h2> <p>Forcing developers to match human-readable strings for different errors is no good; we also need to help "the machines" know specifically what is going on, so they can work out if they should trigger different interfaces, modals, retry, back-off, report the problem, or do something else.</p> <pre><code class="json language-json">{
    "error": {
        "type": "missing_api_version",
        "message": "Missing the x-monite-version HTTP header"
    }
}
</code></pre> <p>Here is an example of an error in the <a href="https://docs.monite.com/api/concepts/overview">Monite API</a>, where they've turned the error into an object, moved the string into <code>message</code>, and added <code>type</code> which is a unique name for various different types of application-specific problem which could happen.</p> <p>Now programmers can do <code>if (error.type === 'missing_api_version')</code> instead of matching keywords in sentences which might change, which is a huge step forwards.</p> <h2 id="completeerrorobjects">Complete Error Objects</h2> <p>A type and a message are a great start, and if that's how far you get then fine, but there's more you can do to turn errors into a handy feature instead of just a red flag.</p> <p>Here's the full list of what an API error should include:</p> <ul> <li><strong>HTTP Status Code</strong>: Indicating the general category of the error (4xx for client errors, 5xx for server errors).</li> <li><strong>Short Summary</strong>: A brief, human-readable summary of the issue (e.g., "Cannot checkout with an empty shopping cart").</li> <li><strong>Detailed Message</strong>: A more detailed description that offers additional context (e.g., "It looks like you have tried to check out but there is nothing in your cart").</li> <li><strong>Application-Specific Error Code</strong>: A unique code that helps developers programmatically handle the error (e.g., <code>ERRCARTEMPTY</code>).</li> <li><strong>Links to Documentation</strong>: Providing a URL where users or developers can find more information or troubleshooting steps.</li> </ul> <p>You can build your own custom format for this, but why bother when there's an excellent standard in play already: <a href="https://www.rfc-editor.org/rfc/rfc9457.html">RFC 9457 - Problem Details for HTTP APIs</a> (replacing RFC 7807 which is basically the same.)</p> <pre><code class="json language-json">{
  "type": "https://signatureapi.com/docs/v1/errors/invalid-api-key",
  "title": "Invalid API Key",
  "status": 401,
  "detail": "Please provide a valid API key in the X-Api-Key header."
}
</code></pre> <p>This example of an error from the <a href="https://signatureapi.com/docs/errors">Signature API</a> includes a <code>type</code>, which is basically the same as an application-specific error code, but instead of an arbitrary string like <code>invalid-api-key</code> the standard suggests a URI which is unique to your API (or ecosystem): <code>https://signatureapi.com/docs/v1/errors/invalid-api-key</code>. This does not have to resolve to anything (doesn't need to go anywhere if someone loads it up) but it <em>can</em>, and that covers the "link to documentation" requirement too.</p> <p>Why have both a <code>title</code> and a <code>detail</code>? This allows the error to be used in a web interface, where certain errors are caught and handled internally, but other errors are passed on to the user, helping errors be treated as functionality instead of just "Something went wrong, erm, maybe try again or phone us". This can reduce incoming support requests, and allow applications to evolve better when handling unknown problems before the interface can be updated.</p> <p>Here's a more complete usage, including some optional bits of the standard and some extensions.</p> <pre><code class="http language-http">HTTP/1.1 403 Forbidden
Content-Type: application/problem+json
{
 "type": "https://example.com/probs/out-of-credit",
 "title": "You do not have enough credit.",
 "detail": "Your current balance is 30, but that costs 50.",
 "instance": "/account/12345/msgs/abc",
 "balance": 30,
 "accounts": ["/account/12345", "/account/67890"]
}
</code></pre> <p>This example shows the same <code>type</code>, <code>title</code>, and <code>detail</code>, but has extra bits.</p> <p>The <code>instance</code> field allows you to point to a specific resource (or endpoint) which the error relates to. Again, this URI could resolve (it's a relative path to the API), or it could just be something that does not necessarily exist on the API but makes sense to it, allowing clients to report a specific instance of a problem back to you with more information than "it didn't work…?".</p> <p>The <code>balance</code> and <code>accounts</code> fields are not described by the specification; they are "extensions", which can be extra data which helps the client application report the problem back to the user. This is extra helpful if they would rather use the variables to produce their own error messages instead of directly inserting the strings from <code>title</code> and <code>detail</code>, opening up more options for customization and internationalization.</p> <h2 id="summary">Summary</h2> <p>Handling errors in API design is about more than just choosing the right HTTP status code. It’s about providing clear, actionable information that developers, applications, and end-users of those applications can understand and act upon.</p> <p>By adopting standard error formats and thinking carefully about how errors are communicated, you can make life easier for everyone who interacts with your API—whether they’re human or machine.</p>]]></description>
            <link>https://httptoolkit.com/blog/designing-api-errors/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/designing-api-errors/</guid>
            <pubDate>Mon, 09 Sep 2024 15:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[22 years later, YAML now has a media type]]></title>
            <description><![CDATA[<p>As of February 14th 2024, <a href="https://www.rfc-editor.org/rfc/rfc9512.html">RFC 9512</a> formally registers <code>application/yaml</code> as the media type for all YAML content, and adds <code>+yaml</code> as a standard structured suffix for all YAML-based more specific media types. With this registration, it's now included in the official <a href="https://www.iana.org/assignments/media-types/media-types.xhtml">media types list</a> maintained by the IANA.</p> <p>Media types like this (also known as the MIME types, from their original invention for email attachment metadata) are heavily used particularly in HTTP <code>Content-Type</code> headers for both requests &amp; responses, and in all sorts of file metadata and processing logic elsewhere. These names give applications a common vocabulary to describe data when passing it around.</p> <p>The additional <code>+yaml</code> suffix also defined here is particularly useful. Media type structured suffixes like this (<code>+xml</code> and <code>+json</code> are other common examples) are used to define specific types for content that's based on an existing generic content type (such as YAML). In this case, this notably opens the door to standardization of other YAML-based MIME types, such as <code>application/openapi+yaml</code> (for OpenAPI specifications that are written in YAML) which is currently being formalized in <a href="https://datatracker.ietf.org/doc/draft-ietf-httpapi-rest-api-mediatypes/">another standard</a>, following closely behind this one.</p> <p>While a few applications have been using <code>application/yaml</code> and <code>+yaml</code> like this already, many haven't (e.g. 
Rails <a href="https://github.com/rails/rails/blob/cd4a5b07332c8b1f1101d43b4db736e7d75dbdc3/actionpack/lib/action_dispatch/http/mime_types.rb#L42">still uses</a> <code>application/x-yaml</code>, and others like <code>text/yaml</code> and even <code>text/x-yaml</code> are frequently seen in the wild) and there's never been clear agreement on exactly how this should work. Hopefully with this RFC we'll be able to start picking a single media type consistently from now on, though updating older applications here will obviously take some time.</p> <p><a href="https://www.rfc-editor.org/rfc/rfc9512.html">The full RFC</a> is well worth a read if you're interested in the finer details, and discusses all sorts of more detailed questions around interoperability for an evolving language like YAML, its relationship with JSON, and the (many) security considerations to be aware of when defining a formal API around YAML data.</p> <p>This RFC is just one small change of course, but the hard slog of standardization like this is an important process that helps literally keep everybody on the same page. Everybody being able to agree on what YAML is actually called when sharing metadata should make it much easier to build software that more easily &amp; reliably integrates together around YAML data in future.</p> <p>Congratulations to everybody involved in the RFC and the HTTPAPI working group at the IETF who pushed this through!</p> <p><em>Want to debug your OpenAPI APIs and even inspect YAML HTTP traffic up close? <strong><a href="https://httptoolkit.com/">Try out HTTP Toolkit</a></strong> now.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/yaml-media-type-rfc/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/yaml-media-type-rfc/</guid>
            <pubDate>Tue, 20 Feb 2024 15:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[What is X-Forwarded-For and when can you trust it?]]></title>
            <description><![CDATA[<p>The X-Forwarded-For (XFF) HTTP header provides crucial insight into the origin of web requests. The header works as a mechanism for conveying the original source IP addresses of clients, and not just across one hop, but through chains of multiple intermediaries. This list of IPv4 and IPv6 addresses is helpful to understand where requests have really come from in scenarios where they traverse several servers, proxies, or load balancers.</p> <p>A typical HTTP request goes on a bit of a journey, traversing multiple layers of infrastructure before reaching its destination. Without the <code>X-Forwarded-For</code> header, the receiving server would only see the IP address of the last intermediary in the chain (the direct source of the request) rather than the true client origin.</p> <p><img src="https://httptoolkit.com/images/posts/x-forwarded-for-flow.png" alt="A connection flow from Client to CDN to Load Balancer to Backend app"></p> <p>In this example, by the time the backend application sees an incoming request, the IP address of the original client is long forgotten. This is where the <code>X-Forwarded-For</code> header can help out. It looks like this:</p> <pre><code class="http language-http">X-Forwarded-For: 28.178.124.142, 198.40.10.101
</code></pre> <p>The goal here is to give a proxy the chance to say "Alright hang on, I'm forwarding you a request, and this is the history of where it came from, as far as I know".</p> <p>Note that the last proxy will not add its own IP address to the list, because that's already available: if the receiver of the request cares about who is calling it directly, they can combine the <code>X-Forwarded-For</code> with the request's source IP address from the incoming connection, e.g. <code>req.connection.remoteAddress</code> in NodeJS.</p> <p>In the example above, the load balancer has said "Hey backend app, I am forwarding you a request that came from this client, via the CDN", and it doesn't need to pop its own IP in there because the backend app can already tell if it's coming from the load balancer or not.</p> <p>And of course the backend app's own IP is also not included, as it's the one actually receiving the header.</p> <h2 id="whatisxforwardedforusedfor">What is X-Forwarded-For used for?</h2> <p>Knowing the original source &amp; processing path of requests has a whole load of use cases depending on what you're building.</p> <ul> <li><p><strong>User Authentication:</strong> Use the header information to ensure that login attempts originate from recognized and authorized locations, and flag the login as suspect if not, triggering 2FA checks.</p></li> <li><p><strong>Load Balancing:</strong> Evenly distribute incoming traffic across servers, to ensure optimal performance during busy periods.</p></li> <li><p><strong>Data localization:</strong> The European Union, Brazil, China and others have privacy laws about where data can be kept, and this can help identify those users who need special treatment.</p></li> <li><p><strong>Geographic Content Delivery:</strong> CDNs use <code>X-Forwarded-For</code> to determine the user's location and serve content from the nearest server to reduce latency.</p></li> <li><p><strong>Access Control and Security:</strong> Websites use 
<code>X-Forwarded-For</code> to verify the legitimacy of requests and implement access controls based on IP addresses, like a corporate intranet that only allows access to certain resources for employees coming from recognized office IP ranges.</p></li> <li><p><strong>Web Application Firewalls (WAF):</strong> Filter incoming traffic, blocking suspicious requests from a known malicious IP address listed in <code>X-Forwarded-For</code>.</p></li> <li><p><strong>Fraud Prevention:</strong> Financial institutions use <code>X-Forwarded-For</code> to detect and prevent fraudulent activities based on user location, e.g. identifying an unusual login attempt from a location that is inconsistent with the user's typical access patterns.</p></li> <li><p><strong>API Rate Limiting:</strong> APIs use <code>X-Forwarded-For</code> to enforce rate limiting on a per-client basis. An API provider limits the number of requests from a specific IP address within a given time frame to prevent abuse.</p></li> <li><p><strong>Localized Advertising:</strong> Ad platforms use <code>X-Forwarded-For</code> to customize and target ads based on the user's geographical location.</p></li> <li><p><strong>Logging and Analytics:</strong> Log original client IPs to analyze user traffic patterns and behaviors for statistical purposes, like the geographical distribution of users over a specific time period.</p></li> </ul> <p>We're talking about security here, but this is an HTTP request header… so can it not just be completely faked? 
Is the whole Internet built on a lie?!</p> <h2 id="canyoutrustxforwardedfor">Can you trust X-Forwarded-For?</h2> <p>You should never fully trust anything in an HTTP request that is coming from the outside world, and that includes <code>X-Forwarded-For</code> headers.</p> <p>Actors can be malicious or misconfigured, but either way the contents of an HTTP request can be completely made up, and somebody could use <code>X-Forwarded-For</code> to pretend they're coming from inside your corporate VPN once they know the IP, pretend they're in the same geographic region as a user whose bank account they're trying to log into, or all sorts of other shenanigans.</p> <p>Here's an example of how that might look, if the incoming header is blindly trusted:</p> <p><img src="https://httptoolkit.com/images/posts/x-forwarded-for-spoofed.png" alt="An example where a spoofed XFF header value is provided and forwarded"></p> <p>In this case, the client is sending an initial request to us that already includes an <code>X-Forwarded-For</code> header with a 1.1.1.1 value. This could be the client's real internal address that's been added by a proxy related to the client, or it could be an attempt by the client to confuse the server about the client's IP. 
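</p>

<p>For example, naive server code that blindly takes the left-most <code>X-Forwarded-For</code> entry as "the client IP" will happily return whatever the attacker sent. A sketch of that mistake (the <code>naiveClientIp</code> function here is hypothetical):</p>

```javascript
// A spoofed header, as received by the backend in the flow above:
const xff = '1.1.1.1, 28.178.124.142, 198.40.10.101';

// Naive approach: trust the left-most entry as "the client IP":
function naiveClientIp(headerValue) {
  return headerValue.split(',')[0].trim();
}

console.log(naiveClientIp(xff)); // "1.1.1.1" - the attacker-chosen value
```

<p>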
It's impossible for us to tell the difference, so we must ignore this, and treat the client address our infrastructure sees (28.178.124.142) as the real source IP.</p> <p>One way to gain some control over the <code>X-Forwarded-For</code> header is to involve a trusted reverse proxy, and disable direct access at the network level to the backend server and other proxies/servers/load balancers except through that proxy. For API developers this is typically handled by an API Gateway, but it could also be a CDN like Fastly or Cloudflare, or a self-managed reverse proxy like Squid. If the request is coming through a trusted proxy, and that reverse proxy itself hasn't been hacked, you're probably ok to believe at least some of the IP chain you're seeing. But which parts?</p> <p>In general, the further left you look in the header the more room there is for mistakes, as there are more servers which could be misconfigured, and anything that's coming from beyond the left-most proxy you control should be treated with suspicion.</p> <p>To help with this, you can make decisions at the reverse proxy level that change how the header is constructed. For example, <a href="https://www.nginx.com/">nginx</a> can override the <code>X-Forwarded-For</code> header completely, ditching whatever the client provides, and replacing it with the IP address it is seeing. If all requests come through nginx, this effectively draws a line around your infrastructure, and drops all untrusted values received from outside, allowing all other services within your infrastructure to trust the header.</p> <p>You can do that for nginx using the below config:</p> <pre><code>proxy_set_header X-Forwarded-For $remote_addr;
</code></pre> <p>This replaces the <code>X-Forwarded-For</code> header with the client's real IP address, dropping anything else.</p> <p>This resolves the issue with the client above who appears to be trying to hide their IP address, or pretend to be somebody else, by sending a falsified <code>X-Forwarded-For</code> header. This header would be ignored by the CDN, and the request value would be replaced with just the actual IP address as seen by that server rather than being blindly accepted, like so:</p> <p><img src="https://httptoolkit.com/images/posts/x-forwarded-for-spoof-blocked.png" alt="A blocked attempt to spoof an incoming X-Forwarded-For header"></p> <p>Dropping all external values like this is the safest approach when you're not sure how secure and reliable the rest of your call chain is going to be. If other proxies and backend apps are likely to blindly trust the incoming information, or generally make insecure choices (which we'll get into more later) then it's probably safest to completely replace the <code>X-Forwarded-For</code> header at that outside-world facing reverse proxy, and ditch any untrustworthy data in the process.</p> <p>On the other hand, if you're confident the backend app is able to handle it, you can accept the incoming values, and simply append the IP address being seen by the server on the end of the chain. 
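</p>

<p>In a hand-rolled Node proxy, that append behaviour might look something like the sketch below (<code>appendForwardedFor</code> is a hypothetical helper - in real code the inputs would come from the incoming request's header and socket address):</p>

```javascript
// Append the directly-observed peer address to any received value,
// mirroring nginx's $proxy_add_x_forwarded_for behaviour:
function appendForwardedFor(incomingXff, remoteAddress) {
  if (incomingXff) {
    return incomingXff + ', ' + remoteAddress;
  }
  return remoteAddress; // No incoming header: start a fresh chain
}

console.log(appendForwardedFor('1.1.1.1', '28.178.124.142'));
// "1.1.1.1, 28.178.124.142"
```

<p>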
This can be helpful if your infrastructure doesn't necessarily have a single entry point, making it harder to guarantee this kind of header sanitization.</p> <p>To simply append the incoming address to the header, but preserve the original values too, you can use this nginx config:</p> <pre><code>proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
</code></pre> <p>This is a special bit of functionality in nginx, but most reverse proxies will have similar logic. It will build up the header to maintain whatever the client initially passed, and if they are connecting through any intermediary proxies, so long as those proxies support <code>X-Forwarded-For</code>, then that information will be passed along too.</p> <p>With this approach, the backend server receives the full chain of everything reported through to them. This can be good or bad, and requires that everyone using this header knows that there can be a fair few problems with this data, and is prepared to handle it correctly.</p> <h2 id="whichipinthelistisreallytheclient">Which IP in the list is really the client?</h2> <p>In either of the above scenarios, your backend server will eventually end up with a header containing a list of IP addresses.</p> <p>You might be looking at that list of IPs wondering "which is <em>the</em> client IP" - for almost all the use cases discussed, this is the key value you're interested in. The answer: <strong>reading right to left, the first IP address that is not one of yours</strong>. Although there might be more IPs before this (which could be proxies local to the client, or their ISP) there's no way whatsoever to verify them, and so you have to ignore them entirely and use the first-from-right unknown address.</p> <p>Note that this logic assumes that your server is <em>not</em> directly accessible. If it is, you need to check the actual request source IP address is one of yours first - effectively treating that as an extra right-most address.</p> <p>When it comes to logging you want to log all of these IPs (escaped and limited to valid values) but when it comes to security-based use cases there is only one "client IP" you can trust: the one immediately to the left of the last known private/internal IP address. 
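</p>

<p>That right-to-left scan can be sketched in code like this (<code>OWN_ADDRESSES</code> is a hypothetical set standing in for your known infrastructure IPs):</p>

```javascript
// Hypothetical set of our own infrastructure's addresses:
const OWN_ADDRESSES = new Set(['198.40.10.101', '198.40.10.102']);

// Scan right to left: the first address that isn't ours is "the client":
function findClientIp(xffHeader) {
  const ips = xffHeader.split(',').map(function (ip) { return ip.trim(); });
  for (const ip of ips.reverse()) {
    if (!OWN_ADDRESSES.has(ip)) return ip;
  }
  return null; // Every address was one of ours - no external client found
}

console.log(findClientIp('1.2.3.4, 28.178.124.142, 198.40.10.101'));
// "28.178.124.142"
```

<p>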
In other words: the last "not one of ours" server that's valid, and not internal.</p> <p>So, if we got a request like this, which one do we use for security checks or geolocation?</p> <pre><code class="nohighlight language-nohighlight">X-Forwarded-For: 1.2.3.4, 172.16.1.101, 28.178.124.142, 198.40.10.101
</code></pre> <p>There are two common approaches to actually implement this:</p> <h3 id="1pickingbytrustedproxylist">1) Picking by Trusted Proxy List</h3> <p>One approach is to look at the specific IPs and see if they are recognized, going from the right until you see an IP you don't recognize. This requires maintaining a list of IPs (or at least IP ranges) for your internal infrastructure.</p> <ul> <li>Let's assume that <code>198.40.10.101</code> is flagged as one of your own servers in your list.</li> <li>Then you look at <code>28.178.124.142</code>. Is that not one of yours? Not invalid? Not internal? Great - that's the client, and it looks like they're in Ohio, USA. Serve them content from the American servers, and no data protection required because they don't care about privacy laws.</li> </ul> <p>One mistake here would be to keep scanning leftwards past any IP that merely <em>looks</em> internal. If you do so, you might notice <code>172.16.1.101</code>, which is an <a href="https://www.lifewire.com/what-is-a-private-ip-address-2625970">internal IP according to IANA</a>. Yes, it is internal, but it's not a known and trusted internal IP you control. That has been reported to you in the HTTP request, possibly for valid reasons (the client is using some sort of intermediary server), or possibly for malicious reasons (the client is really <code>28.178.124.142</code> but they want you to think their real IP address is <code>1.2.3.4</code> and hoped this would trick you).</p> <p>That can be a pretty complicated approach and requires the backend to know quite a lot about the infrastructure involved, and might be even harder in a cloud environment where IPs might be changing a lot, but it's doable.</p> <h3 id="2pickingbytrustedproxycount">2) Picking by Trusted Proxy Count</h3> <p>Instead of looking at the specific IPs, the number of trusted proxies between the internet and the backend server can be configured. 
This can be more flexible than tracking IP ranges, but requires careful maintenance whenever layers of intermediate infrastructure are changed, and can be very challenging if there are multiple paths to the same servers.</p> <p>To do this, the <code>X-Forwarded-For</code> IP list is searched from the right, <strong>skipping the number of proxies minus one</strong>.</p> <p><img src="https://httptoolkit.com/images/posts/x-forwarded-for-path.png" alt="A request path, from Client to CDN to Load Balancer to Backend"></p> <p>For example, in this infrastructure we know we have two intermediaries: the CDN &amp; the load balancer.</p> <p>Let's assume the server receives something like this:</p> <pre><code class="nohighlight language-nohighlight">X-Forwarded-For: 1.2.3.4, 172.16.1.101, 28.178.124.142, 198.40.10.101
</code></pre> <p>When reading this, remember that the backend is the one receiving this header, so it doesn't modify it or appear in it, and everybody else adds the <em>previous</em> step, so the load balancer isn't included either (this is why we minus one from our count).</p> <p>So, to understand this header we're going to count from the right, counting how many proxies are to the left before we get to the outside Internet. Remember that we count the number of intermediaries <em>minus one</em> - so in this case we just skip the one right-most value (<code>198.40.10.101</code>). The logic is:</p> <ol> <li>Load Balancer - this IP is not shown in the header (but is available to the backend server anyway, because it's the direct source of the request connection).</li> <li><code>198.40.10.101</code> - CDN</li> <li><code>28.178.124.142</code> - Client IP</li> </ol> <p>Code to do this for yourself could look something like this:</p> <pre><code class="javascript language-javascript">const headers = {
  "X-Forwarded-For": "1.2.3.4,172.16.1.101,28.178.124.142,198.40.10.101"
}

const config = {
  trusted_proxy_count: 2
}

const ips = headers["X-Forwarded-For"]
  .split(',') // Get the separate IPs
  .map(function (ip) { return ip.trim(); }) // Drop any whitespace
  .reverse(); // Count from the right
const clientIp = ips[config.trusted_proxy_count - 1];

console.log(clientIp) // "28.178.124.142"
</code></pre> <p><em>(This is simplified for readability - note that this doesn't handle any of the broader security concerns we'll discuss below)</em></p> <p>Basically if there are three reverse proxies, the last two IP addresses will be internal.</p> <p>If your infrastructure is simple and there is only one reverse proxy, that proxy will add the client's IP address, so the rightmost address can be used directly (as long as you're sure that the backend server is <em>only</em> accessible via the proxy server or you always validate that the IP of the received request itself is the reverse proxy's IP).</p> <h2 id="parsingxforwardedforheaders">Parsing X-Forwarded-For headers</h2> <p>Above we've talked about which parts of the data you read from the X-Forwarded-For header you can trust. But there are also some risks to manage in the parsing stage, before you even start looking through the values themselves.</p> <p>In most cases, this parsing should be handled for you by a library or your server framework, but it's important to understand how that works, and to test that this is handled correctly in your environment.</p> <h3 id="invalidipaddresses">Invalid IP Addresses</h3> <p>First of all, there's no reason to assume everything in there will always be a valid IP address.</p> <p>A client can include anything. If your code assumes valid data in a specific format, it can easily crash, typically resulting in 500 Internal Server Error for the client and potential other side effects server-side, and opening the door to all sorts of DDoS attacks.</p> <p>Even valid data can be challenging, as there's a few different possible formats here. 
The IPv4 address format is fairly simple, standard &amp; well recognized (IPv4 addresses should be 4 numbers, each from 0 to 255, separated by dots) and so this is generally not a huge problem, but with the rising use of IPv6 this gets quite a bit more complicated, and it's worth familiarizing yourself with the <a href="https://en.wikipedia.org/wiki/IPv6_address">IPv6 address format</a> if you're not already.</p> <p>In some scenarios, the header may also include a port on the addresses too (with a <code>:$PORT</code> suffix) - typically you'll want to just ignore this, but it's worth testing that that is handled correctly.</p> <p>Wikipedia has some <a href="https://en.wikipedia.org/wiki/X-Forwarded-For#Format">useful examples</a> of common formats.</p> <h3 id="separatorparsing">Separator parsing</h3> <p>Between each IP address, a simple comma with optional whitespace is used as a separator.</p> <p>This too has some parsing gotchas. There's a few ways you could parse this wrong, and doing so would open doors to <a href="https://en.wikipedia.org/wiki/HTTP_request_smuggling">request smuggling</a>, where an intermediate server interprets data differently to the backend server, and shared assumptions break down.</p> <p>Blindly splitting on a simple <code>,</code> character and sending the result off to various other services (like an API call) means all sorts of unexpected things could happen.</p> <p>You'll want to watch out for trailing or leading commas here, make sure you trim whitespace between values, and again reject any unexpected invalid data here aggressively to keep things clean &amp; under control.</p> <h3 id="multipleheaders">Multiple headers</h3> <p>It's quite possible for a request to include multiple <code>X-Forwarded-For</code> headers! 
There's rarely any good reason to do this, but similar to these other issues, you should make sure you handle it correctly.</p> <p>If this happens, the <a href="https://www.rfc-editor.org/rfc/rfc7230#section-3.2.2">HTTP spec</a> says:</p> <blockquote> <p>A recipient MAY combine multiple header fields with the same field name into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field value to the combined field value in order, separated by a comma.</p> </blockquote> <p>I.e. it's technically correct to combine them together in order. That said, trusting that this is safe &amp; correct requires assuming that upstream proxies have done the same, or you're exposed to request smuggling here too.</p> <p>Up to you, but in practice this should be extremely rare in normal usage, so if you're not sure I'd be inclined to drop all duplicated headers entirely.</p> <h3 id="arbitrarycodeexecution">Arbitrary code execution?</h3> <p>Beyond invalid or faked addresses, it's technically possible to include literal malicious code in here!</p> <p>If a bad actor knew (or guessed) the programming language and logging frameworks being used, and if that logging system does not sanitize/escape input or has a bug that allows code to be executed in the logger, then somebody could be running random code on your server.</p> <p>This is particularly notable given vulnerabilities like <a href="https://blog.shiftleft.io/log4shell-apache-log4j-remote-code-execution-4f58ed7e74f9">CVE-2021-44228 a.k.a. "Log4Shell”</a> where logging a string could indeed potentially run arbitrary code.</p> <p>Given a request like:</p> <pre><code class="nohighlight language-nohighlight">X-Forwarded-For: 1.2.3.4,nonsense,${malicious()},2.2.2.2,28.178.124.142,198.40.10.101
</code></pre> <p>If a service affected by this vulnerability attempted to log that via log4j, it could run arbitrary code, creating major problems.</p> <p>To protect against this, remember to validate inputs first - any logic using this field should ignore anything that is not a valid IP address. Beyond that, it's sensible to do some basic checks before logging this to a database, like checking the length of the string so you're not filling up your DB with trash and potentially creating other DDoS and resource management issues.</p> <p>See <a href="https://owasp.org/Top10/A03_2021-Injection/">OWASP A03:2021 – Injection</a> and <a href="https://owasp.org/API-Security/editions/2023/en/0xa4-unrestricted-resource-consumption/">OWASP API4:2023 - Unrestricted Resource Consumption</a> for more on what can go wrong there.</p> <p>Whichever approach you pick for the client-facing proxy, make sure every other server in the chain is not directly accessible, or you'll never be able to trust any of it.</p> <h2 id="alternativesstandards">Alternatives &amp; Standards</h2> <h3 id="forwardedheader">Forwarded Header</h3> <p><code>X-Forwarded-For</code> is very widely used, but it's not part of any current formal specification. It's a convention implemented by various bits of software and services in a similar way, but has not been standardized by anyone like the IETF.</p> <p>The IETF folks have been deprecating X- headers since <a href="https://datatracker.ietf.org/doc/html/rfc6648">RFC 6648: Deprecating the "X-" Prefix and Similar Constructs</a> in 2012 (if you're interested in why, Mark Nottingham gives great backstory on this in <a href="https://www.mnot.net/blog/2009/02/18/x-">Stop it with the X- Already!</a>). 
In part the effort to standardize common X- extensions is down to a single specification making implementations more reliable, but in practice it's also because the convention that was first imagined is often overly simplistic, missing important use-cases not thought of when first conceived.</p> <p>Whilst the <code>X-Forwarded-For</code> header is popular, and generally doesn't suffer from compatibility issues, there's room for making it better, and that takes the form of the <code>Forwarded</code> header defined by <a href="https://datatracker.ietf.org/doc/html/rfc7239">RFC 7239: Forwarded HTTP Extension</a> finalized in 2014:</p> <pre><code>Forwarded: for=192.0.2.60;proto=http;by=203.0.113.43
</code></pre> <p>It's really similar to <code>X-Forwarded-For</code>, but also incorporates functionality from <code>X-Forwarded-Proto</code>. By combining <code>-For</code> and <code>-Proto</code> into a single header there's less room for confusion stitching things together.</p> <pre><code>X-Forwarded-For: 192.0.2.172
Forwarded: for=192.0.2.172

X-Forwarded-For: 192.0.2.43, 2001:db8:cafe::17
Forwarded: for=192.0.2.43, for="[2001:db8:cafe::17]"
</code></pre> <p><em>(Examples from <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Forwarded">MDN Web Docs: HTTP > HTTP headers > Forwarded</a>)</em></p> <p>Whether you should use <code>X-Forwarded-For</code> or the <code>Forwarded</code> header can be a tricky question, and typically depends on the tools and service providers you're using. Does all the tooling in your infrastructure support it?</p> <p>Does it matter for any practical reason when you can just use <code>X-Forwarded-For</code>, or is this just vibes and "being proper"? There is one major benefit that might make it worth the effort of checking compatibility: Forwarded has extensibility.</p> <p>With <code>X-Forwarded-For</code>, you have to figure out which IP address is the client IP with hardcoded rules such as “take the 3rd last IP address”. Whether you're using lists or counts, it's not just a faff, it's subject to change as your infrastructure evolves.</p> <p>With <code>Forwarded</code>, <a href="https://www.nginx.com/resources/wiki/start/topics/examples/forwarded/">as nginx suggests</a>, your trusted client-facing proxy could include a secret token to identify itself:</p> <pre><code>Forwarded: for=12.34.56.78, for=23.45.67.89;secret=egah2CGj55fSJFs, for=10.1.2.3
</code></pre> <p>Additionally, <code>Forwarded</code> allows including other fields like <code>proto=</code>, <code>by=</code>, <code>host=</code>, and potentially others in future.</p> <p>Really, the only true downside here is the existing much wider use &amp; support for <code>X-Forwarded-For</code>, but there are solid reasons to aim to use <code>Forwarded</code> where possible, and to plan to migrate tooling in that direction in future.</p> <h3 id="xforwardedhostproto">X-Forwarded-Host/Proto</h3> <p>There are actually two other headers in the XFF family:</p> <ul> <li><code>X-Forwarded-Host</code>: this allows a proxy to forward the originally provided <code>Host</code> header from an upstream client</li> <li><code>X-Forwarded-Proto</code>: this allows a proxy to forward the originally used protocol (<code>http</code> or <code>https</code>)</li> </ul> <p>Neither is widely used, but it's worth being aware of these in case you do have a specific need for extra data about the original client request in future.</p> <h3 id="viaheader">Via Header</h3> <p>There is another similar HTTP header: <code>Via</code>, defined in RFC 9110 and finalized in 2022. This header exposes the data of the intermediaries on the request path themselves, rather than the client IPs they saw (i.e. each proxy will add its own address, not the address of the client from its perspective).</p> <p>This also has more of a focus on the protocol (HTTP or HTTPS) and the version (<code>HTTP/1.1 proxy.example.re, 1.1 edge_1</code>). 
This information is intended to help figure out if the connection has been downgraded at any point, whether that's from HTTP/2 to 1.1 or from HTTPS to HTTP, and to make backend servers aware of the capabilities of the full connection path.</p> <p>As it also adds the hostname of the proxy, and some optional information about the specific product and version of the proxy, it may seem pretty similar to <code>X-Forwarded-For</code> and <code>Forwarded</code>, but the <code>Via</code> header is more for information/debugging or identifying and working around buggy proxies.</p> <h3 id="andothers">And others…</h3> <p><code>X-Forwarded-For</code> is widely supported and easily the most popular solution to this problem, but it's definitely not the only one. In addition to the relatively standard options above, there's a few other non-standard headers available, which do similar things in certain contexts:</p> <ul> <li><code>X-Real-IP</code>: how this is used varies quite a bit - some systems use it to forward just a single IP that they consider the 'real' client, some use it just like XFF, and who knows what else. 
This is rarely supported, and generally not recommended unless you're sure you know what it means in your context.</li> <li><code>CF-Connecting-IP</code>: Cloudflare's version of the real IP header, listing the client IP specifically.</li> <li><code>Ali-CDN-Real-IP</code>: Alibaba Cloud CDN's own version of this, potentially with minor variations (hard to tell).</li> <li><code>X-NF-Client-Connection-IP</code>: this is Netlify's <a href="https://answers.netlify.com/t/is-the-client-ip-header-going-to-be-supported-long-term/11203">officially recommended</a> header to get the client IP (they do not support XFF or others, even though they might still appear in your request headers!)</li> <li><code>X-Vercel-Forwarded-For</code>: Vercel's version of XFF, with some additional (but unclear) extra validation</li> <li><code>X-Vercel-IP-{...}</code>: a set of Vercel headers for the client's country, city, timezone, and other details their edge servers automatically infer. More info in <a href="https://vercel.com/docs/edge-network/headers">their docs</a>.</li> <li><code>Cache-Status</code>, <code>X-Cache-Status</code>, <code>X-Served-By</code>, etc: this is the same concept in reverse. This header is included in responses, and provides response path information for clients, letting them know which caches processed the request, and whether the response came fresh directly from the backend server or from a cache en route. 
For more information, see <a href="https://httptoolkit.com/blog/status-targeted-caching-headers/">New HTTP Standards for Caching</a>.</li> </ul> <h2 id="summary">Summary</h2> <p>Now you know what <code>X-Forwarded-For</code> and <code>Forwarded</code> are for, and can leverage that knowledge to deliver localized content, support rate limiting, flag likely fraud, or even serve content from the closest server to <a href="https://learn.greensoftware.foundation/introduction">reduce the carbon impact of your data transmission</a>.</p> <p>Use these powers wisely, and remember that this depends on close integration between backend &amp; deployment infrastructure development, so don't sneak them into changes without talking to the rest of your team first!</p> <p><em>Want to inspect real clients' headers for yourself right now, or try spoofing your own headers and see what happens? Fire up <strong><a href="https://httptoolkit.com/">HTTP Toolkit</a></strong>, intercept & debug your Android/web/Docker/other traffic, set some breakpoint rules to modify it, and explore this up close.</em></p> <h2 id="furtherreading">Further Reading</h2> <ul> <li><a href="https://datatracker.ietf.org/doc/html/rfc7239">RFC 7239: Forwarded HTTP Extension</a></li> <li><a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-For">MDN Web Docs: X-Forwarded-For</a></li> <li><a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Forwarded">MDN Web Docs: Forwarded</a></li> <li><a href="https://www.nginx.com/resources/wiki/start/topics/examples/forwarded/">Using Forwarded in NGINX</a></li> <li><a href="https://developers.cloudflare.com/fundamentals/reference/http-request-headers/">Cloudflare Fundamentals: HTTP request headers</a></li> </ul>]]></description>
            <link>https://httptoolkit.com/blog/what-is-x-forwarded-for/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/what-is-x-forwarded-for/</guid>
            <pubDate>Wed, 31 Jan 2024 17:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Working with the new Idempotency Keys RFC]]></title>
            <description><![CDATA[<p>Idempotency is when doing an operation multiple times is guaranteed to have the same effect as doing it just once.</p> <p>When working with APIs this is exceptionally helpful on slow or unreliable internet connections, or when dealing with particularly sensitive actions such as payments, because it makes retrying operations safe and reliable. This is why most payment gateways like <a href="https://stripe.com/docs/api/idempotent_requests">Stripe</a> and <a href="https://docs.adyen.com/development-resources/api-idempotency/">Adyen</a> support 'idempotency keys' as a key feature of their APIs.</p> <p>Recently, the IETF have gone further, and created <a href="https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/">a draft RFC standard</a> for this useful common pattern, as part of the 'Building Blocks for HTTP APIs' working group. This is technically still a draft and the details could change, but it's fairly mature now and increasingly widely used, so it's a good time to take a closer look, and start using &amp; implementing it for yourself.</p> <h2 id="idempotencyinhttpapis">Idempotency in HTTP APIs</h2> <p>Many HTTP methods are defined as idempotent in all cases. In theory, any GET, HEAD, PUT, DELETE, OPTIONS, or TRACE operation can be executed multiple times without any unintended side effects (though for badly behaved APIs your mileage may vary).</p> <p>The idea is that an HTTP request like <code>DELETE /users/123</code> clearly wants to delete that user, and if that accidentally happens twice then that's just fine. User 123 ends up deleted just the same.</p> <p>It's a lot more complicated for POST and PATCH requests, which do not provide that same level of confidence out of the box. These are designed to allow non-idempotent operations, like adding a new user, sending a payment, or appending to existing data. 
Those are important use cases too - sometimes you really do want to send the same thing twice, and have it happen twice - but this can cause problems when things go wrong.</p> <p>When sending POST and PATCH requests to modify server state, if you want to support retries then both the client and server need to explicitly handle this - which is exactly what idempotency keys are designed to allow you to do.</p> <h2 id="howcannonidempotencygowrong">How can non-idempotency go wrong?</h2> <p>Without idempotency keys, whether a client wants a repeated request to be executed twice or not is usually impossible to guess. For example, you might build a payment-sending client app, which has a timeout set to 2 seconds like this:</p> <pre><code class="js language-js">await axios.post(
  '/payments',
  { to: 'user@example', value: 2000 },
  { timeout: 2000 }
);
</code></pre> <p>If that doesn't complete within 2 seconds, the HTTP client is typically going to show the request as failed, and ask you if you'd like to try again.</p> <p>Unfortunately though, in this scenario the client doesn't actually know whether the server has successfully completed the payment. It was running a little slow, sure, but maybe it sent the payment and just hadn't quite finished spitting out the JSON yet. Meanwhile, the server has no idea the client gave up and told the end-user the payment failed. The server thinks the payment is done, but the client does not. This creates a problem.</p> <p>The server could try to write some logic to detect duplicate payments, but is that something you always want to automatically block?</p> <pre><code class="js language-js">POST /payments

{
  "amount": 5.00,
  "to": "julian1342",
  "reason": "cider"
}
</code></pre> <p>If the server receives this request twice, it could be a failed retry, or it could just as easily be somebody paying Julian for a second cider.</p> <h2 id="idempotencykeystotherescue">Idempotency keys to the rescue</h2> <p>Idempotency keys are how you give everyone some clarity here. They let the client specify explicitly whether they are reattempting a failed request, or doing a whole new operation.</p> <p>Here's an example of that in the Stripe API, using curl and setting an <code>Idempotency-Key</code> header:</p> <pre><code class="bash language-bash">curl https://api.stripe.com/v1/charges \
  --user sk_test_4eC39HqLyjWDarjtT1zdp7dc: \
  --header "Idempotency-Key: uWeBuDsZPxxvdhND" \
  --data amount=2000 \
  --data currency=usd \
  --data source=tok_mastercard \
  --data-urlencode description="Creating a charge"
</code></pre> <p>After this request is completed, any future requests with the same idempotency key will <em>not</em> create a new charge - they'll return the saved response from the previous request instead. Because of this, if a client isn't sure whether the server received their request, they can always safely retry it.</p> <p>This is a standard pattern, but it's just a header and it's entirely implementation agnostic, so anyone can implement this logic to make life easier for everyone involved - the API developers, the API clients, and the end users.</p> <h2 id="howcanyougetstartedwithidempotencykeys">How can you get started with idempotency keys?</h2> <p>To practice working with idempotency keys here, we'll build a small example showing how this works for both the server and client.</p> <p>Let's start with a basic HTTP API written using <a href="https://expressjs.com/">Express</a>, with a single POST endpoint that doesn't yet implement idempotency at all:</p> <pre><code class="js language-js">const express = require("express");
const app = express();
const port = process.env.PORT || 3001;

app.post("/things", (req, res) =&gt;
  res.json({
    message: `New random number: ${Math.random() * 100}`,
  })
);

app.listen(port, () =&gt; console.log(`Listening on port ${port}`));
</code></pre> <p>Every time a client calls <code>POST /things</code> it's going to create a brand new thing, and we can pretend this is being saved in the database then returned like a real API might do.</p> <p>For now it's just creating a random number, so we can spot each unique operation that's executing.</p> <p>Sending three requests here means we get three different responses as you would expect.</p> <pre><code class="bash language-bash">$ curl -XPOST http://localhost:3001/things
{"message":"New random number: 88.81742408158198"}

$ curl -XPOST http://localhost:3001/things
{"message":"New random number: 53.13198035021438"}

$ curl -XPOST http://localhost:3001/things
{"message":"New random number: 99.87427689891224"}
</code></pre> <p>Every request does a new operation, and it's impossible for a client to safely retry operations that may have failed.</p> <p>Now to add some idempotency logic to our API. Thankfully Express, like most good web application frameworks, supports middleware.</p> <p>Middleware lets us wrap our logic, inspecting requests and changing responses, adding all sorts of handy functionality like caching, rate limiting, and authentication, without having to clog up our routes with all that logic.</p> <p>Instead of needing to write our own middleware, we can use <a href="https://www.npmjs.com/package/express-idempotency">express-idempotency</a> for this (similar packages exist for most other web frameworks and languages).</p> <p>Let's update our API to use that:</p> <pre><code class="js language-js">// Load up the middleware
const idempotency = require("express-idempotency");

// Register idempotency middleware
app.use(idempotency.idempotency());

// Updated route
app.post("/things", (req, res) =&gt; {
  // Check if there was a hit!
  const idempotencyService = idempotency.getSharedIdempotencyService();
  if (idempotencyService.isHit(req)) {
    return;
  }

  res.json({
    message: `New random number: ${Math.random() * 100}`,
  });
});
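
// For intuition, a hand-rolled version of this middleware could look roughly
// like the sketch below. This is illustrative only (not express-idempotency's
// real implementation, which also compares request contents and supports
// pluggable storage): store each response against its key, and replay it on
// a repeat.
const seenResponses = new Map();

function naiveIdempotency(req, res, next) {
  const key = req.get("Idempotency-Key");
  if (!key) return next(); // No key: always treat this as a new operation

  if (seenResponses.has(key)) {
    // Hit: replay the stored response, skipping the route logic entirely
    return res.json(seenResponses.get(key));
  }

  // Miss: let the route run, but record whatever response it produces
  const originalJson = res.json.bind(res);
  res.json = function (body) {
    seenResponses.set(key, body);
    return originalJson(body);
  };
  next();
}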
</code></pre> <p>This code registers the idempotency middleware for all our endpoints, then inside the route we check whether the key is a hit, returning early if so, to avoid rerunning the logic if this is indeed a repeated request.</p> <p>Let's give it a whirl, first with a series of intentionally repeated (non-idempotent) operations:</p> <pre><code class="bash language-bash">$ curl -XPOST http://localhost:3001/things
{"message":"New random number: 43.41405619983338"}

$ curl -XPOST http://localhost:3001/things
{"message":"New random number: 42.39378310046617"}

$ curl -XPOST http://localhost:3001/things
{"message":"New random number: 22.522332987290937"}
</code></pre> <p>Here as expected, the API still does a new operation every time, because we haven't set an Idempotency Key. The API is doing the right thing and treating every request as a new transaction, which is exactly what you want if you're adding idempotency keys to an existing API and don't want to break things for existing clients.</p> <p>To use our new feature though, we need to pass an <code>Idempotency-Key</code> field, and set that to <em>anything</em> to let the API know it should reuse the response.</p> <pre><code class="bash language-bash">$ curl -XPOST http://localhost:3001/things -H 'Idempotency-Key: cheesecake'
{"message":"New random number: 53.12434895091507"}

$ curl -XPOST http://localhost:3001/things -H 'Idempotency-Key: cheesecake'
{"message":"New random number: 53.12434895091507"}

$ curl -XPOST http://localhost:3001/things -H 'Idempotency-Key: cheesecake'
{"message":"New random number: 53.12434895091507"}
</code></pre> <p>Success! The API notices that we intended for this transaction to be the same thing, so it skips all the business logic and just sends the same response over.</p> <p>A real client application would set an idempotency key with a truly unique value, and it would do this based on the user's actions. When a user loads up a form in the interface, a new key should be set, and this is how you avoid a genuine "user tried to do a thing twice" getting confused with a retry.</p> <p>If the client also happens to be written in JavaScript, it might look a bit like this:</p> <pre><code class="js language-js">const { v4: uuidv4 } = require('uuid');

// Where a form loads up, generate a fresh random key
const idempotencyKey = uuidv4();

// Elsewhere in the application
fetch("https://example.org/api/things", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Idempotency-Key": idempotencyKey,
  },
  body: JSON.stringify(req.body),
})
.then((response) =&gt; response.json())
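
// Managing the key's lifecycle can be wrapped in a small helper (an
// illustrative sketch - these function names are our own, not a library API):
let currentKey = null;

function getIdempotencyKey() {
  // Reuse the current key until the operation succeeds, so retries are safe
  if (currentKey === null) currentKey = crypto.randomUUID();
  return currentKey;
}

function operationSucceeded() {
  // Rotate the key, so the user's next submission counts as a new operation
  currentKey = null;
}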
</code></pre> <p>Exactly when the key changes depends on your UX. Do you have a persistent form where the user expects a new operation every time they submit the form? In that case, you'll want to reset the key after each successful submit. Or is the user trying to do a single operation, and only ever resubmitting to retry it? In this case, you'll need to preserve the same key throughout.</p> <p>Which case applies to you should be fairly clear from your existing UX, but either case is valid, so do think carefully about how these keys need to be managed.</p> <h2 id="makingpatchidempotent">Making PATCH idempotent</h2> <p>All the examples here have used POST, but idempotency keys can help with PATCH as well. This is often less important (many uses of PATCH requests are idempotent by design) but because that's not strictly required by the HTTP specification, you do need to think about this when using PATCH.</p> <p>Idempotency keys are especially relevant if you're using <a href="https://jsonpatch.com/">JSON Patch</a> or something similar, which defines atomic actions like <code>incrementBy: 1</code> that you can use to atomically increment some data on the server with a PATCH request. That's an alternative to reading existing data with a GET request, incrementing it client-side, and sending the updated value with a new POST request, which creates a risk of race conditions if somebody else modifies the data between the requests. This is useful, but such operations are notably <em>not</em> idempotent - sending the request twice will clearly increment your value twice!</p> <p>Idempotency keys work just as well for this case too though. Just remember that the same caveats apply, and you'll need to think similarly about when your idempotency keys change, and when they're preserved.
If you're intentionally sending repeated requests to increment a value, and you accidentally repeat the same idempotency key, all your many increments will silently collapse into a single operation, which won't be what you expect.</p> <h2 id="distributedidempotencywithdataadapters">Distributed idempotency with data adapters</h2> <p>The idempotency keys in the examples above are stored and checked in memory, which is fine for a demo, but not so good in production as you might have multiple servers behind a load balancer, each with independent local state (so that a request with an idempotency key already seen by one server won't be recognized as a repeat if the same request is received by the other server).</p> <p>Similar issues apply during deployments, even with just one server, as all idempotency keys held in memory will be lost.</p> <p>In these production scenarios, you would be better off using a data store like Redis, which can provide a shared temporary storage that's available from all servers, and independent of your server deployments.</p> <p>The first step to doing this is to check whether the idempotency implementation you are using has a pre-built data adapter to support this, before you go building one yourself. In this instance, the <a href="https://www.npmjs.com/package/express-idempotency-redis-adapter">express-idempotency-redis-adapter</a> is exactly what we need, so that will fix this for us nicely:</p> <pre><code class="js language-js">const idempotency = require("express-idempotency");
const RedisAdapter = require("express-idempotency-redis-adapter").default;

// New Redis Adapter that will be used by the idempotency middleware
const adapter = new RedisAdapter({
  connectionConfig: { url: "redis://localhost:6379" },
});

adapter.connect().catch((error) =&gt; {
  throw error;
});

// Add the idempotency middleware by specifying the use of the redis adapter
app.use(
  idempotency.idempotency({
    dataAdapter: adapter,
  })
);
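
// Separately, as discussed under security below, implementations should only
// replay a response when the repeated request actually matches the original.
// One cheap way to support that is to store a hash of each request alongside
// its key (an illustrative sketch - not part of the adapter's API):
const crypto = require("crypto");

function fingerprintRequest(req) {
  return crypto
    .createHash("sha256")
    .update(req.method + " " + req.url + "\n" + JSON.stringify(req.body))
    .digest("hex");
}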
</code></pre> <p>This tells express-idempotency to cache all idempotency keys and responses into Redis. If we run multiple servers using this code, and even if we restart these servers, the idempotency keys and their cached responses will still be persisted, and duplicate operations will be avoided.</p> <h2 id="securityconsiderationsofidempotencykeys">Security considerations of idempotency keys</h2> <p>It's important to realise that there are risks here! Reused keys will reuse responses, regardless of the source, so if two different clients, A and B, both send <code>Idempotency-Key: same-thing</code> it might reuse client A's response for client B.</p> <p>This could expose sensitive data in the response. Even if it doesn't, it's easy for this to expose the existence of a previous operation that should be secret (has Alice ever paid Bob $1000?).</p> <p>Fortunately, there are a few ways to avoid this.</p> <p>The first step is to ensure clients use sufficiently random values (such as UUIDs) for their idempotency keys. Using large enough unguessable values makes it effectively impossible for multiple clients to clash on the same key. In addition, idempotency keys should expire after a fairly short time (hours or days, not years) so they cannot be reused indefinitely. This also helps avoid memory issues with cached response data.</p> <p>Lastly, when you see a matching idempotency key, you should also compare the contents of the new request with the previous request before trusting it and reusing the response. This makes it impossible to reuse a known key without already knowing the full request too. Express-idempotency will do this for you automatically, and you can also do it efficiently yourself elsewhere by hashing requests alongside the key.</p> <p>Your authentication &amp; rate limiting logic should also help to make brute forcing this even harder.
If you register multiple middlewares, you can make sure those run before the idempotency middleware, or if you run the code yourself in controllers/routes then you can run the security checks before you check for idempotency keys, making it impossible to anonymously brute force these values.</p> <p>However you go about it, just be aware that you'll need to protect against this.</p> <p>On the client side, it's also important to remember that idempotency keys are intended as a best-effort feature, not a 100% guarantee. Most servers don't support this yet, for starters, and even for servers that do explicitly support idempotency keys, you can't safely assume that a request retried days later won't duplicate a previous operation just because you saved the idempotency key. Quick retries should be safe, but it's good to have a fallback plan for other cases where it doesn't work out, such as explicitly checking whether an operation already occurred before retrying if a notable amount of time has passed.</p> <p>Extremely sensitive operations that cannot be undone should probably not use idempotency keys as their only safety mechanism, unless you are very confident the server can provide hard guarantees around this.
See what you can find, and if there's nothing for your framework consider knocking up an open-source library to help others out too.</p> <p>Simple idempotency key implementations like this can get you a long way very quickly, but if you have complex multi-step processes that mix up database interactions, calls to other APIs, emails, etc, then you might want <a href="https://medusajs.com/blog/idempotency-nodejs-express-open-source/">a more comprehensive implementation</a> which tracks progress for each key.</p> <p>As noted above, this is now being standardized (so in theory every server using <code>Idempotency-Key</code> headers should behave in the same way), but it's still a draft. If you have feedback or suggestions on how to make this better, you can share them on GitHub <a href="https://github.com/ietf-wg-httpapi/idempotency">here</a>.</p> <p><strong>Debugging APIs or HTTP clients, and want to inspect, rewrite & mock live traffic? Try out <a href="https://httptoolkit.com/">HTTP Toolkit</a> right now to see all this data for yourself. Open-source one-click HTTP(S) interception & debugging for web, Android, servers & more.</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/idempotency-keys/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/idempotency-keys/</guid>
            <pubDate>Tue, 12 Dec 2023 15:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[A brief introduction to OpenAPI]]></title>
<description><![CDATA[<p>It's hard to work on APIs without hearing about OpenAPI.</p> <p>OpenAPI is an API description format, which is essentially metadata that describes an HTTP API: where it lives, how it works, what data is available, and how it's authenticated. Additional keywords can be used to provide all sorts of validation information, adding a type system to what would otherwise just be arbitrary JSON flying around the internet.</p> <p>OpenAPI has been around for donkey's years, previously known as Swagger but renamed to OpenAPI in 2016. It's powered by <a href="https://json-schema.org/">JSON Schema</a>, which is also pretty popular in certain circles, but it's only in the last few years that OpenAPI has solidified its place as <em>the</em> description format for HTTP APIs, pushing aside others like RAML and API Blueprint.</p> <p>Elder developers will remember working with WSDLs and XML Schema, and gRPC and GraphQL folks might be thinking "hang on, this sounds a bit familiar", and absolutely. Type systems for APIs are pretty common, but here's an excellent one you can use for your REST/RESTish API.</p> <p>Here's an example to give you an idea:</p> <pre><code class="yaml language-yaml">openapi: 3.1.0

info:
  title: Your Awesome API
  version: '1.0.3'
  description: More information and introduction.

paths:
  /things:
    post:
      summary: Create a thing
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
                  examples:
                    - Tim

      responses:
        '201':
          description: "Created"
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:
                    type: string
                    format: uuid
                  name:
                    type: string
                    examples:
                      - Tim
                  created_at:
                    type: string
                    format: date-time
                    examples:
                      - 2020-01-01T00:00:00Z
</code></pre> <p>This describes an API in a machine-readable format, including overall metadata, endpoint paths, request formats, and the details of possible responses you might receive.</p> <h2 id="whatcanopenapido">What can OpenAPI do?</h2> <p>OpenAPI specifications provide a machine-readable base on top of which lots of neat API tools can be used and even generated.</p> <p>One of the most common uses by many API teams is to generate API reference documentation, which helps end-users learn about the API in the same sort of way you'd look up functions and classes to work with a code package. Tools like <a href="https://github.com/Redocly/redoc">Redoc</a> make it possible to create beautiful API documentation sites automatically, directly from the OpenAPI spec file:</p> <p><img src="https://httptoolkit.com/images/posts/introduction-to-openapi-redoc.png" alt="Preview of the popular OpenAPI documentation tool Redoc"></p> <p>API developers and API end-users all find this pretty helpful, but increasingly OpenAPI is being used throughout the entire API lifecycle.</p> <p>OpenAPI can be used for validation, to power contract testing and do server-side &amp; client-side validation, and you can use it to generate SDKs, backend server stubs, or even realistic mock servers so that clients can play around with the API before it's even built!</p> <p>As one example, <strong><a href="https://httptoolkit.com/">HTTP Toolkit</a></strong> uses OpenAPI internally to automatically understand requests to a huge list of public APIs with published OpenAPI specifications.
For each request, using the OpenAPI spec, HTTP Toolkit can validate the request parameters, and show metadata alongside the raw request &amp; response information, so you can easily understand what an API response actually means and jump straight from a request to the corresponding documentation.</p> <p>Similarly, you can also load your OpenAPI spec into HTTP Toolkit and other tools, adding your own metadata and validation to help you debug intercepted traffic &amp; requests.</p> <p>Before getting too much further into use cases though, how does OpenAPI actually work?</p> <h2 id="howdoesopenapiwork">How does OpenAPI work?</h2> <h3 id="openapidocuments">OpenAPI Documents</h3> <p>OpenAPI generally exists as a YAML or JSON document usually called something like <code>openapi.yaml</code>. The simple example above showed how to describe a POST with a response, but you can do a lot more, describing any HTTP method, and defining path parameters, query string parameters, and headers, providing their validation rules too if you like.</p> <pre><code class="yaml language-yaml">openapi: 3.1.0

info:
  title: Widget API
  description: The worlds best collection of Widgets.
  version: '1.1.0'

paths:
  /widgets/{uuid}:
    get:
      operationId: fetch-widget
      description: Fetch a Widget

      parameters:
        - name: uuid
          in: path
          required: true
          description: A unique identifier that each Widget has to help you look it up.
          schema:
            type: string
            format: uuid

      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                type: object
                properties:
                  other-fields:
                    type: string
</code></pre> <p>This is a very simplistic API, but regardless of how complicated your API is, OpenAPI can describe it, with all sorts of powerful keywords covering the vast majority of needs. We'll just provide a basic introduction here, but you can look through the full <a href="https://learn.openapis.org/specification/">OpenAPI reference</a> for more details as required.</p> <p>An OpenAPI document is split into four key sections: <code>info</code>, <code>paths</code>, <code>webhooks</code>, and <code>components</code>.</p> <h4 id="info">info</h4> <p>The info section establishes general information about the API, helping people find support if they need it, learn about the license, and read a whole introduction.</p> <pre><code class="yaml language-yaml">openapi: 3.1.0
info:
  title: Your Awesome API
  version: '1.0.3'
  description: &gt;
    More information, getting started, etc. *with Markdown!*
  contact:
    name: Who Owns the API
    url: https://www.example.org/support
    email: support@example.com
  license:
    name: Apache 2.0
    url: https://www.apache.org/licenses/LICENSE-2.0.html
</code></pre> <p>The <code>description</code> can be quite extensive, and is powered by <a href="https://commonmark.org/">CommonMark</a> (standardized Markdown), so when it's picked up by API documentation tools it acts like a little getting-started guide (if you don't have one of those elsewhere).</p> <h4 id="paths">paths</h4> <p>This is the most important section: it outlines all the endpoints (resources) of the API. It covers HTTP headers (a.k.a. HTTP fields) and parameters, and points to which authentication schemes are involved, if any.</p> <p>Using the Tic Tac Toe example from the OpenAPI Initiative, you can define multiple methods per path:</p> <pre><code class="yaml language-yaml">paths:
  /board:
    get:
      ...
    put:
      ...
</code></pre> <p>Each of these HTTP methods has an "Operation", which looks a bit like this:</p> <pre><code class="yaml language-yaml">paths:
  /board:
    get:
      summary: Get the whole board
      description: Retrieves the current state of the board and the winner.
      operationId: get-board
      responses:
        "200":
          description: "OK"
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/status"
</code></pre> <p>The <code>$ref</code> is a Reference to another component, which helps us reduce repetition in the OpenAPI document, and we'll explain that a bit in a moment.</p> <p><em>Find out more about Paths on <a href="https://learn.openapis.org/specification/paths.html">Learn OpenAPI: API Endpoints</a></em>.</p> <h4 id="webhooks">webhooks</h4> <p>A webhook is a way for two systems to communicate in real-time. Instead of an API client repeatedly making requests to the other for updates, the API client provides a URL to the API, and the API will send an HTTP request to that URL when a relevant event occurs.</p> <p>Describing webhooks is almost identical to describing paths, but instead of describing a request that comes from the API client, and a response made by the API provider, you flip that around: the API provider will send the API client a request, and the API client should respond with a response as described in the webhook.</p> <p>If a Tic Tac Toe game had a webhook, it might let another backend system know who won a game:</p> <pre><code class="yaml language-yaml">openapi: 3.1.0

webhooks:
  # Arbitrary name for the webhook
  newThing:
    post:
      requestBody:
        description: "A game was completed"
        content:
          application/json:
            schema:
              type: object
              properties:
                winner:
                  type: string
                  examples:
                    - https://example.org/api/users/Tim
                loser:
                  type: string
                  examples:
                    - https://example.org/api/users/Phil
                duration:
                  type: integer
                  examples:
                    - 30
      responses:
        '200':
          description: "OK"
</code></pre> <p>For a webhook, the requestBody now describes the HTTP body the API client can expect in webhook messages, and the responses section explains that the API client should return a 200 to mark it as a success.</p> <h4 id="components">components</h4> <p>Components allow you to define schemas, headers, parameters, requestBodies, responses, and other reusable bits of OpenAPI to be used in multiple places.</p> <pre><code class="yaml language-yaml">components:

  schemas:
    coordinate:
      type: integer
      minimum: 1
      maximum: 3

  parameters:
    rowParam:
      name: row
      in: path
      required: true
      schema:
        $ref: "#/components/schemas/coordinate"
    columnParam:
      name: column
      in: path
      required: true
      schema:
        $ref: "#/components/schemas/coordinate"
paths:
  /board/{row}/{column}:
    parameters:
      - $ref: "#/components/parameters/rowParam"
      - $ref: "#/components/parameters/columnParam"
</code></pre> <p>This keeps things nice and tidy, and you can even spread them across multiple documents to share components between multiple APIs, or at least just keep your file sizes down.</p> <p><em>Find out more about Components on <a href="https://learn.openapis.org/specification/components.html">Learn OpenAPI: Reusing Descriptions</a>.</em></p> <h4 id="schemaobjects">Schema Objects</h4> <p>Within each of these sections, the <code>schema</code> keyword is used to describe types, very similar to XML Schema or Protocol Buffers (as used by gRPC), but with a whole lot more options. The latest version of OpenAPI (v3.1.0) is specifically powered by JSON Schema Draft 2020-12.</p> <p>It's important to understand these JSON schemas to be able to interpret or write complex OpenAPI types, but fortunately they're mostly quite readable, and often self-explanatory even if you're not already familiar with them.</p> <p>A full example looks like this:</p> <pre><code class="yaml language-yaml">description: A representation of a movie
type: object
required:
- title
- director
- releaseDate
properties:
  title:
    type: string
  director:
    type: string
  releaseDate:
    type: string
    format: date
  genre:
    type: string
    enum:
    - Action
    - Comedy
    - Drama
    - Science Fiction
  duration:
    type: string
  cast:
    type: array
    items:
      type: string
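
# Given this schema, a movie missing "director" or using a genre outside the
# enum above would fail validation, while "genre", "duration" and "cast" can
# be omitted entirely - only the three "required" properties are mandatory.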
</code></pre> <p><em>Head over to the <a href="https://json-schema.org/understanding-json-schema">JSON Schema</a> website for an in-depth intro.</em></p> <h2 id="howcanyouuseopenapiasanapideveloper">How can you use OpenAPI as an API developer?</h2> <p>Creating this OpenAPI description might seem like extra work, but by being able to generate beautiful interactive documentation, reduce the redundancy in contract tests, automatically build SDKs, and even reduce how much code needs to be written for validation, overall you'll save a lot more time than you ever put in.</p> <p>The most common workflow is to create it manually with <a href="https://openapi.tools/#text-editors">text editors</a> or <a href="https://openapi.tools/#gui-editors">graphical editors</a>. Whilst this is a fair bit of up-front work, it can then be used to reduce repetitive tasks throughout the rest of the API Lifecycle, like <a href="https://openapi.tools/#testing">contract testing</a> and <a href="https://openapi.tools/#data-validators">server-side validation</a>. This is known as the <a href="https://apisyouwonthate.com/blog/api-design-first-vs-code-first">API Design-first workflow</a>.</p> <p>Whilst most OpenAPI is built up-front, there are plenty of options for other workflows. OpenAPI can also be generated from code via annotations or educated guesses.
This is known as the code-first workflow.</p> <p>Alternatively, an OpenAPI description can be <a href="https://apisyouwonthate.com/blog/turn-http-traffic-into-openapi-with-optic/">guesstimated from HTTP traffic</a>, which is not a great ongoing solution, but can be handy for "catching up" when you've built a whole API and need to generate OpenAPI for it, to match the rest of the company's nice OpenAPI-based documentation and testing.</p> <p>It can also be converted from other formats like Postman Collections or HAR, using something like <a href="https://www.apimatic.io/transformer/">API Transformer</a> or via <a href="https://openapi.tools/#converters">OpenAPI conversion tools</a>.</p> <p>Having this machine-readable API description sitting around in the source code means your API code and API description are always being updated with each pull request, and by powering your contract testing you know it's accurate.</p> <h2 id="howcanyouuseopenapiasanapiconsumer">How can you use OpenAPI as an API consumer?</h2> <p>OpenAPI is often discussed as if it's useful to API developers only, but it has plenty of benefits for API consumers too. You can use OpenAPI to more easily understand an API you'll be using and the responses it'll give you, to validate your requests and provide mocking during testing, to boost the power of your HTTP debugging tools, or generate your own libraries &amp; SDKs to more easily interact with servers.</p> <p>For example, if you are working with API developers using the API design-first workflow, they'll likely get you involved earlier on to work with mock servers. This might be a hosted mock server they control, or perhaps they'll give you the OpenAPI and let you run your own mock server, which could be as simple as running:</p> <pre><code class="bash language-bash">npm install -g @stoplight/prism-cli
prism mock openapi.yaml
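
# By default the mock listens on http://127.0.0.1:4010, so (borrowing the
# Widget API example from earlier) you could immediately try something like:
#   curl http://127.0.0.1:4010/widgets/3fa85f64-5717-4562-b3fc-2c963f66afa6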
</code></pre> <p>Now you'll have a pretend API running locally that you can try integrating with, and whilst it might not have any business logic it'll help you make sure your code is about right, and identify where data is missing for the sort of work you're trying to do, before they waste loads of your time building something that doesn't work for you.</p> <p>If the team doesn't provide an OpenAPI spec yet, as a quick fix you can even record your own just from <a href="https://apisyouwonthate.com/blog/turn-http-traffic-into-openapi-with-optic/">sniffing HTTP traffic</a>.</p> <h2 id="goingforwardswithopenapi">Going forwards with OpenAPI</h2> <p>I hope that's been an interesting introduction! If you'd like some real examples, take a look at these large companies' downloadable versions:</p> <ul> <li><a href="https://github.com/digitalocean/openapi">Digital Ocean</a></li> <li><a href="https://github.com/github/rest-api-description">GitHub</a></li> <li><a href="https://github.com/Azure/azure-rest-api-specs/">Microsoft Azure</a></li> <li><a href="https://github.com/paypal/paypal-rest-api-specifications#openapi-303">PayPal</a></li> <li><a href="https://github.com/stripe/openapi">Stripe</a></li> </ul> <p>OpenAPI can be complicated to get started with, but it's useful on both the front &amp; back-end, for testers, developers and others, so it's well worth understanding. Take a look at the above examples for some real-world cases, and feel free to <a href="https://httptoolkit.com/contact/">get in touch</a> if you have any questions.</p>]]></description>
            <link>https://httptoolkit.com/blog/introduction-to-openapi/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/introduction-to-openapi/</guid>
            <pubDate>Tue, 28 Nov 2023 09:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[6 ways to debug an exploding Docker container]]></title>
            <description><![CDATA[<p>Everything crashes.</p> <p>Sometimes things crash when they're running inside a Docker container though, and then all of a sudden it can get much more difficult to work out why, or what the hell to do next.</p> <p>Docker's great, but it's an extra layer of complexity that means you can't always easily poke your app up close any more, and that can really hinder debugging when your container fails to start or breaks in unusual ways.</p> <p>If you're stuck in that situation, here are my go-to debugging commands to help you get a bit more information on exactly what's up:</p> <ol> <li><p><code>docker logs &lt;container_id&gt;</code></p> <p>Hopefully you've already tried this, but if not: start here. This'll give you the full STDOUT and STDERR command-line output from the command that was run initially in your container. You can also use <code>docker attach &lt;container_id&gt;</code> to stream the live logs from an active container, if you want to keep an eye on the output as it runs.</p></li> <li><p><code>docker stats &lt;container_id&gt;</code></p> <p>If you just need to keep an eye on the metrics of your container to work out what's gone wrong, <code>docker stats</code> can help: it'll give you a live stream of resource usage, so you can see just how much memory you've leaked so far and easily spot if your CPU usage is way out of control.</p></li> <li><p><code>docker cp &lt;container_id&gt;:/path/to/useful/file /local-path</code></p> <p>Often just getting hold of more log files is enough to sort you out.
If you already know what you want, <code>docker cp</code> has your back: copy any file from any container back out onto your local machine, so you can examine it in depth (especially useful for analysing heap dumps).</p></li> <li><p><code>docker exec -it &lt;container_id&gt; /bin/bash</code></p> <p>Next up, if you can run the container (if it's crashed, you can restart it with <code>docker start &lt;container_id&gt;</code>) then you can use this command to open a command-line shell inside the container directly, and start digging around for further details by hand.</p></li> <li><p><code>docker commit &lt;container_id&gt; my-broken-container &amp;&amp; docker run -it my-broken-container /bin/bash</code></p> <p>Can't start your container at all? If your container starts and then immediately shuts down, then your initial command or entrypoint is immediately crashing. This can make your container extremely hard to debug, because you can't shell in any more or run any other commands inside the container.</p> <p>Fortunately, there's a workaround: you can save the state of the shut-down container as a new image (with <code>docker commit</code>) and then start that image with a different command (e.g. <code>/bin/bash</code>) to open a shell inside the container without the broken command running at all.</p> <p>Have a failing entrypoint instead?
There's an <a href="https://docs.docker.com/engine/reference/run/#entrypoint-default-command-to-execute-at-runtime">entrypoint override command-line flag</a> too.</p></li> <li><p>Inspect &amp; modify network traffic with <a href="https://httptoolkit.com/docker/">HTTP Toolkit for Docker</a>.</p> <p>HTTP Toolkit is an open-source tool to help with debugging Docker network traffic - if you think there might be any kind of HTTP traffic from your container that could help shed some light on what it's doing, you can relaunch your container with HTTP interception enabled in one click, and instantly capture all the HTTP, HTTPS and WebSocket messages that get sent before it crashes. You can even breakpoint &amp; rewrite traffic, to see if you can manually modify the responses to stop your container crashing.</p></li> </ol> <p>I hope that helps you out! Join the mailing list below if you're interested in more debugging &amp; HTTP posts, and do get in touch <a href="https://toot.cafe/@pimterry">on Mastodon</a>, <a href="https://twitter.com/pimterry">on Twitter</a> or <a href="https://httptoolkit.com/contact/">directly</a> if you have suggestions for more debugging tips that should be included here.</p> <p><em>Want to debug or test HTTP(S) from command line tools, backend servers, websites or even mobile apps? Download <a href="https://httptoolkit.com/">HTTP Toolkit</a> for free to see & modify all your traffic in one click.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/debug-failing-docker-container/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/debug-failing-docker-container/</guid>
            <pubDate>Tue, 31 Oct 2023 14:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[New ways to inject system CA certificates in Android 14]]></title>
            <description><![CDATA[<p>A couple of weeks ago I published a post about <a href="https://httptoolkit.com/blog/android-14-breaks-system-certificate-installation/">changes in Android 14</a> that fundamentally break existing approaches to installing system-level CA certificates, even with root access. This has triggered some fascinating discussion! I highly recommend a skim through the debate on <a href="https://toot.cafe/@pimterry/111012860794214522">Mastodon</a> and <a href="https://news.ycombinator.com/item?id=37391521">Hacker News</a>.</p> <p>Since that was posted, quite a few people have talked to me about possible solutions, going beyond the previous approaches with new mechanisms that make it practical to do this in Android 14+, and there are some good options here.</p> <p>While direct root access to change these certificates by simply writing to a directory is indeed no longer possible, root is root, and so with a bit of work there are still some practical &amp; effective ways to dig down into the internals of Android and seize control of these certs once more.</p> <h3 id="chooseyourownadventure">Choose your own adventure:</h3> <p>If you just want to intercept an Android 14+ device right now, stop reading this, <strong><a href="https://httptoolkit.com/android/">download the latest HTTP Toolkit</a></strong>, connect your device to ADB, click the 'Android Device via ADB' interception option for automatic setup, and dive into your traffic.</p> <p>If you just want to know the steps to manually do system certificate injection on Android 14 for yourself, <strong>jump down to <a href="#how-to-install-system-ca-certificates-in-android-14">How to install system CA certificates in Android 14</a></strong>.</p> <p>If you want the full background, so you can understand how &amp; why this all works, <strong>read on</strong>:</p> <h2 id="clearingupconfusion">Clearing up confusion</h2> <p>Before digging into this, I do want to explicitly clear up a few 
misunderstandings that I've seen repeatedly pop up from the previous article:</p> <ul> <li>These changes don't affect installation of CAs in other scenarios. As far as I'm aware, CA installation for fully managed enterprise-provisioned devices and the limited user-installed (as opposed to system-level) CA certificates will continue functioning as before. If you're not using root access to inject system-level CA certificates into a rooted device or emulator, you don't need to worry about this.</li> <li>Similarly, it is still possible to soft-remove system CA certificates using the existing toggle in the Settings UI.</li> <li>This does affect AOSP too, and so will presumably affect most alternative distributions unless they actively disable it.</li> <li>Yes, Android root access is still all-powerful - it's not literally true that Android has made modifying certificates <em>impossible</em>, they've just blocked doing so directly. In the extreme case, you can build your own Android images from scratch without these changes, and even without that, root access provides many mechanisms to change system behaviour. Nonetheless, moving from 'write to a directory' to 'technically possible via largely undocumented internals' is a meaningful problem, and any divergence from mainline Android behaviour means more problems later (not least, maintaining that divergence as Android keeps evolving). Removing users' ability to <em>directly</em> modify system configuration reduces practical user control over their devices.</li> <li>No, this is not likely to be an explicit manoeuvre on the part of Google. I think this is just thoughtless collateral damage (and yes, the primary goal of the actual change is a good thing!). Even unintentionally though, it's not great that a key workflow for Android privacy &amp; security research has been thoughtlessly broken, nor that major friction has been created for users' control of their own devices.
To my eyes, whilst Google aren't actively blocking those use cases, they're very much 'not supported' WONTFIX scenarios, and so impacts like this are sadly just not considered.</li> </ul> <h2 id="underthehood">Under the hood</h2> <p>The debate around all this has led me on a fascinating exploration, delving into the internals of Android. To recap the problem quickly:</p> <ul> <li>Until now, system-trusted CA certificates lived in <code>/system/etc/security/cacerts/</code>. On a standard AOSP emulator, those could be modified directly with root access with minimal setup, immediately taking effect everywhere.</li> <li>In Android 14, system-trusted CA certificates will generally live in <code>/apex/com.android.conscrypt/cacerts</code>, and all of <code>/apex</code> is immutable.</li> <li>That APEX cacerts path cannot be remounted as rewritable - remounts simply fail. In fact, even if you unmount the entire path from a root shell, apps can still read your certificates just fine.</li> <li>The alternative technique of mounting a tmpfs directory over the top also doesn't work - even though this means that <code>ls /apex/com.android.conscrypt/cacerts</code> might return nothing (or anything else you like), apps will still see the same original data.</li> </ul> <p>So, what's going on here?</p> <p>The key is that Android apps are containerized - much like Docker, Android uses <a href="https://man7.org/linux/man-pages/man7/namespaces.7.html">Linux namespaces</a> to isolate each app's view &amp; access to the wider system they're running on. There are a few elements to this, but from our point of view, the interesting point is that each app has its own <a href="https://man7.org/linux/man-pages/man7/mount_namespaces.7.html">mount namespace</a>, which means that they see different mounts to what's visible elsewhere.</p> <p>You can test this out for yourself, on a rooted Android device:</p> <ul> <li>Open a root shell (e.g.
<code>adb shell</code>, <code>su</code>)</li> <li>Run <code>mount</code> to see what's mounted (from the perspective of the shell process)</li> <li>Run <code>ps -A</code></li> <li>Find a process id for a normal running Android app from the list</li> <li>Run <code>nsenter --mount=/proc/$PID/ns/mnt /bin/sh</code> - This opens a shell inside that process's mount namespace</li> <li>Run <code>mount</code> again</li> <li>Note that the results are quite different!</li> </ul> <p>For example, inside an app's mount namespace, you'll see different mounts including <code>/dev/block/dm-{number}</code> resources mounted on <code>/data/misc/profiles/ref/{your.app.build.id}</code> and <code>tmpfs</code> mounts over <code>/data/user</code> and other directories.</p> <p>Normally, this doesn't matter. By default, mounts are created with 'SHARED' propagation, meaning that changes to mounts immediately within that path will be propagated between namespaces automatically (i.e. the mount namespaces are <em>not</em> fully isolated) so that everybody sees the same thing (on Linux, you can check the propagation of your mounts with <code>findmnt -o TARGET,PROPAGATION</code>).</p> <p>This is the case for <code>/</code> on Android, and most other mounts, so for example in a root shell you can do this:</p> <ul> <li><code>mkdir /data/test_directory</code> - Create an empty target directory</li> <li><code>mount -t tmpfs tmpfs /data/test_directory</code> - Mount a writable temporary in-memory filesystem there</li> <li>And then, for both your normal shell <em>and</em> in an app's mount namespace (via <code>nsenter</code> as above) <code>mount</code> will show this: <code>tmpfs on /data/test_directory type tmpfs (rw,seclabel,relatime) </code> I.e. both your ADB shell and all other processes on the device can see and use this mount, just as you'd expect</li> </ul> <p>Unfortunately though, that's not the case for APEX.
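<p>You can inspect this difference for yourself by reading <code>/proc/self/mountinfo</code> from a root shell: SHARED mounts carry a <code>shared:N</code> tag in their optional-fields column, while private mounts have no tag at all. A minimal sketch (field numbers per the <code>proc(5)</code> mountinfo format):</p> <pre><code class="bash language-bash"># Print each mount point (field 5) with its propagation tags (field 7).
# On an Android 14 root shell, / shows a 'shared:N' tag, while /apex
# shows only the '-' separator, i.e. no propagation (private):
awk '$5 == "/" || $5 == "/apex" { print $5, $7 }' /proc/self/mountinfo
</code></pre>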
The <code>/apex</code> mount is <a href="https://cs.android.com/android/platform/superproject/main/+/main:system/core/init/mount_namespace.cpp;l=97;drc=566c65239f1cf3fcb0d8745715e5ef1083d4bd3a">explicitly mounted</a> with PRIVATE propagation, so that all changes to mounts inside the <code>/apex</code> path are never shared between processes.</p> <p>That's done by the <code>init</code> process which starts the OS, which then launches the <a href="https://en.wikipedia.org/wiki/Booting_process_of_Android_devices#Zygote">Zygote process</a> (with a new mount namespace copied from the parent, so including its own private <code>/apex</code> mount), which then in turn starts each app process whenever an app is launched on the device (who each in turn then copy that same private <code>/apex</code> mount).</p> <p>This means from an ADB shell's mount namespace, given this private mount, it's impossible to directly make changes to APEX mounts that will be visible to apps on the device. Since all APEX mounts are read-only, that means you can't directly modify how any of that filesystem appears for your running apps - you can't remove, modify or add anything. Uh oh.</p> <p>So, for those of us trying to inject certificates, or indeed change any other APEX content, how do we solve this? 
In a perfect world, there are a few specific things we're looking for here, which the previous pre-14 Android solution did provide:</p> <ul> <li>Being able to add, modify and remove CA certificates from a device</li> <li>Not needing to inconveniently reboot the device or restart apps</li> <li>Having those changes become visible to all apps immediately</li> <li>Being able to do so quickly &amp; scriptably, for easy certificate management and tool integrations</li> </ul> <p>It turns out there are at least two routes that tick those boxes:</p> <h3 id="option1bindmountingthroughnsenter">Option 1: Bind-mounting through NSEnter</h3> <p>The key to this is the first caveat in the above paragraph: we can't solve this <em>from an ADB shell's mount namespace</em>. Fortunately, using <code>nsenter</code>, we can run code in other namespaces! I've included a full script below that you can run for yourself, but first let's talk about the steps that make this work:</p> <ul> <li>First, we need to set up a writable directory somewhere. For easy compatibility with the existing approach, I'm doing this with a <code>tmpfs</code> mount over the (still present) non-APEX system cert directory: <code>mount -t tmpfs tmpfs /system/etc/security/cacerts </code></li> <li>Then you place the CA certificates you're interested in into this directory (e.g.
you might want to copy all the defaults out of the existing <code>/apex/com.android.conscrypt/cacerts/</code> CA certificates directory) and set permissions &amp; SELinux labels appropriately.</li> <li>Then, use <code>nsenter</code> to enter the Zygote's mount namespace, and bind mount this directory over the APEX directory: <code>nsenter --mount=/proc/$ZYGOTE_PID/ns/mnt -- \ /bin/mount --bind /system/etc/security/cacerts /apex/com.android.conscrypt/cacerts </code> The Zygote process spawns each app, copying its mount namespace to do so, so this ensures all newly launched apps (everything started from now on) will use this.</li> <li>Then, use <code>nsenter</code> to enter each already running app's namespace, and do the same: <code>nsenter --mount=/proc/$APP_PID/ns/mnt -- \ /bin/mount --bind /system/etc/security/cacerts /apex/com.android.conscrypt/cacerts </code> Alternatively, if you don't mind the awkward UX, you should be able to do the bind mount on <code>init</code> itself (PID 1) and then run <code>stop &amp;&amp; start</code> to soft-reboot the OS, recreating all the namespaces and propagating your changes everywhere (but personally I do mind the awkward reboot, so I'm ignoring that route entirely).</li> </ul> <p>Bingo!
Every app now sees this mount as intended, with the contents of your own directory replacing the contents of the Conscrypt module's CA certificates.</p> <p>Actually doing this in practice takes a little more Bash scripting trickery to make it all run smoothly, like automatically running all those app remounts in parallel, managing permissions &amp; SELinux labels, and dealing with Zygote vs Zygote64 - see the full script below for a ready-to-go demo.</p> <h3 id="option2recursivelyremountingmountpoints">Option 2: Recursively remounting mountpoints</h3> <p>The second solution comes from <a href="https://infosec.exchange/@g1a55er/">infosec.exchange/@g1a55er</a>, who published <a href="https://www.g1a55er.net/Android-14-Still-Allows-Modification-of-System-Certificates">their own post</a> exploring the topic. I'd suggest you read through that for the full details, but in short:</p> <ul> <li>You can remount <code>/apex</code> manually, removing the PRIVATE propagation and making it writable (ironically, it seems that entirely removing private propagation <em>does</em> propagate everywhere)</li> <li>You copy out the entire contents of <code>/apex/com.android.conscrypt</code> elsewhere</li> <li>Then you unmount <code>/apex/com.android.conscrypt</code> entirely - removing the read-only mount that immutably provides this module</li> <li>Then you copy the contents back, so it lives in the <code>/apex</code> mount directly, where it can be modified (you need to do this quickly, as <a href="https://infosec.exchange/@g1a55er/111069489513139531">apparently</a> you can see crashes otherwise)</li> <li>This should take effect immediately, but they recommend killing <code>system_server</code> (restarting all apps) to get everything back into a consistent state</li> </ul> <p>As above - this is a neat trick, but it's not my work!
If you have questions on that do get in touch with <a href="https://infosec.exchange/@g1a55er/">@g1a55er</a> directly.</p> <p>Note that for both these solutions, this is a temporary injection - the certificates only last until the next reboot. To do this more permanently, you'll need to permanently modify the mount configuration somehow. I haven't investigated that myself (for testing &amp; debugging use cases, automated temporary system re-configuration is much cleaner) but if you find a good persistent technique do please <a href="https://toot.cafe/@pimterry">get in touch</a> and I'll share the details here for others.</p> <h2 id="howtoinstallsystemcacertificatesinandroid14">How to install system CA certificates in Android 14</h2> <p>So, putting that together, what do you need to do in practice, to actually inject your system-level CA certificate in Android 14?</p> <p>First, copy your CA certificate onto the device, e.g. with <code>adb push $YOUR_CERT_FILE /data/local/tmp/$CERT_HASH.0</code>. You'll need the certificate hash in the filename, just as you did in previous OS versions (see my implementation of this <a href="https://github.com/httptoolkit/httptoolkit-server/blob/54234266743bb76988efc80071f1f1da69b88b12/src/certificates.ts#L11-L25">here</a> if you're not sure).</p> <p>Then run the below, replacing <code>$CERTIFICATE_PATH</code> with the path on the device (e.g. <code>/data/local/tmp/$CERT_HASH.0</code>) for the cert you want to inject:</p> <pre><code class="bash language-bash"># Create a separate temp directory, to hold the current certificates
# Otherwise, when we add the mount we can't read the current certs anymore.
mkdir -p -m 700 /data/local/tmp/tmp-ca-copy

# Copy out the existing certificates
cp /apex/com.android.conscrypt/cacerts/* /data/local/tmp/tmp-ca-copy/

# Create the in-memory mount on top of the system certs folder
mount -t tmpfs tmpfs /system/etc/security/cacerts

# Copy the existing certs back into the tmpfs, so we keep trusting them
mv /data/local/tmp/tmp-ca-copy/* /system/etc/security/cacerts/

# Copy our new cert in, so we trust that too
mv $CERTIFICATE_PATH /system/etc/security/cacerts/

# Update the perms &amp; selinux context labels
chown root:root /system/etc/security/cacerts/*
chmod 644 /system/etc/security/cacerts/*
chcon u:object_r:system_file:s0 /system/etc/security/cacerts/*

# Deal with the APEX overrides, which need injecting into each namespace:

# First we get the Zygote process(es), which launch each app
ZYGOTE_PID=$(pidof zygote || true)
ZYGOTE64_PID=$(pidof zygote64 || true)
# N.b. some devices appear to have both!

# Apps inherit the Zygote's mounts at startup, so we inject here to ensure
# all newly started apps will see these certs straight away:
for Z_PID in "$ZYGOTE_PID" "$ZYGOTE64_PID"; do
    if [ -n "$Z_PID" ]; then
        nsenter --mount=/proc/$Z_PID/ns/mnt -- \
            /bin/mount --bind /system/etc/security/cacerts /apex/com.android.conscrypt/cacerts
    fi
done

# Then we inject the mount into all already running apps, so they
# too see these CA certs immediately:

# Get the PID of every process whose parent is one of the Zygotes:
APP_PIDS=$(
    echo "$ZYGOTE_PID $ZYGOTE64_PID" | \
    xargs -n1 ps -o 'PID' -P | \
    grep -v PID
)

# Inject into the mount namespace of each of those apps:
for PID in $APP_PIDS; do
    nsenter --mount=/proc/$PID/ns/mnt -- \
        /bin/mount --bind /system/etc/security/cacerts /apex/com.android.conscrypt/cacerts &amp;
done
wait # Launched in parallel - wait for completion here

echo "System certificate injected"
</code></pre> <p>The corresponding change to fully automate this in HTTP Toolkit is <a href="https://github.com/httptoolkit/httptoolkit-server/commit/965fd8d9b287af0e4b305d828d5e8e1aa52dce36">here</a>.</p> <p>In my testing, this works out of the box on all the rooted official Android 14 emulators, and every other test environment I've managed to get my hands on (if you have a case that doesn't work, please get in touch!). With a few tweaks (see the commit above) it's possible to build this into a single script that works across all modern Android devices, down to at least Android 7. When actually running this in practice, entering and remounting certificates within every running app on a device seems to take comfortably less than a second, so this fits nicely within the acceptable time for automated device setup in my use cases.</p> <p>This has all come together just in time, since at the time of writing Android 14 is on its (likely) final beta release with a full launch coming within weeks. HTTP Toolkit users will have automated setup for Android 14 ready and waiting before any of their devices even start to update.</p> <p>That's everything from me, and hopefully that resolves this for many Android versions to come. A big thanks to everybody who discussed this and shared suggestions, and especially <a href="https://mastodon.social/@tbodt">mastodon.social/@tbodt</a>, <a href="https://ioc.exchange/@tmw">ioc.exchange/@tmw</a> &amp; <a href="https://infosec.exchange/@g1a55er/">infosec.exchange/@g1a55er</a>, who popped into my Mastodon mentions with some really helpful background &amp; suggestions.</p> <p>Have thoughts, feedback or questions? Get in touch <a href="https://toot.cafe/@pimterry">on Mastodon</a>, <a href="https://twitter.com/pimterry">on Twitter</a> or <a href="https://httptoolkit.com/contact/">directly</a>.</p> <p><strong>Want to inspect, debug & mock HTTP(S) traffic on your Android devices and test this out?
<a href="https://httptoolkit.com/android/">Try out HTTP Toolkit</a> - hands-free HTTP(S) interception for mobile, web browsers, backend services, Docker, and more.</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/android-14-install-system-ca-certificate/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/android-14-install-system-ca-certificate/</guid>
            <pubDate>Thu, 21 Sep 2023 12:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Android 14 blocks modification of system certificates, even as root]]></title>
            <description><![CDATA[<p><em><strong>Update</strong>: This post sparked a lot of excellent discussion and debate on workarounds, and there are now multiple working solutions to allow certificate injection on Android 14, despite the restrictions discussed here. See <a href="https://httptoolkit.com/blog/android-14-install-system-ca-certificate">the update post</a> for more details.</em></p> <hr> <p>When Android was <a href="http://www.openhandsetalliance.com/press_110507.html">initially announced</a> in 2007 by the Open Handset Alliance (headed by Google) their flagship project was billed as an "open platform", "providing developers a new level of openness", and giving them "complete access to handset capabilities and tools".</p> <p>We've come a long way since then, steadily retreating from openness &amp; user control of devices, and shifting towards a far more locked-down vendor-controlled world.</p> <p>The next step of Android's evolution is Android 14 (API v34, codename Upside-Down Cake) and it takes more steps down that path.
In this new release, the restrictions around certificate authority (CA) certificates become significantly tighter, and appear to make it impossible to modify the set of trusted certificates at all, even on fully rooted devices.</p> <p>If you're an Android developer, tester, reverse engineer, or anybody else interested in directly controlling who your device trusts, this is going to create some new challenges.</p> <p>Before we get into the finer details, first I want to talk a little about the context around Android CA management and how we got here, but if you want to jump to the latest details you can go straight to the <a href="#enter-android-14">Enter Android 14</a> section below.</p> <h2 id="opensoftwareopendeviceopenecosystem">"Open Software, Open Device, Open Ecosystem"</h2> <p>While the initial principles of Android were very much focused on open software, controllable by users and developers, over more recent years Android has increasingly limited the control of users, developers &amp; researchers over their own devices.</p> <p>The key turning point in this process was Android 7 (Nougat, released in 2016) in which the certificate authorities (CAs) on the device that were previously fully modifiable by the owner of the phone were <a href="https://android-developers.googleblog.com/2016/07/changes-to-trusted-certificate.html">split in two</a>: one fixed list of CAs provided by the OS vendor and used by default by every app on your phone, and another set of user-modifiable CAs that users could control, but which was used only for apps that specifically opted in (i.e. almost none).</p> <p>The set of CAs on a device is a small configuration setting with big consequences, as your device's trusted CAs are the organizations guaranteeing the security of your encrypted network traffic. 
A CA can issue TLS certificates (as used in HTTPS to secure all communication on the web) for any domain name they like, and anybody who trusts that CA will trust those certificates as evidence of a secure &amp; legitimate connection to that domain.</p> <p>That also means though that if you create your own CA and trust it then you can intercept your own HTTPS or other TLS traffic, to see exactly what your phone is sending &amp; receiving, and potentially modify or block it. Being able to configure your device's CAs is key to this.</p> <p>That is a lot of power. Making this difficult to modify accidentally for non-technical users and impossible to modify without user knowledge is certainly reasonable. At the same time however, being able to control this is critical for privacy &amp; security research, reverse engineering, app debugging &amp; testing, for assorted enterprise internal network configurations, for anybody who doesn't trust one of the standard CAs provided by their vendor, and many other cases.</p> <p>With that one change in Android Nougat in 2016, each of those use cases became significantly more challenging. It became impossible for users on normal devices to control who was trusted to secure the communication of apps on their own devices, and a substantial hurdle was created that directly transferred power from users to vendors &amp; third-party app developers.</p> <h2 id="rootingaroundthenougatproblem">Rooting around the Nougat problem</h2> <p>Although this is very inconvenient, fortunately it's long been possible to root Android devices, allowing full administrative access over the device, and making it possible to work around these kinds of restrictions. 
This isn't officially encouraged by Google, but it's been sufficient as a workaround to allow researchers, developers &amp; reverse engineers to take control of their own devices for these advanced use cases.</p> <p>By doing so, it was possible to deal with Android Nougat's restrictions on rooted devices, manually adding trusted certificates to the system store via the filesystem, injecting them into the <code>/system/etc/security/cacerts/</code> directory.</p> <p>This is a bit harder than it sounds, because <code>/system</code> is generally read-only, even on rooted devices. There are two main ways to solve that:</p> <ul> <li>Make the <code>/system</code> directory writable (requires a little reconfiguration &amp; a device reboot) and then manually modify the real system certificates directory.</li> <li>Mount a temporary read-write filesystem over the top of the read-only directory, copy the existing CA certs into there, and then add your own additions on top too.</li> </ul> <p>In each case there are a few other steps required to ensure that the certificates have the appropriate naming, permissions, and SELinux labels to be accepted by the system (for more low-level details and discussion see <a href="https://httptoolkit.com/blog/intercepting-android-https/">this post</a>), but it's relatively simple and HTTP Toolkit has long automated the temporary mount-based process (see <a href="https://github.com/httptoolkit/httptoolkit-server/blob/aa453e9df98491c549aa4b97b90618f1cf808e17/src/interceptors/android/adb-commands.ts#L256-L308">the certificate injection script here</a>).
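<p>The naming requirement alone catches many people out: Android looks up system CAs by OpenSSL's <em>old-style</em> subject hash, so the file must be called <code>&lt;hash&gt;.0</code>. A quick sketch of computing that name (assuming <code>openssl</code> is available, and your CA certificate is in <code>ca.pem</code>):</p> <pre><code class="bash language-bash"># Android identifies system CA files by the pre-OpenSSL-1.0 subject hash,
# so use -subject_hash_old here, not the modern -subject_hash:
CERT_HASH=$(openssl x509 -in ca.pem -noout -subject_hash_old)
cp ca.pem "$CERT_HASH.0"  # this is the filename to inject into cacerts/
</code></pre>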
In practice, this means it's possible to provide one-click automated interception setup for any rooted Android device or emulator.</p> <p>These approaches have been effective not only on custom rooted devices and specialized Android distributions, but even in most of Google's own official emulator images (everything except the full 'Google Play' edition images, which are locked down to match a normal OEM device) not to mention other emulators from Genymotion to Bluestacks.</p> <p>Easy &amp; effective CA setup has powered myriad tools that let you see what apps on your phone are sending &amp; receiving: helping developers to debug their networking issues, keeping app developers honest about the data they share, and shining a light on security vulnerabilities in both apps &amp; their APIs.</p> <p>These techniques are used by HTTP Toolkit's automatic setup, but also referenced in the setup guides for similar tools like <a href="https://docs.mitmproxy.org/stable/howto-install-system-trusted-ca-android/#3-insert-certificate-into-system-certificate-store">mitmproxy</a>, in <a href="https://awakened1712.github.io/hacking/hacking-install-ca-android/">endless</a> <a href="https://binary-manu.github.io/binary-is-better/android/add-certificates-to-android-ca-store">blog</a> <a href="https://blog.ropnop.com/configuring-burp-suite-with-android-nougat#install-burp-ca-as-a-system-level-trusted-ca">posts</a>, <a href="https://stackoverflow.com/a/46569793/68051">StackOverflow answers</a> and <a href="https://forum.xda-developers.com/t/tutorial-how-to-install-custom-ssl-certificates-root-etc-on-bluestacks-4-5.4513773/">forum threads</a>, widely used in tools including <a href="https://github.com/NVISOsecurity/MagiskTrustUserCerts">popular Magisk packages</a> and by organizations like the <a href="http://wiki.cacert.org/FAQ/ImportRootCert#CAcert_system_trusted_certificates_.28without_lockscreen.29">community-run CA cacert.org</a>.</p> <p>These are widespread techniques that have 
worked for many years. Although the required root access has become a little more challenging recently (due to first SafetyNet and later 'Play Integrity' using attestation to allow apps to block users who use rooted devices) this solution has generally been quite manageable, and a just-about-acceptable balance between "inconvenient enough to dissuade users unaware of the implications" and "accessible to power users who know what they're doing".</p> <h2 id="enterandroid14">Enter Android 14</h2> <p>Android 14 is currently in the final stages of beta testing, slated for release within a couple of weeks.</p> <p>One of its headline new security features is <a href="https://www.xda-developers.com/android-14-root-certificates-updatable/">remotely updatable CA certificates</a>, which extracts management of CA certificates from the core OS image entirely, and moves it into a separately updateable component, delivered &amp; updated via Google Play. This allows much faster CA updates, letting Google revoke trust in problematic or failing CAs on all Android 14+ devices with just a Google Play System Update, without waiting for each phone vendor to release an OTA update for the whole operating system.</p> <p>Although I'm sure you can see what's coming, let me caveat first: at a very high level, the goal here is a Good Thing.</p> <p>CAs trusted by default like this are in a powerful position, and there needs to be serious oversight &amp; consequences to ensure they stick to their responsibilities and continue to justify that trust. 
When they fail to do so, it's important that this power is taken away quickly, before it can be abused.</p> <p>In the most notable recent case, in January 2023 TrustCor was untrusted as a CA by effectively everybody (<a href="https://security.googleblog.com/2023/01/sustaining-digital-certificate-security_13.html">including Google</a>), after close ties to a malware-distributing organization and associated US defence/intelligence contractor were <a href="https://www.theregister.com/2022/12/02/mozilla_microsoft_trustcor/">discovered</a>.</p> <p>In the other direction, the inability to widely distribute &amp; trust new CA certificates has caused issues for new CAs on the block such as Let's Encrypt, who have had to <a href="https://letsencrypt.org/2023/07/10/cross-sign-expiration.html">repeatedly delay the rollout</a> of improvements to their signing chain, because old Android devices missing recent CA root certificates would not have trusted them, and would thereby have been locked out of significant parts of the web.</p> <p>Mechanisms to improve the responsiveness of this system are valuable. In addition to just speeding up removals &amp; additions, this should also widen access to those updates, since even devices for which vendors no longer offer official OS updates can continue to receive system component updates like this via Google Play for significantly longer.</p> <p>Unfortunately though, despite those sensible goals, the reality of the implementation has serious consequences: system CA certificates are no longer loaded from <code>/system</code>, and when using root access to either directly modify or mount over the new location on disk, all changes are ignored by all apps on the device. Uh oh.</p> <h2 id="themechanics">The mechanics</h2> <p>The key change to enable this is <a href="https://android.googlesource.com/platform/frameworks/base/+/8b192b19f264a8829eac2cfaf0b73f6fc188d933">here</a>. 
Instead of reading from the venerable <code>/system/etc/security/cacerts</code> directory, this new approach reads certificates from <code>/apex/com.android.conscrypt/cacerts</code>, when it exists.</p> <p>That root <code>/apex</code> path is where Android Pony EXpress (APEX) containers are mounted. These APEX modules are independently updatable system components, delivered as signed &amp; immutable containers. In this case, the certificates form part of Android's <code>com.android.conscrypt</code> module - its core TLS/SSL library, now shipped as one of these modules.</p> <p>The exact mechanisms behind APEX are challenging to fully understand, as many low-level details seem undocumented, and <a href="https://android.googlesource.com/platform/system/apex/+/master/docs/howto.md">what documentation there is</a> includes links to key details only available within Google's internal sites. Testing the resulting behaviour though, it seems that this is using some kind of containerization primitive to expose the mounted content directly to individual processes, resulting in surprising behaviour when trying to modify these files elsewhere. As a result, delivering content through an APEX module makes it much harder (seemingly impossible) to manually modify, even with full administrative control.</p> <p>It's easy to test this for yourself, using the latest Android 14 beta official emulators. Both the Android Open Source Project (AOSP) and 'Play Services' images have always allowed root access (unlike the 'Google Play' images) and by creating an emulator using those you can easily open a root shell.</p> <p>Follow either of the two existing techniques though, and the expected updates do nothing. Let's walk through a demo.</p> <p>To start, set up your device. 
You'll need the Android SDK installed, and you probably want <a href="https://developer.android.com/studio">Android Studio</a>, since it makes this much easier, although you can use the CLI directly if you like.</p> <p>First, create an emulator:</p> <ul> <li>Through the Android Studio UI, select any device model, and the API 34 'Google APIs' image for your architecture.</li> <li>Or, using the Android SDK tools on the CLI, run <code>avdmanager create avd -n TestAVD -k 'system-images;android-34;google_apis;x86_64'</code> (your architecture may vary)</li> </ul> <p>Let's try messing with temporary mounts and see what we can do:</p> <ul> <li>Start your emulator, via the UI or <code>emulator -avd TestAVD</code></li> <li>Open a root shell via <code>adb shell</code>, then <code>su</code></li> <li>Try mounting an empty temporary filesystem over the various <code>cacerts</code> directories now present:</li> </ul> <pre><code class="bash language-bash">  mount -t tmpfs tmpfs /system/etc/security/cacerts
  mount -t tmpfs tmpfs /system/etc/security/cacerts_google
  mount -t tmpfs tmpfs /apex/com.android.conscrypt/cacerts
  mount -t tmpfs tmpfs /apex/com.android.conscrypt@340818022/cacerts
  # N.b. that last @id may vary in future updates
</code></pre> <ul> <li><p>In the emulator, open Settings -&gt; Security &amp; Privacy -&gt; More -&gt; Encryption -&gt; Trusted Credentials</p> <p>Under the 'System' tab, all the certificates you've just hidden from view on disk are there!</p></li> </ul> <p>So, where's this coming from?</p> <ul> <li>We can try searching the entire filesystem to find the source of this data. For example, the top certificate shown is from 'ACCV'. You can also find that in Android's sources <a href="https://android.googlesource.com/platform/system/ca-certificates/+/refs/heads/main/files/3c9a4d3b.0">here</a>, and it's present both there and in unmodified Android CA cert folders as <code>3c9a4d3b.0</code>. We can search for this with:</li> </ul> <pre><code class="bash language-bash">  find / -name 3c9a4d3b.0 2&gt;/dev/null
</code></pre> <ul> <li>It doesn't exist!</li> </ul> <p>But in fact, it does: remove your 4 mounts again (<code>umount &lt;path&gt;</code>), retry that <code>find</code> command, and you'll see that this is indeed present in the original <code>cacerts</code> directories.</p> <p>Try the exact same steps with an Android 13 image, and you'll find that this modifies the certs just fine, and the Settings certificates list appears entirely empty, as expected:</p> <p><img src="https://httptoolkit.com/images/posts/android-empty-system-certs.png" alt="An Android device showing an empty 'Trusted Credentials' list"></p> <p>Let's try another route and see if we can rewrite the system image filesystem directly to modify this list:</p> <ul> <li>Stop your emulator, and restart it from the CLI with a writable system partition:</li> </ul> <pre><code class="bash language-bash">  emulator -avd TestAVD -writable-system
</code></pre> <ul> <li>Make everything writable (note that <code>avbctl disable-verification</code> below turns off Android Verified Boot's verification, which would otherwise detect &amp; reject modifications to the system partition):</li> </ul> <pre><code>  adb root
  adb remount
  adb shell avbctl disable-verification
  adb reboot
  adb root
  adb remount
</code></pre> <ul> <li><p>You can delete all the normal certificates now (fair warning: this may create problems when using this emulator in future!) with:</p> <pre><code class="bash language-bash">  rm -rf /system/etc/security/cacerts/*
  rm -rf /system/etc/security/cacerts_google/*
</code></pre></li> <li>You'll find that you can't delete the certs from <code>/apex</code> though! Despite the remount, it's read-only, and <code>mount -o remount,rw ...</code> commands to do so manually will all fail.</li> <li>The closest you can get, so far as I can tell, is to unmount the certificates entirely with <code>umount &lt;path&gt;</code> so that they don't appear in the output of <code>mount</code> at all.</li> <li>Doesn't matter though: no matter how aggressive you get, seemingly no matter how much of the emulator's relevant internals you delete, AFAICT there's nothing you can do to stop these certs all happily loading up in the 'Trusted' list in the Settings.</li> </ul> <p>As with the other method, those same steps will work just fine on every other version of Android, up until now.</p> <p>Note that this isn't just a quirk of the Settings app, with the certificates cached or stored somewhere else. The certificates as shown here are reloaded each time, and they're representative of every app's view of the certificate store.</p> <p>No matter what you modify on the filesystem, every app will continue to see Google's list of CA certificates regardless. I've been playing with this for a while, and as far as I can tell there's no working method to modify the certs anybody sees.</p> <h2 id="whatisgoingonhere">What is going on here?</h2> <p>It's hard to tell precisely, so I'm guessing &amp; inferring here (but if anybody does have more information, I'd love to hear it! 
Get in touch <a href="https://toot.cafe/@pimterry">on Mastodon</a>, <a href="https://twitter.com/pimterry">on Twitter</a> or <a href="https://httptoolkit.com/contact/">directly</a>).</p> <p>I think the most likely case is that as part of the wider modularization of Android, these system files and components are now exposed to apps through an entirely different mechanism. It looks clear from the Android internals that they're still being <a href="https://android.googlesource.com/platform/frameworks/base/+/8b192b19f264a8829eac2cfaf0b73f6fc188d933/core/java/android/security/net/config/DirectoryCertificateSource.java#64">read from disk</a> and the code to do so hasn't changed in many years, so this implies not everybody is seeing the same filesystem. This is similar to how Docker and friends use chroot, overlay filesystems and mounts to run containers with an isolated view of system files and other devices.</p> <p>Clearly, this has some serious consequences.</p> <p>As touched on above: if you're configuring your own system CA certificates on Android right now for debugging, reverse engineering, testing or research, that option is going away in Android 14, and presumably all future versions too.</p> <p>For now anybody interested in these use cases will just have to avoid updating, or use custom OS releases that don't use the APEX module to manage their CA certs. As time goes by though, this will likely become increasingly impractical, since it means either diverging strongly from Android mainline on a key internal component or running outdated software indefinitely.</p> <p>Concerningly though, this also implies that APEX system modules are going to be a wider problem. 
If, as it appears, content within APEX modules is unmodifiable to users even with root access to the device, then every future system component that's moved into the remit of APEX is another part of Android that becomes completely removed from user control.</p> <p>More investigation is required and it's hard to know the full implications of that now, but for the many forks of Android like GrapheneOS &amp; LineageOS, and for advanced device configuration tools like <a href="https://github.com/topjohnwu/Magisk#readme">Magisk</a> and its many modules, it probably spells trouble.</p> <p>Personally, for now I'm investigating some promising alternative solutions to allow interception of your own network traffic on Android, and I'll share details here as soon as I have something working, so watch this space.</p> <p>In the meantime, if you want to debug your own HTTPS traffic, you'll need to stick to Android 13.</p> <p><strong>Want to inspect, debug &amp; mock Android traffic on your Android 13 device anyway? <a href="https://httptoolkit.com/android/">Try out HTTP Toolkit</a> - hands-free HTTP(S) interception for mobile, web browsers, backend services, Docker, and more.</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/android-14-breaks-system-certificate-installation/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/android-14-breaks-system-certificate-installation/</guid>
            <pubDate>Tue, 05 Sep 2023 14:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Apple already shipped attestation on the web, and we barely noticed]]></title>
            <description><![CDATA[<p>There's been a lot of concern recently about the <a href="https://github.com/RupertBenWiser/Web-Environment-Integrity">Web Environment Integrity</a> proposal, developed by a selection of authors from Google, and apparently being prototyped in Chromium.</p> <p>There's good reason for anger here (though I'm not sure yelling at people on GitHub is necessarily the best outlet). This proposal amounts to attestation on the web, limiting access to features or entire sites based on whether the client is approved by a trusted issuer. In practice, that will mean Apple, Microsoft &amp; Google.</p> <p>Of course, Google isn't the first to think of this, but in fact they're not even the first to ship it. Apple already developed &amp; deployed an extremely similar system last year, now integrated into MacOS 13, iOS 16 &amp; Safari, called "<a href="https://developer.apple.com/news/?id=huqjyh7k">Private Access Tokens</a>":</p> <blockquote> <p>Private Access Tokens are powerful tools that prove when HTTP requests are coming from legitimate devices without disclosing someone's identity.</p> </blockquote> <p>The focus here is primarily on removing captchas, and as such it's been integrated into Cloudflare (discussed <a href="https://blog.cloudflare.com/eliminating-captchas-on-iphones-and-macs-using-new-standard/">here</a>) and Fastly (<a href="https://www.fastly.com/blog/private-access-tokens-stepping-into-the-privacy-respecting-captcha-less">here</a>) as a mechanism for recognizing 'real' clients without needing other captcha mechanisms.</p> <p>Fundamentally though, it's exactly the same concept: a way that web servers can demand your device prove it is a sufficiently 'legitimate' device before browsing the web.</p> <h2 id="howdoprivateaccesstokenswork">How do Private Access Tokens work?</h2> <p>The mechanism is a fairly simple exchange over HTTP, handled by built-in browser APIs, which in turn integrate with operating system components to 
confirm that the browser &amp; OS are 'legitimate' (the exact definition of that is left to the attester - i.e. Apple).</p> <p>The flow looks like this:</p> <ol> <li><p>A browser makes an HTTP request to a web server.</p></li> <li><p>The web server refuses the request, and returns an HTTP 401 response with a <code>PrivateToken</code> challenge:</p> <pre><code>HTTP/1.1 401 Unauthorized
WWW-Authenticate:
    PrivateToken challenge=&lt;base64 challenge data&gt;,
        token-key=&lt;base64 public-key&gt;
</code></pre> <p><em>(Newlines added for readability)</em></p></li> <li><p>The browser recognizes this, and sends parts of the challenge, in addition to verified details of your device provided by the OS, to an attester (e.g. Apple).</p></li> <li><p>The attester verifies your device is real &amp; unmodified (depends on the device, but both Android &amp; iOS have existing ways to check this) and works with a token issuer (somebody trusted by the origin &amp; that trusts the attester, e.g. Cloudflare/Fastly) to return a signed token, proving that your device has been verified as legitimate.</p></li> <li><p>The browser resends the request, with the signed token in its <code>Authorization</code> header:</p> <pre><code>GET /protected-content HTTP/1.1
Host: example.com
Authorization: PrivateToken token=&lt;signed token&gt;
</code></pre></li> <li><p>The server now knows that the client has been verified by a trusted provider (but nothing more) and can treat the client differently on that basis.</p></li> </ol> <p>This all happens on Apple devices today when using Safari, any time a service using this (such as Fastly &amp; Cloudflare) is concerned about the legitimacy of your requests.</p> <p>The privacy protections in here appear fairly strong (I'm not an expert on this, but that's a very clear goal of the proposal and the separation of the origin/issuer/attester flow) but the core issue from Web Environment Integrity remains: your treatment on the web depends on whether Apple says your device, OS &amp; browser configuration are legitimate &amp; acceptable.</p> <h2 id="howbadisthis">How bad is this?</h2> <p>This feature is largely bad for the web and the industry generally, like all attestation (see below).</p> <p>That said, it's not as dangerous as the Google proposal, simply because Safari isn't the dominant browser. Right now, Safari has around 20% market share in browsers (25% on mobile, and 15% on desktop), while Chrome is comfortably above 60% everywhere, with Chromium more generally (Brave, Edge, Opera, Samsung Internet, etc) about 10% above that.</p> <p>With Safari providing this, it can be used by some providers, but nobody can block or behave differently with unattested clients. Similarly, Safari can't usefully use this to tighten the screws on users - while they could refuse to attest old OS versions or browsers, it wouldn't make a significant impact on users (they might see statistically more CAPTCHAs, but little else).</p> <p>Chrome's usage is a larger concern. With 70+% of web clients using Chromium, this would become a major part of the web very quickly. 
With both Web Environment Integrity &amp; Private Access Tokens, 90% of web clients would potentially be attested, and the "oh, you're not attested, let's treat you suspiciously" pressure could ramp up quickly.</p> <h2 id="whyisattestationbadgenerally">Why is attestation bad generally?</h2> <p>This has been <a href="https://arstechnica.com/gadgets/2023/07/googles-web-integrity-api-sounds-like-drm-for-the-web/">covered</a> <a href="https://gabrielsieben.tech/2022/07/29/remote-assertion-is-coming-back-how-much-freedom-will-it-take/">extensively</a> elsewhere, so I won't dig into it too deeply, but as a quick summary:</p> <ul> <li><p>Attestation means you can only use approved clients, which is terrible for competition and innovation (sorry, it's now impossible to make a new browser or OS!) particularly for open-source, community &amp; smaller indie efforts that are consistently excluded from these mechanisms. If attestation was around in the days of IE6, it would've been a major roadblock in the rise of Firefox &amp; Chrome.</p></li> <li><p>Attestation blocks users' control of their own devices, by design. A key goal is that users using modified software should not be attested as legitimate. I would like to be able to browse the web on my rooted Android phone please. There's no way any fully user-modifiable OS or hardware can ever be attested in the way these proposals intend.</p></li> <li><p>Attestation opens dangerous doors that allow the approved providers to freely tighten the rules later. "Sorry, we only attest devices during their 2 year official support window", and "Sorry, we don't attest browsers with ad-block extensions installed" are both perfectly plausible and thoroughly problematic next steps.</p></li> </ul> <p>Proponents would argue that none of this applies to the Web Environment Integrity proposal, as it suggests 'holdbacks', i.e. refusing attestation a small percentage of times to make blocking based on attestation impossible. 
I suspect business pressures will make that unworkable in practice (there's already <a href="https://github.com/RupertBenWiser/Web-Environment-Integrity/issues/5">strong industry pushback</a> from a closely involved F5 / Shape Security representative) and it's notable that Google's existing Android Play Integrity attestation does <em>not</em> do this.</p> <p>Free usage of different clients &amp; servers on the web is what has built the open web, and is a large part of why it's so successful. Attestation breaks this by design.</p> <p>All of this has already played out on Android, where you technically <em>can</em> create and run your own OS, and Android distributions like <a href="https://lineageos.org/">LineageOS</a> are perfectly functional, but attestation features mean this comes with the constant risk of key apps (like your bank!) blocking you as suspicious.</p> <p>Fundamentally, attestation is anti-competitive. Blocking competition between different Android versions is already problematic. Blocking competition for both browsers &amp; OSs on the web would be catastrophic.</p> <h2 id="shit">Shit</h2> <p>Quite.</p> <p>If you're concerned about all this, you might also be interested in the <a href="https://github.com/antifraudcg/proposals/issues">other proposals</a> from the Anti-Fraud Community Group, discussing various other potential web 'features' along the same lines.</p> <p>Fraud &amp; bots on the web are a real problem, and discussion on ways to defend against that is totally reasonable, and often very valuable! It's a hard problem. That said, this has to be carefully balanced against the health of the web itself. Blocking competition, hamstringing open-source and the open web, and removing all user control over their own devices is not a reasonable tradeoff.</p> <p>Have thoughts or comments? 
Get in touch <a href="https://twitter.com/pimterry">on Twitter</a>, <a href="https://toot.cafe/@pimterry">on Mastodon</a> or <a href="https://httptoolkit.com/contact/">directly</a>.</p> <p><strong>Want to understand how this all works up close? Try intercepting, exploring &amp; modifying your web traffic with <a href="https://httptoolkit.com">HTTP Toolkit</a> now</strong>.</p>]]></description>
            <link>https://httptoolkit.com/blog/apple-private-access-tokens-attestation/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/apple-private-access-tokens-attestation/</guid>
            <pubDate>Tue, 25 Jul 2023 14:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Leaking secrets through caching with Bunny CDN]]></title>
            <description><![CDATA[<p>Caching is hard.</p> <p>Unfortunately though, caching is quite important. Hosted caching &amp; CDNs offer incredible powers that can provide amazing performance boosts, cost savings &amp; downtime protection, essential for most modern sites with any serious volume of users.</p> <p>However, while there are <a href="https://datatracker.ietf.org/doc/rfc7234/">strict standards</a> for how caching is supposed to work with HTTP on the web, many cache providers do not quite follow these, instead giving their customers free rein over all kinds of invalid caching behaviour, and providing their own default configurations that often don't closely follow these standards to start with either.</p> <p>There are many good reasons for this, but the main one is that CDNs are now doing double duty: providing performance improvements, and actively protecting upstream sites from DoS attacks and traffic spikes (similar problems - the key difference between a DoS attack and hitting #1 on Hacker News etc is intent, not impact). This conflicts with many of the standards, which prioritize correctness and predictability over this use case and, for example, expect clients to be able to unilaterally request that the cache be ignored.</p> <p><a href="https://bunny.net">Bunny.net</a> provides one of these CDNs, and like most they aggressively cache content beyond the limits of the standards, both to help protect upstream servers and to support advanced user use cases.</p> <p>This has upsides, but in some edge cases can result in awkward bugs and break developer expectations.</p> <p>In some more dramatic cases though, it can expose private user data, break applications &amp; even leak auth credentials, and that's where this story gets serious. 
A few months ago, I ran into exactly that issue while testing out deployment options with Bunny.net, where I discovered that private HTTP responses intended for one authenticated user could be served to other users instead.</p> <p>Spoiler: this is now fixed! That said, it's worth exploring where this went wrong, the many ways this can work right, and how CDNs solve issues like this in practice.</p> <h2 id="cachingvshttpauthorization">Caching vs HTTP Authorization</h2> <p>HTTP API authentication is typically implemented with an <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization">Authorization header</a> sent in every request that looks something like <code>Authorization: Bearer ABCDEF...</code> where ABCDEF… is an authentication token linked to a specific account that you're provided elsewhere.</p> <p>There are many good reasons why this pattern is so widespread despite the many possible ways to authenticate an API request over HTTP. One of the best is that middleboxes and other standard tools on the web know what an authorization header means and will handle it sensitively, for example not storing the header in log files, masking it in error reports, and definitely NEVER CACHING RESPONSES FROM AUTHENTICATED REQUESTS AND SERVING THEM TO OTHER USERS.</p> <p>Ahem.</p> <p>The <a href="https://www.rfc-editor.org/rfc/rfc7234.html#section-3.2">HTTP caching standard</a> makes this last point abundantly clear:</p> <blockquote> <p>A shared cache MUST NOT use a cached response to a request with an Authorization header field to satisfy any subsequent request unless a cache directive that allows such responses to be stored is present in the response.</p> </blockquote> <p>If you ignore that, as Bunny.net did, this happens:</p> <p><img src="https://httptoolkit.com/images/posts/cache-auth-leak.png" alt="Two clients requesting authenticated data via a cache, as described below"></p> <p>In this case, user A sends an authenticated request through the 
CDN's cache. Once this has passed through the CDN, the cache then stores it, using a cache key that doesn't include the authorization header (for simplicity, let's assume it just includes the URL).</p> <p>Later user B sends another request. This could contain a different Authorization header, or no auth header at all. Regardless, user B isn't allowed to see user A's private content, but here they receive it anyway! The CDN finds a matching response in its cache, ignores the mismatched Authorization headers, and serves the private data straight back up to the wrong person.</p> <p>This is what Bunny's CDN was doing, until recently. This is very bad indeed.</p> <p>The response intended for user A sent instead to user B could contain all sorts of things user B should not see, and any of them could now be exposed. Private user data is the main problem (imagine the response to a <code>/private-messages</code> endpoint) but plausibly more problematic things could leak too, like working API keys for the wrong account, if a <code>/list-api-keys</code> or <code>/generate-api-key</code> response is cached (well-behaved APIs should do this with POST requests - not generally cached - and not expose a list of active keys, but much of the internet is not well-behaved). If any such auth credentials are exposed, <em>everything</em> owned by user A is up for grabs.</p> <p>Worse, this could be invisible. If CDNs are used only for media and heavier content, then APIs like <code>/private-messages</code> might not go through the cache (hiding the issue) while requests to APIs like <code>/private-messages/attachments/123</code> would. Assuming that URL should return the same content for all users authorized to access the attachment, no normal user would ever see a problem, while in reality <em>any</em> user could request that private content and see it, allowing attackers to retrieve all private content from the CDN by just guessing or crawling through resource ids. 
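</p> <p>To make that failure mode concrete, here's a toy simulation of a shared cache whose key ignores the <code>Authorization</code> header - a bash sketch with made-up URLs &amp; tokens, not Bunny's actual logic:</p> <pre><code class="bash language-bash">  # A toy shared cache, keyed on the URL alone - the buggy behaviour.
  # (Requires bash 4+ for associative arrays.)
  declare -A cache

  handle_request() {
    local url="$1" auth="$2" key="$url" # BUG: key ignores $auth
    if [ -n "${cache[$key]}" ]; then
      echo "cache HIT: ${cache[$key]}"
    else
      local response="private data for $auth"
      cache[$key]="$response"
      echo "cache MISS: $response"
    fi
  }

  handle_request "/private-messages" "Bearer TOKEN_A" # cache MISS: private data for Bearer TOKEN_A
  handle_request "/private-messages" "Bearer TOKEN_B" # cache HIT: private data for Bearer TOKEN_A
</code></pre> <p>User B's request hits the cached entry and receives user A's private response. Including the Authorization header in the cache key, or refusing to cache such responses at all, are the fixes discussed below.</p> <p>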
Ouch.</p> <p>(How much are CDNs used in front of APIs like this, rather than just for public static content? It's hard to say in general, but Cloudflare's stats say <a href="https://blog.cloudflare.com/landscape-of-api-traffic/">more than 50% of their handled requests in 2022 were API requests</a>.)</p> <h2 id="pluggingtheleak">Plugging the leak</h2> <p>For cache providers, there are a few ways to properly handle this issue:</p> <h3 id="stripauthorizationheadersfromallincomingrequestsbydefault">Strip Authorization headers from all incoming requests by default</h3> <ul> <li>This effectively returns unauthenticated responses for authenticated requests, which isn't great, but does solve the immediate security problem.</li> <li>This handles DoS and spike protection issues effectively (or at least, fully empowers the CDN to deal with them). Everything that can possibly come from the cache does so, and you can't skip it by just setting a random auth header.</li> <li>This is a reasonable response if your expectation is that the cache is purely for public static data, and nobody should be using it for authenticated requests in the first place.</li> <li>It can still allow users to configure custom behaviour on top of this, for cases where you do want authenticated data - you just force them to decide explicitly how to handle it.</li> </ul> <h3 id="treattheauthorizationheaderaspartofthecachekeycachingresponsesperuser">Treat the Authorization header as part of the cache key, caching responses per user</h3> <ul> <li>This ensures you never serve content to the wrong user while still providing great performance.</li> <li>However, it can explode cache sizes, and still allows DoS attacks (just add any old auth header and you skip the cache).</li> <li>It also implies storing very sensitive data (authorization tokens) directly in your cache, which is generally not great for security.</li> </ul> <h3 id="nevercacheresponsesforrequestswithanauthorizationheader">Never cache responses for 
requests with an Authorization header</h3> <ul> <li><strong>This is the behaviour defined in the standard</strong>. These should never be cached unless they have explicit response headers marking them as cacheable.</li> <li>This neatly solves the issue, without exploding cache sizes or creating new security concerns.</li> <li>Anonymous requests can still be cached and returned to other anonymous users, but all authenticated traffic effectively skips the cache entirely.</li> <li>This does still leave you somewhat exposed to DoS attacks though, as anybody can add an auth header (valid or not) to skip the cache entirely, even when requesting public content.</li> </ul> <h3 id="nevercacheresponsesforrequestswithanauthorizationheaderbutdouseexistingcachedresponsesifavailable">Never cache responses for requests with an Authorization header, but do use existing cached responses if available</h3> <ul> <li>If the Authorization header skips response caching but not request cache lookups (and isn't in the cache key) then successful anonymous responses may be served to authenticated requests when they're already in the cache.</li> <li>Note that's only for <em>successful</em> responses - if anonymous requests get a 401, that should never be cached (as far as I'm aware, all HTTP caches handle this correctly by default).</li> <li>This seriously limits the DoS exposure, as it ensures that endpoints serving any successful anonymous requests are always served from the cache.</li> <li>This is a bit weird &amp; can cause unpredictable issues: your authenticated request might unexpectedly start receiving an anonymous-user response because of requests elsewhere. In most typical API designs though this won't happen (most endpoints are either authenticated or not) but weirdness is definitely possible.</li> </ul> <p>Bear in mind that all of this is defining the <em>defaults</em> for authenticated requests. 
Of course individual responses can still define precisely how cacheable they are via <code>Cache-Control</code> and <code>Vary</code> response headers, and APIs with clear ideas of how fresh/cached responses can be should use that to manage CDN &amp; client caching directly.</p> <p>Personally, my strong preference would be to purely follow the standard, but that doesn't fit with the design of many CDN services, Bunny included, and so Bunny have implemented option #4. I strongly suspect that's for DoS protection reasons - my initial suggestion of following the spec approach was met with "Unfortunately, if we solve it the other way, by having it bypass the cache automatically, you open up a potentially worse vector of complete downtime and financial loss for any client".</p> <p>Given those constraints, option #4 seems like a reasonable balance of security/caching. I can see how this is a challenging balance, and I do appreciate Bunny's work to quickly confirm this issue and then roll out a fix globally (released in late May).</p> <h2 id="cachingishard">Caching is hard</h2> <p>All this serves to neatly highlight once again that caching is a hard problem! There are a very wide variety of use cases and issues to handle, and the 'right' answer is often situation-specific, making specifying useful defaults for infrastructure providers very challenging.</p> <p>So, what does everybody else do in this scenario? 
As far as I can tell:</p> <ul> <li><p>Varnish <a href="https://varnish-cache.org/docs/trunk/users-guide/increasing-your-hitrate.html#authorization">bypasses the cache entirely</a> (option #3) for all requests that use an Authorization header.</p></li> <li><p>AWS CloudFront <a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/add-origin-custom-headers.html#add-origin-custom-headers-forward-authorization">strips the Authorization header by default</a> (option #1) but offers options to configure more advanced cases including adding it to the cache key (option #2). Notably, if you configure a policy that forwards the header without adding it to the cache key, CloudFront <a href="https://repost.aws/es/knowledge-center/cloudfront-authorization-header">will refuse to forward it at all</a> to avoid the original issue here.</p></li> <li><p>Cloudflare Workers <a href="https://community.cloudflare.com/t/authorization-header-causes-cf-cache-status-bypass-regardless-of-cacheeverything/249692">bypass the cache for all requests with Authorization</a> (option #3) even if <code>cacheEverything</code> is specifically enabled.</p></li> <li><p>Cloudflare CDN <a href="https://developers.cloudflare.com/cache/concepts/default-cache-behavior/#cloudflare-cache-responses">bypasses the cache for requests with Authorization headers</a> (option #3) unless either the server explicitly declares the response as cacheable via Cache-Control, or you disable <a href="https://developers.cloudflare.com/cache/about/cache-control/#origin-cache-control-behavior">Origin Cache-Control</a> and manually mark it as cacheable yourself.</p></li> <li><p>Google Cloud CDN <a href="https://cloud.google.com/cdn/docs/caching#non-cacheable_content">blocks all caching of requests with Authorization headers</a> unless the response specifically declares itself as cacheable via Cache-Control.</p></li> </ul> <p>(I haven't tested any of these myself - if you have more information, or more examples I
should include here, do please <a href="https://httptoolkit.com/contact">get in touch</a>)</p> <p>If you're using a CDN, and you're not sure you've configured this correctly, an easy way to test is to set up a URL you can GET through your CDN which logs the complete requests received and returns a 200 (or forward to a test service like <a href="https://webhook.site/">webhook.site</a>). Then you can make two requests with different <code>Authorization</code> headers:</p> <ul> <li>If headers are stripped, you'll see one request with no <code>Authorization</code> header.</li> <li>If the cache is bypassed, you'll see both requests in full.</li> <li>If authenticated requests are cached (the original security issue above), you'll see just one request, which does include its <code>Authorization</code> header.</li> </ul> <p>You can even echo the <code>Authorization</code> header back in the response, to fully test this and see the security issue in action for yourself, but of course do be careful about doing that near any production traffic, as if this issue is present you'll be directly exposing authentication tokens between users.</p> <p><em>Want to debug caching and API interactions up close? Try out <strong><a href="https://httptoolkit.com/">HTTP Toolkit</a></strong> now. Open-source one-click HTTP(S) interception &amp; debugging for web, Android, terminals, Docker &amp; more.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/bunny-cdn-caching-vulnerability/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/bunny-cdn-caching-vulnerability/</guid>
            <pubDate>Tue, 20 Jun 2023 11:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Automatic npm publishing, with GitHub Actions & npm granular tokens]]></title>
            <description><![CDATA[<p>This week, at long last, GitHub <a href="https://github.blog/changelog/2023-03-21-general-availability-of-granular-access-token-on-npm/">announced granular access tokens for npm</a>. This is a big deal! It's great for security generally, but also particularly useful if you maintain any npm packages, as it removes the main downside of automating npm publishing, by allowing you to give CI jobs only a very limited token instead of full 2FA-free access to your account.</p> <p>In the past, I've wished for this, because I maintain <a href="https://www.npmjs.com/~pimterry">a fair few npm packages</a> including some <a href="https://www.npmjs.com/package/loglevel">very widely used ones</a>. The previous solution of "just disable 2FA on your account, create an all-powerful access token with global access to every package, and give that token to your CI job" was not a comfortable one.</p> <p>Regardless of your situation, isolating any risk of issues in security-sensitive situations like this is a good move, and ensures that any leak of (or legitimate access to) your CI secrets for one project doesn't imply a complete takeover of everything on your npm account.</p> <p>As soon as I saw this was now available, I jumped on automating npm publishing for a few of the packages that I've been manually publishing until now. 
The process is pretty quick and easy, let's walk through the steps:</p> <ol> <li><p>Get an access token for your package</p> <ul> <li>Log into <a href="https://www.npmjs.com/login">npmjs.com</a></li> <li>Click your profile picture in the top right, then 'Access Tokens', 'Generate New Token', and 'Granular Access token' (or jump to <code>npmjs.com/settings/$YOUR_USERNAME/tokens/granular-access-tokens/new</code>)</li> <li>Set a useful name, a long expiry (up to you), 'Read and write' permissions, and pick the specific package that you're publishing</li></ul></li> <li><p>Add your token as a secret for your project's GitHub Actions</p> <ul> <li>Jump to <code>https://github.com/$YOU/$REPO/settings/secrets/actions/new</code></li> <li>Set <code>NPM_PUBLISH_TOKEN</code> as the secret name</li> <li>Copy the <code>npm_...</code> token from the previous step as the secret value</li></ul></li> <li><p>In your npm package's settings (i.e. <code>https://www.npmjs.com/package/$PACKAGE_NAME/access</code>), allow publish without 2FA for granular/automation tokens only, so that tokens can be used for publishing: <img src="https://httptoolkit.com/images/posts/npm-allow-token-publish.png" alt="The npm settings with 'Require two-factor authentication or an automation or granular access token' enabled"></p></li> <li><p>Add a publish step to your GitHub actions script.</p> <ul> <li><p>The specific details of this will depend on your current setup - you might want to do this on tagged releases, automatically on a schedule, or with a manually triggered job.</p></li> <li><p>In my case, I'm most interested in automatically publishing <a href="https://github.com/httptoolkit/openapi-directory-js/">openapi-directory-js</a>, and I've set this all up initially with a workflow I can manually trigger - the full script is <a href="https://github.com/httptoolkit/openapi-directory-js/commit/566e8a6688126628efd6b706ed2020bfcdeae372">here</a>.</p></li> <li><p>Regardless of how you manage the trigger, the 
key parts you'll need for the publish itself are these:</p> <pre><code class="yaml language-yaml"># When setting up node:
- uses: actions/setup-node@v3
  with:
    node-version: '16.x'
    registry-url: 'https://registry.npmjs.org' # &lt;-- the registry-url here is required

# ...[build &amp; test etc]...

# Bump the version &amp; push (if you're not doing that elsewhere)
- name: Bump version &amp; push
  run: |
    git config --global user.name 'Automated publish'
    git config --global user.email '$YOUR_USERNAME@users.noreply.github.com'

    # Update the version in package.json, and commit &amp; tag the change:
    npm version patch # YMMV - you might want the semver level as a workflow input

    git push &amp;&amp; git push --tags

# Publish the result to npm with your granular token:
- run: npm publish
  env:
    NODE_AUTH_TOKEN: ${{ secrets.NPM_PUBLISH_TOKEN }}
</code></pre></li></ul></li> </ol> <p>That's it! Once this is in place, your job will automatically bump the version of your package, then commit, tag &amp; push that bump, and then publish the result to npm. All without needing to disable 2FA for your package for normal usage, or add any globally all-powerful npm tokens anywhere.</p> <p>Hope that helps out others in the same space. If you have feedback or questions, let me know <a href="https://toot.cafe/@pimterry">on Mastodon</a>, <a href="https://twitter.com/pimterry">on Twitter</a>, or <a href="https://httptoolkit.com/contact/">send a message directly</a>.</p> <p><em>Want to debug, test or mock HTTP(S), from Node.js, browsers, servers, phones, and everything else? Try out <strong><a href="https://httptoolkit.com/javascript/">HTTP Toolkit</a></strong> now.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/automatic-npm-publish-gha/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/automatic-npm-publish-gha/</guid>
            <pubDate>Wed, 22 Mar 2023 10:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Dodge the next Dockerpocalypse: how to own your own Docker Registry address]]></title>
            <description><![CDATA[<p>As you <a href="https://blog.alexellis.io/docker-is-deleting-open-source-images/">may</a> <a href="https://woju.eu/blog/2023/03/foss-and-docker-hub/">have</a> <a href="https://news.ycombinator.com/item?id=35154025">seen</a>, Docker Hub made a dramatic shift in policy this week, and effectively gave a 30 day eviction notice to almost all community-run images.</p> <p>They've now made an <a href="https://www.docker.com/blog/we-apologize-we-did-a-terrible-job-announcing-the-end-of-docker-free-teams/">apology</a> to 'clarify' a few details, and helpfully take some of the hard edges off, but this still highlights a big problem. Fortunately, there are solutions.</p> <p>As initially described, this would've been catastrophic: Docker Hub has been used as the default host in tooling, tutorials, demos, blog posts, scripts, deployment definitions, CI builds and more for many years, and all those references were going to break - a self-inflicted <a href="https://www.theregister.com/2016/03/23/npm_left_pad_chaos/">left-pad</a> for the Docker ecosystem. In their updated policy, it appears they now won't remove any existing images, but projects who don't pay up will not be able to publish any <em>new</em> images, so they've effectively lost control of the namespace they used to deploy to their communities unless they purchase a full team subscription. Many <a href="https://mastodon.social/@bagder/110029589346386740">will not do so</a>.</p> <p>This is an interesting challenge. Even if the existing images aren't removed, the direction of travel for Docker Hub is now clear: they don't want to host the core of the Docker community any more, no more freebies, pay up or go elsewhere (not unreasonable, but something of a rugpull after a full decade of the opposite).</p> <p>As a small/open-source/community/hobbyist image publisher, or if you depend on Docker Hub for free image publishing in any capacity, you now have a problem. 
They don't want you there. You're explicitly not their audience, and the rules will likely tighten further. This isn't unreasonable - it's their service and hosting isn't free - but it's worth considering explicitly and reacting accordingly. <strong>If you're not a paying Docker Hub customer, it's time to leave Docker Hub.</strong></p> <p>The hard part is what to do instead.</p> <p>Self-hosting a registry is not free, and it's more work than it sounds: it's a proper piece of infrastructure, and comes with all the obligations that implies, from monitoring to promptly applying security updates to load &amp; disk-space management. Nobody (let alone tiny projects like these) wants this job.</p> <p>Alternatively, there are plenty of other free hosted Docker registries, and paid services too, but migrating to one directly feels a lot like you're just going to hit the exact same problem 6 months from now, and have to change your image references everywhere all over again.</p> <p>What if you could use your own fixed registry URL, on your own domain &amp; entirely under your control, but without having to self-host forever, or even commit to any particular registry, or handle all the bandwidth &amp; storage costs?</p> <p>We're looking for a way to:</p> <ul> <li>Reference your images from an address you fully control (<code>docker pull docker.my-org.example.com/org/my-image</code>)</li> <li>Do so whilst still being able to use any registry hosted elsewhere, or self-hosted ourselves.</li> <li>Avoid storing, loading, or serving the content separately. 
For now at least, there are quite a few other registries that will happily do this for public images for free, and even if there weren't we'd like to avoid extra latency or ingress &amp; egress fees from proxying this traffic.</li> <li><strong>Be able to change which backing registry we're using in future, without any of the image addresses ever changing again</strong>.</li> </ul> <p>What if I told you that's actually super easy?</p> <h2 id="exploringthepossibilities">Exploring the possibilities</h2> <p>Let's talk about how this <em>could</em> work, and then we'll dig into what <code>docker pull</code> actually does, and put together a quick solution (if you just want to know how to do this immediately, now's the time to <a href="#transparently-wrapping-a-docker-registry">skip to the end</a>).</p> <p>Docker's registry API runs on fairly simple HTTP, and HTTP APIs have a few different solutions available for situations like this.</p> <p>The classic 'host under your own domain' solution is to use CNAMEs at the DNS level. This means setting up a DNS record under your domain, which points to a domain elsewhere, effectively defining an alias. When a client tries to connect, they'll look up <code>your-registry.example.com</code>, find a record referencing the backing registry (<code>registry.hub.docker.com</code>, for example), and then all requests will get sent over there.</p> <p>If this worked here, that'd be great! Zero hosting required, just handle it on the DNS level.</p> <p>Unfortunately, this requires the target server to correctly handle HTTP requests with your 3rd party domain name in the <code>Host</code> header, knowing that they should be processed as requests to the real service. For Docker Hub at least, that's not possible (not for free certainly - although like many other services this might be offered as a paid addon).
Requests sent to Docker Hub with the wrong hostname simply fail:</p> <pre><code class="bash language-bash">&gt; curl -I https://registry.hub.docker.com
HTTP/1.1 200 OK
...

&gt; curl -I -H'Host: example.com' https://registry.hub.docker.com
HTTP/1.0 503 Service Unavailable
...
</code></pre> <p>I suspect this applies to many other registries too, so redirecting just at the DNS level is out.</p> <p>Next plan: can we do this by redirecting and/or proxying at the HTTP level? There's lots of standard tools &amp; approaches to do this within HTTP itself, along with an entire ecosystem of reverse proxies. Unfortunately though, whether or not API clients will handle redirects as we'd like is not guaranteed, and proxying without running into other issues is non-trivial.</p> <p>To work out whether this'll work, we need to do some digging into Docker's traffic directly.</p> <h2 id="howdockerpullworks">How 'Docker Pull' works</h2> <p>First, let's take a look at what a Docker pull really does under the hood.</p> <p>When you run <code>docker pull</code>, or do anything else with Docker (e.g. building an image) that triggers an image pull en route, there are a few requests that have to happen to download the full image you're looking for.</p> <p>To dig into this traffic, the easiest option is to use an HTTP-debugging tool (such as <strong><a href="https://httptoolkit.com">HTTP Toolkit</a></strong>) to see the raw interactions, and configure Docker to use this as your HTTP proxy (docs <a href="https://docs.docker.com/config/daemon/systemd/#httphttps-proxy">here</a>) and trust the CA certificate (<a href="https://docs.docker.com/registry/insecure/#use-self-signed-certificates">here</a>).</p> <p>Unless you're super keen though, you can all skip that - I've done the hard work for you. Here's what happens when you run <code>docker pull nginx</code>:</p> <p><img src="https://httptoolkit.com/images/posts/docker-pull-nginx-requests.png" alt="HTTP Toolkit showing the list of requests sent during a Docker pull"></p> <p>What we have here is:</p> <ul> <li><p>An initial <code>/v2/</code> request to check the API status (<a href="https://docs.docker.com/registry/spec/api/#api-version-check">docs here</a>). 
On Docker Hub this typically returns a 401, with headers redirecting the client to authenticate.</p></li> <li><p>An authentication request to <code>auth.docker.io</code>, which returns a JWT.</p></li> <li><p>A HEAD request to the base image URL (<code>/v2/library/nginx/manifests/latest</code>) which returns a response with a <code>docker-content-digest</code> header containing a sha256 hash: <img src="https://httptoolkit.com/images/posts/docker-pull-nginx-latest-manifest-digest.png" alt="The response headers of the digest request"></p></li> <li><p>Two GET requests for specific manifests, both receiving a 200:</p> <ul> <li><code>/v2/library/nginx/manifests/sha256:aa0a...</code> (the hash from the previous response header) which returns a list of manifests tagged by platform: <img src="https://httptoolkit.com/images/posts/docker-pull-nginx-root-manifest.png" alt="The manifest JSON content, listing hashes for each platform"></li> <li><code>/v2/library/nginx/manifests/sha256:942a...</code> (the hash of the linux platform from the previous request) which returns a manifest listing hashes for individual image layers. <img src="https://httptoolkit.com/images/posts/docker-pull-nginx-image-manifest.png" alt="The manifest JSON content, listing hashes for each layer"></li></ul></li> <li><p>A set of parallel requests for specific blob hashes, all in the format of <code>/v2/library/nginx/blobs/sha256:$HASH</code>.</p> <p>Each of these does <em>not</em> return the content - they return 307 redirects to the content! 
In the case of Docker Hub, they appear to return redirects to a Cloudflare-backed CDN: <img src="https://httptoolkit.com/images/posts/docker-pull-nginx-layer-redirect.png" alt="A 307 redirect response, with a Location header pointing to production.cloudflare.docker.com"></p></li> <li><p>An interleaved set of parallel requests to the real image host (<code>production.cloudflare.docker.com</code>) to actually retrieve the content of the image config &amp; layers.</p></li> </ul> <p>Once the client has pulled all the layers and the image config, they're composed back together as a Docker image you can use locally.</p> <h2 id="transparentlywrappingadockerregistry">Transparently wrapping a Docker Registry</h2> <p>This is all very interesting, and gives us a good idea what's going on at the network level, so we can start testing this out to build what I'm calling a "registry facade" (a service that sits in front, but just as a shell, not a proxy).</p> <p>Conveniently, in the traffic above we can see that there are already redirects in place, and working! That means that all Docker clients <em>must</em> support redirects at least for <code>/blobs/</code> requests (otherwise Docker Hub would be unusable) and so probably support them for all requests.</p> <p>So, given that, what happens if we just do the same directly ourselves, by creating a rule to return 307 HTTP redirects from all <code>$OUR_HOST/*</code> URLs to the corresponding <code>$OUR_REGISTRY/*</code> URL for any request?</p> <p><img src="https://httptoolkit.com/images/posts/docker-pull-nginx-requests-redirected.png" alt="A series of requests, each receiving a 307 redirect to Docker Hub, and then a successful response there"></p> <p>Bingo.</p> <p>This works fairly well!
We're adding a bit of overhead with an extra 307 redirect response at each step (each request with the red icon is an injected redirect) but they're very quick, everything here is being sent successfully, and pulls work perfectly in every scenario I've tested. Definitely good enough to get started with (and because this will all be under our own control, we can iterate to improve this solution in future).</p> <p>I tested this with a quick hacky rewrite rule in HTTP Toolkit - how do you do this in production?</p> <p>Turns out that's pretty easy too: I've created a tiny <a href="https://github.com/httptoolkit/docker-registry-facade">Caddy-based Docker container</a> (I enjoyed the irony of publishing this to Docker Hub) which you can deploy directly to any Docker hosting platform to do this in no time, or if you already have a CDN or hosting platform (e.g. Netlify) that lets you define simple rules like "redirect all requests for X to the same path at host Y" then you can use that too.</p> <p>In my case, I'm using <a href="https://bunny.net">Bunny CDN</a>, who have a nice rules system that can do this very easily like so:</p> <p><img src="https://httptoolkit.com/images/posts/docker-bunny-redirect.png" alt="A Bunny.net edge rule, redirect all traffic to registry.hub.docker.com%{Request.Path}"></p> <p>In production, one thing you may want to do is limit this functionality to just your own org's images, to avoid it being used as a general-purpose facade for all images or similar, so you know all requests to your domain will always get your images. The Caddy-based container above supports this by setting the <code>REGISTRY_ORG</code> variable, e.g. 
to <code>httptoolkit</code>, in which case only those images will be available and everything else will get a 403.</p> <p>If you want to limit requests like this yourself with other tools, you'll just need to ensure that requests to all URL paths starting with <code>/v2/$YOUR_ORG/</code> are redirected, along with the specific <code>/v2/</code> endpoint - without that latter endpoint authentication won't work.</p> <p>Once that's in place, you're all good. In my case, I've deployed this as <code>docker.httptoolkit.tech</code>, so you can now pull my Docker images from that hostname, even though they're currently still hosted on Docker Hub, like so:</p> <pre><code class="bash language-bash">&gt; docker pull docker.httptoolkit.tech/httptoolkit/docker-socks-tunnel
</code></pre> <p>In future I'll be migrating my images elsewhere, but I can start using this image address immediately, safe in the knowledge that it'll always work, backed by any registry I like, as long as I control that domain.</p> <h2 id="dodgingthenextdockerpocalypse">Dodging the next Dockerpocalypse</h2> <p>If you're a project affected by this issue, this is something you can set up <em>right now</em> as a quick wrapper before even starting to migrate from Docker Hub, and you can start shifting all your docs &amp; scripts to reference that new URL immediately with no downsides.</p> <p>More importantly though, either way, this ensures that whichever registry you migrate to, there's zero impact to switching in future, when your new registry of choice inevitably also goes bust/loses all your data/changes their rules with only 30 days notice.</p> <p>That's enough for now (I need to get back to actually doing the full migration for all HTTP Toolkit's existing images) but I hope that helps others in the same mess. If you have comments, get in touch on <a href="https://toot.cafe/@pimterry">Mastodon</a>, <a href="https://twitter.com/pimterry">Twitter</a>, or <a href="https://httptoolkit.com/contact/">send a message directly</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/docker-image-registry-facade/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/docker-image-registry-facade/</guid>
            <pubDate>Fri, 17 Mar 2023 11:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[EU Funding for Mobile App Traffic Interception]]></title>
            <description><![CDATA[<p>HTTP Toolkit has been selected to receive another round of open-source funding from the EU! This aims to improve interception of HTTPS traffic from mobile apps, making it easier for both security/privacy researchers and normal technical users to inspect &amp; manipulate the data that any app they use sends &amp; receives.</p> <p>This funding will directly support work to improve the precision &amp; usability of mobile app interception, on both iOS and Android. In a couple of clicks, you'll be able to target any app installed on a connected device (on either platform) and inspect all its traffic. That means no background noise from full system interception, automatic certificate unpinning, and no fiddly manual setup required.</p> <p>HTTP Toolkit can already automatically intercept Android devices, but only globally for the whole device (using Android's VPN APIs), with system certificates injected but without certificate unpinning, and with no automatic setup support for iOS at all. All those caveats are going away (of course, I'm intending the existing device-wide interception to remain a fully supported option indefinitely too).</p> <p>This is going to be powered by a set of new integrations &amp; hooks for <a href="https://frida.re">Frida</a>, a popular open-source instrumentation toolkit. Many advanced HTTP Toolkit users are already using Frida independently (see <a href="https://httptoolkit.com/blog/frida-certificate-pinning/">the Frida certificate unpinning guide</a> for more info) but doing so often requires quite a bit of setup and specialist knowledge.
This project is going to take that away, making network interception of mobile apps easily accessible to anybody who knows what "HTTP" is.</p> <p>All this is being funded by the fantastic <a href="https://nlnet.nl/">NLNet Foundation</a> as part of <a href="https://nlnet.nl/entrust/">NGI Zero Entrust</a>, a program funding open-source EU projects that support transparency around data use &amp; privacy in modern technology. This is part of the EU's <a href="https://digital-strategy.ec.europa.eu/en/policies/next-generation-internet-initiative">Next Generation Internet (NGI) initiative</a>, aiming to directly fund researchers &amp; open-source developers to encourage the future engineering of the Internet towards European values: "openness, inclusivity, transparency, privacy, cooperation, and protection of data". All things I'm thoroughly on board with.</p> <p>This is equity-free R&amp;D funding - in effect it's a charitable donation to HTTP Toolkit to encourage open-source product development in a direction they think is valuable. The development work is going to run over the next year, but it's notably not intended as a full-time commitment (more like 60% of my time) and so other HTTP Toolkit development is still going to continue alongside as normal.</p> <p>(By the way - if you're in Europe and this kind of funding for open-source development sounds like something you might be interested in, their call for proposals for the 4th round of this funding is running from now until April 1st 2023! See <a href="https://nlnet.nl/entrust/">nlnet.nl/entrust/</a> for more details)</p> <p>Let's get into the meat though: what actually is Frida, what will all this do for you in practice as a user, and what's the plan to make this happen?</p> <h2 id="whatisfrida">What is Frida?</h2> <p><a href="https://frida.re">Frida</a> is a dynamic instrumentation toolkit. 
That means it lets you attach to an existing application, and dynamically inject your own logic to change how the application works. Frida is a substantial general-purpose and mature toolkit that works for applications on Windows, Mac, Linux, iOS, Android, and quite a few more platforms you've never heard of. If you're familiar with Greasemonkey/Tampermonkey, you can think of this as being the equivalent tool for native applications. It's powerful stuff.</p> <p>Using Frida, you can trace which functions are called inside an application, read or rewrite arbitrary data from an application's memory, or (in our case) directly change how any targeted parts of application code work.</p> <p>To use it, you typically set up a device running a Frida server, to which you can send commands e.g. using Frida's CLI. You then send a command requesting that it attaches to a certain process and runs a script you've written within that process.</p> <p>Frida scripts are written in JavaScript, using the <a href="https://frida.re/docs/javascript-api/">Frida JS API</a>, and look something like this:</p> <pre><code class="javascript language-javascript">// Log socket activity:
Process.getModuleByName({
    linux: 'libc.so',
    darwin: 'libSystem.B.dylib',
    windows: 'ws2_32.dll'
}[Process.platform])
.enumerateExports()
.filter(ex =&gt;
    ex.type === 'function' &amp;&amp;
    ['connect', 'recv', 'send', 'read', 'write'].some(prefix =&gt;
        ex.name.indexOf(prefix) === 0
    )
).forEach(ex =&gt; {
    Interceptor.attach(ex.address, {
        onEnter: function (args) {
            const fd = args[0].toInt32();
            const socktype = Socket.type(fd);

            if (socktype !== 'tcp' &amp;&amp; socktype !== 'tcp6') return;

            const address = Socket.peerAddress(fd);
            if (!address) return;

            console.log(fd, ex.name, address.ip + ':' + address.port);
        }
    });
});
</code></pre> <pre><code class="javascript language-javascript">// Open an alert on iOS:
const UIAlertController = ObjC.classes.UIAlertController;
const UIAlertAction = ObjC.classes.UIAlertAction;
const UIApplication = ObjC.classes.UIApplication;
const handler = new ObjC.Block({
    retType: 'void',
    argTypes: ['object'],
    implementation: function () {}
});

ObjC.schedule(ObjC.mainQueue, function () {
    const alert = UIAlertController
        .alertControllerWithTitle_message_preferredStyle_(
            'Frida',
            'Hello from Frida',
            1
        );
    const defaultAction = UIAlertAction.actionWithTitle_style_handler_(
        'OK',
        0,
        handler
    );
    alert.addAction_(defaultAction);
    UIApplication
        .sharedApplication()
        .keyWindow()
        .rootViewController()
        .presentViewController_animated_completion_(alert, true, NULL);
})
</code></pre> <pre><code class="javascript language-javascript">// Disable certificate pinning in Java (just one case - there are many):
Java.perform(function () {
    const HttpsURLConnection = Java.use("javax.net.ssl.HttpsURLConnection");
    HttpsURLConnection
        .setDefaultHostnameVerifier
        .implementation = function (hostnameVerifier) {
            return; // Do nothing, i.e. don't change the hostname verifier
        };
});
</code></pre> <p><em>(Examples from <a href="https://github.com/iddoeldor/frida-snippets">github.com/iddoeldor/frida-snippets</a> and <a href="https://github.com/httptoolkit/frida-interception-and-unpinning/tree/4d477da">github.com/httptoolkit/frida-interception-and-unpinning/</a>)</em></p> <p>This is just a quick intro to the power of Frida. Take a look through the <a href="https://frida.re/docs/home/">full docs</a> or one of the <a href="https://learnfrida.info/">many guides</a> for more. Using this, we can make arbitrary changes to how a target application works. Neat! But why?</p> <h2 id="whatsthegoal">What's the goal?</h2> <p>The overall theme of this fund is "Trustworthiness and Data Sovereignty". HTTP Toolkit fits into this by providing tools to inspect &amp; modify all the data that apps on your mobile device send &amp; receive, so you can control that, and so you (and security &amp; privacy researchers) can analyse and report on the data that your apps are sharing with the world.</p> <p>Inspecting traffic like this is especially useful for researchers, helping investigations using HTTP Toolkit like the FT's <a href="https://www.ft.com/content/0fbf4d8e-022b-11ea-be59-e49b2a136b8d">exploration of how health info is shared with advertisers</a>, ProPrivacy's <a href="https://proprivacy.com/privacy-news/exposing-the-hidden-data-ecosystem-of-the-uks-most-trusted-charities">investigation into user tracking by UK charities</a> and Privacy International's <a href="https://privacyinternational.org/long-read/4603/unhealthy-diet-targeted-ads-investigation-how-diet-industry-exploits-our-data">analysis of how diet apps leak your data</a> to quickly &amp; easily see the traffic they're interested in. 
But it's also useful for:</p> <ul> <li>Individual users, who want to see up close what an app they're using is sharing.</li> <li>App developers, who want to analyse, debug or test their own app's traffic.</li> <li>QA teams, testing that apps send the correct data, and verifying behaviour when injecting custom responses or simulating errors.</li> <li>Security testers, who want to look for vulnerabilities or test how apps handle certain types of attacks or interception.</li> <li>Reverse engineers, trying to understand how APIs work and how apps use them.</li> </ul> <p>Using HTTP Toolkit or similar tools today, this is all already possible for some cases (<a href="https://httptoolkit.com/android/">demo</a>), but increasing OS restrictions and use of certificate pinning &amp; similar create challenges that often require specialist skills and fiddly manual reverse engineering work, making all this inaccessible to many users. This project is going to fix that.</p> <p>In HTTP Toolkit itself, the end UX I'm aiming for is:</p> <ul> <li>Click a button in HTTP Toolkit, and a list of potential target devices pops up.</li> <li>Click a device to see the target apps available on that device.</li> <li>Click an app, and that app is automatically individually intercepted, with certificates automatically unpinned, and all the traffic appears in HTTP Toolkit moments later.</li> </ul> <p>This is a best-case fully automated setup. There's quite a bit of exploration to do to confirm which devices will be able to support this perfectly, but at the very least that appears to be possible for rooted Android devices, most Android emulators, jailbroken iOS devices, iOS simulators, iOS apps with debugging enabled, or any other apps with the <a href="https://frida.re/docs/gadget/">Frida gadget</a> manually injected.</p> <p>Other cases (like jailed iOS devices) are possible too, but might require some manual (but easy) setup steps.
Alongside this I'm going to build a walkthrough within HTTP Toolkit for that process, to make that as easy as possible.</p> <p>En route this is going to involve creating quite a few standalone components, all of which will be open-source and usable completely independently of HTTP Toolkit, including:</p> <ul> <li>Pure-JS Frida bindings, to connect to &amp; control any Frida server from Node.js or a browser via WebSocket</li> <li>A Frida script to reconfigure proxy &amp; CA configuration in a target iOS app</li> <li>A Frida script to reconfigure proxy &amp; CA configuration in a target Android app</li> <li>A Frida script to disable certificate pinning in a target iOS app</li> <li><s>A Frida script to disable certificate pinning in a target Android app</s> (I've already built this: <a href="https://github.com/httptoolkit/frida-interception-and-unpinning/">github.com/httptoolkit/frida-interception-and-unpinning/</a>)</li> </ul> <p><em>Everything</em> in this project will be fully open-source (this is a requirement of the funding, but HTTP Toolkit is 100% open-source anyway) and so in addition to those general-use components, the details of the integration into HTTP Toolkit will be available for any other tools or services interested in further exploring the same kinds of workflows too (and I've already been talking to <a href="https://nlnet.nl/project/TrackingWeasel/">one of the other funded projects</a> about exactly this).</p> <h4 id="asidenoteonabusiveusesurveillanceconcerns">A side note on abusive use/surveillance concerns</h4> <p>3rd parties intercepting &amp; inspecting network traffic is a reasonable thing to be cautious about, especially when talking about development that's funded by government organizations like this.
In case it's not clear: this is <em>not</em> a project that helps enable any of those kinds of surveillance scenarios.</p> <p>To intercept any traffic, or to use Frida in any capacity whatsoever, you have to have full administrative control of the device, or the ability to directly modify the application before it's installed. In mass surveillance scenarios neither is available, and in any situation where somebody malicious has that control, that means they can already do whatever they like anyway (if they wanted to read your secret WhatsApp messages, they can already extract your keys directly from the process, or install a fake modified version of the WhatsApp app, and it's game over).</p> <p>Additionally, intercepting traffic with HTTP Toolkit itself generally requires being on the same network as a computer that's actively running HTTP Toolkit at all times, or preferably having the device directly connected to the computer via USB. These aren't tools that could be used to spy on people at a distance, and nobody is going to use this to steal your emails.</p> <p>The tools provided here are really only useful to intercept your own device, so you can inspect &amp; understand what the apps that you're using yourself are actually sending.</p> <h2 id="howisthisgoingtohappen">How is this going to happen?</h2> <p>Development for this is going to happen over the next 12 months.</p> <p>The first step (starting now!) is a lot of research investigating the potential &amp; limits of the tools involved, to confirm precisely what's going to be possible in what scenarios and build initial prototypes of the core interactions.</p> <p>That should sharpen up the outline of the workflows for different use cases &amp; device configurations. 
Once that's in place, the core development work of the project is:</p> <ul> <li>Building the pure-JS WebSocket bindings for Frida, implementing all of the Frida APIs necessary (as informed by the prototypes) and publishing that as a standalone JS library.</li> <li>Creating &amp; publishing a set of iOS hook scripts for Frida:<ol> <li>A script capable of modifying the HTTP proxy used in an application</li> <li>A script capable of modifying the trusted CAs used in an application</li> <li>A script capable of disabling certificate pinning within an application</li></ol></li> <li>Creating &amp; publishing a similar set of Android hook scripts for Frida:<ol> <li>A script capable of modifying the HTTP proxy used in an application</li> <li>A script capable of modifying the trusted CAs used in an application</li> <li>(No certificate unpinning script, as this <a href="https://github.com/httptoolkit/frida-android-unpinning/">already exists</a>)</li></ol></li> <li>Building logic into HTTP Toolkit to automatically set up Frida, where possible, on target devices.</li> <li>Integrating the JS WebSocket bindings into HTTP Toolkit, to allow connection to and control of Frida instances.</li> <li>Building the UI to expose this, and trigger device setup, Frida connection, and injection of the appropriate scripts.</li> <li>Detailed documentation on how to use this (in a variety of common configurations) and how it all works internally.</li> </ul> <p>All of this is kicking off soon, so watch this space!</p> <p>If you'd like more updates on the progress of the project, you can join the HTTP Toolkit announcements mailing list <a href="https://httptoolkit.com/keep-me-updated/">here</a> or subscribe to the blog in general in the form below.</p> <p>If you can't wait, and you want to try out the existing mobile interception functionality <em>right now</em>, take a look at <a href="https://httptoolkit.com/android/">HTTP Toolkit's Android features</a> and download it now to test that out for
yourself.</p>]]></description>
            <link>https://httptoolkit.com/blog/frida-mobile-interception-funding/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/frida-mobile-interception-funding/</guid>
            <pubDate>Mon, 27 Feb 2023 12:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Debugging WebRTC, IPFS & Ethereum with HTTP Toolkit]]></title>
            <description><![CDATA[<p>HTTP is important on the web, but as other alternative protocols grow popular in networked applications, it's often important to be able to capture, debug and mock those too.</p> <p>I've been working on expanding HTTP Toolkit's support for this over the past year (as one part of <a href="https://httptoolkit.com/blog/developer-tools-decentralized-web/">a project</a> funded by EU Horizon's <a href="https://www.ngi.eu/">Next Generation Internet initiative</a>), to extend HTTP Toolkit to cover three additional rising protocols that are often used alongside simple HTTP in decentralized web applications: WebRTC, IPFS &amp; Ethereum.</p> <p><strong>This is now live!</strong> If you're using HTTP Toolkit to intercept browsers, and a web application connects to another peer over WebRTC, interacts with the Ethereum blockchain, or pulls content from the IPFS network, then you'll now see this inline amongst your collected HTTP traffic, and you can create rules to rewrite these requests, define mock responses, or inject errors.</p> <p>Let's talk about why this matters, what you can do with this, and how it actually works internally:</p> <h2 id="why">Why?</h2> <p>HTTP remains the key protocol on the web, by a wide margin, and that's not going to change any time soon.</p> <p>That said, there are an increasing number of new protocols being explored both alongside and on top of HTTP, covering alternate use cases and supporting different communication models, and as software evolves this will only increase. 
HTTP Toolkit needs to be able to support these, to ensure that you can easily understand and test all communications from the increasing number of multi-protocol applications.</p> <p>There are lots of examples of HTTP-adjacent protocols where this applies:</p> <ul> <li><a href="https://en.wikipedia.org/wiki/WebSocket">WebSockets</a>, which grew to cover use cases needing persistent connections that weren't well supported by the request/response model; they use HTTP just for initial connection setup, and are now widely supported and used in HTTP-based apps.</li> <li><a href="https://web.dev/webtransport/">WebTransport</a>, a new QUIC-based connection protocol that's aiming to supplant WebSockets with a modern approach that adds a swathe of other benefits on top.</li> <li><a href="https://graphql.org/">GraphQL</a>, a general-purpose HTTP-based protocol for APIs that avoids REST patterns and typical HTTP semantics, to instead support a flexible querying language within the request body directly.</li> <li><a href="https://grpc.io/">gRPC</a>, an RPC protocol built on top of HTTP/2 and protobuf to support high-performance bi-directional streaming RPC APIs on the web and elsewhere.</li> <li><a href="https://mqtt.org/">MQTT</a>, a pubsub protocol, often used within backend infrastructure and for communication with IoT devices, and accessible on the web too over WebSockets.</li> <li><a href="https://en.wikipedia.org/wiki/WebRTC">WebRTC</a>, a protocol designed particularly for video/audio streaming but also supporting arbitrary data, allowing general-purpose peer-to-peer data transfer on the web.</li> <li><a href="https://docs.ipfs.tech/reference/kubo/rpc/">IPFS RPC</a>, a protocol for reading from &amp; publishing to the content-addressed IPFS network.</li> <li><a href="https://ethereum.org/en/developers/docs/apis/json-rpc/">Ethereum RPC</a>, a protocol for querying &amp; submitting transactions to the Ethereum blockchain (and many other API-compatible blockchains).</li> 
</ul> <p>Although some of these are based on HTTP, and as such the basic data is visible in HTTP Toolkit, advanced support is still important - you might be able to see raw gRPC or Ethereum requests, but you can't read them without manually decoding the unintelligible raw protobuf/ABI-encoded data within.</p> <p>The last three in that list are what we're focusing on here today, as they define a clear set of protocols that are essential for understanding, debugging &amp; testing interactions within the new wave of decentralized web applications. There's more to come though: full WebSocket support has also been added to HTTP Toolkit alongside these changes already, WebTransport is definitely planned eventually (once Node.js gains <a href="https://github.com/nodejs/node/issues/38478">HTTP/3</a> support), expanding support for GraphQL &amp; gRPC is on the roadmap in the short term, MQTT is quite possible long-term too, and any &amp; all other related popular protocols are welcome.</p> <p>Clearly HTTP is the core focus, but any other network protocol you might realistically use in the same codebase alongside HTTP needs to be intercepted too, or you can't understand &amp; test what your app is doing on the network (this is a long road, and we'll never be able to support <em>everything</em>, but this is the general direction).</p> <h2 id="whatcanyoudowiththis">What can you do with this?</h2> <p>Using HTTP Toolkit, with these changes you can now:</p> <ul> <li>Intercept all network interactions over these three protocols, to capture the data sent &amp; received by decentralized web apps.</li> <li>Inspect these interactions, to easily view both the raw data and understand the parsed meaning of each interaction.</li> <li>Define rules to modify these interactions, matching interactions you're interested in and changing how they behave at the network level - allowing you to accurately test hard-to-trigger cases like timeouts &amp; connection resets, inject error responses, or mock data
&amp; peer behaviour.</li> </ul> <p>That's all available within the tool - in addition, outside HTTP Toolkit, you can also use the internals directly to build automation that does the same, e.g. for automated testing. Take a look at the previous blog posts to see how to do this with <a href="https://httptoolkit.com/blog/decentralized-web-webrtc-debugging/">WebRTC</a> and <a href="https://httptoolkit.com/blog/decentralized-web-testing-libraries/">Ethereum &amp; IPFS</a>.</p> <p>Let's test that out in practice:</p> <h3 id="debuggingwebrtcwithhttptoolkit">Debugging WebRTC with HTTP Toolkit</h3> <p>First, set up HTTP Toolkit (take a look at the <a href="https://httptoolkit.tech/docs/getting-started/">Getting Started guide</a> if you haven't already done this).</p> <p>Then, launch any Chromium-based browser, like Chrome, Brave or Edge. For now, WebRTC interception is limited to Chromium (it's powered by a Chrome extension - more on that later) but this will be expanded to support Firefox and others in future.</p> <p>In your new intercepted browser, open <a href="https://webrtc.github.io/samples/src/content/datachannel/messaging/">webrtc.github.io/samples/src/content/datachannel/messaging/</a>. This is the official WebRTC messaging demo, and lets you manually set up a WebRTC connection and send messages within a single page for easy testing.</p> <p>Click 'Connect' to create two WebRTC connections within the same page, connected together.
Immediately, you'll see two connections and two data channels appear in HTTP Toolkit:</p> <p><img src="https://httptoolkit.com/images/posts/webrtc-messaging-rows.png" alt="WebRTC data shown in HTTP Toolkit"></p> <p>Click on either connection, and you can see the full details: the connection parameters, the offer &amp; answer SDP, and each stream within the connection (in this case, a single data channel, but in complex applications there could be many data &amp; media channels here).</p> <p>Click the data channel in the list (or scroll to the end of the full connection details) and you'll see a channel with zero messages. Send a message in the local or remote sections of the web page, and you'll instantly see the raw message data reflected here exactly as it was delivered.</p> <p>That's a simple example, but you can test more complex cases like <a href="https://snapdrop.net">snapdrop.net</a> (open it in two pages to make a connection, and try sending a file to see the data) which provides peer-to-peer file sharing over WebRTC, and you'll see all the raw data that apps like this are sending peer-to-peer between pages:</p> <p><img src="https://httptoolkit.com/images/posts/webrtc-snapdrop-data.png" alt="Snapdrop sending a file between two peers over WebRTC"></p> <p>We can also define rules to mock WebRTC behaviours, or even a whole chain of steps that will run when each connection is created.</p> <p>Looking at the Snapdrop traffic, you can see that when a file is sent, the page first sends a header message, containing the file details as JSON, then a binary message containing the content, then a transfer-complete message, with progress messages along the way. 
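As a rough sketch, that transfer sequence amounts to three kinds of data channel message. Note that the field names below are hypothetical placeholders for illustration, not Snapdrop's exact wire format - inspect the captured messages in HTTP Toolkit to see the real one:

```javascript
// Illustrative sketch of a header/content/complete message sequence over a
// WebRTC data channel. Field names here are hypothetical placeholders.
const file = { name: 'photo.png', size: 3, mime: 'image/png' };

// 1. A JSON header message, describing the file about to be sent:
const header = JSON.stringify({ type: 'header', ...file });

// 2. One or more binary messages containing the raw file content:
const chunk = new Uint8Array([0x01, 0x02, 0x03]);

// 3. A final JSON message marking the transfer as complete:
const done = JSON.stringify({ type: 'transfer-complete' });

console.log(header, chunk.byteLength, done);
```

An echo rule simply sends each of these messages straight back to the sender, which is why a file you send ends up delivered back to yourself.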
We can easily mess with this ourselves by creating a simple echo rule on the Modify page, like so:</p> <p><img src="https://httptoolkit.com/images/posts/webrtc-mock-rule.png" alt="A rule echoing all WebRTC content"></p> <p>Create and save this rule, refresh the page in your browser, and try to connect to a peer and send a file again - you'll find it sent back to yourself automatically, with each of the echoed messages visible in HTTP Toolkit, and the remote peer receiving nothing at all.</p> <h3 id="debuggingipfswithhttptoolkit">Debugging IPFS with HTTP Toolkit</h3> <p>To test real IPFS traffic, you'll need an IPFS node running locally. If you have Node.js installed, you can do this by just running <code>npx ipfs daemon</code>. If not, the full instructions are <a href="https://ipfs.tech/#install">here</a>.</p> <p>Once you have a node running, open HTTP Toolkit (check the <a href="https://httptoolkit.tech/docs/getting-started/">Getting Started guide</a> if you haven't before) and launch any browser.</p> <p>From your intercepted browser, open your node's WebUI at <a href="http://127.0.0.1:5002/webui">127.0.0.1:5002/webui</a> (by default), and you'll immediately start seeing IPFS traffic mixed in with HTTP in HTTP Toolkit. To test this more actively, click 'Files' in the IPFS WebUI, enter <code>QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco</code> and click 'Browse'. You'll see the full IPFS interactions, as the web app queries the content type, lists the contents, and then queries the details for each file within:</p> <p><img src="https://httptoolkit.com/images/posts/ipfs-webui-queries.png" alt="The IPFS queries made by the WebUI when looking up this data"></p> <p>If you select each of these requests, you'll see a new "IPFS RPC API" section at the top on the right, and there you can see the details of the operation and parameters used, with descriptions inline and links to the full documentation, along with the default values for any omitted default parameters. 
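Under the hood, each of those operations is just a plain HTTP request to the node's RPC API: a POST to /api/v0/&lt;command&gt;, with arguments passed in the query string. As a minimal sketch (assuming the default js-ipfs API address used above), the directory listing corresponds to a request like:

```javascript
// Sketch of the raw IPFS RPC request behind a directory listing: a POST to
// /api/v0/ls, with the CID passed as the 'arg' query parameter. The address
// assumes the js-ipfs daemon default used above (127.0.0.1:5002).
const IPFS_API = 'http://127.0.0.1:5002';
const cid = 'QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco';

const url = new URL('/api/v0/ls', IPFS_API);
url.searchParams.set('arg', cid);

console.log('POST ' + url.href);
// e.g. fetch(url, { method: 'POST' }).then(res => res.json())
```

This is why a plain HTTP-level proxy can intercept and mock IPFS traffic at all: the protocol between the web app and the node is just HTTP.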
In addition, the raw HTTP that's sent is visible below, so you can debug traffic at either level.</p> <p>As with WebRTC, we can also define rules to mock this traffic. Try clicking 'index.html' in the WebUI listing for example, and you'll see that it pulls a tiny snippet of HTML page from the IPFS network. We can mock this content, by setting a rule on the Modify page like:</p> <p><img src="https://httptoolkit.com/images/posts/ipfs-mock-content-rule.png" alt="A rule mocking this IPFS content"></p> <p>Go back in the WebUI, click 'index.html' again, and you'll see it now reads your own custom content, instead of reading from the real IPFS network.</p> <h3 id="debuggingethereumwithhttptoolkit">Debugging Ethereum with HTTP Toolkit</h3> <p>Last of all, we can inspect Ethereum traffic. First, open HTTP Toolkit if you haven't already (or see the <a href="https://httptoolkit.tech/docs/getting-started/">Getting Started guide</a> if you never have) and launch any browser.</p> <p>From your intercepted browser, you can now go to <a href="https://app.ens.domains">app.ens.domains</a> and you'll immediately start seeing Ethereum blockchain interactions inline between your HTTP requests:</p> <p><img src="https://httptoolkit.com/images/posts/ethereum-ens-requests.png" alt="The Ethereum queries made by ENS when you load it initially"></p> <p>As with IPFS, each of these requests includes a new section when selected, which appears as "Ethereum JSON-RPC API" at the top, and shows the full parsed interaction details, details of each parameter and links to the corresponding documentation pages.</p> <p>Again we can define rules to mock this. In this case, we can have some more fun. 
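On the wire, a wallet's balance check is a single standard JSON-RPC call, which is what makes it easy to match and mock. A rough sketch of the exchange (the account address here is just a placeholder):

```javascript
// Sketch of the JSON-RPC exchange behind a wallet balance check, using the
// standard eth_getBalance method. The address is a placeholder, and the
// mocked result encodes 100 ETH (100 * 10^18 wei) as a hex quantity.
const request = {
    jsonrpc: '2.0',
    id: 1,
    method: 'eth_getBalance',
    params: [
        '0x0000000000000000000000000000000000000000', // account (placeholder)
        'latest' // block tag: query the latest state
    ]
};

const mockedResponse = {
    jsonrpc: '2.0',
    id: 1,
    result: '0x' + (100n * 10n ** 18n).toString(16) // 100 ETH, in wei
};

console.log(request.method, mockedResponse.result);
```

A mock rule just has to match the `eth_getBalance` method and return a response like this, regardless of the address queried.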
Install either the <a href="https://brave.com/">Brave browser</a> or a wallet extension, and you can intercept its traffic and mock arbitrary wallet balances right from HTTP Toolkit, like so:</p> <p><img src="https://httptoolkit.com/images/posts/ethereum-mock-balance-rule.png" alt="A rule mocking Ethereum balances to return 100ETH"></p> <p>With that rule set, create a wallet in Brave (click the wallet icon in the top right), and watch the money roll in:</p> <p><img src="https://httptoolkit.com/images/posts/ethereum-mock-balance-result.png" alt="A wallet in Brave, showing 100ETH, at around $150,000"></p> <h2 id="howdoesthiswork">How does this work?</h2> <p>Internally, this is built on top of the three standalone libraries also built as part of this project: <a href="https://github.com/httptoolkit/mockrtc/">MockRTC</a>, <a href="https://github.com/httptoolkit/mockipfs">MockIPFS</a> &amp; <a href="https://github.com/httptoolkit/mockthereum">Mockthereum</a>.</p> <p>Mockthereum &amp; MockIPFS work by defining their protocols in terms of HTTP, providing higher-level rules and parsing received traffic on top of <a href="https://github.com/httptoolkit/mockttp">Mockttp</a>, the HTTP library that powers HTTP Toolkit.</p> <p>MockRTC works differently. As part of this project, Mockttp has been refactored to support plugins for arbitrary alternative protocols, acting as a network traffic mocking platform that starts and stops sessions intercepting any set of protocols, of which HTTP is just one.</p> <p>MockRTC then acts as a Mockttp plugin, which defines its own interception server (acting as a headless WebRTC peer itself) and rules that can be used to define the behaviour for each connected peer.
It also provides a set of JS hooks, which HTTP Toolkit automatically inserts into intercepted browsers using a web extension, which capture signalling data (connection configuration, essentially) and swap it out for MockRTC's connection parameters.</p> <p>In practice, that means it redirects all WebRTC traffic from the intercepted web page through MockRTC, applying your configured behaviour, and then (optionally) proxies the traffic onwards to the real remote peer, if there is one.</p> <p>In all three cases, this means that the intercepted protocols now run through a proxy within HTTP Toolkit, which has full access to the raw data to parse &amp; expose it in the UI, or to transform it or inject responses en route.</p> <p>That's the high-level summary - if you'd like more details then check out the previous blog posts on <a href="https://httptoolkit.com/blog/intercepting-webrtc-traffic/">WebRTC interception</a> and <a href="https://httptoolkit.com/blog/decentralized-web-testing-libraries/">IPFS &amp; Ethereum interception</a>.</p> <h2 id="divein">Dive in</h2> <p>If you're excited by this, you can <a href="https://httptoolkit.com/">get started with HTTP Toolkit right now</a>. All the features described here are now live, and they're all free and open-source (check out the source at <a href="https://github.com/httptoolkit/">github.com/httptoolkit</a> if you're interested).</p> <p>Of course, this is all still new and experimental!
Feedback is very welcome, especially if you run into issues - feel free to <a href="https://github.com/httptoolkit/httptoolkit/issues/new">open an issue</a>, get in touch <a href="https://twitter.com/pimterry">on Twitter</a> or <a href="https://httptoolkit.com/contact/">send me a message directly</a>.</p> <hr> <p><em>This project has received funding from the European Union’s Horizon 2020 research and innovation programme within the framework of the NGI-POINTER Project funded under grant agreement No 871528.</em></p> <p><img src="../images/ngi-eu-footer.png" alt="The NGI logo and EU flag"></p>]]></description>
            <link>https://httptoolkit.com/blog/decentralized-web-webrtc-debugging/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/decentralized-web-webrtc-debugging/</guid>
            <pubDate>Fri, 28 Oct 2022 13:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Testing libraries for the Decentralized Web]]></title>
<description><![CDATA[<p>The world of decentralized web applications is an exciting place that has exploded in recent years, with technologies such as IPFS and Ethereum opening up possibilities for a peer-to-peer web - creating applications that live outside the traditional client/server model, where users interact with and control their own data directly.</p> <p>At the same time, it's still immature, and for software developers it lacks a lot of the affordances &amp; ecosystem of the traditional HTTP-based web app world. There are far fewer tools and libraries for developers working in this space.</p> <p>I've been working on improving this over the last year (as one part of <a href="https://httptoolkit.com/blog/developer-tools-decentralized-web/">a project</a> funded by EU Horizon's <a href="https://www.ngi.eu/">Next Generation Internet initiative</a>), by building network interception libraries for both IPFS &amp; Ethereum: MockIPFS &amp; Mockthereum. These each act as both an immediately useful automated testing library, to support modern integration testing &amp; CI workflows, and a base for building more general network proxy tools for web applications using either (or both) technologies.</p> <p><strong>If that sounds cool and you just want to jump straight in and try these for yourself, you can get started at <a href="https://github.com/httptoolkit/mockipfs/">github.com/httptoolkit/mockipfs/</a> and <a href="https://github.com/httptoolkit/mockthereum/">github.com/httptoolkit/mockthereum/</a>.</strong></p> <p>On the other hand, if you want to hear what this can do in practice, and learn a little about how it works under the hood, read on:</p> <h2 id="anewwaytobuildwebapps">A new way to build web apps</h2> <p>Decentralized web apps often use a mix of many different technologies, at various layers of the stack, such as:</p> <ul> <li><a href="https://ipfs.tech">IPFS</a> - for decentralized static content hosting &amp; data storage</li> <li><a 
href="https://ethereum.org/">Ethereum</a> - for decentralized consistent global state, computation on that state, and financial transactions</li> <li><a href="https://filecoin.io/">Filecoin</a>/<a href="https://www.storj.io/">Storj</a> - for paid decentralized long-term content storage</li> <li><a href="https://webrtc.org/">WebRTC</a> - for peer-to-peer raw data transfer, and video/audio connections</li> <li><a href="https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API">Service workers</a> - a JavaScript API allowing fully offline web apps</li> <li><a href="https://handshake.org/">Handshake (HNS)</a>/<a href="https://ens.domains/">Ethereum Name Service (ENS)</a> - to map domain names to web applications</li> <li><a href="https://gun.eco/">GunDB</a> - a decentralized database for the web, with peer-to-peer syncing</li> <li>HTTP - for interactions with the existing 'traditional' web, and for communication with nodes that allow access to many of these protocols.</li> </ul> <p>By combining these technologies, it's possible to create a web application that's served from a distributed network, rather than a single server that can go offline or be blocked, and which stores data, communicates with others, and generally provides all the features you'd expect from a traditional SaaS webapp.</p> <p>Right now, an example architecture for this looks something like:</p> <ul> <li>Publish a JS-based single-page webapp to IPFS, using service workers to make it run entirely offline and locally</li> <li>Use HNS/ENS to map a domain name to the published content hash</li> <li>Allow users to communicate peer-to-peer via WebRTC, either sending messages directly or using GunDB over the top to sync a structured data store</li> <li>Publish users' persistent content to IPFS (potentially encrypted) which they can either pin locally in their IPFS node, or pay to mirror via Filecoin/Storj</li> <li>Modify global state or support paid transactions via Ethereum.</li> </ul> <p>Given 
such a setup, a user with a compatible browser (Brave, by default, or Chrome/Firefox/etc with the <a href="https://docs.ipfs.tech/install/ipfs-companion/">IPFS companion</a> &amp; <a href="https://metamask.io/">Metamask</a> extensions installed) can load the web app, use it on their machine and send &amp; receive data from others, all without a single central server involved, and with all data stored either locally, or on a service under their own control.</p> <p>Even if the original publisher ceases to exist and all their infrastructure turns off, if well designed around this model, users will be able to keep using the app forever.</p> <p>That's the theory at least. In practice, there are quite a few rough edges, so this is complicated and challenging, but it's an interesting space with many new technologies appearing and evolving constantly. Even today, the above list is very far from complete! Put together, these technologies hint at an interesting future of decentralized technologies on the web.</p> <p>How HTTP connects to this is notable though. While each of these protocols is independent of HTTP, for browser connectivity in web apps many of them use HTTP as the last-mile transport. For IPFS, for example, you would typically run an IPFS node on your machine that communicates directly with the IPFS network, then configure your browser to use that node for all IPFS content, and then all IPFS interactions would happen by making HTTP requests to the node from your web app. Similarly, for Ethereum, in the vast majority of cases Ethereum interactions on the web involve an HTTP request to a hosted Ethereum API (this isn't the same as a centralized service, since any working node will work equally well, but some hosted node must be used).</p> <h2 id="entermockipfsmockthereum">Enter MockIPFS &amp; Mockthereum</h2> <p>If you build a web app like this, you'll quickly discover that testing it is a serious challenge.
There are few tools or libraries available, so you're forced to either mock out the APIs, libraries or raw HTTP requests entirely manually (non-trivial and very hard to do accurately) or run a real IPFS/Ethereum node for testing (slow, heavy, limited, and with persistent state - useful, but not what you want for automated testing use cases).</p> <p>MockIPFS &amp; Mockthereum take a different approach: stateless and fully configurable mocking at the HTTP level, with built-in interpretation and mocking of the HTTP interaction protocols used between client libraries and hosted nodes.</p> <p>This means you can:</p> <ul> <li>Mock the results of most common interactions for both protocols in one line of code.</li> <li>Directly monitor, log or assert on all Ethereum/IPFS interactions made between a client and the networks.</li> <li>Simulate scenarios like connection issues and timeouts.</li> <li>Create, reset &amp; destroy mock nodes in milliseconds.</li> <li>Run multiple fully isolated mock nodes at the same time on the same machine, with minimal overhead, to easily run tests in parallel.</li> </ul> <h2 id="testingadwebappusingipfswithmockipfs">Testing a dweb app using IPFS with MockIPFS</h2> <p>There are many ways a decentralized web app might want to interact with IPFS, but the most common is that you'll want to read some IPFS data from a CID, so let's use that as an example.</p> <p>To do this on the web, you'd typically write code like:</p> <pre><code class="javascript language-javascript">import * as IPFS from "ipfs-http-client";
import itAll from 'it-all';
import {
    concat as uint8ArrayConcat,
    toString as uint8ToString
} from 'uint8arrays';

const IPFS_CONTENT_PATH = '/ipfs/Qme7ss3ARVgxv6rXqVPiikMJ8u2NLgmgszg13pYrDKEoiu';

async function runMyApp(ipfsNodeConfig) {
    const ipfsClient = IPFS.create(ipfsNodeConfig);

    // ...
    // Somewhere in your code, read some content from IPFS:
    const content = await itAll(ipfsClient.cat(IPFS_CONTENT_PATH));
    const contentText = uint8ToString(uint8ArrayConcat(content));
    // ...
}

runMyApp({ /* Your IPFS node config */ });
</code></pre> <p>This uses <a href="https://www.npmjs.com/package/ipfs-http-client">ipfs-http-client</a>, the widely used official library for using IPFS on the web, to make an HTTP request to a local IPFS node for an IPFS content id (<code>Qme7ss3ARVgxv6rXqVPiikMJ8u2NLgmgszg13pYrDKEoiu</code>, in this example).</p> <p>Using MockIPFS to test this code, and mock out the result returned, looks something like this:</p> <pre><code class="javascript language-javascript">// Import MockIPFS and create a fake node:
import * as MockIPFS from 'mockipfs';
const mockNode = MockIPFS.getLocal();

describe("Your tests", () =&gt; {
    // Start &amp; stop your mock node to reset state between tests
    beforeEach(() =&gt; mockNode.start());
    afterEach(() =&gt; mockNode.stop());

    it("can mock &amp; query IPFS interactions", async () =&gt; {
        // Define a rule to mock out this content:
        const ipfsPath = "/ipfs/Qme7ss3ARVgxv6rXqVPiikMJ8u2NLgmgszg13pYrDKEoiu";
        const mockedContent = await mockNode.forCat(ipfsPath).thenReturn("Mock content");

        // Run the code that you want to test, configuring the app to use your mock node:
        await runMyApp(mockNode.ipfsOptions); // &lt;-- IPFS cat() here will read 'Mock content'

        // Afterwards, assert that we saw the requests we expected:
        const catRequests = await mockNode.getQueriedContent();
        expect(catRequests).to.deep.equal([
            { path: ipfsPath }
        ]);
    });
});
</code></pre> <p>In this case MockIPFS handles the request, parses the API call to match the specific CID used, and then returns the content correctly encoded and formatted just like a real IPFS node, fully integration testing the entire client-side code of your app, but with none of the overhead, complexity or unpredictability of a real IPFS node.</p> <p>Mocking <code>ipfs.cat</code> like this is the simplest case, but MockIPFS can go much further:</p> <ul> <li>Test content pinning &amp; unpinning, e.g. throwing errors for invalid/duplicate pins, with calls like <code>mockNode.forPinAdd(cid)...</code>.</li> <li>Inject timeouts for IPNS queries, with <code>mockNode.forNameResolve(name).thenTimeout()</code>.</li> <li>Mock content publishing results, with <code>mockNode.forAdd().thenAcceptPublishAs(hash)</code>.</li> </ul> <p>To get started, take a look at <a href="https://github.com/httptoolkit/mockipfs">the README</a> for more details and the full API docs, or take a look through the <a href="https://github.com/httptoolkit/mockipfs/tree/main/test/integration">test suite</a> for a selection of complete working examples covering each of the main areas of the IPFS API.</p> <h2 id="testingadwebappusingethereumwithmockthereum">Testing a dweb app using Ethereum with Mockthereum</h2> <p>When building a web app on Ethereum, one common interaction is to call a contract - i.e. to query data on the blockchain, without actually creating a transaction.</p> <p>The code to do so, using the popular Ethereum web client <a href="https://www.npmjs.com/package/web3">Web3.js</a>, might look like:</p> <pre><code class="javascript language-javascript">import Web3 from 'web3';

// Parameters for some real Web3 contract:
const CONTRACT_ADDRESS = "0x...";
const JSON_CONTRACT_ABI = { /* ... */ };

async function runMyApp(ethNodeAddress) {
    const web3 = new Web3(ethNodeAddress);

    // ...
    // Somewhere in your code, call a method on the Ethereum contract:
    const contract = new web3.eth.Contract(JSON_CONTRACT_ABI, CONTRACT_ADDRESS);
    const contractResult = await contract.methods.getText("test").call();
    // ...
}

runMyApp(/* Your Ethereum node API address */);
</code></pre> <p>Much as with IPFS above, we can easily define a mock node which can intercept this request, returning whatever value or simulating whatever other behaviour you'd like:</p> <pre><code class="javascript language-javascript">// Import Mockthereum and create a fake node:
import * as Mockthereum from 'mockthereum';
const mockNode = Mockthereum.getLocal();

describe("Your tests", () =&gt; {
    // Start &amp; stop your mock node to reset state between tests
    beforeEach(() =&gt; mockNode.start());
    afterEach(() =&gt; mockNode.stop());

    it("can mock &amp; query Ethereum interactions", async () =&gt; {
        // Define a rule to mock out the specific contract method that's called:
        const mockedFunction = await mockNode.forCall(CONTRACT_ADDRESS) // Match calls to this contract address
            // Optionally, match specific functions and parameters:
            .forFunction('function getText(string key) returns (string)')
            .withParams(["test"])
            // Mock contract results:
            .thenReturn('Mock result');

        // Run the code that you want to test, configuring the app to use your mock node:
        await runMyApp(mockNode.url); // &lt;-- Contract call here will read 'Mock result'

        // Afterwards, assert that we saw the contract calls we expected:
        const mockedCalls = await mockedFunction.getRequests();
        expect(mockedCalls.length).to.equal(1);

        expect(mockedCalls[0]).to.deep.include({
            // Examine full interaction data, including decoded parameters etc:
            to: CONTRACT_ADDRESS,
            params: ["test"]
        });
    });
});
</code></pre> <p>To get started and see the many other Ethereum behaviours that can be mocked, take a look at <a href="https://github.com/httptoolkit/mockthereum">the README</a>, or take a look through the <a href="https://github.com/httptoolkit/mockthereum/tree/main/test/integration">test suite</a> for a selection of complete working examples covering a wide range of typical Ethereum interactions.</p> <h2 id="beyondtesting">Beyond testing</h2> <p>In the quick examples above, we've seen simple demos of how MockIPFS &amp; Mockthereum can handle specific common interactions, by configuring a client with the mock node's address instead of the real node, so that the mock node handles all traffic independently from the wider network.</p> <p>When used like this, all unmatched requests will receive default responses, e.g. all IPFS add requests will appear to succeed (whilst not really publishing anything) and all Ethereum wallet balances will be zero.</p> <p>Both libraries can go beyond this though. Each can be configured to forward unmatched requests elsewhere, so that some or all traffic is passed through the mock node to a real IPFS/Ethereum node. This makes it possible to log traffic for debugging, or to mock only a subset of interactions while all other requests behave as normal.</p> <p>To configure this, pass an <code>unmatchedRequests</code> option to the <code>getLocal</code> call when creating the mock node, like so:</p> <pre><code class="javascript language-javascript">const ipfsMockNode = MockIPFS.getLocal({
  unmatchedRequests: { proxyTo: "http://localhost:5001" }
});
const ethMockNode = Mockthereum.getLocal({
    unmatchedRequests: { proxyTo: "http://localhost:30303" }
});
</code></pre> <p>With this configuration, you can use these nodes as your normal node address in your browser (by configuring the address in IPFS companion/Metamask/etc) for advanced proxying use cases. By default they'll behave just like the real node they proxy to, but you can additionally add logging of received interactions, to monitor the client-side Ethereum/IPFS interactions as you browse the web, or you can mock out or even disable certain types of interactions by adding rules to match those requests.</p> <h2 id="gettingstartedforyourself">Getting started for yourself</h2> <p>It's difficult to squeeze everything that's possible with these tools in here while keeping this article short! But if this has piqued your interest already, take a look at the libraries themselves on GitHub for in-depth getting started guides and explanations, along with detailed API documentation covering their full functionality: <a href="https://github.com/httptoolkit/mockipfs">MockIPFS</a>, <a href="https://github.com/httptoolkit/mockthereum">Mockthereum</a>.</p> <p>Have questions, issues or suggestions? These tools are still in an early stage, and feedback is <em>very</em> welcome! Please file an issue on one of those repos, or get in touch directly <a href="https://twitter.com/pimterry/">on Twitter</a> or <a href="https://httptoolkit.com/contact/">by email</a>.</p> <hr> <p><em>This‌ ‌project‌ ‌has‌ ‌received‌ ‌funding‌ ‌from‌ ‌the‌ ‌European‌ ‌Union’s‌ ‌Horizon‌ ‌2020‌‌ research‌ ‌and‌ ‌innovation‌ ‌programme‌ ‌within‌ ‌the‌ ‌framework‌ ‌of‌ ‌the‌ ‌NGI-POINTER‌‌ Project‌ ‌funded‌ ‌under‌ ‌grant‌ ‌agreement‌ ‌No‌ 871528.</em></p> <p><img src="../images/ngi-eu-footer.png" alt="The NGI logo and EU flag"></p>]]></description>
            <link>https://httptoolkit.com/blog/decentralized-web-testing-libraries/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/decentralized-web-testing-libraries/</guid>
            <pubDate>Mon, 24 Oct 2022 17:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How to intercept, observe & mock WebRTC traffic]]></title>
<description><![CDATA[<p>WebRTC allows two users on the web to communicate directly, sending real-time streams of video, audio &amp; data peer-to-peer, from within a browser environment.</p> <p>It's exciting tech that's rapidly maturing, already forming the backbone of a huge range of video chat, screen sharing and live collaboration tools, but also serving as a key technology for decentralization of web apps - providing a P2P data transport layer used by everything from WebTorrent to IPFS to Yjs.</p> <p>Unfortunately though, it doesn't have the tooling ecosystem that developers used to networking with HTTP often expect. There are few supporting tools or libraries, inspecting raw traffic is hard or impossible, and mocking WebRTC traffic for automated testing is even harder. Even built-in low-level browser tools like <code>chrome://webrtc-internals</code> don't allow seeing messages sent on WebRTC data channels.</p> <p>It's hard to build modern secure web applications on top of protocols that you can't directly see or interact with.</p> <p>This doesn't just affect developers: it also seriously impacts security &amp; privacy researchers and reverse engineers, each trying to investigate the traffic sent &amp; received by the apps we all use. If you want to know what data a webapp you use is sending over WebRTC, right now it's very hard to find out.</p> <p>Intercepting WebRTC traffic to build these tools and libraries is difficult, because unlike protocols like HTTP that were designed to allow active proxying and user-configurable PKI (i.e. CA certificates) early on, WebRTC encrypts all traffic using peer-to-peer negotiated certificates for authentication without PKI, communicates in a wide variety of different negotiated ways at the network level to avoid NAT issues, and offers no convenient APIs to configure this for debugging. 
All of this gives the protocol some great features for users, but creates some serious challenges when building developer tools.</p> <p>As it turns out though, despite this, there are just enough places to hook in that MitM interception of WebRTC traffic is possible. When you control one peer, just one line of code lets you fully intercept the entire connection, allowing you to easily observe all raw unencrypted network traffic, and inject or transform any WebRTC data or media en route.</p> <p>I've been working on this over the last year (as one part of <a href="https://httptoolkit.com/blog/developer-tools-decentralized-web/">a project</a> funded by EU Horizon's <a href="https://www.ngi.eu/">Next Generation Internet initiative</a>) and building a framework called <a href="https://github.com/httptoolkit/mockrtc/">MockRTC</a>, which provides all the foundations needed to automatically intercept, inspect &amp; mock WebRTC traffic for testing, network debugging &amp; MitM proxying.</p> <p><strong>If that sounds cool and you just want to jump straight in and try it yourself, get started here: <a href="https://github.com/httptoolkit/mockrtc/">github.com/httptoolkit/mockrtc/</a>.</strong></p> <p>On the other hand, if you want to understand the gritty details of how this works and learn what you can do with it in practice, read on:</p> <h2 id="webrtcunderthehood">WebRTC, under the hood</h2> <p>Let's step back slightly here before we dive in. 
WebRTC is:</p> <ul> <li>A set of JavaScript APIs, available in web pages, that allows access to user media streams (microphones, webcams, screen sharing) and creation of peer-to-peer network connections.</li> <li>A set of network protocols, combined to allow reliable connection negotiation, setup and data streaming directly between peers on the public internet.</li> </ul> <p>The web API part is relatively straightforward: you can access user media on the web using <code>navigator.mediaDevices.getUserMedia()</code> (with some permission prompts) to get a media stream, and you can create P2P connections by creating an <code>RTCPeerConnection</code> and using the methods there to connect to a remote peer.</p> <p>Once the connection is set up, it can include any number of named data channels, carrying arbitrary messages between peers, and bidirectional or unidirectional media streams carrying video or audio. Internally each can be carried over different independent transports with different properties - sending video over fast-but-unreliable UDP while messages travel over an ordering-guaranteed TCP connection, for example - or bundled together into multiplexed streams.</p> <p>Before your connection gets to this happy state though, some relatively simple calls to the WebRTC JS API actually trigger a quite complex negotiation process by the browser(s) involved to agree and create the required network connections. 
This is complicated, in large part because they're trying to reliably set up secure peer-to-peer connections, on a modern internet where a very large number of peers are behind firewalls and <a href="https://en.wikipedia.org/wiki/Network_address_translation">NAT</a> devices (like your home router), which block most inbound connections.</p> <p>We don't need to get into the full details of how this works at a low level, but there is a common structure to connection setup - managed by the browser but coordinated via the exposed APIs - that it's useful to understand before we dig into intercepting this.</p> <h3 id="webrtcconnectionsetup">WebRTC connection setup</h3> <p>To do initial connection negotiation, WebRTC peers swap configurations in a format called SDP (Session Description Protocol). This is a string, containing a series of lines like <code>a=group:BUNDLE 0 1</code>, which defines every connection parameter. That includes the available IP/port/protocol combinations to try connecting to, the video/audio codecs and extensions that will be used on each media stream, and the encryption keys and fingerprints used to authenticate &amp; secure the connection.</p> <p>There's a nice example &amp; line-by-line breakdown of an SDP configuration <a href="https://webrtchacks.com/sdp-anatomy/">here</a>.</p> <p>These configurations aren't swapped simultaneously: one peer sends an SDP 'offer', the other peer receives it, decides which proposed configuration parameters will work, and sends back a confirmatory SDP 'answer' with the agreed parameters, which the first peer receives &amp; accepts to complete the connection.</p> <p><img src="https://httptoolkit.com/images/posts/webrtc-setup-flow.png" alt="A sequence diagram showing this offer/answer setup flow"></p> <p>The process of transferring the offer and answer here is called 'signalling', and is the one part of this process that <em>isn't</em> managed by the browser or fixed by the spec. 
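<p>To make the SDP format described above concrete, here's a minimal sketch (deliberately not a full SDP parser) of how such a description breaks down into its typed lines. The SDP fragment used is invented and heavily truncated for illustration:</p>

```javascript
// Parse SDP's line-based format into { type, value } pairs. Every SDP line
// has the shape '<type>=<value>' - this is a minimal illustrative sketch,
// not a complete or validating SDP parser.
function parseSdpLines(sdp) {
  return sdp.trim().split(/\r?\n/).map(line => ({
    type: line[0],        // e.g. 'v', 'o', 'm' or 'a'
    value: line.slice(2)  // everything after '<type>='
  }));
}

// An invented, heavily truncated SDP fragment:
const exampleSdp = "v=0\r\na=group:BUNDLE 0 1\r\na=fingerprint:sha-256 3A:96:1D";
console.log(parseSdpLines(exampleSdp));
// → [ { type: 'v', value: '0' },
//     { type: 'a', value: 'group:BUNDLE 0 1' },
//     { type: 'a', value: 'fingerprint:sha-256 3A:96:1D' } ]
```

<p>The <code>a=fingerprint</code> attribute here is the certificate hash used to authenticate the connection - keep it in mind, as it's central to how interception works.</p>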
Peers can share the connection configuration any way they like.</p> <p>It's most common to share this via websocket connections through a central server both peers connect to, but you can also bootstrap a connection over a chat channel, or manually by copying &amp; pasting configuration between computers, or you could put it in a QR code on a series of postcards, tie it to a homing pigeon, you name it.</p> <p>That bit is up to you, but all the other offer/answer generation and calculation work is generally handled for you by your browser automatically, such that the real code to swap an offer and answer looks something like:</p> <pre><code class="javascript language-javascript">// --- In one browser: ---
const firstPeer = new RTCPeerConnection();
// ...here, first configure the media/data channels you want on the connection, then...

const offer = await firstPeer.createOffer();
await firstPeer.setLocalDescription(offer);

sendOfferToSecondPeer(offer); // &lt;-- How you do this signalling is up to you

// --- In the other browser: ---
const offer = receiveOfferFromFirstBrowser();

const secondPeer = new RTCPeerConnection();
await secondPeer.setRemoteDescription(offer);
const answer = await secondPeer.createAnswer();
await secondPeer.setLocalDescription(answer);

sendAnswerBackToFirstPeer(answer);

// --- Back in the first browser: ---
const answer = receiveAnswerFromSecondPeer();

await firstPeer.setRemoteDescription(answer);

// If all went well, both peers are now connected!
</code></pre> <p>I am simplifying this a bit! Most notably there's an optional but widely used 'ICE candidate trickling' process that I'm omitting, where you immediately send your offer/answer without including full network address &amp; port details, and then you 'trickle' over extra candidates later over your signalling channel as you discover them. But this basic offer/send/answer/receive flow is a good outline, and it's worth keeping this in mind as we continue.</p> <p>Once this is done, both peers have agreed a set of connection details, media codecs &amp; parameters and most relevantly encryption keys, and they can now talk happily over their peer-to-peer fully encrypted WebRTC connection, safe in the knowledge that nobody else can intercept or interfere with the contents of that.</p> <h2 id="interceptingwebrtc">Intercepting WebRTC</h2> <p>So, given all that, how can we intercept this, so once these connections are created we can see the data that's sent, and transform and inject our own traffic to mess with it?</p> <p>Barring finding a major issue in a widely used security algorithm or cryptographic system, by design we're never going to be able to do this for an existing completed connection. That's good! Otherwise attackers would be able to do the same.</p> <p>We can do this though if we can intercept the signalling traffic here. 
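<p>The core idea can be sketched in a few lines: anything sitting on the signalling path can rewrite the security-critical SDP lines - such as the certificate fingerprint - before forwarding them, and neither peer can tell the difference. This helper is an invented illustration of the principle, not MockRTC's actual implementation (which also has to rewrite network candidates and more):</p>

```javascript
// Sketch: a signalling-level MitM swaps the advertised DTLS certificate
// fingerprint for its own before relaying the SDP onwards, so each peer
// unknowingly authenticates a connection to the interceptor instead.
// (Invented for illustration - real SDP rewriting involves more fields.)
function swapFingerprint(sdp, ownFingerprint) {
  return sdp.replace(/a=fingerprint:.*/g, "a=fingerprint:" + ownFingerprint);
}

const relayedOffer = swapFingerprint(
  "v=0\r\na=fingerprint:sha-256 AA:BB:CC\r\na=setup:actpass", // original offer
  "sha-256 11:22:33" // the interceptor's own certificate fingerprint
);
// relayedOffer now advertises the interceptor's fingerprint instead
```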
WebRTC connections are only as secure as their signalling channel, and so if we can easily provide our own configuration at the signalling stage, then we can change the connection certificates and network addresses agreed there at will.</p> <p>By doing so, we can sit between two peers, tell both to connect to us instead of each other, and then we can forward traffic between the two any way we like.</p> <p>Doing this in the general case is complicated - we'll get into those details in a second - but this is enough to let us start with the simplest use case: creating a mock peer to connect to directly in automated testing.</p> <h3 id="howtocreateamockwebrtcpeerintestingenvironments">How to create a mock WebRTC peer in testing environments</h3> <p>In a web application test environment, you usually have direct control over signalling - you're often mocking setup processes &amp; network traffic in other ways anyway - so in most setups it's easy to tweak your signalling setup to manually provide data for a remote peer who wants to connect. That means we can directly mock traffic to a single peer, which is often what you want to do for simple tests of WebRTC-based applications.</p> <p>To support this, MockRTC exposes APIs to define a mock WebRTC peer, preconfigure it with certain behaviours, and then get connection parameters you can directly drop into a signalling channel.</p> <p>That looks something like this:</p> <pre><code class="javascript language-javascript">// --- In your test code:
const mockPeer = await mockRTC.buildPeer()
    .waitForNextMessage()
    .thenSend('Goodbye');
    // ^ This defines the rules of how the peer will behave

const { offer, setAnswer } = await mockPeer.createOffer();
sendOfferToChannel(offer); // &lt;-- Send the offer on your app's normal signalling channel

// --- Do normal WebRTC setup in your application code:
const offer = receiveOfferFromChannel();

const realPeer = new RTCPeerConnection();
await realPeer.setRemoteDescription(offer);
const answer = await realPeer.createAnswer();
await realPeer.setLocalDescription(answer);

sendAnswerToChannel(answer);

// --- Back in your test code:
const answer = receiveAnswerFromChannel();

setAnswer(answer);

// Your application being tested is now connected to the mock peer!

// As soon as the real peer sends any message on a WebRTC data channel, it will
// receive a 'Goodbye' message response, and the connection will close.
</code></pre> <p>As seen here, MockRTC peers can be built with <code>.buildPeer()</code> followed by a series of step methods (to send messages, open channels, add delays, echo data and media and more). Once defined, you can call <code>createOffer()</code> or <code>createAnswer()</code> on them to get connection parameters that you can connect to with any real WebRTC peer.</p> <p>Once you have a mock peer, it's possible to inspect every message it received with <code>mockPeer.getAllMessages()</code> and <code>mockPeer.getMessagesOnChannel(channel)</code>, allowing you to verify the expected WebRTC traffic at the end of a test. It's also possible to actively monitor all WebRTC traffic, with events like <code>mockRTC.on('data-channel-message-received', (event) =&gt; ...)</code> or media track stats with <code>mockRTC.on('media-track-stats', (event) =&gt; ...)</code>, to observe and log all traffic more generally across many connections.</p> <p>If you want to try this out yourself, take a look at the <a href="https://github.com/httptoolkit/mockrtc/#documentation">getting started guide</a>.</p> <p>This is a neat demo, and useful for testing, but this isn't <em>really</em> interception, and no traffic is being proxied - it's just a convenient way to build test peers.</p> <p>Fortunately we can extend this! Let's intercept real WebRTC traffic between two real unsuspecting peers.</p> <h3 id="howtointerceptdebugwebrtcinarealbrowserenvironment">How to intercept &amp; debug WebRTC in a real browser environment</h3> <p>To intercept traffic between two peers, we need to go further, and inject our signalling configuration into both peers at once.</p> <p>MockRTC can do this automatically, by using a set of JavaScript hooks in one peer's browser environment, which wrap the built-in WebRTC APIs to transparently redirect all traffic from both peers, making MockRTC a MitM WebRTC proxy. You can inject these JS hooks manually into your code (e.g. 
in a testing environment) or you can dynamically inject this into pages, e.g. via a web extension or similar (more on that later).</p> <p>Using the provided hooks requires running one line of code in the target page of one peer:</p> <pre><code class="javascript language-javascript">// Redirect all traffic from one specific connection via a mock peer:
MockRTC.hookWebRTCConnection(realConnection, mockPeer); // Call before negotiation

// Or: redirect all WebRTC traffic in the entire page through a mock peer:
MockRTC.hookAllWebRTC(mockPeer); // Call before creating connections
</code></pre> <p>After running that, MockRTC will automatically transform normal WebRTC connections from this:</p> <p><img src="https://httptoolkit.com/images/posts/webrtc-normal-message-flow.png" alt="Direct messages going from one peer to another via WebRTC"></p> <p>Into this:</p> <p><img src="https://httptoolkit.com/images/posts/webrtc-hooked-message-flow.png" alt="Messages going from one hooked peer to a mock MockRTC peer then proxied to an external MockRTC peer before being sent to the remote peer"></p> <p>I.e. redirecting traffic so that the hooked WebRTC peer always connects to a mock peer, instead of its real target, and remote peers are instead redirected to connect to an external peer, whose traffic can be dynamically proxied into the mock peer.</p> <p>Once we're in this configuration, MockRTC is receiving and proxying all traffic unencrypted internally, and so can observe and modify it at will.</p> <p>The hook logic that makes this work, in the case where the hooked peer offers first, looks something like this:</p> <ul> <li>Peer A creates an offer, by calling <code>connection.createOffer()</code><ul> <li>MockRTC hooks this method. 
The hook creates a real offer for a real peer A connection (let's call this offer <code>AO</code>) but passes it to MockRTC instead, then just stores <code>AO</code> and doesn't return it.</li> <li>From <code>AO</code>, MockRTC returns to the hook an equivalent offer, but for the external mock peer (<code>EO</code>) which we return from this method as if it was the real offer for this connection.</li></ul></li> <li>Peer A sends the offer to Peer B<ul> <li>It thinks this is an offer to connect to itself (<code>AO</code>) but it's actually an offer to connect to the external mock peer (<code>EO</code>).</li></ul></li> <li>Peer B receives an offer and creates an answer (<code>BA</code>)<ul> <li>This all happens remotely, for real with no hooks involved, but using the <code>EO</code> offer from the external mock peer.</li></ul></li> <li>Peer A receives the <code>BA</code> answer and accepts it with <code>connection.setRemoteDescription(answer)</code><ul> <li>MockRTC hooks this method, and passes the answer back to the external mock peer, which completes the external connection (connecting <code>BA</code> with <code>EO</code>).</li> <li>The hook then sends the original real <code>AO</code> offer and <code>BA</code> answer to MockRTC to create an equivalent answer (<code>MA</code>)</li> <li>Once it receives it, it accepts that answer, completing the internal connection (connecting <code>AO</code> to <code>MA</code>)</li></ul></li> <li>The WebRTC connection is complete and usable for both real peers<ul> <li>But we actually have two WebRTC connections, not one, and the MockRTC internally receives all traffic unencrypted, and can freely inspect it, proxy it, or transform it en route.</li></ul></li> </ul> <p>That's just one possible flow here, and so reality is a bit more complicated - if you're really interested, you can read the full hook definition here: https://github.com/httptoolkit/mockrtc/blob/main/src/webrtc-hooks.ts.</p> <p>The end result though is that application 
code that follows the normal negotiation flow, sending signalling over real channels to a real remote peer, ends up with a MitM'd connection through MockRTC that you can observe and control.</p> <p>Once the connection is established, how exactly proxying works between the two MockRTC endpoints (the mock/external peers in the diagram above) is up to you, as it's defined by the peer's configured behaviour. For example:</p> <pre><code class="javascript language-javascript">const mockPeer = await mockRTC.buildPeer()
    .waitForNextMessage() // Wait for and drop the first datachannel message
    .send('Injected message') // Inject a message into the data channel
    .thenPassThrough(); // Then proxy everything else to the real peer
</code></pre> <p>Traffic is only proxied between the mock &amp; external peers at a passthrough step. That means that in this example, the hooked peer will receive nothing from the other real peer until they send a message (which won't be forwarded), then they'll receive 'Injected message', and then from there the connection will begin proxying as if it were a direct connection to the remote peer.</p> <p>Just as when testing directly, we can also inspect all this traffic, so you can do totally transparent data channel inspection like so:</p> <pre><code class="javascript language-javascript">const mockPeer = await mockRTC.buildPeer()
    .thenPassThrough(); // Proxy everything transparently

// Log all data channel traffic in both directions:
mockRTC.on('data-channel-message-sent', ({ content }) =&gt; console.log('Sent:', content));
mockRTC.on('data-channel-message-received', ({ content }) =&gt; console.log('Received:', content));

// Or wait until later, then capture and log all messages at once:
mockPeer.getAllMessages().then((messages) =&gt; {
    console.log(messages);
});
</code></pre> <h2 id="puttingthisintopractice">Putting this into practice</h2> <p>Using the steps above, with the <a href="https://github.com/httptoolkit/mockrtc#documentation">full documentation</a> for MockRTC, you can very quickly put this into practice for automated testing: either passing MockRTC's details directly through your own signalling mechanism, or using the MockRTC hooks and making real connections that are intercepted automatically.</p> <p>For working testing examples, take a look at MockRTC's own integration tests, which create real WebRTC connections and then intercept &amp; observe them in all sorts of fun ways: https://github.com/httptoolkit/mockrtc/tree/main/test/integration.</p> <p>For testing use cases, this is very powerful, and immediately makes it possible to reach into the bowels of WebRTC and start observing and poking things. MockRTC is still very new, so the ways to observe and poke things are still limited, but all raw data is directly available internally as it's proxied, so observing and modifying <em>anything</em> is possible in theory, and suggestions or pull requests for new capabilities are very welcome! Just open your issues/PRs in <a href="https://github.com/httptoolkit/mockrtc">the MockRTC repo</a>.</p> <p>For more general use cases, like debugging all traffic from your browser live or building WebRTC proxy automation, you'll need to configure a mock peer and enable the hooks from inside the WebRTC client (your browser). MockRTC gives you all the tools to do so, but for now this does require some manual work (although there's another blog post coming very soon about HTTP Toolkit support to do this for you automatically…).</p> <p>To get you started for now though, there's an example webextension project that you can clone, build &amp; run to immediately test this out here: <a href="https://github.com/httptoolkit/mockrtc-extension-example/">github.com/httptoolkit/mockrtc-extension-example/</a>. 
This is a barebones example that doesn't do much by itself, but will let you directly observe all data channel traffic from all tabs within the extension, and provides a base to add your own arbitrary logic, and monitor or rewrite WebRTC traffic in any way you like.</p> <p>This is just the start, and there's more coming here soon! In the meantime, if you run into trouble or have questions, do please <a href="https://github.com/httptoolkit/mockrtc/issues/new">open an issue</a> or <a href="https://httptoolkit.com/contact/">send me a message directly</a>, and watch this space for more posts on deeper WebRTC integration in HTTP Toolkit itself…</p> <hr> <p><em>This project has received funding from the European Union’s Horizon 2020 research and innovation programme within the framework of the NGI-POINTER Project funded under grant agreement No 871528.</em></p> <p><img src="../images/ngi-eu-footer.png" alt="The NGI logo and EU flag"></p>]]></description>
            <link>https://httptoolkit.com/blog/intercepting-webrtc-traffic/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/intercepting-webrtc-traffic/</guid>
            <pubDate>Thu, 13 Oct 2022 10:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Android Chrome 99 expands Certificate Transparency, breaking all MitM dev tools]]></title>
            <description><![CDATA[<p>Certificate transparency is a superb improvement to HTTPS certificate security on the web that's great for users and businesses, but on Android it creates a huge problem for the many developer tools like <strong><a href="https://httptoolkit.com">HTTP Toolkit</a></strong> which install trusted system certificates into Android to intercept &amp; debug app traffic.</p> <p>This doesn't appear in the main announcements anywhere, but buried deep in the enterprise <a href="https://support.google.com/chrome/a/answer/7679408#certTrans&zippy=%2Cchrome">release notes</a> for Chrome v99 there's a small note that says:</p> <blockquote> <p>Certificate transparency is already enforced on desktop platforms, and for some Android users. Chrome 99 expands certificate transparency to all Android Chrome users.</p> </blockquote> <p>And with that small note, Chrome on Android became uninterceptable for all HTTP Toolkit users using rooted devices, and anybody else who actively installs and trusts their own system CA certificates.</p> <p>If you're running into an <code>ERR_CERTIFICATE_TRANSPARENCY_REQUIRED</code> error in Chrome while trying to debug your HTTPS traffic with some MitM debugging proxy, then this is affecting you too.</p> <p>Let's talk about how certificate transparency works, why this breaks, and how you can work around it to keep debugging HTTPS from Chrome on your Android device regardless.</p> <h2 id="certificatetransparencyct">Certificate Transparency (CT)</h2> <p>HTTPS certificates are issued and signed by Certificate Authorities (CAs) who are trusted by your browser &amp; OS.</p> <p>That's great when it works, but sometimes it doesn't. 
CAs can make mistakes when issuing certificates, when verifying a client's identity beforehand, or through malice somewhere, and issue fraudulent certificates to people who shouldn't have them.</p> <p>For example, let's say a trusted CA issues a certificate for google.com to the wrong person (this <a href="https://security.googleblog.com/2011/08/update-on-attempted-man-in-middle.html">actually happened</a>, <a href="https://security.googleblog.com/2015/09/improved-digital-certificate-security.html">repeatedly</a>). That issued certificate is incredibly powerful - whoever has it can freely intercept all traffic sent by anybody to Google.com and both see &amp; modify that traffic, whilst browsers will show all users a padlock and tell them everything is totally fine &amp; secure.</p> <p>Even worse though: attacks like this were invisible. In the past, if you managed to get such a certificate and started doing targeted interception of Google.com somewhere, you could keep doing so almost indefinitely. Unless you really slipped up, or you intercepted somebody who was checking for this very very closely, nobody would ever know.</p> <p>Certificate Transparency aims to solve this, by allowing domain owners to be aware of all valid certificates ever issued for their domain, and thereby ensuring any misissued certificates can be spotted and revoked immediately. It works (very roughly) like this:</p> <ul> <li>Some independent organizations host certificate logs (CT logs), which immutably (via <a href="https://en.wikipedia.org/wiki/Merkle_tree">Merkle trees</a>) record the details of issued TLS certificates, and let people query these logs.</li> <li>When a CA issues a new certificate, it submits the details to one or more of these log providers, receiving a signed certificate timestamp (SCT) signed by the provider.</li> <li>The CA sends the certificate with the SCT embedded back to their client (e.g. 
a person hosting a web service who wants a certificate for HTTPS).</li> <li>The CA's client serves all their HTTPS traffic using this certificate.</li> <li>Browsers receiving this traffic enforce that all certificates they receive come with a matching SCT, signed by a log provider they trust.</li> </ul> <p>SCTs effectively act as a receipt, proving that you have submitted your certificate to a trusted log provider, and that its existence is therefore public knowledge. By requiring an SCT at the browser level, you're guaranteeing that certificates must be publicly recorded to be usable. That means anybody can query the public logs to see the full list of valid certificates that have been issued for their domain, and can set up a warning system to immediately know if any certificates are ever issued unexpectedly.</p> <p>This makes it very very difficult to issue fraudulent certificates for a domain without publicly announcing it in a CT log.</p> <h2 id="certificatetransparencymeetsmitmdebuggingproxies">Certificate Transparency meets MitM debugging proxies</h2> <p><a href="https://en.wikipedia.org/wiki/Man-in-the-middle_attack">MitM proxies</a> like HTTP Toolkit sit between an HTTP or HTTPS client and the server they want to talk to, and proxy the traffic, so that it can be inspected and modified. These are popular tools for developers, testers, security/privacy researchers &amp; reverse engineers. 
For most software, a lot of important interactions involve HTTP(S) and being able to see and poke that traffic directly is very powerful.</p> <p>To do this for HTTPS, these proxies basically issue fraudulent certificates for the domains being intercepted, with the cooperation of the HTTPS client (by configuring the client or the OS itself to trust the proxy's certificate) so that they're accepted.</p> <p>Although it sounds like CT should block this, it's not actually intended to - by design, it only applies to public root certificates, not to manually trusted self-signed certificates, or it would make all self-signed certificates impossible to use, breaking everything from enterprise intranet sites to local HTTPS development servers.</p> <p>It's these self-signed certificates that MitM proxies use to intercept traffic, so this shouldn't be an issue, and on Chrome on desktop it's not. The MitM certificate is installed into the OS trust store manually, or trusted explicitly via Chrome flags, and the certificate transparency requirements aren't applied.</p> <p>That is, not until Chrome 99 was released on Android.</p> <h2 id="certificatetransparencyinchrome99onandroid">Certificate Transparency in Chrome 99 on Android</h2> <p>On Android, there are a few different certificate stores (full details <a href="https://httptoolkit.com/blog/intercepting-android-https/#android-certificate-stores">here</a>). You can install your own CA certificates to intercept HTTPS traffic on Android, but intercepting any interesting traffic from an app that isn't explicitly opting-in to interception requires you to put your certificate into the 'system' store, not just the normal 'user' store.</p> <p>Almost all apps will trust the CAs from this system store to issue certificates for HTTPS. 
That means if you can put your MitM proxy of choice's CA certificate there, then your proxy can intercept all those apps' HTTPS traffic, and you can immediately see what traffic any app is sending, and test out what it does with alternate responses. This is great! Writing to the system store is only possible on rooted devices and emulators, but that's fairly standard for reverse engineering work, and many developers &amp; testers have rooted devices too so they can do the same with production builds, or to modify other system settings.</p> <p>The problem though is that Chrome's condition for what it considers to be a public root certificate (and so subject to certificate transparency requirements) is just whether it's in the system store or the user store. The system store is widely used for self-signed certificates like this, but as soon as you put your certificate in there, suddenly everything issued by your little testing CA must be formally submitted to a CT log and served up with an SCT proving it's been published publicly.</p> <p>That means that from when Chrome 99 was released until I shipped a fix (just now), when you used HTTP Toolkit on a rooted device Chrome threw loud <code>NET::ERR_CERTIFICATE_TRANSPARENCY_REQUIRED</code> errors on every single HTTPS page, like so:</p> <p><img src="https://httptoolkit.com/images/posts/chrome-android-certificate-transparency-error.png" alt="Chrome showing a certificate transparency error for example.com"></p> <p>This applies to all MitM web debugging proxies. If you put your certificate in the system store so you can intercept app traffic, then you can't intercept Chrome v99+. If you put your certificate in the user store to intercept Chrome, then you can't intercept anything <em>except</em> Chrome. (And no, you can't do both: if the certificate's in the system store at all, then certificate transparency is mandatory and it's game over). 
Very inconvenient!</p> <h2 id="howtofixit">How to Fix It</h2> <p>First up, it'd be great if Chrome fixed this themselves, by treating user-installed system certificates differently to the built-in root certificates. I've filed <a href="https://bugs.chromium.org/p/chromium/issues/detail?id=1324303">a Chromium bug</a> suggesting that, but I'm not holding my breath - despite the popular use case, I suspect they'll likely consider adding your own system certificates as officially unsupported, and an acceptable casualty for the benefits of certificate transparency. We'll see though (feel free to star that bug if this is interesting to you too).</p> <p>In the meantime, there is a way to work around this: you can manually modify the flags used by Chrome, to explicitly trust your specific CA certificate, in addition to installing it in the system store, thereby disabling certificate transparency checks.</p> <p>You can do this using the <code>--ignore-certificate-errors-spki-list=&lt;cert hash&gt;</code> option. This is available on all platforms, but it's a bit tricky to set on Android, since you don't directly control how Chrome starts up. To enable this, you need to:</p> <ul> <li>Get the SPKI fingerprint of your certificate. You can do so using this OpenSSL magic incantation:</li> </ul> <pre><code>  openssl x509 -in $YOUR_CA_CERTIFICATE -pubkey -noout | openssl pkey -pubin -outform der | openssl dgst -sha256 -binary | openssl enc -base64
</code></pre> <ul> <li><p>Create a file containing one line:</p> <p><code>chrome --ignore-certificate-errors-spki-list=$YOUR_SPKI_FINGERPRINT</code></p></li> <li><p>Use <code>adb push</code> to store this on your Android device at:</p> <ul> <li><code>/data/local/chrome-command-line</code></li> <li><code>/data/local/android-webview-command-line</code></li> <li><code>/data/local/webview-command-line</code></li> <li><code>/data/local/content-shell-command-line</code></li> <li><code>/data/local/tmp/chrome-command-line</code></li> <li><code>/data/local/tmp/android-webview-command-line</code></li> <li><code>/data/local/tmp/webview-command-line</code></li> <li><code>/data/local/tmp/content-shell-command-line</code></li></ul> <p>This ensures it applies for all varieties of Chromium, in both normal &amp; debug environments. You'll need root access to set the non-tmp files, which is what's used on production devices (while the <code>tmp</code> files are used by userdebug builds).</p></li> <li><p>Set the permissions of each of the above with <code>chmod 555 &lt;filename&gt;</code> to ensure that it's readable by Chromium when it starts.</p></li> <li><p>Force stop Chrome (<code>am force-stop com.android.chrome</code>), and then open it again.</p></li> <li><p>Check the command line flags shown on <code>chrome://version</code> to ensure this command line option is included there.</p></li> </ul> <p>Not convenient at all, but not the worst workaround in the world, and although it requires root in many cases, this only affects rooted devices &amp; emulators so that's no big deal.</p> <p>This is now integrated into HTTP Toolkit as part of the automated ADB setup in the latest release, so if you're using a rooted device with HTTP Toolkit this will work for you automatically in future! 
Of course, if you do still have problems do please <a href="https://github.com/httptoolkit/httptoolkit/issues/new/choose">file an issue</a>.</p> <p><strong>Want to take your Android reverse engineering to the next level? <a href="https://httptoolkit.com/android/">HTTP Toolkit</a> gives you one-click HTTP(S) interception, inspection & mocking for any Android app.</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/chrome-android-certificate-transparency/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/chrome-android-certificate-transparency/</guid>
            <pubDate>Wed, 11 May 2022 16:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Fighting TLS fingerprinting with Node.js]]></title>
            <description><![CDATA[<p>The modern internet is full of services that want to know who you are. Fingerprinting is the latest way to do this: capturing many small details about your client, and using them to create an id that's sufficiently unique to recognize you and infer details about your network client and device.</p> <p>This is a privacy problem, which I'm not going to focus on here, but collecting and analysing interaction metadata is also a powerful tool to recognize certain <em>types</em> of clients, even when they attempt to disguise themselves.</p> <p>TLS provides a particularly good surface for this kind of fingerprinting, which allows a server or proxy to recognize the kind of software (a specific browser version, Python, Ruby, Node.js, etc) that's opening any TLS connection, before the client has even sent any data (such as an HTTP request) within the connection, and purely using unencrypted public data from the connection.</p> <p>In many cases, this is a problem. <strong><a href="https://httptoolkit.com">HTTP Toolkit</a></strong> acts as a MitM proxy for HTTP(S) traffic inspection &amp; mocking, and this potentially allows servers to recognize and block it, along with any other similar debugging proxies. Many other automated scripts and tools can also be recognized, blocking web scraping and other requests from anything but a real browser.</p> <p>Until recently, I thought this was fairly theoretical, but last week an HTTP Toolkit user showed me a real-world example, where non-browser traffic is blocked completely, based just on its TLS fingerprint, causing big problems for HTTP Toolkit usage.</p> <p>Fortunately, we can work around this. 
In this article, I want to explain how TLS fingerprinting works, look at a real-world example, and then see exactly how you can defeat this blocking using Node.js (with techniques that you can easily apply elsewhere too).</p> <h2 id="howdoestlsfingerprintingwork">How does TLS fingerprinting work?</h2> <p>TLS provides a huge amount of data for fingerprinting. Every connection secured by TLS (for example, all HTTPS requests) starts with a 'client hello' message from the client, sent unencrypted, with the essential details that the client proposes for the connection. This client hello includes many parameters that can be specified in multiple equivalent ways, and which are independent of the specific server you're talking to.</p> <p>If you want to see this data up close, this byte-by-byte walkthrough of a TLS connection is a great demo: <a href="https://tls13.xargs.org/">tls13.xargs.org</a></p> <p>Some of the specific fingerprintable parameters in a client hello include:</p> <ul> <li>A list of ciphers, in the client's order of preference (<a href="https://ciphersuite.info/cs/?security=secure">ciphersuite.info</a> lists 60 options that are currently considered 'secure', a typical client might send 20 options)</li> <li>A list of TLS extensions, in an arbitrary order (a typical client hello includes about 15)</li> <li>A list of elliptic curves ordered by client preference (typically 4 options)</li> <li>A list of elliptic curve formats ordered by preference (with 3 possible options defined)</li> </ul> <p>(As I understand it - I am not a cryptographer - elliptic curves are a way to use keys backed by more difficult maths problems than RSA, thereby increasing security without increasing the size of the encryption keys, which are already pretty big. 
The details don't matter here anyway though; it's just a parameter you need for modern TLS setup)</p> <p>If you take the ids of each of these parameters in order, and hash the resulting string, then you get a simple but unique id for a client hello, which is likely to be the same for every server that that client connects to. The specific algorithm generally used is called JA3, defined more precisely <a href="https://github.com/salesforce/ja3#how-it-works">here</a>.</p> <p>How unique is that id? With a quick bit of maths we can check the combinations of these. Assuming a typical 20 ciphers, 15 extensions, 4 curves and 3 curve formats, we get:</p> <pre><code>CipherPermutations = 20!
ExtensionPermutations = 15!
CurvePermutations = 4!
CurveFormatPermutations = 3!
TotalPermutations = 20! × 15! × 4! × 3! ≈ 4.6 × 10^32
</code></pre> <p>That's roughly 4.6 × 10^32 ways of sending a typical client hello.</p> <p>The orders of the ciphers and curve parameters do have an effect, so those hellos are not all completely equivalent, and so this is an upper bound on the possible hellos you could ever see. In reality, it's unlikely that you'll see many hellos that prefer slow &amp; insecure ciphers over quick &amp; more secure ones. That said, extension order is totally arbitrary. Using that ordering alone (15 factorial permutations) gives you 1.3 trillion different possibilities, and there is definitely some real variance in the order of the ciphers &amp; curve parameters in addition to that.</p> <p>So yes, this is pretty unique.</p> <p>Taking some quick examples, it's possible to recognize &amp; differentiate TLS client hellos from:</p> <ul> <li>Firefox 94: 2312b27cb2c9ea5cabb52845cafff423</li> <li>Firefox 87: bc6c386f480ee97b9d9e52d472b772d8</li> <li>Chrome 97: b32309a26951912be7dba376398abc3b</li> <li>Chrome 70: 5353c0796e25725adfdb93f35f5a18f7</li> <li>Tor: e7d705a3286e19ea42f587b344ee6865</li> <li>Trickbot C2 malware: 8916410db85077a5460817142dcbc8de</li> <li>Dridex malware: 51c64c77e60f3980eea90869b68c58a8</li> <li>cURL 7.68: eaa1a9e1db47ffcca16305566a6efba4</li> <li>Python urllib 3.8: 443dc2089573571e9e8a30d49e52572a</li> <li>Node.js v10: 5d1b45c217fe17488ef0a688cf2cc497</li> <li>Node.js v12/14/16: c4aac137ff0b0ac82f3c138cf174b427</li> <li>Node.js v17: 4c319ebb1fb1ef7937f04ac445bbdf86</li> </ul> <p>So it's not perfect, but we can differentiate a lot of interesting clients here. You can check your own JA3 TLS fingerprint via <a href="https://ja3.zone/">ja3.zone</a>, and you can search previously seen fingerprints to see the associated user agents there too.</p> <p>Again, this fingerprinting is based on <em>un</em>-encrypted content, which is sent in the initial client hello. 
This fingerprint is visible to everybody on the connection path, not just the target server (so everybody on your local wifi, your ISP, your college, your office IT team, you name it) and it's available before your client has completed TLS setup and sent any application data (e.g. any HTTP requests).</p> <h2 id="whataboutgrease">What about GREASE?</h2> <p><a href="https://datatracker.ietf.org/doc/rfc8701/">GREASE</a> (Generate Random Extensions And Sustain Extensibility) is a neat new technique designed to stop <a href="https://en.wikipedia.org/wiki/Protocol_ossification">protocol ossification</a> (more poetically, GREASE will stop TLS from 'rusting shut'), which is also relevant here.</p> <p>In the past, TLS has had problems evolving, because many middleboxes (proxies and other network infrastructure) that passed TLS traffic through also examined it, using hardcoded values and with no planning for possible future extensions of TLS at all. Because of this, some changes in the TLS protocol like using a new version number or adding new extension types became completely impossible - if you tried to send a TLS request with version 1.3 in the client hello, many networks would reject it outright, and because of this TLS 1.3 still continues to use TLS 1.0's version number, and makes only backward-compatible additions.</p> <p>This is clearly bad, and GREASE is designed to stop it getting worse in future. The idea is to preemptively generate and include invalid random values in client hellos, such as non-existent ciphers and extensions, which don't have any effect, but ensure that all new networking code has to be able to handle unexpected values. 
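As a sketch of what those random values look like: RFC 8701 reserves sixteen 16-bit values of the form 0xNANA for exactly this purpose (the reserved values are from the RFC; this little helper is illustrative, not taken from any real TLS stack):

```javascript
// RFC 8701 reserves 0x0a0a, 0x1a1a, ... 0xfafa as GREASE values. A client
// picks one at random and advertises it as a fake cipher suite or
// extension id, which servers must ignore rather than choke on.
function randomGreaseValue() {
    const n = Math.floor(Math.random() * 16); // 0-15: the 'N' nibble
    return n * 0x1010 + 0x0a0a;               // yields 0xNaNa
}

console.log('0x' + randomGreaseValue().toString(16));
```

Servers that hardcode their expectations fail immediately against such hellos, so ossifying bugs get caught early rather than calcifying into the ecosystem.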
This is in active use in Chrome for TLS and others, and the same technique is being used in new protocols, for example Cloudflare <a href="https://twitter.com/SimmerVigor/status/1261952795336458240">uses it for all HTTP/3 connections</a>.</p> <p>In theory, although this isn't the main intention, this would break TLS fingerprinting too - since clients include random values, you can't get a reliable hash. In practice though, it doesn't help, as you can still generate consistent TLS fingerprints by simply excluding any unrecognized values before computing the hash.</p> <p>You could extend GREASE to randomize the order of extensions, which would help a lot, but this isn't the main goal of GREASE, and it seems unlikely that popular clients like Chrome are going to start doing that any time soon.</p> <h2 id="tlsfingerprintinginthewild">TLS fingerprinting in the wild</h2> <p>While I've been aware of and interested in this for a while, I didn't think it was a major concern for HTTP Toolkit.</p> <p>It is practical for real-world use, but the <a href="https://blog.squarelemon.com/tls-fingerprinting/">original research</a> into this in 2015 was focused on malware detection in the presence of encrypted malware traffic. It's also used by <a href="https://blog.cloudflare.com/monsters-in-the-middleboxes/">Cloudflare</a> for TLS interception research and metrics, and by Salesforce and others (unclear how, but Tor &amp; malware detection is <a href="https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967">discussed</a>). None of that should matter for intercepting your own HTTPS on your own network.</p> <p>In reality though, it turns out it is used more widely: Akamai use TLS fingerprinting to block bot traffic at the CDN level for some of their customers. 
I don't know for sure, but I suspect this is part of their optional <a href="https://www.akamai.com/products/bot-manager">bot management</a> features.</p> <p>An easy example is available on <a href="https://en.zalando.de/">the Zalando website</a> (a major online retailer based in Germany).</p> <p>This page makes various requests to the Zalando API, to endpoints like <code>https://en.zalando.de/api/navigation</code>, which you can see from your browser console. These work happily in the browser, returning a 200 response with some JSON data.</p> <p>Unfortunately, if you make the same request from Node.js (in Node v12 - v16), it fails:</p> <pre><code class="javascript language-javascript">const request = require('https').get('https://en.zalando.de/api/navigation');
request.on('response', (res) =&gt; {
    console.log(res.statusCode); // Prints 403

    res.pipe(process.stdout);
    /*
    Prints:
    {
        "edge_error": "halt",
        "ref_id": "15.72843445.1638543455.958cf4d0",
        "wait": 60,
        "feedback": {
            "email": true,
            "url": "",
            "recaptcha": {
                "enabled": false,
                "type": 0,
                "sitekey": ""
            }
        }
    }

    */
})
</code></pre> <p>We get an immediate 403 forbidden response, and an <code>edge_error: halt</code> message from the CDN.</p> <p>(I've omitted copying the full headers from the browser here for brevity, but you can trust me that the same thing happens even when all the exact same browser headers are included too)</p> <p>This applies in all currently supported LTS versions (Node.js v12 to v16) but not in Node.js v10 or v17 (which have different fingerprints) even though they're all sending the exact same content.</p> <p>Zalando's API, backed by <code>akamaiedge.net</code>, is actively detecting &amp; rejecting requests from Node.js clients, regardless of the content, based on the TLS fingerprint.</p> <p>What can we do about this?</p> <h2 id="defeatingtlsfingerprinting">Defeating TLS fingerprinting</h2> <p>In a perfect world, you'd beat this by exactly matching somebody else's TLS fingerprint. If you make connections with the exact same TLS fingerprint as the latest version of Chrome, nobody can tell the difference. In theory, you can match any client you like.</p> <p>In practice, this is not always possible. Many TLS implementations (including Node's) don't allow you to configure low-level details that aren't semantically meaningful, such as the order the TLS extensions are set in the client hello. This isn't unreasonable - outside of TLS fingerprinting there is never any reason to do so whatsoever - but it does create some limitations.</p> <p>Fortunately, you still have options. There's one major advantage for clients when trying to defeat TLS fingerprint blocks: nobody can implement them with a fixed list of allowed fingerprints. They have to use a blocklist instead, because new fingerprints appear frequently. 
Fingerprints often change when new browser versions come out, with new versions of other client runtimes and TLS libraries, and simply because a client has set any individual TLS parameter for themselves.</p> <p>That means you can quite reliably defeat TLS fingerprint blocking by simply coming up with a configuration that generates a new fingerprint the server doesn't yet block.</p> <p>In Node.js, while we can't do this by changing the extension order, there are APIs available to set the specific cipher order, which is enough to change the fingerprint. You can do this using the <code>ciphers</code> option to any API that takes TLS options (including <code>https.get()</code> and similar). You do have to be careful though - if you change the ciphers incorrectly, you can enable outdated or broken ciphers and seriously weaken the security of your outgoing requests.</p> <p>The default set of ciphers in Node.js 14 is:</p> <pre><code class="javascript language-javascript">&gt; require('crypto').constants.defaultCoreCipherList.split(':')
[
  'TLS_AES_256_GCM_SHA384',
  'TLS_CHACHA20_POLY1305_SHA256',
  'TLS_AES_128_GCM_SHA256',
  'ECDHE-RSA-AES128-GCM-SHA256',
  'ECDHE-ECDSA-AES128-GCM-SHA256',
  'ECDHE-RSA-AES256-GCM-SHA384',
  'ECDHE-ECDSA-AES256-GCM-SHA384',
  'DHE-RSA-AES128-GCM-SHA256',
  'ECDHE-RSA-AES128-SHA256',
  'DHE-RSA-AES128-SHA256',
  'ECDHE-RSA-AES256-SHA384',
  'DHE-RSA-AES256-SHA384',
  'ECDHE-RSA-AES256-SHA256',
  'DHE-RSA-AES256-SHA256',
  'HIGH',
  '!aNULL',
  '!eNULL',
  '!EXPORT',
  '!DES',
  '!RC4',
  '!MD5',
  '!PSK',
  '!SRP',
  '!CAMELLIA'
]
</code></pre> <p>(No idea what any of these mean? You can look them up easily at <a href="https://ciphersuite.info/">ciphersuite.info</a>)</p> <p>If you pass these (as a colon-separated string) as the <code>ciphers</code> option on a request, changing the order of these in any way, then you'll get a new TLS fingerprint, and fingerprint blocks will happily let you through. But how should you change them?</p> <p>Those first 3 <code>TLS_</code> prefixed ciphers are all modern and strongly recommended TLS v1.3 ciphers with no known current issues, and all modern clients will include those 3 as their first three options, in some order. While Node.js picks the order above, any order of those is a pretty safe &amp; reasonable bet. In a quick test on my machine, it seems like:</p> <ul> <li>cURL 7.68 uses the same order as Node.js</li> <li>Firefox uses <code>TLS_AES_128_GCM_SHA256</code> (#3) then <code>TLS_CHACHA20_POLY1305_SHA256</code> (#2) then <code>TLS_AES_256_GCM_SHA384</code> (#1)</li> <li>Chrome uses <code>TLS_AES_128_GCM_SHA256</code> (#3) then <code>TLS_AES_256_GCM_SHA384</code> (#1) then <code>TLS_CHACHA20_POLY1305_SHA256</code> (#2)</li> </ul> <p>The specific order has various performance &amp; security trade-offs, and if you're writing extremely security or performance sensitive software you should absolutely investigate that in more detail, but in most software any ordering of these 3 is totally fine, secure, and performant. That means we can shuffle them up!</p> <p>In practice, that means you can beat TLS fingerprinting in Node.js with an implementation like:</p> <pre><code class="javascript language-javascript">const tls = require('tls');
const https = require('https');

const defaultCiphers = tls.DEFAULT_CIPHERS.split(':');
const shuffledCiphers = [
    defaultCiphers[0],
    // Swap the 2nd &amp; 3rd ciphers:
    defaultCiphers[2],
    defaultCiphers[1],
    ...defaultCiphers.slice(3)
].join(':');

const request = https.get('https://en.zalando.de/api/navigation', {
    ciphers: shuffledCiphers
}).on('response', (res) =&gt; {
    console.log(res.statusCode); // Prints 200
});
</code></pre> <p>Bingo.</p> <p>You can of course go further: reordering the later ciphers (with research to ensure the resulting combination is secure for your use case).</p> <p>You can even put any number of duplicate ciphers in the list, which will still affect the fingerprint. As far as I can tell from reading the TLS RFCs this is entirely valid! It's technically possible that you could find servers which reject this, but I think it's very unlikely, and by injecting randomly ordered duplicates you can generate an arbitrary number of entirely random fingerprints at will.</p> <p>These techniques should be enough to get you past any TLS fingerprint block you run into, and I'm working on integrating them into the next HTTP Toolkit release as we speak.</p> <p>It is unfortunate that perfectly matching a TLS fingerprint or generating a huge set of equivalent random values (by randomizing the extensions order) isn't possible in Node.js. This is possible in other low-level languages such as Go, where libraries like <a href="https://github.com/refraction-networking/utls">uTLS</a> allow for direct manipulation of the client hello for these purposes, and if you're using other languages it's worth investigating that in more detail. In theory you could alternatively write a native extension for Node.js to do the same, but it's not a quick job, and I suspect the above will be enough to stay ahead of any TLS fingerprint blocks anyway for at least the foreseeable future.</p> <p><strong>Spend lots of time sending & debugging HTTPS? <a href="https://httptoolkit.com/">Try out HTTP Toolkit</a>. One-click HTTP(S) interception, inspection & mocking for browsers, Android apps, Docker containers, Node.js/Python/Ruby/Java applications & more.</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/tls-fingerprinting-node-js/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/tls-fingerprinting-node-js/</guid>
            <pubDate>Tue, 07 Dec 2021 13:40:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Reverse engineering & modifying Android apps with JADX & Frida]]></title>
            <description><![CDATA[<p>I get a <em>lot</em> of emails from users who want to know exactly what their favourite Android app is doing, and want to tweak and change how that works for themselves.</p> <p>There are some great tools to do this, including <a href="https://github.com/skylot/jadx/">JADX</a> &amp; <a href="https://frida.re/">Frida</a>, but using these is complicated, and every reverse engineering problem has its own unique challenges &amp; solutions. There are few good guides to getting started, and even fewer guides on the advanced tricks available.</p> <p>In this article, I want to talk you through the core initial steps to look inside any Android app, give you the tools to find &amp; understand the specific code that matters to you, and then show you how you can use that information to modify the app for yourself.</p> <p>Let's set the scene first.</p> <h2 id="context">Context</h2> <p>I'm assuming here that somebody else has written an Android app that you're interested in. 
You want to know exactly how a specific bit of behaviour works, and you want to change what it's doing.</p> <p>I'm going to focus on the classic HTTP Toolkit user example here of <a href="https://security.stackexchange.com/questions/29988/what-is-certificate-pinning">certificate pinning</a>: where security-conscious apps that send HTTPS traffic go beyond the normal HTTPS validation requirements, and actively check that the HTTPS certificates used are from a small set of <em>specific</em> trusted certificates, not just the standard set trusted by all Android devices.</p> <p><em>(I'm focusing on certificate pinning because it's a common use case and it's convenient, but the techniques here work for all other kinds of reverse engineering & patching too, don't worry!)</em></p> <p>Certificate pinning is a problem for <strong><a href="https://httptoolkit.com/android/">HTTP Toolkit</a></strong> users, who are trying to intercept HTTPS traffic to see what messages their Android apps are sending &amp; receiving. It's not possible to intercept these apps' traffic because they won't trust HTTP Toolkit's certificate, even after it's been injected into the device's system certificate store.</p> <p>Using the tools we're going to talk about in a moment we can take an unknown 3rd party app, find the certificate pinning code within it, and disable that remotely while the app runs on our device. This makes it possible to intercept, inspect &amp; mock all of its traffic in any way we like!</p> <p>This isn't easy, but it's usually not necessary. For starters, 99% of apps don't use certificate pinning beyond Android's <a href="https://httptoolkit.com/blog/intercepting-android-https/">standard restrictions</a>, and in that case, if you use HTTP Toolkit on a rooted device, you're done in one click. 
For most apps that do explicitly pin their certificates, you can disable that using <a href="https://httptoolkit.com/blog/frida-certificate-pinning/">this general-purpose Frida script</a> which already knows how to disable all the most popular cert pinning libraries available.</p> <p>In some cases though apps implement their own custom certificate pinning logic, or do something else unusual, which means the general-purpose script can't recognize and disable the right APIs. In these kinds of cases, or if you're trying to modify any other kinds of app behaviour, you need to roll up your sleeves and get your hands dirty.</p> <p>For this article, I've prepped an <a href="https://github.com/httptoolkit/android-ssl-pinning-demo/">certificate pinning demo app</a>:</p> <p><img src="https://httptoolkit.com/images/posts/ssl-pinning-demo.png" alt="A screenshot of the certificate pinning demo app"></p> <p>Each button sends an HTTPS request, and validates the connection in a slightly different way. The 'unpinned' option does nothing, the next 4 use various standard pinning techniques, and the last button uses totally custom code to manually check the certificate.</p> <p>If you use this with HTTP Toolkit normally, you can only intercept the first request. 
If you use the general-purpose Frida script, you can intercept the next 4 too, but not the last one.</p> <p>In this article we're going to focus on that last button, reverse engineer this app to see how it works, and write a custom Frida script to disable the certificate checking functionality.</p> <h2 id="theplan">The Plan</h2> <p>To reverse engineer an app and hook some behaviour, there are a few core steps you need to work through:</p> <ol> <li>Download a copy of the app to your computer</li> <li>Extract the source code</li> <li>Find the code we're interested in</li> <li>Understand how that code works</li> <li>Write a Frida hook to change how that code works</li> </ol> <h2 id="downloadtheapp">Download the app</h2> <p>Android apps are generally published to the Google Play store, but you can't easily download the app from there directly to mess around with on your computer.</p> <p>Fortunately, there are many sites that mirror the Google Play store and provide direct downloads of almost all available apps. <a href="https://apkmirror.com">ApkMirror.com</a> and <a href="https://apkpure.com">ApkPure.com</a> are two good examples.</p> <p>In the general case, you should go to your favourite APK mirror site, and download the latest APK for the app you're interested in.</p> <p>In this specific case, I wrote the app, so I've conveniently published it directly on GitHub. You can download its APK <a href="https://github.com/httptoolkit/android-ssl-pinning-demo/releases/latest">here</a>.</p> <h3 id="androidappformats">Android app formats</h3> <p>What is this APK file? Let's start with some quick but necessary background on Android app formats. 
There are two distribution formats you'll run into: APKs (older) and XAPKs (newer, also known as Android App Bundles).</p> <p>In this example, the app is provided as a single APK, so that's easy enough, but many other apps you'll run into may be XAPKs, so it's worth understanding the difference.</p> <p>APKs are fairly simple: they're a ZIP file with a bunch of metadata, all the application's assets &amp; config files, and one or more binary <code>.dex</code> files, which contain the compiled application.</p> <p>XAPKs are more complicated: they're a ZIP file that contains multiple APKs. In practice, they'll contain one large primary APK, with the main application code &amp; resources, and then various small APKs which include the config or resources only relevant to certain types of devices. There might be separate config APKs for devices with larger screens, or different CPU architectures. For reverse engineering you usually just need the main APK, and you can ignore the rest.</p> <h2 id="extractthecode">Extract the code</h2> <p>Inside the APK, if you open it as a ZIP, you'll find a large <code>classes.dex</code> file (for <a href="https://developer.android.com/studio/build/multidex">multidex</a> apps, there might even be a <code>classes2.dex</code> or more). These DEX files contain all the JVM classes of the application, in the compiled bytecode format used by Android's Runtime engine (ART, which replaced Dalvik a few years back).</p> <p>These DEX files contain the compiled application, but do not contain all the original source. Many things, most notably local variable names &amp; comments, are lost when compiling an Android application, and it's impossible to recover those from the app.</p> <p>The external interfaces of each class are generally present here though (assuming that <a href="https://developer.android.com/studio/build/shrink-code#obfuscate">obfuscation</a> wasn't used). 
That will usually be enough to find the method that you're interested in. Using those external interfaces you can usually then deduce what each line is trying to do, and progressively rename variables and add your own comments until you have some code that makes sense.</p> <p>To start that process, we need to convert the DEX file into a format we can mess around with ourselves. The best tool to do this is <a href="https://github.com/skylot/jadx">JADX</a>, which you can download from their <a href="https://github.com/skylot/jadx/releases/latest">GitHub release page</a> (or there are many other similar tools too, such as <a href="https://github.com/androguard/androguard">Androguard</a>).</p> <p>Once JADX is installed, you run it like so:</p> <pre><code>jadx ./pinning-demo.apk
</code></pre> <p>This will create a folder with the same name as the APK, containing 'resources' and 'sources' folders. The sources folder is what we're interested in: this is JADX's best guess at the Java source code that would've generated this DEX file. It's not perfect, but it should be pretty close.</p> <p>If you use JADX on <a href="https://github.com/httptoolkit/android-ssl-pinning-demo/releases/latest">the latest pinning demo APK</a>, you'll find a structure like this:</p> <ul> <li>sources/<ul> <li>android/ - the core Android classes</li> <li>androidx/ - Android Jetpack classes</li> <li>com/<ul> <li>android/volley/ - The Volley HTTP client</li> <li>datatheorem/android/trustkit - One of the popular pinning libraries used</li> <li>google/ - Firebase, GSON &amp; various other Google packages</li></ul></li> <li>kotlin/ - runtime components of Kotlin</li> <li>okhttp3/ - OkHttp3, a popular HTTP library</li> <li><em>[…various other namespaces & packages]</em></li> <li>tech/httptoolkit/pinning_demo/ - <strong>the main application code</strong></li></ul></li> </ul> <p>Once you've extracted the code from an app like this, you can explore it any way you like - using Android Studio, using any other text editor, or just grepping for interesting text, it's up to you. In general, I'd recommend using some editor that can highlight and do basic automated refactoring (variable renaming) on Java code, since that'll make the next steps much easier.</p> <h2 id="findthecodeyoucareabout">Find the code you care about</h2> <p>Which code you want to reverse engineer &amp; hook depends on the problem you're trying to solve. In my case, the problem is that when I intercept the app's HTTP using <strong><a href="https://httptoolkit.com/android/">HTTP Toolkit</a></strong> and press the "Manually pinned request" button, I get a "Certificate rejected" message in HTTP Toolkit, and I want to stop that happening.</p> <p>That message typically means that the app is pinning a certificate - i.e. 
even though the HTTP Toolkit certificate is trusted on the device, the app is including its own custom checks, which are rejecting the HTTPS certificates and blocking HTTP Toolkit's automatic HTTP interception.</p> <p>So, the goal here is to find out which bit of code is making the custom-checked HTTPS request behind that last button, work out where it checks the certificate, and then disable that check.</p> <p><img src="https://httptoolkit.com/images/posts/ssl-pinning-manual-button.png" alt="A screenshot of the manually pinned request button"></p> <p>Whatever code you want to change in your case, there are a lot of tricks available to help you hunt it down. Let's try out a few different approaches on this demo app.</p> <h3 id="searchforrelevantstrings">Search for relevant strings</h3> <p>In my case, I know the failing request is going to <code>sha512.badssl.com</code> (a known-good HTTPS test site) so searching for that is a good start. That works, and gives me a few different places in the code that are sending requests, but there are options here for all the different possible pinning mechanisms, and related config files too. It's not immediately clear which code is relevant, so it'd be better to find something more precise.</p> <p>Some other strings that might be interesting, for the certificate pinning case:</p> <ul> <li>checkCert</li> <li>validateCert</li> <li>pinning</li> <li>pinner</li> <li>certificate</li> <li>SSL</li> <li>TLS</li> </ul> <p>Here you're looking for anything that might be included in the name of a class, field or method, or which might be included in strings (e.g. error messages), since all of that will be preserved and searchable in the decompiled code.</p> <p>For example, if you're trying to understand where some HTTP API data comes from, you could try searching for the API endpoint path, or the name of query parameters. 
If you're looking for the implementation of a specific algorithm, it's worth searching for the common domain terms in that algorithm, or if you're trying to extract secrets or keys from the app then 'secret', 'key', and 'auth' are all worth investigating.</p> <h3 id="searchforusageofrelevantjavaapis">Search for usage of relevant Java APIs</h3> <p>Although local variable names aren't available, and in obfuscated apps even the class &amp; package names may be obscured, the built-in JVM classes &amp; package names are always available and unchanged.</p> <p>That means they're a great way to find related functionality. If you know the code you're interested in is likely to be using a certain data type, calling a specific API, or throwing a certain type of exception, you can use that to immediately narrow down your search.</p> <p>In this example, I think it's likely that all manual certificate checks are going to be using <code>java.security.cert.X509Certificate</code>, so I can search for usages of that type. This does give some good answers!</p> <p>Unfortunately though, this app is deliberately filled with lots of different ways to do certificate pinning, so this still comes back with a long list of matches, and it's not immediately easy to tell which is relevant. In most other apps that won't be a problem (most apps implement certificate pinning just the once!) and we could trawl through the results, but for now it's better to test out some other options first.</p> <h3 id="checkforhttperrorreports">Check for HTTP error reports</h3> <p>Many apps nowadays include automatic error reporting using tools like <a href="https://sentry.io/">Sentry</a>.</p> <p>This is useful to app developers, but also to reverse engineers! Even when the app's own requests may use certificate pinning, requests sent by external libraries like these generally will not, so they're inspectable using HTTP Toolkit (or any other HTTP MitM proxy). 
That's useful because those requests themselves will usually include the stacktrace for any given error.</p> <p>This provides an excellent way to find the source of any errors that you want to work around:</p> <ul> <li>Intercept traffic from your device using <strong><a href="https://httptoolkit.com/android/">HTTP Toolkit</a></strong> or another proxy</li> <li>Trigger the error</li> <li>Look through the captured HTTP traffic for error reports</li> <li>Find the stacktrace in the relevant error report</li> <li>Follow the stacktrace into the codebase extracted earlier to immediately find the relevant code</li> </ul> <p>Bingo! In this case though, we're out of luck, as it's a tiny demo app with no error reporting. More searching required.</p> <h3 id="checkadbforerrors">Check ADB for errors</h3> <p>Very commonly, apps will log errors and extra info to the console for easy debugging. Android captures this output from all running JVM processes in a single output buffer, along with stack traces from all uncaught errors, and makes that accessible via <a href="https://developer.android.com/studio/command-line/adb">ADB</a> using the <code>logcat</code> command.</p> <p>Outputting errors and debug info here is especially common in smaller apps which don't use an automated error reporting tool, so if you're looking to find &amp; change some code that throws errors it's a great alternative to the previous approach. Even in non-error cases, the output here can provide excellent clues about application behaviour at the moments you're interested in.</p> <p>To capture the logs from a device, run:</p> <pre><code>adb logcat -T1
</code></pre> <p>This will stream the live logs from your device, without the history, until you stop it. It's often useful to pipe this to a file instead (i.e. <code>... &gt; logs.txt</code>) to save it for more detailed analysis later, since there can be a lot of noise here from other activity on the device.</p> <p>While this command is running, if you reproduce your error, you'll frequently find useful error stacktraces or error messages, which can then guide you to the right place in the code.</p> <p>For our demo app, this works <em>great</em>. If we capture the logs while pressing the button, and look carefully between the other noisy output, we can find the specific error message unique to that button:</p> <pre><code class="bash{10} language-bash{10}">&gt; adb logcat -T1
--------- beginning of main
...
11-22 10:46:16.478 31963 31963 I Choreographer: Skipped 114 frames!  The application may be doing too much work on its main thread.
11-22 10:46:16.996  1785  1785 D BluetoothGatt: close()
11-22 10:46:16.997  1785  1785 D BluetoothGatt: unregisterApp() - mClientIf=5
11-22 10:46:17.000   791  1280 I bt_stack: [INFO:gatt_api.cc(1163)] GATT_CancelConnect: gatt_if:5, address: 00:00:00:00:00:00, direct:0
11-22 10:46:17.092   573   618 D LightsService: Excessive delay setting light
11-22 10:46:17.258   282   286 E TemperatureHumiditySensor: mCompEngine is NULL
11-22 10:46:18.773 26029 26129 I System.out: java.lang.Error: Unrecognized cert hash.
11-22 10:46:19.034 26029 26080 W Adreno-EGL: &lt;qeglDrvAPI_eglGetConfigAttrib:607&gt;: EGL_BAD_ATTRIBUTE
...
</code></pre> <p>We can search the codebase for this <code>Unrecognized cert hash</code> error message, and conveniently that message is shown in exactly one place. This error appears deep inside <code>invokeSuspend</code> in <code>MainActivity$sendManuallyCustomPinned$1.java</code>:</p> <pre><code class="kotlin language-kotlin">throw new Error("Unrecognized cert hash.");
</code></pre> <h3 id="explorethecodeindepth">Explore the code in depth</h3> <p>Still stuck? At this point, your best bet is to try and explore the application more generally, or to explore around the best clues you've found so far.</p> <p>To do so, you can use the manifest (in <code>resources/AndroidManifest.xml</code>) to find the entrypoints for every activity and background service registered in the application. Start with the services (i.e. background processes) or activities (i.e. a visible page of the UI) that sound most relevant to your situation, open up the corresponding source, and start digging.</p> <p>This can be time consuming. Keep going! You don't need to dig into every detail, but walking through here can quickly give you an idea of the overall architecture of the app, and you can often use this to find the code that's relevant to you. It's well worth keeping notes &amp; adding inline comments as you go to keep track of the process.</p> <h2 id="understandthecode">Understand the code</h2> <p>Hopefully by this point you've found the code that's relevant to you. In this demo app, that code decompiled by JADX looks like this:</p> <pre><code class="java{46} language-java{46}">public final Object invokeSuspend(Object obj) {
    IntrinsicsKt.getCOROUTINE_SUSPENDED();
    if (this.label == 0) {
        ResultKt.throwOnFailure(obj);
        this.this$0.onStart(R.id.manually_pinned);
        boolean z = true;
        try {
            TrustManager[] trustManagerArr = {new MainActivity$sendManuallyCustomPinned$1$trustManager$1()};
            SSLContext instance = SSLContext.getInstance("TLS");
            instance.init(null, trustManagerArr, null);
            Intrinsics.checkExpressionValueIsNotNull(instance, "context");
            Socket createSocket = instance.getSocketFactory().createSocket("untrusted-root.badssl.com", 443);
            if (createSocket != null) {
                SSLSocket sSLSocket = (SSLSocket) createSocket;
                SSLSession session = sSLSocket.getSession();
                Intrinsics.checkExpressionValueIsNotNull(session, "socket.session");
                Certificate[] peerCertificates = session.getPeerCertificates();
                Intrinsics.checkExpressionValueIsNotNull(peerCertificates, "certs");
                int length = peerCertificates.length;
                int i = 0;
                while (true) {
                    if (i &gt;= length) {
                        z = false;
                        break;
                    }
                    Certificate certificate = peerCertificates[i];
                    MainActivity mainActivity = this.this$0;
                    Intrinsics.checkExpressionValueIsNotNull(certificate, "cert");
                    if (Boxing.boxBoolean(mainActivity.doesCertMatchPin(MainActivityKt.BADSSL_UNTRUSTED_ROOT_SHA256, certificate)).booleanValue()) {
                        break;
                    }
                    i++;
                }
                if (z) {
                    PrintWriter printWriter = new PrintWriter(sSLSocket.getOutputStream());
                    printWriter.println("GET / HTTP/1.1");
                    printWriter.println("Host: untrusted-root.badssl.com");
                    printWriter.println("");
                    printWriter.flush();
                    System.out.println((Object) ("Response was: " + new BufferedReader(new InputStreamReader(sSLSocket.getInputStream())).readLine()));
                    sSLSocket.close();
                    this.this$0.onSuccess(R.id.manually_pinned);
                    return Unit.INSTANCE;
                }
                sSLSocket.close();
                throw new Error("Unrecognized cert hash.");
            }
            throw new TypeCastException("null cannot be cast to non-null type javax.net.ssl.SSLSocket");
        } catch (Throwable th) {
            System.out.println(th);
            this.this$0.onError(R.id.manually_pinned, th.toString());
        }
    } else {
        throw new IllegalStateException("call to 'resume' before 'invoke' with coroutine");
    }
}
</code></pre> <p>There's a lot going on here! The original code (<a href="https://github.com/httptoolkit/android-ssl-pinning-demo/blob/v1.2.1/app/src/main/java/tech/httptoolkit/pinning_demo/MainActivity.kt#L221-L266">here</a>) is written in Kotlin and uses coroutines, which adds a lot of extra noise to the compiled output.</p> <p>Fortunately, we don't need to understand everything. To change this behaviour, we just need to work out what code paths could lead to the highlighted line above, where the error is thrown.</p> <p>As you can see here, JADX has taken some best guesses at the variable names involved in this code, inferring them from the types created (e.g. <code>printWriter = new PrintWriter</code>) and from the methods called (<code>peerCertificates = session.getPeerCertificates()</code>). This is pretty clever, and helps a lot to see what's happening.</p> <p>It's not perfect though. You can see this in some inferred variables, like <code>createSocket = instance.getSocketFactory().createSocket("untrusted-root.badssl.com", 443)</code>, where the variable has just taken the name of the method, or the <code>z</code> boolean variable, where no clues were available to infer anything useful at all.</p> <p>If you have experience with code like this it may be easy to see what's happening here, but let's walk through it step by step:</p> <ul> <li>The line we're interested in only runs if <code>z</code> is false, since the preceding <code>if (z)</code> block ends with <code>return</code>.</li> <li>We can rename <code>z</code> to <code>isCertValid</code> (made easier by automated refactoring) and remove some Kotlin boilerplate to make the code immediately clearer, giving us code like: <pre><code class="java language-java">boolean isCertValid = true;
//...
int length = peerCertificates.length;
int i = 0;
while (true) {
    if (i &gt;= length) {
        isCertValid = false;
        break;
    }
    Certificate certificate = peerCertificates[i];
    MainActivity mainActivity = this.this$0;
    if (mainActivity.doesCertMatchPin(MainActivityKt.BADSSL_UNTRUSTED_ROOT_SHA256, certificate)) {
        break;
    }
    i++;
}
if (isCertValid) {
    // ...
    return Unit.INSTANCE;
}
sSLSocket.close();
throw new Error("Unrecognized cert hash.");
</code></pre></li> <li>The block before the <code>if</code> is <code>while (true)</code>, so this code only runs after that loop breaks.</li> <li>The <code>break</code> commands happen after either checking all values (setting <code>isCertValid</code> to false) or after <code>doesCertMatchPin</code> returns true for one value.</li> <li>That means the exception is only thrown when <code>doesCertMatchPin</code> returns false for all values, and that method is indeed what causes our problem.</li> </ul> <p>This gives us a good understanding of the logic here: the code checks every certificate linked to a socket, and calls <code>doesCertMatchPin</code> from the <code>MainActivity</code> class to compare it to <code>BADSSL_UNTRUSTED_ROOT_SHA256</code>.</p> <p>This is an intentionally simple example. Real examples will be more complicated! But hopefully this gives you an idea of the process, and the same techniques of incremental renaming, refactoring and exploring can help you understand more complex cases.</p> <p>It's worth noting that the relatively clear code here isn't always available, usually because obfuscation techniques are used to rename classes, fields &amp; methods throughout the code to random names (<code>a</code>, <code>b</code>…, <code>aa</code>, <code>ab</code>…).</p> <p>In that case, the same process we're discussing here applies, but you won't have many of the names available as clues to start with, so you can only see the overall structure and references to built-in JVM APIs. 
It is still always possible to reverse engineer such apps, but it's much more important to quickly find the precise code that you're interested in before you start, and the process of understanding it is significantly more difficult. That's a topic for another blog post though (watch this space).</p> <h2 id="patchitwithfrida">Patch it with Frida</h2> <p>Once we've found the code, we need to think about how to change it.</p> <p>For our example here, it's easy: we need to make <code>doesCertMatchPin</code> return <code>true</code> every time.</p> <p>Be aware that Frida gives you a lot of power to patch code, but the flexibility is not unlimited. Frida patches are very focused on method implementation replacement, and it's very difficult (if not impossible) to use Frida to patch individual lines within existing methods. You need to look out for method boundaries at which you can change behaviour.</p> <p>For certificate pinning, that's fairly easy, because certificate checks are almost always going to live in a separate method like <code>checkCertificate(cert)</code>, so you can focus on that. In other cases though this can get more complicated.</p> <p>In this specific case, we're looking to patch the <code>doesCertMatchPin</code> function in the <code>tech.httptoolkit.pinning_demo.MainActivity</code> class. Within a Frida script, we first need to get a reference to that method:</p> <pre><code class="javascript language-javascript">const certMethod = Java.use("tech.httptoolkit.pinning_demo.MainActivity").doesCertMatchPin;
</code></pre> <p>Then we need to assign an alternative implementation to that method, like so:</p> <pre><code class="javascript language-javascript">certMethod.implementation = () =&gt; true;
</code></pre> <p>After this patch is applied, the real implementation of that <code>doesCertMatchPin</code> method will never be called, and it'll just return <code>true</code> instead.</p> <p>This is a simple example. There are many more complex things you can do though. Here are some examples:</p> <pre><code class="javascript language-javascript">// Disable a property setter, to stop some fields being changed:
const classWithSetter = Java.use("a.target.class");
classWithSetter.setATargetProperty.implementation = function () {
    return; // Don't actually set the property
};

// Wrap a method, to add extra functionality or logging before and after without
// changing the existing functionality. Note the use of function () rather than
// an arrow function, so that Frida can bind 'this' and 'arguments' correctly:
const classToWrap = Java.use("a.target.class");
classToWrap.methodToWrap.implementation = function () {
    console.log('About to run method');
    // Calling the method on 'this' inside the hook invokes the real implementation:
    const result = this.methodToWrap.apply(this, arguments);
    console.log('Method returned', result);
    return result;
};

// Hook the constructor of an object:
const classToHook = Java.use("a.target.class");
classToHook.$init.implementation = function () {
    // Run the real constructor:
    this.$init.apply(this, arguments);
    // And then modify the initial state of the class however you like before
    // anything else gets access to it (note that Frida exposes Java fields
    // via their '.value' property):
    this.myField.value = null;
};
</code></pre> <p>There's a huge world of options here - those are just some of the basic techniques at your disposal.</p> <p>Once you've found a method you want to patch and you've got an idea how you'll do it, you need to set up Frida (see <a href="https://httptoolkit.com/blog/frida-certificate-pinning/#install-and-start-frida-on-the-device">this guide</a> if you haven't done so already) to test it out. Once Frida is working you can test out your patch interactively, and tweak it live to get it working.</p> <p>For example, to test out our demo hook above:</p> <ul> <li>Attach HTTP Toolkit to the device</li> <li>Run the app, check that the "Manually pinned request" button fails and shows a certificate error in HTTP Toolkit.</li> <li>Start Frida server on the device</li> <li>Restart your application with Frida attached by running: <code> frida -U -f tech.httptoolkit.pinning_demo </code></li> <li>This will start the app, and give you a REPL to run Frida commands</li> <li>Run <code>Java.perform(() =&gt; console.log('Attached'))</code> to attach this process to the VM &amp; class loader (it'll pause briefly, then log 'Attached').</li> <li>Test out some hooks. For our demo app, for example, you can hook the certificate pinning function by running: <code>Java.use("tech.httptoolkit.pinning_demo.MainActivity").doesCertMatchPin.implementation = () =&gt; true; </code></li> <li>Clear the logs in HTTP Toolkit, and then press the "Manually pinned request" button again</li> <li>It works! The button should go green, and the full request should appear successfully in HTTP Toolkit.</li> </ul> <p>Once you've got something that works in a REPL, you can convert it into a standalone script, like so:</p> <pre><code class="javascript language-javascript">Java.perform(() =&gt; {
    console.log("Patching...");
    const mainActivityClass = Java.use("tech.httptoolkit.pinning_demo.MainActivity");
    const certMethod = mainActivityClass.doesCertMatchPin;
    certMethod.implementation = () =&gt; true;
    console.log("Patched");
});
</code></pre> <p>and then you can run this non-interactively with Frida using the <code>-l</code> option, for example:</p> <pre><code class="bash language-bash">frida -U -f tech.httptoolkit.pinning_demo -l ./frida-script.js
</code></pre> <p>That command will restart the app with the script injected immediately, so that the certificate pinning behind this button is unpinned straight away, and tapping the button will always show a successful result:</p> <p><img src="https://httptoolkit.com/images/posts/ssl-pinning-manual-button-success.png" alt="The manually pinned request button showing a successful result"></p> <p>If you want examples of more advanced Frida behaviour, take a look through <a href="https://github.com/httptoolkit/frida-android-unpinning/blob/main/frida-script.js">my cert unpinning script</a> for unpinning examples covering every popular library and some other interesting cases, or check out <a href="https://github.com/iddoeldor/frida-snippets">this huge selection of Frida snippets</a> demonstrating all sorts of other tricks and APIs available.</p> <p>I hope this helps you to reverse engineer, understand &amp; hook Android applications! Have questions or run into trouble? Get in touch <a href="https://twitter.com/pimterry">on Twitter</a>, file issues against <a href="https://github.com/httptoolkit/frida-android-unpinning/">my Frida script</a>, or <a href="https://httptoolkit.com/contact/">send me a message directly</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/android-reverse-engineering/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/android-reverse-engineering/</guid>
            <pubDate>Mon, 22 Nov 2021 12:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[New HTTP standards for caching on the modern web]]></title>
<description><![CDATA[<p>If you run any large public-facing website or web application on the modern web, caching your static content in a CDN or other caching service is super important.</p> <p>It's also remarkably complicated and confusing.</p> <p>Fortunately, the HTTP working group at the Internet Engineering Task Force (IETF) is working to define new HTTP standards to make this better. There's been a lot of work here recently to launch two new HTTP header draft standards intended to make debugging your caching easier, and to provide more control over your cache configuration.</p> <p>Let's see what that means, how these work, and why everyone developing on the web should care.</p> <h2 id="thestandards">The Standards</h2> <p>The two proposed standards I'm talking about are:</p> <ul> <li><a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-cache-header/">The Cache-Status Header</a></li> <li><a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-targeted-cache-control/">Targeted Cache-Control Headers</a></li> </ul> <p>These are designed to update HTTP standards to match the reality of the CDN-powered web that exists today, creating specifications that formalize existing practices from popular CDNs (like Fastly, Akamai &amp; Cloudflare, all of whom have been involved in writing the standards themselves).</p> <p>Both of these are fairly new specifications: Cache-Status has completed multiple rounds of review in 2021 and is currently awaiting (since August) final review &amp; publication as a formal RFC, while Targeted Cache-Control Headers is currently an adopted draft standard, but in its last call for feedback.
In both cases, they're backed by the IETF, they've received a lot of discussion already, and it's unlikely they'll change much beyond this point, but they're also still new, so support isn't likely to be widespread yet.</p> <h2 id="whydoescachingmatter">Why does caching matter?</h2> <p>If you're running a high-profile user-facing web application, caches and CDNs are absolutely critical to providing good performance for end users at a reasonable cost. Caches and CDNs sit in front of your web server, acting as a reverse proxy to ensure that:</p> <ul> <li>Content is cached, so your backend server only receives occasional requests for static content, not one request directly from every visitor.</li> <li>Content delivery is resilient to traffic spikes, because static caches scale far more easily than application servers.</li> <li>Content requests are batched, so 1000 simultaneous cache misses become just a single request to your backend.</li> <li>Content is physically distributed, so responses are delivered quickly no matter where the user is located.</li> </ul> <p>If you're running a high-profile site, all of these are a strict necessity to host content on the modern web. The internet is quite popular now, and traffic spikes and latency issues only become bigger challenges as more of the world comes online.</p> <p>As an example, Troy Hunt has written up <a href="https://www.troyhunt.com/serverless-to-the-max-doing-big-things-for-small-dollars-with-cloudflare-workers-and-azure-functions/">a detailed exploration of how caching works</a> for his popular Pwned Passwords site.
In his case:</p> <ul> <li>477.6GB of subresources get served from his domain every week</li> <li>Of those, 476.7GB come from the cache (99.8% cache hit ratio)</li> <li>The site also receives 32.4 million queries to its API per week</li> <li>32.3 million of those queries are served from the cache (99.6% cache hit ratio)</li> <li>The remaining queries are handled by Azure's serverless functions</li> </ul> <p>In total his hosting costs for this site - handling millions of daily password checks - come in at around 3 cents per day. Handling that traffic all on his own servers would be <em>expensive</em>, and a sensible caching setup can handle this quickly, effectively &amp; cheaply all at the same time. This is a big deal.</p> <h2 id="whatstheproblem">What's the problem?</h2> <p>That's all very well, but creating &amp; debugging caching configurations isn't easy.</p> <p>The main problem is that in most non-trivial examples, there are quite a few layers of caching involved in any one request path. In front of the backend servers themselves, most setups will use some kind of load balancer/API gateway/reverse proxy with its own caching built-in, behind a global CDN that provides this content to the end users from widely distributed low-latency locations. On top of that, the backend servers themselves may cache internal results, enterprises and ISPs may operate their own caching proxies, and many clients (especially web browsers) can do their own caching too (sometimes with their own additional caching layers, like Service Workers, for maximum confusion).</p> <p>Each of those layers will need different caching configuration, for example browsers can cache user-specific data in some cases, but CDNs definitely should not.
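</p> <p>For example, the standard <code>private</code> directive expresses exactly that split: a response marked like this may be kept in the user's browser cache for an hour, but must not be stored by CDNs or any other shared cache:</p>

```
Cache-Control: private, max-age=3600
```

<p>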
You also need cache invalidation to propagate through all these caches, to ensure that new content becomes visible to the end user as soon as possible.</p> <p>Predicting how these layers and their unique configurations will interact is complicated, and there are many ways the result can go wrong:</p> <ul> <li>Your content doesn't get cached at all, resulting in traffic overloading your backend servers.</li> <li>Your content gets cached, but only at a lower layer, not in your distributed CDN.</li> <li>Outdated responses are preserved in your cache for longer than you expect, making it hard to update your content.</li> <li>The wrong responses get served from your cache, providing French content to German users or (much worse) logged in content to unauthenticated users.</li> <li>A request doesn't go through your CDN at all, and just gets served directly from your backend or reverse proxy.</li> <li>Your web site or API is cached inconsistently, serving some old &amp; some new data, creating a Frankenstein combination of data that doesn't work at all.</li> </ul> <p>This is a hot mess.</p> <p>It's made worse because lots of cache configuration lives in the request &amp; response metadata itself, for example the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control">Cache-Control header</a>.
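</p> <p>To make that concrete, here's a minimal sketch (not any particular cache's real implementation) of how a cache might pull those <code>Cache-Control</code> directives out of the response before deciding what to do with it:</p>

```javascript
// A toy sketch of Cache-Control parsing - real caches use a full parser,
// but the principle is the same: the caching rules arrive inside the
// response itself, as a comma-separated list of directives.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(',')) {
    const [name, value] = part.trim().split('=');
    // Valueless directives (like `private`) become boolean flags:
    if (name) directives[name.toLowerCase()] = value !== undefined ? value : true;
  }
  return directives;
}

const directives = parseCacheControl('max-age=600, stale-while-revalidate=300, private');
console.log(directives);
// → { 'max-age': '600', 'stale-while-revalidate': '300', private: true }

// A shared cache (e.g. a CDN) must not store a `private` response at all:
const cacheableByCDN = !directives['private'] && !directives['no-store'];
console.log(cacheableByCDN); // → false
```

<p>This toy parser ignores quoted values, so it's only illustrative - but it shows why the configuration really does travel with the response, through every layer, and so can itself end up cached en route.</p> <p>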
This is very powerful for precise configuration, but also means that the configuration itself is passing through these layers and can be cached en route.</p> <p>If you accidentally cache a single "cache this forever" response without realizing, then you can get yourself into a whole world of pain, and forcibly invalidating every layer of your caches to fix it is harder than it sounds.</p> <figure> <img alt="XKCD's suggestions for refreshing every single cache" src="https://imgs.xkcd.com/comics/refresh_types.png"> <figcaption>There's always <a href="https://xkcd.com/1854/">a relevant XKCD</a></figcaption> </figure> <h2 id="howwillcachestatushelp">How will Cache-Status help?</h2> <p>One of the clear problems here is traceability inside your caching system. For a given response, where did it come from and why?</p> <p>Did this response come from a cache, or from the real server? If it came from a cache, which cache? For how much longer will it keep doing that? If it didn't come from a cache, why not? Was this new response stored for use later?</p> <p>The <a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-cache-header/"><code>Cache-Status</code> response header</a> provides a structure for all that information to be included in the response itself, for all CDNs and other caches that saw the request, all in one consistent format.</p> <p>It looks like this:</p> <pre><code>Cache-Status: OriginCache; hit; ttl=1100, "CDN Company Here"; fwd=uri-miss;
</code></pre> <h3 id="thecachestatusheaderformat">The Cache-Status header format</h3> <p>The header format is:</p> <pre><code>Cache-Status: CacheName; param; param=value; param..., CacheName2; param; param...
</code></pre> <p>It's a series of caches, where each one has zero or more status parameters. The caches are in response order: the first cache is the one closest to the origin server, the last cache is the one closest to the client.</p> <p>It's worth being aware that responses may be saved in the cache with this header, and that the header is then preserved in future responses. Nonetheless, it's possible to use the parameters, reading from the right, to understand where this response was stored this time specifically, and where it came from previously too.</p> <p>The cache-with-parameters values here are separated by commas, while the parameters themselves are separated by a semicolon (this is the <code>sf-list</code> and <code>sf-item</code> syntax from the now-standardized <a href="https://datatracker.ietf.org/doc/rfc8941/">Structured Headers RFC</a>), and cache names can be quoted when they contain otherwise-invalid characters like spaces.</p> <h3 id="cachestatusheaderparameters">Cache-Status header parameters</h3> <p>To explain each cache's behaviour, there are a few parameters and values defined:</p> <ul> <li><code>hit</code> - the response came from this cache without a request being sent upstream</li> <li><code>fwd=&lt;reason&gt;</code> - if set, a request was sent upstream to the next layer. This comes with a reason:<ul> <li><code>fwd=bypass</code> - the cache was configured not to handle this request</li> <li><code>fwd=method</code> - the request must be forwarded because of the HTTP method used</li> <li><code>fwd=uri-miss</code> - there was no matching cached data available for the request URI</li> <li><code>fwd=vary-miss</code> - there was matching cached data for the URI, but a header listed in the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Vary">Vary</a> header didn't match</li> <li><code>fwd=miss</code> - there was no matching cached data available (for some other reason, e.g.
if the cache isn't sure why)</li> <li><code>fwd=stale</code> - there was matching cached data, but it's stale</li> <li><code>fwd=partial</code> - there was matching cached data, but only for part of the response (e.g. a previous request used a <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range">Range</a> header)</li> <li><code>fwd=request</code> - the request explicitly requested non-cached data (e.g. in its Cache-Control headers)</li></ul></li> <li><code>fwd-status=&lt;status&gt;</code> - if <code>fwd</code> was set, this is the response status that was received from the next hop</li> <li><code>stored</code> - if <code>fwd</code> was set, this indicates whether the received response was stored by this cache for later</li> <li><code>collapsed</code> - if <code>fwd</code> was set, this indicates whether the request was collapsed with another request (i.e. not duplicated because an equivalent request was already in progress)</li> <li><code>ttl=&lt;ttl&gt;</code> - for how much longer (in seconds) this response will be considered 'fresh' by this cache</li> <li><code>key</code> - the (implementation-specific) key for the response in this cache</li> <li><code>detail</code> - an extra freeform field for additional implementation-specific information</li> </ul> <p>Using those, we can interpret response headers like:</p> <pre><code>Cache-Status: ExampleCache; hit; ttl=30; key=/abc
</code></pre> <p>This means that the request was received by ExampleCache, which found a response in its cache (under the key /abc) and returned it, and expects to keep doing so for the next 30 seconds.</p> <p>We can also examine more complicated cases, like:</p> <pre><code>Cache-Status:
    Nginx; hit,
    Cloudflare; fwd=stale; fwd-status=304; collapsed; ttl=300,
    BrowserCache; fwd=vary-miss; fwd-status=200; stored
</code></pre> <p><em>(Newlines just for readability)</em></p> <p>This means that the browser sent the request, and didn't use a cached response that it has with the same URI because a header listed in the Vary header didn't match.</p> <p>The request was then received by Cloudflare, who had a matching response cached (a response with <code>Nginx; hit</code>, meaning it itself was a response that came from Nginx's cache) but that response was now stale.</p> <p>To handle that, Cloudflare sent a request to Nginx to revalidate the response, who sent a 304 (Not Modified) response, telling Cloudflare their existing cached response was still valid. The request that was sent was collapsed, meaning that multiple requests came to Cloudflare for the same content at the same time, but only one request was sent upstream. Cloudflare is expecting to now keep serving the now-revalidated data for the next 5 minutes.</p> <p>That's a lot of useful information! With some careful reading, this header alone can immediately tell you exactly where this response content came from, and how it's currently being cached along the entire request path.</p> <p>(The above might sound intimidating if you're not used to debugging caching configurations, but believe me when I tell you that having this written down in one place is a million times better than trying to deduce the same information from scratch)</p> <h3 id="cachestatusinpractice">Cache-Status in practice</h3> <p>This isn't a totally new concept, but the real benefit is providing a single source of consistent data from all caches in one place.</p> <p>Today, there's many existing (all slightly mismatched) headers used by each different cache provider, like Nginx's <a href="https://support.cpanel.net/hc/en-us/articles/4402904983703-How-to-add-the-X-Cache-Status-header-to-NGINX-to-assist-with-optimizing-and-troubleshooting-cache-settings">X-Cache-Status</a>, Cloudflare's <a 
href="https://developers.cloudflare.com/cache/about/default-cache-behavior#cloudflare-cache-responses">CF-Cache-Status</a> and Fastly's <a href="https://developer.fastly.com/reference/http/http-headers/X-Served-By/">X-Served-By</a> and <a href="https://developer.fastly.com/reference/http/http-headers/X-Cache/">X-Cache</a>. Each of these provides small parts of the information that can be included here, and each will hopefully be slowly replaced by Cache-Status in future.</p> <p>Today, most major components and providers don't include Cache-Status by default, but contributors from Fastly, Akamai, Facebook and many others have been involved in the standardization process (so it's likely coming to many services &amp; tools on the web soon) and there is progress already, from built-in support in <a href="https://github.com/squid-cache/squid/commit/5fdc549054b11eb8bbc7e9640d6d071fa1ef742b">Squid</a> and <a href="https://github.com/caddyserver/cache-handler#readme">Caddy's caching handler</a> to <a href="https://gist.github.com/mnot/74ba8ed638a3d85659aa2868b9240d50">drop-in recipes for Fastly</a>.</p> <p>This was only submitted for RFC publication in August 2021, so it's still very new, but hopefully we'll continue to see support for this expand further in the coming months. If you're developing a CDN or caching component, I'd really encourage you to adopt this to help your users with debugging (and if you're a customer of somebody who is, I'd encourage you to ask them for this!).</p> <h2 id="howwilltargetedcachecontrolhelp">How will Targeted Cache-Control help?</h2> <p>The existing <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control">Cache-Control header</a> was designed in a simpler time (1999). IE 4.5 had just been released, RIM was launching the very first Blackberry, and "Web 2.0" was first coined to describe the very first wave of interactive web pages. 
Configuring multi-layered CDN architectures to cache terabytes of data was not a major topic.</p> <p>Times have changed.</p> <p>The Cache-Control header defined in 1999 is a request and response header, which lets you define various caching parameters relevant to the request (what kind of cached responses you'll accept) and the response (how this response should be cached in future).</p> <p>We're not really interested in request configuration here, but response cache configuration is very important. Cache-Control for responses today is defined with a list of directives that tell caches how to handle the response, like so:</p> <pre><code>Cache-Control: max-age=600, stale-while-revalidate=300, private
</code></pre> <p>This means "cache this content for 10 minutes, then serve the stale content for up to 5 minutes more while you try to revalidate it, but only do this in private (single-user e.g. browser) caches".</p> <p>This is a bit of a blunt instrument - the rules set here must be followed in exactly the same way by all caches that handle the request. It is possible to limit the scope of control rules to just end-user caches (with <code>private</code>) and in the past a few duplicated directives have been added that <em>only</em> apply to shared caches (CDNs etc) like <code>s-maxage</code> and <code>proxy-revalidate</code>, but you can't be any more precise or flexible than that.</p> <p>This means you can't:</p> <ul> <li>Set different stale-while-revalidate lifetimes for browsers vs CDNs</li> <li>Mark a response as needing revalidation with every request in your internal caching load-balancer but not your CDN</li> <li>Enable caching with CDNs whilst telling external shared caches (like enterprise proxies) not to cache your content</li> </ul> <p>This holds back a lot of advanced use cases. For most caching components, there are configuration options available to define rules within that component to handle this, but that's less flexible in its own way, making it far harder to configure different rules for different responses.</p> <p><a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-targeted-cache-control/">Targeted Cache-Control</a> aims to solve this by defining new headers to set cache-control directives that target only a specific cache, not all caches.</p> <h3 id="howdoestargetedcachecontrolwork">How does Targeted Cache-Control work?</h3> <p>To use this, a server should set a response header format like:</p> <pre><code>&lt;Target&gt;-Cache-Control: param, param=value, param...
</code></pre> <p>The header is prefixed with the specific target that this should apply to. The syntax is technically subtly different to the syntax used by Cache-Control, because it now uses the standard <a href="https://datatracker.ietf.org/doc/rfc8941/">Structured Fields</a> format, but in practice it's mostly identical.</p> <p>The target used here might be a unique service or component name, or a whole class of caches. The specification defines only one target - <code>CDN-Cache-Control</code>, which should apply to all distributed CDN caches, but not other caches - but other classes can be defined later. In future you can imagine <code>Client-Cache-Control</code> to set rules just for caching in HTTP clients, <code>ISP-</code> for internet service providers, <code>Organization-</code> for enterprise organization caches, you name it.</p> <p>To use these headers, each cache that supports them will define (either fixed or user-configurable) a list of targets that it matches in order of precedence. It uses the first matching <code>&lt;target&gt;-Cache-Control</code> header that's present, or the normal <code>Cache-Control</code> header (if there is one) if nothing more specific matches.</p> <p>All in all this is pretty simple and easy to use, if you're already familiar with existing caching mechanisms. Targeted headers match certain targets, you can configure the caching rules per-target however you'd like, and the best match wins. For example:</p> <pre><code class="js language-js">Client-Cache-Control: must-revalidate
CDN-Cache-Control: max-age=600, stale-while-revalidate=300
Squid-Cache-Control: max-age=60
Cache-Control: no-store
</code></pre> <p>This says that:</p> <ul> <li>End clients (at least, those who recognize the <code>Client-Cache-Control</code> header I've just made up) can cache this content but must revalidate it before use every time</li> <li>All CDNs can cache the content for 10 minutes, and then use the stale response while revalidating it for 5 additional minutes</li> <li>Squid (a caching reverse proxy) can cache the content only for 60 seconds (and implicitly cannot use it while it's stale, since there's no <code>stale-while-revalidate</code> directive)</li> <li>Anything else or anybody who doesn't understand targeted cache-control directives must never cache this content at all.</li> </ul> <h3 id="targetedcachecontrolinpractice">Targeted Cache-Control in practice</h3> <p>This is newer and earlier in the standardization process than Cache-Status, so it still might change. If you have feedback, the spec itself is on GitHub <a href="https://github.com/httpwg/http-extensions/blob/main/archive/draft-ietf-httpbis-targeted-cache-control.md">here</a> and you can file issues in that repo (or send a message to <a href="https://datatracker.ietf.org/wg/httpbis/about/">the Working Group mailing list</a>) to share your thoughts.</p> <p>That said, the spec itself is written by authors representing Fastly, Akamai and Cloudflare, so it's got good industry support already, and it's far enough through the process that it's unlikely to change drastically.</p> <p>Today, both <a href="https://blog.cloudflare.com/cdn-cache-control/">Cloudflare</a> and <a href="https://www.akamai.com/blog/news/targeted-cache-control">Akamai</a> already support this, so if you're using those caches you can start precisely configuring both with <code>CDN-Cache-Control</code>, <code>Akamai-Cache-Control</code> and <code>Cloudflare-CDN-Cache-Control</code> right now.
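</p> <p>The matching logic described above - check the cache's list of targets in precedence order, then fall back to plain <code>Cache-Control</code> - can be sketched in a few lines (using a hypothetical helper name, not any real CDN's API):</p>

```javascript
// Sketch of targeted cache-control precedence (hypothetical helper, not a
// real CDN API): a cache uses the first matching targeted header it
// recognizes, falling back to the plain Cache-Control header otherwise.
function resolveCacheControl(headers, targetsInPrecedenceOrder) {
  for (const target of targetsInPrecedenceOrder) {
    if (headers[target] !== undefined) return headers[target];
  }
  return headers['Cache-Control']; // may be undefined: no caching rules at all
}

const headers = {
  'CDN-Cache-Control': 'max-age=600, stale-while-revalidate=300',
  'Cache-Control': 'no-store'
};

// A Cloudflare-like CDN might check its vendor-specific header first,
// then the generic CDN one:
console.log(resolveCacheControl(headers, ['Cloudflare-CDN-Cache-Control', 'CDN-Cache-Control']));
// → 'max-age=600, stale-while-revalidate=300'

// A cache matching no targets falls back to plain Cache-Control:
console.log(resolveCacheControl(headers, []));
// → 'no-store'
```

<p>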
It's pretty likely that there'll be similar support in the pipeline for many other tools and services, so watch this space.</p> <h2 id="moretocome">More to come</h2> <p>Caching in 2021 can be difficult, but Cache-Status and Targeted Cache-Control are rapidly maturing, and they're going to make it much easier to configure and debug. If you're working with caching, it's worth taking a closer look.</p> <p>These are just two of the HTTP standards that the IETF have been working on recently - there are lots of others if you're interested in helping the web develop or learning about upcoming standards. Everything from <a href="https://datatracker.ietf.org/doc/draft-ietf-httpapi-ratelimit-headers/">rate-limiting headers</a> and <a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-proxy-status/">Proxy-Status</a> to <a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-digest-headers/">HTTP message digests</a> and <a href="https://datatracker.ietf.org/doc/rfc8942/">HTTP client hints</a>. HTTP is an evolving standard, and there's lots more to come! If you're interested in any of this, I'd highly recommend joining the <a href="https://datatracker.ietf.org/wg/httpbis/about/">working group mailing list</a> to keep an eye on new developments and share your feedback.</p> <p><em>Want to test or debug HTTP requests, caching and errors? Intercept, inspect & mock HTTP(S) from anything to anywhere with <strong><a href="https://httptoolkit.com">HTTP Toolkit</a></strong>.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/status-targeted-caching-headers/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/status-targeted-caching-headers/</guid>
            <pubDate>Wed, 20 Oct 2021 14:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[EU Funding for Dev Tools for the Decentralized Web]]></title>
<description><![CDATA[<p>Through the <a href="https://www.ngi.eu/">Next Generation Internet (NGI) initiative</a>, HTTP Toolkit has been selected for funding from the EU's Horizon research &amp; innovation program, to expand beyond HTTP and offer the same interception, debugging &amp; testing functionality for applications built on top of the decentralized web.</p> <p>This is going to be a huge opportunity to invest in expanding HTTP Toolkit to support some very exciting new technologies, and extending existing core functionality to do so along the way.</p> <p>For this project I'm going to be specifically focusing on 3 main protocols that today form the backbone of the decentralized web: <a href="https://ipfs.io/">IPFS</a>, <a href="https://webrtc.org/">WebRTC</a> &amp; <a href="https://ethereum.org/">Ethereum</a>.</p> <p>For each of these protocols, the essential technologies are usable now, but the wider ecosystem and uptake are still in their infancy. I think it's clear that most mainstream web developers are not currently using these technologies to build production-grade decentralized applications.</p> <p>There are many reasons for this, but one is a lack of high-quality developer tooling. Moving from traditional client/server architectures to building decentralized applications requires developers to replace many day-to-day debugging &amp; testing tools with manual logging, custom scripts and guesswork.
This tooling gap makes decentralized development significantly more difficult.</p> <p>The goal here is to fix that, by providing modern tools for the future of the web.</p> <p>This is funded as part of the <a href="https://www.ngi.eu/ngi-projects/ngi-pointer/">NGI Pointer</a> project, funding me as an individual to work on this for a year, with all the resulting output to be available under free &amp; open-source licenses (of course, HTTP Toolkit is conveniently already 100% open-source, so that's no change at all).</p> <p>If you're interested in the future of the web and you think developer tooling matters, I hope this is very exciting!</p> <p>If you're an existing HTTP Toolkit user though and you're only interested in HTTP, don't worry. This is equity-free R&amp;D funding, so HTTP Toolkit remains a completely independent open-source project, and although there's some crypto involved, it's just at the protocol level. HTTP Toolkit is not going to start issuing coins, gambling on NFTs, demanding to connect to your wallet, or anything of the sort. If you want to ignore the new features completely, that's totally fine.</p> <p>Notably in terms of ongoing development, this is not going to be a full-time commitment, and a significant proportion of my time will still be spent on continuing to support &amp; develop the existing HTTP-focused functionality alongside this. There'll be plenty of crossover too, and I'm confident that many of the new features &amp; UI improvements for decentralized protocols will prove valuable to HTTP-only users along the way (notably there's a huge UX overlap between WebRTC &amp; WebSockets debugging…).</p> <p>That's the high-level summary. 
Let's dig into the details.</p> <h2 id="whyistheeufundingthis">Why is the EU funding this?</h2> <p><a href="https://www.ngi.eu/">NGI</a> is funded by the EU to build "a European initiative for a Human Internet that respects the fundamental values of privacy, participation and diversity".</p> <p>I think a key motivation behind this is that many of today's protocols and major players on the internet have come from the US, and there's a feeling that many of these don't sufficiently protect some key European values, especially privacy and transparency.</p> <p>Wherever you're from, I think we can all agree that today's internet has some problems.</p> <p>Open-source is part of the solution to this, and has often driven &amp; underpinned key innovations on the internet, but it's rarely well funded. This results in critical projects either languishing without the support they need (<a href="https://xkcd.com/2347/">relevant XKCD</a>), which creates security problems and other issues, or projects depending on the backing of larger commercial entities with their own private and sometimes problematic interests.</p> <p>The internet is going to continue to evolve. The EU wants to fund projects, especially open-source projects, to ensure that future evolutions are designed with key issues like privacy &amp; security in mind from the outset, and to ensure that European projects are leading the way in building that.</p> <p>Decentralization of the web is an important step in this direction.
By decentralizing services, we can give users control over their own data, make it easier to protect privacy, give users more power to publish content for themselves, and improve the resilience &amp; performance of internet services along the way.</p> <p>HTTP Toolkit fits in here because it's open-source, it's European (I'm British, so Brexit has made that more awkward, but I live in Barcelona), it already provides tools to support transparency &amp; privacy (HTTP Toolkit has been used for privacy research by organizations from <a href="https://www.ft.com/content/0fbf4d8e-022b-11ea-be59-e49b2a136b8d">the FT</a> to <a href="https://www.privacyinternational.org/long-read/4603/unhealthy-diet-targeted-ads-investigation-how-diet-industry-exploits-our-data">Privacy International</a>, plus a huge part of the user base is security researchers, especially for Android), and so it's perfectly placed to provide tooling for this kind of decentralization.</p> <p>If you're interested in funding too, NGI have a variety of other <a href="https://www.ngi.eu/opencalls/">open calls</a> currently offering equity-free funding for researchers, open-source hackers, fledgling tech startups &amp; others related to a whole variety of different scopes &amp; topics.</p> <p>NGI Pointer was just one of those, focused on supporting bottom-up open-source projects &amp; tools, but there's funding available now for everything from software startups working in the blockchain ecosystem to academics researching the new building blocks for search &amp; discovery.</p> <p>It's early but NGI have been great so far, so if you're based in an EU or associated country (<a href="https://ec.europa.eu/info/research-and-innovation/statistics/framework-programme-facts-and-figures/horizon-2020-country-profiles_en">most of Europe</a>) then I'd recommend taking a look at <a href="https://www.ngi.eu/opencalls/">their open calls</a> and the other NGI-related <a href="https://www.ngi.eu/horizon-europe-calls/">EU Horizon 
open calls</a> available.</p> <h2 id="whytheseprotocols">Why these protocols?</h2> <p>These three protocols were picked as they're some of the most popular &amp; mature decentralization technologies for the web today, covering a wide range of functionality: persistence (IPFS), peer-to-peer communication (WebRTC) and payments/distributed computation (Ethereum).</p> <p>If you're not familiar with them, let's run through a quick (heavily simplified) summary of each:</p> <h3 id="ipfs">IPFS</h3> <p>IPFS is a content-addressed network, unlike HTTP, where content is addressed relative to the server that publishes it. Instead of saying "Hi example.com, I would like hello.html", clients say "Hi everybody, I would like the content with hash ABCDEF".</p> <p>Content is then delivered by whoever has it available, whether that's somebody else on your local network, an IPFS node that's hosted near you (perhaps by your ISP), or an IPFS node hosted by the original publisher. It's the same content regardless (verifiable using the hash) but allowing that content to be distributed widely means that it's possible to:</p> <ul> <li>Use IPFS during network partitions (as long as somebody on your network has the content you want, you can load it, even if the rest of the internet is unavailable)</li> <li>Access content even if the original publisher is entirely offline, as long as somebody has it cached somewhere</li> <li>Improve latency for popular content (when it's likely that somebody near you already has it)</li> <li>Reduce traffic spike load on publishers' servers (if content is popular, you don't need to go to the original publisher to get it)</li> </ul> <p>In many ways, you can think of this as being technically similar to Bittorrent - content is loaded from the group of people who currently have it available, not from any one single source, and performance improves with popularity.</p> <p>It's also very easy to publish to IPFS, making it possible to publish new content that's 
immediately accessible directly from a web browser, with no concept of servers or hosting providers necessary.</p> <p>IPFS has gained more and more attention in recent years, with native support released in both Brave and Opera in the last 12 months, and an official HTTP gateway <a href="https://blog.cloudflare.com/distributed-web-gateway/">made available</a> by Cloudflare, allowing usage from HTTP-only clients that don't support IPFS themselves. IPFS can also be used in other browsers via the IPFS Companion extension (<a href="https://chrome.google.com/webstore/detail/ipfs-companion/nibjojkomfdiaoajekhjakgkdhaomnch">Chrome</a>, <a href="https://addons.mozilla.org/en-US/firefox/addon/ipfs-companion/">Firefox</a>).</p> <p>Put together, this creates a versatile content distribution and publishing network, providing a good basis for persistence &amp; content hosting for web applications with no central server and no single point of failure.</p> <h3 id="webrtc">WebRTC</h3> <p>WebRTC is a peer-to-peer protocol that's generally used through its standard JavaScript web API, supported out of the box today in all modern browsers.</p> <p>It's already widely used for video and audio, powering many web-based video chat applications. It allows peer-to-peer communication between web pages, meaning that video can be sent directly between two users on the same site, without going through a central server, improving latency and reducing server load.</p> <p>In addition to that though, WebRTC also supports data channels, allowing web pages to directly send arbitrary messages between one another. 
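</p> <p>As a rough sketch of the browser API (with the signalling exchange between the peers omitted entirely), opening a data channel looks something like this - the "messages" label here is just an arbitrary example name:</p> <pre><code class="javascript language-javascript">// Browser-only sketch: create a peer connection and a data channel.
// The offer/answer and ICE candidate exchange (signalling) is omitted -
// in a real app those details must be shared with the peer out-of-band.
const connection = new RTCPeerConnection();
const channel = connection.createDataChannel("messages");

channel.onopen = function () {
  channel.send("Hello from a web page!");
};
channel.onmessage = function (event) {
  console.log("Peer says:", event.data);
};
</code></pre> <p>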
Think of these like user-to-user websockets, but with no server required.</p> <p>This is incredibly powerful for decentralized applications: you can use WebRTC to let users directly interact, without needing any server whatsoever!</p> <p>Using this, it's easy to imagine a fully decentralized chat room, perhaps with the page itself and its JavaScript served over IPFS, where every message is simply sent between peers directly and stored only in their browsers. It's also possible to go further, even building applications where a full database is stored client-side, with changes synchronized directly between peers over WebRTC.</p> <h3 id="ethereum">Ethereum</h3> <p>Ethereum is a widely used cryptocurrency, the second largest after Bitcoin by total value, processing far more transactions per day than Bitcoin and most other popular cryptocurrencies (currently ~1.3 million transactions per day, compared with Bitcoin's ~250 thousand).</p> <p>Ethereum's most notable feature though goes beyond simple transfers of money: Ethereum can host and execute <a href="https://en.wikipedia.org/wiki/Smart_contract">smart contracts</a>, effectively acting as a decentralized computer. 
Code in these contracts is fully public and auditable, and it's executed by the network's miners, who run the code &amp; update state on the blockchain in much the same way that they run requested financial transactions and record the results.</p> <p>This is important because it turns Ethereum from being just a currency into a platform, making it possible to store and atomically mutate application state in a fully decentralized system, with no servers or single points of failure required.</p> <p>You can use Ethereum's smart contracts to build things like decentralized SaaS checkouts (send X money to an address, your account can now use paid features), to atomically transfer blockchain resources between users of your app, to build systems for your users to vote on planned features, or to provide an API that queries &amp; exposes the blockchain-hosted state of your app.</p> <p>Both Opera and Brave support Ethereum wallets natively, and other browsers can do so using wallet extensions such as <a href="https://metamask.io/">Metamask</a>. Alternatively, pages can interact with the network directly via a hosted HTTP Ethereum API provider like <a href="https://infura.io/">Infura.io</a>.</p> <p>Notably these and other Ethereum clients all generally work by communicating with an Ethereum node's <a href="https://ethereum.org/en/developers/docs/apis/json-rpc">JSON-RPC API</a>. This has become a de facto standard, and the same API is also supported by many similar platforms, like <a href="https://www.thetatoken.org/">Theta</a>, <a href="https://gochain.io/">GoChain</a>, <a href="https://moonbeam.network/">Moonbeam</a> and <a href="https://chainlist.org/">others</a>. 
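</p> <p>As a concrete illustration of that API (the response values here are invented for the example): fetching the latest block number is just a small JSON payload POSTed to a node's endpoint, which replies with the result as a hex-encoded quantity:</p> <pre><code class="javascript language-javascript">// The JSON-RPC request an Ethereum client POSTs to its node's endpoint
// (a local node, or a hosted provider's URL):
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "eth_blockNumber",
  params: []
};

// An illustrative response from the node:
const response = { jsonrpc: "2.0", id: 1, result: "0xc5043f" };

// Quantities are hex-encoded strings, so clients decode them like so:
const blockNumber = parseInt(response.result, 16);
console.log(blockNumber); // 12911679
</code></pre> <p>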
While those aren't the primary target here, they should be supported automatically regardless, along with any other future platforms built to support the same API.</p> <p>There is one caveat here: Ethereum (and other crypto) does have a substantial <a href="https://digiconomist.net/ethereum-energy-consumption/">environmental impact</a> which I'm not keen to encourage. That said, Ethereum is currently expected to <a href="https://ethereum.org/en/eth2/">switch</a> to <a href="https://ethereum.org/en/developers/docs/consensus-mechanisms/pos/">proof-of-stake</a> to enormously reduce this very soon (the current deadline is December 2021). I'm hopeful that that will be in place well within the next year, before this project is complete, and they have been making promising progress on that front <a href="https://blog.ethereum.org/2021/08/25/finalized-no-28/">recently</a> (and significantly more progress than any other crypto).</p> <h3 id="andmore">…and more</h3> <p>These three protocols are intended as a starting point, not a final target.</p> <p>A key goal here is to <em>explore</em> the kinds of developer tooling that decentralized app developers need, independent of the specific protocol. Once the core technology &amp; UX for these three are in place, that will provide a base on which other protocols could be more easily supported in future, to provide tooling for alternative protocols from Hypercore to Filecoin as the ecosystem develops.</p> <h2 id="whatstheproblemtoday">What's the problem today?</h2> <p>These protocols are all very well in theory, but right now they're a massive pain to build serious applications with, because browsers and other tools don't provide any of the kind of support you're used to when working with HTTP and other mainstream technologies.</p> <p>That means if you build a web application using these, it's very difficult to debug or test. 
You cannot easily:</p> <ul> <li>See the contents of data sent through WebRTC data channels, at all.</li> <li>Add latency to explore how your application handles slow IPFS retrievals or timeouts.</li> <li>Stub an Ethereum smart contract for testing or prototyping.</li> <li>Quickly replace some IPFS content for testing, without republishing everything and updating all hashes.</li> <li>Breakpoint an Ethereum transaction to test an alternate result.</li> <li>Inject a message into a WebRTC channel.</li> </ul> <p>This is a massive pain, which creates constant friction, and it's a long way away from the functionality that web developers are used to on the mainstream HTTP-powered web.</p> <p>This isn't just bad for developers though. Investigating apps to see exactly what they're doing and poking at their internals is extremely important for security and privacy researchers, and without tooling the traffic for each of these protocols today is almost invisible &amp; untouchable. As their usage increases, this poses a serious risk to security research &amp; transparency on the web.</p> <p>Fortunately, HTTP Toolkit already has the technology to support this kind of debugging with HTTP, it's already used in this kind of security &amp; privacy research on today's web, and it should be very possible to expand this functionality to cover these decentralized protocols in future too.</p> <h2 id="howwillthiswork">How will this work?</h2> <p>The basic idea is that HTTP Toolkit will sit between your browser and the rest of the network, intercepting each of these protocols and proxying their interactions onwards, while allowing you to inspect those interactions and potentially change them or inject messages en route.</p> <p>This might sound ambitious! 
Adding automatic interception &amp; debugging support for three completely different protocols, in addition to HTTP, all within the existing app?</p> <p>Fortunately, I have a trick up my sleeve: it's HTTP all the way down.</p> <p>Specifically, the application-facing interface to both IPFS and Ethereum is actually just HTTP. In both cases, browsers and other clients use the protocol by making HTTP requests to the API of a local or remote node, which handles all the low-level peer-to-peer interactions with the wider network. Take a look at the APIs <a href="https://docs.ipfs.io/reference/http/api/">here</a> and <a href="https://playground.open-rpc.org/?schemaUrl=https://raw.githubusercontent.com/ethereum/eth1.0-apis/assembled-spec/openrpc.json&uiSchema%5BappBar%5D%5Bui:splitView%5D=true&uiSchema%5BappBar%5D%5Bui:input%5D=false&uiSchema%5BappBar%5D%5Bui:examplesDropdown%5D=false">here</a>.</p> <p>For IPFS &amp; Ethereum, this simplifies things a lot! HTTP Toolkit already has all the building blocks to intercept, inspect &amp; rewrite HTTP(S) traffic, so capturing and manipulating this traffic at a low level is mostly just a matter of configuration management.</p> <p>In effect, HTTP Toolkit will act as an IPFS/Ethereum node that can transparently pass traffic through to a real target node (and so to the wider network) but which can also inspect and modify all API requests en route, for debugging &amp; testing.</p> <p>That's not to say this will be easy of course. There's significant additional work to make this truly usable, by providing an interface that allows you to easily understand and modify these interactions. 
For example, you would want to be able to interpret contract execution calls, test IPFS latency or failures, and breakpoint and modify Ethereum transactions (more details &amp; mockups below).</p> <p>WebRTC meanwhile is somewhat more complicated, but still tractable by extending HTTP Toolkit's existing interception framework, and the UI work is closely related to the work required to debug WebSockets too.</p> <p>To make a WebRTC connection, a web page uses the built-in JS APIs in the browser, passing the details of the peer they want to connect to. These peer details are shared using out-of-band signalling - i.e. you need to use some non-WebRTC mechanism to share connection details before you can connect. The details one client should signal to the other via the signalling channel are also provided by the browser, and then shared by some other mechanism (a central server, QR codes, recommendations via other peers, you name it).</p> <p>This signalling is the key for interception. If we can change the signalling details each browser uses, then we can change a connection from being peer-to-peer to peer-to-HTTP Toolkit plus HTTP Toolkit-to-peer, and from there we can inspect and modify WebRTC data however we like.</p> <p>For normal HTTP(S) interception, today HTTP Toolkit intercepts browsers and other clients by injecting proxy &amp; certificate information into the target process when it's started. 
To support WebRTC, this startup interception can be extended by injecting a temporary browser extension that hooks the JS WebRTC API in the target browser, and replacing all connection details there, to inject HTTP Toolkit as a proxy within every WebRTC connection from inside the browser itself.</p> <p>With that in place, browsers will connect to HTTP Toolkit as the other peer in every WebRTC connection, and using standard WebRTC libraries we can accept those connections, to act as a mock peer, proxy the connection to another real peer, or do something else entirely.</p> <p>That still leaves similar UX challenges to IPFS &amp; Ethereum: we need to provide useful information when inspecting these interactions, and tools to easily modify &amp; mock them. What does that look like, and how do we get there?</p> <h2 id="whatstheplan">What's the plan?</h2> <h3 id="1releasestandaloneinterceptionlibraries">1. Release standalone interception libraries</h3> <p>The first step to build this is to extend <a href="https://github.com/httptoolkit/mockttp">Mockttp</a> to provide a convenient API that anybody can use to create a proxy for these protocols, and to then verify &amp; mock their interactions.</p> <p>These libraries will provide the base for traffic handling in HTTP Toolkit and will be usable headlessly and standalone (just as Mockttp is). This allows application developers to intercept IPFS, WebRTC &amp; Ethereum interactions in their own code to verify and mock behaviour in automated testing, or build other kinds of automated rewriting proxies on top of these protocols.</p> <p>They'll be released as standalone libraries, one for each protocol. As a quick mockup for Ethereum for example, if the new library is called 'Mockthereum' (TBC) then you might be able to write code like:</p> <pre><code class="javascript language-javascript">const Web3 = require("web3")
const mockNode = require("mockthereum").getLocal();

const contractToMock = "0x11f4d0A3c12e86B4b5F39B213F7E19D048276DAe";

describe("Mockthereum", () =&gt; {
  beforeEach(() =&gt; mockNode.start(8080));
  afterEach(() =&gt; mockNode.stop());

  it("allows you to mock Ethereum contracts", async () =&gt; {
    const web3 = new Web3(mockNode.url);

    // Define a mock result for an Ethereum contract:
    const mockedContract = await mockNode.whenCall(contractToMock)
      .thenReturn("mock contract result");

    // Actually call the contract using web3.js, just like normal
    // (this step would normally run within the real code being tested)
    const result = await web3.eth.call({ to: contractToMock });

    // Confirm that the code called the contract &amp; got the expected result:
    expect(mockedContract.calls.length).to.equal(1);
    expect(result).to.equal("mock contract result");
  });
});
</code></pre> <p>This is a simple extension of the existing Mockttp API, just adding an easy method to mock recognized Ethereum requests. You can imagine more methods though like:</p> <ul> <li><code>mockEthNode.rejectTransactionsTo(address)</code></li> <li><code>mockIPFSNode.withContent(hash, content)</code></li> <li><code>mockWebRTCPeer.echoAllMessages()</code></li> </ul> <p>The specific methods that will be supported are TBC, but you get the idea. Take a look at Mockttp's <a href="https://httptoolkit.github.io/mockttp/classes/RequestRuleBuilder.html">existing mock rules</a> for examples of what's already possible today for HTTP.</p> <h3 id="2automaticallyintercepttraffic">2. Automatically intercept traffic</h3> <p>With the standalone libraries, we'll be able to inspect &amp; transform traffic once it reaches us, but we still need to redirect the traffic to our proxies somehow to do so.</p> <p>We discussed the low-level details of how this will work above. The practical steps to implementing that are:</p> <ul> <li>Inject configuration into target browsers to proxy Ethereum &amp; IPFS's HTTP traffic via HTTP Toolkit at startup</li> <li>Create a WebRTC-intercepting browser extension</li> <li>Inject the WebRTC extension into browsers to intercept WebRTC p2p connections</li> </ul> <p>With this in place, for any launched browser, Ethereum, IPFS &amp; WebRTC will all work exactly like normal, but tunnelled through code that can freely transform and inspect everything they do. Raw HTTP requests for Ethereum &amp; IPFS will appear in HTTP Toolkit immediately, and WebRTC traffic will be transparently (though, at this stage, invisibly) proxied.</p> <h3 id="3buildauitoexplorecollectedtrafficandtodefinerulestotransformit">3. Build a UI to explore collected traffic and to define rules to transform it</h3> <p>Once all the traffic is in our control, we can start doing things with it. 
Let's take a look at some quick mockups.</p> <p>Below you can see traffic from a webapp that's using HTTP, IPFS, WebRTC and Ethereum, extending HTTP Toolkit's existing UI to show interactions from all of these protocols with their key metadata, all in one place:</p> <p><img src="https://httptoolkit.com/images/posts/dweb-mockup.png" alt="A mockup showing a series of IPFS, Ethereum & WebRTC interactions"></p> <p>In addition, HTTP Toolkit's rule builder UI will be extended to expose the interaction handlers from each standalone library:</p> <p><img src="https://httptoolkit.com/images/posts/dweb-rules-mockup.png" alt="A mockup showing two configurable rules for mocking IPFS and WebRTC rules"></p> <p><em>Remember these are mockups! Actual UI will vary. Do please <a href="https://github.com/httptoolkit/httptoolkit/issues/new/choose">send some feedback</a> if you have opinions on this though.</em></p> <h2 id="when">When?</h2> <p>This is going to kick into action over the next few weeks. The first step is to put a detailed roadmap in place before October this year, and then the project will run for a year until October 2022.</p> <p>That's the plan - hopefully this sounds exciting! Watch this space to hear about all the next developments. If you haven't already, you can subscribe to this blog below, or subscribe to HTTP Toolkit major release announcements <a href="https://httptoolkit.com/keep-me-updated/">here</a>.</p> <p><strong>If you have opinions on this, especially if you're using one of these protocols today, I would love to hear about them</strong>. Get in touch! 
Send me your thoughts, questions and feedback either <a href="https://twitter.com/pimterry">on Twitter</a>, as a <a href="https://github.com/httptoolkit/httptoolkit/issues/new/choose">GitHub issue</a> or <a href="https://httptoolkit.com/contact/">by messaging me directly</a>.</p> <p><em>This project has received funding from the European Union’s Horizon 2020 research and innovation programme within the framework of the NGI-POINTER Project funded under grant agreement No 871528.</em></p> <p><img src="../images/ngi-eu-footer.png" alt="The NGI logo and EU flag"></p>]]></description>
            <link>https://httptoolkit.com/blog/developer-tools-decentralized-web/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/developer-tools-decentralized-web/</guid>
            <pubDate>Wed, 15 Sep 2021 12:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Proxies are complicated: RCE vulnerability in a 3 million downloads/week NPM package]]></title>
            <description><![CDATA[<p><a href="https://www.npmjs.com/package/pac-resolver">Pac-Resolver</a>, a widely used NPM dependency, had a high-severity RCE (Remote Code Execution) vulnerability that could allow network administrators or other malicious actors on your local network to remotely run arbitrary code inside your Node.js process whenever you tried to send an HTTP request.</p> <p>This is bad!</p> <p>This package is used for PAC file support in <a href="https://www.npmjs.com/package/pac-proxy-agent">Pac-Proxy-Agent</a>, which is used in turn in <a href="https://www.npmjs.com/package/proxy-agent">Proxy-Agent</a>, which is then used all over the place as the standard go-to package for HTTP proxy autodetection &amp; configuration in Node.js. It's <em>very</em> popular: Proxy-Agent is used everywhere from AWS's CDK toolkit to the Mailgun SDK to the Firebase CLI (3 million downloads per week in total, and 285k public dependent repos on GitHub).</p> <p>I found this lovely little issue a short while back, while adding proxy support to <strong><a href="https://httptoolkit.com">HTTP Toolkit</a></strong> (yes, code reviewing your dependencies <em>is</em> a good idea!). The vulnerability was fixed in v5.0.0 of all those packages recently, and was formally disclosed last week as <a href="https://snyk.io/vuln/SNYK-JS-PACRESOLVER-1564857">CVE-2021-23406</a>.</p> <p>First things first: are you personally at risk? 
<strong>This vulnerability seriously affects you if:</strong></p> <ul> <li>You depend on Pac-Resolver before v5.0.0 (even transitively) in a Node.js application</li> <li><em>And</em>, you do one of the below:<ul> <li>Explicitly use PAC files for proxy configuration.</li> <li>Read &amp; use the operating system proxy configuration in Node.js, on systems with WPAD enabled.</li> <li>Use proxy configuration (env vars, config files, remote config endpoints, command-line arguments) from any other source that you wouldn't 100% trust to freely run code on your computer.</li></ul></li> </ul> <p>In any of those cases, an attacker (by configuring a malicious PAC URL, intercepting PAC file requests with a malicious file, or using WPAD) can remotely run arbitrary code on your computer any time you send an HTTP request using this proxy configuration.</p> <p><strong>If you're in this situation, you need to update (to Pac-Resolver v5 and/or Proxy-Agent v5) right now.</strong></p> <p>If not, you're probably not in any immediate risk (but it's a good idea to update anyway). For now, settle in and let's talk about why this matters, how this works, and how it can be exploited.</p> <h2 id="whatsapacfile">What's a PAC file?</h2> <p>PAC stands for "Proxy Auto-Config". A PAC file is a script written in JavaScript that tells an HTTP client which proxy to use for a given hostname, using dynamic logic to do so.</p> <p>This is a system originally designed as part of Netscape Navigator 2.0 in 1996 (!) but it's still in widespread use today.</p> <p>An example PAC file might look like this:</p> <pre><code class="javascript language-javascript">function FindProxyForURL(url, host) {
    // Send all *.example.com requests directly with no proxy:
    if (dnsDomainIs(host, '.example.com')) {
        return 'DIRECT';
    }

    // Send every other request via this proxy:
    return 'PROXY proxy.example.com:8080';
}
</code></pre> <p>It defines a function in JavaScript, which can be used to find the right proxy for a URL.</p> <p>This is designed to be run in a sandbox, accessing only a few specific useful methods (like host regex matching) that are required. Still, by using those methods these scripts can become very complicated - MDN has some <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Proxy_servers_and_tunneling/Proxy_Auto-Configuration_PAC_file">good docs</a> on the features available if you're interested.</p> <p>PAC files provide a way to distribute complex proxy rules, as a single file that maps a variety of URLs to different proxies. They're widely used in enterprise environments, and so often need to be supported in any software that <em>might</em> run in an enterprise environment.</p> <p>How exactly is this distributed though, you ask? Usually from a local network server, over plain-text HTTP (it's a local address, so there are often no certs available). Distributing it from a remote server, rather than locally configuring the file, is useful &amp; very common as it allows network administrators to change it quickly &amp; easily, safe in the knowledge that clients will always have the most up-to-date version.</p> <p>In fact this is so common that there's a standard for automatically discovering the PAC file URL when connecting to a network: <a href="https://en.wikipedia.org/wiki/Web_Proxy_Auto-Discovery_Protocol">WPAD</a> (Web Proxy Auto-Discovery Protocol). Your local network can give you a PAC file URL via DNS or DHCP when you connect to the network, and many systems (including Windows, by default) will automatically download &amp; use this file as the system's proxy configuration.</p> <p>It's a JavaScript file you have to execute to connect to the internet, which is loaded remotely, often insecurely and/or from a location that can be silently decided by your local network. 1996 was truly a simpler time. 
What could go wrong?</p> <h2 id="whatsthevulnerability">What's the vulnerability?</h2> <p>Pac-Proxy-Agent attempts to provide support for PAC files specifically for Node.js - a noble goal. It doesn't automatically provide WPAD support (fortunately, given this vulnerability) - though only because <a href="https://github.com/TooTallNate/node-pac-proxy-agent/pull/7">the PR</a> was never completed. WPAD can easily be supported manually though, or even implicitly by reading the PAC URL from an OS that autodetects it using WPAD (i.e. Windows).</p> <p>Pac-Proxy-Agent is instantiated with a PAC file URL, such as <code>pac+http://config.org.local/proxy.pac</code>. It retrieves the PAC file from that URL, and then acts as a <a href="https://nodejs.org/api/http.html#http_class_http_agent">Node.js HTTP agent</a> (middleware for outgoing requests) which runs that PAC file for every outgoing URL before sending the request onwards upstream according to the PAC file's result.</p> <p>So far so good. This is how PAC files are designed to work, and some implementation of this is necessary to support the many enterprise environments that use them.</p> <p>This is then used in Proxy-Agent, which takes arbitrary proxy URLs and maps them to the appropriate agents. This is very convenient if you need to support a variety of system configurations! Read the system config, pass it to Proxy-Agent, and use the resulting agent for all outgoing requests. If you pass a <code>pac+...</code> URL to Proxy-Agent, it'll instantiate a Pac-Proxy-Agent for you from that URL, immediately giving you an agent you can use to make HTTP requests via the proxy.</p> <p>Unfortunately however, Pac-Proxy-Agent doesn't sandbox PAC scripts correctly. Internally, it uses two modules (<a href="https://www.npmjs.com/package/pac-resolver">Pac-Resolver</a> and <a href="https://www.npmjs.com/package/degenerator">Degenerator</a>) from the same author to build the PAC function. 
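</p> <p>Conceptually (heavily simplified - this sketch skips the real helper functions like dnsDomainIs, and isn't the actual implementation) evaluating a PAC file with the built-in vm module looks something like this:</p> <pre><code class="javascript language-javascript">const vm = require("vm");

// A benign PAC file, as downloaded from the configured PAC URL:
const pacSource = `
  function FindProxyForURL(url, host) {
    if (host === "intranet.example.com") return "DIRECT";
    return "PROXY proxy.example.com:8080";
  }
`;

// Evaluate the script in a vm context; the top-level function
// declaration becomes a property of that context's global object:
const context = vm.createContext({});
vm.runInContext(pacSource, context);

const proxy = context.FindProxyForURL("http://example.org/", "example.org");
console.log(proxy); // "PROXY proxy.example.com:8080"
</code></pre> <p>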
Degenerator is designed to transform arbitrary code, and returns a sort-of sandboxed function, using <a href="https://nodejs.org/api/vm.html">Node.js's 'vm' module</a>, that's then executed by Pac-Resolver.</p> <p>VM's documentation starts with:</p> <blockquote> <p>The vm module is not a security mechanism. Do not use it to run untrusted code.</p> </blockquote> <p>Uh oh.</p> <p>This is an easy mistake to make - it's small text (frankly, it should be the headline on that page and next to every method) and MongoDB did <a href="https://nvd.nist.gov/vuln/detail/CVE-2019-10758">the exact same thing</a> too in 2019, with even worse consequences.</p> <p>Unfortunately though this creates a big problem. While VM does try to create an isolated environment in a separate context, there's a long list of easy ways to access the original context and break out of the sandbox entirely (we'll take a look at an example in a minute, but for now just trust me), allowing code inside the 'sandbox' to basically do anything it likes on your system.</p> <p>If you accept and use an untrusted PAC file, this is Very Bad. Every time you make a request using the PAC file, it can run arbitrary code and do <em>anything</em> on your system. If it's malicious, you're in big trouble.</p> <p>How might you end up using a malicious PAC file? 
Let me count the ways:</p> <ul> <li>You read your proxy configuration from a config file, API endpoint, command line argument or environment variable and somebody manages to add their malicious PAC file's URL there.</li> <li>You load a trusted PAC URL insecurely, and somebody else on your network changes its contents in transit.</li> <li>You securely use a trusted PAC URL, but somebody successfully attacks the PAC file host and changes the file.</li> <li>WPAD is enabled on your system (as it is by default on Windows), somebody on your local network abuses it to configure your system with their PAC file URL, and you use that system configuration (Node doesn't use system proxy config by default, but many implementations will do so explicitly).</li> <li>You take proxy configuration in any other happy-go-lucky way, under the reasonable misapprehension that doing so can only risk exposing insecure traffic that you explicitly send via the proxy, and that that's acceptable for your case.</li> </ul> <p>In practice, this requires either an attacker on your local network, a specific vulnerable configuration, or some second vulnerability that allows an attacker to set your config values.</p> <p>If you end up in any of those situations though, it's game over, and it's easier than it sounds - anybody using a Node.js CLI tool designed to support enterprise proxies in a coffee shop, hotel or airport is potentially vulnerable, for example.</p> <h2 id="howcouldthisbeexploited">How could this be exploited?</h2> <p>To exploit this, the attacker needs to somehow provide a malicious PAC file (see above for ways this could happen), with contents that look something like this:</p> <pre><code class="javascript language-javascript">// Here's the real PAC function:
function FindProxyForURL(url, host) {
    return "DIRECT";
}

// And here's some bonus arbitrary code:
const f = this.constructor.constructor(`
    // Here, we're running outside the sandbox!
    console.log('Read system env vars:', process.env);
    console.log('!!! PAC file is running arbitrary code !!!');
    process.exit(1); // Kill the HTTP client process remotely
    // ...steal data, break things, etc etc etc
`);
f();
</code></pre> <p>That's it - this is all that's required to break out of the VM module sandbox. If you can make a vulnerable target use this PAC file as their proxy configuration, then you can run arbitrary code on their machine.</p> <p>The example here will log env vars to the console in the client application and then shut it down, but of course it could silently send them elsewhere instead, write to files on the machine, attack other devices on the network, change application behaviour to attack clients, start mining crypto, etc etc.</p> <p>This is a well-known attack against the VM module, and it works because Node doesn't isolate the context of the 'sandbox' fully, because it's not really trying to provide serious isolation. In line 7 above, <code>this</code> comes from a context passed to <a href="https://nodejs.org/api/vm.html#vm_script_runincontext_contextifiedobject_options"><code>vm.runInContext</code></a> to create the sandbox, which comes from an object parameter in the external Node.js environment. We can follow that chain to get a function constructor for the external runtime environment, and from there we can instantiate a function (<code>f</code>) outside the sandbox from a string. Then we just run it, and the code we provided runs without any of the sandbox's constraints.</p> <h2 id="whatsthefix">What's the fix?</h2> <p>This is now fixed in Pac-Resolver v5.0.0, Pac-Proxy-Agent v5.0.0, and Proxy-Agent v5.0.0. The fix is simple: use a real sandbox instead of the VM built-in module.</p> <p>In this case, this was done using the <a href="https://www.npmjs.com/package/vm2">VM2 npm module</a>, which provides a similar API while being explicitly designed to run untrusted code, and hardened to block sandbox escapes like this. Switching to VM2, the above exploit code prints:</p> <blockquote> <p>process is not defined</p> </blockquote> <p>It's hard to guarantee that it's impossible to escape VM2, like any sandbox, but it's widely used for this exact purpose. 
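</p> <p>For comparison, the flimsiness of the built-in module is easy to demonstrate standalone - the same constructor-chain trick used in the exploit above escapes an 'empty' context in a single expression:</p> <pre><code class="javascript language-javascript">const vm = require("vm");

// The sandbox object is an ordinary object from the host realm, so its
// constructor chain leads straight back to the host's Function constructor:
const result = vm.runInNewContext(
  "this.constructor.constructor('return process.platform')()",
  {} // an 'empty' sandbox
);

// 'result' is the real platform of the host process - the code escaped:
console.log(result);
</code></pre> <p>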
There are no known ways to escape VM2 today, and using a sandbox that's designed for running untrusted code is a dramatic improvement on the flimsy (by design) isolation provided by the VM module.</p> <p>Using VM2 also makes it likely that any future sandbox escapes will be far more complicated and will be quickly dealt with when they appear, requiring just an update to VM2 to remain secure, and creating a difficult &amp; moving target for any attacker.</p> <h2 id="wrappingup">Wrapping up</h2> <p><strong>If you depend on Pac-Resolver, and there's any way you might be using PAC files in your proxy configuration: update to Pac-Resolver v5+ now.</strong></p> <p>For everybody else, I hope this was an interesting walk into some of the dangerous eccentricities of proxy configuration! Hopefully you're safe from the worst of the risk here, but you should probably update when you get a minute anyway.</p> <p>What about the future? Is this going to happen again?</p> <p>Yes, unfortunately, it definitely is. 
Even ignoring malicious <a href="https://en.wikipedia.org/wiki/Supply_chain_attack">supply chain attacks</a>, there will be plenty of insecure code we're unaware of on NPM today, and on every other community package platform.</p> <p>There are tactical mitigations that can be made (I'd be fully onboard with deprecating Node's VM module entirely for example, since it's a massive footgun, and better sandboxing primitives generally everywhere would really help - <a href="https://medium.com/deno-the-complete-reference/sandboxing-in-deno-b3d514d88b63">Deno</a> being a good start) but the best thing you can do is keep an eye on published vulnerabilities which could affect you and ensure you quickly handle them.</p> <p>While right now in Node-land that's quite a painful process (<code>npm audit</code> is noisy to the point of uselessness) there is <a href="https://github.com/npm/rfcs/pull/422">very promising progress</a> there that could really help to find the signal in the noise and make this easier to do. Here's hoping that comes soon, and package managers elsewhere follow suit!</p> <p>This is also a great example of the value of code reviewing your dependencies! In many cases, especially for large applications with complex dependencies, that's not possible for all dependencies, but at least reviewing your most sensitive dependencies (like automatic proxy configuration) will help you catch these unintentional bugs and help get them fixed for everybody.</p> <p>I do think this is <em>not</em> an example of the classic "developers nowadays use too many dependencies" or "NPM's many pointless tiny dependencies create risks" arguments that tend to get bandied around. 
There are real problems there, but if you need to support enterprise environments then writing your own proxy autoconfiguration code from scratch is a bad alternative, from both a productivity and a correctness standpoint, and building support for niche features like PAC files &amp; WPAD into Node.js itself doesn't seem good either. Dependencies need management, but they're not always bad.</p> <p>Lastly, I should give a big thanks to <a href="https://snyk.io/">Snyk.io</a> &amp; their team for their help resolving this. I disclosed the issue to them directly (they provide support for reporting community package vulnerabilities for many languages <a href="https://snyk.io/vulnerability-disclosure/">here</a>) since I couldn't see a clear way to get in touch with the maintainer privately, and they made contact, handled all the communication, coordinated the fix, and managed the disclosure itself. As a developer, rather than a full-time security researcher, it's definitely useful having somebody familiar with best practices to ensure vulnerabilities are resolved safely &amp; responsibly.</p> <p>Have any thoughts, questions or feedback? Get in touch <a href="https://twitter.com/pimterry">on Twitter</a> or <a href="https://httptoolkit.com/contact/">send me a message</a> and let me know.</p> <p><em>Want to inspect, mock & debug Node.js HTTPS for yourself, for debugging & testing, with no vulnerabilities required? Try out <strong><a href="https://httptoolkit.com/javascript/">HTTP Toolkit</a></strong>.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/npm-pac-proxy-agent-vulnerability/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/npm-pac-proxy-agent-vulnerability/</guid>
            <pubDate>Tue, 31 Aug 2021 11:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[HTTPS certificate non-validation vulnerability in Node.js]]></title>
            <description><![CDATA[<p>Today Node.js announced and released a security fix for <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-22939">CVE-2021-22939</a>, along with two other high severity issues. They've rated this vulnerability as 'low severity', but I think it's worth a closer look, as (imo) this really understates the risk here, and the potentially widespread impact.</p> <p>In practice, this poses a risk to anybody making TLS connections from Node.js, e.g. anybody making HTTPS requests. Not all usage is vulnerable, but many common use cases are, it's not easy to guarantee that your code is 100% secure, and all Node.js versions since at least v8.0.0 are affected. <strong>If you're using TLS/HTTPS in Node.js, you should update ASAP</strong>.</p> <p>I reported this issue to Node myself a couple of weeks ago, after running into it during my own development testing of <a href="https://httptoolkit.com">HTTP Toolkit</a>. Let's talk through why this is a problem, how it works, and what you should do about it.</p> <p>Everything here applies to TLS in general, but I'm going to focus on HTTPS specifically, since it's by far the most likely use case, and it's simpler and clearer.</p> <h2 id="whatstheproblem">What's the problem?</h2> <p>Here's an example of common but vulnerable code (TypeScript types included for clarity):</p> <pre><code class="typescript language-typescript">const https = require('https');

// Any convenient wrapper or library around the HTTPS module. It takes a URL, and
// extra optional parameters, including a `verifyCertificates` option, which can
// be set to `false` to disable cert verification when necessary.
function makeRequest(url: string, options: { verifyCertificates?: boolean } = {}) {
    // [...Do some custom logic...]

    // At some point make a request, using the optional verification option:
    return https.get(url, {
        rejectUnauthorized: options.verifyCertificates
    });
}

// Later usage looks like it's making a secure HTTPS request, but in fact the certificate
// is not being verified at all, so you could be talking to *anybody*:
makeRequest("https://google.com");
</code></pre> <p>The key here is <code>rejectUnauthorized</code>. This Node.js option configures whether the request will check that the server's certificate is valid. If this is disabled, then all HTTPS protections are silently disabled. Anybody who can touch your HTTPS traffic can impersonate any server, to inspect or edit any traffic they like. In security terms this is normally described as "Very Bad".</p> <p>The Node.js <a href="https://nodejs.org/api/tls.html#tls_tls_connect_options_callback">documentation</a> for this option says:</p> <blockquote> <p>If not false, the server certificate is verified against the list of supplied CAs. […] Default: true.</p> </blockquote> <p>I.e. you can actively disable this verification if you need to, but by default it's always enabled unless you explicitly pass <code>false</code>.</p> <p>Unfortunately, this wasn't true.</p> <p>In reality, falsey values, including <code>undefined</code>, will <em>also</em> disable certificate verification. That's a problem because it's extremely easy to introduce falsey values in JavaScript. In addition, every other API will treat <code>undefined</code> the same as 'no parameter provided', and so use the default (<code>true</code>), which would indeed securely validate the server's certificate. 
That's how every other Node.js API I've tested works, and how syntactic support for <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Default_parameters">default parameters</a> is defined.</p> <p>That turns code that passes <code>undefined</code> from something that most developers would assume is perfectly safe and secure, and which the documentation explicitly says is safe, into something that invisibly disables fundamental security protections.</p> <p><strong>This means that anybody accidentally passing <code>undefined</code> to <code>rejectUnauthorized</code> is unknowingly not verifying the server's TLS certificate, and all the protections of HTTPS have been silently disabled</strong>.</p> <p>When certificate verification is disabled like this, anything goes. You can use a self-signed certificate you just made up, use a real certificate signed for the wrong hostname, use expired certificates, revoked certificates, or any other kind of invalid certificate.</p> <p>That allows any malicious party who can get between you and the target server to pretend to be that server, and so take all your traffic in both directions and inspect and/or modify the proxied traffic before it's sent on to the real server.</p> <p>Node.js won't show any warnings or clues that this is happening, and when there's no malicious parties involved everything will work exactly like normal, making this an otherwise invisible vulnerability.</p> <p>Falling into this <code>undefined</code> trap is easy, because of how people frequently build options objects like these in JavaScript. A common convention is to define all the properties, referencing options from elsewhere that may or may not be defined. 
This isn't something you'd often do when doing an HTTP(S) request from scratch in your own code, but it's a very common pattern when building a library or smaller wrapper around the raw HTTPS APIs.</p> <p>Any code like this is <em>usually</em> vulnerable:</p> <pre><code class="javascript language-javascript">https.request({
    // ...
    rejectUnauthorized: options.anOption
});
</code></pre> <p>This is vulnerable because values on <code>options</code> objects are optional by definition (so will usually be undefined), while <code>rejectUnauthorized</code> does not behave like a normal option, and should never be <code>undefined</code>.</p> <p>It won't always be as obvious as this, either. It's quite possible that internal libraries will generate values for <code>rejectUnauthorized</code> based on other parameters (allowing self-signed certificates for specific hostnames, for example), so there's a variety of ways to run into this.</p> <p>In practice, this is common - suffice to say I'm aware of npm modules with millions of weekly downloads that follow this pattern today, and there's plenty of examples you can find on GitHub too. I'll avoid pointing at specifics before those are fully resolved, but I intend to coordinate with vulnerable packages I'm aware of to update this code, and applications which are running on one of the latest Node.js releases are safe regardless.</p> <h2 id="howcananattackerexploitthis">How can an attacker exploit this?</h2> <p>Exploiting this is trivial <em>if</em> you can get on the network path of a request from a vulnerable application. That usually means anybody on your local network could exploit this (e.g. on the same wifi while you use a vulnerable Node.js CLI tool) or anybody who handles your traffic upstream, for example your ISP, any proxy, reverse proxy services like Cloudflare, and so on.</p> <p>Whilst that is a challenge, attackers on the path between you and the server you're talking to are exactly what HTTPS is trying to prevent, and the only reason it exists. 
Those protections are important, and the vast majority of software you use assumes that they're in place, and builds other security mechanisms on top of that foundation.</p> <p>Exploiting this in reality requires three steps:</p> <ul> <li>Be on the path between a vulnerable client and an HTTPS server they want to talk to (for an ISP or proxy this is always true, for local networks this is reliably achievable using various techniques like <a href="https://en.wikipedia.org/wiki/ARP_spoofing">ARP spoofing</a> or <a href="https://en.wikipedia.org/wiki/Evil_twin_(wireless_networks)">evil twin wifi</a>)</li> <li>Pretend to be the target server (accept the TLS connection when you see it, generate a random certificate yourself for the TLS handshake, and vulnerable code will always accept it as a real valid certificate regardless)</li> <li>Do something with the intercepted traffic (proxy it to the real server untouched, but inspected, inject your own responses, or proxy it while changing the request and response data)</li> </ul> <p>Steps 2 and 3 are extremely easy, and libraries like <a href="https://github.com/httptoolkit/mockttp">Mockttp</a> (which I maintain, for testing HTTP(S) request traffic) can do it for you automatically in <a href="https://httptoolkit.com/blog/javascript-mitm-proxy-mockttp/">a couple of lines</a>. Step 1 is harder, but not <em>much</em> harder in many environments.</p> <p>The Node.js clients at risk are likely to be either CLI tools, or backend services making requests to APIs. For vulnerable clients, any API keys they send are exposed, and all API requests &amp; responses are potentially visible &amp; editable by 3rd parties en route. For most non-trivial API usage, that is Very Bad.</p> <h2 id="whatsthefix">What's the fix?</h2> <p>For Node.js itself, it's a very simple fix: the TLS module needs to explicitly check for <code>false</code>, which <a href="https://github.com/nodejs/node/commit/6c7fff6f1d">they now do</a>. 
This was already done for server verification of client certificates <a href="https://github.com/nodejs/node/commit/348cc80a3cbf0f4271ed30418c6ed661bdeede7b">a while back</a> (when the documentation was updated) but seemingly never completed for the (far more commonly used) client verification of server certificates.</p> <p>For downstream developers, there are two things you can do:</p> <ul> <li>Update to the latest Node.js version</li> <li>Ensure all your code and your dependencies' code always sets <code>rejectUnauthorized</code> explicitly to either <code>true</code> (by default) or <code>false</code> (only where definitely necessary).</li> </ul> <p>To test if code is vulnerable, try making a request to a known-bad HTTPS service. <a href="https://badssl.com">Badssl.com</a> hosts a selection of these covering various types of bad HTTPS configurations, for example <a href="https://expired.badssl.com">expired.badssl.com</a>. Unfortunately, due to the nature of this, to be fully confident you need to check that requests fail for each of the various types of bad certificate.</p> <p>In the example code above, <code>makeRequest("https://expired.badssl.com")</code> will work, sending the request with no errors. Using one of the fixed versions of node released today, or when fixing the code itself, it will throw an error instead.</p> <h2 id="whyisthisalowseverityissue">Why is this a low severity issue?</h2> <p>Good question! If you're not interested in how security reporting works, this might not be interesting, but it is important, because these severity scores affect how much attention vulnerabilities get, and how quickly systems are secured.</p> <p>If you're not aware, vulnerabilities are generally scored using CVSS (Common Vulnerability Scoring System), which takes a series of parameters like "Confidentiality Impact" and "Privileges Required", and combines them together to give an overall severity from 0.0 (no issue) to 10.0 (critical disaster). 
The set of parameters together describes the details of how the vulnerability is exploited and its potential impact on vulnerable systems.</p> <p>These scores don't relate to how many systems are affected, or the chance of a random person on the street being impacted, or anything like this - a Node.js vulnerability is scored no higher than the equivalent bug in a tiny rarely-used FORTRAN package. These scores only aim to measure the potential risk to the systems that are vulnerable, so that maintainers of those systems understand their exposure.</p> <p>In my opinion, using the standard <a href="https://www.first.org/cvss/specification-document">CVSS definitions</a>, a good measure of the real severity is:</p> <ul> <li>Attack Vector: Network (you can attack remotely, if you're between the client and their target server)</li> <li>Attack Complexity: High (you do need to get between the client &amp; the target server)</li> <li>Privileges Required: None (you don't need a user account in the vulnerable server or application)</li> <li>User Interaction: None (attackers can exploit this with zero user action involved)</li> <li>Scope: Unchanged (this usually only affects the vulnerable application)</li> <li>Confidentiality impact: High (an attacker can inspect all HTTPS traffic)</li> <li>Integrity impact: High (an attacker can arbitrarily change any HTTPS traffic)</li> <li>Availability impact: None (generously - there are advanced attacks where you could use this to make a vulnerable application unavailable, but most attacks won't)</li> </ul> <p>The definitions of these are <a href="https://www.first.org/cvss/specification-document">standardized</a>, and this is a textbook example of many of them (interception of network traffic is literally the example of "Attack Complexity: High" taken straight from the standard).</p> <p>Putting the above into <a href="https://www.first.org/cvss/calculator/3.1#CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:N">the calculator</a> rates 
this instead at 7.4 out of 10 (High severity). That's in line with many very similar vulnerabilities in the past - for example in <a href="https://nvd.nist.gov/vuln/detail/CVE-2021-31597">npm modules</a>, <a href="https://nvd.nist.gov/vuln/detail/CVE-2016-11086">Ruby modules</a>, <a href="https://nvd.nist.gov/vuln/detail/CVE-2020-11050">Java modules</a> and <a href="https://nvd.nist.gov/vuln/detail/CVE-2021-29504">Wordpress</a> - and that's a much clearer representation of the real risk here, in my opinion.</p> <p>(If there's something I'm missing though that limits the risk or increases the challenge to exploit this, I'd love to hear about it! <a href="https://twitter.com/pimterry">Get in touch</a>)</p> <p>I'm not sure of the full reasons why Node is treating this as a low severity (1.9) issue, but I suspect it's due to a simple misunderstanding of the overall exploitability, or the Node team consider protecting against <code>undefined</code> options like this to be out of scope, even though they're widely used and actively supported. 
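<p>For reference, that 7.4 falls directly out of the CVSS v3.1 base score formula. Here's a sketch of the calculation for the vector above, using the weight constants from the CVSS v3.1 specification:</p>

```javascript
// CVSS v3.1 base score for AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:N:
const AV = 0.85; // Attack Vector: Network
const AC = 0.44; // Attack Complexity: High
const PR = 0.85; // Privileges Required: None
const UI = 0.85; // User Interaction: None
const C = 0.56, I = 0.56, A = 0; // Confidentiality & Integrity: High, Availability: None

const iss = 1 - (1 - C) * (1 - I) * (1 - A);
const impact = 6.42 * iss; // Scope: Unchanged
const exploitability = 8.22 * AV * AC * PR * UI;

// The spec's "Roundup" function: round *up* to one decimal place:
const roundUp = (x) => Math.ceil(x * 10) / 10;
const baseScore = roundUp(Math.min(impact + exploitability, 10));

console.log(baseScore); // 7.4 - High severity
```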
I've <a href="https://hackerone.com/bugs?subject=user&report_id=1278254&view=custom&reported_to_team=&text_query=&program_states%5B%5D=2&program_states%5B%5D=3&program_states%5B%5D=4&program_states%5B%5D=5&sort_type=latest_activity&sort_direction=descending&limit=25&page=1#activity-12826662">attempted</a> to resolve this myself, but unsuccessfully.</p> <p>Credit to them though, while I do disagree with this decision, they have still quickly triaged the report, found the cause, and shipped a fix for the issue.</p> <h2 id="vulnerabilitytimeline">Vulnerability timeline</h2> <ul> <li>July 26th: I find the issue and file a report.</li> <li>July 28th: The Node team acknowledge the issue and find the likely culprit.</li> <li>August 5th: The Node team announce an upcoming security release.</li> <li>August 9th: A fix is <a href="https://github.com/nodejs/node/commit/6c7fff6f1d">committed</a>.</li> <li>August 11th: Node.js v16.6.2, v14.17.5 and v12.22.5 are released with fixes for this issue and others.</li> </ul> <h2 id="wrappingup">Wrapping up</h2> <p>You might be vulnerable to this issue: there's plenty of code in the wild that clearly is, and it's very easy to become vulnerable if you've written your own HTTP request utility function or similar.</p> <p>If you are vulnerable, this is potentially easy to exploit and the impact is very significant.</p> <p>Fortunately, this is easy to fix. <strong>Update Node wherever you can to v16.6.2, v14.17.5 or v12.22.5 now</strong>, and update any other code that passes a potentially undefined value to <code>rejectUnauthorized</code> to ensure it's a boolean (defaulting to true) where possible too, just in case.</p> <p>Have any thoughts or feedback? Get in touch <a href="https://twitter.com/pimterry">on Twitter</a> or <a href="https://httptoolkit.com/contact/">send me a message</a> and let me know.</p> <p><em>Want to inspect & debug Node.js HTTPS for yourself, for debugging & testing, with no vulnerabilities required? 
Try out <a href="https://httptoolkit.com/javascript/">HTTP Toolkit</a>.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/node-https-vulnerability/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/node-https-vulnerability/</guid>
            <pubDate>Wed, 11 Aug 2021 17:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Safari isn't protecting the web, it's killing it]]></title>
            <description><![CDATA[<p>There's been a lot of discussion recently about how "Safari is the new IE" (<a href="https://blog.perrysun.com/2021/07/15/for-developers-safari-is-crap-and-outdated/">1</a>, <a href="https://daverupert.com/2021/07/safari-one-offs/">2</a>, <a href="https://www.safari-is-the-new-ie.com/">3</a>, <a href="https://blog.logrocket.com/safari-next-internet-explorer/">4</a>, <a href="https://nolanlawson.com/2015/06/30/safari-is-the-new-ie/">5</a>).</p> <p>I don't want to rehash the basics of that, but I have seen some interesting rebuttals, most commonly: Safari is actually protecting the web, by resisting adding unnecessary and experimental features that create security/privacy/bloat problems.</p> <p>That is worth further discussion, because it's widespread, and wrong.</p> <p>More specifically, Safari's approach isn't protecting the web from bloat &amp; evil Google influence, because:</p> <ul> <li>Most features that Safari hasn't implemented have no hint of security, privacy or performance concerns, and they've been implemented in every other browser already.</li> <li>The largest Safari complaint is unrelated to experimental features from the Chrome team: it's the showstopping bugs in implemented features, made worse by Safari's slow release cycle.</li> <li>Refusing to engage with the contentious API proposals for real use cases doesn't actually protect the web anyway - it just pushes web developers and users into the arms of Chromium.</li> </ul> <p>We'll dig into each of these points in more detail in a second, and then we'll talk about what Safari could do instead.</p> <p>There have been other arguments made too, including much speculation about <em>why</em> Safari might be killing the web - is this motivated by protecting Apple's app store profits? I'm going to ignore those suggestions entirely, and stick to concrete problems. 
Their reasons are their own, outside Apple we can do little more than guess, and the concrete issues can make the point without conjecture.</p> <p>Before we start, I do want to recognize that the Safari/WebKit team are working hard, and I do desperately want them to succeed! Chromium's domination is bad for everybody, and building a popular browser that's focused on privacy &amp; security, as they appear to be trying to do, is a fantastic goal. That does not mean their current approach deserves our blind support.</p> <p>I'm sure the Safari team are working on the issues below already, and I think it's likely that the problems fundamentally derive from management decisions about company priorities rather than the team themselves. Unfortunately though, today there are big problems, and the current trajectory is making the web worse, not better.</p> <p>Of the three points above, I think the final one will be most interesting and contentious, but let's get the first two cleared up first:</p> <h2 id="safariiskillingthewebbyomittingeasysafefeatures">Safari is killing the web by omitting easy safe features</h2> <p>A frequent argument made is that the features which Safari does not implement all either:</p> <ul> <li>Reduce user privacy, by supporting tracking</li> <li>Risk security, by increasing the browser attack surface</li> <li>Hurt battery life, by making web pages bloated &amp; inefficient</li> </ul> <p>This isn't true. Here's a quick list of some of the features that every other browser has implemented but Safari has not, with no suggestion of any privacy, security or battery life concerns:</p> <ul> <li>CSS's <code>contain</code> property, which isolates an element's layout from the rest of the DOM, improving browser render performance, and simplifying page layout for developers through isolation. Implemented in Chrome in 2016, and Firefox in 2019.</li> <li>CSS's <code>offset-path</code> property, which allows elements to be animated declaratively along SVG paths. 
Implemented by Chrome in 2015 and Firefox in 2020.</li> <li>CSS's <code>overflow-anchor</code> property, which stops pages jumping around while the user is reading. Implemented in Chrome in 2017 and Firefox in 2019.</li> <li>Resolution media queries, which allow content to be styled to match the device pixel density. Implemented in Firefox in 2012 and Chrome in 2013.</li> <li><code>:focus-visible</code>, which avoids accessibility/design conflicts by showing focus styling only during keyboard navigation. Implemented in Chrome in 2020 and Firefox in January 2021.</li> <li>TouchEvents, supporting multi-touch and touch gestures on the web. Implemented in Chrome in 2012 and Firefox in 2017.</li> <li>BroadcastChannel, which allows pages on the same origin to easily communicate, e.g. to log all pages out together. Implemented in Firefox in 2015 and Chrome in 2016.</li> <li><code>beforeprint</code> and <code>afterprint</code> JavaScript events, allowing pages to dynamically customize print layouts beyond simple media styles. Implemented in IE 6 (!!!) in 2001, Firefox in 2011 and Chrome in 2018.</li> <li>Regex lookbehind in JavaScript. Implemented in Chrome in 2017 and Firefox in 2020.</li> <li><code>scrollIntoView({ behavior: 'smooth' })</code> to scroll to an item on the page. Implemented in Firefox in 2015 and Chrome in 2017.</li> <li>Screen orientation JavaScript APIs, allowing pages to dynamically handle screen orientation changes. Implemented in Chrome in 2014 and Firefox in 2016.</li> <li>AV1 video and AVIF images, a new efficient and freely licensed compression format. Implemented in Chrome in 2018 and Firefox in 2019.</li> </ul> <p>Each of these has a published standard and is implemented by multiple browser engines, including Firefox, with no concerns I can see anywhere. 
There have been no specific public objections from the Safari team on any of these that I can see, only silence.</p> <p>As far as I'm aware, there's also no signal from the Safari team that any of these are coming any time soon, and I've omitted quite a few more missing features that are implemented, but behind flags (often for years), and which are presumably going live sometime soon.</p> <p>According to <a href="https://caniuse.com/">Can I Use</a>'s metrics, Safari is lagging about 10% behind Firefox and 15% behind Chrome in feature support. That's including every basic feature like HTML images, forms and links - so it significantly underestimates the modern feature gap.</p> <p>Meanwhile the web platform tests dashboard (unaffiliated with any vendor, with contributors from Mozilla, Google, Apple and across the industry) has its own metric for this, a count of browser support for their list of core web features most used by web developers. Safari is <a href="https://wpt.fyi/interop-2021?stable">not doing well</a>:</p> <p><img src="https://httptoolkit.com/images/posts/wpt-compat-stats.png" alt="Safari trailing far behind Firefox and Chrome in core feature support"></p> <p>The "<em>they're only ignoring bad features</em>" argument is made weaker by Safari's previous behaviour with such missing features, where many have eventually been implemented without objection, but years behind other browsers. 
If there were a good argument against these features, they should clearly never have been implemented at all.</p> <p>There's no good case for implementing web platform features only many years after everybody else, such as:</p> <ul> <li>Date and time input types - released 4 years after Firefox and 9 years after Chrome</li> <li>Service Workers, for page request middleware (offline support &amp; caching) - released 2 years after it was supported everywhere else</li> <li>AbortController, to abort fetch requests and other async operations - released a year after it was supported everywhere else</li> <li>IntersectionObserver, to detect element visibility, e.g. to allow deferred loading - released 2 years after it was supported everywhere else</li> <li>Form validation - technically released just ahead of Firefox &amp; Chrome, but so broken as to be unusable for 7 years</li> <li>WebP images - released 1.5 years after Firefox and 6 years after Chrome</li> </ul> <p>There are hundreds more examples like this, where features are discussed, implemented and standardized in every other major browser, but not in Safari for years afterwards. This delays easy real-world use of otherwise standardized features, shipped by every other browser, by an extra year or two (on top of the existing time for standardization &amp; implementation), if they're ever implemented at all.</p> <p>Again: these are not contentious features shipped by only Chrome, they're features with wide support and no clear objections, but Safari is still not shipping them until years later. They're also not shiny irrelevant features that "bloat the web" in any sense: each example I've included above primarily improves core webpage UX and performance. Safari is slowing down progress here.</p> <p>Ignoring standards like this does not help the web evolve more cautiously - once these features have been stable for years in every other browser they can't be changed anyway. 
A far better way to improve APIs would be to ship such features early in Safari, behind flags &amp; origin trials, and gather feedback from as wide an audience of developers and browser implementors as possible before they become stable, so that feedback can help every browser include better APIs.</p> <p>Instead, whenever Safari doesn't support otherwise widely available web features, developers can't depend on them 100%, so some will hold back on using them (especially in mobile use cases) or hack in workarounds, and so clear feedback is reduced, issues are harder to find, and the development of good web APIs is made harder for everyone.</p> <p>I'll avoid guessing at the reasons for all this, but it is clear that it's a new development. In the past (the early 2010s) Apple was frequently leading the way on new features, as the very first browser to ship major JavaScript APIs like Web Workers, and the browser driving experimental prefixed features like <a href="https://webkit.org/blog/176/css-canvas-drawing/">CSS Canvas backgrounds</a>. It's exceedingly rare now to see a web feature primarily driven by Apple. Something has changed.</p> <h2 id="safariiskillingthewebthroughshowstoppingbugs">Safari is killing the web through show-stopping bugs</h2> <p>In addition to missing features, Safari has a lot of bugs in its implemented features of various web standards. Many of these bugs have serious effects, where an otherwise working webpage entirely fails or has its layout significantly broken, etc. 
Here's a sample of the current bugs that exist in the latest stable Safari release:</p> <ul> <li>The IndexedDB API hangs indefinitely on initial page load, making it almost completely unusable: https://bugs.webkit.org/show_bug.cgi?id=226547</li> <li>LocalStorage is broken when a page is open in more than one tab, in a way likely to cause major data loss in most use cases: https://bugs.webkit.org/show_bug.cgi?id=225344</li> <li>Support for <code>background-attachment: local</code> has suddenly completely disappeared: https://bugs.webkit.org/show_bug.cgi?id=219324</li> <li>Some Fetch requests incorrectly completely skip the service worker: https://bugs.webkit.org/show_bug.cgi?id=187461</li> <li>Using <code>border-image</code> with <code>border-style: none</code> is rendered completely wrong, reported 9 years ago: https://bugs.webkit.org/show_bug.cgi?id=99922</li> <li>Focus events for non-input elements behave differently in Safari to every other browser, reported 13 years ago: https://bugs.webkit.org/show_bug.cgi?id=22261</li> <li>Safari incorrectly blocks <code>localhost</code> as mixed content when accessed from an HTTPS page (but allows it from HTTP!), breaking use cases from Spotify to Ethereum: https://bugs.webkit.org/show_bug.cgi?id=171934</li> <li>100vh (100% viewport height) means a different thing in mobile Safari to everywhere else: https://bugs.webkit.org/show_bug.cgi?id=141832</li> <li>Fetch request streaming is implemented just enough to pass feature detection, but it doesn't actually work: https://twitter.com/jaffathecake/status/1420306878580547586</li> <li><code>Mousemove</code> events fire when modifier keys are pressed, even if the mouse isn't moved: https://twitter.com/jaffathecake/status/1420315350009356293</li> <li>Appending an element to the shadow DOM in many cases hard crashes the browser process, making sites including redhat.com completely inaccessible: https://bugs.webkit.org/show_bug.cgi?id=224408</li> </ul> <p>There's many many more. 
Moving from anecdotes to data: the graph below counts the number of <a href="https://wpt.fyi/about">web platform tests</a> from the full suite that fail in only one browser. The yellow line is Safari, clearly failing far more tests than Firefox &amp; Chrome, for years:</p> <p><img src="https://httptoolkit.com/images/posts/wpt-browser-failures.png" alt="A graph, showing Safari failing far more tests than any other browser"></p> <p>For every bug above and all the data in that graph, pages that correctly use the standard APIs - those that are fully supported by both Firefox and Chrome (in the localStorage case, supported by IE8!) - are broken for all Safari users.</p> <p>This is bad. It's made much worse by the incredibly slow pace of Safari releases. Here are the browser release cycles today:</p> <ul> <li>Chrome: every 6 weeks, planning to move to every 4 weeks in Q3 2021</li> <li>Edge: every 6 weeks, planning to move to every 4 weeks in Q3 2021, with an 8-week stable enterprise option</li> <li>Vivaldi: every 6 weeks</li> <li>Firefox: every 4 weeks</li> <li>Brave: every 3 weeks</li> <li>Safari: every 6 months</li> </ul> <p>Spot the odd one out.</p> <p>(That's just for stable bugfix &amp; feature releases - browsers also ship their own nightly/beta/preview releases and urgent patches for critical security issues outside this schedule)</p> <p>This makes the whole problem <em>so</em> much worse, because even if bugs were quickly recognized and fixed, they're going to be around for at least 6 months, and likely well beyond that (because updates are manual and tied to OS updates, rather than automatic, background &amp; low-hassle).</p> <p>That means that even in the best case, web devs and JS library authors everywhere have to add permanent workarounds for every Safari issue, and support those workarounds for at least a year, rather than quick fixes to work around Firefox bugs that may only exist for a little over 4 weeks. 
Dave Rupert wrote <a href="https://daverupert.com/2021/07/safari-one-offs/">an excellent article</a> this week, listing his specific set of workarounds required to get Safari to behave like every other modern browser. It's hard work.</p> <p>As an example: the localStorage bug above seriously breaks a core web API, and was very quickly fixed (within 24 hours! Superb), but today nearly <em>3 months later</em> that working fix still hasn't been released to users, and all the Safari team can say is:</p> <blockquote> <p>We are aware of how important this bug is. We have no comment on future releases. - <a href="https://bugs.webkit.org/show_bug.cgi?id=225344#c18">https://bugs.webkit.org/show_bug.cgi?id=225344</a></p> </blockquote> <p>It's hard to overstate how bad this is for the web.</p> <p>Right now, every single website that wants to store any data in local storage has to simply accept unpredictable, unnecessary data loss, and it's likely that this will continue for months to come. It's sort-of possible to work around this by using IndexedDB instead, but that itself is broken too by the other bug above.</p> <p>This slow release cycle also cuts down on Safari's ability to get feedback, push fixes &amp; test experiments through frequent iterative releases. A key change in software development over the last 10 years has been a move towards smaller and more incremental releases, rather than occasional big-bang deployments. Getting software into the hands of users as quickly as possible, and building a pipeline to take the resulting feedback, make changes and deploy fixes, and get a new release out again quickly is incredibly valuable.</p> <p>It's a shame to see Safari avoid the benefits of iterative releases entirely, and it's making the other problems here much worse.</p> <h2 id="safariiskillingthewebbyignoringproposednewapis">Safari is killing the web by ignoring proposed new APIs</h2> <p>We've talked about the uncontentious standard APIs that Safari doesn't support. 
Let's talk about the contentious experimental ones.</p> <p>These APIs, often proposed by the Chrome team, give browsers power to use bluetooth, write to local files, and sync content with servers in the background. For each of these, Safari and Firefox have signalled that they intend to ignore the API entirely, never implementing it, due to security, privacy &amp; battery life concerns.</p> <p>Firstly, I do think that Chromium is overly aggressive in pushing new APIs and publishing them before proper consensus is reached. Building consensus on web standards is extremely important, and it often feels like Google's team can be keener to immediately ship new APIs than take the time to work with other vendors, and ship the <em>right</em> API.</p> <p>That said, many of the contentious APIs they've proposed - from Web Bluetooth to Filesystem Access - are clearly tapping into genuine use cases and real demand. Read through the replies to https://twitter.com/jensimmons/status/1418920407642656775 (from Apple's Web Dev Evangelist), the comments on Firefox's discussion of the <a href="https://github.com/mozilla/standards-positions/issues/154">File System Access API</a> or <a href="https://github.com/mozilla/standards-positions/issues/58">WebMIDI</a>, or Safari's issue for <a href="https://bugs.webkit.org/show_bug.cgi?id=182566">Web Push</a>.</p> <p>Behind the debate, there are floods of passionate developers, excited to build products on top of these technologies, and fighting to get wider support for them.</p> <p>Given that there is real demand for these features, and Chrome is keen to ship them, this poses a very big problem for Safari, Firefox, and others.</p> <h3 id="theproblemwithpopularfeatures">The problem with popular features</h3> <p>I'm very sympathetic to the argument that there are security &amp; privacy concerns around these features. 
Unfortunately though, Safari &amp; Firefox live in a world where the leading browser, with far larger market share than both of them combined, is absolutely going to fulfil the demand and ship these new APIs.</p> <p><strong>There is no plausible option where other vendors stop these features coming to the web entirely</strong>. In many cases, they're already there. There's only a world in which they stop them reaching beyond Chrome's 65% market share (~70%, including all Chromium-based browsers, or ~80% for desktop).</p> <p><a href="https://en.wikipedia.org/wiki/Progressive_enhancement">Progressive enhancement</a> is an approach often used as a solution to web compatibility concerns like this. Unfortunately, in a situation where the leading browser builds a popular feature that can't be polyfilled, and minority browsers try to ignore it indefinitely, progressive enhancement means:</p> <ol> <li>Chromium ships a feature, other browsers don't</li> <li>If it's a valuable feature, web developers will use it when available, but with some fallback behaviour for other browsers</li> <li>Other browsers get a worse experience, and this hurts metrics/engagement/etc</li> <li>Web developers put up a "Works best in Chrome/Edge/Opera" notice, to encourage users towards the best experience</li> <li>Users switch to Chromium and get a good experience</li> <li>Other browsers slowly die, the browser ecosystem collapses, the web dies, sadness</li> </ol> <p>That's not to say that every website will use these shiny new APIs - they're mostly isolated to web <em>application</em> use cases. In some ways that's worse though: your average user isn't going to install a new browser to read a news article, but they might if it offers them reliable notifications from their chat app (background push), makes it easier to work with a webapp they use at work every day (native filesystem), or if it's required to set up a new bluetooth gadget (web bluetooth). 
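</p> <p>Step 2 in that sequence is plain feature detection. As a sketch of the pattern (ordinary JavaScript; the strategy names and the <code>globalScope</code> parameter are illustrative, not any real API):</p>

```javascript
// Progressive enhancement: use the powerful API where it exists,
// fall back to universally-supported behaviour everywhere else.
function chooseSaveStrategy(globalScope) {
  // The File System Access API (showSaveFilePicker) is Chromium-only today:
  if (typeof globalScope.showSaveFilePicker === 'function') {
    return 'filesystem-access'; // save directly back to the opened file
  }
  // Fallback that works in every browser: trigger a file download instead
  return 'download-link';
}

// Simulating a Chromium-like and a Safari-like environment:
const chromiumLike = { showSaveFilePicker: async () => ({}) };
const safariLike = {};
console.log(chooseSaveStrategy(chromiumLike)); // "filesystem-access"
console.log(chooseSaveStrategy(safariLike));   // "download-link"
```

<p>The fallback works, but it's a noticeably worse experience (no in-place saving), which is exactly how the engagement gap in step 3 appears.</p> <p>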
Once they switch for one thing, it significantly increases the chance they switch for everything.</p> <p><strong>You might be thinking "I don't care - I won't switch to Chrome, I don't want webapps that use these APIs"</strong>. That doesn't matter.</p> <p>The percentage of users who'll never use Chrome out of principle is vanishingly small compared to the group of users who will switch to the best tool for the job. If Chromium is genuinely more functional than other browser engines, users will switch to Chromium, and your experience of the entire web will get worse as your alternative browser's market share shrinks and developers begin to ignore it, no matter what non-Chromium browser you want to use or what web pages you visit.</p> <p>The health of the browser ecosystem affects everybody.</p> <p>In Safari's case, the risk of this would be reduced if supporting Safari was easy and they had a well of web developer goodwill from which to draw. Back in the early days of Firefox vs IE, Firefox's tiny initial market share was less problematic because web developers would actively work to ensure their sites worked there, as it was a nice browser to support and use. Unfortunately (see the first two sections of this article) Safari is <em>not</em> easy to support, and decidedly does <em>not</em> have web developer goodwill to rely on.</p> <p>All this isn't theoretical - <strong>this is visibly happening today</strong>. 
These features are popular enough that they're in use in real products on the web right now, and the above process is exactly what is happening.</p> <p>For example, if you buy an <a href="https://www.espruino.com/">Espruino</a> (a popular programmable IoT gadget) the recommended dev process is their <a href="https://www.espruino.com/Web+IDE">web-based IDE</a>, which uses Web Bluetooth, and requires a Chromium-based browser:</p> <p><img src="https://httptoolkit.com/images/posts/espruino-bluetooth-setup.png" alt="The Espruino setup instructions"></p> <p>Similarly, Excalidraw (a popular online whiteboarding tool) offers <a href="https://blog.excalidraw.com/browser-fs-access/">a far better UX</a> only to browsers with the filesystem access API (Chromium only), Godot Engine (an open-source game engine) is now building a <a href="https://godotengine.org/article/godot-editor-running-web-browser/">web-based editor</a> which will require filesystem access API support (Chromium only) for convenient saving &amp; loading, and Noteflight (a popular music composition service) shut down their existing MIDI adapter and <a href="https://notes.noteflight.com/web-midi/">moved their primary workflow</a> onto Web MIDI (Chromium only).</p> <p>These APIs are already part of the fabric of the web. These are popular webapps (Noteflight has 6 million users, Excalidraw has 22,000 GitHub stars), many users want to use them, and they have core functionality that only works well in Chromium.</p> <p>Of course, it's still early days, and the likely reality isn't that the browser ecosystem will actually collapse at the end of this. 
Despite Firefox &amp; Safari's concerns, if an API really takes off and reaches critical mass, the reality is that they'll have to just implement these APIs as they are, or risk becoming incompatible with the real-world web.</p> <p>That is a mildly better result - we still have multiple compatible browsers - but only very mildly: at that point, they have unintentionally allowed Google to unilaterally set web standards. We should avoid that.</p> <h3 id="thewallsarefallingdown">The walls are falling down</h3> <p>This paints a bleak picture. The one saving grace today is that Apple blocks use of any non-WebKit engine on iOS, which protects that one environment, and the iOS market (in the US at least) is large enough that this means Safari must be prioritized.</p> <p>Unfortunately however, Apple is currently tied up in antitrust battles, where allowing alternate browser engines on iOS is a plausible legal imposition, or a plausible concession to avoid accepting alternate app stores. This restriction seems unlikely to last forever.</p> <p>Even if that restriction does hold, there's nothing to stop the above playing out on desktop alone, where Chromium already has 80% market share. Notably, all the examples above of this happening today are desktop-focused webapps. 
And on mobile, while this restriction helps, Chrome still sits at 64% market share (and rising - up 3 points since November 2020), which is easily a large enough audience that some web apps will accept losing non-Chromium users in return for the chance to build an app or game on the web in ways that would be totally impossible in Safari/Firefox.</p> <p>There are two clear parallels with the past here:</p> <ul> <li>The slow death of IE: by offering web developers fewer bugs, better tools and more features while IE stagnated, Firefox built enough developer goodwill to dramatically expand its market share against the odds, forcing IE (later Edge) to follow its lead.</li> <li>WebExtensions: despite every browser previously offering their own add-on APIs, Chrome effectively dominated developer mindshare, provided more powerful &amp; easier to use extension APIs that became far more popular, and both Firefox &amp; Safari have eventually killed their own APIs and accepted Chrome's, unintentionally allowing Google to unilaterally set the web extension standard.</li> </ul> <p>Chrome is following the same path today: offering web developers more powerful tools and a better development experience (better devtools, fewer bugs) than Safari. If nothing changes, the outcome is likely to be similar. This is bad.</p> <h3 id="thereisabetterway">There is a better way</h3> <p>So, outright ignoring popular features will not stop them happening, and risks either giving all market share to Google, or all browsers being forced to follow Google's standards. 
What the hell do we do instead?</p> <p><strong>Safari, Firefox and others need to make <em>better</em> proposals for these use cases.</strong></p> <p>If they're concerned that Web Bluetooth (for example) could be abused, they need to work together and with the Chrome team to improve permissions controls and UX, tighten up standards, limit functionality to the minimum for real use cases, give users control over these APIs, and build standards to support these use cases without endangering users.</p> <p>At the end of the day, it's very hard to sell "We're the browser that doesn't support bluetooth" once users start seeing websites with cool features that require bluetooth. It's much easier to sell "We're the browser that <em>securely</em> supports bluetooth while protecting your privacy".</p> <p>This is hard, but it's absolutely not impossible.</p> <p>Some ideas:</p> <ul> <li>Expose no information/access to web applications without explicit permission from the user (I think this is already the case for these APIs, but let's set a clear baseline).</li> <li>Support limited functionality, or extra permissions for dangerous functionality, e.g. 
Web MIDI without SysEx instructions by default.</li> <li>Avoid permissions fatigue, by disallowing sensitive permissions popups or PWA install prompts before significant user engagement (repeated visits, manual PWA install, etc).</li> <li>Don't allow permissions prompts for sensitive permissions at all - require the user to actively enable something in the browser UI to activate sensitive features per-domain.</li> <li>Build a reputation system linked to domain names, and restrict access to some APIs or show louder warnings on that basis until domains gain reputation.</li> <li>Go even further: require these PWAs to register with the app store, tie a personally-identified Apple developer subscription to them, and ban accounts that abuse them.</li> <li>Alternatively: sell more expensive HTTPS certificates that are required to use sensitive APIs on the open web.</li> <li>Allow all users who don't want these APIs to easily disable such features from the whole web in their browser settings.</li> </ul> <p>These are not all good ideas. None of them are perfect, there are complex tradeoffs and challenges, and yes these are absolutely hard problems. There are a lot of clever people involved in each of these teams though, Apple have a lot of motivation and money available to work on this, and the alternative is that the Google approach happens regardless, and Safari/Firefox have to either become incompatible with the web or accept Google's standard as-is later on.</p> <p>It's also important to remember that today any website can already use bluetooth, by getting you to download and run a binary (or install an app with bluetooth permissions). Protections don't have to be perfect - they just have to be significantly harder to defeat than it is to convince a user to run a malicious binary.</p> <p>This is all a lot of work though, and I am sympathetic to Firefox's own position here: they just don't have the resources to seriously keep up with Google's engineering efforts. 
That is not the case for Apple, and by combining forces Apple would help keep Firefox in the game to limit Google's dominance outside of Apple's own platforms. Apple have the oomph to lead web standards, keep up with Google, and push alternative approaches if they choose to. Right now it seems they're choosing not to.</p> <h2 id="letsprotecttheweb">Let's protect the web</h2> <p>Ok, wrapping up:</p> <ul> <li>The features Safari has not implemented are generally <em>not</em> dangerous - the clear majority are widely accepted standards.</li> <li>The "Safari is the next IE" argument is well supported by Safari's many showstopping bugs and the extra workarounds required for developers - it's not a misunderstanding of Safari's battle to protect privacy &amp; security.</li> <li>Safari and others can't simply ignore serious proposals for popular features that Chrome wants to implement. They need to engage and offer alternatives, or the problem will only get worse.</li> </ul> <p>It's not accurate to describe Safari's approach as protecting the web, and right now it looks more likely that it is making the web worse for everybody.</p> <p>For the new proposed APIs specifically, in the end they'll either have to engage with Chrome's proposals, or become incompatible with the growing part of the web that has, losing large portions of their userbase and their influence on standards along the way. 
There is no point in winning on principles if there are no users left.</p> <p><strong>I want to see a world where Apple, Mozilla, Microsoft and Brave are leading web standards</strong>, driving the web forwards with features that support new use cases and allow for exciting new products, but with care for user privacy, tracking-resistance and security embedded as first-class priorities.</p> <p>I want a world where Safari, Firefox, Chrome &amp; others all support a consistent set of evolving APIs, working together to avoid showstopping bugs or release fixes for them quickly, and giving web developers a consistent reliable platform to build on.</p> <p>Right now, that's not happening. I'm scared that Safari's current approach of outright refusal and neglect of the web is going to give us the exact opposite result, and all the evidence suggests that's starting to happen already.</p> <p>Apple has the resources to do this, and arguably a responsibility to do so if they want to support the privacy and security of their users. If they don't, the web is in big trouble.</p>]]></description>
            <link>https://httptoolkit.com/blog/safari-is-killing-the-web/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/safari-is-killing-the-web/</guid>
            <pubDate>Wed, 28 Jul 2021 15:50:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Public CDNs Are Useless and Dangerous]]></title>
            <description><![CDATA[<p>Once upon a time, loading common scripts &amp; styles from a public CDN like cdnjs or Google's Hosted Libraries was a 'best practice' - a great way to instantly speed up your page loads, optimize caching, and reduce costs.</p> <p>Nowadays, it's become a recipe for security, privacy &amp; stability problems, with near-zero benefit. Just last week, a security researcher showed how this could go <a href="https://blog.ryotak.me/post/cdnjs-remote-code-execution-en/">horribly wrong</a>.</p> <p>There are ways to mitigate those risks, but in practice the best solution is to avoid them entirely: self-host your content and dependencies, and then use your own caching CDN directly in front of your application instead for performance.</p> <p>I'll explain what that means in a second. First though, why was this a good idea, and how has it now become such a mess?</p> <h2 id="whywasthisagoodidea">Why was this a good idea?</h2> <p>The main benefit that public CDNs of popular libraries offered was shared caching. If you used a popular version of jQuery then you could reference it from a public CDN URL, and if a user had recently visited another site that used the same version of jQuery from the same CDN then it would load instantly, straight from their cache.</p> <p>In effect, sites could share resources (almost always JavaScript) between one another to improve caching, reduce load times, and save bandwidth for sites and visitors.</p> <p>Even in the uncached case, this still offered benefits. Browsers <a href="https://stackoverflow.com/a/985704/68051">limit</a> the number of simultaneous open connections by domain, which limits the performance of parallel resource downloads. By using a separate domain for some resources, resource loading could be spread across more connections, improving load times for visitors.</p> <p>Lastly, the main site's cookies aren't sent in requests to 3rd party domains. 
If you have large cookies stored for your domain, this adds a lot of unnecessary data to every request to your domain, again increasing bandwidth usage and load times (honestly I'm not sure if this overhead really had a practical impact, but it was certainly <a href="https://www.globaldots.com/resources/blog/googles-web-performance-best-practices-3-minimize-request-overhead/">widely documented</a> as an important 'web best practice').</p> <p>Those are the abstract technical reasons. There were practical reasons too: primarily that these CDNs are offering free bandwidth, and are better equipped for static resource distribution than your servers are. They're offering to dramatically reduce your bandwidth costs, while being better prepared to handle sudden spikes in traffic, with servers that are more widely distributed, putting your content closer to end users and reducing latency. What's not to like?</p> <h2 id="wherediditallgowrong">Where did it all go wrong?</h2> <p>That's the idea at least. Unfortunately, since the peak of this concept (around 2016 or so), the web has changed dramatically.</p> <p>Most importantly: cached content is no longer shared between domains. This is known as cache partitioning and has been the default in Chrome since October 2020 (v86), Firefox since January 2021 (v85), and Safari since 2013 (v6.1). That means if a visitor visits site A and site B, and both of them load <code>https://public-cdn.example/my-script.js</code>, the script will be loaded from scratch both times.</p> <p><strong>This means the primary benefit of shared public CDNs is no longer relevant for any modern browsers</strong>.</p> <p>HTTP/2 has also shaken up the other benefits. By supporting parallel streams within a single connection, the benefits of spreading resources across multiple domains no longer exist. 
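</p> <p>To make the cache partitioning change above concrete: conceptually, the browser cache is now keyed on the pair of top-level site and resource URL, rather than on the URL alone. A minimal sketch of that behaviour (illustrative only - real browser internals are far more involved):</p>

```javascript
// Sketch of a partitioned ("double-keyed") HTTP cache: entries are keyed
// by (top-level site, resource URL), so different sites can never share
// a cached copy of the same public CDN resource.
class PartitionedCache {
  constructor() {
    this.entries = new Map();
  }
  cacheKey(topLevelSite, url) {
    return `${topLevelSite} ${url}`;
  }
  get(topLevelSite, url) {
    return this.entries.get(this.cacheKey(topLevelSite, url));
  }
  set(topLevelSite, url, body) {
    this.entries.set(this.cacheKey(topLevelSite, url), body);
  }
}

const cache = new PartitionedCache();
const scriptUrl = 'https://public-cdn.example/my-script.js';

// site-a.example loads & caches the shared script:
cache.set('site-a.example', scriptUrl, '/* script contents */');

// site-b.example then requests the exact same URL - a cache miss,
// so the script is downloaded again from scratch:
console.log(cache.get('site-a.example', scriptUrl)); // "/* script contents */"
console.log(cache.get('site-b.example', scriptUrl)); // undefined
```

<p>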
HTTP/2 also introduces header compression, so repeating a large cookie header is extremely efficient, and any possible overhead from that is no longer relevant to performance either.</p> <p>This completely kills all the performance benefits of using a shared public CDN (although the cost &amp; performance benefits of using CDNs in general remain - we'll come back to that later).</p> <p>On top of the upside going away though, a lot of major downsides have appeared:</p> <h3 id="securityconcerns">Security concerns</h3> <p>The security risks are the largest problem, conveniently highlighted by the <a href="https://blog.ryotak.me/post/cdnjs-remote-code-execution-en/">disclosure</a> last week of a security vulnerability that would have allowed any attacker to remotely run code within cdnjs, potentially adding malicious code to JS libraries used by 12.7% of sites on the internet.</p> <p>There are many potential routes for this kind of attack, and other CDNs like unpkg.com have been found to have <a href="https://justi.cz/security/2018/05/23/cdn-tar-oops.html">similar vulnerabilities</a> in the past.</p> <p><strong>These kinds of vulnerabilities are threats to all security on the web</strong>. Being able to inject arbitrary JavaScript directly into a tenth of the web would allow for trivial account takeovers, data theft, and further attacks on an unbelievably catastrophic scale. 
Because of that, large public CDNs like these are huge targets, providing potentially enormous rewards and impact against the whole web with a single breach.</p> <p>While both the major CDN vulnerabilities above were found by security researchers and promptly fixed, attackers have <a href="https://www.troyhunt.com/the-javascript-supply-chain-paradox-sri-csp-and-trust-in-third-party-libraries/">successfully</a> exploited specific shared 3rd party scripts elsewhere in the past, injecting crypto mining scripts into thousands of public websites in a single attack.</p> <p>So far, there hasn't been a known malicious takeover of a major CDN in the same way, but it's impossible to know if the above CDN holes were ever quietly exploited in the past, and there's no guarantee that the next wave of vulnerabilities will be found by good samaritans either.</p> <h3 id="privacyconcerns">Privacy concerns</h3> <p>Public CDNs also create privacy risks. While online privacy was a niche topic when public CDNs first became popular, it's now become a major issue for the public at large, and a serious legal concern.</p> <p>This can be problematic for public CDN usage because loading resources from a 3rd party leaks information: that the user is loading that 3rd party resource whilst on your site. 
Specifically, your site's domain (and historically full URL, though generally not nowadays) is sent in the <a href="https://developer.mozilla.org/en-US/docs/Web/Security/Referer_header:_privacy_and_security_concerns">Referer header</a> with all subresource requests, like so:</p> <figure> <img src="https://httptoolkit.com/images/posts/subresource-referer.png" alt="A subresource request, with a Referer header referencing the main site"> <figcaption>A CDN subresource, as intercepted from Chrome v91 with <a href="https://httptoolkit.com">HTTP Toolkit</a>, leaking the referring site domain in its Referer header</figcaption> </figure> <p>At the very least, this tells the public CDN that a user at the source IP address is currently visiting the site listed in the Referer header. In some cases, it can leak more information: e.g. if a payment provider script is loaded only at checkout time, then 3rd party resource requests like this provide enough information for these CDNs to identify (by IP address and browser fingerprint) which users are purchasing from certain stores.</p> <p>This is potentially a serious privacy problem, especially for public CDNs provided by companies with a clear interest in tracking users, like Google.</p> <h3 id="reliabilityproblems">Reliability problems</h3> <p>CDNs are not infallible. They can <a href="https://status.pyze.com/incidents/5f501e3d">go down entirely</a>, <a href="https://status.cdnjs.com/incidents/yv1ptjhnp6ky?u=l5wswp7461mz">become inaccessible</a>, <a href="https://github.com/jsdelivr/jsdelivr/issues/18090">throw unexpected errors</a>, <a href="https://github.com/mjackson/unpkg/issues/153">timeout under load</a>, or even <a href="https://blog.jquery.com/2018/08/30/bad-map-file-for-jquery-1-9-1-in-the-jquery-cdn/">serve the wrong content</a>.</p> <p>Of course, your own website can do the same too, as can any alternative infrastructure. 
In those cases though, you have some recourse: you can fix your site, chase the support team for the CDN service you're paying for, or switch to using a different CDN in front of your content server transparently. When your production site depends on a free public CDN, you're explicitly given zero formal guarantees or support, and you have no control of the CDN at all.</p> <p>This is worse if you're worried about the long-term because no CDN will last forever. If you still have the code of your application in 20 years, but the CDN URLs used have gone away, you can't use the application anymore without lots of debugging &amp; hunting down old copies of library files. Similarly but more immediately: if you're on an airplane, you can't reach your CDN, so doing some quick offline development is impossible.</p> <p>Using a public CDN adds an extra single point of failure to your site. Now, if your servers go down <em>or</em> the public CDN's servers are inaccessible, everything is broken. All else being equal, it's best to keep the circle of critical dependencies small.</p> <h3 id="performancelimitations">Performance limitations</h3> <p>As standard, public CDNs load every resource as a separate file, without bundling. Whilst HTTP/2 does reduce the need for bundling, this is still suboptimal for non-trivial web applications, for two reasons:</p> <ul> <li>Worse compression: HTTP response compression is always applied per response. By splitting your script across many responses, instead of compressing it all together, compression performance is reduced. This is especially true for easily compressible content that likely shares lots of common content - i.e. JavaScript files.</li> <li>No tree shaking: a public CDN must send you the entire JavaScript library in your response every time. 
Meanwhile, modern bundlers can intelligently detect which parts of imported scripts are used through <a href="https://en.wikipedia.org/wiki/Tree_shaking">tree shaking</a> at build time, and include only those portions in your application code, which can shrink the total size of your runtime dependencies dramatically.</li> </ul> <p>This is a complicated issue - there'll be times where the above doesn't apply, and it's important to measure the reality for your specific application. That said, dependency bundling is going to be faster 90% of the time and it's a reasonable default if you're building anything substantial.</p> <h2 id="whatshouldyoudoinstead">What should you do instead?</h2> <p><strong>Host your own dependencies, put a cache directly in front of your application, and make your application resilient to missing resources.</strong></p> <p>By hosting your own dependencies, you have control over everything your application needs, and you don't have to depend on public infrastructure. By using a cache directly in front of your site, you gain the same caching, content distribution and performance benefits of public CDNs, while keeping mitigations available for the possible downsides.</p> <p>When talking about caches, I'm primarily suggesting a paid caching reverse-proxy service, like Cloudflare, Fastly, Cloudfront, Akamai, etc (although these are paid, most do have generous free tiers where you can get started or host small sites). 
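</p> <p>All of these services drive their caching from the standard <code>Cache-Control</code> headers your origin sends. A sketch of the kind of rules an origin might apply (the values here are illustrative defaults, not provider-specific advice):</p>

```javascript
// Sketch: choose a Cache-Control header per response type.
// The values are illustrative - tune max-age/s-maxage for your own app.
function cacheControlFor(path) {
  // Fingerprinted assets (e.g. app.3f9a1c2d.js) never change at a given
  // URL, so both the shared cache and the browser can keep them 'forever':
  if (/\.[0-9a-f]{6,}\.(js|css|woff2?)$/.test(path)) {
    return 'public, max-age=31536000, immutable';
  }
  // HTML: let the edge cache serve it briefly and refresh in the
  // background, but make browsers always check for fresh content:
  if (path === '/' || path.endsWith('.html')) {
    return 'public, max-age=0, s-maxage=60, stale-while-revalidate=600';
  }
  // Everything else: short-lived caching only.
  return 'public, max-age=60';
}

console.log(cacheControlFor('/assets/app.3f9a1c2d.js'));
// -> public, max-age=31536000, immutable
console.log(cacheControlFor('/index.html'));
// -> public, max-age=0, s-maxage=60, stale-while-revalidate=600
```

<p>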
In addition to the caching, these each offer various features on top, like DDoS protections, serverless edge workers, server-side analytics, automatic content optimization, and so on.</p> <p>It's also possible to run your own caching reverse-proxy of course, but unless you're going to distribute it around hundreds of data centers globally then you're missing big parts of the performance benefit by doing so, and it's likely to be more expensive in practice anyway.</p> <p>Since this caching infrastructure can cache anything, with the content itself and most caching configuration (in the form of <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control">Cache-Control headers</a>) defined by your backend server, it's also much easier to migrate between them if you do have issues. If a public CDN goes down, you need to find a replacement URL that serves the exact same content, change that URL everywhere in your code and redeploy your entire application.</p> <p>Meanwhile, if your caching reverse-proxy goes down, you have the option to immediately put a different caching service in front of your site, or temporarily serve static content from your servers directly, and get things working again with no code changes or backend deployments required. Your content and caching remain under your control.</p> <p>Whilst this is all good, it's still sensible to ensure that your front-end is resilient to resources that fail to load. Many script dependencies are not strictly required for your web page to function. Even without CDN failures, sometimes scripts will fail to load due to poor connections, browser extensions, users disabling JavaScript or quantum fluctuations of the space-time continuum. 
It's best if this doesn't break anything for your users.</p> <p>This isn't possible for all cases, especially in complex webapps, but minimizing the hard dependencies of your content and avoiding blocking on subresources that aren't strictly necessary where possible will improve the resilience and performance of your application for everybody (<a href="https://javascript.info/script-async-defer">async script loading</a>, server-side rendering &amp; <a href="https://en.wikipedia.org/wiki/Progressive_enhancement">progressive enhancement</a> are your friends here).</p> <p>All put together, with modern tools &amp; services this can be an incredibly effective approach. Troy Hunt has written up <a href="https://www.troyhunt.com/serverless-to-the-max-doing-big-things-for-small-dollars-with-cloudflare-workers-and-azure-functions/">a detailed exploration of how caching works</a> for his popular Pwned Passwords site. In his case:</p> <ul> <li>477.6GB of subresources get served from his domain every week</li> <li>Of those, 476.7GB come from the cache (a 99.8% cache ratio)</li> <li>The site also receives 32.4 million queries to its API per week</li> <li>32.3 million of those queries are served from the cache (a 99.6% cache ratio)</li> <li>The remaining API requests are handled by Azure's serverless functions</li> </ul> <p>In total, his hosting costs for this site - handling millions of daily password checks - come in at around 3 cents per day, with the vast majority of savings over traditional architectures coming from caching the static content &amp; API requests.</p> <p>Running code on servers is slow &amp; expensive, while serving content from caches can be <em>extremely</em> cheap and performant, with no public CDNs required.</p> <h3 id="subresourceintegrity">Subresource Integrity</h3> <p>Subresource Integrity (SRI) is often mentioned in these discussions.
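</p> <p>For reference, SRI works by embedding a hash of the expected file directly into the tag that loads it - something like this (the hash below is a placeholder, not a real digest):</p> <pre><code class="html language-html">&lt;script
  src="https://some-public-cdn.example.com/library.min.js"
  integrity="sha384-[base64-hash-of-the-expected-file]"
  crossorigin="anonymous"&gt;&lt;/script&gt;
</code></pre> <p>If the downloaded file doesn't match the hash, the browser refuses to run it.</p> <p>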
SRI attempts to help solve some of the same problems with public CDNs without throwing them out entirely, by validating CDN content against a fixed hash to check you get the right content.</p> <p>This is better than nothing, but it's still less effective than hosting and caching your own content, for a few reasons:</p> <ul> <li>SRI will block injection of malicious code by a CDN, but when it does so your site will simply fail to load that resource, potentially breaking it entirely, which isn't exactly a great result.</li> <li>You need to constantly maintain the SRI hash to match any future changes or version updates - creating a whole new way to completely break your web page.</li> <li>It doesn't protect against the privacy risks of 3rd party resources, and if anything it makes the reliability risks worse, not better, by introducing new ways that resource loading can fail.</li> <li>In many organizations, if important SRI checks started failing and broke functionality in production, unfortunately the most likely reaction to the resulting outage would be to remove the SRI hash to resolve the immediate error, defeating the entire setup.</li> </ul> <p>That's not to say that SRI is useless. If you must load a resource from a 3rd party, validating its content is certainly a valuable protection! Where possible though, it's strictly better not to depend on 3rd party resources at runtime at all, and it's usually easy and cheap to do so.</p> <h2 id="hypotheticalfuture">Hypothetical future</h2> <p>A last brief tangent to finish: one technology that I do see being very promising here in future is <a href="https://ipfs.io/">IPFS</a>. 
IPFS offers a glimpse of a possible world of content-addressed hosting, where:</p> <ul> <li>All content is distributed globally, far more so than any CDN today, completely automatically.</li> <li>There is no single service that can go down to globally take out large swathes of the internet (as Fastly did <a href="https://www.fastly.com/blog/summary-of-june-8-outage">just last month</a>).</li> <li>Each piece of content is loaded entirely independently, with no reference to the referring resource and retrieved from disparate hosts, making tracking challenging.</li> <li>Content is defined by its hash, effectively baking SRI into the protocol itself, and guaranteeing content integrity at all times.</li> </ul> <p>IPFS remains very new, so none of this is really practical today for production applications, and it will have its own problems in turn that don't appear until it starts to get more real-world use.</p> <p>Still, if it matures and becomes widespread it could plausibly become a far better solution to static content distribution than any of today's options, and I'm excited about its future.</p> <h2 id="caveats">Caveats</h2> <p>Ok, ok, ok, the title is a bit sensationalist, fine. I must admit, I do actually think there's one tiny edge case where public CDNs are useful: prototyping. Being able to drop in a URL and immediately test something out is neat &amp; valuable, and this can be very useful for small coding demos and so on.</p> <p>I'm also not suggesting that this is a hard rule either, or that every site using a public CDN anywhere is a disaster. It's difficult to excise every single 3rd party resource when they're an official distribution channel for some library or if they're loaded by plugins and similar. 
I'm guilty of including a couple of these in this page myself, although I'm working towards getting those last few scripts removed as we speak.</p> <p>Sometimes this is easier said than done, but I do firmly believe that aiming to avoid public CDNs in production applications wherever possible is a valuable goal. In any environment where you're taking development seriously enough that it's worthy of debate, it's worth taking the time to host your own content instead.</p> <h2 id="wrappingup">Wrapping up</h2> <p>Caching is hard, building high-profile websites is complicated, and the sheer quantity of users and potential sources of traffic spikes on the internet today makes everything difficult.</p> <p>You clearly want to outsource as much of this hard work to others as you possibly can. However, the web has steadily evolved over the years, and public hosted CDNs are no longer the right solution.</p> <p>The benefits of public CDNs are no longer relevant, and their downsides are significant, both for individual sites and the security of the web as a whole. It's best to avoid them wherever possible, primarily by self-hosting your content with a caching reverse-proxy in front, to cheaply and easily build high-performance web applications.</p> <p>Have thoughts, feedback or bonus examples for any of the above? Get in touch <a href="https://httptoolkit.com/contact/">by email</a> or <a href="https://twitter.com/pimterry">on Twitter</a> and let me know.</p> <p><em>Want to test or debug HTTP requests, caching and errors? Intercept, inspect & mock HTTP(S) from anything to anywhere with <strong><a href="https://httptoolkit.com">HTTP Toolkit</a></strong>.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/public-cdn-risks/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/public-cdn-risks/</guid>
            <pubDate>Mon, 19 Jul 2021 14:15:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Defeating Android Certificate Pinning with Frida]]></title>
            <description><![CDATA[<p>Some Android apps go to astounding lengths to ensure that even the owner of a device can never see the content of the app's HTTPS requests.</p> <p>This is problematic for security research, privacy analysis and debugging, and for control over your own device in general. It's not a purely theoretical problem either - protections like this attempt to directly block HTTPS inspection tools like <a href="https://httptoolkit.com/android/">HTTP Toolkit</a>, which allow you to automatically intercept HTTPS from Android devices for inspection, testing &amp; mocking, like so:</p> <p><center> <iframe class="video-embed" src="https://www.youtube.com/embed/ttf8IhfI0Ao" frameborder="0" allow="autoplay; encrypted-media; picture-in-picture" allowfullscreen></iframe> </center></p> <p>This depends on the target application(s) trusting the debugging proxy's certificate for HTTPS traffic. These HTTP interception and mocking techniques are super useful for testing and understanding most apps, but they have issues with the small set of hyper-vigilant apps that add extra protections aiming to lock down their HTTPS traffic and block this kind of inspection.</p> <p>In the end, this is your Android device, and whether you're a security researcher checking for vulnerabilities, a developer trying to understand how an app uses its API, or a privacy advocate documenting what data an app is sharing, <strong>you should be able to see the messages that the apps you use transmit and receive on your own phone</strong>.</p> <p>Protections like certificate pinning make this difficult.</p> <p>Let's talk about how you can fight back, by using <a href="https://frida.re/">Frida</a> to remove SSL pinning, and expose the real traffic that any app is sending.</p> <h2 id="whatscertificatepinning">What's certificate pinning?</h2> <p>By default, when an Android app makes an HTTPS connection, it makes sure that it's talking to a trusted server by comparing the issuer of the 
server's certificate to Android's built-in list of trusted system certificate authorities.</p> <p>99% of apps stick with that default. You can't change the system certificate authorities on normal devices, so this list is fairly reliable and secure. You <em>can</em> change it on rooted devices and most emulators though, so it's quite possible to intercept and inspect HTTPS traffic from these apps by using a debugging proxy in those environments.</p> <p>Unfortunately, the last 1% that don't stick with the default configuration are more complicated. These apps include their own custom certificate validation, to specify the <em>exact</em> HTTPS certificate issuers they're prepared to trust, instead of trusting all of the device's trusted certificate authorities. This ensures they will never trust a new certificate from a certificate authority that they don't explicitly recognize, and so won't accidentally expose HTTPS traffic to anybody other than the real server.</p> <p>This is generally known as "public key pinning", "certificate pinning", or "SSL pinning".</p> <p>Because this blocks all except a specific list of certificate authorities, it also blocks the private certificate authorities used by HTTPS debugging proxies, and so we hit our problem.</p> <p>Certificate pinning used to be a much more popular technique, back before Android Nougat when Android's own certificate validation was more lax and users could easily be tricked into installing new trusted certificates on their devices.
Nowadays this is more tightly controlled, and certificate pinning is much rarer, since (as we'll see) it's really <a href="https://en.wikipedia.org/wiki/Security_theater">security theater</a>, and Google's own docs now specifically <a href="https://developer.android.com/training/articles/security-ssl#Pinning">recommend against</a> the practice:</p> <p><img src="https://httptoolkit.com/images/posts/cert-pinning-not-recommended.png" alt="Google recommends that certificate pinning should be avoided due to the risk that it renders applications unusable"></p> <p>For similar reasons, it's not popular on the web. There was a short-lived HTTP standard to support this (<a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Public_Key_Pinning">HTTP Public Key Pinning</a>) but it's deprecated and support was removed from browsers, as it makes it far too easy to unexpectedly and irreparably (!) break applications for little security benefit.</p> <p>That said, it's still used on Android in some corners, particularly by very high-profile apps (e.g. Twitter) and very security-sensitive apps (e.g. banking apps, like N26 or BBVA), all of whom are extremely protective over the details of how their APIs are used, and would prefer that prying eyes can't look too closely.</p> <p>In practice, that means that if you want to know how the Twitter app uses the Twitter API, you're going to need to make it trust your HTTPS interception certificate.</p> <h2 id="enterfrida">Enter Frida</h2> <p><a href="https://frida.re/">Frida</a> is a cross-platform multi-purpose framework for dynamically transforming how applications work, from outside the application. 
Think <a href="https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/">Greasemonkey</a>, but for programs instead of web pages.</p> <p>Frida lets you do things like logging every time an app calls a specific method, changing constants within built applications, recording how values within an application change, or replacing methods to disable functionality entirely.</p> <p>You make these changes by writing small scripts in JavaScript, which use Frida's API to define transformations that will be applied to the target process.</p> <p>Frida supports Android, primarily using an on-device server that runs on rooted devices, and exposes an API via ADB so you can use Frida's CLI tools on your computer to transform apps on your phone on the fly.</p> <p>This is very neat! But it's also quite intimidating if you're not familiar with low-level reverse engineering, since the internals and much of the documentation are deeply involved with the finer details of how applications work on each of the various target platforms.</p> <p>Fortunately, it's not actually that complicated or that difficult. Let's walk through the whole process step by step:</p> <h2 id="howcanyouremovecertificatepinningwithfrida">How can you remove certificate pinning with Frida?</h2> <p>At a high level, you need to:</p> <ol> <li>Connect ADB to a rooted device or emulator</li> <li>Install and start Frida on the device/emulator</li> <li>Install Frida on your computer</li> <li>Tell Frida the app that you want to edit, and provide a script that knows how to remove the certificate pinning logic</li> </ol> <p>Let's walk through how to do that in practice:</p> <h3 id="connecttoadeviceviaadb">Connect to a device via ADB</h3> <p>ADB is the <a href="https://developer.android.com/studio/command-line/adb">Android Debug Bridge</a>, an official Android tool for remotely debugging and controlling Android devices.</p> <p>If you don't have ADB already, you'll need to install it.
To do so, you can either install <a href="https://developer.android.com/studio">Android Studio</a> and use the SDK manager UI there, or download the platform tools, including ADB, directly as a <a href="https://developer.android.com/studio/releases/platform-tools#downloads">standalone package</a>.</p> <p>The rest of this guide will assume you've got <code>adb</code> in your $PATH.</p> <p>You'll also need a target device with root access.</p> <p>If you have a rooted device available, or you want to set one up, then that's great. You just need to plug it into your computer and <a href="https://developer.android.com/studio/command-line/adb#Enabling">enable USB debugging</a> to allow debugging via ADB.</p> <p>If you don't have a rooted device, you can use an emulator instead. To set one up, use the AVD (Android Virtual Device) manager either from the Android Studio UI, or by running <code>avdmanager</code> in the <a href="https://developer.android.com/studio#cmdline-tools">standalone command-line tools</a>.</p> <p>Your emulator can use any Android version (although a recent version matching your machine's architecture is a good idea), but <strong>must use a non-'Google Play' build</strong>. Either vanilla or 'Google APIs' is fine, but Google Play builds include restrictions similar to physical devices that limit debug access.</p> <p>Once everything is set up, you should be able to run <code>adb devices</code> on the command line and see your device listed there.</p> <h3 id="installandstartfridaonthedevice">Install and start Frida on the device</h3> <p>First, download the Frida Android server <a href="https://github.com/frida/frida/releases">from GitHub</a>.</p> <p>You want a version like <code>frida-server-$VERSION-android-$ARCH.xz</code>, for the latest $VERSION, where $ARCH is the architecture of your device.
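</p> <p>If you're not sure of the architecture, you can ask the connected device directly:</p> <pre><code class="bash language-bash"># Prints the device's primary ABI, e.g. arm64-v8a or x86_64
adb shell getprop ro.product.cpu.abi
</code></pre> <p>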
For emulators, that's probably <code>x86_64</code>, for physical devices it's probably <code>arm64</code> (or maybe <code>arm</code>, for older devices).</p> <p>You'll need to extract that <code>.xz</code> file to get the binary within. This isn't a common compression format, so you might need <a href="https://www.7-zip.org/">7-Zip</a> (Windows) or <a href="https://theunarchiver.com/">The Unarchiver</a> (Mac) if you don't already have them.</p> <p>You then need to copy the binary onto the device, make it executable, and start the server as root. Like so:</p> <pre><code class="bash language-bash"># Copy the server to the device
adb push ./frida-server-$version-android-$arch /data/local/tmp/frida-server
#        ^Change this to match the name of the binary you just extracted

# Enable root access to the device
adb root

# Make the server binary executable
adb shell "chmod 755 /data/local/tmp/frida-server"

# Start the server on your device
adb shell "/data/local/tmp/frida-server &amp;"
</code></pre> <p>The last command will start Frida, and keep running silently. If it prints any output then something is probably wrong - most likely you've downloaded the server for the wrong architecture or you're not running these commands as root.</p> <h3 id="installfridaonyourcomputer">Install Frida on your computer</h3> <p>Ok, you've got a debuggable device connected with the Frida server running.</p> <p>To control it, you need to install the Frida CLI tools on your computer. You'll need <a href="https://www.python.org/downloads/">Python</a> installed for this, and then you just need to run:</p> <pre><code class="bash language-bash">pip install frida-tools
</code></pre> <p>You can test this by running <code>frida-ps -U</code>. This will connect to the server via USB (<code>-U</code>) and list the details of every running process on the target device. If this shows you a list of processes, you're all good!</p> <h3 id="disablingsslpinningwithfrida">Disabling SSL pinning with Frida</h3> <p>The last and most important step: we need to tell Frida to transform the target application, removing certificate pinning so we can see the traffic it's sending.</p> <p>To do so, we first need the package id of the target process. This will be something like <code>com.httptoolkit.android</code> (like a domain name backwards).</p> <p>You can find this listed somewhere in:</p> <ul> <li>The output of <code>frida-ps -U -a</code>, which lists every app that's currently running.</li> <li>The output of running <code>adb shell</code> then <code>pm list packages -f</code>, to see the full raw list of packages on the device.</li> <li>The app's Play Store URL (for example, HTTP Toolkit's Play Store page is <a href="https://play.google.com/store/apps/details?id=tech.httptoolkit.android.v1">play.google.com/store/apps/details?id=tech.httptoolkit.android.v1</a> and the package id is <code>tech.httptoolkit.android.v1</code>).</li> </ul> <p>If you want to test this, but you're not sure what to un-pin, I've published a demo certificate pinning app at <a href="https://github.com/httptoolkit/android-ssl-pinning-demo">httptoolkit/android-ssl-pinning-demo</a>. You can download a built APK of that app from its <a href="https://github.com/httptoolkit/android-ssl-pinning-demo/releases/">GitHub releases page</a> and install it with <code>adb install ./ssl-pinning-demo.apk</code>. The package id is <code>tech.httptoolkit.pinning_demo</code>.</p> <p>Once we have a target app, we need a script, which will rewrite the application. Frida scripts are simple JavaScript which can use Frida's API to define replacements for methods in the target application.
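</p> <p>For a flavour of what these scripts look like, here's a minimal hand-written sketch (an illustration only - real-world unpinning needs to handle many more libraries and edge cases) that no-ops OkHttp's <code>CertificatePinner.check()</code> method:</p> <pre><code class="javascript language-javascript">Java.perform(() => {
    // Look up OkHttp3's certificate pinner class inside the target app:
    const CertificatePinner = Java.use('okhttp3.CertificatePinner');

    // Replace its check() method with one that never throws, so pinned
    // connections are accepted regardless of the certificate presented:
    CertificatePinner.check.overload('java.lang.String', 'java.util.List')
        .implementation = function (hostname, certificates) {
            console.log('Skipping certificate pin check for ' + hostname);
        };
});
</code></pre> <p>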
By doing so, they can make a certificate-checking method do nothing, make a class ignore certificate pinning configuration, or almost anything else.</p> <p>Writing these scripts is quite complicated. There are many small individual scripts available, designed to remove pinning from specific target apps or certain HTTPS libraries, but not many that try to remove pinning for <em>all</em> HTTPS traffic.</p> <p>Fortunately, I've been working on a set of general-purpose Frida scripts to do exactly this, so you can just use those. These are available in the <a href="https://github.com/httptoolkit/frida-interception-and-unpinning">httptoolkit/frida-interception-and-unpinning GitHub repo</a>.</p> <p>These scripts actually fully implement <em>everything</em> required for HTTPS interception (including proxy configuration and system certificate installation) but you can also use them independently for unpinning by itself. The script for Android certificate unpinning specifically is in the <code>android</code> directory as <a href="https://github.com/httptoolkit/frida-interception-and-unpinning/blob/main/android/android-certificate-unpinning.js">android-certificate-unpinning.js</a>. You'll also need to include the configuration in <a href="https://github.com/httptoolkit/frida-interception-and-unpinning/blob/main/config.js">config.js</a>.</p> <p>This script draws approaches and tricks from a wide range of other public unpinning scripts; it's been tested against a huge variety of different targets already, and it covers the vast majority of cases you'll find (and contributions to extend it to cover any new libraries or techniques that aren't currently covered are very welcome!).
You can also go further, and use the <a href="https://github.com/httptoolkit/frida-interception-and-unpinning/blob/main/android/android-certificate-unpinning-fallback.js">android-certificate-unpinning-fallback.js</a> script, which includes experimental auto-patching for obfuscated and unusual approaches that can't be covered with static rules.</p> <p>To use this:</p> <ul> <li>Save <code>config.js</code>, <code>native-tls-hook.js</code>, <code>android-certificate-unpinning.js</code> and <code>android-certificate-unpinning-fallback.js</code> from <a href="https://github.com/httptoolkit/frida-interception-and-unpinning/">github.com/httptoolkit/frida-interception-and-unpinning/</a> on your computer.</li> <li>Modify <code>config.js</code>, and put the contents of your interception CA certificate into the <code>CERT_PEM</code> variable.</li> <li>Then run: <code>frida -U \ -l ./config.js \ -l ./native-tls-hook.js \ -l ./android/android-certificate-unpinning.js \ -l ./android/android-certificate-unpinning-fallback.js \ -f $TARGET_PACKAGE_NAME</code> <em>(Note that the backslashes here are just used to allow multi-line commands in bash)</em></li> </ul> <p>This will restart the app on your phone and immediately disable all certificate pinning, so that its traffic can be captured.</p> <p>See <a href="https://github.com/httptoolkit/frida-interception-and-unpinning/#readme">the README</a> in that GitHub repository for more details on how this works, and all the various scripts available.</p> <p>If you'd like to know more about what's detected and unpinned, you can set the <code>DEBUG_MODE</code> variable in <code>config.js</code>, and you'll see output showing every detected script and whether it was patched, along with logs each time a hooked method is used.</p> <h2 id="testingcertificateunpinning">Testing certificate unpinning</h2> <p>Ok, you've got a device, you've got Frida set up and you've got a script that can unpin HTTPS certificates.
It's time to test this out!</p> <p>Here are a few high-profile apps that use certificate pinning to protect their HTTPS traffic, which might be interesting to play with:</p> <ul> <li><a href="https://play.google.com/store/apps/details?id=com.twitter.android">Twitter</a></li> <li><a href="https://play.google.com/store/apps/details?id=com.paypal.android.p2pmobile">PayPal</a></li> <li><a href="https://play.google.com/store/apps/details?id=de.number26.android">N26's mobile banking app</a></li> <li><a href="https://play.google.com/store/apps/details?id=uk.nhs.covid19.production">The UK NHS's COVID tracing app</a></li> </ul> <p>Let's try taking a look at their HTTPS traffic. You'll need an HTTPS debugging proxy to test this - <a href="https://httptoolkit.com/android/">HTTP Toolkit</a> will work, or you can use another HTTPS-intercepting proxy like Burp or Charles if you'd prefer.</p> <p>Once you're intercepting the device, try opening any of the above apps and you'll see TLS connection errors in the debugging tool, and odd behaviour in the app. For example, when opening Twitter, HTTP Toolkit shows me this:</p> <p><img src="https://httptoolkit.com/images/posts/twitter-unintercepted.png" alt="HTTP Toolkit showing lots of certificate rejection messages"></p> <p>That means that an HTTP client (the Twitter app) is connecting and then rejecting the certificate immediately, without sending any requests. Each of these apps produces errors like this, for the specific hosts whose certificates have been pinned within the app.</p> <p>To defeat this and intercept Twitter's real API traffic, I just need to run:</p> <pre><code class="bash language-bash">frida -U \
    -l ./config.js \
    -l ./native-tls-hook.js \
    -l ./android/android-certificate-unpinning.js \
    -l ./android/android-certificate-unpinning-fallback.js \
    -f com.twitter.android
</code></pre> <p>That restarts Twitter on my phone, and I've immediately got traffic:</p> <p><img src="https://httptoolkit.com/images/posts/twitter-intercepted.png" alt="HTTP Toolkit showing lots of Twitter API requests"></p> <p>Jackpot! We've removed the pinning, so that the Twitter app now trusts our MitM HTTPS proxy, and we can intercept and inspect its traffic.</p> <p>From here, you can explore the content of each of those requests, or add rules to rewrite, mock or block them entirely.</p> <h2 id="caveats">Caveats</h2> <p>In theory, Frida is capable of defeating absolutely any certificate pinning you could possibly implement: if you can write some code to check a certificate, Frida can remove that code.</p> <p>That said, this all depends on whether the script you use is aware of the specific certificate pinning code or APIs that are used: whether this technique works depends entirely on the combination of target app and Frida script.</p> <p>The above script does remove certificate pinning from every built-in API or widely used library I'm aware of, and I've tested it successfully against the apps listed here and a long list of others. It's a good general-purpose script for most cases, but it won't work in absolutely 100% of certificate-pinned apps today. If you do find cases that aren't handled, I'm very interested in examples and contributions to cover more cases to help strip out as many certificate pinning implementations as possible, so do please <a href="https://github.com/httptoolkit/frida-interception-and-unpinning/issues/new/choose">file an issue</a>!</p> <p>Notably, some apps go above and beyond, implementing their own custom certificate pinning techniques from scratch to make disabling them as difficult as possible.
The prime example of this is the various Facebook apps, which all use their own <a href="https://github.com/facebookincubator/fizz">custom reimplementation of TLS</a> rather than the standard platform APIs.</p> <p>It's definitely possible to automatically remove certificate pinning features from that too within the same Frida script in theory (contributions <em>very</em> welcome!), but it's significantly more difficult than mocking out a well-known common library, so I haven't done that yet, and so this script won't work for Facebook, Facebook Messenger, Instagram, or similar.</p> <p>Fortunately that doesn't matter though, because Facebook offer a <a href="https://www.facebook.com/notes/977148692766914/">whitehat option</a> in their apps to allow security researchers to disable certificate pinning directly, and you can just use that instead.</p> <h2 id="whatnext">What next?</h2> <p>Hopefully you've now got Frida working, and you can see, debug &amp; rewrite secret API traffic from every app you're interested in.</p> <p>The next step is to start exploring further, to examine the APIs used and data leaked by other popular apps, and to help find and fix cases where this Frida script doesn't yet work, so we can stub out every last pinning API. Get testing!</p> <p>Have any questions, or run into problems? Feel free to open an issue <a href="https://github.com/httptoolkit/frida-interception-and-unpinning/issues/new/choose">on GitHub</a> or get in touch <a href="https://toot.cafe/@pimterry">on Mastodon</a> or <a href="https://twitter.com/pimterry">on Twitter</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/frida-certificate-pinning/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/frida-certificate-pinning/</guid>
            <pubDate>Tue, 06 Jul 2021 13:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Encoding your HTTP for fun and profit]]></title>
            <description><![CDATA[<p>HTTP content encoding is an incredibly powerful tool that can save you huge amounts of bandwidth and make your web or mobile application faster, basically for free.</p> <p>Unfortunately, it's poorly understood by most developers. There's a lot of power here, but few people are aware of the options or what "content encoding" really means, so it's mostly left to be handled automatically (for better or worse) by your web server.</p> <p>In many cases that means no encoding at all. In some helpful cases (typically CDNs or static site PaaS hosts) a useful basic default will be provided, but those defaults are rarely the best choice for every situation.</p> <p>With just a tiny sprinkle of knowledge, you can enable this and speed up your web application, your API, and all your HTTP requests &amp; responses in no time at all.</p> <h2 id="whatiscontentencoding">What is content encoding?</h2> <p>Content encoding is the wrapper around the meaningful body of an HTTP request or response.</p> <p>It is not the type of the content - that's something else entirely. This is a common mistake!</p> <p>For example, you might have a response that contains JSON data, encoded with gzip. In that case, you'd use HTTP headers like:</p> <pre><code>Content-Encoding: gzip
Content-Type: application/json
</code></pre> <p>This tells the HTTP client that it needs to unwrap the content using gzip, and then it's going to find some JSON inside. That's the best way to think of it: if you receive a request or response with a <code>content-encoding</code> header, then you should undo that content encoding (e.g. un-gzip the body) and then you'll find content that matches the <code>content-type</code> header.</p> <p>The main use for this, by a very long way, is compression: there's a variety of different compression algorithms you can use to shrink your request and response bodies dramatically. You could also use it to describe a layer of encryption around the content though (for unusual environments where HTTPS isn't sufficient/possible) or to send content encoded in a format that's more easily compatible with other infrastructure (encoding it as base64 rather than raw binary data, for example).</p> <p>It's important not to confuse this with <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding">transfer-encoding</a>, which is used to describe encodings at a hop-by-hop level (e.g. between a client and a proxy, not the final server). You can mostly ignore this. It's rarely used as far as I can tell, except for 'chunked' encoding, which is very widely used to send data without specifying the length in advance. It's effectively independent of compression or other content transformation, and you can forget about it unless you're streaming data back and forth in HTTP.</p> <p>Lastly, for bonus fun, in some places you'll see a <a href="https://stackoverflow.com/a/7289434/68051">content-transfer-encoding</a> header. Some people try to use this in HTTP, but it was a MIME header for email, made obsolete in 1996, designed to describe how content should be encoded in an email-safe format. It's not relevant to HTTP, unless maybe you're delivering HTTP requests and responses via email?
Avoid.</p> <h2 id="whyshouldiencodemycontent">Why should I encode my content?</h2> <p>Content encoding is mostly used to compress request and response data. If you do this, you can shrink all your requests and responses enormously. In practice, you're often looking at reductions on the order of 70% for many typical responses, and up to 95% for very compressible data like JSON API responses. These are huge bandwidth savings!</p> <p><strong>Smaller requests mean faster data transfer, lower bandwidth costs for your servers, and lower data costs for clients with limited data plans.</strong></p> <p>In most cases, compressing your HTTP requests &amp; responses is an easy win for everybody. It happens automatically in many CDNs but elsewhere it's largely forgotten by many developers, unnecessarily increasing the amount of data they transfer by a huge amount.</p> <p>As an example, OYO Rooms <a href="https://tech.oyorooms.com/how-brotli-compression-gave-us-37-latency-improvement-14d41e50fee4">reduced their latency by 37%</a> and Google Play <a href="https://students.googleblog.com/2017/02/intern-impact-brotli-compression-for.html">reduced bandwidth usage by 1.5 petabytes per day</a> (!) by changing their content encoding configurations.</p> <p>That said, there are some good reasons not to compress an HTTP message body:</p> <ul> <li>If the body is very very small it doesn't help much, and below about 150 bytes you risk making the content larger than it was before (although only ever so slightly).</li> <li>If the content is a format that itself already includes compression, then more compression doesn't usually help. This applies to many image and video formats and PDF files.</li> <li>If bandwidth is much cheaper than processing power, e.g. in some IoT environments with very dumb hardware, decompression time can be larger than transfer time.</li> <li>If you don't know whether it's supported. 
All modern browsers and HTTP clients will support compressed content in some form though, and you can detect this automatically using the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding">Accept-Encoding</a> header in incoming requests.</li> </ul> <p>If you're building a mobile or web app, these conditions don't apply to most of your content, especially for HTML, JavaScript, CSS, your API requests &amp; responses, and any other human-readable or structured data. You probably want to encode basically everything except images &amp; video (in most cases - e.g. SVGs usually compress great).</p> <p>How would you like to make all your HTTP requests and responses 70% smaller?</p> <h2 id="whathttpcontentencodingscaniuse">What HTTP content encodings can I use?</h2> <p>The first step in using content encoding is to decide which encoding to use.</p> <p>The encoding of choice depends on the context, especially the client and server involved, since both have to support the encoding. Helpfully, clients advertise the encodings they support with every request in an <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding">Accept-Encoding</a> header, so you can use that to detect this automatically.</p> <p>The official registry of encodings is <a href="https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding">here</a>. Most encodings there are pretty rarely used or outdated though, so here are the common and interesting content-encoding values you might care about:</p> <ul> <li><code>identity</code> - no encoding at all: the body is just the raw content. 
Valid, but you might as well just omit the header entirely.</li> <li><code>gzip</code> - the standard: the content is compressed using <a href="https://en.wikipedia.org/wiki/Gzip">gzip</a>, widely used &amp; supported almost everywhere.</li> <li><code>br</code> - the new kid: the content is compressed using <a href="https://en.wikipedia.org/wiki/Brotli">Brotli</a>, a new (well, 2013) compression algorithm originally used in WOFF font files. Supported in <a href="https://caniuse.com/brotli">most browsers</a> for a couple of years now, and <a href="https://tech.oyorooms.com/how-brotli-compression-gave-us-37-latency-improvement-14d41e50fee4">significantly more powerful</a> than gzip.</li> <li><code>zstd</code> - the very very new kid: the content is compressed using <a href="https://en.wikipedia.org/wiki/Zstandard">Zstandard</a>, a compression algorithm from Facebook designed to compress better than gzip, but also especially focused on allowing much much faster decompression (at least 3.5x faster than Brotli or Gzip, according to their benchmarks). Only standardized in 2018 and not widely supported at all, yet…</li> </ul> <p><strong>In most cases, you should use Brotli if you can, and gzip when you can't.</strong> Zstd looks very promising, but it's still too early and unsupported for most use cases.</p> <p>You can also combine these if you want (although it's unusual). For example, a content-encoding header like <code>content-encoding: gzip, br</code> means that the content was gzipped, and then brotlied. This is a weird thing to do, and it's usually a bad idea, but it's an option.</p> <p>You're not strictly limited to the official list though. It's encouraged, and it's useful to stick with those if you want easy support from existing tools and clients, but if you have your own specific content wrapper format then you can use that instead. 
Try to use a unique name that's not going to conflict in future, and remember that this is a content <em>wrapper</em> format, it's not the format of your content itself. And if you use a content-encoding that you think might be useful for others, do consider registering it officially!</p> <p>All of this sounds simple enough once you get the idea, but lots of people did not read this helpful article. You'll see quite a few totally wrong content-encoding values used in the wild, for example:</p> <ul> <li><code>amz-1.0</code> - only used in some <a href="https://docs.amazonaws.cn/en_us/AmazonCloudWatch/latest/APIReference/making-api-requests.html">AWS APIs</a>. There's no wrapper here, it's just plain JSON, so why is this required?</li> <li><code>json</code> - JSON is a content type! It's not a wrapper that you open to get to the real content (how would you 'decode' JSON to get the real data within, without any more information?) A JSON body is content itself.</li> <li><code>utf-8</code> - This is a charset, not a content wrapper. If you want to specify it, put it in your content-type, like <code>content-type: application/json; charset=utf-8</code>.</li> <li><code>text</code> - What? How would you unwrap text to get at the real content?</li> <li><code>binary</code> - Sigh.</li> </ul> <p>All of these are content-encoding values I've seen in real world traffic. All of them are wrong. Don't be that person.</p> <h2 id="wheresmyprofit">Where's my profit?</h2> <p>OK, that makes sense, but you want some useful info that's going to make your life better. How should you use this?</p> <h3 id="supportingcontentencodingontheserver">Supporting content-encoding on the server</h3> <p>First, check if your site or API already uses an encoding for responses. 
The easiest way is to make a request that offers every encoding, and look for a <code>content-encoding</code> header in the response to see what's supported.</p> <p>For example, with cURL:</p> <pre><code>curl -I -H "Accept-Encoding: gzip, br, zstd" https://example.com | grep -i content-encoding
</code></pre> <p>(Some poorly behaved servers might not handle multiple values, in which case you can test with a separate request for each one).</p> <p>If you already support Brotli, that's great! For modern browsers and servers that's generally the right choice. You might want to look into Zstd, if you control both the server &amp; client (e.g. for mobile apps), but that will require more work since most tools don't support it automatically.</p> <p>If you're using gzip, you should probably investigate Brotli. There are easily available libraries for every server under the sun, so you can normally drop it in and go for substantial improvements with clients that support it (e.g. every modern web browser). This will depend on your data and use case though, so make sure you test the differences here.</p> <p>If you're not using any compression at all for some of your responses, you should be able to get huge improvements with minimal effort by enabling at least gzip on your server. This is often a standard option or module you just need to enable in your server.</p> <p>On top of that, you may want to support compressed requests too, not just responses. If you're accepting large compressible uploads (&gt;1KB, not just images/video) then compression can make a big difference. That's rarely supported automatically so you'll need to check how this works in your framework and/or server of choice, and potentially write a small middleware wrapper to check the content-encoding of incoming requests and handle this.</p> <p>Lastly, if you're caching these responses, don't forget to include <code>Accept-Encoding</code> in your <code>Vary</code> header. 
This ensures that you won't cache encoded content in one encoding and return it to clients asking for a different encoding.</p> <h3 id="supportingcontentencodinginclients">Supporting content-encoding in clients</h3> <h4 id="inthebrowser">In the browser:</h4> <p>For response content, there's nothing required - browsers will automatically set <code>Accept-Encoding</code> and decode content for you in every format they support. You could manually decode unusual encodings, like zstd, but it's rarely worthwhile.</p> <p>If you want to compress large request bodies (uploads), you'll need to do so manually in JavaScript. You can use <a href="https://www.npmjs.com/package/pako">pako</a> to do this with gzip in the browser. Brotli is <a href="https://www.npmjs.com/package/brotli-wasm">possible</a> too but it's complicated and the compression engine is too large to include in most web applications, so it's rarely a good idea unless you're doing a lot of large (&gt;10MB) uploads.</p> <h4 id="onmobile">On mobile:</h4> <p>First, you'll need to make sure you send an <code>Accept-Encoding</code> header, and handle response decompression if the response has a <code>Content-Encoding</code> response header.</p> <p>Most libraries have built-in support for at least gzip, e.g. OkHttp will handle gzip completely automatically and can support Brotli with <a href="https://github.com/square/okhttp/blob/master/okhttp-brotli/README.md">a one-line change</a>, while NSURLSession will handle gzip responses completely automatically, and apparently Brotli too on iOS 11+.</p> <p>That only tends to apply to responses. For requests you'll often need to enable this yourself, e.g. 
by <a href="https://github.com/square/okhttp/issues/350#issuecomment-123105641">registering a custom interceptor</a> with OkHttp.</p> <h2 id="whatnext">What next?</h2> <p>OK, hopefully you're compressing all your non-trivial requests &amp; responses now, for a huge bandwidth boost!</p> <p>If you're interested in debugging this more closely, check out <strong><a href="https://httptoolkit.com">HTTP Toolkit</a></strong> to easily inspect and debug all your HTTP. It's fully open-source, and the free version will show you everything you need here, while Pro even has a performance section that compares the compression ratios for each encoding on each of your responses so you can see the potential benefits directly inline.</p> <p>Do keep an eye out for more developments in content encodings in future - it's likely that Zstandard will mature in the coming years, and that even more powerful and performant encodings will come out down the line.</p> <p>Have questions, or interesting content-encoding facts you'd like to share? Get in touch <a href="https://twitter.com/pimterry">on Twitter</a>.</p> <p><em>Do you work with HTTP all day? <strong><a href="https://httptoolkit.com/">Download HTTP Toolkit now</a></strong> to inspect &amp; mock HTTP from browsers, servers, apps and anywhere else in one click.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/http-encodings/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/http-encodings/</guid>
            <pubDate>Thu, 10 Jun 2021 14:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Build an HTTPS-intercepting JavaScript proxy in 30 seconds flat]]></title>
            <description><![CDATA[<p>HTTP(S) is the glue that binds together modern architectures, passing requests between microservices and connecting web &amp; mobile apps alike to the APIs they depend on.</p> <p>What if you could embed scripts directly into that glue?</p> <p>By doing so, you could:</p> <ul> <li>Inject errors, timeouts and unusual responses to test system reliability.</li> <li>Record &amp; report traffic from all clients for later analysis.</li> <li>Redirect requests to replace production servers with local test servers.</li> <li>Automatically validate and debug HTTP interactions across an entire system.</li> </ul> <p>It turns out setting this up is super quick &amp; easy to do. Using easily available JS libraries and scripts, you can start injecting code into HTTP interactions in no time at all. Let's see how it works.</p> <h2 id="puttingthebasicstogether">Putting the basics together</h2> <p><a href="https://www.npmjs.com/package/mockttp">Mockttp</a> is the open-source HTTP library that powers all the internals of <a href="https://httptoolkit.com/">HTTP Toolkit</a>, built in TypeScript. It can act as an HTTP(S) server or proxy, to intercept and mock traffic, transform responses, inject errors, or fire events for all the traffic it receives.</p> <p>First though, if you want to inspect &amp; edit HTTP manually with a full UI and tools on top, it's better to <strong><a href="https://httptoolkit.com">download HTTP Toolkit</a></strong> for free right now instead, and start there!</p> <p>On the other hand, if you do want to build scripts and automations that capture &amp; rewrite HTTPS, or if you've used HTTP Toolkit and now you want to create complex custom behaviour on top of its built-in rules, then Mockttp is perfect, and you're in the right place.</p> <p>Getting started with Mockttp is easy: install it, define a server, and start it. 
That looks like this:</p> <ul> <li><p>Create a new directory</p></li> <li><p>Run <code>npm install mockttp</code></p></li> <li><p>Create an <code>index.js</code> script:</p> <pre><code class="javascript language-javascript">(async () =&gt; {
    const mockttp = require('mockttp');

    // Create a proxy server with a self-signed HTTPS CA certificate:
    const https = await mockttp.generateCACertificate();
    const server = mockttp.getLocal({ https });

    // Inject 'Hello world' responses for all requests
    server.forAnyRequest().thenReply(200, "Hello world");
    await server.start();

    // Print out the server details:
    const caFingerprint = mockttp.generateSPKIFingerprint(https.cert);
    console.log(`Server running on port ${server.port}`);
    console.log(`CA cert fingerprint ${caFingerprint}`);
})(); // (Run in an async wrapper so we can use top-level await everywhere)
</code></pre></li> <li><p>Start the proxy by running <code>node index.js</code></p></li> </ul> <p>And you're done!</p> <p>To make this even easier I've bundled up a ready-to-use repo for this, along with easy Chrome setup to test it, <a href="https://github.com/httptoolkit/mockttp-proxy-demo/">on GitHub</a>.</p> <p>This creates an HTTPS-intercepting <a href="https://en.wikipedia.org/wiki/Man-in-the-middle_attack">MitM</a> proxy. All requests sent to this server directly or sent through this server as a proxy will receive an immediate 200 "Hello world" response.</p> <p>From the client's point of view (once configured) it will appear that this fake response has come directly from the real target URL (e.g. <code>https://google.com/</code>) even though it's clearly being injected by our script.</p> <p>When started, this script prints the port it's running on and the fingerprint of the CA certificate used, which you can use to quickly and temporarily trust that certificate in some clients, e.g. all Chromium browsers.</p> <p>To test your proxy right now, connect a browser to it (assuming you have Chrome installed) by running:</p> <pre><code class="bash language-bash">google-chrome --proxy-server=localhost:$PORT --ignore-certificate-errors-spki-list=$CERT_FINGERPRINT --user-data-dir=$ANY_PATH
</code></pre> <p>You'll need to replace the $variables appropriately ($ANY_PATH will be used to store the profile data for a new temporary Chrome profile that will trust this CA certificate) and you may need to find the full path to the browser binary on your machine, if it's not in your $PATH itself.</p> <p>If you don't like Chrome, the exact same arguments will work for any other Chromium-based browser, e.g. Edge or Brave, and we'll look at how to intercept all sorts of other clients too in just a minute.</p> <p>If you run this, and visit any URL in the browser that opens, you should immediately see your "Hello world" response being returned from all requests to any URL, complete with the nice padlock that confirms that this message definitely <em>definitely</em> came from the real website:</p> <p><img src="google-replaced-with-hello-world.png" alt="Chrome, showing a HTTPS google URL sending a Hello World response"></p> <p>Neat!</p> <p>With this, we can now invisibly rewrite real HTTPS traffic. Let's make that traffic do something more exciting.</p> <h2 id="rewritinghttpsdynamically">Rewriting HTTPS dynamically</h2> <p>Mockttp lets you define rules, which match certain requests, and then perform certain actions.</p> <p>Above, we've created a script that matches all requests, and always returns a fixed response. But there's a lot of other things we could do, for example:</p> <pre><code class="javascript language-javascript">// Proxy all example.com traffic through as normal, untouched:
server.forAnyRequest()
    .forHostname("example.com")
    .thenPassThrough();

// Make all GET requests to google.com time out:
server.forGet("google.com").thenTimeout();

// Redirect any github requests to wikipedia.org:
server.forAnyRequest()
    .forHostname("github.com")
    .thenForwardTo("https://www.wikipedia.org");

// Intercept /api?userId=123 on any host, serve the response from a file:
server.forGet("/api")
    .withQuery({ userId: 123 })
    .thenFromFile(200, "/path/to/a/file");

// Forcibly close any connection if a POST request is sent:
server.forPost().thenCloseConnection();
</code></pre> <p>Rules like these give you the power to rewrite traffic any way you like: pass it through untouched like normal, replace responses, redirect traffic, you name it.</p> <p>Replace the "hello world" line in the previous example with some of these rules, restart your server, and then try browsing the web again. Example.com will now work fine, but Google will be completely inaccessible, all POST requests will fail, and GitHub.com will be inexplicably replaced with the content of Wikipedia.org:</p> <p><img src="https://httptoolkit.com/images/posts/github-as-wikipedia.png" alt="GitHub.com in the URL bar, Wikipedia on the page"></p> <p>If you'd like to use more rules like this, the <a href="https://httptoolkit.github.io/mockttp/">detailed API docs</a> provide more specific information on each of the methods available and how they work.</p> <p>By default, each rule runs for just one matching request; once every matching rule has run, the last matching rule repeats indefinitely. You can control this more precisely by adding <code>.always()</code>, <code>.once()</code>, <code>.times(n)</code>, etc, as part of the rule definition. If you're defining overlapping rules, you probably want to use <code>.always()</code> every time.</p> <h2 id="advancedcustomrewritelogic">Advanced custom rewrite logic</h2> <p>There are some more advanced types of rule we can add to our script: we can define our own custom request or response transformation logic.</p> <p>Using this, it's possible to run arbitrary code that can send a response directly, intercept a request as it's sent upstream, or intercept a response that's received from a real server. You can examine all real request &amp; response content in your code, and then complete that request or response with your own changes included.</p> <p>That looks like this:</p> <pre><code class="javascript language-javascript">// Replace targets entirely with custom logic:
let counter = 0;
server.forAnyRequest().forHostname("google.com").thenCallback((request) =&gt; {
    // This code will run for all requests to Google.com
    return {
        status: 200,
        // Return a JSON response with an incrementing counter:
        json: { counterValue: counter++ }
    };
});

// Or wrap targets, transforming real requests &amp; responses:
server.forAnyRequest().forHostname("example.com").thenPassThrough({
    beforeResponse: (response) =&gt; {
        // Here you can access the real response:
        console.log(`Got ${response.statusCode} response with body: ${response.body.text}`);

        // Values returned here replace parts of the response:
        if (response.headers['content-type']?.startsWith('text/html')) {
            // E.g. append to all HTML response bodies:
            return {
                headers: { 'content-type': 'text/html' },
                body: response.body.text + " appended"
            };
        } else {
            return {};
        }
    }
});
</code></pre> <p>The first rule will handle all Google.com requests by itself. The second rule will forward requests upstream, get a response, and then run the custom logic before returning the appended response back to the client:</p> <p><img src="https://httptoolkit.com/images/posts/example.com-with-text-appended.png" alt="Example.com with extra text appended"></p> <p>You can similarly use <code>beforeRequest</code> to change the content of outgoing requests. Check <a href="https://httptoolkit.github.io/mockttp/interfaces/requestHandlers.PassThroughHandlerOptions.html">the docs</a> for a full list of the options and return values available.</p> <h2 id="connectingmoreclients">Connecting more clients</h2> <p>So far we've created a proxy that can automatically rewrite specific traffic from a Chromium-based browser. That's great, but a bit limited. How do you connect more clients?</p> <p>There are generally two steps required:</p> <ol> <li>Configure the client to use your Mockttp proxy as its HTTP(S) proxy</li> <li>Configure the client to trust your HTTPS CA certificate</li> </ol> <h3 id="configuringyourclienttouseyourproxy">Configuring your client to use your proxy</h3> <p>Configuring the proxy settings will depend on the specific HTTP client you're using, but is normally fairly simple and well documented.</p> <p>You can often get away with just setting <code>HTTP_PROXY</code> and <code>HTTPS_PROXY</code> environment variables to <code>http://127.0.0.1:$YOUR_PROXY_PORT</code>, as that's a common convention, but that won't work everywhere. 
Alternatively, in many cases you can change your system-wide proxy settings to use this proxy, but be aware that this will intercept <em>all</em> traffic on your machine, not just the target application.</p> <p>If you want to intercept a Node.js application specifically, there is no global configuration option, but you can use the <a href="https://www.npmjs.com/package/global-agent">global-agent</a> npm module with a <code>GLOBAL_AGENT_HTTP_PROXY</code> environment variable to do this like so:</p> <pre><code class="bash language-bash">npm install global-agent

export GLOBAL_AGENT_HTTP_PROXY=http://127.0.0.1:$YOUR_PROXY_PORT
node -r 'global-agent/bootstrap' your-target-app.js
</code></pre> <p>For other cases, you'll need to look into the docs for the HTTP client in question.</p> <h3 id="configuringyourclienttotrustyourcacertificate">Configuring your client to trust your CA certificate</h3> <p>This is the step that ensures the client trusts your proxy to rewrite HTTPS.</p> <p>It's normally easiest to save your CA certificate as files on disk, so you can easily load them directly into other software.</p> <p>You can do that in JS by saving the <code>key</code> and <code>cert</code> properties of the CA certificate to files, like so:</p> <pre><code class="javascript language-javascript">const mockttp = require('mockttp');
const fs = require('fs');
const { key, cert } = await mockttp.generateCACertificate();
fs.writeFileSync("key.pem", key);
fs.writeFileSync("cert.pem", cert);
</code></pre> <p>This creates <code>key.pem</code> (your certificate private key) and <code>cert.pem</code> (your public CA certificate) files on disk, so you can use the same key &amp; certificate every time, and so you can import the CA certificate into your HTTPS clients.</p> <p>You can reuse these saved certificate details, instead of creating a certificate from scratch every time, by changing your server setup to look like this:</p> <pre><code class="javascript language-javascript">const server = mockttp.getLocal({
    https: {
        keyPath: './key.pem',
        certPath: './cert.pem'
    }
});
</code></pre> <p>These certificate files can be imported into most tools either via UIs (e.g. in Firefox's certificate settings) or via environment variables (e.g. <code>SSL_CERT_FILE=/path/to/cert.pem</code>).</p> <p>If you want to intercept a Node.js process, there's a custom <code>NODE_EXTRA_CA_CERTS</code> variable you can use to do this.</p> <p>As a full example, combining that with the proxy settings above, that looks like this:</p> <pre><code class="bash language-bash">export NODE_EXTRA_CA_CERTS=/path/to/cert.pem # Trust the cert
export GLOBAL_AGENT_HTTP_PROXY=http://127.0.0.1:$YOUR_PROXY_PORT # Use the proxy

# Start your target app, fully intercepted:
node -r 'global-agent/bootstrap' your-target-app.js
</code></pre> <p>If you're having trouble with either of these steps, you may be interested in <a href="https://github.com/httptoolkit/httptoolkit-server/tree/master/src/interceptors">the source behind the HTTP Toolkit Server</a>, which automatically sets up a wide variety of clients for use with HTTP Toolkit in general, from Android to Electron apps to JVM processes.</p> <h2 id="goingfurther">Going further</h2> <p>To wrap up then, what can you do with this? Here are some ideas:</p> <ul> <li>Create a proxy that completely blocks various hostnames or file types. No more ad networks, no more PDFs, no JS bigger than 100KB, whatever you like.</li> <li>Proxy traffic during testing to replace some of your internal services or external dependencies with simple mocked versions, with no code changes required in the system under test.</li> <li>Capture and log all traffic sent through your proxy matching certain patterns.</li> <li>Randomly add delays or timeouts to test the reliability of your clients in unstable environments.</li> <li>Combine this with <strong><a href="https://httptoolkit.com">HTTP Toolkit</a></strong> by redirecting some of its traffic on to your local proxy, to combine a full debugging UI with any custom logic you please, like so: <img src="https://httptoolkit.com/images/posts/http-toolkit-to-mockttp-rule.png" alt="An HTTP Toolkit rule forwarding traffic to a local proxy"></li> </ul> <p>Play around with <a href="https://github.com/httptoolkit/mockttp-proxy-demo/">the example repo</a>, and feel free to get in touch <a href="https://twitter.com/pimterry">on Twitter</a> if you build anything cool or if you have any questions.</p>]]></description>
            <link>https://httptoolkit.com/blog/javascript-mitm-proxy-mockttp/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/javascript-mitm-proxy-mockttp/</guid>
            <pubDate>Tue, 27 Apr 2021 14:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[One Port to Rule Them All]]></title>
            <description><![CDATA[<p>Traditionally, a TCP port has a single server listening for incoming connections, and that server expects you to send messages in the right protocol for that port. For HTTP, it's normally a web server that'll send you a response directly, or some kind of proxy that will pass all requests through to another server, and then pass the responses back.</p> <p>This is boring.</p> <p>What if you could accept everything, from proxied HTTPS to plain-text HTTP/1.0, all on a single port?</p> <p><strong><a href="https://httptoolkit.com/">HTTP Toolkit</a></strong> acts as an HTTP(S) proxy for debugging and interception. With all the possible combinations of clients and configurations, tools like this can be complicated to set up, and getting everything working and properly intercepted is a common pain point.</p> <p>To make setup as easy as possible, HTTP Toolkit uses a single incoming port for absolutely <em>everything</em>, for every widespread HTTP format, for both HTTP &amp; HTTPS, for both direct and proxied requests.</p> <p>Specifically, on one single port it accepts:</p> <ul> <li>Plain HTTP/1.* (1.1, or 1.0 if you just can't quit the 90s)</li> <li>HTTP/1.* over TLS (HTTPS)</li> <li>Plain-text HTTP/2 with prior knowledge</li> <li>Plain-text HTTP/2, upgraded on the first request by an Upgrade header</li> <li>HTTP/2 over TLS (HTTPS) negotiated via ALPN</li> </ul> <p>These can then all be combined to suit your tastes with a selection of ways to make your actual HTTP request:</p> <ul> <li>Make a direct request to HTTP Toolkit's URL as if it were a server, and mock a response for that in the app (<code>GET /</code>).</li> <li>Proxy through HTTP Toolkit explicitly in plain text (<code>GET http://example.com/</code>).</li> <li>Redirect unsuspecting traffic that's not aware of the proxy to HTTP Toolkit, to transparently proxy traffic elsewhere:</li> </ul> <pre><code>  GET /
  Host: example.com
</code></pre> <ul> <li>Tunnel traffic by connecting with HTTP/1.1, sending <code>CONNECT example.com:443</code> to make the connection into a tunnel to another server, and then doing any of the above within that tunnel.</li> <li>Tunnel traffic within a single HTTP/2 stream, by sending a CONNECT request to convert that one stream into a tunnel, and then doing any of the above again within that tunnel.</li> </ul> <p><strong>No matter what you send, or what tunnels you create, at every step you're only ever talking to HTTP Toolkit</strong>.</p> <p>All tunnels and proxying are just connections that get unwrapped, intercepted, and handled again, looping back through HTTP Toolkit until you make a real request, at which point your configured rules are applied (which might then proxy traffic upstream, redirect it, return a fixed response, reset the connection, or anything else).</p> <p>All the above can be combined together on a single connection, and then combined in different ways in the following tunnel. You can connect to HTTP Toolkit with TLS, use HTTP/1.1 to open a CONNECT tunnel to a remote server through that, send the remote server a plain text HTTP/1.0 request asking to upgrade, then make your real request with HTTP/2, and you're still just talking to HTTP Toolkit.</p> <p>This allows HTTP Toolkit to transparently intercept traffic from every possible client configuration, all in one place.</p> <p>It might sound confusing right now, but it's certainly not boring. How does it work?</p> <h2 id="underthehood">Under the hood</h2> <p>There are a few steps involved in making this work smoothly, powered by two key tricks: connection packet sniffing, and the magical stream &amp; server APIs of Node.js.</p> <h3 id="1sniffthedata">1. 
Sniff the data</h3> <p>When a connection is received, we look directly at the first byte on the stream and:</p> <ul> <li>If the first byte is 0x16, it's a TLS connection (this indicates a TLS handshake)</li> <li>If the first byte is 0x50 ('P'), it's probably the start of the HTTP/2 preamble (which looks like <code>PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n</code>, sent before the raw binary data begins on all HTTP/2 connections). We wait for the full preamble, just to be sure, then treat this as plain HTTP/2.</li> <li>Otherwise, it's probably plain-text HTTP/1 (or some completely unknown protocol)</li> </ul> <p>Implementing this in practice looks something like this:</p> <pre><code class="javascript language-javascript">const firstByte = socket.read(1);

if (firstByte[0] === 0x16) { // read(1) returns a Buffer, so compare the byte within it
    // Do something with this TLS connection
} else if (firstByte[0] === "P".charCodeAt(0) &amp;&amp; isHttp2Preamble(socket)) {
    // Do something with this HTTP/2 connection
} else {
    // Do something with this HTTP request
}
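
// isHttp2Preamble isn't a built-in - a hypothetical sketch might try to
// read the rest of the fixed preamble after the 'P', compare it, and then
// push whatever was read back onto the stream:
const HTTP2_PREAMBLE_REST = Buffer.from('RI * HTTP/2.0\r\n\r\nSM\r\n\r\n');
function isHttp2Preamble(socket) {
    const rest = socket.read(HTTP2_PREAMBLE_REST.length);
    if (rest === null) return false; // Not enough data buffered yet
    socket.unshift(rest); // Put the data back, whatever it was
    return rest.equals(HTTP2_PREAMBLE_REST);
}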
</code></pre> <p>This tells us what the <em>first</em> protocol on this connection is going to be, but we still need to fill in those blanks.</p> <p><em>(Credit where it's due: the original concept here came from <a href="https://twitter.com/mscdexdotexe">@mscdex</a>, who built the <a href="https://www.npmjs.com/package/httpolyglot">original HTTP/1-only implementation</a> that all this logic is based on)</em></p> <h3 id="2connecttherightserver">2. Connect the right server</h3> <p>Ok, so we know what protocol is coming on our connection, now we need to actually handle the sniffed protocol.</p> <p>To do this normally you'd create a server for the protocol, tell it to listen on a given port, and then expect it to handle traffic to that port and give you usable streams or requests or whatever that expose the meaningful data from within the protocol.</p> <p>For example, you can start a TLS server to listen on a port, and it'll handle TLS for you and expose streams to which you can read and write your application data. Each stream write will be encrypted and sent on the TLS connection, and each incoming TLS packet will be transparently decrypted, with the stream exposing the useful data within.</p> <p>Similarly, you can start an HTTP/2 server listening on a port, and once somebody connects and sets up the HTTP/2 connection it'll fire an event for each request, so that you can handle the request and send your response.</p> <p>This is normally great, but it's not going to work for us: in every standard use of these servers, they completely control the port and the stack of protocols required internally to give you the behaviour you want.</p> <p>Helpfully though, there is a little-used alternative API that can do this. 
Instead of asking the server to listen on a port, you can directly pass it a readable &amp; writable stream (pretending it's an incoming raw network connection) and it'll run its own protocol on top of that, just as if it were a real socket.</p> <p>The API to do this is simple: <code>server.emit('connection', myStream)</code>. When you do that, the server runs all the same logic as if a new network socket had arrived, but it uses that stream as the transport.</p> <p>Adding that into the mix, we can implement logic to sniff and then handle incoming connections like so:</p> <pre><code class="javascript{15,17,19} language-javascript{15,17,19}">// Create a real server that'll listen on a real port:
const rawServer = new net.Server();

// Create various sub-servers, which will handle the actual
// protocols, once we work out which one is relevant:
const httpServer = http.createServer();
const http2Server = http2.createServer();
const tlsServer = tls.createServer(tlsConfig);

// Sniff and then delegate incoming sockets:
rawServer.on('connection', (socket) =&gt; {
    const firstByte = socket.read(1);

    if (firstByte[0] === 0x16) {
        tlsServer.emit('connection', socket);
    } else if (firstByte[0] === "P".charCodeAt(0) &amp;&amp; isHttp2Preamble(socket)) {
        http2Server.emit('connection', socket);
    } else {
        httpServer.emit('connection', socket);
    }
});

rawServer.listen(8000); // Only the raw server is attached to a port
</code></pre> <p>(Simplified for readability, feel free to dig into <a href="https://github.com/httptoolkit/httpolyglot/blob/master/src/index.ts">the full implementation</a> if you're interested).</p> <p>It's important to note that HTTP Toolkit can decrypt and intercept TLS connections for any domain using the above TLS server, because it's set up as an HTTPS MitM proxy. Those details are a topic for another blog post (<a href="https://httptoolkit.com/blog/intercepting-android-https/">e.g.</a>) but in practice this means the <code>tlsConfig</code> here contains a CA certificate trusted by all clients to issue certificates for any host we like, so we can handle and decrypt TLS connections for any host that's requested.</p> <p>With that, this gives us enough to immediately handle the first step for all 3 protocols in one place on one port, but there's one big problem.</p> <h3 id="3pretendweneversniffedthedata">3. Pretend we never sniffed the data</h3> <p>When you remove data from a stream, it's removed from the buffer entirely. Once we've read the first byte from an incoming socket, that data is removed from the socket's buffer, and it's no longer readable.</p> <p>Because of this, when we pass the sockets to a server, they're all missing the essential initial data. TLS sockets are missing the 0x16 that signals an initial client handshake, plain-text HTTP is missing the first letter of the HTTP method (ET, OST, ELETE, PTIONS), and HTTP/2 is missing the whole of its required preamble.</p> <p>This breaks everything. Fortunately, there's another convenient Node streams API that can save us!</p> <p>After we've read the data, we just need to push it back into the socket's buffer, so everything is as it was before. We can do that nice and easily by adding <code>socket.unshift(data)</code>. 
This is a rarely used Node <a href="https://nodejs.org/api/stream.html#stream_readable_unshift_chunk_encoding">streams API</a>, but it's officially supported and it works well.</p> <p>If we add that just after we read the data, everything behaves as if we'd never peeked:</p> <pre><code class="javascript{3} language-javascript{3}">rawServer.on('connection', (socket) =&gt; {
    const firstByte = socket.read(1);
    socket.unshift(firstByte);

    // ...
</code></pre> <h3 id="4unwraptls">4. Unwrap TLS</h3> <p>Even once that's working, we still need to do something inside the TLS server to make it useful. HTTP Toolkit is looking for HTTP requests, so when we do accept a TLS connection we then need to parse and handle the decrypted TLS content somewhere.</p> <p>Once we get to the TLS stage though that's easy enough, because on modern TLS connections the application protocol is negotiated explicitly, using <a href="https://en.wikipedia.org/wiki/Application-Layer_Protocol_Negotiation">ALPN</a>.</p> <p>For our purposes the details of that don't matter, but the end result is that after the TLS handshake is done, the client and server have agreed what protocol they're going to use. We just need to handle it, by replacing the TLS setup above with:</p> <pre><code class="javascript language-javascript">const tlsServer = tls.createServer(tlsConfig, (tlsSocket) =&gt; {
    if (tlsSocket.alpnProtocol === false || tlsSocket.alpnProtocol.startsWith('http/1')) {
        // If the client doesn't support ALPN, or explicitly wants HTTP/1.*, use that:
        httpServer.emit('connection', tlsSocket);
    } else if (tlsSocket.alpnProtocol === 'h2') {
        // The client wants to talk HTTP/2, so pass the socket to the HTTP/2 server
        http2Server.emit('connection', tlsSocket);
    } else {
        // Unknown protocol - this shouldn't happen because we can configure which
        // protocols the server will accept ourselves within the TLS config.
    }
});
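
// The tlsConfig above is doing a lot of work. A hypothetical sketch of a
// MitM-capable config (generateCertForHost is a made-up helper here,
// standing in for real on-demand certificate generation, signed by a CA
// certificate that intercepted clients have been configured to trust):
function buildTlsConfig(caKey, caCert) {
    return {
        ALPNProtocols: ['h2', 'http/1.1'], // Restrict which protocols clients can pick
        SNICallback: function (hostname, callback) {
            // Generate (or look up a cached) certificate for the requested host:
            const { key, cert } = generateCertForHost(hostname, caKey, caCert);
            callback(null, tls.createSecureContext({ key, cert }));
        }
    };
}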
</code></pre> <p>Here we're now giving a TLS socket to each of the HTTP servers, whereas in the previous example we gave them a plain socket. That's OK though, as this is all invisible to them. They just get given streams, and they read and write plain text data from them and it works; the protocol carrying the stream doesn't matter.</p> <p>Strictly speaking, some of this isn't totally necessary. For HTTP/2, Node already supports accepting both HTTP/1.1 and HTTP/2 over HTTPS on the same port via ALPN with the <a href="https://nodejs.org/api/http2.html#http2_alpn_negotiation">allowHTTP1</a> option. That intentionally only works for HTTPS though, not plain text, and we can't easily combine it with the rest of the logic here, so it's better to do everything ourselves instead.</p> <h3 id="5buildsometunnels">5. Build some tunnels</h3> <p>We've now got a <code>net.Server</code> which receives packets from the network, and two HTTP servers that receive and process the appropriate requests, on all the protocols I listed at the start.</p> <p>We're not handling the requests yet, but even if we added a request listener, we would still only be accepting direct HTTP requests so far (e.g. unproxied GET requests). To capture tunnelled content, we need to handle CONNECT requests too.</p> <p>CONNECT tunneling is something that many application developers aren't aware of, but it's a powerful feature that's actually very simple: the client sends a CONNECT request including the target host &amp; port, the proxy sends a 200 OK response, and then the socket becomes a raw tunnel to the given target, so every byte sent is forwarded directly to the remote server untouched.</p> <p>That gives you a connection to the target, and on top of this you'd typically use TLS so the proxy can't see what you're sending.</p> <p>Implementing this ourselves is surprisingly easy &amp; elegant:</p> <pre><code class="javascript language-javascript">// When somebody sends an HTTP/1.1 CONNECT request:
httpServer.on('connect', (connectRequest, socket) =&gt; {
    // Tell the client the tunnel is connected, so they can start talking
    // to the remote server:
    socket.write('HTTP/1.1 200 OK\r\n\r\n');

    // That was a lie: pass the socket straight back our raw sniffing server
    // and read all the tunnelled data ourselves as if it were a new connection.
    rawServer.emit('connection', socket);
});
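
// For context, on the wire the client side of this exchange looks like:
//
//   CONNECT example.com:443 HTTP/1.1
//   Host: example.com:443
//
//   HTTP/1.1 200 OK
//
// ...after which every byte the client sends is tunnelled data, which we
// loop straight back into our own sniffing server above.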
</code></pre> <p>This completes the loop: if you create a tunnel, the socket goes back to the <code>net.Server</code>, which reads the first byte to work out what the data is, and then passes the socket to the appropriate server for the sniffed protocol, and then we listen there for more CONNECT requests… That works just fine though, and this means we can handle tunnels in tunnels in tunnels, as deep as you want to go!</p> <p>That's how this works for HTTP/1.1. For HTTP/2 the concept is a little different, because a single HTTP/2 connection contains many parallel streams, each of which can include requests and responses at the same time. This is how HTTP/2's multiplexing works: by wrapping all request and response data in frames, which include a stream id, so you can tell which requests match which responses.</p> <p>That framing applies to CONNECT requests too. When you proxy over HTTP/2, a single stream within an HTTP/2 connection becomes a tunnel, not the whole connection. This means that when you send data through the tunnel, it's actually wrapped up in an HTTP/2 frame marking it as part of the tunnel stream, rather than being sent raw as in HTTP/1.1.</p> <p>We don't have to care about all that though, because the API is still super easy:</p> <pre><code class="javascript language-javascript">http2Server.on('connect', (connectRequest, response) =&gt; {
    // Once again, tell the client we've created a tunnel:
    response.writeHead(200, {});

    // And then betray them, handling the connection ourselves:
    rawServer.emit('connection', response.stream);
});
</code></pre> <p>We're now firing a <code>connection</code> event that doesn't even contain a socket any more. <code>response.stream</code> is just a stream that is part of the larger HTTP/2 connection. Doesn't matter though - <code>net.Server</code> can still treat it just like any other incoming stream, so we loop around again and the protocol sniffing continues.</p> <h3 id="6handlerealrequests">6. Handle real requests</h3> <p>All of this is great, and yet we've achieved nothing: when the <del>tonguing</del> tunneling is done we still can't handle an HTTP request. That's the last step:</p> <pre><code class="javascript language-javascript">const requestListener = (request, response) =&gt; {
    // ...Read from the request, write to the response.
    // In reality HTTP Toolkit matches the request against the configured
    // rules here, and then delegates this to an appropriate request
    // handler that can respond somehow.
};

// We pass both HTTP/1 and HTTP/2 requests to the same listener. There's only
// a small number of differences here, and making the URL absolute using the
// appropriate header is 90% of the work needed to support this (and to accept
// other transparently redirected requests too).
httpServer.on('request', (request, response) =&gt; {
    request.url = getAbsoluteUrl(request.url, request.headers['host']);
    requestListener(request, response);
});

http2Server.on('request', (request, response) =&gt; {
    request.url = getAbsoluteUrl(request.url, request.headers[':authority']);
    requestListener(request, response);
});
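
// getAbsoluteUrl isn't defined above - a hypothetical sketch, assuming we
// just need to combine the path with the relevant host header (in reality
// the scheme should come from the connection itself, e.g. by checking
// request.socket.encrypted):
function getAbsoluteUrl(url, host) {
    // Proxied HTTP/1.1 requests can already include an absolute URL:
    if (url.startsWith('http://') || url.startsWith('https://')) {
        return url;
    } else {
        return 'http://' + host + url;
    }
}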
</code></pre> <p>That's it! Put this code together, and you can handle all those different types of HTTP requests, all in one place.</p> <h2 id="waitwhatabouthttp3">Wait, what about HTTP/3?</h2> <p>Touché, you got me. This can intercept almost all kinds of HTTP requests in widespread use today, but it doesn't yet support HTTP/3.</p> <p>HTTP/3 is different, in that it runs over UDP, not TCP, so it's never going to be possible to completely intermingle it with TCP connections and tunnels like this.</p> <p>That should make it simpler to implement, as it creates a strictly separate request pipeline, although that would be a bit less fun. In theory it should support tunnels too though, so you can tunnel HTTP/1.0 over TLS over HTTP/3 over QUIC over UDP, I think… (This is going to need more research).</p> <p>Either way I intend to try and ensure the server uses the same UDP &amp; TCP port numbers regardless, where possible, to simplify setup as much as I can.</p> <p>In the short term, the main reason this isn't supported is because Node.js doesn't support either <a href="https://en.wikipedia.org/wiki/QUIC">QUIC</a> (the underlying UDP-based protocol) or HTTP/3 yet without enabling experimental features. I'd rather wait for it to be production ready, but it's scheduled to be included in Node.js v16, landing next week, so hopefully this will be available soon! Watch this space.</p> <h2 id="realtalk">Real talk</h2> <p>That's a quick overview of how this all works. Of course the code above is significantly simpler than the real code HTTP Toolkit runs in production. 
There are many more details involved in making this stable &amp; effective!</p> <p>However, if you're looking to implement similar things for real yourself, I have 3 pieces of good news:</p> <p>First, I've published the connection sniffing HTTP server as a standalone npm package called <a href="https://www.npmjs.com/package/@httptoolkit/httpolyglot">@httptoolkit/httpolyglot</a>, so you can drop that into your projects and start accepting all HTTP protocols in one place straight away. It looks like this:</p> <pre><code class="javascript language-javascript">const httpolyglot = require('@httptoolkit/httpolyglot');
const fs = require('fs');

const server = httpolyglot.createServer({
    // Provide your HTTPS configuration:
    key: fs.readFileSync('server.key'),
    cert: fs.readFileSync('server.crt')
}, (req, res) =&gt; {
    // Both HTTP/1 and HTTP/2 requests will end up here, for both plain text and HTTPS.

    // Both of them support the same core request &amp; response API:
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end((req.socket.encrypted ? 'HTTPS' : 'HTTP') + ' Connection!');
})

// This server can then handle everything on a single port:
server.listen(8000);
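
// You can then exercise every protocol against that single port, e.g.:
//   curl http://localhost:8000                          (plain-text HTTP/1.1)
//   curl --http2-prior-knowledge http://localhost:8000  (plain-text HTTP/2)
//   curl -k https://localhost:8000                      (TLS, with ALPN picking the HTTP version)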
</code></pre> <p>Second, the complete proxy is available as a separate npm package called <a href="https://www.npmjs.com/package/mockttp">Mockttp</a>. This is all of the low-level internals of HTTP Toolkit, as a standalone JavaScript package, which you can use for testing &amp; automation, or to build intercepting proxies for yourself.</p> <p>Using that, if you want to write code that accepts and proxies HTTP requests of all kinds and then handles or transforms the traffic, you can get started right now in 20 lines:</p> <pre><code class="javascript language-javascript">const mockttp = require('mockttp');

// Generating the CA certificate is asynchronous:
mockttp.generateCACertificate().then(async (https) =&gt; {
    const server = mockttp.getLocal({ https });
    await server.start(8000);

    // Create rules to mock responses:
    await server.get('https://example.com/').thenReply(404);

    // Or proxy requests upstream, log them, and transform the response
    await server.anyRequest().thenPassThrough({
        beforeResponse: ({ statusCode, body }) =&gt; {
            console.log(`Got ${statusCode} with: ${body.text}`);
            return { body: body.text + " appended" };
        }
    });

    console.log(`Server running on port ${server.port}`);
});
</code></pre> <p>Make any requests you like any way you like against port 8000 (making sure you trust the CA certificate first, for HTTPS) and they'll all be intercepted and handled according to your rules.</p> <p>Lastly, if you want to go further, all the real-world underlying implementation of this is open source. You can go explore <a href="https://github.com/httptoolkit/httpolyglot/blob/master/src/index.ts">the connection sniffing</a> or <a href="https://github.com/httptoolkit/mockttp/blob/f58f18f88f5f784e21560dd1b27dfa8810eb0388/src/server/http-combo-server.ts#L224-L274">the proxy unwrapping implementation</a> or <a href="https://github.com/httptoolkit/mockttp/blob/f58f18f88f5f784e21560dd1b27dfa8810eb0388/src/server/mockttp-server.ts#L349-L372">the HTTP normalization logic</a> to your heart's content.</p> <p>I hope all this helps you in your HTTP endeavours! If you build something cool related to this, or if you want to ask lots more questions, feel free to <a href="https://twitter.com/pimterry">get in touch on Twitter</a>.</p> <p><em>Doing interesting things with HTTP? <strong><a href="https://httptoolkit.com/">Download HTTP Toolkit now</a></strong> to capture, inspect & mock HTTP from browsers, servers, apps and anything else in one click.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/http-https-same-port/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/http-https-same-port/</guid>
            <pubDate>Thu, 15 Apr 2021 16:45:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Defining a new HTTP method: HTTP QUERY]]></title>
            <description><![CDATA[<p>Nothing is ever finished or perfect, and HTTP is no exception.</p> <p>HTTP QUERY is a new HTTP method, for safe requests that include a request body. It's still early &amp; evolving, but it was recently adopted as an IETF draft standard, and it's going to add some great new tools for HTTP development everywhere.</p> <p>What does that mean, why do we need a new HTTP method, how would HTTP QUERY work?</p> <p><em><strong>Update</strong>: This post previously called this method <code>SEARCH</code>, but since it was originally published the spec has been updated, and the method is now called <code>QUERY</code>. This post has been updated accordingly.</em></p> <h2 id="httpmethodstoday">HTTP methods today</h2> <p>Today, there are 5 main HTTP methods you'll see in modern APIs.</p> <p>To understand how each one works, it's important to remember that HTTP is defined in terms of resources. A resource might be a document, or a photo, or a specific customer, or the whole list of customers, and it's identified by a URL like <code>example.com/customers</code> (all customers of example.com) or <code>example.com/customers/123</code> (one specific customer).</p> <h3 id="get">GET</h3> <p>A GET request asks the server for a resource. This is frequently used to request HTML pages, read data from an API, or load images.</p> <p>These are intended to be 'safe' requests, which purely read data. They shouldn't change the state of the server, they shouldn't have side effects, and so they can be cached in many cases (which means that many client GET responses will come from a cache, and never hit the real server).</p> <p>GET requests can be parameterized by their URL, which might contain a path and/or query parameters, but they can't have a request body. 
It's not specifically banned, but it is defined as being completely meaningless, and many existing implementations will ignore the body or reject the request entirely if you try to send one.</p> <p>They can also use Accept, Accept-Language and Accept-Encoding headers to request a specific content type, language or encoding ('give me customer 123 as XML please'), and use Range headers to request only part of a document ('give me the first 100 bytes of video 24 please').</p> <h3 id="post">POST</h3> <p>A POST request sends data to a resource on the server, and asks the server to process that data. This is a very generic "do something" request, often used to post messages, create new resources (e.g. a new customer) or trigger processing of some input.</p> <p>Just like GET requests they can be parameterized by URL and various headers, but they can also include a request body: the data that the server should process. To help the server process this, the request can also have a Content-Type header, specifying the type of data in the body (e.g. <code>application/json</code>).</p> <p>Of course, these aren't safe requests. They can change server state (by design) and may have side effects elsewhere too. Because of that, they're not cacheable in almost all cases. In fact they're anti-cacheable: <strong>if a CDN or browser sees an outgoing POST request for a resource, it will invalidate and drop any existing cached data it has for that resource</strong>.</p> <h3 id="put">PUT</h3> <p>A PUT request sends data to a resource on the server, and asks the server to create or replace the resource using that data. 
This is more specific than POST: while POST is used for arbitrary actions, PUT is only used to create/update state to match the body of the request.</p> <p>These are generally used in APIs as a way to specifically do create &amp; update actions on data (whilst POST might be used to trigger arbitrary actions against that data instead).</p> <p>Just like POST, this will include a request body and may affect the server's state (so invalidates caching), but it <em>can't</em> have side effects. Instead, PUT requests must be <a href="https://en.wikipedia.org/wiki/Idempotence">idempotent</a>. That means that if you successfully send the exact same PUT request twice, and nothing else happens, then everything should be in the same state as if you sent the request just once.</p> <p>You can see the difference if you think about <code>POST /documents</code> vs <code>PUT /documents</code>. That POST request typically means 'create a new document', and every time you make the same request a new document will be created. The PUT request meanwhile must mean 'create or replace the entire document list with the given data'. Assuming that's something the server allows, sending a request repeatedly will leave the documents in the same state as sending it just once.</p> <h3 id="patch">PATCH</h3> <p>A PATCH request sends data to a resource on the server, and asks the server to <em>partially</em> update the resource using that data. This can be used to update a customer's address or to append to a text document.</p> <p>Like PUT &amp; POST, this will include a request body. They're not safe requests, and they're not idempotent either (appending the same thing to a document twice will change that document both times).</p> <h3 id="delete">DELETE</h3> <p>A DELETE request asks the server to delete the resource. This one's pretty self-explanatory I think.</p> <p>One important note: like GET, DELETE requests cannot include a body. 
It doesn't make any sense to provide data when deleting data, and servers reserve the right to ignore or reject you completely if you do.</p> <h3 id="thealsorans">The also-rans</h3> <p>There are other methods widely used elsewhere, including:</p> <ul> <li>HEAD: like GET, but skip the body - i.e. response metadata only</li> <li>OPTIONS: request information about how to make other requests, mainly used for CORS in browsers</li> <li>CONNECT: request a raw tunnel to a different server, used in proxies</li> <li>TRACE: request an echo of your request, used to trace requests through proxies &amp; CDNs (very rarely used/supported)</li> </ul> <p>All of these are great, but none of them matter right now, so let's ignore them.</p> <h3 id="summary">Summary</h3> <p>So: GET gets data, POST performs arbitrary unsafe operations, PUT performs idempotent full updates, PATCH performs partial updates, and DELETE deletes. GET and DELETE aren't allowed a body, but all the others are.</p> <p>The differences between these have important implications. Cacheability is critical in many large applications, request safety has important UX implications, and each of these helps communicate to API developers how your API works.</p> <p>In practice, if you swap all your GET requests for POST requests tomorrow, it'll technically work, but only in the loosest possible sense. Responses will cease to be cacheable, so your CDN will give up entirely and your server will burst into flames as traffic increases 10,000%. Browsers will refuse to go back/forward in your page history or retry failed page loads without huge warnings (because that's unsafe, and might have side effects), and nobody else looking at your code or request will have any idea what the hell you're trying to do.</p> <p>These semantics help many of the tools and infrastructure we all use to understand what your HTTP requests mean. 
Using the right methods for the right things matters.</p> <h2 id="whatswrongwiththispicture">What's wrong with this picture?</h2> <p>That's all very well, but these options are missing something.</p> <p>What if you want to do a complicated data retrieval, sending lots of data but not changing the server state?</p> <p>Right now, you have two main options:</p> <ul> <li>Use a GET, and squeeze all the parameters you need in the URL or headers somewhere</li> <li>Use a POST, and have the request considered as unsafe &amp; uncacheable</li> </ul> <p>Neither of these is a good option.</p> <p>URLs &amp; headers typically have arbitrary non-standard length limits, and create a terrible UX for large values. You have to encode special characters &amp; newlines, so the URL becomes completely unreadable, and you can't specify a content type for convenient parsing either. Because it's non-standard, few tools will make this easy for you too, so you're back to stringifying and concatenating queries all by yourself. You deserve better.</p> <p>Meanwhile, caching is a big deal, and POST invalidates that completely. It's also fundamentally the wrong semantics: this request is not going to change any state, it's not going to have any side effects, and requiring all tools to treat it as if it will is problematic.</p> <p>In addition, the kind of resources to which you might want to send a complex query are also the kind of resources to which you might want to POST data. If <code>POST /customers</code> creates a new customer, how do I POST a query for customer data? 
It is possible, but using POST for multiple operations on the same resource quickly leads you down a long and hacky road to bad software.</p> <p>Fortunately, HTTP is a living and evolving standard, so we can fix this.</p> <h2 id="enterhttpquery">Enter HTTP QUERY</h2> <p><a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-safe-method-w-body/?include_text=1">HTTP QUERY</a> is a proposed new HTTP method that's intended to solve this problem.</p> <p>A QUERY request is a request that's safe (does not change the target resource) that can include a body.</p> <p>This helps because with QUERY we can implement the above: we can send complex data queries clearly, without either encoding them in a URL or using a POST request.</p> <p><strong>Note that this is still only a draft standard</strong>. The details will probably change, and even the name isn't 100% fixed yet (the draft is officially named "safe method with body", rather than referencing QUERY, to make that easy to change).</p> <p>Take all this with a grain of salt, but as of March 2021 it's now an officially adopted IETF HTTP <em>draft</em> specification, so it is on an official path towards eventual standardization, if all goes well.</p> <p>A raw HTTP/1.1 request using QUERY, as specified today, might look something like this:</p> <pre><code>QUERY /customers HTTP/1.1
Host: example.com
Content-Type: application/sql

SELECT username, email FROM customers
WHERE DATEDIFF(DAY, signup_date, GETDATE()) &gt; 7
</code></pre> <p>(No, you shouldn't let remote clients send you arbitrary SQL queries, but you get the idea)</p> <p>Right now the spec does <em>not</em> define the result of this query as cacheable. It's not completely clear why, but I suspect this is because caches today never take the body into account, and starting to do so would be a major change that needs some careful thought and consultation.</p> <p>That said, it does avoid the cache invalidation of the equivalent POST requests. The above request as a POST would require every cache en route to drop any cached data it has for <code>/customers</code>, forcing all that data to be reloaded. QUERY does not, and that alone will be a big boost to many caching setups.</p> <p>This has a few benefits:</p> <ul> <li>The request body is clearly readable and manageable - no special encoding or length limits involved</li> <li>The semantics are clear: it's just querying data</li> <li>You're now free to have separate semantics for GET, QUERY &amp; POST on the same URL</li> </ul> <h3 id="usecases">Use cases</h3> <p>You can use this to support complex querying in any language you like, from GraphQL to SQL to OData. Of course the server needs to understand whichever query language you're using, and you should indicate the format clearly in the Content-Type header of the request to make that possible.</p> <p>This is especially interesting for GraphQL. GraphQL currently falls perfectly into the above trap, supporting both GET requests or POST requests, but with awkward caveats in either case. 
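</p> <p>For example (hypothetically - the endpoint and content type here are illustrative, not part of the spec), a read-only GraphQL query could be sent as:</p> <pre><code>QUERY /graphql HTTP/1.1
Host: api.example.com
Content-Type: application/graphql

{ customers { username, email } }
</code></pre> <p>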
Moving to QUERY for read-only GraphQL requests would improve the UX significantly, and could allow GraphQL to better integrate with built-in HTTP features like caching in future.</p> <p>Query languages like this are the most obvious use case, but you can go well beyond that too: this supports anything that sends a body to request data from the server without side effects.</p> <p>RPC-style APIs using HTTP or other APIs that don't really 'query' data as such will get value from this being supported too (although this does stretch the currently defined semantics a bit). For example, an API to which you can send data and have the server encrypt it and return it to you. This doesn't change anything on the server, so POST isn't appropriate, and GET has the same limitations as above.</p> <p>You could even use this to support things like a dry-run API for POST requests (don't change anything yet, but tell me what would happen if I did POST this data). There's a long list of possibilities!</p> <h3 id="acceptquery">Accept-Query</h3> <p>In addition to QUERY, the specification also defines an Accept-Query header. That can be used in responses like so:</p> <pre><code>HTTP/1.1 200 OK
&lt;other headers&gt;
Accept-Query: application/sql, application/graphql

&lt;normal response&gt;
</code></pre> <p>This allows a server to advertise that it accepts QUERY requests, and signal the specific format(s) of query that it will accept. This is similar to the existing <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Patch">Accept-Patch</a> header.</p> <p>The server could include this in any responses, but it's particularly useful in responses to OPTIONS requests, where the client queries a resource to ask what functionality is supported.</p> <h2 id="caveats">Caveats</h2> <p>This is the start of a great proposal imo, but there are things you need to be aware of, which create some gotchas and possible improvements in the spec as it stands today.</p> <h3 id="searchhasahistory">SEARCH has a history</h3> <p><em>(<strong>Update</strong>: this section is no longer wholly correct, but is kept because it's interesting context. The benefits and conflicts described here disappeared when the method was changed from <code>SEARCH</code> to <code>QUERY</code>, which has no such history.)</em></p> <p>This standard was originally based on the <a href="https://tools.ietf.org/html/rfc5323">SEARCH method</a> from WebDAV (an HTTP extension designed for document authoring &amp; versioning on the web).</p> <p>This has upsides and downsides.</p> <p>Because a similar method has existed in the past, it does mean that many existing tools including <strong><a href="https://httptoolkit.com">HTTP Toolkit</a></strong> itself and infrastructure like proxies &amp; CDNs will already support this, and could accept SEARCH requests immediately without any hassle.</p> <p>On the other hand, adding SEARCH to HTTP itself without breaking WebDAV requires some thought around compatibility. 
The current workaround for this in the spec is that any request with an <code>application/xml</code> or <code>text/xml</code> content-type header must follow the specific format rules defined in the WebDAV spec for its query body.</p> <p>Anything else would not be a valid WebDAV request, so can freely ignore that, but this does create a real problem for SEARCH in XML APIs. It's likely that in future the spec will be relaxed to apply this requirement only to XML within WebDAV's XML namespaces, but that's not yet formally specified.</p> <h3 id="cachingishard">Caching is hard</h3> <p>While not invalidating caches is a good start, the results of a QUERY aren't actually cacheable themselves. That doesn't just mean they're not cacheable by default: even with explicit cache headers, they are not cacheable.</p> <p>This is unfortunate because it's a clear limitation when compared with GET, and caching query results is a super common use case.</p> <p>There's ongoing work here to specify exactly under what conditions QUERY could become cacheable, which would unlock a lot more benefit from this standard. I think the likely result is that it won't be cacheable by default, but will be cacheable given the appropriate headers, but again that's not yet specified, so let's wait &amp; see.</p> <h3 id="namingishard">Naming is hard</h3> <p><em>(<strong>Update</strong>: this section is also no longer relevant, but is kept because it's interesting context too - in the end the name has indeed been changed!)</em></p> <p>SEARCH isn't a great name. Not every query is a search, and there's a wide variety of other uses for "safe method with a body" that go entirely beyond simple querying of a data set, as discussed above.</p> <p>This has been recognized and it's being debated, and there are other proposals (like QUERY or FETCH). 
SEARCH does have some compatibility benefits due to its existing usage in WebDAV though, so there's a challenging balance to strike.</p> <p>Changing the name would slow down adoption in all existing software, but might make the method clearer for developers to understand and use. There's no easy answer here unfortunately.</p> <h2 id="whatsnext">What's next?</h2> <p>Personally, I think QUERY would be valuable in itself already, despite these caveats, and there are good options available here to quickly improve the standard further.</p> <p>If you'd like to dig into the details further, the current specification is available <a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-safe-method-w-body/?include_text=1">from the IETF website</a>, and you can get involved by <a href="https://lists.w3.org/Archives/Public/ietf-http-wg/">joining the IETF HTTP Working Group mailing list</a> or opening issues/PRs directly via the <a href="https://github.com/httpwg/http-extensions/">http-extensions GitHub repo</a> (an umbrella repo for this spec plus a few other prospective HTTP additions). Share your thoughts and help shape the HTTP of the future!</p> <p>If you have any other questions, or there's anything I've missed, feel free to <a href="https://twitter.com/pimterry">get in touch</a> on Twitter - I'd love to hear about it.</p> <p><em>Do you work with HTTP? <strong><a href="https://httptoolkit.com/">Download HTTP Toolkit now</a></strong> to inspect &amp; mock HTTP from browsers, servers, apps and anywhere else in one click.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/http-search-method/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/http-search-method/</guid>
            <pubDate>Mon, 12 Apr 2021 15:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[What are CORS proxies, and when are they safe?]]></title>
            <description><![CDATA[<p>CORS can be complicated. If you're struggling with it, you might discover the concept of a 'CORS proxy' that promises to solve this, like <a href="https://github.com/Rob--W/cors-anywhere">cors-anywhere</a> or one of the many 'free CORS proxy' hosted services.</p> <p>CORS proxies let you bypass the security restrictions that CORS applies, with just a tiny change of URL.</p> <p>That feels convenient, but turning off security feels dangerous. How do CORS proxies work, and what real-world security problems can they create?</p> <h2 id="whyiscorsaproblem">Why is CORS a problem?</h2> <p>For a typical CORS request:</p> <ul> <li>You serve some content to your user via your origin (let's say <code>https://home.example</code>).</li> <li>Your content includes JavaScript, which makes a request to another origin (let's say <code>https://other.example</code>).</li> <li>The browser now needs to make a request from the user's machine to that other host.</li> </ul> <p>Browsers are very cautious about doing that last step, for two main reasons.</p> <p>First, browsers often have credentials (e.g. cookies) linked to each domain, and one website shouldn't be able to make requests which might use your credentials &amp; sessions for an unrelated domain. Random sites on the internet shouldn't be able to make requests to your bank's servers with your session cookies.</p> <p>Second, the target server might be private, an internal network address like <code>10.0.0.1</code>, <code>localhost</code>, or a remote server that only allows requests from certain IP addresses. 
These servers wouldn't normally be accessible from the public internet, and remote websites shouldn't be able to make requests to them just by bouncing the request through your browser.</p> <p>Both of these are important security protections for end users who (quite reasonably) want to visit websites without losing control of either their online banking or their home router.</p> <p>To protect against this, browsers send CORS headers in requests (and sometimes a CORS preflight request, before the real request) to check that the server is happy to accept the request and share the contents of the response.</p> <p>If the target server isn't aware of CORS, or doesn't want to allow browser clients, it won't send the CORS headers you need. In that case, the browser then won't allow you to make some requests to or view any responses from that site, even if the site is publicly available on the internet without any authentication (because the browser has no way to know that).</p> <p>That failure case looks something like this:</p> <p><img src="https://httptoolkit.com/images/posts/cors-failure.png" alt="A failing CORS request in action, with the browser rejecting the response"></p> <p>Failures like this can be annoying if you just want to load some simple data from one website inside another, especially when it's publicly accessible outside the browser with no problems at all.</p> <p>This is a particular problem for single-page applications, like React, Vue or Angular sites, where all API requests generally happen on the client side.</p> <h2 id="howdocorsproxieswork">How do CORS proxies work?</h2> <p>CORS proxies let you work around this. Rather than the browser sending a request to the target server directly, it sends the request to a CORS proxy with the target URL, which might look like <code>https://corsproxy.example/https://other.example</code> (using the target URL as a path).
The CORS proxy then forwards the request to the real server, and then returns the response <em>plus the correct CORS headers</em>.</p> <p>That looks like this:</p> <p><img src="https://httptoolkit.com/images/posts/cors-proxy.png" alt="A successful CORS request sent via a CORS proxy"></p> <p>This lets you make requests to servers that don't support CORS, which is lovely.</p> <p>From the browser's point of view, any request via the proxy is just a request to the proxy's origin which does seem to support CORS. It's not aware you're talking to the real target address at all.</p> <p>Because from the browser's point of view the content now comes from the CORS proxy's origin, that means the request will never include any pre-existing credentials linked to the real target origin.</p> <p>Of course, this also only works for publicly accessible sites, which the CORS proxy can directly access from wherever it's hosted. You can't use a CORS proxy to access anything on the end user's local network.</p> <p>There are quite a few tools you can use to implement a CORS proxy, from modules to easily run your own proxy like <a href="https://www.npmjs.com/package/cors-anywhere">cors-anywhere</a>, to <a href="https://developers.cloudflare.com/workers/examples/cors-header-proxy">examples</a> you can deploy in seconds on CloudFlare workers, to a variety of <a href="https://nordicapis.com/10-free-to-use-cors-proxies/">hosted CORS proxies</a>.</p> <p>All this seems great, and it sounds like it still protects users from abuse of their credentials or local network like CORS normally does too. Is this still secure? What are the dangers?</p> <h2 id="arecorsproxiessecure">Are CORS proxies secure?</h2> <p>CORS proxies are safe only if you use them very <em>very</em> carefully. 
There are good reasons to use them, and safe ways to do so, but if you use them wrong you can create a whole world of new security problems.</p> <p>Let's take a look:</p> <h3 id="freehostedcorsproxiesaredangerous">Free hosted CORS proxies are dangerous</h3> <p>If you want to use a CORS proxy, don't use somebody else's CORS proxy.</p> <p>The CORS proxy can read and do anything with the full request &amp; response of all traffic through it. While the browser will treat the request as secure (assuming the proxy uses HTTPS) it's only as secure as the proxy itself. If that's run by somebody else, you're giving them complete control of all your interactions with the remote URL.</p> <p>That means you can't trust the responses unless you 100% trust the proxy, and any private data you send to the proxy is completely available to whoever runs it (which is a GDPR problem, at the very least). This makes them only usable for trivial &amp; static public data even in the best case, so you can never use them for any authenticated API.</p> <p>If you ever request JavaScript content through the proxy (e.g. from a JSONP API, or just a script file) or anything that could include that (e.g. some HTML you embed in your page) you're now allowing the CORS proxy to run arbitrary JS in your page, to do trivial XSS attacks and read any of your site's client-side data from your users' browsers, all on your own domain.</p> <p>Because of all this, they're juicy targets for an attacker: if you can compromise a widely used CORS proxy service, you can often compromise every website that uses it for free. Ouch.</p> <p>All of this is bad. Lastly, on top of all that, hosted CORS proxy services are super unreliable. 
They're expensive to run, almost always free, prone to abuse &amp; attacks, and (as we'll see next) come with a bunch of their own security risks that aren't always well mitigated.</p> <p>Even the most famous ones <a href="https://github.com/Rob--W/cors-anywhere/issues/301">get shut down eventually</a>. If you build a production service that depends on somebody else's CORS proxy, it's going to break later on when you least expect it. Don't.</p> <h3 id="corsproxiescanleakprivatestatebetweenindependentorigins">CORS proxies can leak private state between independent origins</h3> <p>For example, HTTP responses from a server might contain cookies. Normally, these would be stored in your browser and only be available to future requests and pages using the same origin.</p> <p>Unfortunately, with a CORS proxy, every request through the proxy uses the origin to persist this kind of data: the origin of the proxy (not the origin of the real server).</p> <p>Here's how this can go wrong:</p> <ul> <li>You make a request to <code>a.com</code> in your web page, through your CORS proxy.</li> <li>The response includes a <code>Set-Cookie</code> header, which sets a cookie containing some private data or state relevant to that origin.</li> <li>The browser treats this as being owned by the CORS proxy origin, not by <code>a.com</code>.</li> <li>You send a request to <code>b.com</code> through the CORS proxy.</li> <li>Your browser will now send the cookie for <code>a.com</code> to <code>b.com</code>, since they're both part of the CORS proxy origin.</li> </ul> <p>The same applies to various other protections, e.g. 
<a href="https://en.wikipedia.org/wiki/Basic_access_authentication">basic HTTP authentication</a> which may share the entered username &amp; password with every domain you request through the proxy.</p> <p>To secure this you need to disable credentials entirely, by ensuring your CORS response never contains an <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Credentials">Access-Control-Allow-Credentials: true</a> header, and you need to drop all cookie headers.</p> <p>This doesn't necessarily stop you from using authenticated APIs in CORS requests you proxy through your own servers, it just stops you from using built-in browser credentials like cookies. You can still send your own explicit authentication headers if required.</p> <p>Some of this may be blocked by recent changes in browsers to block 3rd party browser state entirely, to restrict user tracking across websites. These features still include heuristics to allow certain real use cases though, and this won't work reliably for all browsers, so it's better to explicitly lock this down.</p> <h3 id="corsproxiescanexposetheproxyslocalnetwork">CORS proxies can expose the proxy's local network</h3> <p>CORS protects the end user's local network. If you run your own CORS proxy though, it's very easy to accidentally expose your server's network and infrastructure, so a user can request <code>https://corsproxy.example/https://10.0.0.1/admin</code> to make your proxy server make requests and return information from inside your network. In general these are known as <a href="https://hdivsecurity.com/bornsecure/ssrf-what-is-server-side-request-forgery/">Server-Side Request Forgery attacks</a>.</p> <p>Often this can be a huge problem. 
As just one example, all EC2 instances have access to a local-only <code>http://169.254.169.254/latest/meta-data/</code> endpoint, which returns metadata that by default includes the full credentials for the EC2 instance's IAM role.</p> <p>I think that's worth reiterating:</p> <p><strong>If you host a naive CORS proxy on EC2, external attackers may be able to access private internal resources from your AWS account.</strong></p> <p>The default IAM role for EC2 instances doesn't let them access everything, but does provide full read &amp; write access to S3 buckets and your CloudWatch logs. If you've given the instance more privileges, this gets even worse.</p> <p>Here's <a href="https://medium.com/certik/cors-anywhere-the-dangers-of-misconfigured-third-party-software-df232aae144c">a detailed walkthrough</a> of exactly how this attack works, and ways you can mitigate it.</p> <p>This is just one example of how this can go wrong though. There are often many valuable services running on your network which assume that local network traffic is trusted.</p> <p>To fix this properly, you need to define a whitelist of valid origins for your CORS proxy, and to only allow requests to be proxied to origins on that list. This list should only contain the external services you're interested in. That ensures your CORS proxy can't be used to scan or access local network addresses or anything else unexpected.</p> <h3 id="corsproxiesareeasilyabused">CORS proxies are easily abused</h3> <p>You don't want to run an open CORS proxy, usable by everybody. If you do, you'll quickly discover other sites proxying traffic through it, attackers using it to send requests whilst hiding their IP, DoS attacks against yourself and others, and all sorts of other problems.</p> <p>As in the previous point, a good first move is to limit the origins that your proxy can go to.
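</p> <p>For instance, the target allow-list check could be as simple as this sketch (the listed origins here are hypothetical examples):</p>

```javascript
// Sketch: only proxy requests whose target is on an explicit allow-list.
// The origins listed here are hypothetical examples.
const ALLOWED_TARGET_ORIGINS = [
  'https://api.example.com',
  'https://cdn.example.com'
];

function isAllowedTarget(targetUrl) {
  try {
    // Compare whole origins, not URL prefixes - prefix checks are easily bypassed
    return ALLOWED_TARGET_ORIGINS.includes(new URL(targetUrl).origin);
  } catch (e) {
    return false; // unparseable URL: reject it
  }
}
```

<p>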
If you do that, most of the abuse risk goes away immediately.</p> <p>To go further, it's also usually a good idea to check the <code>Origin</code> header of the incoming request. If the request comes from a browser, from an origin other than the proxy's own origin, it'll be listed here. If somebody else tries to use the proxy in their website, that website origin will show up here. By limiting this to just allow your own origins you ensure no other pages can use your proxy.</p> <p>You might still want to allow requests with no origin, if you're using the CORS proxy on the same origin as your own page, e.g. on a subpath. In that case your browser won't send an origin at all, and that's ok.</p> <p>Lastly, if you're still having issues with abuse, a rate limit linked to the request source IP is a good idea - no individual user should be sending 100s of requests a second through your proxy.</p> <hr> <p>I hope that's clarified some of the benefits and risks around CORS proxies. All of this is manageable, and CORS proxies can be very useful, but always make sure you lock them down tightly to allow only the use case you need, block cookies and credentials, and avoid free hosted proxies for any kind of non-trivial deployments.</p> <p><strong>If you want to inspect your HTTP traffic, debug CORS requests, and test out mock CORS headers in 5 seconds flat, give <a href="https://httptoolkit.com/javascript/">HTTP Toolkit</a> a try.</strong></p> <p>Have questions, or do you think there's other CORS proxy dangers I've missed here? Feel free to get in touch <a href="https://twitter.com/pimterry">on Twitter</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/cors-proxies/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/cors-proxies/</guid>
            <pubDate>Thu, 01 Apr 2021 09:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Intercept and edit HTTP traffic from (almost) any Android app]]></title>
            <description><![CDATA[<p>HTTP is used by almost all Android apps to request data, load content, and send changes to backend servers. If you can see and edit these requests &amp; responses then you can understand, debug, and change how any app works, but Android makes this hard to do.</p> <p>By default, almost all apps will use HTTPS but <a href="https://httptoolkit.com/blog/intercepting-android-https/">won't trust user-installed certificates</a>. This means that you can't see their traffic with simple proxy tools, and you can't manually trust HTTPS debugging proxies without either editing and rebuilding the entire app, or setting up your own rooted device.</p> <p>Fortunately, there's a quick &amp; easy way around this: you can manually install official APKs into a normal Android emulator, which provides enough access that tools like <a href="https://httptoolkit.com/android/">HTTP Toolkit</a> can capture all traffic for most apps for you totally automatically, and allow you to edit responses in just a couple of clicks.</p> <p>Let's walk through how to do that, step-by-step:</p> <h2 id="settinguptheemulator">Setting up the emulator</h2> <p>To get started, you'll need an emulator.</p> <p>It is possible to create and start one using the Android SDK directly (see <a href="https://dev.to/koscheyscrag/how-to-install-android-emulator-without-installing-android-studio-3lce">this article</a>) but it's easiest to just install <a href="https://developer.android.com/studio">Android Studio</a>, create an empty project, and use the developer tools provided there (if you're not familiar with these developer tools at all, there's a <a href="https://developer.android.com/studio/run/managing-avds.html#createavd">detailed official guide</a>).</p> <p>To create an interceptable Android emulator, you should create an AVD that:</p> <ul> <li>Can be any device model, though things may be smoother with a popular device like a Pixel 4.</li> <li>Uses an image matching your
computer's architecture (ARM64 on M1/M2 Macs, x86_64 on most other computers) since the performance will be far better.</li> <li>Uses a relatively recent stable Android version - Android 7 to 13 should be fine.</li> <li>Uses a 'Google APIs' or 'Android Open Source Project' target image. <strong>The 'Google Play' target includes extra restrictions and is not easily interceptable.</strong></li> </ul> <p>Once you've created your emulator, start it, and then we need to install the target app.</p> <p>Since we don't have Google Play, you can't do that from the normal app store. You can do it easily though by downloading the APK directly from a 3rd party mirror like <a href="https://apkpure.com/">APKPure.com</a> and installing using the Android developer tools.</p> <p>Some apps are published as APKs, and some are published as XAPKs (app bundles), but either one can be installed manually using <code>adb</code>, which comes with the Android developer tools.</p> <p>To install a normal APK you've downloaded into a running emulator, just run:</p> <pre><code>adb install &lt;path to apk&gt;
</code></pre> <p>To install a downloaded XAPK:</p> <ul> <li>Rename it to <code>.zip</code> and extract it</li> <li>Look at the APKs within, and work out which are relevant to your device<ul> <li>There should be a core app APK plus various <code>config.*</code> APKs.</li> <li>Assuming you're using an x86 emulator you <em>don't</em> want APKs like <code>config.arm64_v8a.apk</code>, which only apply to ARM64 devices.</li></ul></li> <li>To install the app and relevant config APKs, run:</li> </ul> <pre><code>  adb install-multiple &lt;main-apk.apk&gt; &lt;...config-apk.apk&gt;
</code></pre> <p>As an example, let's install and intercept the Duolingo app:</p> <ul> <li>You can download the Duolingo XAPK from APKPure <a href="https://apkpure.com/duolingo-learn-languages-free/com.duolingo">here</a>.</li> <li>If you extract the XAPK, you'll find 5 files:<ul> <li>com.duolingo.apk</li> <li>config.xxhdpi.apk</li> <li>config.arm64_v8a.apk</li> <li>icon.png</li> <li>manifest.json</li></ul></li> <li>To install the app we only need the first two, so run: <code>adb install-multiple com.duolingo.apk config.xxhdpi.apk</code></li> <li>That should print 'Success', and Duolingo will appear in the app menu on your emulator.</li> </ul> <h2 id="interceptingyouremulator">Intercepting your emulator</h2> <p>Next we need to intercept traffic from the emulator.</p> <p>HTTP Toolkit is an HTTP debugger that can intercept, inspect &amp; rewrite HTTP from any client, including Android. It's open-source and all the core features are completely free.</p> <p>If you haven't installed it yet, <a href="https://httptoolkit.com/android/">download it from here</a>.</p> <p>Once you've installed and started HTTP Toolkit, you should see an ADB option on the 'Intercept' page that looks like this:</p> <p><img src="https://httptoolkit.com/images/posts/android-adb-interception-option.png" alt="Android device connected via ADB option"></p> <p>Click that, wait a few seconds, and you'll see the HTTP Toolkit app install and show a VPN setup prompt on the emulator:</p> <p><img src="https://httptoolkit.com/images/posts/android-vpn-setup-prompt.png" alt="Android VPN setup prompt"></p> <p>Android interception uses a VPN which redirects all traffic from your emulator via the HTTP Toolkit app while the VPN is activated.</p> <p>Accept this prompt, and you'll see confirmation that interception is set up and fully activated:</p> <p><img src="https://httptoolkit.com/images/posts/android-system-interception-active.png" alt="Android fully configured"></p> <p>This means that all HTTP and HTTPS from this device
will be captured and shown in the HTTP Toolkit app.</p> <h2 id="inspectingandroidhttptraffic">Inspecting Android HTTP traffic</h2> <p>Once that's done, you can start your target app, and immediately start examining its traffic!</p> <p>For Duolingo for example, when you start the app you'll immediately see a big list of requests:</p> <p><img src="https://httptoolkit.com/images/posts/intercepted-duolingo-data.png" alt="Intercepted Duolingo requests"></p> <p>In here we can see Duolingo API requests to check authentication, record device &amp; billing state, and read the app feature flags. There's also requests elsewhere to set up the Facebook SDK and record data, configure Google ads, and to prepare to track app crashes later on.</p> <p>You can click on any of these requests on the left to see the full request, response and body on the right.</p> <p>If you continue to use the app, logging in and testing real functionality, you'll quickly see hundreds more requests, and you can start to piece together exactly how the app and the APIs it depends on all work together.</p> <p><strong>This will work for 99% of apps, but not 100%.</strong> This technique can capture traffic from every app that uses the default Android security settings, across all API versions, including major apps like Netflix, Slack and Ebay. Some very security-sensitive apps (like banking apps) or very high-profile apps (like Facebook and Twitter) will go further and pin their specific HTTPS certificates though, which will block this.</p> <p>If that happens then you'll see warnings in the HTTP Toolkit app about rejected certificates or failed TLS connections. Defeating this to intercept that last 1% of extra-secure apps is very challenging, and requires non-trivial manual reverse engineering. 
It is possible, but that's a subject for another blog post…</p> <h2 id="mockingrewritingandroidhttp">Mocking &amp; rewriting Android HTTP</h2> <p>HTTP Toolkit allows you to rewrite outgoing requests and returned responses.</p> <p>You can do this from the 'Mock' tab, which allows you to configure rules. Each rule matches against something (an HTTP method, or a specific path, or a header value) and then does something (breakpoints the response to edit manually, redirects the response elsewhere, returns some fixed replacement data, disconnects the connection, etc).</p> <p>Some of the advanced options here require <a href="https://httptoolkit.com/get-pro/">HTTP Toolkit Pro</a> but you can get started with manual breakpointing to immediately test &amp; manipulate apps without that.</p> <p>As an example for now, let's try changing some of Duolingo's behaviour. If you log in, pick a language (if you've never used the app before), and start a lesson, you'll see a request like this:</p> <p><img src="https://httptoolkit.com/images/posts/duolingo-api-request.png" alt="Duolingo request for lesson data"></p> <p>This is how the Duolingo app loads the data for each lesson: a POST request to <code>https://android-api-cf.duolingo.com/2017-06-30/sessions</code>, including a list of fields to return, plus some auth data and personal config (not shown here) in the headers.</p> <p>That request will have a response body like this:</p> <p><img src="https://httptoolkit.com/images/posts/duolingo-api-response.png" alt="Duolingo response body, showing the lesson questions"></p> <p>In my case, I'm learning Catalan from Spanish, and this response shows that the lesson starts with a prompt in Spanish ("El ratón es menos grande que el elefante"), and 3 possible answers in Catalan ("El ratolí és…") where the answer in index 1 ("…menys gros que l'elefant") is the correct one.</p> <p>This response contains the entire lesson. 
Every single question, all the right answers, and various bits of metadata about how that should be presented and which images should be shown throughout.</p> <p>Let's change it, to change how the app behaves. To do so:</p> <ul> <li>Click 'Mock' on the left, to configure traffic rewriting rules</li> <li>Click 'Add a new rule' at the top</li> <li>On the left, match POST requests for <code>https://android-api-cf.duolingo.com/2017-06-30/sessions</code></li> <li>On the right, select 'Pause the response to manually edit it'</li> <li>Save your new rule, using the button in the top right.</li> </ul> <p>That should look like this:</p> <p><img src="https://httptoolkit.com/images/posts/duolingo-rewrite-rule.png" alt="Duolingo rewrite rule"></p> <p>That means next time the app sends a request like that, HTTP Toolkit will breakpoint at the response, so you can manually edit it before the app receives it.</p> <p>Start a new lesson in the app, and HTTP Toolkit will jump to the breakpointed request immediately.</p> <p>From there, you can edit the response (click the 'format body' icon in the top left if you'd like to make the minified JSON more readable first), to change the lesson content:</p> <p><img src="https://httptoolkit.com/images/posts/duolingo-edit.gif" alt="Changing the correct answer text to say &quot;THIS IS THE RIGHT ANSWER&quot;"></p> <p>Click 'Resume' above and the app will receive the response, and use it like normal:</p> <p><img src="https://httptoolkit.com/images/posts/duolingo-result.png" alt="The Duolingo app showing our edited answer"></p> <p>Easy! 
Bien hecho indeed.</p> <p>You can go further to change the questions or any other content of each lesson, drastically change how the content is presented, or apply the same technique to other API requests to change how other features work, or to cheat in more dramatic ways (but that's probably not going to improve your language skills, so I really wouldn't recommend it!)</p> <p>This same technique applies to any app you can intercept: find an interesting request, create a rule that matches it, and change the request or response to instantly test and change how the app behaves.</p> <p>Thanks for reading - give it a go for yourself, and feel free to get in touch <a href="https://twitter.com/pimterry">on Twitter</a> if you have any questions.</p>]]></description>
            <link>https://httptoolkit.com/blog/inspect-any-android-apps-http/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/inspect-any-android-apps-http/</guid>
            <pubDate>Wed, 24 Mar 2021 17:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How to intercept & debug all Java HTTPS]]></title>
            <description><![CDATA[<p>Java and the JVM more generally are widely used for services everywhere, but often challenging to debug and manually test, particularly in complicated microservice architectures.</p> <p>HTTP requests and responses are the core of interactions between these services, and with their external APIs, but they're also often invisible and inaccessible. It's hard to examine all outgoing requests, simulate unusual responses &amp; errors in a running system, or mock dependencies during manual testing &amp; prototyping.</p> <p>Over the last couple of weeks, I've built a Java agent which can do this, completely automatically. It can seize control of all HTTP &amp; HTTPS requests in any JVM, either at startup or attaching later, to redirect them to a proxy and trust that proxy to decrypt all HTTPS, allowing MitM of all JVM traffic. Zero code changes or manual configuration required.</p> <p>This means you can pick any JVM process - your own locally running service, Gradle, Intellij, anything you like - and inspect, breakpoint, and mock all of its HTTP(S) requests in 2 seconds flat.</p> <p>In this article, I want to walk you through the details of how this is possible, so you can understand some of the secret powers of the JVM, learn how to transform raw bytecode for yourself, and build on the examples and <a href="https://github.com/httptoolkit/jvm-http-proxy-agent">source code</a> behind this to build your own debugging &amp; instrumentation tools.</p> <p><strong>If you just want to try this out right now, <a href="https://httptoolkit.com/java/">go download HTTP Toolkit</a></strong>.</p> <p>If you want to know how on earth this is possible, and how you can write code that does the same, read on:</p> <h2 id="whatsgoingonhere">What's going on here?</h2> <p>In some ways, intercepting all HTTP(S) should be easy: the JVM has standard HTTP proxy and SSL context configuration settings (e.g. 
<code>-Dhttp.proxyHost</code> and <code>-Djavax.net.ssl.trustStore</code>) so you could try to configure this externally by setting those options at startup.</p> <p>Unfortunately for you, that doesn't work. Most modern libraries ignore these settings by default, opting to provide their own defaults and configuration interfaces. Even when the library doesn't, many applications define their own connection &amp; TLS configuration explicitly. This is often convenient and sensible in general, but very inconvenient later when you want to start debugging and manually testing your HTTP interactions.</p> <p>Instead of setting config values at startup that nobody uses, we can capture HTTP by force, using a Java agent. Java agents allow us to hook into a JVM process from the outside, to run our own code and rewrite existing bytecode.</p> <p>When our agent is attached to the JVM (either at startup before everything loads, or later on) we match against specific classes used within built-in packages and a long list of popular external libraries, looking for everything from TLS configuration state to connection pool logic, and we inject a few small changes throughout. This lets us change defaults, ignore custom settings, recreate existing connections, and reconfigure all HTTP(S) to be intercepted by our HTTPS-intercepting proxy.</p> <p>This is really cool! From outside a JVM process, we can use this to reliably rewrite arbitrary bytecode to change how all HTTP in a codebase works, and take control of the entire thing ourselves.
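</p> <p>The hook that makes this possible is the JVM's standard <code>java.lang.instrument</code> API: the agent is handed an <code>Instrumentation</code> instance, and registers transformers which are offered the raw bytecode of every class as it loads. A minimal skeleton (the class name here is illustrative, and this version rewrites nothing) looks like:</p>

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class MinimalAgent {
    // The transformer is offered the raw bytecode of every class as it loads:
    static final ClassFileTransformer TRANSFORMER = new ClassFileTransformer() {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // Return rewritten bytecode here, or null to leave the class unchanged
            return null;
        }
    };

    // Called by the JVM before the application's main(), when attached at startup:
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(TRANSFORMER);
    }

    // Called instead when the agent is attached to an already-running JVM:
    public static void agentmain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(TRANSFORMER);
    }
}
```

<p>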
It's aspect-oriented programming on steroids, and it's surprisingly easy to do.</p> <p>Let's talk about the details.</p> <h2 id="whatsajavaagent">What's a Java agent?</h2> <p>A Java agent is a special type of JAR file, which can attach to other JVM processes, and is given extra powers by the JVM to transform and instrument bytecode.</p> <p>They're widely used by JVM tooling, for everything from <a href="https://docs.newrelic.com/docs/agents/java-agent/">application monitoring with New Relic</a> to <a href="https://github.com/hcoles/pitest/blob/master/pitest/src/main/java/org/pitest/boot/HotSwapAgent.java">mutation testing with PiTest</a>.</p> <p>Despite the name, they're <em>not</em> Java-only; they work for anything that runs on the JVM.</p> <p>There are two ways to use a Java agent. You can either attach it at startup, like so:</p> <pre><code>java -javaagent:&lt;agent-path.jar&gt;=&lt;agent args&gt; -jar &lt;your-jar.jar&gt;
</code></pre> <p>or you can attach it later dynamically, like so:</p> <pre><code class="java language-java">// Using com.sun.tools.attach.VirtualMachine:
VirtualMachine vm = VirtualMachine.attach(pid);
vm.loadAgent(jarPath, agentArgs);
vm.detach();
</code></pre> <p>The agent can have two separate entry points in its JAR manifest to manage this: one for attachment at startup, and one for attachment later. There are also JAR manifest attributes that opt into transformation of bytecode. Configuring that for a JAR built by Gradle looks like this:</p> <pre><code class="groovy language-groovy">jar {
    manifest {
        // A class which defines a static void premain(String args, Instrumentation inst)
        // method, which will run before the main() of the primary JAR:
        attributes 'Premain-Class': 'tech.httptoolkit.javaagent.HttpProxyAgent'

        // A class (can be the same) which defines a similar 'agentmain' method, which will
        // run within the target JVM once the agent is attached:
        attributes 'Agent-Class': 'tech.httptoolkit.javaagent.HttpProxyAgent'

        // Can this agent do transformations, which receive class bytecode and transform it?
        attributes 'Can-Retransform-Classes': 'true'

        // Can this agent redefine classes entirely? This is an older API that's strictly
        // more limited, but you might as well take all the powers you can get...
        attributes 'Can-Redefine-Classes': 'true'
    }
}
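For reference, with a config along these lines the resulting META-INF/MANIFEST.MF inside the agent JAR ends up containing entries like this:

```
Premain-Class: tech.httptoolkit.javaagent.HttpProxyAgent
Agent-Class: tech.httptoolkit.javaagent.HttpProxyAgent
Can-Retransform-Classes: true
Can-Redefine-Classes: true
```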
</code></pre> <p>Lastly, you have an agent class that implements these methods. Like so:</p> <pre><code class="java language-java">class AgentMain {
    public static void premain(String agentArgs, Instrumentation inst) {
        System.out.println("Agent attached at startup");
    }

    public static void agentmain(String agentArgs, Instrumentation inst) {
        System.out.println("Agent attached to running VM");
    }
}
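To give a feel for what that Instrumentation handle lets you do, here's a minimal (hypothetical) transformer sketch: a no-op ClassFileTransformer that just logs every class as it loads, using only the JDK's java.lang.instrument API:

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

class LoggingTransformer implements ClassFileTransformer {
    // (Raw 'Class' parameter type used here just to keep the snippet brief)
    public byte[] transform(ClassLoader loader, String className,
            Class classBeingRedefined, ProtectionDomain pd, byte[] classfileBuffer) {
        // We receive the raw bytecode of every class as it loads (and again on
        // retransformation); returning null means "leave this class unchanged":
        System.out.println("Loading: " + className);
        return null;
    }
}

// From premain/agentmain, an agent would register it with:
//   inst.addTransformer(new LoggingTransformer(), true);
// where 'true' makes it eligible for retransforming already-loaded classes.
```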
</code></pre> <p>That <a href="https://docs.oracle.com/en/java/javase/14/docs/api/java.instrument/java/lang/instrument/Instrumentation.html">Instrumentation class</a> we're given here provides us with methods like <code>addTransformer</code> and <code>redefineClasses</code> which we can use to read and overwrite the raw bytecode of any class in the VM.</p> <p>HTTP Toolkit includes <a href="https://github.com/httptoolkit/jvm-http-proxy-agent">an agent JAR</a> built from all the above, which allows it to attach to any JVM application, run code within that application (to set defaults and configuration values using normal APIs, where possible) and to transform and hook internals of all HTTP-related classes we care about.</p> <p>The agent setup is just the first step though: this gives us almost complete power to change what the target application is doing, but working out how to transform classes is complicated, there are some limitations to our transformations, and handling raw bytecode isn't easy…</p> <h2 id="howdoyoutransformrawbytecode">How do you transform raw bytecode?</h2> <p>In short: using <a href="https://bytebuddy.net/#/">Byte Buddy</a>.</p> <p>This is a complex library, which can do a lot of powerful things with bytecode including generating subclasses and interface implementations dynamically at runtime (e.g. for mocking frameworks), manually mutating classes and methods, and transforming bytecode automatically through templates.</p> <p>In agent cases like HTTP Toolkit's, we're interested in the template approach, because there is a Java agent limitation: when reloading already loaded classes, the new definition must match the same class schema. 
That means we can add new logic into existing method bodies, but we can't create new methods or fields on existing classes, or make changes to existing method signatures.</p> <p>To handle this, Byte Buddy's built-in 'advice' system defines method transformation templates, which it can apply for us whilst guaranteeing that the schema is never changed in any other way.</p> <p>First, we need to set up Byte Buddy. This configuration seems to work nicely:</p> <pre><code class="java language-java">var agentBuilder = new AgentBuilder.Default(
    // This allows you to transform non-Java classes, e.g. Kotlin (used in OkHttp)
    new ByteBuddy().with(TypeValidation.DISABLED)
)
// Transform *everything* including some of the JVM's own built-in classes:
.ignore(none())
// Enable full transformation without class changes:
.with(AgentBuilder.TypeStrategy.Default.REDEFINE)
.with(AgentBuilder.RedefinitionStrategy.RETRANSFORMATION)
.disableClassFormatChanges()
// Log as we go (can be noisy - try withErrorsOnly/withTransformationsOnly if so):
.with(Listener.StreamWriting.toSystemOut());
</code></pre> <p>Then, we define an Advice class which will transform our target. Advice classes look something like this:</p> <pre><code class="java language-java">public class ReturnProxyAdvice {
    @Advice.OnMethodExit
    public static void proxy(@Advice.Return(readOnly = false) Proxy returnValue) {
        returnValue = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(
                HttpProxyAgent.getAgentProxyHost(),
                HttpProxyAgent.getAgentProxyPort()
        ));
    }
}
</code></pre> <p>This says "at the end of the targeted method body, insert extra logic which replaces the return value with [our proxy value]".</p> <p>The code here is effectively injected into the end of the method body (because of <code>Advice.OnMethodExit</code>), and annotations can be used on method parameters (like <code>@Advice.Return</code>) to link variables in this template method to method arguments, field values, <code>this</code>, or return values in the existing method body.</p> <p>To tie this all together, we have to tell Byte Buddy when to apply this advice, like so:</p> <pre><code class="java language-java">agentBuilder = agentBuilder
    .type(
        // Match the class we're interested in:
        named("com.squareup.okhttp.OkHttpClient")
    ).transform(
        // Provide a transformer that transforms that class:
        new AgentBuilder.Transformer() {
            @Override
            public DynamicType.Builder transform(
                DynamicType.Builder builder,
                TypeDescription typeDescription,
                ClassLoader classloader
            ) {
                // Map the advice class to a method (in this case: OkHttpClient.getProxy())
                return builder
                    .visit(Advice.to(ReturnProxyAdvice.class)
                        .on(hasMethodName("getProxy")));
            }
        });
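One easy-to-miss final step: none of these matchers and transformers take effect until the configured builder is installed on the Instrumentation instance that premain/agentmain receives. Assuming the agentBuilder built up above, that's a single call (a fragment, not runnable standalone):

```java
// Apply all the registered matchers and transformers to the target JVM. Because
// RedefinitionStrategy.RETRANSFORMATION was set during setup, this also
// retransforms matching classes that were loaded before the agent attached:
agentBuilder.installOn(instrumentation);
```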
</code></pre> <p>Byte Buddy uses this fluent API to build maps from type matchers (like <code>named</code> here) to type transformers, and then build transformations that apply specific advice templates to methods matching certain patterns (e.g. <code>hasMethodName("getProxy")</code>).</p> <p>The above code is effectively the real implementation logic we use to intercept OkHttp: for all OkHttpClient instances, even ones that are already instantiated when we attach, we override <code>getProxy()</code> so it always returns our proxy configuration, regardless of its previous configuration. This ensures that all new connections from all OkHttp clients go to our proxy.</p> <p>This is just part of one simple case though (the full OkHttp logic is <a href="https://github.com/httptoolkit/jvm-http-proxy-agent/blob/main/src/main/kotlin/tech/httptoolkit/javaagent/OkHttpClientTransformers.kt">here</a>) and doing this for <em>all</em> HTTP is significantly more involved…</p> <h2 id="whattransformationsallowyoutocaptureallhttps">What transformations allow you to capture all HTTPS?</h2> <p>With the above, we can build a Java agent that can attach to a JVM target and easily apply arbitrary transformations to method bodies.</p> <p>Usefully intercepting HTTP(S) still requires us to find the method bodies we care about though, and work out how to transform them.</p> <p>In practice, there are three steps to transforming any target library to intercept HTTPS:</p> <ul> <li>Redirect new connections to go via the HTTP Toolkit proxy server</li> <li>Trust the HTTP Toolkit certificate during HTTPS connection setup</li> <li>Reset/stop using any open non-proxied connections when attaching to already running applications</li> </ul> <p>I'm not going to walk through the detailed implementation of that for every version of every supported library (if you're interested, feel free to <a href="https://github.com/httptoolkit/jvm-http-proxy-agent">explore the full source</a>) but let's look at a couple of 
illustrative examples.</p> <p>Some of this logic is written in Kotlin, and it uses a few helpers on top of the above, but if you've read the above and you understand Java you'll get the gist:</p> <h3 id="interceptingapachehttpclient">Intercepting Apache HttpClient:</h3> <p>Apache HttpClient is part of their <a href="https://hc.apache.org/index.html">HttpComponents project</a>, a successor to the venerable Commons HttpClient library.</p> <p>It's been around for a long time in various forms, it's very widely used, and fortunately it's very easy to intercept.</p> <p>For v5, for example, all outgoing traffic runs through an implementation of the <code>HttpRoutePlanner</code> interface, which decides where requests should be sent.</p> <p>We just need to change the return value for all implementations of that interface:</p> <pre><code class="java language-java">// First, we create an advice class that modifies the existing return value of this method:
public class ApacheV5ReturnProxyRouteAdvice {
    @Advice.OnMethodExit
    public static void determineRoute(
            @Advice.Return(readOnly = false) HttpRoute returnValue
    ) {
        returnValue = new HttpRoute(
            returnValue.getTargetHost(),
            returnValue.getLocalAddress(),
            new HttpHost(
                HttpProxyAgent.getAgentProxyHost(),
                HttpProxyAgent.getAgentProxyPort()
            ),
            returnValue.isSecure()
        );
    }
}

// Then, elsewhere, we apply that to all implementations that plan routes:
class ApacheClientRoutingV5Transformer(logger: TransformationLogger) : MatchingAgentTransformer(logger) {
    override fun register(builder: AgentBuilder): AgentBuilder {
        // Match all concrete implementations of a given type:
        return builder.type(
            hasSuperType(named("org.apache.hc.client5.http.routing.HttpRoutePlanner"))
        ).and(
            not(isInterface())
        ).transform(this)
    }

    override fun transform(builder: DynamicType.Builder&lt;*&gt;): DynamicType.Builder&lt;*&gt; {
        // Match the method defined in the interface, and apply the above Advice:
        return builder.visit(
            Advice.to(ApacheV5ReturnProxyRouteAdvice::class.java)
                .on(hasMethodName("determineRoute"))
        )
    }
}
</code></pre> <p>With that alone, we've redirected all traffic elsewhere.</p> <p>Meanwhile resetting all SSL connections requires <a href="https://github.com/httptoolkit/jvm-http-proxy-agent/blob/d5b59627a3a57add84f4ca192ed44552ec429c77/src/main/kotlin/tech/httptoolkit/javaagent/ApacheClientTransformers.kt#L47-L69">prepending to SSL socket creation</a> to change the SSL configuration.</p> <p>As a nice bonus, the above <code>HttpRoutePlanner</code> approach means we don't even need to reset connections: request routes no longer match existing open connections, so requests immediately stop using those connections, start using our proxy instead, and the existing connections harmlessly time out.</p> <h3 id="interceptingjavasbuiltinproxyselector">Intercepting Java's built-in ProxySelector:</h3> <p>Let's try something more difficult: can we rewrite a built-in Java class? Yes we can.</p> <p>When our agent first attaches, it <a href="https://github.com/httptoolkit/jvm-http-proxy-agent/blob/d5b59627a3a57add84f4ca192ed44552ec429c77/src/main/kotlin/tech/httptoolkit/javaagent/AgentMain.kt#L166-L168">changes the default ProxySelector</a> using the normal public APIs, so that any code using Java's default proxy selector automatically uses our proxy with no transformation required.</p> <p>Unfortunately though, some applications manually manage proxy selectors, and this could result in HTTP not being intercepted.</p> <p>To fix this, we set the proxy selector using the normal <code>ProxySelector.setDefault()</code> API during agent setup, and then later we transform the built-in class to disable that setter completely, so nobody else can change it.</p> <p>That looks like this:</p> <pre><code class="java language-java">// First, we define an advice that tells Byte Buddy to skip a method body entirely:
public class SkipMethodAdvice {
    // This will run before the method, and will skip the real body if we return true
    @Advice.OnMethodEnter(skipOn = Advice.OnNonDefaultValue.class)
    public static boolean skipMethod() {
        // Then we just return true to trigger the skip:
        return true;
    }
}

// Second, we apply the advice template:
class ProxySelectorTransformer(logger: TransformationLogger): MatchingAgentTransformer(logger) {
    override fun register(builder: AgentBuilder): AgentBuilder {
        return builder
            // Match the built-in ProxySelector class:
            .type(
                named("java.net.ProxySelector")
            ).transform(this)
    }

    override fun transform(builder: DynamicType.Builder&lt;*&gt;): DynamicType.Builder&lt;*&gt; {
        return builder
            // Transform the static "setDefault" method with our advice:
            .visit(
                Advice.to(SkipMethodAdvice::class.java)
                    .on(hasMethodName("setDefault")));
    }
}
</code></pre> <p>Transforming built-in classes does come with some caveats, e.g. you need to set <code>.ignore(none())</code> during Byte Buddy setup (see the example above) and you can't reference any non-built-in types within your advice class. For simple changes like this though, that's no big problem.</p> <h3 id="interceptingspringwebclienthttp">Intercepting Spring WebClient HTTP:</h3> <p>Ok, last example, let's see a more complicated case. How does Spring's WebClient work?</p> <p>Spring WebClient is a relatively new client on the block - it's a reactive client released as part of Spring 5, offering a Spring-integrated API built over the top of Reactor-Netty by default (but configurable to use other engines too).</p> <p>I suspect the vast majority of users use the default Reactor-Netty engine, and if they don't then they use an engine that's already intercepted by another one of our configurations. That means we just need to intercept Reactor-Netty, and we'll capture all Spring WebClient traffic ready for debugging.</p> <p>Extremely helpfully, Reactor Netty stores all the state we care about (both proxy &amp; SSL context) in one place: the HttpClientConfig class. We need to reset that internal state somehow for all instances, but it's not conveniently exposed in the public APIs…</p> <p>Even more helpfully though, their HttpClient class is cloned during each request, passing the config to the request's client, giving us the perfect hook to grab the config and modify it before every request.</p> <p>That looks like this:</p> <pre><code class="java language-java">// First an advice class to reset all config. More complicated this time!
public class ReactorNettyResetAllConfigAdvice {

    // We statically create a proxy provider, for our target proxy:
    public static final ProxyProvider agentProxyProvider = ProxyProvider.builder()
        .type(ProxyProvider.Proxy.HTTP)
        .address(new InetSocketAddress(
            HttpProxyAgent.getAgentProxyHost(),
            HttpProxyAgent.getAgentProxyPort()
        ))
        .build();

    // We also create an SSL provider that trusts our certificate:
    public static final SslProvider agentSslProvider;

    // And we store references to the relevant private fields using reflection, to
    // avoid the overhead of doing this on every request:
    public static final Field configSslField;
    public static final Field proxyProviderField;

    static {
        try {
            // Initialize our intercepted SSL provider:
            agentSslProvider = SslProvider.builder()
                .sslContext(
                    SslContextBuilder
                    .forClient()
                    .trustManager(HttpProxyAgent.getInterceptedTrustManagerFactory())
                    .build()
                ).build();

            // Look up the private fields we want to mess with in the client config:
            configSslField = HttpClientConfig.class.getDeclaredField("sslProvider");
            configSslField.setAccessible(true);

            proxyProviderField = ClientTransportConfig.class.getDeclaredField("proxyProvider");
            proxyProviderField.setAccessible(true);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Then we hook the HttpClient constructor, so that before it runs, we
    // grab the first argument (the config) and overwrite these private fields:
    @Advice.OnMethodEnter
    public static void beforeConstructor(
        @Advice.Argument(value=0) HttpClientConfig baseHttpConfig
    ) throws Exception {
        configSslField.set(baseHttpConfig, agentSslProvider);
        proxyProviderField.set(baseHttpConfig, agentProxyProvider);
    }
}

// Using that, in the agent logic we match the constructor and apply this advice:
class ReactorNettyClientConfigTransformer(logger: TransformationLogger): MatchingAgentTransformer(logger) {

    override fun register(builder: AgentBuilder): AgentBuilder {
        // Find all HttpClient instances:
        return builder
            .type(
                hasSuperType(named("reactor.netty.http.client.HttpClient"))
            ).and(
                not(isInterface())
            ).transform(this)
    }

    override fun transform(builder: DynamicType.Builder&lt;*&gt;): DynamicType.Builder&lt;*&gt; {
        // Apply our advice to the matching constructor:
        return builder
            .visit(
                Advice.to(ReactorNettyResetAllConfigAdvice::class.java)
                    .on(isConstructor&lt;MethodDescription&gt;()
                        .and(takesArguments(1))
                        .and(takesArgument(0,
                            named("reactor.netty.http.client.HttpClientConfig")
                        )))
            )
    }
}
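The core trick in that advice class, caching a private Field once up front and then rewriting instances' private state on demand, is plain java.lang.reflect machinery, and it works standalone too. A toy version (all names here are mine, not Reactor Netty's):

```java
import java.lang.reflect.Field;

class ToyConfig {
    private String proxyHost = "direct"; // Stands in for HttpClientConfig's private state
    public String getProxyHost() { return proxyHost; }
}

class ConfigOverrider {
    static final Field PROXY_FIELD;
    static {
        try {
            // Look the field up once, and bypass the 'private' modifier:
            PROXY_FIELD = ToyConfig.class.getDeclaredField("proxyHost");
            PROXY_FIELD.setAccessible(true);
        } catch (NoSuchFieldException e) {
            throw new RuntimeException(e);
        }
    }

    static void redirect(ToyConfig config, String newProxyHost) {
        try {
            // Overwrite the private field on this instance, just like the advice
            // above overwrites sslProvider and proxyProvider before each request:
            PROXY_FIELD.set(config, newProxyHost);
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
    }
}
```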
</code></pre> <hr> <p>Isn't this fun?</p> <p>Ok, I'm fully expecting that while half the people who've read this far may be fascinated, the other half will be horrified.</p> <p>We are elbow-deep in library internals here, and unrepentantly so.</p> <p>This does have some caveats: it's quite possible that library changes could break this, or that some transformations could cause side effects. I wouldn't recommend doing this in production without significantly more careful transformation &amp; testing, but for local development and testing the risk is low, and this works like a charm.</p> <p>In practice, I suspect the fragility issues will be small. The code we're transforming is the low-level internals of connection setup, which changes relatively infrequently. Some git-blaming of the repos of various targets here suggests that in most cases this logic has barely changed since v1, or changes only marginally every 5 years or so, and updating this logic when there are changes is not a huge task. In addition, while new libraries will come out too, most of them build on top of these existing engines, so we can support them for free!</p> <p>This kind of power is little-known and underused in much of the JVM community, and I'm really excited to see how you use it! <strong><a href="https://httptoolkit.com/java/">Test this out now in HTTP Toolkit</a></strong>, try building your own Java agents, and get in touch <a href="https://twitter.com/pimterry">on Twitter</a> if you have any thoughts or questions.</p>]]></description>
            <link>https://httptoolkit.com/blog/how-to-intercept-debug-java-http/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/how-to-intercept-debug-java-http/</guid>
            <pubDate>Wed, 17 Mar 2021 18:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[HTTPWTF]]></title>
            <description><![CDATA[<p>HTTP is fundamental to modern development, from frontend to backend to mobile. But like any widespread mature standard, it's got some funky skeletons in the closet.</p> <p>Some of these skeletons are little-known but genuinely useful features, some of them are legacy oddities relied on by billions of connections daily, and some of them really shouldn't exist at all. Let's look behind the curtain:</p> <h2 id="nocachemeansdocache">No-cache means "do cache"</h2> <p>Caching has never been easy, but HTTP cache headers can be particularly confusing. The worst examples of this are <code>no-cache</code> and <code>private</code>. What does the below response header do?</p> <pre><code>Cache-Control: private, no-cache
</code></pre> <p>It looks like this means "don't store this response anywhere", right?</p> <p><em>Hahaha</em> no.</p> <p>In reality, this means "please store this response in all browser caches, but revalidate it when using it". In fact, this makes responses <em>more</em> cacheable, because this applies even to responses that wouldn't normally be cacheable by default.</p> <p>Specifically, <code>no-cache</code> means that your content is explicitly cacheable, but whenever a browser or CDN wants to use it, they should send a request using <code>If-None-Match</code> or <code>If-Modified-Since</code> to ask the server whether the cache is still up to date first. Meanwhile <code>private</code> means that this content is cacheable, but only in end-client browsers, not CDNs or proxies.</p> <p>If you were trying to disable caching because the response contains security or privacy sensitive data that shouldn't be stored elsewhere, you're now in big trouble. In reality, you probably wanted <code>no-store</code>.</p> <p>If you send a response including a <code>Cache-Control: no-store</code> header, nobody will ever cache the response, and it'll come fresh from the server every time. The only edge case is if you send that when a client already has a cached response, which this won't remove. If you want to do that and clear existing caches too, add <code>max-age=0</code>.</p> <p>Twitter notably <a href="https://hacks.mozilla.org/2020/04/twitter-direct-message-caching-and-firefox/">hit this issue</a>. They used <code>Pragma: no-cache</code> (a legacy version of the same header) when they should have used <code>Cache-Control: no-store</code>, and accidentally persisted every user's private direct messages in their browser caches. That's not a big problem on your own computer, but if you share a computer or you use Twitter on a public computer somewhere, you've now left all your private messages conveniently unencrypted &amp; readable on the hard drive. 
Oops.</p> <h2 id="httptrailers">HTTP Trailers</h2> <p>You're probably aware of HTTP headers. An HTTP message starts with a first line that contains the method &amp; URL (for requests) or status code &amp; message (for responses) and then it has a series of key/value pairs for metadata, called headers, and then it has a body.</p> <p>Did you know you can also send trailers, to append metadata <em>after</em> a message body?</p> <p>These are not widely used, but they're fully standardized and in theory everything should support them, or at least ignore them. They can be useful if you have metadata that isn't easily available initially, and you don't want to wait for it before you send the body.</p> <p>They are used in some API protocols like gRPC, and they're primarily valuable for metadata about the overall response itself, for example you can use trailers to <a href="https://www.fastly.com/blog/supercharging-server-timing-http-trailers">include Server-Timing metadata</a> to give the client performance metrics about server processing during a request, appended after the response is fully completed. They're especially useful for long responses, e.g. to include final status metadata after a long-running HTTP stream.</p> <p>It's still rare that you'll need this, but it's pretty cool that it works when you do. 
There's a few requirements:</p> <ul> <li>For server response trailers, the client must advertise support for this, with a <code>TE: trailers</code> header on the initial request.</li> <li>The initial headers should specify the trailer fields that will be used later, with <code>Trailer: &lt;field names&gt;</code>.</li> <li>Some headers are never allowed as trailers, including <code>Content-Length</code>, <code>Cache-Control</code>, <code>Authorization</code>, <code>Host</code> and similar standard headers, which are often required initially to parse, authenticate or route requests.</li> </ul> <p>To send trailers in HTTP/1.1, you'll also need to use chunked encoding. HTTP/2 meanwhile uses separate frames for the body &amp; headers, so this isn't necessary.</p> <p>A full HTTP/1.1 response with trailers might look like this:</p> <pre><code>HTTP/1.1 200 OK
Transfer-Encoding: chunked
Trailer: My-Trailer-Field

[...chunked response body...]
0
My-Trailer-Field: some-extra-metadata
</code></pre> <h2 id="http1xxcodes">HTTP 1XX codes</h2> <p>Did you know that an HTTP request can receive multiple response status codes? A server can send an unlimited number of 1XX codes before a final status (200, 404, or whatever it may be). These act as interim responses, and can all include their own independent headers.</p> <p>There's a few different 1XX codes available: 100, 101, 102, and 103. They're not widely used, but in some niche use cases they have some cool powers:</p> <h3 id="http100">HTTP 100</h3> <p>HTTP 100 is a response from a server that the request is ok <em>so far</em>, and the client should keep going.</p> <p>Most of the time, this is a no-op. If you've started sending a request, you were probably going to keep going anyway, although it's always nice to have the server's support &amp; encouragement.</p> <p>This becomes useful though if you send a request including a <code>Expect: 100-continue</code> header. That header tells the server you expect a 100 response, and you're not going to send the full request body until you receive it.</p> <p>Sending <code>Expect: 100-continue</code> allows the server to decide if it wants to receive the whole body, which might take a lot of time/bandwidth. If the URL &amp; headers are enough for it to already send a response (e.g. to reject a file upload) this is a quick and efficient way to do that. If the server does want to receive the full body, it sends an interim 100 response, the client continues, and then the server handles the complete request as normal when it's done.</p> <h3 id="http101">HTTP 101</h3> <p>HTTP 101 is used to switch protocols. It says "I've sent you a URL and headers, and now I want to do something <em>completely different</em> with this connection". Not just a different request, but different protocol entirely.</p> <p>The main use case is to set up a websocket. To do so, the client sends a request including these two headers:</p> <pre><code>Connection: upgrade
Upgrade: websocket
</code></pre> <p>Then, if the server accepts, it sends a response like:</p> <pre><code>HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
</code></pre> <p>And then from there they stop speaking HTTP, and start exchanging raw websocket data on this connection instead.</p> <p>This status is also used to upgrade from HTTP/1.1 to HTTP/2 on the same connection, and you could use it to transform HTTP connections into all sorts of other TCP-based protocols too.</p> <p>That said, this status <em>isn't</em> supported in HTTP/2, which uses a different mechanism for protocol negotiation and a <a href="https://tools.ietf.org/html/rfc8441">totally different mechanism</a> to set up websockets (which basically isn't supported anywhere - websockets are always HTTP/1.1 right now).</p> <h3 id="http102">HTTP 102</h3> <p>HTTP 102 tells the client that the server is still processing the request, and it'll respond <em>soon</em>. This differs from 100 in that the whole request has now been received, and all the action is now happening on the server side, with the client just waiting.</p> <p>This isn't much used as far as I can tell, and it seems to mainly exist as a keep-alive, to make sure the client doesn't think the server has simply died. It's in the original HTTP specifications, but it's been removed from many new editions.</p> <p>Still, it is supported &amp; used in real places in the wild, so it's quite possible to use it in your applications if it fits your needs.</p> <h3 id="http103">HTTP 103</h3> <p>HTTP 103 meanwhile is a new &amp; trendy status intended to partially replace HTTP/2's server push functionality (which is now <a href="https://groups.google.com/a/chromium.org/g/blink-dev/c/K3rYLvmQUBY/m/vOWBKZGoAQAJ?pli=1">being removed from Chrome</a>).</p> <p>Using HTTP 103, a server can send some headers early, before fully handling the request and sending the rest of the response. 
This is primarily designed for delivering link headers, like <code>Link: &lt;/style.css&gt;; rel=preload; as=style</code>, telling the client about other content that it may want to start loading early (like stylesheets, JS &amp; images, for web page requests) in parallel with the full response.</p> <p>When the server receives a request that takes a little processing, it often can't fully send the response headers until that processing completes. HTTP 103 allows the server to immediately nudge the client to download other content in parallel, without waiting for the requested resource data to be ready.</p> <h2 id="referer">Referer</h2> <p>The HTTP Referer header tells the server which page you came from previously, or which URL triggered a resource load. This has some privacy challenges, but it's stuck around, and it's sent in most requests made as you browse the internet.</p> <p>Notably, it's spelled wrong. This was added in the very early days of the web, and the unix spell checker at the time didn't recognize either referer or referrer (the correct spelling). By the time anybody noticed, it was in serious use in infrastructure and tools all over the place, so nothing could be changed and we have to live with every browser request having a misspelled header forever.</p> <p>Not especially important unless you're writing code to read this header yourself, but a great parable for the challenges of network compatibility.</p> <p>For maximum confusion and damage potential, new privacy/security headers related to this like <code>Referrer-Policy</code> <em>do</em> use the correct spelling.</p> <h2 id="websocketsrandomuuid">Websocket's 'random' UUID</h2> <figure> <img alt="XKCD's getRandom() comic" src="https://imgs.xkcd.com/comics/random_number.png"> <figcaption>There's always <a href="https://xkcd.com/221/">a relevant XKCD</a></figcaption> </figure> <p>We talked about how HTTP 101 requests are used to set up websockets earlier. 
A full request to do so might look like this:</p> <pre><code>GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
Origin: http://example.com
</code></pre> <p>with a response that starts the websocket connection like this:</p> <pre><code>HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
Sec-WebSocket-Protocol: chat
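</code></pre> <p>That <code>Sec-WebSocket-Accept</code> value isn't arbitrary: the server derives it from the client's <code>Sec-WebSocket-Key</code>. As a quick illustrative sketch (the function name here is mine, not part of any API), the derivation in Node.js looks like this:</p> <pre><code class="javascript language-javascript">const crypto = require('crypto');

// The single fixed magic UUID baked into every websocket implementation:
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

function webSocketAccept(clientKey) {
    // Append the UUID to the base64 key, SHA-1 the result,
    // and base64-encode the hash:
    return crypto.createHash('sha1')
        .update(clientKey + WEBSOCKET_GUID)
        .digest('base64');
}

console.log(webSocketAccept('x3JJHMbDL1EzLkh9GBhXDw=='));
// Prints 'HSmrc0sMlYUkAGmm5OPpG2HaGWk=' - the value in the response above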
</code></pre> <p>The <code>Sec-WebSocket-Accept</code> key here is interesting. This is designed to stop caching proxies accidentally reusing websocket responses that they don't understand, by requiring the response to include a header that matches the client's header. Specifically:</p> <ul> <li>The server receives a base64 websocket key from the client</li> <li>The server appends the UUID <code>258EAFA5-E914-47DA-95CA-C5AB0DC85B11</code> to the base64 string</li> <li>The server hashes the resulting string with SHA-1, encodes the hash in base64, and sends that back</li> </ul> <p>This is deeply weird. A single fixed random UUID that's used in the setup of every single websocket forever? Appending strings to base64 strings without decoding, and then base64-ing the result again too?</p> <p>The idea is that this logic isn't something that could happen by accident, or something that could ever be used elsewhere, to guarantee that both parties are intentionally starting a websocket connection. This confirms that the server or proxy isn't using cached data without understanding it, and the client hasn't been tricked into opening a websocket connection that it doesn't understand.</p> <p>This totally works: it's widely used and quick &amp; easy to implement, which is all great, but it's wild that every websocket connection in the world relies on one magic UUID.</p> <h2 id="websocketscors">Websockets &amp; CORS</h2> <p>While we're talking about websockets: did you know that websockets effectively ignore all the CORS and same-origin policy restrictions that would normally apply to HTTP requests?</p> <p>CORS ensures that JavaScript running on a.com can't read data from b.com unless the latter explicitly opts into that in its response headers.</p> <p>This is important for lots of reasons, notably including network-local servers (a public web page shouldn't be able to talk to your router) and browser state (requests from one domain shouldn't be able to use cookies from another).</p> 
<p>Unfortunately though, websockets ignore CORS entirely, assuming instead that all websocket servers are modern &amp; sensible enough to correctly check the <code>Origin</code> header for themselves. Many servers do not, and most developers I've mentioned this to weren't aware of it.</p> <p>This opens a whole world of fun vulnerabilities, nicely summarized in <a href="https://christian-schneider.net/CrossSiteWebSocketHijacking.html">this article</a>.</p> <p>In short: if you have a websocket API, check the <code>Origin</code> header and/or use CSRF tokens before trusting any incoming connections.</p> <h2 id="xheaders">X-* headers</h2> <p>Once upon a time (1982) <a href="https://tools.ietf.org/html/rfc822#section-4.7.4">an RFC</a> suggested that using an <code>X-</code> prefix for message headers was a good way to differentiate custom extensions from standardized names.</p> <p>At the time this was relevant to email metadata, but this was later popularized for usage in HTTP headers too.</p> <p>This is still a common pattern, and if you look at HTTP requests as you browse the web you'll see quite a few of these:</p> <ul> <li><code>X-Shenanigans: none</code> - this appears on every response from Twilio's API. 
I have no idea why, but it is comforting to know there's <em>definitely</em> no shenanigans this time round.</li> <li><code>X-Clacks-Overhead: GNU Terry Pratchett</code> - a <a href="https://xclacksoverhead.org/home/about">tribute</a> to Terry Pratchett, based on the message protocols within his own books.</li> <li><code>X-Requested-With: XMLHttpRequest</code> - appended by various JS frameworks including jQuery, to clearly differentiate AJAX requests from resource requests (which can't include custom headers like this).</li> <li><code>X-Recruiting: &lt;cheesy pitch to get you to apply for a job&gt;</code> - quite a few companies add these as a quick way to try and hire the kind of people who read HTTP headers for fun.</li> <li><code>X-Powered-By: &lt;framework&gt;</code> - used to advertise the framework or technology that the server is using (usually a bad idea).</li> <li><code>X-Http-Method-Override</code> - used to set a method that couldn't be set as the real method of the request for some reason, usually a client or networking limitation. Mostly a bad idea nowadays, but still popular &amp; supported by quite a few frameworks.</li> <li><code>X-Forwarded-For: &lt;ip&gt;</code> - a de facto standard used by many proxies &amp; load balancers to include the original requester's IP in upstream requests.</li> </ul> <p>Each of these is weird and wonderful in its own way, but the pattern in general is mostly a bad idea, and a new (2011) <a href="https://tools.ietf.org/html/draft-saintandre-xdash-00">RFC</a> now formally discourages its use.</p> <p>The problem is that many non-standard headers eventually do become standard. 
When that happens, if you used an <code>X-</code> prefix, now you either have to change the name (breaking all existing implementations) or standardize the <code>X-</code> prefix (defeating the point of the prefix entirely, and adding annoying noise to the name forever).</p> <p>This is frustrating, and it's broken some real standards:</p> <ul> <li>Almost all web forms on the internet submit data with an unnecessarily confusing &amp; long-winded <code>Content-Type: application/x-www-form-urlencoded</code> header.</li> <li>In the <a href="https://tools.ietf.org/html/rfc2068#section-3.5">1997 RFC for HTTP</a> where it defines the parsing rules for <code>content-encoding</code>, it requires all implementations to treat <code>x-gzip</code> and <code>x-compress</code> as equivalent to <code>gzip</code> and <code>compress</code> respectively.</li> <li>The <a href="https://tools.ietf.org/html/rfc7034">standardized</a> header for configuring web page framing is now forever <code>X-Frame-Options</code>, not just <code>Frame-Options</code>.</li> <li>Similarly, we have <code>X-Content-Type-Options</code>, <code>X-DNS-Prefetch-Control</code>, <code>X-XSS-Protection</code>, and various <code>X-Forwarded-*</code> CDN/proxy headers, all of which are widely implemented and have become either formally or de facto standard headers in widespread use.</li> </ul> <p>If you want to use a custom header, just use a custom header name that's not standardized by anybody else. If you really want to avoid collisions, consider namespacing it, but you're usually pretty safe if there's no standard header that appears after a 30-second Google.</p> <hr> <p>Standardization is <em>hard</em>, and HTTP is full of weird corners and odd details when you look closely. Let me know what you think on <a href="https://twitter.com/pimterry">Twitter</a>.</p> <p>Interested in inspecting &amp; rewriting HTTP for yourself? 
<strong><a href="https://httptoolkit.com">Try out HTTP Toolkit</a></strong>.</p>]]></description>
            <link>https://httptoolkit.com/blog/http-wtf/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/http-wtf/</guid>
            <pubDate>Thu, 04 Mar 2021 15:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Cache your CORS, for performance & profit]]></title>
            <description><![CDATA[<p>CORS is a necessity for many APIs, but basic configurations can create a huge number of extra requests, slowing down every browser API client, and sending unnecessary traffic to your backend.</p> <p>This can be a problem with a traditional API, but becomes a much larger issue with serverless platforms, where your billing is often directly tied to the number of requests received, so this can easily double your API costs.</p> <p>All of this is unnecessary: it's happening because most of us don't know how caching works for CORS requests. Let's fix that.</p> <h2 id="whatarecorspreflightrequests">What are CORS preflight requests?</h2> <p>Before your browser makes any request that crosses origins (e.g. example.com to api.example.com), unless it's a <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#simple_requests">simple request</a>, the browser sends a preflight request first, and waits for a successful response before it sends the real request.</p> <p>This preflight request is an OPTIONS request to the server, describing the request the browser wants to send, and asking permission first. It looks something like:</p> <pre><code>OPTIONS /v1/documents HTTP/1.1
Host: api.example.com
Origin: https://example.com
Access-Control-Request-Method: PUT
Access-Control-Request-Headers: origin, x-requested-with
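</code></pre> <p>A server that's happy to accept this might respond with something like the following (one typical shape - the exact headers depend on your CORS configuration):</p> <pre><code>HTTP/1.1 204 No Content
Access-Control-Allow-Origin: https://example.com
Access-Control-Allow-Methods: PUT
Access-Control-Allow-Headers: origin, x-requested-with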
</code></pre> <p>The server has to respond with headers that confirm it's happy to accept the request, and the browser will wait to send the real request until this happens.</p> <p>If you want to check exactly how these CORS rules work, and how you should respond, play around with <a href="https://httptoolkit.com/will-it-cors/">Will it CORS?</a> to test out the possibilities.</p> <p>In practice, almost all cross-origin API requests will require these preflight requests, notably including:</p> <ul> <li>Any request with a JSON or XML body</li> <li>Any request including credentials</li> <li>Any request that isn't GET, POST or HEAD</li> <li>Any exchange that streams the request or response body</li> <li>Use of any headers other than <code>Accept</code>, <code>Accept-Language</code>, <code>Content-Language</code> and <code>Content-Type</code></li> </ul> <h2 id="whyisthisbad">Why is this bad?</h2> <p>Each of these requests blocks your real request for at least the round-trip time to your server. OPTIONS requests aren't cacheable by default, so your CDN won't usually handle them, and this will have to hit your server every time.</p> <p>They are cached in clients, but only for 5 seconds by default. If a web page polls your API, making a request every 10 seconds, it'll repeat the preflight check every 10 seconds too.</p> <p>In many cases this effectively doubles the latency of your API for all browser clients. From the end user's point of view, your performance is halved! And as I'm sure you've heard a hundred times, a few hundred milliseconds of delay translates to big differences in conversion rates &amp; user satisfaction. This is pretty bad.</p> <p>In addition, it can add meaningful extra load &amp; cost to your API servers.</p> <p>This applies especially with serverless billing models. 
Platforms including AWS Lambda, Netlify Functions, Cloudflare Workers and Google Cloud Functions all bill based on the number of function invocations, and these preflight requests count towards that like any other. Serverless can be free when you're small, but becomes more expensive once large production systems are in play, and potentially doubling your costs is a huge hit!</p> <p>Even without serverless, this can still catch you out badly. If you expect a large percentage of your API's requests to be handled by your CDN, it can be a major surprise when adding a custom header to browser requests creates an extra request right through to your backend servers for every single client request.</p> <h2 id="howcanyoucachepreflightresponses">How can you cache preflight responses?</h2> <p>There are two levels of caching you should put in place for these:</p> <ul> <li>Caching in the browser, so individual clients don't repeat the same preflight request unnecessarily.</li> <li>Caching in your CDN layer, where possible, to treat these as constant responses so your backend servers/functions don't have to handle them.</li> </ul> <h3 id="corscachingforbrowsers">CORS caching for browsers</h3> <p>To cache CORS responses in browsers, just add this header to your preflight responses:</p> <pre><code>Access-Control-Max-Age: 86400
</code></pre> <p>This is a cache time in seconds.</p> <p>Browsers limit this: Firefox caps the value at 86400 (24 hours) while all Chromium-based browsers cap it at 7200 (2 hours). Making this request once every 2 hours instead of before every API request is still a big improvement in user experience though, and setting the value higher to ensure that even longer lifetimes apply where possible is an easy win.</p> <h3 id="corscachingforcdns">CORS caching for CDNs</h3> <p>To cache CORS responses in CDNs and other proxies between the browser and your API server, add:</p> <pre><code>Cache-Control: public, max-age=86400
Vary: origin
</code></pre> <p>This caches the response in public caches (e.g. CDNs) for 24 hours, which should be enough for most cases without risking cache invalidation becoming a problem. For initial testing, you might want to set the cache time shorter, and increase it once you're happy that everything is set up correctly.</p> <p>It's important to note that this <em>isn't</em> standard (OPTIONS is defined as not cacheable by default) but it does appear to be widely supported by most CDNs, who will happily cache OPTIONS responses that explicitly opt-in like this. Some may require this to be manually enabled, so do test this in your configuration.</p> <p>In the worst case, if this is not supported by your CDN it will just be ignored, so there's no real downside.</p> <p>The <code>Vary</code> header here is important: this tells the cache to use this response only for other requests with the same <code>Origin</code> header (requests from the same cross-origin source), in addition to using the same URL.</p> <p>If you don't set a <code>Vary</code> header, you can have big problems. Preflight responses often include an <code>Access-Control-Allow-Origin</code> header that matches the incoming <code>Origin</code> value. If you cache the response without setting <code>Vary</code> then the response with one origin might be used for a request with a different origin, which will fail the CORS checks and block the request completely.</p> <p>If you're using other CORS response headers that depend on the request, you should include them here too, e.g:</p> <pre><code>Access-Control-Allow-Headers: my-custom-header
Access-Control-Allow-Methods: GET, POST, PUT, DELETE
Vary: Access-Control-Request-Headers, Access-Control-Request-Method
</code></pre> <p>If you want to test any of this out right now, install <strong><a href="https://httptoolkit.com/">HTTP Toolkit</a></strong>, add a rule that matches your requests, launch an intercepted browser, and you can try manually injecting these headers into API responses to see exactly how browsers handle them.</p> <h2 id="configurationexamples">Configuration examples</h2> <p>How do you configure this in your case? There are some helpful ready-to-go examples below. In each case, I'm assuming you already have preflight CORS handling set up, so we're just thinking about how to add caching on top of this.</p> <h3 id="cachingcorswithawslambda">Caching CORS with AWS Lambda</h3> <p>To enable CORS with AWS Lambda, you can either manually return the headers above in your HTTP response, or you can <a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-cors.html">configure API Gateway</a> to handle CORS for you.</p> <p>If you use API Gateway's configuration, this allows you to configure the <code>Access-Control-Max-Age</code> header, but will not set <code>Cache-Control</code> by default, so if you're using CloudFront or another CDN, you should manually configure that and <code>Vary</code> too.</p> <p>Alternatively, you can control this all yourself in a preflight lambda handler, like so:</p> <pre><code class="javascript language-javascript">exports.handler = async (event) =&gt; {
    const response = {
        statusCode: 200,
        headers: {
            // Keep your existing CORS headers:
            "Access-Control-Allow-Origin": event.headers['origin'],
            // ...

            // And add these:
            "Access-Control-Max-Age": 86400,
            "Cache-Control": "public, max-age=86400",
            "Vary": "origin"
        }
    };

    return response;
};
</code></pre> <p>CloudFront specifically includes <a href="https://docs.aws.amazon.com/cloudfront/latest/APIReference/API_CachedMethods.html">separate configuration</a> that enables caching for OPTIONS responses, so you should ensure this is enabled if you're using <code>Cache-Control</code> here.</p> <p>If you're using the <a href="https://www.serverless.com">Serverless framework</a>, you can do this automatically in your <code>serverless.yml</code> instead, for example:</p> <pre><code class="yaml language-yaml">functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: hello
          method: get
          cors:
            origin: '*'
            maxAge: 86400
            cacheControl: 'public, max-age=86400'
</code></pre> <h3 id="cachingcorsinnodejs">Caching CORS in Node.js</h3> <p>If you're using Express, Connect, or a framework based on them, then you're probably using the <a href="https://www.npmjs.com/package/cors">cors</a> module to handle CORS.</p> <p>By default this doesn't enable any kind of caching at all, but you can configure <code>Access-Control-Max-Age</code> by passing a <code>maxAge</code> value.</p> <p>You can't easily configure <code>Cache-Control</code>, so if you're using a CDN you probably want to do something slightly more complicated:</p> <pre><code class="javascript language-javascript">app.use(cors({
    // Set the browser cache time for preflight responses
    maxAge: 86400,
    preflightContinue: true // Allow us to manually add to preflights
}));

// Add cache-control to preflight responses in a separate middleware:
app.use((req, res, next) =&gt; {
    if (req.method === 'OPTIONS') {
        res.setHeader('Cache-Control', 'public, max-age=86400');
        // No Vary needed: the cors module already sets it automatically
        res.end();
    } else {
        next();
    }
});
</code></pre> <h3 id="cachingcorsinpython">Caching CORS in Python</h3> <p>Django's <a href="https://pypi.org/project/django-cors-headers/">django-cors-headers</a> module includes a reasonable default of 86400 as its <code>Access-Control-Max-Age</code> value.</p> <p>Meanwhile Flask's <a href="https://pypi.org/project/Flask-Cors/">Flask-Cors</a> module enables no caching at all by default, but it can be enabled by passing <code>max_age=86400</code> as an option in your existing configuration.</p> <p>With that, you can ensure that browsers properly cache these responses. If you want CDN caching too then you'll need to manually configure <code>Cache-Control</code>. Unfortunately, as far as I can tell, neither module supports custom configuration or an easy workaround for this, so if CDN caching is important to you then you'll probably need to manually handle preflight requests, or wrap these modules yourself.</p> <h3 id="cachingcorswithjavaspring">Caching CORS with Java Spring</h3> <p>With Spring, you're probably already using the <code>@CrossOrigin</code> annotation to handle CORS requests.</p> <p>By default Spring will set a 30-minute <code>Access-Control-Max-Age</code> header with this, adding relatively short caching in each individual browser, but won't set a <code>Cache-Control</code> header.</p> <p>I'd suggest you increase the max age to 24 hours (86400 seconds, the maximum used by any browser) by setting the <code>maxAge</code> option, and also add the <code>Cache-Control</code> header if you're using a CDN. Spring's built-in CORS configuration doesn't support doing the latter automatically, but you can easily add the header yourself using a response filter:</p> <pre><code class="java language-java">@Component
public class AddPreflightCacheControlWebFilter implements WebFilter {
    @Override
    public Mono&lt;Void&gt; filter(ServerWebExchange exchange, WebFilterChain chain) {
        if (CorsUtils.isPreFlightRequest(exchange.getRequest())) {
            exchange.getResponse()
                .getHeaders()
                .add("Cache-Control", "public, max-age=86400");
        }
        return chain.filter(exchange);
    }
}
</code></pre> <hr> <p>I hope this helps improve your CORS performance and reduce your API traffic! Have thoughts or questions? Feel free to get in touch on <a href="https://twitter.com/pimterry">Twitter</a> or <a href="https://httptoolkit.com/contact/">directly</a>.</p> <p><strong>Debugging APIs and want to inspect, rewrite & mock live traffic? Try out <a href="https://httptoolkit.com/javascript/">HTTP Toolkit</a> right now. Open-source one-click HTTP(S) interception & debugging for web, Android, servers & more.</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/cache-your-cors/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/cache-your-cors/</guid>
            <pubDate>Wed, 17 Feb 2021 17:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Fixing DNS in Node.js]]></title>
            <description><![CDATA[<p>DNS is one of those invisible technologies that you use every day, but which works so well that you can conveniently ignore it.</p> <p>That is until it breaks completely, or slows your application down to a crawl, or you want to resolve something unusual, and then you're in a world of trouble.</p> <p>Unfortunately, DNS in Node.js isn't implemented quite as you might have hoped, and includes some gotchas that can create some of these problems for you, when you least expect it. If you're a node developer, it's worth understanding how it works in detail, so you can see what your code is really doing, and boost your application's performance &amp; reliability.</p> <h2 id="whyisnodejsdnsproblematic">Why is Node.js DNS problematic?</h2> <p>DNS is how you turn domain names (example.com) into IP addresses (93.184.216.34). That sounds easy, but there's a whole world of complexity in here.</p> <p>Critically, this is something that most modern applications do <em>a lot</em>: every time you make an HTTP request or any other network request to a machine, on the internet and often within your internal infrastructure too.</p> <p>In node specifically this can go wrong in quite a few ways, because:</p> <ul> <li>DNS requests in node appear asynchronous, but they're actually implemented as synchronous calls within node's internal libuv threadpool (which by default has only 4 threads). That means if you do &gt;4 DNS lookups in parallel then you're going to block the libuv threadpool, even though they look like async IO. This will block every other DNS lookup, and also unrelated file IO and various crypto APIs, creating some extremely confusing performance problems.</li> <li>Node itself doesn't do any DNS caching at all. All of that is delegated to the OS, out of your control, and every DNS lookup must go to the OS every time.</li> <li>DNS lookups aren't free. 
If you're frequently making requests to a wide variety of different hostnames or hostnames with a short TTL (e.g. if you support webhooks, if you're building a monitoring tool, or if you're polling URLs elsewhere) then you may quickly find you're spending a substantial amount of your network time on endless DNS queries.</li> <li>When your DNS server goes down, all your outgoing requests will fail (eventually, once some invisible OS caching runs out) in an exciting new way that you haven't seen or tested before.</li> <li>When your DNS server gets slow, eventually all your outgoing requests will become inexplicably slow, even while your connection appears to be working otherwise fine.</li> </ul> <p>Because DNS is taken out of your hands in most cases, when this goes wrong it goes <em>really</em> wrong. Do you know whose DNS servers you're using in production, or how you'd change them? If not, it's going to be pretty hard to work around issues or even to check the status of those servers to confirm whether any of the above issues are your problem or theirs. Not fun.</p> <p>This isn't theoretical. At Zalando they <a href="https://shuheikagawa.com/blog/2019/04/30/dns-polling/">hit mysterious HTTP timeout errors</a> that broke their internal API requests, which they eventually traced to node's DNS implementation failing to handle certain AWS configurations. Meanwhile Phabricator reconfigured their infrastructure to use raw IP addresses just to avoid node DNS issues, before eventually <a href="https://phabricator.wikimedia.org/T158338">implementing an in-memory cache</a>, and Yahoo <a href="https://github.com/yahoo/dnscache">implemented</a> their own DNS caching module to try and improve node's DNS performance.</p> <p>It would be great to avoid these issues. 
It's useful to at least have the option to take control over your DNS, to manage these risks, and give yourself visibility into a system that's involved in every single network request you make.</p> <h2 id="howdoesnodesdnswork">How does Node's DNS work?</h2> <p>Node's <a href="https://nodejs.org/api/dns.html">dns module</a> is where all the magic happens. Internally, this is used by the <code>http</code> module and everywhere else, anywhere that node needs to translate a name into an IP address.</p> <p>This defines a wide variety of useful methods, including <code>dns.setServers</code> to change the configured DNS servers, <code>dns.resolveAny</code> to look up all records for a hostname, <code>dns.reverse</code> to look up a hostname from an IP address, and lots more. In addition, it exposes a Resolver class that you can use to encapsulate custom DNS configuration, and constants for all the various error codes &amp; DNS lookup flags you might be interested in for these queries.</p> <p>Surprisingly though, when node uses the <code>dns</code> module itself, it doesn't use any of this.</p> <p>Instead, when node does DNS it only uses the weird <code>dns.lookup</code> function, which acts completely differently to everything else here, and ignores every DNS setting you might configure in your application. Excellent.</p> <h3 id="whatsupwithdnslookup">What's up with dns.lookup?</h3> <p>This is the key behind the 'default' node.js behaviour above. Rather than making DNS lookups explicitly, as <code>dns.resolve*</code> etc do, <code>dns.lookup</code> calls the synchronous <code>getaddrinfo</code> function from libc, which always uses the OS's own configuration and hostname resolution mechanisms.</p> <p>This has a few important effects. 
This function:</p> <ul> <li>completely ignores any servers you might have configured with <code>dns.setServers</code>.</li> <li>can look up things that aren't technically DNS, including names in your hosts file, <code>localhost</code>, and mDNS/Bonjour hosts on your local network.</li> <li>is synchronous, and just simulates asynchronous behaviour from a JS point of view by blocking one of the internal libuv threads.</li> <li>doesn't cache anything: it calls through to <code>getaddrinfo</code> to look up the name from scratch every. single. time.</li> </ul> <p>This usually works fine as a default, and it's a good option for ensuring you can resolve any possible hostname, but in production it can fail &amp; create performance problems in all sorts of fun ways.</p> <h2 id="howdoyouchangenodesdns">How do you change Node's DNS?</h2> <p>You can reconfigure the DNS mechanism used by any socket by passing the <code>lookup</code> option, with a replacement function that has the same signature as <code>dns.lookup</code>.</p> <p>This works for <code>net.Socket</code>, for <code>http.request</code> and <code>http.get</code>, and everything else similar. Like so:</p> <pre><code class="javascript language-javascript">http.get("http://example.com", {
    lookup: myCustomLookupFunction
});
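</code></pre> <p>The <code>myCustomLookupFunction</code> here can be any function with <code>dns.lookup</code>'s signature. As a hypothetical minimal sketch, it could simply log each resolution while delegating to the built-in behaviour:</p> <pre><code class="javascript language-javascript">const dns = require('dns');

function myCustomLookupFunction(hostname, options, callback) {
    // Log the hostname, then delegate to the default OS-based resolution:
    console.log('Resolving ' + hostname + '...');
    return dns.lookup(hostname, options, callback);
}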
</code></pre> <p>That function will be called instead of <code>dns.lookup</code> during this HTTP request to look up the <code>example.com</code> hostname.</p> <p>So it's easy to change the lookup function. What should we change it to?</p> <h2 id="buildingabetterdnsconfiguration">Building a better DNS configuration</h2> <p>There are two key things we can do to improve on Node's defaults:</p> <ul> <li>We should cache all lookups in memory in node (according to the DNS record's TTL) so that we don't block libuv and (so far as possible) our application doesn't unnecessarily wait for DNS resolutions elsewhere.</li> <li>We should configure reliable &amp; fast DNS servers, to improve performance, and make us more resilient to any individual DNS failure.</li> </ul> <p>You could even go further with this, and add custom logic to your DNS resolution to do more exciting &amp; wild things. It's just a function: you can resolve things however you like! For example, <strong><a href="https://httptoolkit.com/javascript/">HTTP Toolkit</a></strong>'s upcoming automatic Docker interception reconfigures its DNS so that it can resolve traffic between docker container network aliases automatically, from a node process running entirely outside docker (yes, building that is the original reason I started down this whole rabbit hole).</p> <p>This is powerful, and there's a whole world of resolution games available with custom <code>lookup</code> functions, for everything from unusual service discovery approaches to monitoring &amp; measuring your DNS queries.</p> <p>For now though, let's focus on the immediately practical improvements.</p> <h2 id="cachingconfiguringdnsinnodejs">Caching &amp; configuring DNS in Node.js</h2> <p>To add caching &amp; configure custom servers, there are a few different options. 
I'd strongly suggest you <em>don't</em> write your own, and you pick up a module off the shelf.</p> <p>Personally, I think a good option is <a href="https://www.npmjs.com/package/cacheable-lookup">cacheable-lookup</a>. This is part of <a href="https://www.npmjs.com/package/got">Got</a>, a popular HTTP client from Sindre Sorhus. It supports caching, which observes the DNS TTL plus a configurable maximum cache time, it supports configurable error caching too, it supports custom DNS servers, it (optionally) supports fallback to <code>dns.lookup</code> for domains that can't be resolved via real DNS like <code>localhost</code> et al, it's all asynchronous, and it's super easy to use.</p> <p>Setting that up looks like this:</p> <pre><code class="javascript language-javascript">const http = require('http');
const CacheableLookup = require('cacheable-lookup');
const cacheable = new CacheableLookup({
    // Set any custom options here
});
cacheable.servers = [
    // Put your DNS servers of choice here
];

// Configure this to be used by a single request:
http.get("http://example.com", {
    lookup: cacheable.lookup
});

// Or configure this as the default for all requests:
cacheable.install(http.globalAgent);
</code></pre> <p>If you'd like to add DNS servers, rather than replacing the default DNS configuration (i.e. node's own default DNS servers), then you can use <code>cacheable.servers.push(...)</code> instead.</p> <p>When multiple servers are configured, subsequent servers are queried only when the first server is inaccessible or fails to respond correctly. They're not used if the first server explicitly returns a <code>NOTFOUND</code> response.</p> <p>For all hostnames that can't be resolved by these servers (for any reason at all) by default cacheable-lookup will then fall back to <code>dns.lookup</code>, and cache that result as normal. This is really useful for apps like <strong><a href="https://httptoolkit.com/javascript/">HTTP Toolkit</a></strong> that are heavily used by developers, and so are often used with local servers and unusual network configurations, but it might not be applicable in other cases.</p> <p>If you frequently make requests to domains that don't resolve then this extra step could create its own performance problems, so if you don't need this for your case then you can disable it by passing a <code>lookup: false</code> option.</p> <h2 id="whenshouldyou_not_dothis">When should you <em>not</em> do this?</h2> <p>Don't follow these suggestions blindly! This applies to all posts about performance on the internet. Don't make performance improvements without testing that they really improve real performance in your application specifically.</p> <p>Whether these changes are helpful depends a lot on your specific system, and the pattern of DNS lookups your application is making. Test it!</p> <p>In general even if this doesn't help it won't hurt, except for three specific cases:</p> <ul> <li>If it's very important that you share your DNS cache with the rest of the OS, rather than caching lookups just in your node process. 
This may affect performance in some process cluster scenarios (but, again, test it!).</li> <li>If a major percentage of your DNS lookups are not really domain names: e.g. <code>localhost</code>, mDNS names, or host file mappings. In that case you're going to end up doing <code>dns.lookup</code> to resolve these in the end anyway, so making separate DNS requests elsewhere first may be unhelpful.</li> <li>If a major percentage of your DNS lookups don't resolve. In that case this will work, but you probably want to disable the <code>dns.lookup</code> fallback entirely by passing <code>lookup: false</code> to cacheable-lookup.</li> </ul> <hr> <p>Hopefully that's a useful way to quickly improve the performance &amp; reliability of your node network requests. Have any thoughts or questions? Feel free to get in touch on <a href="https://twitter.com/pimterry">Twitter</a> or <a href="https://httptoolkit.com/contact/">directly</a>.</p> <p><strong>Debugging APIs or HTTP clients, and want to inspect, rewrite &amp; mock live traffic? Try out <a href="https://httptoolkit.com/javascript/">HTTP Toolkit</a> right now. Open-source one-click HTTP(S) interception &amp; debugging for web, Android, servers &amp; more.</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/configuring-nodejs-dns/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/configuring-nodejs-dns/</guid>
            <pubDate>Wed, 17 Feb 2021 16:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How to Index 226,379 API Endpoints]]></title>
            <description><![CDATA[<p>Wouldn't it be neat if you could take any HTTP request URL, and immediately find the matching API documentation, along with a detailed specification of the endpoint you're talking to?</p> <p>I thought that would be <em>very</em> neat, so I built an index to do it.</p> <p>More specifically, I've taken the <a href="https://apis.guru/openapi-directory/">OpenAPI Directory</a> - an amazing project that has collected the OpenAPI specifications for thousands of popular APIs, from AWS to Stripe to GitHub - and built a tiny &amp; fast index to query it, for every single API endpoint listed (all 226,379 of them).</p> <p>This was fun, but it's also useful: <a href="https://httptoolkit.com/">HTTP Toolkit</a> uses this to automatically provide validation, metadata and links to documentation, so that when you intercept &amp; inspect a request to a known API you can tell exactly what it's doing (and what it's doing wrong, if anything).</p> <p>If you want to give this a quick test yourself, you can play with a demo directly at <a href="http://runkit.com/pimterry/openapi-directory-js-demo">runkit.com/pimterry/openapi-directory-js-demo</a>.</p> <p>If you want to know how this works, read on. This gets pretty complicated, but it's a really interesting set of algorithmic challenges and neat data structures that solve some concrete problems.</p> <h2 id="whyisthisuseful">Why is this useful?</h2> <p>API specifications have been a growing field for a long time, but they're really coming to maturity now. For a while we had competing standards for this, but in the last few years one standard has reached widespread use and become the clear choice: <a href="https://en.wikipedia.org/wiki/OpenAPI_Specification">OpenAPI</a>.</p> <p><img src="https://httptoolkit.com/images/posts/openapi-trends.png" alt="Google Trends comparison of OpenAPI and alternatives"></p> <p>API specifications allow tools and automated systems to automatically understand API interactions. 
That opens a whole world of options:</p> <ul> <li>Documentation generated from specifications, staying automatically up to date, with tools like <a href="https://github.com/Redocly/redoc#readme">ReDoc</a>.</li> <li>APIs that can be automatically tested against their own specification to guarantee a single source of truth.</li> <li>Interactive HTTP clients that can suggest the endpoints or query parameters you're looking for, and spot mistakes automatically.</li> <li>SDKs that can be generated directly from API specifications for 40+ languages with tools like <a href="https://swagger.io/tools/swagger-codegen/">Swagger-CodeGen</a>.</li> <li>Error reporting tools that can include human-readable explanations of errors and links to documentation, drawn straight from the spec.</li> <li>Mocking tools like Stoplight's <a href="https://stoplight.io/prism/">Prism</a> that can automatically generate API mock servers.</li> <li>Type systems that know the type of HTTP response bodies automatically, without needing a locally hardcoded copy of the API's response format.</li> <li>HTTP debuggers like <a href="https://httptoolkit.com/">HTTP Toolkit</a> that can add metadata and validation to intercepted traffic.</li> </ul> <p>All of this is great, and OpenAPI specifications are now published by most major APIs, like <a href="https://github.blog/2020-07-27-introducing-githubs-openapi-description/">GitHub</a>, <a href="https://github.com/stripe/openapi">Stripe</a> and <a href="https://github.com/twilio/twilio-oai">Twilio</a>. 
Even more are written and maintained by the community, for APIs from <a href="https://github.com/APIs-guru/unofficial_openapi_specs/blob/master/healthcare.gov/1.0.0/swagger.yaml">healthcare.gov</a> to <a href="https://github.com/APIs-guru/unofficial_openapi_specs/blob/master/xkcd.com/1.0.0/openapi.yaml">XKCD</a>.</p> <p>Most of this is focused on single API usage though: the benefits when using specs internally, or when manually hunting down the spec for a single API you care about and using it while making requests or testing against that one API.</p> <p>If you could do this for <em>all</em> APIs, you could provide these kinds of benefits to all your HTTP traffic, with no manual research required. Your HTTP client could automatically validate all your HTTP requests and parse &amp; type your HTTP responses for you, your error reporting could include detailed error descriptions from docs in exception messages instead of just "Bad status: 400", and automated testing tools could automatically mock all external APIs out of the box.</p> <p>There's a whole world of amazing tooling you can build if detailed machine-readable context is available alongside every single HTTP request. Of course, we're not going to get a specification for every single API in the world, but with the <a href="https://github.com/APIs-guru/openapi-directory">OpenAPI Directory</a> we can get damn close.</p> <h2 id="howdoesthiswork">How does this work?</h2> <p>To do this, at a high level, we need to:</p> <ul> <li>Take all the OpenAPI Directory specs and normalize them into a standard format.</li> <li>Derive all the URLs that can identify each API.</li> <li>Build an efficient data structure to conveniently index these for quick lookup.</li> <li>Build an easy way to look up the right spec for a given request URL.</li> </ul> <p>Of course, 'efficient' is a relative term. 
In HTTP Toolkit's case, we're potentially looking up URLs 100s of times a second as traffic comes past, and it's already busy doing lots of other things, so queries need to take significantly less than a millisecond of CPU time. If it did take longer, say a couple of milliseconds, and we intercept a few hundred requests in a second-long burst, then we could plausibly spend <em>most</em> of our time querying this index, and the UX is going to go downhill fast.</p> <p>It's also very useful to have the index be a small-ish file so we can, for example, embed it into a webpage and cheaply pass it around. Comfortably under 1MB uncompressed seems like a good goal.</p> <p>That's the plan - let's dig into the details:</p> <h2 id="standardizingthespecs">Standardizing the specs</h2> <p>If you're not familiar with <a href="https://swagger.io/docs/specification/about/">OpenAPI</a> itself, the basics are pretty simple: a specification is a YAML or JSON file that defines bits of metadata for an API (the name, the URLs of its servers, links to the docs), lists all the endpoints and operations for each endpoint with info (what does it do, perhaps some docs here too), and provides schemas for all the input &amp; output values from the API, so you know what data you're handling. Twitter's <a href="https://github.com/APIs-guru/openapi-directory/blob/master/APIs/twitter.com/current/2.7/openapi.yaml">OpenAPI spec</a> is a fairly simple example.</p> <p>OpenAPI is really powerful as a format for this: primarily because it has all the data we need for the myriad use cases above, but also because there's a huge world of existing tooling around it to support more advanced transformations and analysis, and it's fundamentally a fairly simple human-readable format that fits neatly in with other standards like JSON Schema. This all sounds great!</p> <p>I had expected that getting specs in the OpenAPI Directory ready to process would be easy, but it very much isn't. 
There are a few problems here:</p> <ul> <li>There are at least two major incompatible versions of OpenAPI in circulation: Swagger v2 (the old name) and OpenAPI v3+.</li> <li>Both can be in either YAML or JSON format.</li> <li>Many include <code>$ref</code> references that point to external resources (typically to partial schemas that are shared between multiple specs). You can't quickly or reliably use a spec if it requires a network connection to go fetch extra details, so we want to bundle all of these inline.</li> <li>There are multiple versions of the same specs in many cases: sometimes mistakenly, and sometimes for genuinely different versions of the same API.</li> <li>Many of these specs use non-standard extensions, e.g. Microsoft puts many of their endpoints under an <code>x-ms-paths</code> property instead of <code>paths</code>, so it can break certain rules.</li> <li>Many others have bugs in either the original specs, or in OpenAPI Directory's filing of them, from <code>$ref</code> references to invalid local or external addresses, through to outright spec violations and invalid content. Some examples: https://github.com/APIs-guru/openapi-directory/issues/540.</li> </ul> <p>To make all these specs consistently and easily usable, and make it possible to build our index at all, we need to normalize them.</p> <p>This is a potentially endless task, but a few small steps take us a long way:</p> <ul> <li>You can automatically convert Swagger to OpenAPI using the excellent <a href="https://www.npmjs.com/package/swagger2openapi">Swagger2OpenAPI package</a>.</li> <li>You can bundle external references using API Dev Tools' <a href="https://www.npmjs.com/package/@apidevtools/swagger-parser">@apidevtools/swagger-parser package</a>. 
This also handles the initial parsing and does some validation for us en route, so this is perfect.</li> <li>It's possible to <a href="https://github.com/httptoolkit/openapi-directory-js/blob/f3976b582b09b70da8493b71080ce594cc4733f0/src/buildtime/generate-apis.ts#L39-L49">manually transform</a> many extensions into formats that work for our purposes and are valid enough that we can access the data within them consistently with everything else.</li> <li>The OpenAPI directory has a custom <code>x-preferred</code> field to indicate the preferred spec for each API, allowing us to filter our specs to just the main official version.</li> <li>Filing bugs and working through them, either with the API team themselves (whose details are listed in each spec, for official specs) or in the OpenAPI Directory directly (for custom or automatically generated specs like <a href="https://github.com/APIs-guru/aws2openapi">AWS's</a>), has managed to fix up most remaining serious kinks.</li> </ul> <p>All put together, the <a href="https://github.com/httptoolkit/openapi-directory-js/blob/f3976b582b09b70da8493b71080ce594cc4733f0/src/buildtime/generate-apis.ts#L21-L37">resulting code</a> is not so bad at all:</p> <pre><code class="typescript language-typescript">import * as swaggerParser from '@apidevtools/swagger-parser';
import * as swaggerToOpenApi from 'swagger2openapi';

async function generateApi(specPath: string): Promise&lt;OpenAPIDocument&gt; {
    // Happily parses both Swagger v2 and OpenAPI v3 files
    const parsedSwagger = await swaggerParser.parse(specPath);

    // Convert everything to OpenAPI v3
    const openapi: OpenAPIV3.Document = await swaggerToOpenApi.convertObj(parsedSwagger, {
        direct: true,
        patch: true // Automatically fix various common minor errors
    });

    // Extra conversion to transform x-ms-paths into normal-ish paths:
    mergeMsPaths(openapi);

    // Manually patch some other small known API issues:
    patchSpec(openapi);

    // Bundle all external $ref pointers:
    return &lt;Promise&lt;OpenAPIDocument&gt;&gt; swaggerParser.bundle(openapi);
}
</code></pre> <p>This immediately gives us a workable, valid, and mostly consistently formatted set of 2,000 API specifications to work with.</p> <p><strong>Before we go any further, I should give some credit to the ecosystem that makes this possible</strong>. There's been a huge amount of work behind the scenes here to build this collection of packages, including the <a href="https://github.com/APIDevTools">API Dev Tools</a> team (who are, I believe, partly backed by <a href="https://stoplight.io">Stoplight</a>), who maintain many of the tools I've used here; <a href="https://apis.guru/">APIs.guru</a>, who built more of them and started the OpenAPI Directory itself in 2015; and especially <a href="https://twitter.com/PermittedSoc">Mike Ralphson</a>, who's been almost single-handedly maintaining, fixing &amp; expanding this huge API collection for 4+ years now. Amazing work.</p> <h2 id="identifyingeveryapi">Identifying every API</h2> <p>Once we have a big bag of consistent API specifications, we can start talking about indexing them. The goal here is to take a URL, and to automatically find the specification for that URL, with a tiny, portable, fast index we can use anywhere. That means the first step is to work out the identifying URLs for each API.</p> <p>For some specifications, this is easy data to collect. OpenAPI specifications have a top-level <code>servers</code> field, which tells you which servers host the API and at what path:</p> <pre><code class="json language-json">  "servers": [
    {
      "url": "https://api.gigantic-server.com/v1",
      "description": "Production server"
    }
  ]
</code></pre> <p>In the simplest case, there's a single spec for a service that says the API is hosted at a single address like <code>https://example.com/api/v1</code>. We put that base URL in the index, and for any request URL that starts with this base address, we look up that spec for validation.</p> <p>We just need to:</p> <ul> <li>Get all our normalized specs.</li> <li>Get the base server URLs from each.</li> <li>Build a map from base server URL to the spec id, for each base server URL.</li> </ul> <p>Easy!</p> <p>Not so fast. In quite a few cases (e.g. Twilio, Azure, AWS) <em>many</em> specs share the exact same server base URLs, and each defines a different subset of the endpoints for part of the larger API.</p> <p>To differentiate these, we need to check every separate endpoint in each spec…</p> <h3 id="preciselymatchingapiendpoints">Precisely matching API endpoints</h3> <p>So the base server addresses from the spec are not specific enough. Fortunately, the full endpoint paths are unique (well, almost - there are one or two places where specs do define <em>exactly</em> the same endpoints - but few enough to ignore). If we can filter a little more precisely to match the full endpoint path, then we can generally find the specific spec we're looking for.</p> <p>To do this, we're going to need to collect every possible endpoint URL for every single API, and look up APIs from that using the full request URL.</p> <p>One option would be to include every single endpoint URL in the index, just building a map from endpoint to API specification.</p> <p>That works, it's totally correct, but it's not pretty. It's too big, and it's expensive to query.</p> <p>It's too big not just because the specs themselves define 40,000 or so endpoints, but because in many cases there are <em>many</em> server base URLs (e.g. 
AWS has one per region for many services), and there are variables within the endpoint paths and server URLs themselves, and every combination of these server &amp; path values is a valid endpoint.</p> <p>In practice, there are 226,379 defined API endpoints here in total (today, and that's increasing fast). This makes for an index large enough that we can't distribute it anywhere near as easily as I'd like. Stored naively, the raw data alone is about 17MB. Not great.</p> <p>Instead of trying to put all of those into a single index, we want to get the <em>smallest</em> set of base URLs that uniquely identifies each spec.</p> <p>For example, if you have one API specification that includes</p> <ul> <li><code>example.com/api/billing/customers</code></li> <li><code>example.com/api/billing/orders</code></li> <li><code>example.com/api/billing/refunds</code></li> <li><code>example.com/api/files/invoices</code></li> </ul> <p>and then a second specification that includes</p> <ul> <li><code>example.com/api/social/posts</code></li> <li><code>example.com/api/social/comments</code></li> <li><code>example.com/api/files/user-uploads</code></li> </ul> <p>then we could say:</p> <ul> <li>API 1 should match any URLs starting with <code>example.com/api/billing</code> or <code>example.com/api/files/invoices</code></li> <li>API 2 should match any URLs starting with <code>example.com/api/social</code> or <code>example.com/api/files/user-uploads</code></li> </ul> <p>Since most APIs cover a specific sub-path or a couple of them, this helps enormously. Reducing the set of endpoints like this takes us from 226,379 URL keys in our index down to a mere 10,000. Amazing!</p> <p>Implementing this in practice though is non-trivial. 
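</p> <p>To make the goal concrete, here's a toy sketch of that reduction for the example above (heavily simplified - it ignores wildcards and the conflict handling that the real implementation needs): grow each endpoint path segment by segment until it stops colliding with the other spec's endpoints, then drop any prefixes already covered by a shorter one:</p> <pre><code class="javascript language-javascript">// Toy prefix reduction: for each endpoint, find the shortest
// path-segment prefix that no other spec's endpoints share.
function uniquePrefixes(specPaths, otherPaths) {
    const prefixes = new Set();
    for (const path of specPaths) {
        let prefix = '';
        for (const segment of path.split('/')) {
            prefix = prefix === '' ? segment : prefix + '/' + segment;
            const conflict = otherPaths.some(function (other) {
                return other.startsWith(prefix);
            });
            if (!conflict) break;
        }
        prefixes.add(prefix);
    }
    // Drop prefixes that a shorter prefix already covers:
    return [...prefixes].filter(function (p) {
        return ![...prefixes].some(function (q) {
            return q !== p ? p.startsWith(q) : false;
        });
    });
}

const api1 = [
    'example.com/api/billing/customers',
    'example.com/api/billing/orders',
    'example.com/api/billing/refunds',
    'example.com/api/files/invoices'
];
const api2 = [
    'example.com/api/social/posts',
    'example.com/api/social/comments',
    'example.com/api/files/user-uploads'
];
uniquePrefixes(api1, api2);
// ['example.com/api/billing', 'example.com/api/files/invoices']
</code></pre> <p>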
If you're really interested, you can look at <a href="https://github.com/httptoolkit/openapi-directory-js/blob/f3976b582b09b70da8493b71080ce594cc4733f0/src/buildtime/generate-apis.ts#L124-L290">the full implementation</a>, but my algorithm is basically:</p> <ul> <li>Get a set of APIs that all have the same base server URL, and the list of paths that each one defines.</li> <li>For each spec:<ul> <li>Create an initially empty list of prefixes unique to this spec.</li> <li>Loop through the endpoints in this spec, and build a list of common prefixes:<ul> <li>If a prefix for this spec already covers this endpoint, we're good, skip it.</li> <li>If it doesn't, see if we have any prefixes we could shorten to include this endpoint without matching any other spec's endpoint. If there are multiple options, use the most specific.</li> <li>If we can't, push this specific path as a prefix for this spec.</li></ul></li> <li>Build an index that maps these prefixes to this spec.</li></ul></li> <li>Take any sets of still-conflicting prefixes in this index (i.e. two specs that define exactly the same endpoints), then shorten them together, down to their shared conflicting prefix.<ul> <li>If we don't do this, the previous step creates a separate conflicting prefix for every single conflicting endpoint, which blows up the size of the index hugely. If, for example, two conflicting specs both define all the same <code>example.com/api/billing</code> endpoints (yes, Azure basically does this), then we just want one index value that points to both specs, not an index value for every single endpoint.</li></ul></li> <li>Merge the base URL plus path prefixes from this index back in with the base URL index we've used for every other spec.</li> </ul> <p>Phew. Are we done? Not quite! 
We now have a map of URL prefixes to API specs, but it's still bigger than we'd like (about 1MB) and it's still not practical for quick querying.</p> <h2 id="buildingtheindex">Building the index</h2> <p>As I mentioned at the start, for HTTP Toolkit purposes (and for many interesting purposes), we want to be able to distribute this index in significantly less than 1MB, and query it in less than 1ms.</p> <p>We've got it down to our size limit, but only just. That is before loading any spec content, to be clear: that's just the size of the index mapping from URL prefixes to spec ids (the ids are things like <code>stripe.com</code> or <code>amazonaws.com/s3</code> - they're defined uniquely in the spec metadata).</p> <p>It gets worse when we want to query this though, because doing so is not a straight string lookup. There are two big problems:</p> <ul> <li>We're matching string prefixes of varying lengths. That means we can't do <code>index[myUrl]</code> to look values up directly. Instead, we need to compare every single key in the index with our input, like <code>input.startsWith(indexKey)</code>, and then pick the longest matching key. Ouch.</li> <li>Even worse: parts of the inputs are various kinds of wildcards (modeled as regexes within our index keys), so now we need to potentially do a regex match against every single key.</li> </ul> <p>This is bad! To naively do this, we have to string match and/or regex match against every single key in the index. Even with some simple optimizations, this requires a lot of string prefix comparisons, and it takes about 10ms to query on my laptop - that's 100% CPU time, with no IO involved, on a fairly fast machine. 
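</p> <p>For reference, that naive scan looks something like this sketch (simplified: plain string keys only, ignoring the regex wildcards):</p> <pre><code class="javascript language-javascript">// Naive lookup: test every index key against the URL, checking more
// specific (longer) prefixes first. Cost grows with the number of keys.
function naiveFindApi(index, url) {
    const keys = Object.keys(index).sort(function (a, b) {
        return b.length - a.length;
    });
    for (const key of keys) {
        if (url.startsWith(key)) return index[key];
    }
    return undefined;
}

const index = {
    'api.example.org/v1/': 'main-spec',
    'api.example.org/v1/billing': 'billing-spec'
};
naiveFindApi(index, 'api.example.org/v1/billing/invoices'); // 'billing-spec'
</code></pre> <p>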
Ok for some use cases and occasional queries, but not ok if you want to query this frequently with live HTTP traffic.</p> <p>We need a fundamentally different index structure.</p> <p>Let's talk about Tries.</p> <h3 id="introducingtries">Introducing Tries</h3> <p>A trie (pronounced 'tree', confusingly, and also known as a prefix tree) is a tree-like data structure that shares prefixes between keys. Wikipedia has <a href="https://en.wikipedia.org/wiki/Trie">some detailed background</a> if you want to dig into the details, but I'll summarize here.</p> <p>It's easiest to explain with a picture:</p> <p><img src="https://httptoolkit.com/images/posts/trie.png" alt="An example trie"></p> <p>This trie stores a set of strings: "to", "tea", "ted", "ten", "A", and "inn". The characters of the strings are stored as links between nodes. To look up "ted", you start at the top, follow the link for "t", then the link for "e", then the link for "d".</p> <p>You can extend this further by storing values in the nodes. Once you do that, you have a map from strings to values: you follow the characters of your key until you reach a node, or run out of links, and then you read the value from the node at the end.</p> <p>Trie structures are useful for building indexes like this, because:</p> <ul> <li>They store overlapping strings very efficiently, without duplicating shared prefixes. Where we have APIs indexed by the full URL of each endpoint, we have a lot of shared prefixes!</li> <li>They can be quickly queried by prefix: we just walk down the trie, making simple string hashmap lookups until either there's no next key that matches, or we reach a leaf value, and then we're done. It doesn't matter if it's an exact match or a prefix match, and the time taken is proportional just to the number of unique parts (like 't', 'e', 'd' above) in the key, not the number of keys in total.</li> </ul> <p>That's the basics. 
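</p> <p>To make that concrete, here's a minimal toy trie in JavaScript - a sketch for illustration, not the real implementation - using plain objects as the hashmaps and a reserved symbol key to hold the value at a terminal node:</p> <pre><code class="javascript language-javascript">// Minimal uncompressed trie sketch: one link per character, with a
// reserved key holding the stored value at the end of each string.
const LEAF = Symbol('leaf');

function trieSet(root, key, value) {
    let node = root;
    for (const char of key) {
        if (!node[char]) node[char] = {};
        node = node[char];
    }
    node[LEAF] = value;
}

function trieGet(root, key) {
    let node = root;
    for (const char of key) {
        node = node[char];
        if (!node) return undefined;
    }
    return node[LEAF];
}

const trie = {};
trieSet(trie, 'ted', 'value-for-ted');
trieSet(trie, 'ten', 'value-for-ten');

trieGet(trie, 'ted'); // 'value-for-ted'
trieGet(trie, 'tex'); // undefined - no such key
</code></pre> <p>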
We extend this concept a little further and fill it with API data, so that our real trie looks more like this:</p> <pre><code class="json language-json">{
  "api.": {
    "httptool": {
      "kit.tech": "http-toolkit-api"
    },
    "example.": {
      "com": "api.example.com",
      "org": {
        "/v1/": {
          "": "api.example.org-main-spec",
          "auth": "api.example.org-auth-spec",
          "bill": {
            "ing": "api.example.org-billing-spec"
          }
        }
      }
    }
  },
  "cool": {
    "-api.test/api/": {
      "v1": "cool-api-spec",
      "v2": "cool-api-v2-spec",
      "v3": "cool-api-v3-spec",
      "v4": "cool-api-v4-spec",
      "v5": "cool-api-v5-spec"
    }
  }
}
</code></pre> <p>That trie is equivalent to:</p> <pre><code class="json language-json">{
  "api.httptoolkit.tech": "http-toolkit-api",
  "api.example.com": "api.example.com",
  "api.example.org/v1/": "api.example.org-main-spec",
  "api.example.org/v1/auth": "api.example.org-auth-spec",
  "api.example.org/v1/billing": "api.example.org-billing-spec",
  "cool-api.test/api/v1": "cool-api-spec",
  "cool-api.test/api/v2": "cool-api-v2-spec",
  "cool-api.test/api/v3": "cool-api-v3-spec",
  "cool-api.test/api/v4": "cool-api-v4-spec",
  "cool-api.test/api/v5": "cool-api-v5-spec"
}
</code></pre> <p>In the former case, we can look up a value with between 3 &amp; 6 hashmap lookups (<code>index[nextUrlPart]</code>). In the latter, we need to do <code>myUrl.startsWith(key)</code> at least 10 times, and the index contains many string prefixes (like <code>api.example</code>) that get repeated over and over again.</p> <p>As the index gets larger, the trie version stays relatively compact and quick to query, while the simple map duplicates everything, and takes time directly proportional to the total size of the index. Although it appears longer as presented here, if you strip the whitespace from the trie example above it's already 25% shorter than the simple map, and that effect gets stronger as the number of overlapping strings in the index increases.</p> <h3 id="triesinpractice">Tries in practice</h3> <p>Theory is all very well, but how does this actually work? The implementation is in two parts:</p> <ul> <li>First, we <a href="https://github.com/httptoolkit/openapi-directory-js/blob/master/src/buildtime/build-index.ts">build the trie</a> as part of the indexing process.</li> <li>Later, at runtime, we <a href="https://github.com/httptoolkit/openapi-directory-js/blob/master/src/runtime/trie.ts">load and query the trie</a>.</li> </ul> <p>There's some details we'll talk about in a second, but the quick summary is:</p> <ul> <li>To build the trie, for each key we incrementally build the tree of hashmaps, going through the key character by character, and then we use string values containing the specification id as leaf nodes at the end.</li> <li>To query the trie, you walk the hashmaps, each time looking up the next part of your input URL. If you reach a leaf node before you run out of URL, then you have a prefix match for that id. 
If you reach a step where there's no key in the hashmap that matches your URL, or you run out of URL, then you've matched nothing.</li> </ul> <p>On top of that, there's a few interesting and notable tricks in here:</p> <ul> <li>Compressing the trie</li> <li>Using prefix branches</li> <li>Using regex keys</li> </ul> <h4 id="compressedtries">Compressed tries</h4> <p>We're actually using a <em>compressed</em> trie. Technically the trie index example above was a compressed trie too, and a normal trie should only have single-character keys, like:</p> <pre><code class="json language-json">{
  "a": { "p": { "i": { ".": { /* ...Etc */ } } } }
}
</code></pre> <p>This is simpler, and much easier to mutate while you're building it, but it's clearly less efficient. We can avoid it! We do build our trie initially character by character like this, but then when it's done we compress it, by collapsing any node with just a single non-leaf child into that child, until we get larger strings (as in the index example above). For example, we turn <code>{ "a": { "b": ..., "c": ... } }</code> into <code>{ "ab": { ... }, "ac": { ... } }</code>.</p> <p>This shrinks the tree a lot, but still ensures it's queryable by keeping the keys at each level always the same length. To look something up at one level, we find the key length by looking at any key, then get the corresponding next N characters of our input, and then look that string up in the hashmap. Keeping the key length at each level the same allows us to do simple lookups everywhere, so we never have to scan over keys at any level.</p> <h4 id="prefixbranches">Prefix branches</h4> <p>If you have two keys where one is a prefix of the other (<code>api.com</code> and <code>api.com/v2</code> for example), you need a way to match the shortened version.</p> <p>We represent that using empty string keys, like <code>{ "api.com": { "": "default-spec-id", "/v2": "v2-spec-id" } }</code>. If there's ever no matching string value whilst querying, we look for an empty string value, and use that if present.</p> <h4 id="regexesintries">Regexes in tries</h4> <p>We need regexes in our keys because we can have wildcard placeholders within our URL paths (in fact, OpenAPI supports more specific cases, but we just treat them as wildcards for our purposes). For example, a URL might be <code>api.example.com/users/{user-id}/invoices</code>. To handle this, we use a single regex to act as a path placeholder (<code>[^/]+</code>). 
We always use this same regex, so there's only ever one regex at any level, which simplifies things.</p> <p>To put this in the index, we use that regex as a key in the map directly (using an ES6 <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map">map</a>, so we can use non-string keys) while building the index. Later, if we ever can't find a matching string value whilst querying, and there's a regex at the same level, we test that and continue that way if it matches.</p> <p>For <code>api.example.com/users/{user-id}/invoices</code>, that might result in an index like:</p> <pre><code class="json language-json">{
  "api.example.com": {
    "": "main-spec",
    "/users/": {
      "": "main-users-api-spec",
      /[^/]+/: {
        "/invoices": "user-invoices-spec"
      }
    }
  }
}
</code></pre> <h4 id="trieperformance">Trie performance</h4> <p>This could absolutely be optimized further, but it's enough for our goals. In practice, when indexing our 10,000 endpoint prefixes, the deepest key has 45 steps in it (an extremely heavily conflicting endpoint, deep in the darkest corners of the Azure API). That's a lot, but it's much cheaper to do 45 hashmap lookups than 10,000 string+regular expression tests. And that's the worst case: most lookups need just 2 or 3 steps to get a result.</p> <p>In practice, this trie-based implementation can find the right spec within those 10,000 prefixes in just under 0.2 milliseconds on my laptop. That's 50x faster than the naive alternative, and that difference will increase as the API directory expands further. It ends up reasonably sized too, weighing in at under 500kB for the whole index with all the runtime code required to query it.</p> <p>That's 500kB with some very simple &amp; convenient serialization, and so it compresses far further, gzipping down to just 54kB on the wire. Not bad to match 200,000 API endpoints!</p> <p>If all this sounds interesting, or you want to get more precise on the details, do look through the <a href="https://github.com/httptoolkit/openapi-directory-js/blob/master/src/buildtime/build-index.ts">buildtime</a> &amp; <a href="https://github.com/httptoolkit/openapi-directory-js/blob/master/src/runtime/trie.ts">runtime</a> trie index implementations for the full story.</p> <h2 id="puttingitalltogether">Putting it all together</h2> <p>We've now got a big box of standardized specifications, with an index that references them. How do we distribute it?</p> <p>First, all the above gets run at build time, and then the index and all the APIs are serialized to disk. 
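</p> <p>One wrinkle: the index contains ES6 Maps and regex keys, which plain JSON can't represent. A hand-rolled toy of the idea (the real build uses a library for this, and this sketch ignores escaping and many other edge cases) is to emit a JavaScript expression instead of JSON, and revive it later by evaluating it:</p> <pre><code class="javascript language-javascript">// Toy serializer for a trie of Maps with string and regex keys:
// emit a JavaScript expression (a superset of JSON) rather than JSON.
function serializeNode(node) {
    if (typeof node === 'string') return JSON.stringify(node);
    const entries = [...node].map(function (entry) {
        const key = entry[0] instanceof RegExp
            ? entry[0].toString() // Regex literals serialize as themselves
            : JSON.stringify(entry[0]);
        return '[' + key + ', ' + serializeNode(entry[1]) + ']';
    });
    return 'new Map([' + entries.join(', ') + '])';
}

const index = new Map([
    ['api.example.com/users/', new Map([
        ['', 'main-users-api-spec'],
        [/[^/]+/, new Map([['/invoices', 'user-invoices-spec']])]
    ])]
]);

const source = serializeNode(index);
// Reviving the index is then just evaluating the expression:
const revived = eval('(' + source + ')');
</code></pre> <p>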
The APIs are all stored as JSON, for easy &amp; consistent parsing, and the index is serialized using <a href="https://github.com/yahoo/serialize-javascript">serialize-javascript</a>, which allows us to directly serialize things like ES6 maps and regexes, using a superset of JSON.</p> <p>Then it's all bundled up for runtime use as an npm package, containing an <code>/api/</code> folder, which includes every API spec and the index data itself, and a small set of runtime code to actually query the index. To use it, you just need to do:</p> <pre><code class="javascript language-javascript">const { findApi } = require('openapi-directory');

// Look up the API in the index from a request URL:
const apiId = findApi(requestUrl);

// With the id, require() the full specification:
const apiSpec = require(`openapi-directory/api/${apiId}`);
</code></pre> <p>The resulting package, including the index and normalized specifications, is distributed as <a href="https://www.npmjs.com/package/openapi-directory">an npm package</a> so you can easily use the directory (with the index, or just the raw specs themselves) in JS codebases on the front or back end. It's not currently easily usable in other languages or as an API itself, but if there's interest that'd be easy to do (if you're personally interested, <a href="https://httptoolkit.com/contact/">get in touch</a>!).</p> <p>And with that, HTTP Toolkit and other tools can automatically grab all the metadata required to do things like this:</p> <p><img src="https://httptoolkit.com/images/posts/openapi-github.png" alt="HTTP Toolkit using OpenAPI specs to check GitHub traffic"></p> <p>That's a request to GitHub, fully annotated with the API endpoint details, links to docs, parameter info, and fully validated against the spec, all automatically derived from just the request shown. And the same thing works for another 2,000 APIs or so, all without a manually configured specification in sight.</p> <p>The full package with all the data that makes this possible is open-source &amp; published <a href="https://www.npmjs.com/package/openapi-directory">on npm</a>, and so the raw content is also available directly <a href="https://unpkg.com/openapi-directory/">from unpkg</a>, so anybody can immediately start building OpenAPI-powered tools on top of this.</p> <p>I hope that was interesting! Have a play with <a href="http://runkit.com/pimterry/openapi-directory-js-demo">the demo</a>, test it out in HTTP Toolkit, and feel free to get in touch <a href="https://httptoolkit.com/contact/">by email</a> or <a href="https://twitter.com/pimterry">Twitter</a> if you have any questions or you start building on this yourself.</p> <p><em>Want to debug, test or mock HTTP(S), from browsers, servers, phones, and everything else? 
Try out <strong><a href="https://httptoolkit.com">HTTP Toolkit</a></strong> now.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/how-to-index-200000-apis/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/how-to-index-200000-apis/</guid>
            <pubDate>Thu, 11 Feb 2021 13:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[The right way to turn off your old APIs]]></title>
<description><![CDATA[<p>All things come to an end, even HTTP APIs. However great your API may be today, one day you'll want to release a completely new version, an improved but incompatible endpoint, a new parameter that solves the same problem better, or to shut down your API entirely. Your current API will not be live forever.</p> <p>Inconveniently though, your API has clients. If you shut down endpoints, parameters, or entire APIs without properly warning them, then they're going to be very unhappy.</p> <p>How do you shut down your APIs safely, making it as easy as possible for your users?</p> <p>There are right ways to do this, including two new draft headers being standardized by the exciting new IETF "Building Blocks for HTTP APIs" working group, designed to help with this exact process. Let's take a look.</p> <h2 id="makeaplan">Make a plan</h2> <p>First up: check if the API in question actually has any clients.</p> <p>Hopefully you have some API metrics or at least logging somewhere. If you don't, add some! If you do, and you can tell for sure that nobody is using this API anymore, then you win. Turn it off right now, delete the code, skip this article and have a well-deserved nap.</p> <p>The next question, if you're not napping, is to ask yourself whether there's an alternative to shutting down this API. Everything you turn off will break somebody's code and cost them time to fix it. It's good for the health of your client ecosystem and the web as a whole if APIs keep working.</p> <p>In many cases, old APIs can be translated internally, to transparently transform requests into calls to a new API instead, without maintaining two completely independent versions. 
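</p>

<p>As a tiny sketch of the idea (all field names and handlers here are invented for illustration, not any real API), such an internal translation layer might look like:</p>

```javascript
// Hypothetical example: a v1 endpoint served by translating requests
// into calls to the v2 implementation, instead of maintaining both.

// Imagine v2 renamed the 'name' field to 'full_name':
function translateV1RequestToV2(v1Body) {
    const { name, ...rest } = v1Body;
    return { ...rest, full_name: name };
}

function translateV2ResponseToV1(v2Body) {
    const { full_name, ...rest } = v2Body;
    return { ...rest, name: full_name };
}

// The v1 route handler then just wraps the v2 logic, so only the
// v2 implementation needs real maintenance:
function handleV1Request(v1Body, handleV2Request) {
    const v2Response = handleV2Request(translateV1RequestToV2(v1Body));
    return translateV2ResponseToV1(v2Response);
}
```

<p>Real-world versions of this pattern can be far more involved, but the shape is the same: a thin compatibility wrapper around the current implementation.</p>

<p>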
This is a fundamental part of <a href="https://stripe.com/blog/api-versioning#versioning-under-the-hood">the API versioning approach at Stripe</a>, who include transformations with all API changes to ensure that requests for incompatible old versions continue to work as before, automatically translating the requests and responses to use the newer code as required.</p> <p>Translation like this isn't always possible, and doing so <em>forever</em> can entail significant extra complexity, but if you can do it, this can provide valuable stability for your users, and avoid a lot of the work required for deprecation or old version maintenance.</p> <p>However, if this service/endpoint/parameter is in use in production, and it's not practical to keep supporting it, it's got to go.</p> <p>To do that, you need a plan. There are three key questions to ask first:</p> <ul> <li>What do you expect clients using this to do? Common answers include:<ul> <li>Update to a newer still-supported version of the same thing.</li> <li>Use some other substitute endpoint/parameter/service instead.</li> <li>Use a different service, they're on their own, you don't care.</li></ul></li> <li>When should they start migrating away from this API? Is your proposed replacement ready to use today?</li> <li>What's the deadline? I.e. when will this API stop working completely? (If you're not totally sure yet, you can delay this answer for a little bit).</li> </ul> <p>Once you've got a plan, it's time to tell people about it.</p> <h2 id="communicate">Communicate</h2> <p>First up: tell the humans.</p> <p>Email your mailing list, post it on Twitter, update your API specifications if you have them (e.g. 
OpenAPI has a <code>deprecated</code> field on <a href="https://swagger.io/specification/#operation-object">operations</a> and <a href="https://swagger.io/specification/#parameter-object">parameters</a>), and highlight this loudly in the relevant documentation online.</p> <p>You should include all the info above: what they should do instead, when you recommend they start migrating, and the deadline when they <em>must</em> migrate (if you have one).</p> <p>Once you've told the humans, it's time to tell the computers. This is where the new IETF headers come in.</p> <h3 id="thedeprecationheader">The Deprecation Header</h3> <p>The <a href="https://datatracker.ietf.org/doc/draft-ietf-httpapi-deprecation-header/?include_text=1">Deprecation header</a> tells clients that the requested resource still works as before, but is no longer recommended. You can state that very simply with a single HTTP header:</p> <pre><code>Deprecation: true
</code></pre> <p>Alternatively, you can provide a date. This date tells users when they should start migrating elsewhere. This can be in the past (if they should start migrating immediately) or the future (usually meaning that the thing they should migrate to isn't ready yet). Like so:</p> <pre><code>Deprecation: Thu, 21 Jan 2021 23:59:59 GMT
</code></pre> <p>If you're deprecating the whole endpoint or service, you can just return this with every response. If you're deprecating a specific feature, perhaps a parameter, request method, or a certain field in the request body, then you want to return this only in responses to requests that use that feature.</p> <p>To give the client more information, you can use <code>Link</code> HTTP response headers to link to endpoints or human-readable documentation elsewhere. You can include more than one of these in combination in the same <code>Link</code> header, by just comma-separating them (we'll see a full example later). The spec defines 4 links related to deprecation:</p> <h4 id="deprecationlinks">Deprecation links</h4> <p>You can link to a human-readable description of the deprecation like so:</p> <pre><code>Link: &lt;https://developer.example.com/deprecation&gt;; rel="deprecation"; type="text/html"
</code></pre> <p>This is the main way to tell your users what's going on, and what they should do about it. You almost always want to use this! If you don't have the full details and a final shutdown date yet, then even a placeholder saying that will be helpful. In that case, don't forget to let users subscribe for updates, with a mailing list or RSS or similar, so they can hear about the full plan once it's ready.</p> <h4 id="latestversionlinks">Latest-Version links</h4> <p>If you want clients to move to the latest version of the same endpoint of your API, use this to point them there, like so:</p> <pre><code>Link: &lt;https://api.example.com/v10/customers&gt;; rel="latest-version"
</code></pre> <h4 id="successorversionlinks">Successor-Version links</h4> <p>If you have multiple versions of your API available, it's usually nicer to migrate one version forwards at a time, rather than jumping straight from the oldest now-deprecated version to the latest. To help with this, you can link to the <em>next</em> version of the deprecated endpoint, not just the latest, like so:</p> <pre><code>Link: &lt;https://api.example.com/v2/customers&gt;; rel="successor-version"
</code></pre> <h4 id="alternatelinks">Alternate links</h4> <p>If there's no new equivalent version of this API, and users should migrate to a totally different resource that might be a good substitute, you can use alternate links to indicate that:</p> <pre><code>Link: &lt;https://api.example.com/v2/users/123/clients&gt;; rel="alternate"
</code></pre> <h3 id="thesunsetheader">The Sunset Header</h3> <p>Once you know when the API is going to shut down entirely, you should add a <a href="https://datatracker.ietf.org/doc/rfc8594/?include_text=1">Sunset header</a>.</p> <p>The Sunset header tells clients when this will stop working. It's a hard deadline: API clients <em>must</em> move elsewhere before this date, and you promise not to break anything until then.</p> <p>You must provide a date here, and it <em>should</em> be in the future. If it's in the past, that's OK though: at that point you're effectively saying "this could turn off at any moment, so you need to stop using it immediately". It looks like this:</p> <pre><code>Sunset: Tue, 20 Jul 2021 23:59:59 GMT
</code></pre> <p>This is super simple, and can be used for more than just API shutdowns: you can use it to signal HTTP redirects that will come in the future for URL migrations, or to indicate limited lifetimes of certain URLs (for content that's temporary by nature, or for regulatory reasons like data retention policies on certain resources). All it says is "this endpoint may stop doing what you expect after this date, be ready".</p> <h4 id="sunsetlinks">Sunset links</h4> <p>This spec also provides a sunset link relationship. This is designed to link to more information about your plan for shutting down this specific endpoint (probably the same documentation as your deprecation link, if you have one) or about the general sunset policy for your service. Like so:</p> <pre><code>Link: &lt;http://developer.example.com/our-sunset-policy&gt;;rel="sunset";type="text/html"
</code></pre> <p>This is a good place to point out that a general sunset policy is a very useful thing! A sunset policy tells clients when you shut down endpoints (e.g. 1 year after a replacement goes live), how users should ensure they hear about this (mailing lists, status pages, HTTP headers, you name it), and what they should usually do about it (update, check the docs, follow <code>Link</code> headers).</p> <p>Adding one doesn't help much with doing a deprecation right now, but if you'd published one a year ago, your clients would be ready already. The second best time to publish a sunset/deprecation policy is now. It might be worth considering if you're writing deprecation docs anyway.</p> <h3 id="alltogether">All together</h3> <p>These parts are designed to work nicely together. For example, to indicate that an API was deprecated recently, will be turned off in 6 months, link to the documentation, and provide a direct link to the next version, you should include headers like this in the response:</p> <pre><code>Deprecation: Thu, 21 Jan 2021 23:59:59 GMT
Sunset: Tue, 20 Jul 2021 23:59:59 GMT
Link: &lt;https://api.example.com/v2/customers&gt;; rel="successor-version",
    &lt;https://developer.example.com/shutting-down-customers-v1&gt;; rel="deprecation"
</code></pre> <h2 id="progressiveshutdowns">Progressive shutdowns</h2> <p>Once all that's in place, and your sunset deadline has passed, you're good to go.</p> <p>That doesn't mean you need to immediately kill the API completely though. Progressive shutdowns can help ensure that any clients still using this API get a last-chance warning before it disappears completely. GitHub <a href="https://github.blog/2018-02-01-crypto-removal-notice/">did this</a> when removing some crypto support in 2018: first they disabled it for one hour, then reenabled it, then they disabled it permanently two weeks later.</p> <p>There are other tricks too: Android <a href="https://twitter.com/jbaruch/status/930476565065953280">added increasing delays to deprecated native APIs</a> in 2015, eventually going up to a full 16 second wait, before finally turning off the API entirely. These progressive shutdowns provide a little flexibility for clients who miss your deadline, and may help clients who haven't noticed the deprecation spot and deal with the issue before the API turns off completely.</p> <h2 id="fliptheswitch">Flip the switch</h2> <p>Either way, once you've done the best you can to communicate the shutdown, it's time to turn off the endpoint/feature/entire service, delete the code, and finally go take that nap.</p> <p>Doing deprecations and shutdowns carefully like this makes it as clear as possible to your clients how they can depend on your API, when they need to take action, and what they need to do. These kinds of changes can be a big deal, and this information is important!</p> <p>These new draft headers allow us to communicate not only to humans, but also to expose this information to automated systems. As these headers become widespread, I'm really excited to start seeing more tooling built on top of them. 
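</p>

<p>As a sketch of what that tooling might look like (the wrapper itself is hypothetical; only the header names come from the drafts), a client-side check could be as simple as:</p>

```javascript
// Inspect a response's headers and collect human-readable warnings
// for any deprecation-lifecycle headers that are present.
function checkLifecycleHeaders(url, headers) {
    const warnings = [];

    const deprecation = headers['deprecation'];
    if (deprecation) {
        warnings.push(`${url} is deprecated` +
            (deprecation === 'true' ? '' : ` since ${deprecation}`));
    }

    const sunset = headers['sunset'];
    if (sunset) {
        warnings.push(`${url} will stop working entirely on ${sunset}`);
    }

    return warnings;
}

// E.g. after receiving a response with these (lowercased) headers:
const warnings = checkLifecycleHeaders('https://api.example.com/v1/customers', {
    'deprecation': 'Thu, 21 Jan 2021 23:59:59 GMT',
    'sunset': 'Tue, 20 Jul 2021 23:59:59 GMT'
});
warnings.forEach((w) => console.warn(w));
```

<p>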
Generic HTTP clients can log useful warnings automatically based on this data, API generators themselves can handle more and more of this for you based on API specifications, and HTTP debuggers like <strong><a href="https://httptoolkit.com/">HTTP Toolkit</a></strong> can highlight usages of deprecated endpoints for you in intercepted live traffic. It's an exciting time to start turning things off!</p> <p><strong>It is important to note that these headers are <em>draft</em> HTTP specifications</strong>. It's possible they may change before they're finalized. That said, they've been through a few rounds of revisions already, it's fairly unlikely they'll change dramatically from here, and it's time to start testing them in the wild.</p> <p>This does mean there's still time for feedback though! If you have thoughts on how this works and how it could work better, get in touch with the "Building Blocks for HTTP APIs" working group. You can email the mailing list at <a href="mailto:httpapi@ietf.org">httpapi@ietf.org</a>, or browse the previous mailing list discussions <a href="https://mailarchive.ietf.org/arch/browse/httpapi/">here</a>.</p> <p><em>Debugging, integrating or building HTTP APIs? Intercept, inspect & mock HTTP from anything in one click with <strong><a href="https://httptoolkit.com/">HTTP Toolkit</a></strong>.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/how-to-turn-off-your-old-apis/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/how-to-turn-off-your-old-apis/</guid>
            <pubDate>Thu, 21 Jan 2021 10:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Mining your CLI history for good git aliases]]></title>
<description><![CDATA[<p>If you use the command-line all day, CLI improvements can add a huge boost to your workflow. One of the simplest ways to improve things is to make your most used commands easier &amp; faster to type, by creating aliases.</p> <p>But which aliases? Which commands are most important for your usage? You can probably guess a couple, but it's hard to know for sure, and there's plenty you'll miss. Fortunately your CLI history already has all the answers, if you just know how to ask.</p> <p>As a software developer I'm going to focus on git aliases here, but this applies equally well to any command-line tools you use heavily. I'm also assuming a bash-compatible shell, but the same concept should translate elsewhere easily enough.</p> <p>The first step is to use <code>history</code> to do some digging and find out what commands you run most frequently. <code>history</code> prints every line you've run recently in your shell, in chronological order. That gives us the data we need, and with a little bash-fu we can start to get some answers.</p> <p>First: what are your most-run git commands?</p> <pre><code class="bash language-bash">history -n | grep git | sort | uniq -c | sort -k1,1nr -k2
</code></pre> <p>That takes your history, filters for git commands, sorts them alphabetically, counts the repeated lines, and then sorts by the repeat count.</p> <p>You can add a <code>| head -n X</code> too, if you'd like to see just the top <code>X</code> results. For me, with a few weeks history on a new laptop, that looks like:</p> <pre><code class="bash language-bash">    543 git status
    272 git add -p
    214 git tree
     71 git diff
     55 git commit --amend
     53 git push origin master
     32 git checkout -p
     30 git reset
     27 git stash pop
     26 git stash
</code></pre> <p>That tells you a bunch about my git workflow already! These are all common commands I'm using frequently, and commands I should very seriously consider aliasing.</p> <p>It doesn't tell the whole story though. How come <code>git commit --amend</code> is so high up, but <code>git commit</code> doesn't appear at all?</p> <p>That's because for many commits I run <code>git commit -m "..."</code> to commit and pass a message inline, so each of those commands is treated as unique, and won't appear in this top list. We need to get a little smarter.</p> <p>We can catch cases like that too, by limiting the input we consider for uniqueness. Like so:</p> <pre><code class="bash language-bash">history -n | grep git | cut -d' ' -f -3 | sort | uniq -c | sort -k1,1nr -k2
</code></pre> <p>Here I've added <code>cut -d' ' -f -3</code>, which splits each line by spaces, and includes only the first 3 parts (e.g. <code>git push origin master</code> becomes <code>git push origin</code>). This isn't perfect, but it does let us find command prefixes of a given length.</p> <p>With that, my results become:</p> <pre><code class="bash language-bash">    543 git status
    299 git add -p
    214 git tree
    199 git commit -m
    117 git push origin
     71 git diff
     60 git commit --amend
     34 git checkout -p
     30 git reset
     29 git diff --cached
</code></pre> <p>We can now see that I'm committing a lot too, but with messages, I'm pushing to many other branches, and I'm diffing my cached (already added) changes often, but usually with an argument ('show me what I've just added to the tests').</p> <p>Fiddle around with this a little, try a few different lengths of prefix, and you'll quickly find a set of commands that stand out with frequent use patterns, with or without extra arguments.</p> <p>From there, it's alias time. I could make these aliases within git, but I'd actually prefer to do it at the shell level, so I can shorten them further (not just to <code>git x</code> but <code>gx</code>).</p> <p>In my case, I'm doing that by adding the below to my <code>.bashrc</code>:</p> <pre><code class="bash language-bash">alias gs='git status'
alias gap='git add -p'
alias gt='git tree'

alias gc='git commit'
alias gcm='git commit -m'
alias gca='git commit --amend'

alias gpo='git push origin'
alias gd='git diff'
alias gdc='git diff --cached'
</code></pre> <p>(Don't forget to check these don't conflict with anything else you use on your machine!)</p> <p>These aliases can also be a very convenient place to add any extra arguments you often want but don't always remember, like <code>-w</code> for <code>diff</code> (ignore whitespace changes) or <code>-v</code> for <code>git commit</code> (include the diff contents in the commit template, so you can see it while you write your message).</p> <p>It's a quick trick, but just a little bash magic can tell you a lot about your working habits, and shine a useful light on ways you can make your life easier. Give it a go, and let me know what you think on <a href="https://twitter.com/pimterry">Twitter</a>.</p> <p><em>Want to debug or test HTTP(S) from command line tools, scripts or backend servers? Try out <a href="https://httptoolkit.com">HTTP Toolkit</a></em></p>]]></description>
            <link>https://httptoolkit.com/blog/find-best-git-aliases/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/find-best-git-aliases/</guid>
            <pubDate>Mon, 30 Nov 2020 13:35:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How do you know what's gone wrong when your API request fails?]]></title>
            <description><![CDATA[<p>When an API request doesn't work, hopefully the client receives a sensible HTTP error status, like 409 or 500, which is a good start. Unfortunately though, whilst <code>400 Bad Request</code> might be enough to know who's at fault, it's rarely enough information to understand or fix the actual problem.</p> <p>Many APIs will give you more details in the response body, but sadly each with their own custom style, varying between APIs or even between individual endpoints, requiring custom logic or human intervention to understand.</p> <p>This is not inevitable. Suspend disbelief with me for a second. Imagine a better world, where instead every API returns errors in the same standard format.</p> <p>We could have consistent identifiers to recognize types of errors, and clear descriptions and metadata easily available, everywhere. Your generic HTTP client could provide fine-grained details for any error automatically, your client error handling could easily &amp; reliably differentiate specific errors you care about, and you could handle common errors across many different APIs with one set of shared logic.</p> <p><strong><a href="https://tools.ietf.org/html/rfc7807">RFC 7807</a></strong> from the IETF is a proposed standard aiming to do exactly this, by defining a standard format for HTTP API error responses. It's seeing real-world usage already, it's easy to start supporting in existing APIs and clients, and it's well worth a look for everybody who builds or consumes HTTP APIs.</p> <h2 id="whyisastandarderrorformatuseful">Why is a standard error format useful?</h2> <figure> <img src="https://httptoolkit.com/images/posts/bad-server-status.jpg" alt="A meme showing a 200 server response containing a 400 error body"> <figcaption>(Please don't do this)</figcaption> </figure> <p>Let's step back a little. One key feature of HTTP is the use of standard response status codes, like 200 or 404. 
When used correctly, these ensure that clients can automatically understand the overall status of a response, and take appropriate action based on that.</p> <p>Status codes are especially great for error handling. Rather than requiring custom rules to parse &amp; interpret every response everywhere, almost all standard HTTP clients will throw an error automatically for you when a request receives an unexpected 500 status, and this ensures that unexpected errors get reliably reported and can be handled everywhere easily.</p> <p>This is great, but it's very limited.</p> <p>In practice, an HTTP 400 response might mean any of the below:</p> <ul> <li>Your request is in the wrong format, and couldn't be parsed</li> <li>Your request was unexpectedly empty, or missing some required parameters</li> <li>Your request was valid but still ambiguous, so couldn't be handled</li> <li>Your request was valid, but due to a server bug the server thinks it wasn't</li> <li>Your request was valid, but asked for something totally impossible</li> <li>Your request was initiated, but the server rejected a parameter value you provided</li> <li>Your request was initiated, but the server rejected every parameter value you provided</li> <li>Your request was initiated, but the card details included were rejected by your bank</li> <li>Your request completed a purchase, but some other part of your request was rejected at a later stage</li> </ul> <p>Those are all errors, they're all plausibly 400 errors triggered by a 'bad' request, but they're very different.</p> <p>Status codes help differentiate error &amp; success states, but don't go much further. 
Because of this, HTTP client libraries can't include any kind of useful details in thrown errors, and every API client has to write custom handling to parse each failing response and work out the possible causes and next steps for itself.</p> <p>Wouldn't it be nice if the exception message thrown automatically by a failing HTTP request was <code>Credit card number is not valid</code>, rather than just <code>HTTP Error: 400 Bad Request</code>?</p> <p>With a standard format for errors, each of the errors above could have their own unique identifier, and include standard descriptions and links to more details. Given that:</p> <ul> <li>Generic tools could parse and interpret error details for you, all without knowing anything about the API in advance.</li> <li>APIs could more safely evolve error responses, knowing that error type identifiers ensure clients will still consistently recognize errors even if explanation messages change.</li> <li>Custom API clients could check error types to handle specific cases easily, all in a standard way that could work for every API you use, rather than requiring a from-scratch API wrapper and epic boss fight against the API documentation every. single. time.</li> </ul> <h2 id="whatstheproposederrorformat">What's the proposed error format?</h2> <p>To do this, RFC7807 proposes a set of standard fields for returning errors, and two content types to format this as either JSON or XML.</p> <p>The format looks like this:</p> <pre><code class="json language-json">{
  "type": "https://example.com/probs/out-of-credit",
  "title": "You do not have enough credit.",
  "detail": "Your current balance is 30, but that costs 50.",
  "instance": "/account/12345/transactions/abc"
}
</code></pre> <p>or equivalently, for XML:</p> <pre><code class="xml language-xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;problem xmlns="urn:ietf:rfc:7807"&gt;
    &lt;type&gt;https://example.com/probs/out-of-credit&lt;/type&gt;
    &lt;title&gt;You do not have enough credit.&lt;/title&gt;
    &lt;detail&gt;Your current balance is 30, but that costs 50.&lt;/detail&gt;
    &lt;instance&gt;/account/12345/transactions/abc&lt;/instance&gt;
&lt;/problem&gt;
</code></pre> <p>The RFC defines two new corresponding content types for these: <code>application/problem+json</code> and <code>application/problem+xml</code>. HTTP responses that return an error should include the appropriate content type in their <code>Content-Type</code> response header, and clients can check that header to confirm the format.</p> <p>This example includes a few of the standardized fields defined by the spec. The full list is:</p> <ul> <li><code>type</code> - a URI that identifies the type of error. Loading the URI in a browser <em>should</em> lead to documentation for the error, but that's not strictly required. This field can be used to recognize classes of error. In future, in theory, sites could even share standardized error URIs for common cases to allow generic clients to detect them automatically.</li> <li><code>title</code> - a short human-readable summary of the error. This is explicitly advisory, and clients <em>must</em> use <code>type</code> as the primary way to recognize types of API error.</li> <li><code>detail</code> - a longer human-readable explanation with the full error details.</li> <li><code>status</code> - the HTTP status code used by the error. This must match the real status, but can be included here in the body for easy reference.</li> <li><code>instance</code> - a URI that identifies this specific failure instance. This can act as an id for this occurrence of the error, and/or a link to more detail on the specific failure, e.g. a page showing the details of a failed credit card transaction.</li> </ul> <p>All of these fields are optional (although <code>type</code> is highly recommended). The content types are allowed to include other data freely, as long as they don't conflict with these fields, so you can add your own error metadata here too, and include any other data you'd like. 
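</p>

<p>To make this concrete, here's a minimal sketch of building such a response (the helper function is hypothetical; only the field names and the content type come from the spec):</p>

```javascript
// Build an RFC 7807 problem response. The shape returned here
// (status/headers/body) is an illustrative assumption, not tied
// to any particular server framework.
function problemResponse({ type, title, status, detail, instance }) {
    return {
        status: status,
        headers: { 'content-type': 'application/problem+json' },
        body: JSON.stringify({ type, title, status, detail, instance })
    };
}

const response = problemResponse({
    type: 'https://example.com/probs/out-of-credit',
    title: 'You do not have enough credit.',
    status: 403,
    detail: 'Your current balance is 30, but that costs 50.',
    instance: '/account/12345/transactions/abc'
});
```

<p>The key points are just that the body is the standard fields serialized as JSON, and the <code>Content-Type</code> header advertises the problem format so clients can detect it.</p>

<p>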
Both <code>instance</code> and <code>type</code> URIs can be either absolute or relative.</p> <p>The idea is that:</p> <ul> <li>APIs can easily indicate that they're following this standard, by returning error responses with the appropriate <code>Content-Type</code> header.</li> <li>This is a simple set of fields you could easily add on top of most existing error responses, if they're not already present.</li> <li>Clients can easily advertise support and thereby allow for migration, if necessary, just by including an <code>Accept: application/problem+json</code> (and/or <code>+xml</code>) header in requests.</li> <li>Client logic can easily recognize these responses, and use them to dramatically improve both generic and per-API HTTP error handling.</li> </ul> <h2 id="howdoistartusingthis">How do I start using this?</h2> <p>Right now, this is a <em>proposed</em> standard, so it's not yet widespread, and in theory it may change.</p> <p>That said, it's already in use in many places, including serious things like <a href="https://www.etsi.org/deliver/etsi_ts/129500_129599/129511/15.00.00_60/ts_129511v150000p.pdf">5G standards</a>, and there are convenient tools available for most languages and frameworks, including:</p> <ul> <li><a href="https://docs.microsoft.com/en-us/dotnet/api/microsoft.aspnetcore.mvc.problemdetails?view=aspnetcore-5.0">Built-in support in ASP.NET</a></li> <li><a href="https://www.npmjs.com/package/http-problem-details">Generic</a> and <a href="https://www.npmjs.com/package/express-http-problem-details">Express</a> libraries for Node.js</li> <li><a href="https://github.com/zalando/problem">Generic</a> and <a href="https://github.com/zalando/problem-spring-web">Spring Web MVC</a> libraries for Java</li> <li><a href="https://pypi.org/project/httpproblem/">Generic</a> and <a href="https://pypi.org/project/drf-problems/">Django REST API</a> libraries for Python</li> <li><a href="https://rubygems.org/gems/problem_details">Generic</a>, <a
href="https://rubygems.org/gems/problem_details-rails">Rails</a> and <a href="https://rubygems.org/gems/sinatra-problem_details">Sinatra</a> libraries for Ruby</li> <li><a href="https://packagist.org/packages/phpro/api-problem">Generic</a> and <a href="https://packagist.org/packages/phpro/api-problem-bundle">Symfony</a> libraries for PHP</li> <li>Libraries for <a href="https://crates.io/crates/http-api-problem">Rust</a>, <a href="https://github.com/lpar/problem">Go</a>, <a href="https://github.com/wix/rest-rfc7807">Scala</a>, <a href="https://hackage.haskell.org/package/http-rfc7807">Haskell</a>…</li> </ul> <p>So it's pretty widespread already in most major ecosystems, it's here to stay, and it's time for the next step: spreading usage in APIs &amp; clients until we reach a critical mass where <em>most</em> API errors are formatted consistently like this, it becomes a default everywhere, and we can all reap the benefits.</p> <p>How do we do that?</p> <h3 id="ifyourebuildingormaintaininganapi">If you're building or maintaining an API:</h3> <ul> <li>Try to return your errors in <a href="https://tools.ietf.org/html/rfc7807">RFC 7807</a> format with the appropriate <code>Content-Type</code> response header, if you can.</li> <li>If you already have an error format, which you need to maintain for compatibility, see if you can add these fields on top, and extend it to match the standard.</li> <li>If you can't, try detecting support in incoming <code>Accept</code> headers, and using that to switch your error format to the standard where possible.</li> <li>File bugs with your API framework (like <a href="https://github.com/spring-projects/spring-boot/issues/19525">this one</a>) suggesting they move towards standard error formats in future.</li> </ul> <h3 id="ifyoureconsuminganapi">If you're consuming an API:</h3> <ul> <li>Check error responses for these content types, and improve your error reporting and handling by using the data provided there.</li> <li>Consider including 
an <code>Accept</code> header with these content types in your requests, to advertise support and opt into standard errors where they're available.</li> <li>Complain to APIs you use that don't return errors in this standard format, just as you would for APIs that didn't bother returning the right status codes.</li> </ul> <h3 id="andeverybody">And everybody:</h3> <ul> <li>Get involved! This is a spec under the umbrella of the new "Building Blocks for HTTP APIs" working group at the IETF. You can <a href="https://www.ietf.org/mailman/listinfo/httpapi">join the mailing list</a> to start reading and getting involved with discussions around this and other specs for possible API standards, from <a href="https://datatracker.ietf.org/doc/draft-polli-ratelimit-headers/">rate limiting</a> to <a href="https://datatracker.ietf.org/doc/draft-dalal-deprecation-header/">API deprecations</a>.</li> <li>Spread the word to your colleagues and developer friends, and help make errors a little bit easier to handle for everybody.</li> </ul> <p><strong>Debugging APIs or HTTP clients, and want to inspect, rewrite & mock live traffic? Try out <a href="https://httptoolkit.com/">HTTP Toolkit</a> right now. Open-source one-click HTTP(S) interception & debugging for web, Android, servers & more.</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/http-api-problem-details/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/http-api-problem-details/</guid>
            <pubDate>Tue, 24 Nov 2020 13:35:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Intercepting HTTPS on Android]]></title>
            <description><![CDATA[<p>To intercept, inspect or manipulate HTTPS traffic, you need the HTTPS client to trust you.</p> <p>If you want to intercept your own HTTPS on Android, perhaps to capture &amp; rewrite traffic from your Android device for debugging or testing, how do you do that?</p> <p>This isn't theoretical - <a href="https://httptoolkit.com/android/">HTTP Toolkit</a> does exactly this, automatically intercepting HTTPS from real Android devices, for inspection, testing &amp; mocking. To do so, it has to automatically ensure that it's trusted by HTTPS clients on Android devices, without breaking security on those devices completely (it would be a very bad idea to simply turn off certificate validation, for example). Here's a demo:</p> <p><center> <iframe class="video-embed" src="https://www.youtube.com/embed/ttf8IhfI0Ao" frameborder="0" allow="autoplay; encrypted-media; picture-in-picture" allowfullscreen></iframe> </center></p> <p>Let's talk through how HTTPS clients in general manage this kind of trust, see how that works on Android specifically, and then look at how it's possible to get around this and intercept real HTTPS traffic.</p> <h2 id="howhttpstrustworks">How HTTPS trust works</h2> <p>An HTTPS request is an HTTP request, made over a TLS connection. Everything we're going to talk about here is really about TLS - the HTTP within is just normal <code>GET /</code> requests and <code>200 OK</code> responses.</p> <p>I'm not going to go into the lowest level details, but it is important to understand the basics of how TLS works. 
If you are interested in the fine details of TLS, <a href="https://tls13.ulfheim.net/">The Illustrated TLS Connection</a> is well worth a look, for a byte-by-byte breakdown of the whole process.</p> <p>The high-level summary is this:</p> <ul> <li>Every TLS client keeps track of some set of root certificate authorities (root CAs) that it trusts completely.</li> <li>When any modern TLS client first connects to a server, its initial message includes a Server Name Indication (SNI), telling the server which hostname it's looking for (e.g. example.com). It expects the server's response to include a valid certificate for that hostname.</li> <li>TLS certificates include a reference to the issuer of the certificate, and a signature proving that the issuer verified the certificate. The issuer's certificate in turn will have its own issuer &amp; signature, creating a chain of certificates, up until a final self-signed root certificate.</li> <li>The client must decide if it trusts the server's certificate. It does so by checking the details of the certificate (notably checking the hostname is what was expected), and then examining the issuer of the certificate, and then the issuer of the issuer's certificate, and so on and so on until it either reaches a certificate that it already trusts (a trusted certificate authority) or it runs out of issuers and decides that it doesn't trust the certificate at all.</li> <li>If the client trusts the certificate, it continues creating the encrypted connection, and then sends and receives data over that connection. If it doesn't trust the certificate, it closes the connection before sending any content, i.e. 
it never sends any part of its HTTPS request.</li> </ul> <p>In short: every TLS client has a list of root CAs that it trusts, and to successfully receive an HTTPS request, you must be able to present a certificate for the target hostname that includes a trusted root CA somewhere in its chain.</p> <p>This is a bit simplified and I'm ignoring all sorts of edge cases, but it's enough for our purposes. If you'd like to get into the nitty gritty of how the certificate validation really works, Scott Helme has written up <a href="https://scotthelme.co.uk/cross-signing-alternate-trust-paths-how-they-work/">a great guide</a>.</p> <p>So, given the above, if we want to intercept HTTPS we need to be able to present a certificate issued by a trusted certificate authority. Since nobody reading this has a globally trusted root CA to hand, in practice that means we need to create our own CA, and ensure the TLS client (in this case, Android's HTTPS clients) already trusts that CA, before we can get started.</p> <p>How do Android HTTPS clients decide who they trust?</p> <h2 id="androidcertificatestores">Android Certificate Stores</h2> <p>Each HTTPS or TLS client on Android will check certificates against the CAs in some certificate store.</p> <p>There's at least 3 types of Android CA certificate store:</p> <ul> <li><p>The OS has a 'system' certificate store, traditionally at <code>/system/etc/security/cacerts/</code>. This is prepopulated on the device at install time; it's impossible to add certificates to it without root access, and it's used as the default list of trusted CA certificates by most apps. 
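</p> <p>You can take a look at this store for yourself on any connected device. Each file in the directory is a CA certificate, named with the OpenSSL 'old' subject hash of that certificate plus a <code>.0</code> suffix, so it looks something like this (the filenames below are illustrative, and <code>my-ca.pem</code> is a placeholder for your own CA certificate):</p> <pre><code class="bash language-bash"># List the system CA store over ADB:
$ adb shell ls /system/etc/security/cacerts/
00673b5b.0  02b73561.0  03f2b8cf.0  ...

# Compute the matching store filename for a CA certificate of your own:
$ openssl x509 -in my-ca.pem -noout -subject_hash_old
</code></pre> <p>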
In practice, this store defines which CA certificates most apps on your phone will trust.</p> <p><a href="https://httptoolkit.com/blog/android-14-breaks-system-certificate-installation/">In Android 14 system CA certificates were moved</a> to <code>/apex/com.android.conscrypt/cacerts</code> (though the default system path above still exists too) so that they're primarily loaded from the Conscrypt system module instead.</p></li> <li><p>The OS also has a 'user' certificate store, usually at <code>/data/misc/user/0/cacerts-added/</code>, containing trusted CA certificates that were manually installed by the user of the device. Installing one of these certificates requires accepting quite a few warnings, and <a href="https://httptoolkit.com/blog/android-11-trust-ca-certificates/">became even more difficult in Android 11</a>.</p> <p>Apps targeting Android API level &lt; 24, i.e. before Android 7, or applications that specifically opt in will trust CA certificates in this store. Most apps don't, but this is enabled on a few apps where it's widely useful (notably Chrome, <a href="https://www.chromium.org/Home/chromium-security/root-ca-policy/">for now</a>) and it's easy for developers to enable for testing with <a href="https://httptoolkit.com/docs/guides/android/#if-you-dont-have-a-custom-network-security-config">a few lines of XML</a>.</p></li> <li><p>Lastly, each application can include its own CA certificates, embedding its own short list of trusted certificate authorities internally, and refusing to trust HTTPS communication with certificates signed by anybody else. 
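</p> <p>Declaratively, this kind of pinning can be done with a <code>&lt;pin-set&gt;</code> in the app's network security config, along these lines (a sketch only - the domain and pin hash here are illustrative placeholders, not a real configuration):</p> <pre><code class="xml language-xml">&lt;network-security-config&gt;
    &lt;domain-config&gt;
        &lt;domain includeSubdomains="true"&gt;example.com&lt;/domain&gt;
        &lt;pin-set&gt;
            &lt;!-- The base64 SHA-256 hash of the pinned CA's public key: --&gt;
            &lt;pin digest="SHA-256"&gt;7HIpactkIAq2Y49orFOOQKurWxmmSFZhBCoQYcRhJ3Y=&lt;/pin&gt;
        &lt;/pin-set&gt;
    &lt;/domain-config&gt;
&lt;/network-security-config&gt;
</code></pre> <p>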
Nowadays this is fairly uncommon, except for apps that are especially security conscious (banking) or very high-profile (Facebook), mostly because it's complicated and the changes in Android 7 to untrust the user store make this kind of pinning unnecessary.</p></li> </ul> <p>If you want to intercept HTTPS traffic from an app, you need to ensure your CA certificate is trusted in the app's certificate store of choice. How can you do that?</p> <h2 id="howtointerceptandroidhttps">How to intercept Android HTTPS</h2> <p>To intercept HTTPS, you first need the TLS connections to come to you. HTTP Toolkit runs as a desktop app on your computer, acting as an HTTP(S) proxy, and does this with an Android VPN app on the device that redirects packets to that proxy. I've written quite a bit of detail about that <a href="https://httptoolkit.com/blog/inspecting-android-http/">over here</a>, and it's fairly easy to do if you either use the VPN APIs or configure an HTTPS proxy, so let's take that as a given.</p> <p>Once you have TLS connections going to your server, you need to be able to respond to the initial client handshake with a certificate that Android will trust. Typically you'll generate a self-signed CA certificate when setting up interception, and then use that to generate TLS certificates for incoming connections, generating a fresh certificate for each requested hostname.</p> <p>To make that work, you need to make your Android device's HTTPS clients trust your locally generated CA.</p> <p>There's two big cases here:</p> <ul> <li>Non-rooted production devices. Normal phones. This includes most phones sold and used day to day, and official 'Google Play' Android emulators.</li> <li>Rooted devices. More specifically: devices where root access is available via ADB (not that root access is necessarily available to apps on the device). 
This includes normal phones that have been manually rooted, but also both the 'Google APIs' and AOSP official Google emulators, and most other Android emulators like <a href="https://www.genymotion.com/">Genymotion</a>.</li> </ul> <p>In general, less than 1% (according to some <em>very</em> <a href="https://www.quora.com/What-percentage-of-Android-phones-are-rooted">dubious</a> <a href="https://www.reddit.com/r/Android/comments/ue3gc/what_percentage_of_android_phone_users_a_root/c4ullqs/">guesstimates</a>) of typical users' devices are rooted.</p> <p>However, for Android developers, testers, and security researchers, that number runs far higher. Within HTTP Toolkit's user base for example, it looks closer to 30% (I don't know for sure, but I suspect that emulators make up a large percentage of that).</p> <h3 id="injectingcacertificatesintorooteddevices">Injecting CA certificates into rooted devices</h3> <p>This is the fun case. If you have root, how do you make apps trust your CA certificate? It turns out that even with root it's not quite as easy as it could be, but it's definitely possible to inject system certificates, so that almost all apps trust your CA by default.</p> <p>There's a couple of challenges:</p> <ul> <li>Even as root, <code>/system</code> is not writable by default</li> <li>Making <code>/system</code> writable on emulators is only possible if the emulator is always started with an extra command line argument, and so requires restarting the emulator if that's not already set. 
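 The flag in question is the emulator's <code>-writable-system</code> argument (the AVD name below is a placeholder): <pre><code class="bash language-bash">$ emulator -avd YOUR_AVD_NAME -writable-system
$ adb root &amp;&amp; adb remount
</code></pre> 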
To make this worse, it's not possible to set custom command line arguments in Android Studio, making this very inconvenient for normal use.</li> <li>On Android 14+, you actually need to write to the Conscrypt APEX module at <code>/apex/com.android.conscrypt/cacerts</code> instead of the <code>/system</code> path, and this is fully immutable <em>and</em> uses per-process mount namespacing, so that it's independently mounted within every running process.</li> <li>Even if you write a valid CA certificate to the right place, it won't be recognized. You need to ensure all the permissions &amp; SELinux context labels are set correctly before Android will trust files in that directory.</li> </ul> <p>To handle all this, as root, HTTP Toolkit:</p> <ul> <li>Pushes the HTTP Toolkit CA certificate to the device over ADB.</li> <li>Copies all system certificates out of <code>/system/etc/security/cacerts/</code> to a temporary directory.</li> <li>Mounts a <a href="https://en.wikipedia.org/wiki/Tmpfs">tmpfs</a> in-memory filesystem on top of <code>/system/etc/security/cacerts/</code>. 
This effectively places a fresh empty filesystem that <em>is</em> writable over the top of a small part of <code>/system</code>.</li> <li>Moves the copied system certificates back into that mount.</li> <li>Moves the HTTP Toolkit CA certificate into that mount too.</li> <li>Updates the permissions to <code>644</code> &amp; sets the <code>system_file</code> SELinux label on everything in the temporary mount, so it all looks like legitimate Android system files.</li> <li>Checks if <code>/apex/com.android.conscrypt/cacerts</code> is present, and if so it enters the mount namespace (with <code>nsenter</code>) of all Zygote processes (which launch apps) and every running app, to bind mount the system certificate path over that APEX path (if you're interested, I've written a more detailed article about <a href="https://httptoolkit.com/blog/android-14-breaks-system-certificate-installation/">the full Android 14 CA certificate injection process</a>).</li> </ul> <p>This is all open source of course, and the full script to do this is here: <a href="https://github.com/httptoolkit/httptoolkit-server/blob/405ec0a4f165853ab0b90172710d4455559f4519/src/interceptors/android/adb-commands.ts#L256-L361">httptoolkit-server:adb-commands.ts#L256-L361</a>.</p> <p>If you have a CA certificate, you can do this for yourself on any device with root access, to temporarily add new CAs that'll be trusted like any other CA prebundled on the device.</p> <p>If you are doing this for yourself though, be careful around permissions, as the default for ADB-pushed files is very relaxed. 
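</p> <p>Expressed as raw shell, the core of the injection steps above looks roughly like this (a simplified sketch - the full linked script handles errors, mount namespaces and newer Android versions properly, and the injected CA file must be named with the certificate's subject hash, as covered earlier):</p> <pre><code class="bash language-bash"># All run as root on the device (e.g. in adb shell, after adb root):
mkdir -p /data/local/tmp/certs
cp /system/etc/security/cacerts/* /data/local/tmp/certs/
mount -t tmpfs tmpfs /system/etc/security/cacerts
mv /data/local/tmp/certs/* /system/etc/security/cacerts/
mv /data/local/tmp/&lt;subject-hash&gt;.0 /system/etc/security/cacerts/
chown root:root /system/etc/security/cacerts/*
chmod 644 /system/etc/security/cacerts/*
chcon u:object_r:system_file:s0 /system/etc/security/cacerts/*
</code></pre> <p>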
If the CA you inject or the copied system certificates are globally writable, it'd be theoretically possible for another app on the device to change or add a CA during this process, and sneakily get visibility into all HTTPS traffic on the device for itself without you realizing or granting it root access.</p> <p>All put together, this injects a system certificate without needing emulator startup arguments, and works 100% automatically &amp; immediately, without even needing to reboot. As a nice bonus, the tmpfs &amp; bind mounts disappear on reboot, so everything is cleaned up automatically afterwards, and you only trust the injected CA temporarily (wherever possible, it's always a good idea to <a href="https://httptoolkit.com/blog/debugging-https-without-global-root-ca-certs/">limit and/or avoid global developer CAs</a>).</p> <h3 id="interceptinghttpsonnonrooteddevices">Intercepting HTTPS on non-rooted devices</h3> <p>If you don't have root access, you can't do this. Instead, the best you can do is to install the certificate into the user store. To do so:</p> <ul> <li><p>If you're setting the device up manually:</p> <ul> <li>Download the certificate onto your device.</li> <li>Go to "Encryption &amp; Credentials" in your device security settings.</li> <li>Select "Install a certificate", then "CA Certificate".</li> <li>Open the downloaded certificate, and follow the confirmation prompts.</li></ul></li> <li><p>If you're automating/scripting this:</p> <ul> <li><p>On Android up to and including Android 10, you can use <a href="https://developer.android.com/reference/android/security/KeyChain#createInstallIntent()">KeyChain.createInstallIntent()</a> to prompt users to trust your CA certificate in your app. There'll be some warnings there, and they'll need to set or confirm the device pin to do so, but it's very straightforward. 
You can see HTTP Toolkit's code in <a href="https://github.com/httptoolkit/httptoolkit-android/blob/03f10f2eff28f30d8cdbfb9fe86a075891714172/app/src/main/java/tech/httptoolkit/android/MainActivity.kt#L603-L606">httptoolkit-android:MainActivity.kt#L603-L606</a>.</p></li> <li><p>From Android 11 onwards, <a href="https://httptoolkit.com/blog/android-11-trust-ca-certificates/">prompting CA certificate installation is blocked</a>, so you're in trouble. You can't launch the prompt to trust a CA directly, and it must be installed in the settings completely manually. If you do try to use the <code>createInstallIntent()</code> API to install the certificate, it just shows "Can't install CA certificates: CA certificates can put your privacy at risk and must be installed in Settings".</p> <p>If you download the CA certificate to the device though, it's easy enough to explain the process to users, and you can see how HTTP Toolkit does that in <a href="https://github.com/httptoolkit/httptoolkit-android/blob/03f10f2eff28f30d8cdbfb9fe86a075891714172/app/src/main/java/tech/httptoolkit/android/MainActivity.kt#L615-L661">httptoolkit-android:MainActivity.kt#L615-L661</a>.</p></li></ul></li> </ul> <p>With that done, you can intercept HTTPS from Chrome and other Chromium-based browsers, and you can intercept traffic from apps that explicitly opt in.</p> <p>If you're debugging your own app, that's fine, since it's just a few lines of XML to do so:</p> <pre><code class="xml language-xml">&lt;?xml version="1.0" encoding="utf-8"?&gt;
&lt;network-security-config&gt;
    &lt;base-config&gt;
        &lt;trust-anchors&gt;
            &lt;certificates src="system" /&gt;
            &lt;certificates src="user" overridePins="true" /&gt;
        &lt;/trust-anchors&gt;
    &lt;/base-config&gt;
&lt;/network-security-config&gt;
</code></pre> <p>That XML trusts the user CA certificates installed on the device, if saved as an XML resource like <code>network_security_config.xml</code> and referenced with <code>android:networkSecurityConfig="@xml/network_security_config"</code> in the <code>&lt;application&gt;</code> element of your app manifest.</p> <p>With that and a certificate in your user store, you're done! But only if you're trying to intercept Chrome or your own apps…</p> <h3 id="interceptinghttpsfroma3rdpartyapponnonrooteddevices">Intercepting HTTPS from a 3rd party app on non-rooted devices</h3> <p>One last case then: what if you have a non-rooted phone, and you want to intercept HTTPS from an app that doesn't trust the user CA store, and which you can't easily edit the source code for yourself? For example, if you want to inspect HTTPS traffic from somebody else's app, or from your own existing production builds?</p> <p>This is a tricky case on Android nowadays, but it's still often possible. You'll still need to edit the app, but it turns out you can do so without directly rebuilding from source.</p> <p>First, you need to download the APK for the app. APKs aren't available for direct download from Google Play, but they are often available from various alternative sites, with <a href="https://apkpure.com">ApkPure.com</a> being the most well known. You can search for most Google Play apps there to download an APK and get started.</p> <p>Once you have an APK, you need to:</p> <ul> <li>Decode &amp; unpack the contents.</li> <li>Patch the network config within to trust the user certificate store.</li> <li>Repack an APK from the patched contents.</li> <li>Sign the APK with a valid certificate so it can be installed on the device.</li> </ul> <p>That can be quite complicated, but fortunately there's a tool called <a href="https://github.com/shroudedcode/apk-mitm#apk-mitm">apk-mitm</a> that can do all of this for you! 
In addition, it strips pinned certificates and can even automatically patch apps using the newer Android App Bundle format.</p> <p>If you have Node.js (10+) &amp; Java (8+) installed, installing and using this just requires:</p> <pre><code class="bash language-bash">$ npx apk-mitm ./downloaded-app.apk
</code></pre> <p>That should complete the above steps and give you a patched APK. If you've already trusted your CA certificate in your device's user store, as in the previous section, just install the patched app with <code>adb install ./patched-app.apk</code> and you're away.</p> <hr> <p>Hopefully that's a good intro to managing HTTPS trust on Android, and using &amp; abusing it to intercept, inspect and rewrite HTTPS traffic.</p> <p>Want to see this in action and see exactly what HTTPS your apps and device are sending? Give <strong><a href="https://httptoolkit.com/android/">HTTP Toolkit</a></strong> a go now.</p> <p>Want to know more about how this all works? HTTP Toolkit is 100% open-source, so feel free to check out <a href="https://github.com/httptoolkit">HTTP Toolkit on GitHub</a>, and do <a href="https://httptoolkit.com/contact/">get in touch</a> if you have any questions or feedback.</p>]]></description>
            <link>https://httptoolkit.com/blog/intercepting-android-https/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/intercepting-android-https/</guid>
            <pubDate>Thu, 05 Nov 2020 16:45:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Migrating a JS project from Travis to GitHub Actions]]></title>
            <description><![CDATA[<p>Travis has been the most popular place to build open-source code for a long time, but the world is moving on. GitHub Actions is modern, tightly integrated with the most popular code hosting platform in the world, flexible, fast, and free (for public repos).</p> <p>Travis has been popular for years though, so there are still a lot of projects being built there, including many of <a href="https://httptoolkit.com/">HTTP Toolkit</a>'s own repos.</p> <p>Last week, I decided to bite the bullet, and start migrating. Travis was having a particularly bad build backlog day, and HTTP Toolkit is entirely open source on GitHub already, so it's super convenient. I've been looking longingly at GitHub Actions builds on other projects for a little while, and I'd already seen lots of useful extensions in the <a href="https://github.com/marketplace?type=actions">marketplace</a> of drop-in action steps that'd make my life much easier.</p> <p>Unfortunately, I knew very little about GitHub Actions, and I already had some Travis configuration that worked. In this post, I want to share how I converted my JavaScript (well, TypeScript) build from Travis to GitHub, so you can do the same.</p> <h2 id="thegoal">The Goal</h2> <p>I decided to start with the simplest Travis setup I had: the <a href="https://github.com/httptoolkit/httptoolkit-ui">HTTP Toolkit UI repo</a>.</p> <p>Here's the previous <code>travis.yml</code> file:</p> <pre><code class="yml language-yml">dist: xenial
sudo: required
language: node_js
node_js:
    - '14'
install:
    - npm ci
services:
    - xvfb
before_script:
    - sudo chown root /opt/google/chrome/chrome-sandbox
    - sudo chmod 4755 /opt/google/chrome/chrome-sandbox
script:
    - npm test
addons:
    chrome: stable
</code></pre> <p>There's a few notable things here:</p> <ul> <li>I want to build with a specific node version.</li> <li>I need Chrome &amp; XVFB installed for testing with Puppeteer &amp; Karma.</li> <li>There's some existing Travis-specific workarounds (<code>before_script</code>) in here.</li> <li>The build itself is just <code>npm ci</code> to install dependencies and then <code>npm test</code>.</li> <li>Although not shown here, some of the npm dependencies include native node extensions, and need a working native build environment.</li> </ul> <p>One other feature I'd really like, and which I'd strongly recommend for everybody, is the option to <strong>run an equivalent CI environment locally</strong>.</p> <p>Yes, you can install and run tests on your own machine normally, but especially with more complicated builds you'll quickly discover that that isn't a <em>perfect</em> match for the cloud build environment, and you'll occasionally hit remote failures that don't reproduce in your own environment. Slightly different versions of Chrome or Node, leftover git-ignored files and build output, and other environment-specific details can cause havoc.</p> <p>Being able to quickly reproduce the exact cloud build environment locally makes debugging those issues much less frustrating!</p> <h2 id="gettingstarted">Getting Started</h2> <p>We'll start with GitHub's <a href="https://docs.github.com/en/free-pro-team@latest/actions/guides/building-and-testing-nodejs">JavaScript action getting started guide</a>.</p> <p>That summarizes the options available, and with a little wrangling that quickly gets us to a basic workflow (which I've saved as <code>.github/workflows/ci.yml</code>) matching the essential steps of the Travis config:</p> <pre><code class="yml language-yml">name: CI
on: push
jobs:
  build:
    name: Build &amp; test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      # Install Node 14
      - uses: actions/setup-node@v1
        with:
          node-version: 14
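
      # Optional extra (not in the original config): an actions/cache step
      # could slot in here, caching ~/.npm between runs to speed up 'npm ci':
      #   - uses: actions/cache@v2
      #     with:
      #       path: ~/.npm
      #       key: npm-${{ hashFiles('**/package-lock.json') }}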

      # Install &amp; build &amp; test:
      - run: npm ci
      - run: npm test
</code></pre> <p>Very clear and easy: every time code is pushed, check it out and use node 14 to install dependencies &amp; run the tests.</p> <p>Note that I've skipped the Chrome &amp; XVFB steps here entirely - we don't need them. The GitHub base image (<code>ubuntu-latest</code>) includes Chrome set up for testing and enough of a native build environment that you can immediately install native modules and get going. Great! You can see the full standard list of what's available in each image here: https://docs.github.com/en/free-pro-team@latest/actions/reference/specifications-for-github-hosted-runners#supported-software.</p> <p>You may find there's one small code change required though: you need to pass <code>no-sandbox</code> as an option to Chrome, if you're not already using it. This ensures Chrome runs happily in containerized environments like this (I think the <code>chrome-sandbox</code> steps in the Travis config were actually old workarounds for this on Travis).</p> <p>In <a href="https://github.com/httptoolkit/httptoolkit-ui/blob/64b89dec90f5ea86290b4091008974b06639d519/test/unit/karma.conf.js#L30-L36">my Karma config</a>, using headless Chrome, that looks like this:</p> <pre><code class="javascript{5} language-javascript{5}">browsers: ['ChromeHeadlessNoSandbox'],
customLaunchers: {
    ChromeHeadlessNoSandbox: {
        base: 'ChromeHeadless',
        flags: ['--no-sandbox']
    }
}
</code></pre> <p>For Puppeteer, my <a href="https://github.com/httptoolkit/httptoolkit-ui/blob/64b89dec90f5ea86290b4091008974b06639d519/test/integration/smoke-test.spec.ts#L24-L29">browser launch code</a> looks like this:</p> <pre><code class="javascript{3} language-javascript{3}">puppeteer.launch({
    headless: true,
    args: ['--no-sandbox']
}),
</code></pre> <p>Very easy. A quick <code>git push</code> and you'll see your job start running on GitHub's cloud runners straight away.</p> <p>But we wanted reproducible local builds too…</p> <h2 id="buildlikealocal">Build Like a Local</h2> <p>Being able to locally reproduce your CI builds is essential for a healthy CI workflow, and with GitHub Actions it's already very easy.</p> <p>To run builds locally, we can use <a href="https://github.com/nektos/act">act</a>. GitHub Actions is built on Docker, starting the specified images and injecting configuration into containers to run your build. Act does the exact same thing: parsing your workflow and automating Docker on your local machine to build in the exact same way.</p> <p>To try this out:</p> <ul> <li>Install <a href="https://www.docker.com/get-started">Docker</a>, if you don't have it already</li> <li>Install <a href="https://github.com/nektos/act#installation">act</a></li> <li>Run <code>act</code></li> </ul> <p>That will automatically find <code>.github/workflows/*.yml</code> files in your current directory, and attempt to run them. Unfortunately, in my project that doesn't work so well:</p> <pre><code>| &gt; registry-js@1.12.0 install /github/workspace/node_modules/registry-js
| &gt; prebuild-install || node-gyp rebuild
|
| prebuild-install WARN install No prebuilt binaries found (target=14.14.0 runtime=node arch=x64 libc= platform=linux)
| gyp ERR! find Python
| gyp ERR! find Python Python is not set from command line or npm configuration
| gyp ERR! find Python Python is not set from environment variable PYTHON
| gyp ERR! find Python checking if "python" can be used
| gyp ERR! find Python - "python" is not in PATH or produced an error
| gyp ERR! find Python checking if "python2" can be used
| gyp ERR! find Python - "python2" is not in PATH or produced an error
| gyp ERR! find Python checking if "python3" can be used
| gyp ERR! find Python - "python3" is not in PATH or produced an error
| gyp ERR! find Python
| gyp ERR! find Python **********************************************************
| gyp ERR! find Python You need to install the latest version of Python.
| gyp ERR! find Python Node-gyp should be able to find and use Python. If not,
| gyp ERR! find Python you can try one of the following options:
| gyp ERR! find Python - Use the switch --python="/path/to/pythonexecutable"
| gyp ERR! find Python   (accepted by both node-gyp and npm)
| gyp ERR! find Python - Set the environment variable PYTHON
| gyp ERR! find Python - Set the npm configuration variable python:
| gyp ERR! find Python   npm config set python "/path/to/pythonexecutable"
| gyp ERR! find Python For more information consult the documentation at:
| gyp ERR! find Python https://github.com/nodejs/node-gyp#installation
| gyp ERR! find Python **********************************************************
| gyp ERR! find Python
| gyp ERR! configure error
| gyp ERR! stack Error: Could not find any Python installation to use
| gyp ERR! stack     at PythonFinder.fail (/opt/hostedtoolcache/node/14.14.0/x64/lib/node_modules/npm/node_modules/node-gyp/lib/find-python.js:307:47)
| gyp ERR! stack     at PythonFinder.runChecks (/opt/hostedtoolcache/node/14.14.0/x64/lib/node_modules/npm/node_modules/node-gyp/lib/find-python.js:136:21)
| gyp ERR! stack     at PythonFinder.&lt;anonymous&gt; (/opt/hostedtoolcache/node/14.14.0/x64/lib/node_modules/npm/node_modules/node-gyp/lib/find-python.js:179:16)
| gyp ERR! stack     at PythonFinder.execFileCallback (/opt/hostedtoolcache/node/14.14.0/x64/lib/node_modules/npm/node_modules/node-gyp/lib/find-python.js:271:16)
| gyp ERR! stack     at exithandler (child_process.js:315:5)
| gyp ERR! stack     at ChildProcess.errorhandler (child_process.js:327:5)
| gyp ERR! stack     at ChildProcess.emit (events.js:315:20)
| gyp ERR! stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:275:12)
| gyp ERR! stack     at onErrorNT (internal/child_process.js:465:16)
| gyp ERR! stack     at processTicksAndRejections (internal/process/task_queues.js:80:21)
| gyp ERR! System Linux 4.15.0-121-generic
| gyp ERR! command "/opt/hostedtoolcache/node/14.14.0/x64/bin/node" "/opt/hostedtoolcache/node/14.14.0/x64/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
| gyp ERR! cwd /github/workspace/node_modules/registry-js
| gyp ERR! node -v v14.14.0
| gyp ERR! node-gyp -v v5.1.0
| gyp ERR! not ok
| npm ERR! code ELIFECYCLE
| npm ERR! errno 1
| npm ERR! registry-js@1.12.0 install: `prebuild-install || node-gyp rebuild`
| npm ERR! Exit status 1
</code></pre> <p>Whilst <code>act</code> runs build steps just like GitHub Actions does, it doesn't use the exact same base image (in part because the same image naively built locally would be <a href="https://github.com/nektos/act/issues/196#issuecomment-619735743">50GB</a>!). There's a couple of options:</p> <ul> <li><p>If you're only using basic features (normal node modules, and running <code>node</code> scripts), <code>act</code> will work out of the box and you're all good.</p></li> <li><p>You can use act's own <a href="https://hub.docker.com/r/nektos/act-environments-ubuntu/tags">full-fat image</a>, which includes all the standard GitHub tools in a somewhat smaller image size. This is opt-in, because it's still an up-front 6GB download (and then 18GB locally, once it's uncompressed) but it'll immediately give you everything you need from the GitHub Actions cloud environment.</p> <p>To use this, you just need to map <code>ubuntu-latest</code> (the GitHub base runner) to the published image, with:</p> <pre><code>act -P ubuntu-latest=nektos/act-environments-ubuntu:18.04
</code></pre></li> <li><p>If you're familiar with Docker, you can build your own base image including just the extra tools you need. This gives you a convenient matching environment (within the selected subset of tools) with none of the disk space &amp; download hassle.</p> <p>This is what I've done for HTTP Toolkit. The <a href="https://github.com/httptoolkit/act-build-base/blob/main/Dockerfile">dockerfile</a> directly runs the setup scripts from the act base image repo (in turn generated from GitHub's own setup scripts), but only runs the ones I care about: <code>build-essentials</code> (for native builds) and Chrome. That shrinks it down to a mere 300MB download, and below 1GB on disk.</p> <p>You can do this for yourself, customizing your own image, or if you need the exact same customizations you can use the HTTP Toolkit image with:</p> <pre><code>act -P ubuntu-latest=httptoolkit/act-build-base
</code></pre> <p>With this approach, it's possible for your base image to diverge in behaviour from the GitHub runner: you're running the same setup scripts, but only the ones you include, so if you skip a script that would affect your build then you could see differences here. To <em>guarantee</em> reproducibility, you can fix this by setting <code>container: httptoolkit/act-build-base</code> (for the HTTP Toolkit image) in the job in your GitHub workflow, thereby ensuring you use the exact same image in both places.</p></li> </ul> <p>If you do need one of these non-default base image options, you don't have to specify the <code>-P</code> argument every time. You can create an <code>.actrc</code> file in the root of your project that sets your default arguments (HTTP Toolkit UI's is <a href="https://github.com/httptoolkit/httptoolkit-ui/blob/master/.actrc">here</a>).</p> <p>With that done, we can reproduce remote GitHub Actions builds locally any time with just a quick <code>act</code>!</p> <h2 id="goingfurther">Going Further</h2> <p>That should give you enough to get most simple JavaScript or Node projects set up with GitHub Actions, locally and remotely. If you need a full example, feel free to take a look at the <a href="https://github.com/httptoolkit/httptoolkit-ui">HTTP Toolkit UI repo</a>. For me, this has dramatically sped up builds &amp; CI feedback, mainly because builds start much faster, but it also seems to knock about 10% off the runtime itself.</p> <p>Now the real fun begins though, as you can begin to extend this setup. Some more bonus steps you might want to consider:</p> <ul> <li>Set up caching, to speed up slow <code>npm install</code> steps, with <a href="https://github.com/marketplace/actions/cache"><code>actions/cache</code></a>.
GitHub even have a <a href="https://github.com/actions/cache/blob/main/examples.md#node---npm">ready-to-use example for npm</a>.</li> <li>Store build artifacts, as output attached to the workflow, using <a href="https://github.com/marketplace/actions/upload-a-build-artifact"><code>actions/upload-artifact</code></a>.</li> <li>Create GitHub releases from content automatically, with <a href="https://github.com/actions/create-release"><code>actions/create-release</code></a>.</li> <li>Deploy generated content to GitHub Pages, with <a href="https://github.com/marketplace/actions/github-pages-action"><code>peaceiris/actions-gh-pages</code></a>.</li> <li>Add a badge to your readme, with a sprinkle of markdown: <code>[![Build Status](https://github.com/$USER/$REPO/workflows/$WORKFLOW/badge.svg)](https://github.com/$USER/$REPO/actions)</code></li> </ul> <p>Have further questions or suggestions? Get in touch <a href="https://twitter.com/pimterry">on Twitter</a> or <a href="https://httptoolkit.com/contact/">send me a message</a>.</p> <p><em>Struggling to debug your code after failing builds, or want to test complicated HTTP interactions locally? Intercept, inspect & mock HTTP from anything with <strong><a href="https://httptoolkit.com/">HTTP Toolkit</a></strong>.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/migrating-javascript-from-travis-to-github-actions/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/migrating-javascript-from-travis-to-github-actions/</guid>
            <pubDate>Tue, 27 Oct 2020 11:45:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How to Debug Any CORS Error]]></title>
            <description><![CDATA[<p>Your request is hitting an error due to CORS. Not all is lost! Most CORS errors are quick &amp; easy to debug and fix, once you understand the basics. Let's sort it out.</p> <p>You know you're hitting a CORS error when you see error messages like:</p> <blockquote> <p>Access to fetch at 'https://example.com' from origin 'http://localhost:8000' has been blocked by CORS policy.</p> <p>No 'Access-Control-Allow-Origin' header is present on the requested resource</p> <p>Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://example.com/</p> <p>Response to preflight request doesn't pass access control check</p> <p>The value of the 'Access-Control-Allow-Origin' header in the response must not be the wildcard '*' when the request's credentials mode is 'include'</p> <p>Method PUT is not allowed by Access-Control-Allow-Methods in preflight response.</p> <p>Request header field custom is not allowed by Access-Control-Allow-Headers in preflight response.</p> </blockquote> <p>In each of these cases, you've asked JavaScript running in your page to send a request to a different origin, and at some stage the browser is refusing to do what you want.</p> <h2 id="whatiscors">What is CORS?</h2> <p>When you include JavaScript in a web page, you're running code on your user's computer, inside their browsing session.</p> <p>That's a lot of power, and browsers are designed to protect users from the risks of this. CORS is one of these protections, aiming to protect the user and the services they use from two main attacks:</p> <ul> <li>CORS stops you from using the user's existing login session (their cookies and other cached authentication details) when communicating with other servers. JavaScript on your web page shouldn't be able to send requests to the Facebook API using their existing Facebook session. 
Without CORS, any web page could talk to other servers as you.</li> <li>CORS stops you from talking to servers that might only be accessible from the user's machine, but which aren't accessible publicly. Your web page should not be able to send requests to <code>my-intranet-server.local</code>, which might be an internal company server or your home router, and it should not be able to talk to servers that are listening only for localhost requests. Servers like these are often unauthenticated and very trusting, because they aren't connected to the public internet. Without CORS, any web page you visit could access them.</li> </ul> <p>This only applies to cross-origin requests, e.g. requests from <code>https://example.com</code> to <code>https://google.com</code>. The protocol, domain, and port all count as part of a URL's origin, but the path does not, so <code>https://example.com/abc</code> and <code>https://example.com/def</code> have the same origin, but <code>http://localhost:123</code> and <code>http://localhost:456</code> do not.</p> <p>CORS protects against the above attacks by requiring the target server to opt in to receiving dangerous requests from the source server, and to opt in to allowing pages from other origins to read responses. The Facebook API and your local network servers can accept requests from web pages running on other origins if they want to, but only if they agree.</p> <h2 id="whydoesntmycorswork">Why doesn't my CORS work?</h2> <p>Your CORS request is failing because you're sending a request that the target server hasn't agreed to allow.</p> <p>There are two classes of CORS request:</p> <ul> <li><p>'Simple' cross-origin requests. These are basic requests that use no unsafe headers, don't stream requests or responses, and only use HEAD, GET or POST methods (with limited safe content types). Any request that's possible here would also be possible by e.g.
loading an image or posting a form to the cross-origin server (and we can't stop those, for huge backwards compatibility reasons).</p> <p>You can always send simple requests, but you might not be allowed to read the response.</p></li> <li><p>'Preflighted' cross-origin requests. These are more complex requests that aren't easy to send in other ways. A 'preflight' request will be sent to ask the server for permission before sending any of these requests, and if it's rejected, you won't be able to send the request at all.</p> <p>If the preflight request is successful, the real request is sent, and the final response to that still has to follow the same rules as a 'simple' response for you to be allowed to read it.</p></li> </ul> <p>When a request is preflighted, before sending the real request the browser sends an OPTIONS request with headers explaining the real request that it wants to send. It expects a response including headers that explicitly allow the real request.</p> <p>There are three ways that this might hit an error:</p> <ol> <li>You're sending a simple request, which is sent immediately, but the headers on the response don't allow you to read it.</li> <li>You're sending a preflighted request, and the headers on the preflight response don't allow you to send the real request.</li> <li>You're sending a preflighted request, the preflight went OK and the request was sent, but the headers on the final response for the real request don't allow you to read it.</li> </ol> <p>The browser error message should show you which is happening for you. You can tell whether your request is being preflighted by looking for an OPTIONS request that's sent immediately before it.</p> <p>The rules for the final (after preflight, if applicable) response are:</p> <ul> <li>The response must include an <code>Access-Control-Allow-Origin</code> header, whose value either matches the page's origin or is <code>*</code>.
The page's origin is sent in the request in an <code>Origin</code> header.</li> <li>If the request included credentials (e.g. <code>fetch(url, { credentials: 'include' })</code>) then the response headers must include <code>Access-Control-Allow-Credentials: true</code>, and the <code>Access-Control-Allow-Origin</code> header must match <em>exactly</em> (i.e. <code>*</code> is not allowed).</li> </ul> <p>If the response doesn't follow those rules, then the server hasn't opted in to your request, and you won't be allowed to read the response.</p> <p>If you're in cases 1 or 3, you must be breaking one of these rules.</p> <p>The rules for the preflight request are:</p> <ul> <li>The preflight response must include an <code>Access-Control-Allow-Origin</code> header, whose value either matches the page's origin or is <code>*</code>. The page's origin is sent in the preflight request in an <code>Origin</code> header.</li> <li>If the page wants to send custom headers, then it will include <code>Access-Control-Request-Headers</code> listing the headers in the preflight OPTIONS request, and the server must include an <code>Access-Control-Allow-Headers</code> header that includes all those headers in the response. <code>*</code> can also be used here, but it won't match an <code>Authorization</code> header - that must always be listed explicitly.</li> <li>If the page wants to use a non-simple HTTP method, it will include <code>Access-Control-Request-Method</code> in the preflight OPTIONS request, and the server must include an <code>Access-Control-Allow-Methods</code> header that includes that method in the response.</li> <li>If the page wants to send credentials (e.g. <code>fetch(url, { credentials: 'include' })</code>) the response must include an <code>Access-Control-Allow-Credentials: true</code> header, and the <code>Access-Control-Allow-Origin</code> header must match <em>exactly</em> (i.e.
<code>*</code> is not allowed).</li> </ul> <p>If your preflight OPTIONS response doesn't follow these rules, then you won't be allowed to send the real request at all.</p> <p>If you're in case 2, you must be breaking one of these rules.</p> <p>It's also possible that you're in case 2, but you actually don't want to read the response - you just want to send the request. To do that, you'll need to simplify your request such that it's a simple request. You can use <code>{ mode: 'no-cors' }</code> on your fetch options to enforce this (but note that this doesn't change the rules, it just enforces that it's a simple request where you can't read the result).</p> <h2 id="howcanifixmycorserror">How can I fix my CORS error?</h2> <p>To know exactly why your request is failing, you need to inspect the traffic itself, find where you're breaking the rules above, and then either:</p> <ul> <li>Change the request to make it a simple request</li> <li>Change the server's response to follow the rules above</li> <li>If all else fails, proxy the request through your own server on your own origin, so it's not a cross-origin request (proxying avoids the attacks above, because it doesn't let you use the cookies or authentication details from the user's browser, and it requires the target server to be accessible from your source server)</li> </ul> <p>To inspect the traffic, you can use your browser's built-in tools, but it's usually easier to use a dedicated HTTP debugger like <a href="https://httptoolkit.com/">HTTP Toolkit</a>. Dedicated tools make it much easier to see the data, rather than (for example) Chrome's very cramped and fiddly network tab, and you can also breakpoint responses and edit the headers to test how the browser will handle changes without actually changing your server.
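To make "change the server's response" concrete, here's a minimal sketch in plain JavaScript of how a server could compute CORS headers following the rules described above. This is not from any real library - the ALLOWED_ORIGINS list and the corsHeaders function are hypothetical names for illustration only:

```javascript
// Hypothetical allow-list of origins this server trusts:
const ALLOWED_ORIGINS = ['http://localhost:8000', 'https://app.example.com'];

// Given the request's Origin header (and, for a preflight OPTIONS request,
// the Access-Control-Request-* values), compute the CORS response headers:
function corsHeaders(origin, requestMethod, requestHeaders) {
  // For an untrusted origin, send no CORS headers at all: the browser
  // will then block the page from reading the response.
  if (!ALLOWED_ORIGINS.includes(origin)) return null;

  const headers = {
    // Echo the specific origin, rather than '*', so that requests
    // with credentials are allowed too:
    'Access-Control-Allow-Origin': origin,
    'Access-Control-Allow-Credentials': 'true',
    // Tell caches this response varies by origin, so a response for
    // one origin is never reused for a different one:
    'Vary': 'Origin'
  };

  // For a preflight request, also approve the method and headers that
  // the browser says the real request will use:
  if (requestMethod) headers['Access-Control-Allow-Methods'] = requestMethod;
  if (requestHeaders) headers['Access-Control-Allow-Headers'] = requestHeaders;

  return headers;
}
```

A real server would send these headers on both the preflight OPTIONS response and the final response - remember that opting in during preflight alone isn't enough.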
Also, <a href="https://httptoolkit.com/blog/chrome-79-doesnt-show-cors-preflight/">some Chrome versions</a> don't show all CORS requests.</p> <p>Hopefully, once you examine your CORS requests &amp; responses, it's clear where you're breaking the rules above.</p> <p>If not, try walking through <a href="https://httptoolkit.com/will-it-cors/">Will It CORS</a>. This is a self-explaining implementation of the CORS rules: you can input step by step what you're trying to do, and it'll tell you what will happen and why, and how you can change it.</p> <p>There are also a few common mistakes that you should watch out for:</p> <ul> <li>Trying to request content from another origin that isn't explicitly available cross-origin. If it's not your server, and it doesn't actively want CORS requests, you won't be able to work around most issues: you need to proxy the request, ask the owner to allow it, or do something entirely different.</li> <li>Always returning <code>*</code> for <code>Access-Control-Allow-Origin</code>, and then trying to send credentials.</li> <li>Adding CORS headers for preflight OPTIONS requests, but forgetting to also include CORS headers on the final response.</li> <li>Unnecessarily sending custom request headers. This will trigger a preflight request. You can often get by just using the <a href="https://developer.mozilla.org/en-US/docs/Glossary/CORS-safelisted_request_header">CORS-safe request headers</a> instead, or moving request data into the body of your request.</li> <li>Incorrectly caching CORS response headers regardless of the request's origin, by not using <code>Vary: Origin</code>. If you do this then responses for requests from one origin may be cached and returned for later requests from a different origin. That mismatched data can quickly break things.</li> <li>Trying to access response headers without including an <code>Access-Control-Expose-Headers</code> header.
In this case, all headers except the <a href="https://developer.mozilla.org/en-US/docs/Glossary/CORS-safelisted_response_header">CORS-safe response headers</a> will be unexpectedly undefined, even though they were sent by the server.</li> <li>Sending cross-origin mixed-content requests (a request from <code>https://...</code> to <code>http://...</code>). These will always be blocked, regardless of the details, as insecure content like this is never allowed on HTTPS origins. There's not much you can do about this, other than changing to use HTTPS on both servers.</li> </ul> <p>That covers the core of CORS, how it can go wrong, and how to fix it. Have more questions? Get in touch <a href="https://twitter.com/pimterry">on Twitter</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/how-to-debug-cors-errors/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/how-to-debug-cors-errors/</guid>
            <pubDate>Wed, 07 Oct 2020 14:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Android 11 tightens restrictions on CA certificates]]></title>
<description><![CDATA[<p>Your trusted Certificate Authorities (CAs) are the organizations that you trust to guarantee the signatures of your encrypted traffic and content. That's a lot of power, and the list of trusted authorities is dangerous to mess around with. Nonetheless, it's also something that power users might want to configure, for Android testing, for app debugging, for reverse engineering or as part of some enterprise network configurations.</p> <p>Android has tightly restricted this power for a while, but in Android 11 (<a href="https://developer.android.com/about/versions/11">released this week</a>) it locks down further, making it impossible for any app, debugging tool or user action to trigger a prompt to install a CA certificate, even to the untrusted-by-default user-managed certificate store. The only way to install any CA certificate now is by using a button hidden deep in the settings, on a page that apps cannot link to.</p> <p>To be clear, carefully managing the trusted CAs on Android devices is important! Adding a CA should not be easy to do by accident or unknowingly. Protecting users from themselves is absolutely necessary here, and it's a hard problem.</p> <p>That said, there are many legitimate use cases where you want to be able to choose which CAs you trust, and that just got much harder. There's a balance here to manage, and I'm not sure Android has made the right choice.</p> <p>Let's dig into the details:</p> <h2 id="howdidandroidcacertificatemanagementworkuntilnow">How did Android CA certificate management work until now?</h2> <p>Until now, an app could ask a user to trust a CA certificate in the user certificate store (but not the system store), using the <a href="https://developer.android.com/reference/android/security/KeyChain#createInstallIntent">KeyChain.createInstallIntent()</a> API method.
Similarly, the operating system would offer to trust a CA certificate if one was manually opened on the device from the filesystem.</p> <p>These certificate trust prompts came with a variety of loud warnings &amp; confirmations, and mandated setup of a device pin or other screen lock before you could complete them, if one wasn't already set. It wasn't possible to do this accidentally, and it was hard to trick users into accepting these scary prompts (although probably not impossible).</p> <p>That only applied to the user certificate store. This store, in case you're not familiar, differs significantly from the Android system-wide certificate store, and since Android 7 (Nougat, released in 2016) it's been impossible to install any CA certificates into the system store without fully rooting the device.</p> <p>The <em>system</em> store is used as the default to verify all certificates - e.g. for your apps' HTTPS connections - and as a normal user it's completely impossible to change the certificates there, and it has been for quite some time.</p> <p>Until now however, you could install to the <em>user</em> certificate store, which apps could individually opt into trusting, but which they don't trust by default.</p> <p>This was very useful!
This allowed developers to opt into this trust in their local builds to debug traffic, it allowed testers to automatically &amp; easily trust CA certificates so they could mock &amp; verify HTTPS traffic in manual &amp; automated testing, and it was used by a wide variety of debugging tools (including <strong><a href="https://httptoolkit.com/android/">HTTP Toolkit</a></strong>) to easily let developers &amp; testers inspect &amp; rewrite their encrypted HTTPS traffic.</p> <p>Unfortunately, automating that setup is no longer possible on these devices, and each of these use cases will now require a series of fiddly manual steps that tools can't walk you through or help with.</p> <h2 id="whatschanged">What's changed?</h2> <p>In Android 11, the certificate installer now checks who asked to install the certificate. If it was launched by anybody other than the system's settings application, the certificate install is refused with an obscure alert message:</p> <blockquote> <p>Can't install CA certificates CA certificates can put your privacy at risk and must be installed in Settings</p> </blockquote> <p>This wasn't clearly announced anywhere, as far as I can tell. The only mention in the Android 11 release information is a small <a href="https://developer.android.com/work/versions/android-11#other">side note</a> in the enterprise features changelog, which notes that the <code>createInstallIntent()</code> API no longer works in some cases.</p> <p>In practice, this change means the certificate install API no longer works, opening certificate files no longer works, and it's impossible to initiate a certificate install even from ADB (the Android debugging tool).</p> <p>It is still possible to install certificates using the device management API, but only in the special case where your application is a pre-installed OEM app, marked during the device's initial setup as the 'device owner'.
If not, you're out of luck.</p> <p>In Android 11, to install a CA certificate, users need to manually:</p> <ul> <li>Open settings</li> <li>Go to 'Security'</li> <li>Go to 'Encryption &amp; Credentials'</li> <li>Go to 'Install from storage'</li> <li>Select 'CA Certificate' from the list of types available</li> <li>Accept a large scary warning</li> <li>Browse to the certificate file on the device and open it</li> <li>Confirm the certificate install</li> </ul> <p>Applications and automation tools can send you to the general 'Security' settings page, but no further: from there the user must go alone (fiddly if not impossible with test automation tools).</p> <p>More inconvenient still: with the existing APIs, the app could provide the certificate bytes directly, reading certificates from its own internal data or storage. Now, because the user must browse to it, the certificate has to be in the shared user-accessible storage on the device. This also risks it being rewritten by other apps on the device before it's trusted, if they have the permissions to write to shared folders (not default, but not uncommon), allowing those apps to sneak their own CA onto unsuspecting users' devices.</p> <p>While it's still possible to trust your own CAs on rooted devices, Android is also making <a href="https://www.xda-developers.com/safetynet-hardware-attestation-hide-root-magisk/">a parallel drive</a> for hardware attestation as part of SafetyNet on new OS releases &amp; devices, which will make this far harder.</p> <p>Hardware attestation makes it possible for Android apps to reliably know whether the OS on the device is the original installed by the OEM. Many apps use SafetyNet to block installs and usage on modified devices, and that doesn't just apply to secure banking apps: apps from Netflix to Pokemon Go to McDonald's require SafetyNet checks.
In a not-so-distant future, these and many other apps will be completely unusable on rooted devices, once hardware attestation becomes standard.</p> <p>Put together, this is not good. Android's been locking down on this for a while, but it really feels now like they're moving to a world where custom ROMs are cut off from much of the Android ecosystem, and official ROMs are completely locked down and inaccessible even to developers.</p> <h2 id="whatcanido">What can I do?</h2> <p>First up: add a star on the Android bug I've filed, suggesting an automatable ADB-based option for CA certificate management, for development use cases like these: <strong>https://issuetracker.google.com/issues/168169729</strong>.</p> <p>Once you've done that, in the meantime you have a few options:</p> <ul> <li>Accept that you need to manually install CA certificates, and do so/tell your users how to do so.</li> <li>Use a rooted device or emulator, and trust your certificate in the system store (you might be interested in <a href="https://httptoolkit.com/docs/guides/android/#adb-interception">how HTTP Toolkit does this</a>).</li> <li>Completely reset the device, preprovision your application (before initial account setup), and configure your application as the device owner with <code>dpm set-device-owner &lt;your app's device admin component&gt;</code></li> <li>Enable debugging on the device, connect to it with ADB, and manually inject touch events to automatically walk through the various settings screens.</li> <li>Avoid using Android 11 entirely.</li> </ul> <p>For now, <a href="https://httptoolkit.com/android/">HTTP Toolkit</a> takes options 1 and 2:</p> <ul> <li>For users using Android &lt; 11, it walks you through the automated setup prompts as before, all very conveniently.</li> <li>For users using Android 11 on unrooted standard devices, it downloads the certificate to your Downloads folder &amp; tells you how to do manual setup.</li> <li>For users on emulators and rooted devices, it 
automatically sets up a system certificate via ADB, transparently handling everything.</li> </ul> <p>Not as smooth as in previous versions, but manageable!</p> <p>These changes are important for Android to ensure it can protect average users from serious risks and attacks. At the same time though, it's important to balance that against allowing owners of devices freedom to configure those devices for themselves, and against allowing developers and other power users to access potentially dangerous functionality. Hopefully Android can find a path to support both.</p> <p><strong>Debugging Android apps, and want to inspect, rewrite & mock live traffic? <a href="https://httptoolkit.com/android/">Try out HTTP Toolkit</a>. Hands-free HTTP(S) interception & debugging for Android apps, web browsers, servers, microservices & more.</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/android-11-trust-ca-certificates/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/android-11-trust-ca-certificates/</guid>
            <pubDate>Thu, 10 Sep 2020 16:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[GraphQL the Simple Way, or: Don't Use Apollo]]></title>
<description><![CDATA[<p>The fundamentals of GraphQL are remarkably simple. Nonetheless, a busy hype train &amp; rocket-speed ecosystem means that building a GraphQL API in the real world can be a tricky balancing act of piling complex interacting components a mile high, none of which anybody fully understands.</p> <p>About 90% of this pile is built &amp; heavily promoted by a VC-funded company called <a href="https://www.apollographql.com/">Apollo</a>. Apollo describe themselves as a "data graph platform" who've built the self-described "industry-standard GraphQL implementation".</p> <p>Unfortunately, while I'm sure their platform is great, <strong>if you're setting up a fresh GraphQL API you should not start with Apollo</strong>. It certainly might be useful later, but on day 1 it's a trap, and you'll make your life simpler and easier if you avoid it entirely.</p> <p>Let's talk about why that is, what can go wrong, and what you should do instead.</p> <h2 id="theproblemwithapollo">The Problem with Apollo</h2> <p>In practice, "industry-standard GraphQL implementation" means <a href="https://www.npmjs.com/~apollo-bot">169 separate npm packages</a>, including:</p> <ul> <li>44 different server packages and <code>apollo-server-*</code> subpackages.</li> <li>7 different GraphQL 'transport layers', plus a long list of link layer extensions that build on top of this.</li> <li>8 code generation packages</li> <li>5 different <code>create-*</code> project setup packages</li> <li>3 different GraphQL Babel plugins (plus a Relay un-Babel plugin, so you can avoid using Babel for some specific cases).</li> <li>Much much more…</li> </ul> <p>The Apollo packages required to install the base <code>apollo-server</code> package suggested in their <a href="https://www.apollographql.com/docs/apollo-server/">Getting Started guide</a> include the "Apollo Studio" (née Apollo Graph Manager, née née Apollo Engine) <a
href="https://www.npmjs.com/package/apollo-engine-reporting">reporting engine</a>, which integrates your server with their cloud service, plus <a href="https://www.npmjs.com/package/apollo-engine-reporting-protobuf">extra protobuf definitions</a> on top of that for reporting to the cloud service with Protobuf. It includes <a href="https://www.npmjs.com/package/apollo-tracing">request tracing</a> for their own custom tracing format, <a href="https://www.npmjs.com/package/apollo-server-caching">multiple</a> <a href="https://www.npmjs.com/package/apollo-cache-control">different</a> custom caching packages, an <a href="https://www.npmjs.com/package/apollo-link">abstraction layer</a> that powers the many available transport link layers, an <a href="https://www.npmjs.com/package/apollo-datasource">abstraction layer</a> for connecting external data sources…</p> <p>In total, installing <code>apollo-server</code> actually installs 33 direct dependencies, and 179 packages overall, pulling in about 35MB of JavaScript.</p> <p>Once it's all put together, by itself this package creates a web server that can't do anything.</p> <p>If however you also use the (official, non-Apollo) <code>graphql</code> package, then you can now just about answer complex queries like:</p> <pre><code class="graphql language-graphql">{
  books {
    title
    author
  }
}
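
# (For illustration, not from the original article: a hypothetical
# schema that could answer this query might look like
#   type Book { title: String, author: String }
#   type Query { books: [Book] }
# )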
</code></pre> <p>To be clear, that <code>graphql</code> package is the official GraphQL JS implementation, which takes a schema, a query, and a resolver (in effect, a data set object), and gives you a result. I.e. it does <em>all</em> the GraphQL heavy lifting required to process a query like this, except the HTTP.</p> <p>I know I'm treating Apollo very harshly here, and that's not wholly fair. Most of their published packages do have cases where they're genuinely useful, many of the packages I'm counting are deprecated or duplicates (though published and often still well used), and as far as I'm aware everything they've released works perfectly effectively. I certainly don't think that <em>nobody</em> should use Apollo!</p> <p>I do 100% think though that Apollo shouldn't be anybody's GraphQL starting point, even though it's marketed as exactly that.</p> <p>They clearly want to be the entrance to the ecosystem, and of course they do! Ensuring a nascent fast-growing ecosystem depends on your free tools is a great startup play. They're the first result for "how to set up a graphql server", and either they or GraphQL-Yoga (another package on top of Apollo Server) are the suggested beginner option in most other articles on the topic, from <a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-a-graphql-server-in-node-js-with-apollo-server-and-sequelize">Digital Ocean's docs</a> to <a href="https://www.howtographql.com/graphql-js/1-getting-started/">howtographql.com</a>.
This isn't healthy.</p> <p><strong>If you're setting up a GraphQL API, you don't need all this.</strong> Pluggable transport layers and data sources, request tracing, cool GraphQL extensions like <code>@defer</code> and multi-layered caching strategies all have their place, but you don't want that complexity to start with, and they're not a requirement to make your simple API 'production ready'.</p> <p>It is great that Apollo makes these available to you, but they're features you can add in later, even if you don't start off using Apollo at all. A GraphQL schema is pretty standard, and entirely portable from the standard tools to Apollo (but not always back, if you start using Apollo's own extensions…).</p> <p>If I seem personally annoyed by this, it's because I am! I was burned by this myself.</p> <p><a href="https://httptoolkit.com/">HTTP Toolkit</a>'s internal APIs (e.g. <a href="https://github.com/httptoolkit/mockttp/blob/master/src/standalone/schema.gql">for defining HTTP mocking rules and querying traffic</a>) use GraphQL throughout. Those APIs started off built on Apollo, because that's the very widely recommended &amp; documented option. The overly complex setup required to do so caused a long stream of serious pain:</p> <ul> <li>Apollo's packages like to move fast and break things, and each often requires specific conflicting <code>graphql</code> peer dependencies, making updates remarkably <a href="https://github.com/httptoolkit/mockttp/issues/29">painful</a> all round.</li> <li>The base packages include a lot of features and subdependencies, as above, which in turn means a <em>lot</em> of vulnerability reports. 
Even if vulnerabilities aren't relevant or exploitable, downstream users of my packages very reasonably <a href="https://github.com/httptoolkit/mockttp/pull/37#issuecomment-661937314">don't want security warnings</a>, making keeping everything up to date obligatory.</li> <li>Some Apollo packages are, quietly, effectively unmaintained, meaning that conflicting dependencies there can block you from upgrading entirely, unless you fork the whole package yourself.</li> <li>Once you start having multiple interacting packages in your system that use Apollo, this gets even worse, as dependent packages need updating in lockstep, or your peer dependency interactions explode, scattering debris for miles.</li> <li>The packages involved are <em>huge</em>: <code>apollo-server</code> alone installs 35MB of JS, before you even start doing anything (that's v2, which is 2.3x the size of the last Apollo Server v1 release, but hey, the upgrade is unavoidable anyway, so who's counting?).</li> <li>These problems are getting worse. <a href="https://github.com/apollographql/apollo-server/issues/2360"><code>apollo-server</code> v3</a> is coming soon, with built-in support for GraphQL federation, non-Node backend platforms, and a new plugin API. Don't get me wrong, these features are very cool, but you don't need them all included by default in your starter project!</li> </ul> <p>It's not fun. However, there is an alternative:</p> <h2 id="howtobuildasimplegraphqlserverwithoutapollo">How to Build a Simple GraphQL Server (without Apollo)</h2> <p>To build a GraphQL API server, you really need just 3 things:</p> <ol> <li>A web server</li> <li>An executable GraphQL schema (i.e. 
a schema and a resolver) which together can answer GraphQL queries</li> <li>A request handler that can accept GraphQL requests, hand them to the schema, and return the results or errors</li> </ol> <p>I'm assuming you already have a preferred web server (if not, <a href="https://www.npmjs.com/package/express">Express</a> is an easy, convenient &amp; reliable choice). The official <a href="https://www.npmjs.com/package/graphql"><code>graphql</code></a> package can turn a string schema and a resolver object into an executable schema for you.</p> <p>That leaves the final step, which is easily handled with <a href="https://github.com/graphql/express-graphql"><code>express-graphql</code></a>: a simple Express middleware, with just 4 dependencies that handle content negotiation &amp; body parsing. That works for Express or Connect, and there are similar tiny packages available for most other servers.</p> <p>To set up your GraphQL server, install those packages:</p> <pre><code class="bash language-bash">npm install express graphql express-graphql
</code></pre> <p>And then set up a server that uses them:</p> <pre><code class="javascript language-javascript">const express = require('express');
const { graphqlHTTP } = require('express-graphql');
const { buildSchema } = require('graphql');

// Create a server:
const app = express();

// Create a schema and a root resolver:
const schema = buildSchema(`
    type Book {
        title: String!
        author: String!
    }

    type Query {
        books: [Book]
    }
`);

const rootValue = {
    books: [
        {
            title: "The Name of the Wind",
            author: "Patrick Rothfuss",
        },
        {
            title: "The Wise Man's Fear",
            author: "Patrick Rothfuss",
        }
    ]
};

// Use those to handle incoming requests:
app.use(graphqlHTTP({
    schema,
    rootValue
}));

// Start the server:
app.listen(8080, () =&gt; console.log("Server started on port 8080"));
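
// With the server running, you can test it from another terminal. For
// example, since express-graphql accepts queries via GET as well as POST:
//   curl 'http://localhost:8080?query={books{title,author}}'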
</code></pre> <p>Run that and you're done. This is solid, reliable, fast, and good enough for most initial use cases. It's also short, clear, and comparatively tiny: <code>node_modules</code> here is just over 15% of the size of the Apollo equivalent. <strong>Running 80% less code is a very good thing.</strong></p> <p>In addition, you can still add in extra features incrementally later on, to add complexity &amp; power only where you need it.</p> <p>For example, in my case, I want subscriptions. <a href="https://github.com/httptoolkit/mockttp">Mockttp</a> (the internals of HTTP Toolkit's proxy) accepts GraphQL queries over websockets, so it can stream intercepted request details to clients as they come in, with <a href="https://github.com/httptoolkit/mockttp/blob/52d4f6062c352add81571ea2e498620a3bd06322/src/standalone/schema.gql#L14-L21">a GraphQL schema</a> like this:</p> <pre><code class="graphql language-graphql">type Subscription {
    requestInitiated: InitiatedRequest!
    requestReceived: Request!
    responseCompleted: Response!
    requestAborted: Request!
    failedTlsRequest: TlsRequest!
    failedClientRequest: ClientError!
}
</code></pre> <p>To add this, I can just expand the basic setup above. To do so, I do actually use a couple of small Apollo modules! Most can be picked and configured independently. For this case, <a href="https://github.com/apollographql/graphql-subscriptions"><code>graphql-subscriptions</code></a> provides a little bit of pubsub logic that works within resolvers, and <a href="https://www.npmjs.com/package/subscriptions-transport-ws"><code>subscriptions-transport-ws</code></a> integrates that into Express to handle the websockets themselves. Super helpful!</p> <p>Here's a full example:</p> <pre><code class="javascript{5-7,23-25,28,40,52-69} language-javascript{5-7,23-25,28,40,52-69}">const express = require('express');
const { graphqlHTTP } = require('express-graphql');
const { buildSchema, execute, subscribe } = require('graphql');

// Pull in some specific Apollo packages:
const { PubSub } = require('graphql-subscriptions');
const { SubscriptionServer } = require('subscriptions-transport-ws');

// Create a server:
const app = express();

// Create a schema and a root resolver:
const schema = buildSchema(`
    type Book {
        title: String!
        author: String!
    }

    type Query {
        books: [Book]
    }

    type Subscription { # New: subscribe to all the latest books!
        newBooks: Book!
    }
`);

const pubsub = new PubSub();
const rootValue = {
    books: [
        {
            title: "The Name of the Wind",
            author: "Patrick Rothfuss",
        },
        {
            title: "The Wise Man's Fear",
            author: "Patrick Rothfuss",
        }
    ],
    newBooks: () =&gt; pubsub.asyncIterator("BOOKS_TOPIC")
};

// Handle incoming HTTP requests as before:
app.use(graphqlHTTP({
    schema,
    rootValue
}));

// Start the server:
const server = app.listen(8080, () =&gt; console.log("Server started on port 8080"));

// Handle incoming websocket subscriptions too:
SubscriptionServer.create({ schema, rootValue, execute, subscribe }, {
    server // Listens for 'upgrade' websocket events on the raw server
});

// ...some time later, push updates to subscribers:
pubsub.publish("BOOKS_TOPIC", {
    title: 'The Doors of Stone',
    author: 'Patrick Rothfuss',
});
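
// Clients can then connect over a websocket (e.g. using SubscriptionClient
// from the same subscriptions-transport-ws package) and run a query like:
//   subscription { newBooks { title author } }
// to be pushed each new book as it's published.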
</code></pre> <p>My point isn't that you need subscriptions in your app, or that everybody should use all these extra packages (quite the opposite).</p> <p>This does demonstrate how you can extend your setup to progressively use these kinds of features though. Moving from a request/response model to also supporting subscriptions is not a trivial change, but even in this case, adding in the Apollo extensions takes just a few lines on top of the existing logic, and fits nicely into a standard setup.</p> <p>You can extend with non-Apollo tools too. Here we're building primarily around the vanilla GraphQL packages and Express directly, composing Apollo components in separately, rather than basing everything on top of them. That means you could still drop in any other Express middleware or GraphQL tools you like, to add any kind of authentication, caching, logging or other cross-cutting features just using standard non-GraphQL solutions &amp; examples, with no lock-in from the Apollo ecosystem.</p> <p>Apollo do have a wide selection of interesting &amp; useful packages, and they should be lauded for the effort and contributions they've made to the ecosystem. At the same time though, they're not a neutral actor. Don't assume that the Next Big Thing is the right choice for your project, especially if it calls itself "industry-standard".</p> <p>Instead, start simple: build a system that you can fully understand &amp; manage, avoid unnecessary complexity, and keep your project lean &amp; flexible for as long as you can.</p> <p><strong>Using GraphQL, and want to debug, rewrite & mock live traffic? <a href="https://httptoolkit.com/">Try out HTTP Toolkit</a>. One-click HTTP(S) interception & debugging for browsers, servers, Android & more.</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/simple-graphql-server-without-apollo/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/simple-graphql-server-without-apollo/</guid>
            <pubDate>Wed, 02 Sep 2020 12:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Inspecting Android HTTP with a fake VPN]]></title>
            <description><![CDATA[<p><strong>Can you build an Android app that can inspect & rewrite the network traffic from every other app on the device?</strong></p> <p>In turns out that, yes, you can. <strong><a href="https://httptoolkit.com/android/">HTTP Toolkit</a></strong> does exactly this, by building an app on top of the Android VPN APIs that fully simulates a fake VPN connection entirely within the device.</p> <p>Here I want to talk through how that works, look at the code that makes it happen, and show you how you can do the same thing for yourself.</p> <p>To be clear, this is not intended (or very effective) as a attack on the security of traffic from the device. When you actually do this Android provides clear warnings &amp; permission prompts to the user during setup, and requires persistent UI notifications any time this is active. In addition this doesn't give you any way to read the contents of encrypted traffic, by default (in the next post, we'll talk about how HTTP Toolkit <em>can</em> do that).</p> <p>There are some interesting &amp; constructive use cases this opens up though for developer tooling. 
For example:</p> <ul> <li>Inspecting &amp; rewriting mobile traffic for testing &amp; debugging (this is <a href="https://httptoolkit.com/android/">HTTP Toolkit</a>'s raison d'être).</li> <li>Building a firewall for Android that blocks outgoing app connections according to your custom rules.</li> <li>Recording metrics on the traffic sent &amp; received by your device.</li> <li>Simulating connection issues by adding delays or randomly injecting packet resets.</li> </ul> <h2 id="howdoandroidvpnswork">How do Android VPNs work?</h2> <p>The Android developer docs have a <a href="https://developer.android.com/guide/topics/connectivity/vpn">VPN guide</a>, which is a good starting point.</p> <p>These VPN APIs allow you to register a service in your app, which when activated is given a file descriptor that backs a network tunnel interface.</p> <p>That tunnel interface is then used by the whole device for all network traffic. In addition, your VPN service is given the power to create protected sockets that don't use this tunnel, so the VPN app can communicate with the network without going through itself.</p> <p>Once this is activated, when an app sends some data, instead of that going out to the network, each IP packet is buffered behind this file descriptor. When you read from it you're given raw network bytes directly, and when you write bytes to it they're treated as bytes received directly from the network interface.</p> <p>This is designed to allow implementing a VPN connection in your app. In that case, your app would forward all the read bytes directly to a VPN provider over some protected separate connection, without any substantial processing of them on the device. 
The VPN provider would then forward that data on as part of the VPN's traffic, forward response packets back to your app over your connection, and you'd write the resulting packets back to the file descriptor.</p> <p>That's what this is primarily designed for, but that doesn't mean that that's all we can do with it.</p> <h2 id="whenisavpnnotavpn">When is a VPN not a VPN?</h2> <p>Once we have a VPN service running, our app will receive every network byte the device sends, and has the power to inject raw bytes back.</p> <p><strong>Things get interesting if, rather than forwarding these bytes to a VPN provider, we examine them, and then simply put them straight back on the real network</strong>. In that case, we get to see every network byte, but we don't interfere with the network connection of the device, and we don't need an externally hosted VPN provider to do it.</p> <p>Unfortunately that's easier said than done. Our file descriptor works with raw IP data, but Android doesn't actually have an API for us to send raw IP data anywhere else. Instead, we have higher-level APIs for TCP and UDP, and the IP part is always done invisibly under the hood.</p> <p>If we want to proxy these bytes, we need to match these two APIs up. 
We need to:</p> <ul> <li>When we read an IP packet from our tunnel:<ul> <li>Parse the raw packet bytes into an IP packet.</li> <li>Parse the TCP/UDP packet within and extract its content.</li> <li>(For TCP) Track the connection state of the overall TCP connection, and ack/fin/etc each packet in the session appropriately.</li> <li>Send the equivalent TCP/UDP content upstream, using Android's TCP/UDP APIs.</li></ul></li> <li>When we receive a TCP/UDP response from upstream:<ul> <li>(For TCP) Match that to the tunnelled TCP connection.</li> <li>Build our own complete TCP/UDP + IP packet data around the received data.</li> <li>Write the resulting bytes back into the tunnel.</li> <li>Cleanly (or messily) close connections when the upstream socket is done.</li></ul></li> </ul> <p>This is quite complicated. We effectively need to reimplement UDP &amp; TCP from scratch!</p> <p>Fortunately, we're not the first people to want to do this. Most of the existing implementations are unmaintained demos, but they are open-source so we can build upon them! My own solution is based on a GitHub proof-of-concept called <a href="https://github.com/LipiLee/ToyShark">ToyShark</a> (a pun on Wireshark, I assume) which was in turn based on some of the open-source network collection internals of an old AT&amp;T project called <a href="https://web.archive.org/web/20220303181059/https://www.att.com/gen/press-room?pid=22388">Application Resource Optimizer</a> (<a href="https://github.com/attdevsupport/ARO/tree/master/ClientCollectors/Non-rooted-ARODataCollector/ARO.Android.Net/src/main/java/com/att/aro/android/arocollector">source</a>).</p> <p>The resulting HTTP Toolkit Android app implements all the above. 
This is 100% free &amp; open-source (<strong><a href="https://github.com/httptoolkit/httptoolkit-android">github.com/httptoolkit/httptoolkit-android</a></strong>) so similar open-source implementations in future can build upon it in turn.</p> <p>This implementation acts as a VPN, while proxying all traffic back onto the real network, all without native code, just powered by <a href="https://en.wikipedia.org/wiki/Non-blocking_I/O_(Java)">Java NIO</a>.</p> <p>The core VPN implementation is in <a href="https://github.com/httptoolkit/httptoolkit-android/tree/070830e3ea3d2cadf141468a83c83b6d078272ac/app/src/main/java/tech/httptoolkit/android/vpn">src/main/java/tech/httptoolkit/android/vpn</a>, and there's a README there with an outline of the implementation details. We'll explore this a little more below, as we look at ways to extend it.</p> <p>There is a performance penalty to all this of course, in both network bandwidth and latency. The impact isn't really noticeable in normal usage on any modern device though. On cellular connections it's usually dwarfed by the underlying connection performance, and even on wifi you can reach quite acceptable numbers:</p> <p><img src="https://httptoolkit.com/images/posts/android-interception-performance.jpeg" alt="Android speed test screenshot"></p> <p>This could probably be improved further by rewriting the Java code as native code, but that entails significant extra complexity. For the HTTP Toolkit use case (targeted debugging, rather than heavy everyday usage) it's not worth it.</p> <p>With that in place, we now transparently receive every network packet from the device. We can inspect it as we'd like, and even edit that traffic, through either the raw IP stream or the parsed TCP/UDP packet data. 
To what end?</p> <h2 id="howcanweusethis">How can we use this?</h2> <p>In HTTP Toolkit's case, the usage of this is very direct: we forcibly redirect all HTTP(S) traffic via the debugging proxy (which is running on your local development machine). That proxy then lets you inspect and rewrite all the traffic there as you see fit.</p> <p>There's a demo video on <a href="https://httptoolkit.com/android/">the Android page</a> if you want to see this in action.</p> <p>To do this, we check the target port of outgoing TCP connections, and rewrite the address if it's one of our configured HTTP ports (e.g. 80, 443, …), by just adding the following lines into <a href="https://github.com/httptoolkit/httptoolkit-android/blob/b4cb5a97d48d299958b4e7a907b41fc9b44d2129/app/src/main/java/tech/httptoolkit/android/vpn/SessionManager.java#L169-L212">TCP session setup</a>:</p> <pre><code class="java{9,10,11} language-java{9,10,11}">public Session createNewTCPSession(int ip, int port, int srcIp, int srcPort)
        throws IOException {
    // ...

    String ips = PacketUtil.intToIPAddress(ip);

    // Use the given target address, unless tcpPortRedirection has specified
    // a different target address for traffic on this port:
    SocketAddress socketAddress = tcpPortRedirection.get(port) != null
        ? tcpPortRedirection.get(port)
        : new InetSocketAddress(ips, port);

    channel.connect(socketAddress);

    // ...
}
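
// tcpPortRedirection above is just a map from port to replacement address,
// populated at VPN startup - an illustrative sketch (proxyHost/proxyPort
// are hypothetical configuration values, not fields in this snippet):
//   tcpPortRedirection.put(80, new InetSocketAddress(proxyHost, proxyPort));
//   tcpPortRedirection.put(443, new InetSocketAddress(proxyHost, proxyPort));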
</code></pre> <p>That forcibly sends all the matching traffic to the proxy, and immediately gives us full visibility into HTTP traffic. On Android 10 we also set <a href="https://developer.android.com/reference/android/net/VpnService.Builder#setHttpProxy(android.net.ProxyInfo)">a VPN proxy configuration</a>, which catches most traffic on further ports that aren't explicitly matched (although in that case it's advisory default configuration, rather than enforced redirection).</p> <p>That's enough to redirect traffic for remote inspection &amp; rewriting. How else could you extend this? Let's talk about the 3 other use cases I mentioned at the start:</p> <h3 id="blockingoutgoingconnections">Blocking outgoing connections</h3> <p>To block outgoing connections to specific addresses or on specific ports, you just need to throw away the packets after you receive them from the VPN interface, once you've parsed them to work out where they're going.</p> <p>You can use this to block specific hosts you don't like, block DNS requests for certain addresses to build an on-device <a href="https://pi-hole.net/">Pi-Hole</a>, or allow traffic only to a short trusted list of hosts to lock down your networking entirely.</p> <p>In HTTP Toolkit's implementation, SessionHandler's <a href="https://github.com/httptoolkit/httptoolkit-android/blob/070830e3ea3d2cadf141468a83c83b6d078272ac/app/src/main/java/tech/httptoolkit/android/vpn/SessionHandler.java#L79"><code>handlePacket</code></a> is where we handle the raw packet data that the device wants to send. It looks like this:</p> <pre><code class="java{9} language-java{9}">public void handlePacket(@NonNull ByteBuffer stream)
        throws PacketHeaderException, IOException {
    final byte[] rawPacket = new byte[stream.limit()];
    stream.get(rawPacket, 0, stream.limit());
    stream.rewind();

    final IPv4Header ipHeader = IPPacketFactory.createIPv4Header(stream);

    // TODO: inspect ipHeader here, and 'return' to drop the packet
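    // For example, a simple firewall could drop the packet here based on its
    // destination (illustrative only - blockedHosts is a hypothetical set,
    // and the header accessor name is assumed):
    //   if (blockedHosts.contains(PacketUtil.intToIPAddress(ipHeader.getDestinationIP()))) {
    //       return; // Do nothing with the packet: to the app, it's simply lost
    //   }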

    if (ipHeader.getProtocol() == 6) {
        handleTCPPacket(stream, ipHeader);
    } else if (ipHeader.getProtocol() == 17) {
        handleUDPPacket(stream, ipHeader);
    } else if (ipHeader.getProtocol() == 1) {
        handleICMPPacket(stream, ipHeader);
    } else {
        Log.w(TAG, "Unsupported IP protocol: " + ipHeader.getProtocol());
    }
}
</code></pre> <p>From that we can drop packets entirely based on the target address or port in the IP header, by simply doing nothing.</p> <p>Dropping a packet here is literally packet loss, where the app sending the original request will never hear any response at all.</p> <p>Alternatively, for more complex rules you can make changes within specific protocol handling, e.g. the <code>handleTCPPacket</code> or <code>handleUDPPacket</code> methods above. In both cases you can examine the parsed TCP/UDP packets, and drop them there (or in the TCP case, inject an immediate RST packet to tell the app the connection failed).</p> <h3 id="recordingtrafficmetrics">Recording traffic metrics</h3> <p>Want to know what your device sends and receives? Normally Android makes that more or less invisible. Within a fake VPN application like this though you have every network byte, so it's easy to examine and record data about outgoing &amp; incoming packets.</p> <p>It's simplest to do total byte metrics by address and/or port, but you could also build more complex analyses of packet data itself. E.g. tracking the duration of TCP sessions with certain hosts, recording metrics about the unencrypted data available, or looking at DNS UDP packets to examine which hostnames you're looking up.</p> <p>For this codebase, we can easily capture outgoing traffic in the <code>handlePacket</code> method above. We have the raw IP packet data there, and the full TCP &amp; UDP data is just a little more parsing away.</p> <p>To track incoming traffic, we'd need to look at the code that handles the upstream connections. 
For example in <a href="https://github.com/httptoolkit/httptoolkit-android/blob/070830e3ea3d2cadf141468a83c83b6d078272ac/app/src/main/java/tech/httptoolkit/android/vpn/socket/SocketChannelReader.java#L85-L117"><code>readTCP</code></a> from <code>SocketChannelReader</code>, where upstream TCP data is received:</p> <pre><code class="java language-java">private void readTCP(@NonNull Session session) {
    // ...

    try {
        do {
            len = channel.read(buffer);
            if (len &gt; 0) {
                // We've received some TCP data from the external network:
                sendToRequester(buffer, len, session);
                buffer.clear();
            } else if (len == -1) {
                // The external network connection is finished:
                sendFin(session);
                session.setAbortingConnection(true);
            }
        } while (len &gt; 0);
    }

    // ...
}
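
// To record traffic metrics, you could sum the bytes read in that loop per
// session - an illustrative sketch (trafficTotals would be a hypothetical
// Map&lt;String, Long&gt;, and the session accessor names are assumed):
//   trafficTotals.merge(session.getDestIp() + ":" + session.getDestPort(),
//       (long) len, Long::sum);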
</code></pre> <p>At this point, we're handling the contents of the TCP connection, before we pack it back up into the raw bytes for the VPN interface.</p> <p>By examining the TCP data read here and associating it with the IP &amp; port of the TCP session, you can quickly start to build a view into your device's network communication.</p> <h3 id="simulatingconnectionissues">Simulating connection issues</h3> <p>It's possible to simulate connection issues on the device too. That's especially useful to test how applications handle low-quality internet connections and network errors.</p> <p>Unfortunately you can't simulate <em>all</em> issues, as Android's APIs give us limited control of upstream traffic. We control the contents of upstream TCP &amp; UDP packets, but not the raw network connection itself. That means, for example, we can't simulate our device sending the wrong upstream packet sequence number or corrupting a TCP checksum, but we can simulate the device receiving such packets.</p> <p>There's still a lot of interesting things you can simulate with this:</p> <ul> <li>Incoming or outgoing packet loss, where some packets simply disappear in transit.</li> <li>Repeated or misordered packets.</li> <li>Random connection resets (similar to <a href="https://en.wikipedia.org/wiki/Tcpkill">tcpkill</a>).</li> <li>Delays to packets, in either direction.</li> </ul> <p>In each case you'd normally do this probabilistically, so that 10% of connections fail, or packets are delayed by 500ms on average.</p> <p>When you do this, you'll often see some surprising results and errors in your app. 
In effect we're doing on-device <a href="https://en.wikipedia.org/wiki/Chaos_engineering">chaos engineering</a>.</p> <p>Adding random connection resets like this will usually result in very visible TCP connection failures, causing random HTTP requests, raw network sockets or whatever else to suddenly fail and disconnect.</p> <p>Packet loss and ordering issues meanwhile are normally handled at the TCP level, invisibly to your application code, but the process of doing so can result in unpredictable performance, and cause real issues at the application level.</p> <p>During day-to-day development it's very easy to never see these issues, given the fast &amp; reliable wifi in your office or at home, and simulating rural 2G issues like this can be eye-opening!</p> <p>You do most of this at a very low level, just hooking into the places where individual raw IP packets are passed to and from the VPN. For TCP error simulation though, you'll need a lot more information about the TCP connection itself, to find packets to reorder, or to inject RSTs into active connections.</p> <p>In the HTTP Toolkit app specifically:</p> <ul> <li><a href="https://github.com/httptoolkit/httptoolkit-android/blob/070830e3ea3d2cadf141468a83c83b6d078272ac/app/src/main/java/tech/httptoolkit/android/vpn/ClientPacketWriter.java#L67"><code>ClientPacketWriter</code></a> is where raw IP data is written back to the VPN (incoming IP packets). At this stage we can easily drop, corrupt or delay incoming packets at the IP level.</li> <li><a href="https://github.com/httptoolkit/httptoolkit-android/blob/master/app/src/main/java/tech/httptoolkit/android/vpn/SessionHandler.java#L74-L95"><code>handlePacket</code></a> again in <code>SessionHandler</code> would allow us to drop, delay or otherwise react to outgoing packets.</li> <li>SessionHandler also controls the TCP flow of each connection to process each packet, allowing us to hook into that flow directly. 
For example, you could extend <a href="https://github.com/httptoolkit/httptoolkit-android/blob/master/app/src/main/java/tech/httptoolkit/android/vpn/SessionHandler.java#L360-L386"><code>replySynAck</code></a> to schedule a connection reset (just a call to <code>resetConnection</code>) for 50% of new connections 2s after they're created.</li> <li><a href="https://github.com/httptoolkit/httptoolkit-android/blob/master/app/src/main/java/tech/httptoolkit/android/vpn/SessionManager.java"><code>SessionManager</code></a> stores the state controlled by <code>SessionHandler</code>. Given the list of active connections there, we could select random active TCP sessions and kill them according to whatever criteria you like.</li> </ul> <p>As we've seen, the Android VPN APIs are powerful, and there's a lot of potential here.</p> <p>With a few tricks like this to hook into network traffic, there's a whole world of interesting tools you can build. Give it a go! Have any thoughts or feedback? <a href="https://twitter.com/pimterry">Let me know</a>.</p> <p><strong>Want to take your Android debugging to the next level? <a href="https://httptoolkit.com/android/">HTTP Toolkit</a> gives you one-click HTTP(S) inspection & mocking for any Android app (plus lots of other tools too).</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/inspecting-android-http/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/inspecting-android-http/</guid>
            <pubDate>Tue, 25 Aug 2020 12:20:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How to Debug Node.js Segmentation Faults]]></title>
            <description><![CDATA[<p>Oh no, your JavaScript code isn't just throwing an exception or crashing: it's <em>segfaulting</em>. What does that mean, and how can you fix it?</p> <p>You'll know this happens because node will hard crash, exiting silently without any kind of real stack trace, perhaps printing just <code>segmentation fault (core dumped)</code>.</p> <p>(If you do get a normal JavaScript stack trace on the other hand, then you're dealing with a normal JS error, not a segfault. Lucky you! You might be more interested in the guide on <a href="https://httptoolkit.com/blog/how-to-debug-anything/">How to Debug Anything</a>)</p> <h2 id="whatisasegmentationfault">What is a Segmentation Fault?</h2> <blockquote> <p>A segmentation fault occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system). - <a href="https://en.wikipedia.org/wiki/Segmentation_fault">wikipedia.org/wiki/Segmentation_fault</a></p> </blockquote> <p>In practice, a segfault occurs when your program breaks some fundamental rule set by the operating system. In that case, the operating system sends your process a signal (SIGSEGV on Mac &amp; Linux, STATUS_ACCESS_VIOLATION on Windows), and typically the process shuts down immediately.</p> <p>The rules that you can break to cause this include things like reading or writing to an invalid memory address (e.g. native code somewhere trying to use a null pointer as a memory address), causing a stack or buffer overflow, or reading or writing from memory that's not yours (maybe it was yours but it's now been released, maybe it's unused, or maybe it's owned by another process or the operating system).</p> <p>All of these cases involve low-level concerns, like pointers &amp; memory management. 
You shouldn't normally have to worry about this when writing JavaScript! The language runtime normally manages your memory, doesn't expose the kinds of APIs that could cause these issues, and enforces its own rules on the APIs that are available, to guarantee that your code behaves correctly.</p> <p>That all ensures that the underlying operating system's rules are never broken, and ensures that any time you do accidentally try to take any invalid actions, you get a clear error that appears straight away, rather than random failures later.</p> <p>Unfortunately, there are a few cases where you can still hit segfaults in Node:</p> <ul> <li>When you use <a href="https://nodejs.org/api/addons.html">native addons</a> (either directly, or because one of your dependencies uses them), so you're effectively running your own native code as part of your application. If that native code is either buggy or just incompatible with your version of Node, you'll often get segfaults.</li> <li>If you manipulate parts of the internal private state of Node objects. This can break Node's assumptions, so that Node's built-in native code does the wrong thing, resulting in segfaults.</li> <li>When Node.js itself has a bug somewhere, and segfaults all by itself.</li> </ul> <h2 id="howcanifixit">How can I fix it?</h2> <h3 id="findtheculprit">Find the culprit</h3> <p>First, you need to work out which of the 3 cases above you have.</p> <p>Native addons are always the most likely cause here. There's a couple of things to try straight away:</p> <ul> <li>Rebuild all your native node modules with <code>npm rebuild</code>. This will recompile native code with your current version of node, and should resolve any issues where your native modules are compiled for the wrong node version.</li> <li>Find all the native modules you have installed, by searching your node_modules folder for <code>.node</code> files. 
On Linux/Mac you can list them with:</li> </ul> <pre><code class="bash language-bash">  find node_modules -iname "*.node"
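
  # On Windows, a rough PowerShell equivalent (untested, so your
  # mileage may vary) would be:
  #   Get-ChildItem node_modules -Recurse -Filter *.node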
</code></pre> <p>If you have no native modules installed, you can rule that case out entirely. If you do have modules installed there that seem related to the crash you're seeing, then that's probably a good place to start looking.</p> <p>You can also try to get more detail on the segmentation fault itself.</p> <p>To do this, you can use the <a href="https://www.npmjs.com/package/segfault-handler">Segfault-Handler</a> module. Just run <code>npm install segfault-handler</code>, and then add the below right at the start of your application code:</p> <pre><code class="javascript language-javascript">const SegfaultHandler = require('segfault-handler');
SegfaultHandler.registerHandler('crash.log');
</code></pre> <p>That module listens for any SIGSEGV signal, and reports the detailed stack trace that caused it before the process shuts down. When you next hit your segmentation fault, you'll get something like this:</p> <pre><code class="javascript language-javascript">PID 30818 received SIGSEGV for address: 0x20
[...]/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3127)[0x7fdb5a5fb127]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7fdb735f58a0]
node(_ZN4node7TLSWrap6EncOutEv+0x170)[0xa09010]
node(_ZN4node7TLSWrap7DoWriteEPNS_9WriteWrapEP8uv_buf_tmP11uv_stream_s+0x2c7)[0xa0a6c7]
node(_ZN4node5http212Http2Session15SendPendingDataEv+0x4ce)[0x93b5ae]
node(_ZN4node5http212Http2Session5CloseEjb+0xda)[0x93c4fa]
node[0xb62a3f]
node(_ZN2v88internal21Builtin_HandleApiCallEiPPNS0_6ObjectEPNS0_7IsolateE+0xb9)[0xb635a9]
[0xcec6c2dbe1d]
[1]    30818 segmentation fault (core dumped)  node ./bin/run start
</code></pre> <p>That's the output from a segmentation fault I was hitting recently, where the new HTTP/2 debugging support in <a href="https://httptoolkit.com/">HTTP Toolkit</a> occasionally crashed the Node process, after certain patterns of connections &amp; disconnections.</p> <p>A trace like this doesn't give you enough to fix the issue, but it does give a clear clue where the problem lies.</p> <p>In my case, the <code>SendPendingData</code> method of an <code>HTTP2Session</code> is trying to write to a TLS stream as the session closes down, and that's then crashing the process. That gave me some clear info: it's an issue with HTTP/2 requests, and it's happening in node itself, not a native addon. From there, a <a href="https://github.com/nodejs/node/issues?q=http2+segmentation+fault+is%3Aopen">quick search</a> of the Node issue tracker led me to a <a href="https://github.com/nodejs/node/issues/29902">reported bug</a>, and eventually to a workaround.</p> <h3 id="findafix">Find a fix</h3> <p>From here, you should have some pointer towards the code that's buggy. If there's a suspicious native addon module involved then that's almost certainly the culprit, and you should start there.</p> <p>Otherwise, if the trace is clearly pointing to Node internals (as above) and you're not messing around with those yourself, or using any relevant native addons, then you've probably found a bug in Node. Congratulations! 
Node should never segfault if you're writing normal JavaScript code, so something very wrong is going on.</p> <p>From here, there's a few good next steps:</p> <ul> <li><p>Update to the latest version of Node, or of the module in question, and make sure the same bug still appears there.</p> <p>In many cases just a quick update of the right thing will solve your issue, and if not then maintainers will be much happier to help you investigate if they know it's definitely a current issue.</p></li> <li><p>Double-check your code is using the failing code as intended.</p> <p>Check the documentation of the related properties and methods you're accessing, and make sure that they are indeed documented (i.e. you're not unexpectedly messing with internal state) and that you're following the instructions in that documentation correctly. It's often useful to look through the native module's test code too, to see some examples of how it's supposed to be accessed.</p></li> <li><p>Report the issue to the addon maintainers/Node team.</p> <p>GitHub is your friend here: use the details you've found to <strong>do a quick search on the relevant repo's issue tracker first</strong>. The Node issue tracker is available at <a href="https://github.com/nodejs/node/issues">github.com/nodejs/node/issues</a>.</p> <p>If you're lucky, you'll find an issue with more information, and maybe even an existing workaround. You can then add any extra details you have and an upvote there to help the maintainers. Of course, if not, it's time to file a bug for yourself.</p> <p>Either way, the best way to ensure these bugs actually get fixed is to provide a reliable way for other developers to reproduce the issue. 
The more information on how to do so, and the simpler the steps required, the better.</p></li> <li><p>Use your segfault trace to find the relevant code, add detailed logging or use debugging tools, and very carefully walk through the code that's failing to try and find something that's not quite right.</p> <p>If you're not familiar with the code in question, and you haven't written native addons for Node.js before, this can be intimidating and difficult. It's worth a go though, and you don't need to understand the code perfectly to do this. In many cases you'll quickly spot a comment or clue as to why this crash could occur, which will lead you back to a nice clean fix in your own JavaScript.</p> <p>Especially in native addons, you'll often find that they make certain assumptions (this method will never be called twice, this parameter will never be undefined) that aren't always checked everywhere. Any of these can easily mean that a minor bug in your code results in the addon's native code doing completely the wrong thing, and crashing the whole process.</p></li> <li><p>Find a workaround: change how you're using the module in question, use a different module entirely for now, delete the broken feature from your product entirely, or quit your job and go live in the forest.</p></li> </ul> <p>Hopefully that's enough to show where the issue is, and to get the information you need to fix or work around it, so you can get your code back on track.</p> <p>Have any other suggestions or advice for others in the same place? Send me <a href="https://httptoolkit.com/contact/">a message</a> or let me know on <a href="https://twitter.com/pimterry">Twitter</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/how-to-debug-node-segfaults/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/how-to-debug-node-segfaults/</guid>
            <pubDate>Wed, 12 Aug 2020 11:20:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Translating between HTTP/1 and HTTP/2]]></title>
            <description><![CDATA[<p>Semantically, what changed in HTTP/2?</p> <p>Multiplexed connections, binary frames, header compression - all the headline changes are syntactic and network format changes, rather than fundamental changes to the concept. As a developer building on top of this, you can often ignore the low-level syntax of network protocols like this, and just think about the meaning of each message (the semantics) rather than byte-by-byte how it's sent between computers.</p> <p>Semantically though, while HTTP/2 is built on top of the ideas of HTTP/1.1, and the <a href="https://httpwg.org/specs/rfc7540.html">HTTP/2 spec</a> is at pains to emphasize that it is not redefining HTTP's semantics, there are a few real-world semantic differences you need to understand to use it effectively (and with HTTP/2 now in use on <a href="https://w3techs.com/technologies/details/ce-http2">nearly 50% of the top 10 million webservers</a>, you really do need to know how it works).</p> <p>This matters when building anything non-trivial on HTTP/2, and it matters especially if your application needs to translate between the two, e.g. as a proxy or from a cache, or if you want to reliably handle requests in both protocols using the same code.</p> <h2 id="themorethingschangethemoretheystaythesame">The more things change, the more they stay the same</h2> <p>Let's start with what <em>doesn't</em> change.</p> <p>Firstly, the core communication model is still a request from a client followed by a response from the server.</p> <p>Requests have a method like GET, a URL, some headers, and optionally a body. Responses have a status code (200, 404…), their own headers, and their own body. 
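</p>

<p>As a rough sketch of what that looks like in practice (illustrative only), here's a tiny server using Node's http2 compatibility API, where the same handler serves both protocols:</p> <pre><code class="javascript language-javascript">const http2 = require('http2');

// createServer() speaks plain-text HTTP/2 (h2c); for TLS with automatic
// HTTP/1.1 fallback you'd use createSecureServer({ allowHTTP1: true })
const server = http2.createServer();

server.on('request', (req, res) => {
  // method, url, headers and status all behave the same in both protocols
  if (req.method === 'GET') {
    res.writeHead(200, { 'content-type': 'text/plain' });
    res.end('hello over HTTP/' + req.httpVersion);
  } else {
    res.writeHead(405);
    res.end();
  }
});

server.listen(8080);
</code></pre> <p>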
Most HTTP methods and status codes still have the same fundamental meanings.</p> <p>You can write code that takes a request, looks at the method &amp; URL to decide what to do, and sends back a response with a status code and the data requested, and most frameworks will make that work for you automatically. For basic HTTP handling like that, you can just enable HTTP/2 in your server of choice, and you're golden.</p> <p>Once you get into anything more complex though, you can run into some interesting issues. Let's dig into the differences:</p> <h3 id="statusmessagesaredead">Status messages are dead</h3> <p>In HTTP/1, every response has a status code, and a status message. In its raw format, that might look like this:</p> <pre><code>HTTP/1.1 200 OK
header-name: header-value
another-header: ...
</code></pre> <p><code>OK</code> is the default for 200, but it's not fixed. This is equally valid:</p> <pre><code>HTTP/1.1 200 Super great
header-name: header-value
another-header: ...
</code></pre> <p>This status info is received in the client, and can be used to make error messages more informative, to differentiate nuance in status-only responses, and so on.</p> <p>However, although it's useful in theory to have a space for a single-line summary of a response, this isn't used much in practice, and adds complexity, so HTTP/2 drops it entirely. It's just status codes now, and you'll need to put other data into the body or headers.</p> <h3 id="everythingsaheader">Everything's a header</h3> <p>HTTP/2 moves request and response metadata into so-called 'pseudo-headers'. These are headers, but prefixed with a colon, like <code>:path</code> (because in HTTP/1 that's an unparseable header name, so it's guaranteed to not have been used).</p> <p>That means that although requests do still have methods and URLs and responses still have status codes, they're now part of the header data itself. Rather than building the URL from the request path plus the old <code>Host</code> header, we now have <code>:scheme</code> (http/https), <code>:authority</code> (the hostname) and <code>:path</code> (the URL path) headers. Rather than a standalone status code, we have a <code>:status</code> header.</p> <p>This makes translation a bit more complicated, as a lot of pre-HTTP/2 code will choke on header names like these and the values need to be extracted for HTTP/1-compatible usage, but this allows these values to take advantage of <a href="https://blog.cloudflare.com/hpack-the-silent-killer-feature-of-http-2/">header compression</a> to be sent far more efficiently.</p> <h3 id="hopbyhopconnectionheadersarenomore">Hop-by-hop connection headers are no more</h3> <p>The <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Connection">Connection header</a>, along with a few other related headers, describes specific details about how the direct TCP connection between the client and the server or proxy works. 
For example, in HTTP/1.1 a <code>Connection: close</code> header might be sent with a message to close the connection immediately afterwards, rather than keeping it alive ready for future requests.</p> <p>The Connection header can also include a list of other header names, to declare them as only relevant for the current direct connection, i.e. to tell any proxies involved not to forward them to the target server.</p> <p>Other hop-by-hop headers include <code>Keep-Alive</code> (which defines exactly <em>how</em> a connection should be kept alive), <code>Proxy-Authenticate</code> (authentication details for HTTP proxies), <code>Trailer</code> (announces headers that will arrive <em>after</em> the message body in chunked messages), <code>Upgrade</code> (switches the protocol of the current connection), and <code>Transfer-Encoding</code> (describes data transfer formats, made unnecessary with HTTP/2's data framing).</p> <p>In HTTP/2, connection-specific headers like <code>Connection</code> and <code>Keep-Alive</code> are prohibited. While some browsers will simply ignore them, Safari and other WebKit-based browsers will outright reject any response that contains them. That's easy to do by accident, and a big problem.</p> <p>Other hop-by-hop headers aren't forbidden, but are not generally functional, and it's recommended that they not be used.</p> <h3 id="cookieheaders">Cookie headers</h3> <p>In HTTP/1, according to <a href="https://tools.ietf.org/html/rfc6265#section-5.4">RFC 6265</a>:</p> <blockquote> <p>When the user agent generates an HTTP request, the user agent MUST NOT attach more than one Cookie header field.</p> </blockquote> <p>When a user agent (typically a browser) wants to send multiple cookies with a request, it sends them separated with a semicolon and space, like so:</p> <pre><code>Cookie: cookie1=a; cookie2=b
</code></pre> <p>In HTTP/2, you can still use this form, but you can also send multiple cookie headers separately, breaking the rule above.</p> <p>Splitting them up is generally better, because it allows HTTP/2 to more effectively compress cookie headers, so they're not sent repeatedly within the same connection. If you do send them separated like this then any change to them only requires sending the new changed header. If you concatenate them as with HTTP/1 then every change to any cookie requires resending all of them.</p> <p>Unfortunately that means the cookie headers of many HTTP/2 requests are invalid in HTTP/1, potentially resulting in your server rejecting the request entirely, missing all but one of the values provided, or other undefined behaviour. It's easy to fix though: just join all Cookie header values with a <code>'; '</code> (semicolon + space) separator.</p> <h3 id="theconnectmethod">The CONNECT method</h3> <p>A CONNECT request asks the server for a raw TCP tunnel to a different server. In HTTP/1 this is used almost exclusively for HTTP proxying, so that a client can make an HTTPS connection to a server through a proxy, without sharing the content of the connection with the proxy.</p> <p>In HTTP/1, a proxy client makes a CONNECT request with the name of the target server, the proxy server returns a 200 response (if it accepts it), and then the entire TCP connection becomes a tunnel to the target server. Every byte sent is sent directly on to the target, and every byte received is sent back to the client.</p> <p>In HTTP/2, a proxy client makes a CONNECT request with the name of the target server, the proxy returns a 200 response (if it accepts it), and then that one specific <a href="https://developers.google.com/web/fundamentals/performance/http2#streams_messages_and_frames">request stream</a> becomes a tunnel to the target server. 
To send data, it must be wrapped up as an HTTP/2 DATA frame, with the id of the tunnelled stream, and the data within that is then forwarded on to the target server, and received data is similarly packed into DATA frames on that stream, whilst other HTTP/2 requests with the proxy server can keep using the same TCP connection independently.</p> <p>In addition, setting up websockets and other HTTP protocol changes is now done using CONNECT requests, rather than the GET request + Upgrade header + 101 response that was used with HTTP/1. Like CONNECT tunnelling, websockets also now work over only a single stream within the HTTP/2 connection, rather than taking over the entire connection.</p> <h3 id="serverpush">Server Push</h3> <p>Server Push allows an HTTP/2 server to proactively send content to an HTTP/2 client, without it being requested.</p> <p>You can think of it semantically as an extra response given to a client, along with the metadata of a request that would have generated this response, like "if you were to send me a GET request for /abc, this is what you'd receive".</p> <p>This is useful when a server can guess what you're likely to request next, for example when you request an HTML page that contains lots of images, it can be useful for the server to push the critical images back alongside the HTML rather than wait for the client to realise they need them.</p> <p>HTTP/1 has no mechanism similar to this, so there's no way to translate this back for older clients, but fortunately push support is entirely optional, and clients or intermediaries are free to ignore push requests, or set the <code>SETTINGS_ENABLE_PUSH</code> setting to <code>0</code> to explicitly refuse them up front.</p> <h2 id="translatingonetotheother">Translating one to the other</h2> <p>If you're translating between the two, what does that mean in practice?</p> <p>If you want to safely translate an HTTP/2 message into HTTP/1, e.g. 
to handle an HTTP/2 request with HTTP/1.1-compatible code, you need to:</p> <ul> <li>In requests, build a method, URL and 'Host' header from the pseudo-headers</li> <li>In responses, read the status from the <code>:status</code> pseudo-header, and set the status message to the default for that status code</li> <li>Strip all pseudo-headers from the header data</li> <li>Join your <code>Cookie</code> headers with a <code>'; '</code> (semicolon + space) separator, if you have more than one</li> <li>Build tunnels from CONNECT requests like normal, but tunnelling just within that specific HTTP/2 stream, not the entire connection</li> <li>If you receive a server push, drop it silently (although you could cache it first), or consider signaling in the initial HTTP/2 SETTINGS frame that you don't accept pushes in the first place</li> </ul> <p>To safely translate an HTTP/1.1 message into HTTP/2, e.g. to return an HTTP/1.1-compatible response to an HTTP/2 client:</p> <ul> <li>In responses, drop the status message entirely, and put the status code into the <code>:status</code> pseudo-header</li> <li>In requests, build the <code>:scheme</code>, <code>:authority</code> and <code>:path</code> headers from the URL and <code>host</code> header</li> <li>Optional, but recommended: Split <code>Cookie</code> headers into a separate header per cookie</li> <li>Strip all headers made illegal in HTTP/2 messages:<ul> <li><code>connection</code> (and all headers listed in the value of the connection header)</li> <li><code>upgrade</code></li> <li><code>host</code></li> <li><code>http2-settings</code></li> <li><code>keep-alive</code></li> <li><code>proxy-connection</code></li> <li><code>transfer-encoding</code></li> <li><code>te</code> (unless the value is just <code>trailers</code>)</li></ul></li> </ul> <p>With all that in place, you can convert back and forth between HTTP/2 and HTTP/1 messages freely, and easily integrate HTTP/2 into your existing HTTP-powered codebase. 
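</p>

<p>As an illustrative sketch of the header handling involved (simplified, and skipping the URL/pseudo-header construction and body handling entirely), that might look something like:</p> <pre><code class="javascript language-javascript">// HTTP/2 to HTTP/1.1: strip the pseudo-headers, map :authority to
// a Host header, and join any split cookie headers back together
function h2HeadersToH1(headers) {
  const result = {};
  const cookies = [];
  for (const [name, value] of Object.entries(headers)) {
    if (name === 'cookie') {
      // May be a single string, or an array of separate cookie headers
      cookies.push(...[].concat(value));
    } else if (name === ':authority') {
      result['host'] = value;
    } else if (!name.startsWith(':')) {
      // Everything else passes through; :method, :path, :scheme and
      // :status need handling elsewhere, so they're stripped here
      result[name] = value;
    }
  }
  if (cookies.length) result['cookie'] = cookies.join('; ');
  return result;
}

// HTTP/1.1 to HTTP/2: drop the headers that are illegal in HTTP/2
const ILLEGAL_IN_H2 = [
  'connection', 'upgrade', 'host', 'http2-settings',
  'keep-alive', 'proxy-connection', 'transfer-encoding'
];
function h1HeadersToH2(headers) {
  const connectionListed = (headers['connection'] || '')
    .split(',').map((header) => header.trim().toLowerCase());
  const result = {};
  for (const [name, value] of Object.entries(headers)) {
    if (ILLEGAL_IN_H2.includes(name)) continue;
    if (connectionListed.includes(name)) continue;
    if (name === 'te') {
      if (value === 'trailers') result[name] = value;
      continue;
    }
    result[name] = value;
  }
  return result;
}
</code></pre> <p>A real implementation also needs to build the pseudo-headers from the URL (and vice versa) - the above covers just the cleanup steps that are easy to miss.</p> <p>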
Enjoy!</p> <p>If you're interested in the finer details of implementing this, you might enjoy <a href="https://twitter.com/pimterry/status/1280132796951007233">this thread</a>, where I'm currently live tweeting the entire implementation of <a href="https://httptoolkit.com/">HTTP Toolkit</a>'s new HTTP/2 interception &amp; debugging support, with links and explanation for each commit along the way.</p> <p>Anything I've missed? Any thoughts? Get in touch <a href="https://twitter.com/pimterry">on Twitter</a> or <a href="https://httptoolkit.com/contact/">send me a message</a> and let me know.</p>]]></description>
            <link>https://httptoolkit.com/blog/translating-http-2-into-http-1/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/translating-http-2-into-http-1/</guid>
            <pubDate>Wed, 15 Jul 2020 14:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[What's coming in TypeScript 4?]]></title>
            <description><![CDATA[<p>TypeScript 4 is coming up fast: a first beta release is planned for this week (June 25th), with the final release aiming for mid-August.</p> <p>It's important to note that <a href="https://github.com/microsoft/TypeScript/issues/14116#issuecomment-280410804">TypeScript does not follow semver</a>, so 4.0 is not as big a deal as it sounds! There can be (and often are) breaking changes between any minor TypeScript versions, and major version bumps like this happen primarily for marketing reasons, not technical ones.</p> <p>This bump to 4.0 doesn't suggest that everything is going to break, and this won't be a huge world-changing release, but it does bring some nice additions, particularly on the typing side. For projects like <a href="https://httptoolkit.com/">HTTP Toolkit</a> (written entirely in TypeScript) that means faster development &amp; fewer bugs!</p> <p>Let's dive into the details:</p> <h2 id="variadictupletypes">Variadic tuple types</h2> <p>Also known as 'variadic <a href="https://en.wikipedia.org/wiki/Kind_(type_theory)">kinds</a>', this is a complex but substantial new feature for TypeScript's type system.</p> <p><del>It's not 100% confirmed yet (the <a href="https://github.com/microsoft/TypeScript/pull/39094">PR</a> remains unmerged!), but it's explicitly in the 4.0 roadmap, and Anders Hejlsberg himself has <a href="https://twitter.com/ahejlsberg/status/1272986860957003788">called it out</a> as planned for the coming release.</del> Update: PR now merged, looks like this is happening!</p> <p>Explaining this is complicated if you don't have a strong existing grasp of type theory, but it's easy to demo. Let's try to type a <code>concat</code> function with tuple arguments:</p> <pre><code class="typescript language-typescript">function concat(
    nums: number[],
    strs: string[]
): (string | number)[] {
    return [...nums, ...strs];
}

let vals = concat([1, 2], ["hi"]);
let val = vals[1]; // infers string | number, but we *know* it's a number (2)

// TS does support accurate types for these values though:
let typedVals = concat([1, 2], ["hi"]) as [number, number, string];
let typedVal = typedVals[1] // =&gt; infers number, correctly
</code></pre> <p>This is valid TypeScript code today, but it's suboptimal.</p> <p>Here, <code>concat</code> works OK, but we're losing information in the types and we have to manually fix that later if we want to get accurate values elsewhere. Right now it's impossible to fully type such a function to avoid this.</p> <p>With variadic types though, we can:</p> <pre><code class="typescript language-typescript">function concat&lt;N extends number[], S extends string[]&gt;(
    nums: [...N],
    strs: [...S]
): [...N, ...S] {
    return [...nums, ...strs];
}

let vals = concat([1, 2], ["hi"]);
let val = vals[1]; // =&gt; infers number, correctly
let str = vals[2]; // =&gt; infers string, correctly

// Go even further and accurately concat _anything_:
function concat&lt;T extends unknown[], U extends unknown[]&gt;(
    t: [...T],
    u: [...U]
): [...T, ...U] {
    return [...t, ...u];
}
</code></pre> <p>In essence, tuple types can now include <code>...T</code> as a generic placeholder for multiple types in the tuple. You can describe an unknown tuple (<code>[...T]</code>), or use these to describe partially known tuples (<code>[string, ...T, boolean, ...U]</code>).</p> <p>TypeScript can infer types for these placeholders for you later, so you can describe only the overall shape of the tuple, and write code using that, without depending on the specific details.</p> <p>This is neat, and applies more generally than just concatenating arrays. By combining this with existing variadic functions, like <code>f&lt;T extends unknown[]&gt;(...args: [...T])</code>, you can treat function arguments as arrays, and describe functions with far more flexible argument formats and patterns than in the past.</p> <p>For example, right now rest/variadic parameters in TypeScript must always be the last param in a function, so <code>f(a: number, ...b: string[], c: boolean)</code> is invalid.</p> <p>With this change, by defining the arguments of the function using a variadic tuple type like <code>f&lt;T extends string[]&gt;(...args: [number, ...T, boolean])</code>, you can do that.</p> <p>That's all a bit abstract. 
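</p>

<p>For a quick concrete taste, here's a typed <code>tail</code> function written with these (based on the proposal as it stands, so the details may still shift):</p> <pre><code class="typescript language-typescript">function tail&lt;H, R extends unknown[]&gt;(list: [H, ...R]): R {
    const [, ...rest] = list;
    return rest;
}

const parts = tail(["id", 123, true]);
// parts is inferred as [number, boolean], rather than a loose
// (string | number | boolean)[] array type
</code></pre> <p>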
In practice, this means you'll be able to:</p> <ul> <li>Destructure array types:<br><code>type head = &lt;H extends unknown, T extends unknown[]&gt;(list: [H, ...T]) =&gt; H</code></li> <li>Do many of the things allowed by <a href="https://www.typescriptlang.org/docs/handbook/advanced-types.html#mapped-types">mapped types</a>, but on arbitrary-length arrays of values, not just on objects.</li> <li>Infer full types for functions with variadic arguments: <code>type f = &lt;T extends unknown[]&gt;(...args: [...T]) =&gt; T</code></li> <li>Infer proper types even for extra complicated partially-known variadic arguments: <code>type f = &lt;T extends unknown[]&gt;(...args: [string, ...T, boolean]) =&gt; T</code></li> <li><a href="https://www.typescriptlang.org/play/index.html?ts=4.0.0-pr-39094-7&ssl=26&ssc=21&pln=26&pc=30#code/C4TwDgpgBAkgdgMwgJwILIOYFcC2E7ADOAPACoB8UAvAFBRSlQQAew+AJoVABQB0-UYAC4oAbX68AlohRQASgBoeEgIaZCIlXBABKapS0gAunqoHtUAPzyoIuBABuKGjVCRYM5HIiEsAGyIySiooOgYmVg4uPgFhMQlpJGQGJW5kHxFE2TlTc2NcqEMrGztHZxoELDgAY2BJAHs4KDBkepxJQkkEEGIwgDEqiLY4TmV+NQwNQu1RE31pkBpybgQRAbgdERjeCan4JPRsPAISdfICgAVW9sIIYn2Ub18A06rySgBvMPTgLGQmyo1OqNba7EQPNCYXD4QJnPRfej0H5-Jr2ADuUCubQ6EG4aR89T8TgKCMR9EBtQaTWqKj8fgARipqgBrfHPOIQp7+WFveFhMlIglE3Hpdk6fmIgC+Evou14YCwhAAFtwaXTGSydABuGVQBC8NV+bhwfx+JSqdTismSq1QaXSmjsCDVPxqaAU4FNPz1FTsbhhBCSPwQESEYDIaQYBRhfDVersSOh8OR6P0Q0a5lbUXcpMRuAYAoOeqSdibKBFksuONwMNQb2+rE3aAhFrYzrdbj10suLuNnHcADkvAA9GxWLwcOwB0oB1hgAgALQADgHOl4wCV+DZ81Jw+HUFFUA6UDDeYwAEIaDaXG5oABZHyEFQYHzUKAD+qQOADqAAH3faIRmwP7-gOHAge+KCtMgA6Os6rrpHqVSUo0UAgmEOCEBgIgPoQT4voQqZQOmTKZjwmHYVAHyCOAIZQLh+E+Eo1bDHEp6RnahbFuwNBlhWPE0NWtaNH2txvq2NxdCA3CNOKNAidc-Yfl+q7rpucDbmYVFhEJhIQLw3oYGyBqNKxeh7lAgC8G4AsjtXnJNBAA">Fully define types</a> for <code>promisify</code>.</li> <li>Create accurate types for many other higher-order function definitions, like <code>curry</code>, <code>apply</code>, <code>compose</code>, <code>cons</code>, …</li> <li>Kill all sorts of <a 
href="https://github.com/microsoft/TypeScript/issues/1773#issuecomment-81514630">workarounds</a> where you had to separately define an overload for each possible number of arguments (I've been <a href="https://github.com/pimterry/typesafe-get/blob/master/index.ts">guilty</a> of this myself).</li> </ul> <p>Even if you're not writing a lot of higher order functions, improved typing here should allow more detailed types to spread far and wide through your code, inferring away many non-specific array types, and improving other types all over the place.</p> <p>There's a lot more depth and many other use cases for this - take a look at <a href="https://github.com/microsoft/TypeScript/issues/5453">the full GitHub discussion</a> for more info.</p> <h2 id="labelledtuples">Labelled Tuples</h2> <p>As a related but drastically simpler feature: TypeScript will allow labelling the elements of your tuples.</p> <p>What does the below tell you?</p> <pre><code class="typescript language-typescript">function getSize(): [number, number];
</code></pre> <p>How about now?</p> <pre><code class="typescript language-typescript">function getSize(): [min: number, max: number];
</code></pre> <p>These labels disappear at runtime and don't do any extra type checking, but they do make usage of tuples like these far clearer.</p> <p>These work for rest &amp; optional elements too:</p> <pre><code class="typescript language-typescript">type MyTuple = [a: number, b?: number, ...c: number[]];
</code></pre> <p>For more info, check out the <a href="https://github.com/Microsoft/TypeScript/issues/28259">GitHub issue</a>.</p> <h2 id="propertytypeinferencefromconstructorusage">Property type inference from constructor usage</h2> <p>A nice clear improvement to type inference:</p> <pre><code class="typescript language-typescript">class X {

    private a;

    constructor(param: boolean) {
        if (param) {
            this.a = 123;
        } else {
            this.a = false;
        }
    }

}
</code></pre> <p>In the above code right now, the type of <code>a</code> is <code>any</code> (triggering an error if <code>noImplicitAny</code> is enabled). Property types are only inferred from direct initialization, so you always need either an initializer or an explicit type definition.</p> <p>In TypeScript 4.0, the type of <code>a</code> will be <code>number | boolean</code>: constructor usage is used to infer property types automatically.</p> <p>If that's not sufficient, you can still explicitly define types for properties, and those will be used in preference when they exist.</p> <h2 id="shortcircuitassignmentoperators">Short-circuit assignment operators</h2> <p>Not interested in typing improvements? TypeScript 4.0 will also implement the stage 3 JS <a href="https://github.com/tc39/proposal-logical-assignment">logical assignment</a> proposal, supporting the new syntax and compiling it back to make that usable in older environments too.</p> <p>That looks like this:</p> <pre><code class="typescript language-typescript">a ||= b
// equivalent to: a || (a = b)

a &amp;&amp;= b
// equivalent to: a &amp;&amp; (a = b)

a ??= b
// equivalent to: a ?? (a = b)
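
// As an illustrative use: apply a default only when the current
// value is null or undefined
let config = { retries: undefined };
config.retries ??= 3;
// config.retries is now 3; note that a falsy-but-not-nullish
// value like 0 or '' would have been kept as-is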
</code></pre> <p>Nowadays the last option is probably the most useful here, unless you're exclusively handling booleans. This null-coalescing assignment is perfect for default or fallback values, where <code>a</code> might not have a value.</p> <h2 id="thealsorans">The also-rans</h2> <p>Those are a few of the bigger changes, but there's a lot of other good stuff here too:</p> <ul> <li><code>unknown</code> now supported as a type annotation for catch clauses:<br><code>try { ... } catch (e: unknown) { ... }</code></li> <li>Support for React's new <a href="https://github.com/microsoft/TypeScript/issues/34547">JSX internals</a></li> <li>Editor support for <code>@deprecated</code> JSDoc annotations</li> <li>More performance improvements, following on from the big <a href="https://www.typescriptlang.org/docs/handbook/release-notes/typescript-3-9.html#speed-improvements">improvements</a> in 3.9</li> <li>New editor refactorings (e.g. automatically refactoring code to use optional chaining), improved editor refactorings (better auto-import!), and <a href="https://github.com/microsoft/TypeScript/issues/38435">semantic highlighting</a></li> </ul> <p>None of these are individually huge, but nonetheless, cumulatively they'll improve life for TypeScript developers, with some great improvements to type safety and developer experience all round.</p> <p>I should note that none of this is final yet! I've skipped a few discussed-but-not-implemented changes - from <a href="https://github.com/microsoft/TypeScript/issues/27711"><code>awaited T</code></a> to <a href="https://github.com/microsoft/TypeScript/issues/31894#issuecomment-640942186">placeholder types</a> - and it's quite possible some of these features could suddenly appear in the next month, or equally that a new problem could cause changes in the implemented features above, so do keep your eyes peeled…</p> <p>Hope that's useful! 
Get in touch on <a href="https://twitter.com/pimterry">Twitter</a> if you've got questions or comments.</p>]]></description>
            <link>https://httptoolkit.com/blog/whats-coming-in-typescript-4/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/whats-coming-in-typescript-4/</guid>
            <pubDate>Mon, 22 Jun 2020 16:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Bye bye Feature-Policy, hello Permissions-Policy]]></title>
            <description><![CDATA[<p>Ever heard of <a href="https://w3c.github.io/webappsec-feature-policy/"><code>Feature-Policy</code></a>? It's a draft W3C web security standard, defining an HTTP header and iframe attribute that sets limits on the browser features a page can use.</p> <p>It's useful for any site that's concerned about XSS attacks, embedded content, security risks in dependencies, or major bugs in their own software. You can use feature policy to guarantee your page or an embedded iframe cannot access the user's microphone or camera, can't read their location or phone sensors, can't use the Payment Request API, and so on. This is an <em>extra</em> safeguard, in addition to the browser's own permissions system, so it only tightens existing permission restrictions further.</p> <p>Feature Policy has been around a couple of years now, and got some good early press all over as a recommended security technique, from Google's <a href="https://developers.google.com/web/updates/2018/06/feature-policy">web developer guide</a> to <a href="https://www.smashingmagazine.com/2018/12/feature-policy/">Smashing Magazine</a>.</p> <p>Since then browser support has made steady progress, with about 75% of users globally <a href="https://caniuse.com/#feat=feature-policy">now supporting</a> it (that's all recent browser versions except Safari). More recently that's led to the start of real production usage: <a href="https://github.com/rails/rails/pull/33439">Rails 6.1</a> and Node.js's popular <a href="https://www.npmjs.com/package/helmet">helmet</a> security package recently shipped built-in support, and Scott Helme's <a href="https://scotthelme.co.uk/top-1-million-analysis-march-2020/">latest analysis</a> of the top 1 million sites shows the Feature-Policy header in use by nearly 5,000 of them.</p> <p>It is still just a draft though. 
That means it's subject to change, and it is now changing: <strong>the Feature-Policy standard &amp; header is being renamed to Permissions-Policy</strong>.</p> <p>There's some discussion of the reasoning in <a href="https://github.com/w3c/webappsec-feature-policy/issues/359">the spec repo</a>. In short:</p> <ul> <li>Many proposed additions don't mesh with the existing Feature-Policy behaviour, so these (along with some of the existing features) are being defined instead in a new <a href="https://w3c.github.io/webappsec-feature-policy/document-policy.html">Document-Policy</a> header, with different semantics focused on feature <em>configuration</em>, rather than security.</li> <li>The remaining features are a strict subset of the <a href="https://w3c.github.io/permissions/#permission-registry">separately defined</a> set of web permissions.</li> <li>Renaming offers an opportunity to change the header value syntax, to align it with the new <a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-header-structure/">Structured Headers</a> standard.</li> </ul> <p>Any kind of migration of web standards comes with some risk. In this case, the risk is a little different from normal: removing or renaming this header won't break anything outright, but it does silently remove a security safeguard from existing sites (scary).</p> <p>The exact migration plan is unclear, but it seems likely that browsers will include support for the existing header &amp; syntax for a while with a warning, to ensure this is as obvious as possible for the existing sites that expect it to work. 
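</p> <p>To give a rough sense of the syntax change, a policy that allows geolocation for the page itself but blocks camera access entirely might translate like this (illustrative only - the new structured-header syntax isn't finalised yet):</p> <pre><code>Feature-Policy: geolocation 'self'; camera 'none'
Permissions-Policy: geolocation=(self), camera=()
</code></pre> <p>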
Seeing this change in the real world is still a couple of browser releases away, so we'll have to wait to find out exactly how each browser decides to handle this.</p> <p>Consider this an early warning though: if you're currently using Feature-Policy, you're going to want to migrate soon, and as a community we've got a whole bunch of documentation that's going to need updating.</p> <p>Want to test out Feature/Permissions policy headers right now? Fire up <strong><a href="https://httptoolkit.com/">HTTP Toolkit</a></strong>, set some breakpoint rules, intercept some real web traffic, and rewrite the live headers to your heart's content.</p>]]></description>
            <link>https://httptoolkit.com/blog/renaming-feature-policy-to-permissions-policy/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/renaming-feature-policy-to-permissions-policy/</guid>
            <pubDate>Wed, 27 May 2020 13:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How will user-agent client hints work?]]></title>
            <description><![CDATA[<p>In the coming months, browsers are going to start killing the <code>User-Agent</code> HTTP header to replace it with user-agent client hints, a set of opt-in <code>Sec-CH-UA-*</code> headers.</p> <p>Maybe you've heard about this already, maybe that all sounds great, but what <em>exactly</em> does this mean in practice?</p> <p>Let's talk about how the <code>Accept-CH</code> and <code>Sec-CH-UA-*</code> headers will work, how you can test that with your own services today, and what comes next.</p> <h2 id="whatsthecurrentsituation">What's the current situation?</h2> <p>Right now the user agent (UA) includes your browser version, OS version and architecture, specific mobile phone manufacturer &amp; model, and more. This creates a wide range of unique user agent header values, and that means a server &amp; proxies can use this header (along with other data points) to <a href="https://en.wikipedia.org/wiki/Device_fingerprint">fingerprint</a> users - to recognize &amp; track individual people without using cookies or other restricted tracking mechanisms.</p> <p>In addition, many sites use UAs to decide which content to serve. This UA 'sniffing' has historically been abused, blocking functional browsers from accessing services when they don't fit a whitelist of known UA formats. That in turn has resulted in UAs trying to preserve backward compatibility, and UA strings gaining more and more cruft that can never be removed. Right now, 100% of popular browsers' user agents start with <code>Mozilla/5.0</code>, for instance. Not great.</p> <p>As a case in point, here's a user agent for Chrome on Android:</p> <pre><code>Mozilla/5.0 (Linux; Android 9; Pixel 2 XL Build/PPP3.180510.008) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Mobile Safari/537.36
</code></pre> <p>Very specific, and very inaccurate. In reality, there's no KHTML, Gecko, Safari or Mozilla involved. All this information is sent to every service your browser communicates with in any way. This is a mess.</p> <h2 id="whatstheplan">What's the plan?</h2> <p>The solution is not to remove the <code>User-Agent</code> header completely. For compatibility reasons it will still be sent, probably forever, but 'frozen'. The plan is to progressively reduce the number of unique UA values, by grouping more and more cases together to return the same UA.</p> <p>Soon, there's likely to be a single UA used by all Chrome versions on all desktop OSs, and a single UA used by all Chrome versions on all mobile OSs. That reduces the real information in the user agent down to just mobile/desktop, and the browser itself. Long term, it's very possible those will be frozen too, sharing UAs across desktop and mobile and more browsers.</p> <p>This will apply to both the <code>User-Agent</code> header that's sent in HTTP requests, and the <code>navigator.userAgent</code> property accessible from client-side JavaScript.</p> <p>Some services do need the information that the UA provides though. 
You might be serving content that depends on the specific browser version a user is using (either because the content itself is relevant to the browser, or because you need to work around behaviour in specific known versions), or you might be serving content that depends on the user's specific OS and OS version (offering a Mac download to Mac users and a Windows download to Windows users).</p> <p>These cases exist, and will continue to be supported, but explicitly: the server will need to send an <code>Accept-CH</code> header to request this information.</p> <h2 id="theacceptchheader">The Accept-CH header</h2> <p><code>Accept-CH</code> is an existing HTTP header, an active but still experimental <a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-client-hints/?include_text=1">draft standard</a>, currently in the "Last Call" phase (until the 8th of May 2020). It's been supported in Chrome on desktop &amp; Android since 2015, and in other Chromium-based browsers, though it's not yet available in Firefox or Safari.</p> <p>Until now, it's been used to request bonus details from browsers, such as the user's connection speed, viewport size or screen density. The idea is to allow servers to customize the content they serve, optimizing images and other content for mobile devices or users on low-bandwidth connections.</p> <p>It works like so:</p> <ul> <li>The client sends a request to the server with no hints, for example an initial navigation to <code>https://example.com/index.html</code></li> <li>The server responds with the content requested, and includes an <code>Accept-CH</code> header, such as:<ul> <li><code>Accept-CH: Viewport-Width</code> - the server wants to know the width of the client's screen</li> <li><code>Accept-CH: Width</code> - the server wants to know the desired width of resources being requested (e.g. 
how much space is available to show an image)</li> <li><code>Accept-CH: DPR, Device-Memory, Downlink</code> - the server wants to know the screen density, amount of RAM, and bandwidth of the client</li></ul></li> <li>For subsequent requests for pages or resources from the same origin, the client sends these hints, each as a separate header:<ul> <li><code>Width: 123</code> - the size of image the device wants to show</li> <li><code>Device-Memory: 2</code> - the memory of the device, in GiB, rounded to 0.25/0.5/1/2/4/8 to resist fingerprinting</li> <li><code>Downlink: 2.5</code> - the bandwidth available, in Mbps, rounded to the nearest 25Kbps to resist fingerprinting</li></ul></li> </ul> <h3 id="theresafewcaveatstothis">There are a few caveats to this:</h3> <p>First, client hints aren't always honoured. They're only supported for HTTPS connections, and only on first-party resources, so if you open <code>https://example.com</code> in your browser, requests to load subresources from <code>example.com</code> may include client hints, but requests for subresources from <code>ads.otherdomain.com</code> will not (although this may be <a href="https://github.com/WICG/client-hints-infrastructure#cross-origin-hint-delegation">configurable</a> using a <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Feature-Policy">feature policy</a>).</p> <p>They're also optional. Clients might refuse to send them, or might not support them at all, and they'll likely never appear in the first request to your origin.</p> <p>That said, if you do need a hint in the initial request, you could return an <code>Accept-CH</code> header with a 307 redirect back to the same URL to ask for the hint immediately, but you rarely want to do this. Doing so adds a redirect to your page load, and you risk putting users who can't or won't provide these hints into a redirect loop that locks them out of your site. 
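</p> <p>That risky redirect pattern would look something like this (a purely illustrative exchange):</p> <pre><code>GET /index.html HTTP/1.1

HTTP/1.1 307 Temporary Redirect
Location: /index.html
Accept-CH: Viewport-Width

GET /index.html HTTP/1.1
Viewport-Width: 1280
</code></pre> <p>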
It's better to serve a default version of your content, and treat client hints as progressive enhancement, to be used when available but not depended upon.</p> <p>These client hints are then persisted for the origin. The exact lifetime is left up to the client (a previous draft included a <code>Accept-CH-Lifetime</code> header, but that's now <a href="https://github.com/httpwg/http-extensions/pull/878">been removed</a>) but it's likely to be at least the rest of the current browser session. Although this means the same hint headers are duplicated on all future requests, with HTTP/2's <a href="https://blog.cloudflare.com/hpack-the-silent-killer-feature-of-http-2/">header compression</a> that can be done extremely efficiently.</p> <p>Lastly, if you are building a server that uses any client hints, you should be sure to include <code>Vary: &lt;hint name&gt;</code> in all responses, to ensure that they're only cached for requests that send the same hint values.</p> <p>All of this is Chrome only right now, although there's <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=935216">some progress</a> in other browsers, and the standardisation process is intended to encourage that. The set of opt-in hints supported in the latest stable Chrome includes:</p> <ul> <li>Width</li> <li>Viewport-Width</li> <li>DPR</li> <li>Content-DPR</li> <li>Device-Memory</li> <li>RTT</li> <li>Downlink</li> <li>ECT</li> </ul> <p>Google's web fundamentals guide has <a href="https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/client-hints">more detail</a> about using these in practice.</p> <p>That's the state of things <em>today</em>. 
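</p> <p>Putting those pieces together, a hinted request and a cache-friendly response might look like this (header values illustrative):</p> <pre><code>GET /photo.jpg HTTP/1.1
DPR: 2
Viewport-Width: 1280

HTTP/1.1 200 OK
Content-DPR: 2
Vary: DPR, Viewport-Width
</code></pre> <p>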
Let's talk about how we can use this to kill off <code>User-Agent</code> once and for all.</p> <h2 id="useragentclienthints">User-agent client hints</h2> <p>The current <a href="https://wicg.github.io/ua-client-hints/">UA client hints draft</a> proposes a few user-agent client hint headers to expose the information from <code>User-Agent</code> in a granular form:</p> <ul> <li><code>Sec-CH-UA</code> - basic UA info, e.g. <code>"Google Chrome"; v="84"</code></li> <li><code>Sec-CH-UA-Arch</code> - the CPU architecture, e.g. <code>x86_64</code></li> <li><code>Sec-CH-UA-Model</code> - the device model, e.g. <code>Pixel 3</code></li> <li><code>Sec-CH-UA-Platform</code> - the client OS, e.g. <code>Linux</code></li> <li><code>Sec-CH-UA-Platform-Version</code> - the client OS version, e.g. <code>NT 6.0</code></li> <li><code>Sec-CH-UA-Full-Version</code> - the full client UA version, e.g. <code>"84.0.4128.3"</code></li> <li><code>Sec-CH-UA-Mobile</code> - a <a href="https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-18#section-3.3.6">boolean header</a>, describing whether the client is a mobile device, either <code>?1</code> (yes) or <code>?0</code> (no)</li> </ul> <p>The <code>Sec-</code> prefix here may be unfamiliar. This is a general prefix for a <a href="https://fetch.spec.whatwg.org/#forbidden-header-name">forbidden header name</a> as defined by <a href="https://fetch.spec.whatwg.org/">the Fetch spec</a>. Headers starting with <code>Sec-</code> can never be manually sent by JS in a web page.</p> <p><code>Sec-CH-UA</code> and <code>Sec-CH-UA-Mobile</code> are considered <a href="https://wicg.github.io/client-hints-infrastructure/#low-entropy-table">'low-entropy hints'</a>, which will be sent by default. For the others, you'll need to send an <code>Accept-CH</code> header, listing each header name without its <code>Sec-CH-</code> prefix. 
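</p> <p>Concretely, the opt-in exchange might look something like this (header values illustrative):</p> <pre><code>HTTP/1.1 200 OK
Accept-CH: UA-Platform, UA-Arch

GET /next-page HTTP/1.1
Sec-CH-UA: "Google Chrome"; v="84"
Sec-CH-UA-Mobile: ?0
Sec-CH-UA-Platform: "Linux"
Sec-CH-UA-Arch: "x86_64"
</code></pre> <p>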
For example, if you want to know what platform the client is using, send an <code>Accept-CH: UA-Platform</code> response.</p> <p>It's important not to ask for too much, and request only the hints you really need. In addition to potential data transfer concerns (especially for HTTP/1.1 clients or servers), requesting too much information may trip a <a href="https://github.com/mikewest/privacy-budget">privacy budget</a> or otherwise trigger permission prompts in future, and implies collecting unnecessary personal information about your users.</p> <p>The draft also proposes a <code>navigator.userAgentData</code> JavaScript API to access this hint data client-side, but that doesn't seem to be implemented anywhere yet.</p> <h2 id="howtostartusingthesetoday">How to start using these today</h2> <p>Right now, the only browser that supports this is Chrome, and only in the dev &amp; canary channels, and behind a flag. It's early days! That said, testing it out now allows you to see how this might impact your application, and start to see how you can capture any hints that you need to handle this, once this does land for real.</p> <p>To test this out today, you'll need an HTTPS server locally where you can log requests and play with the response headers, or you can use an HTTP debugger like <a href="https://httptoolkit.com/">HTTP Toolkit</a> to directly inspect &amp; rewrite responses as you experiment. 
Once you have that in place:</p> <ul> <li>Open Chrome (either the Dev or Canary builds)</li> <li>Enable "Experimental Web Platform features" and "Freeze User-Agent request header" from <code>chrome://flags</code></li> <li>Load a page from your domain over HTTPS, and look at the request headers you receive - this is what'll happen soon by default: <img src="https://httptoolkit.com/images/posts/chrome-ua-frozen.png" alt="Example.com loaded with the UA frozen"> <em>Note the frozen "84.0.0.0" version and "Windows" platform in the UA here</em></li> <li>Load the page afresh, this time returning edited headers (directly from your server, or by adding a breakpoint from the <a href="https://httptoolkit.com/docs/reference/modify-page/">Modify page</a> in HTTP Toolkit) that include <code>Accept-CH: UA-Platform</code> <img src="https://httptoolkit.com/images/posts/example.com-accept-ch.png" alt="An example.com response with an accept-ch header"></li> <li>Reload once more, and you should see the client send you a new <code>Sec-CH-UA-Platform</code> header in the request. <img src="https://httptoolkit.com/images/posts/example.com-with-client-hint-header.png" alt="An example.com request including a Sec-CH-UA-Platform client hint"></li> </ul> <p>Bear in mind this is still a draft, not yet released in any stable browsers, and not yet final. 
<strong>Don't ship code that depends on this!</strong> The full details aren't yet decided, and it's still very possible that it'll change over the coming months.</p> <h2 id="whenisthishappening">When is this happening?</h2> <p>In <a href="https://groups.google.com/a/chromium.org/forum/#!msg/blink-dev/-2JIRNMWJ7s/yHe4tQNLCgAJ">Chromium's original timeline</a> (now <a href="https://groups.google.com/a/chromium.org/d/msg/blink-dev/-2JIRNMWJ7s/u-YzXjZ8BAAJ">disrupted</a> by COVID-19), the goal was to freeze browser &amp; OS versions from June 2020, eventually freezing to just 2 possible user-agent values - one for desktop &amp; one for mobile - for all versions of Chrome from September 2020.</p> <p>That's now delayed until 2021, and the specific new plan hasn't yet been announced, but it's likely to take a similar shape.</p> <p>Other browsers will likely follow suit. Edge have been <a href="https://twitter.com/_scottlow/status/1206831008261132289">supportive</a>, while Firefox are <a href="https://github.com/mozilla/standards-positions/issues/202#issuecomment-558294095">broadly supportive</a>, and already have UA freezing implemented as a privacy option today. 
Recording Firefox's HTTP traffic with HTTP Toolkit normally shows Firefox sending a detailed UA:</p> <p><img src="https://httptoolkit.com/images/posts/firefox-ua.png" alt="Firefox's full UA"></p> <p>But if the <code>privacy.resistFingerprinting</code> flag is set in Firefox's <code>about:config</code>, that same browser sends:</p> <p><img src="https://httptoolkit.com/images/posts/firefox-ua-restricted.png" alt="Firefox's restricted UA"></p> <p>Safari haven't formally announced their position, but they've previously attempted to freeze the UA in <a href="https://twitter.com/rmondello/status/943545865204989953">preview builds</a> (though that was partially rolled back), and it seems likely they'd follow suit once the rest of the ecosystem commits to this.</p> <p>Watch out for more changes in the same direction too, as browsers move other fingerprintable data behind client hints in future, including the <a href="https://github.com/WICG/lang-client-hint"><code>Accept-Language</code></a> header, and begin investigating approaches like <a href="https://wicg.github.io/ua-client-hints/#grease">GREASE</a> to mitigate sniffing risks. You can follow the detailed progress on this in the <a href="https://bugs.chromium.org/p/chromium/issues/detail?id=924047">Chromium</a> and <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1609304">Firefox</a> bug trackers.</p> <p>Have thoughts? <code>Accept-CH</code> in general is now in its <a href="https://datatracker.ietf.org/doc/draft-ietf-httpbis-client-hints/">last call for comments</a>, until the 8th of May 2020, whilst the UA freezing and client hints details are still very much subject to change, with discussion happening in the WICG <a href="https://github.com/WICG/ua-client-hints">ua-client-hints repo</a> on GitHub. There's still time to shape them to work for you!</p>]]></description>
            <link>https://httptoolkit.com/blog/user-agent-client-hints/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/user-agent-client-hints/</guid>
            <pubDate>Wed, 06 May 2020 17:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Ignore HTTP Client Errors At Your Peril]]></title>
            <description><![CDATA[<p>There's a lot that can go wrong when talking to servers on the web. When you're building and running a web service, keeping an eye on errors is essential to finding bugs, and understanding the experience of your end users (and hopefully even improving it).</p> <p>With so many possible forms of failure though, there are some critical cases that can fall through the cracks.</p> <p>Most backend monitoring and logging will detect and report on <em>explicit</em> server failures, from tracking the number of 5xx HTTP error responses that you send to reporting thrown exceptions to services like <a href="https://sentry.io">Sentry</a>. For this post, I want to go beyond these surface checks, but that's not to say they're unimportant: before you do anything else here, I'd strongly recommend having that fundamental monitoring in place.</p> <p>In many cases though, those checks alone can offer a false confidence to teams, who assume that no explicit server errors means that everything is working fine. That's frequently not true. These don't tell the whole story, as there's a whole world of errors that matter to the backend, and whose root cause lies in the server itself, but which surface as <em>client</em> errors, and never get reported.</p> <h2 id="theclientisalwaysright">The Client is Always Right</h2> <p>When I talk about 'client' errors, I mean errors that are typically blamed on bad client behavior. Think unexpected connection resets, semantically invalid HTTP requests, syntactically invalid HTTP requests, and the like.</p> <p>These are issues caused by how the client communicates with the server, rather than by the server's core application logic. They're often dealt with at a lower level of your HTTP stack, and logged and reported separately. 
4xx errors often aren't included in default metrics, invalid or disconnected HTTP requests often don't get a response at all, and many of the raw errors these trigger will be handled and swallowed by your HTTP server or framework. These are near-invisible failures.</p> <p>They're usually ignored simply to manage the noise. There really are bad clients out there, from bots to old browsers to individual users doing quick tests with cURL, and you don't want to hear about their problems. However, in many cases you control the client for your application &mdash; be it your mobile app, your single-page web application, or other servers within your own infrastructure &mdash; and failures in communication with them mean your product is broken for your users. Even when you're producing an API used by 3rd parties, those 3rd parties are often your customers, and those client errors are hurting their experience of your product, regardless of the cause.</p> <p><strong>Your users do not care about layers of your software stack.</strong> From their point of view, your software either solves their problem or it's broken. If it's broken because of an error in a client, be it their browser or their phone or the JS you've delivered to their device, it's just as broken as if the server threw an exception. 
Monitoring and reacting only to explicit server errors, simply because they're easier to spot, is a classic example of <a href="https://en.wikipedia.org/wiki/Streetlight_effect">the streetlight effect</a>, where attention is focused on the issues that are easiest to see, rather than the issues that are most important.</p> <p>If lots of your HTTP clients suddenly start hitting errors, as the person responsible for the server, you want to hear about it, and right now, many teams won't.</p> <p>Let's look at some examples, to make this more concrete:</p> <h3 id="tlssetuperrors">TLS setup errors</h3> <p>If you're running an HTTPS service, the first thing any client does when they connect is negotiate a TLS connection, creating a secure channel with your server that they can use for their request. This can fail.</p> <p>There are a few ways this can happen:</p> <ul> <li>If your certificate expires. Automation with services like <a href="https://letsencrypt.org/">Let's Encrypt</a> helps with this, but it's not sensible to assume they're infallible. You may also see this if the client's clock is wrong - on the web that might be their problem, but if your client is another server in your infrastructure then it's definitely something you want to be aware of.</li> <li>If your clients' certificate validation requirements change. In 2018, a new Chrome release <a href="https://www.ssls.com/blog/2018-certificate-transparency-requirements-google-chrome/">started requiring</a> Certificate Transparency for all certificates. In September 2020, Apple will <a href="https://support.apple.com/en-us/HT211025">stop trusting certificates</a> with lifetimes longer than 398 days. The rules for a 'valid' certificate are inconsistent and subject to change. When they change, new HTTPS certificates issued in exactly the same way as previous ones will be invalid and non-functional.</li> <li>If your clients' TLS requirements change. 
Your server has configuration defining which TLS versions and cipher suites it supports, as does every TLS client. If the server &amp; client can't agree on a common configuration then TLS connections will fail. Updates to your servers or updates to clients can make browsers, API clients and mobile devices silently incompatible with your server.</li> <li>If your certificate authority (CA) becomes untrusted. In 2018, all certificates signed by Symantec's CA or any of its intermediate CA brands (e.g. Verisign, GeoTrust, Thawte, RapidSSL…) were <a href="https://wiki.mozilla.org/CA:Symantec_Issues">distrusted by all major browsers</a>. If you were one of the sites using those certs, a huge proportion of web browsers started rejecting your certificates almost overnight.</li> <li>If your certificate is revoked. If your private key is leaked, you need to revoke your certificate, and clients should all stop trusting it immediately. In addition, at times CAs make mistakes, and have to <a href="https://threatpost.com/lets-encrypt-revoke-millions-tls-certs/153413/">revoke active certificates en masse</a>. Revocation checking is hit-and-miss in a few ways, but can definitely result in your certificate suddenly being rejected by clients.</li> <li>If you screw up certificate pinning. With <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Public_Key_Pinning">HPKP</a> in browsers (now deprecated, though still generally supported) or certificate pinning configuration in <a href="https://developer.android.com/training/articles/security-config.html#CertificatePinning">mobile apps</a>, a client can be configured to only trust a specific certificate. If your server starts using a different certificate, serves an incorrect HPKP configuration, or if a mobile app is misconfigured, your TLS setup will be rejected.</li> </ul> <p>In any of these cases, those requests fail and your product is broken, but no server-side error appears. This is bad. 
Fortunately, it's detectable.</p> <p>Firstly, TLS handshakes can be terminated by a fatal TLS alert (and MUST be, in some cases) with codes defined to describe the various possible issues, e.g. 42 bad certificate, 44 certificate revoked, 45 certificate expired, etc. These are alerts sent from the TLS client to the server before the handshake is rejected, and in most cases your server will already receive these alerts without doing anything. With most web servers, you can subscribe to these directly or log them automatically, and then include them in your metrics and reporting.</p> <p>Unfortunately, not all clients will close all connections with clear alerts like this when they're not happy. In many cases, clients will simply close the connection once they receive unacceptable TLS data, or complete the handshake 'successfully' but then immediately close the connection without sending any data. That brings us to our next error:</p> <h3 id="unexpectedconnectionresetsandcloses">Unexpected connection resets and closes</h3> <p>Once a client has connected to your server, it's supposed to send its request (<code>GET /hello</code>), listen for the response, and then cleanly close the connection (ignoring keep-alives for a moment).</p> <p>If that connection is immediately reset or closed, before a request is sent, it's likely that this is one of the above TLS setup issues.</p> <p>There are other cases here too though, where the connection will be closed earlier than expected in that process, like:</p> <ul> <li>User client connectivity issues (perhaps interesting in aggregate, but unlikely to be important individually).</li> <li>Connectivity issues in your own infrastructure, perhaps between caches or reverse proxies and your servers.</li> <li>Issues where certain statuses or header values crash the client outright, killing the connection before the response can be completed.</li> <li>Broken mobile apps or other API clients, which mishandle their outgoing connections.</li> 
</ul> <p>Except for the HTTPS case, the causes of these disconnections can often be unclear, and many of these are just noise. Nonetheless, these cases are very easy to detect, and in aggregate this data can help to pinpoint server issues and spot broken clients far earlier than you would otherwise.</p> <h3 id="semanticallyinvalidhttprequests">Semantically invalid HTTP requests</h3> <p>Clients can send HTTP requests that are structurally valid, but make no sense.</p> <p>Perhaps an attempt to update a user who doesn't exist, or to set a completely invalid property on some resource. Requests for invalid paths, requests with the wrong method, or requests with invalid authentication parameters all fall into this camp. In each of these cases, the server does understand the raw content of the client request, but your application logic can't or won't do what it's requesting.</p> <p>These requests should result in 4xx status code responses. In many cases though, these are tracked completely separately from 5xx server error responses, and largely ignored, though many of these are interesting!</p> <p>Clients sending semantically invalid requests to your API implies a bug in either the client or server. Perhaps the client is using an endpoint that you've removed, thinking it was unused. Perhaps the client is genuinely using the API wrong, or perhaps your server is configured incorrectly and is rejecting valid requests.</p> <p>In each case, these are clearly real bugs: they're either your problem, and need urgent fixes (for 1st party clients), or they highlight issues in your documentation, SDK &amp; examples (for 3rd party API clients).</p> <p>The main exception to this is 404 errors from browser clients and crawler bots. These are common, it's easy to get overwhelmed if you start paying attention to them, and they are often just noise. 
That said, it's worth tracking the URLs that most often trigger such 404 errors, and skimming the top of that list occasionally, to spot broken links and URLs in your service.</p> <h3 id="syntacticallyinvalidhttprequests">Syntactically invalid HTTP requests</h3> <p>Clients can send HTTP requests that make no sense whatsoever. Instead of <code>GET /index.html HTTP/1.1</code> they might send non-ASCII binary data, or some other unparseable gibberish, such that the server cannot understand what they want at all.</p> <p>These generally imply some lower-level failure of basic communications expectations. Some examples:</p> <ul> <li>Sending HTTPS traffic to a server that only accepts HTTP</li> <li>Optimistically sending HTTP/2.0 traffic to an HTTPS server that only supports HTTP/1.1</li> <li>Somebody sending you traffic that isn't HTTP at all</li> <li>Headers longer than the maximum header length your server will accept</li> <li>Invalid content-encodings, content-length or transfer encodings for a request body</li> <li>A body containing content with the wrong content-type, which can't be parsed</li> </ul> <p>All of this means that somebody is seriously misinterpreting what your server expects to receive. That usually means a major bug in either the server or the client, and these can have serious consequences.</p> <p>Overlong headers are a particularly interesting example. Although the HTTP spec doesn't define a maximum, in practice most servers have a limit on the length of headers they'll accept in a request, and will reject requests immediately with a 431 response if they exceed this. 
Apache defaults to 8KB, IIS to 16KB, and Node.js recently reduced theirs from 80KB to 8KB <a href="https://nodejs.org/en/blog/vulnerability/november-2018-security-releases#denial-of-service-with-large-http-headers-cve-2018-12121">as a security fix</a>.</p> <p>It's surprisingly easy to go over this limit, particularly if you're setting a few large cookies or using a metadata-heavy JWT for authentication. If that happens, then when your users tick over the limit their requests will all be <a href="https://medium.com/@evgeni.kisel/troubleshoot-nodejs-application-silently-rejects-requests-without-any-logs-c09c9111656a">suddenly, inexplicably and silently rejected</a>. On almost all servers this is a simple configuration change to fix (or of course, you could stop sending so much metadata in your requests), but if you're not logging client errors then you won't notice this on the server side at all.</p> <p>This is particularly bad for cookies: they can accumulate, many are set for a long time, this rarely comes up in automated testing, and the end result is to effectively lock the user out of the service indefinitely &amp; invisibly. Oops.</p> <p>You'll also see errors like this from broken server configuration, for example if you accidentally disable HTTP/2 on a server that previously supported it, or if your request body parsing isn't capable of handling all valid inputs.</p> <p>Each of the other cases suggests a major bug, somewhere in the server or client implementation. Something is very wrong, the server definitely has the details, and you should look into that.</p> <h2 id="collectingclientfeedback">Collecting Client Feedback</h2> <p>There are a lot of things that can go wrong in a client's requests. Fortunately, in all of these cases your server already knows this is happening; it's just not telling you about it. 
Most server frameworks don't report on client errors like these by default:</p> <ul> <li>Node.js &amp; Express won't report or call error handlers for most client errors automatically, and you need <a href="https://nodejs.org/api/http.html#http_event_clienterror"><code>clientError</code></a> (for HTTP errors) and <a href="https://nodejs.org/api/tls.html#tls_event_tlsclienterror"><code>tlsClientError</code></a> (for TLS errors) listeners to hear about them.</li> <li>Apache and Nginx won't log TLS handshake issues like other errors, unless you <a href="https://cwiki.apache.org/confluence/display/HTTPD/DebuggingSSLProblems">explicitly configure</a> them <a href="https://stackoverflow.com/questions/38934956/nginx-log-ssl-handshake-failures">to do so</a>.</li> <li>Puma (the most popular Ruby server) has a <a href="https://puma.io/puma/file.README.html#error-handling">separate error handler</a> for all low-level (non-application) errors, distinct from the error handling in your Rails/Sinatra/etc application.</li> <li>AWS's API Gateway automatically parses and handles many types of client error <a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-gatewayResponse-definition.html">for you</a>, making them invisible to your application code.</li> <li>Microsoft's IIS has <a href="https://learn.microsoft.com/en-US/troubleshoot/developer/webapps/aspnet/site-behavior-performance/error-logging-http-apis">a separate log</a> for all HTTP errors that it handles outside the application, from connection resets to parsing issues to TLS failures.</li> </ul> <p>You get the idea.</p> <p>This isn't a hard problem to solve: the servers have this information, but they often don't include it as part of normal error logging &amp; handling, simply because these errors can be irrelevant or noisy. 
That's not an unreasonable default to start with, but once you have an application in production and you really care if it works, it's good to look into these.</p> <p>On the other hand, that definitely doesn't mean you want to get a notification for every single client error, or even for every spike in errors, but tracking metrics to spot patterns and enabling notifications for specific classes of these errors can be useful. For example:</p> <ul> <li>Even a small spike in certificate rejections or malformed requests suggests a major configuration bug has been released somewhere.</li> <li>Graphing unexpected connection closes &amp; resets can be another easy way to spot TLS issues, and get a better understanding of your users' overall experience of your product.</li> <li>Receiving notifications for any <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/431">431 Request Headers Too Large</a> errors is probably a good idea, and potentially other 4xx errors depending on your application, as these can otherwise hide serious &amp; invisible client failures.</li> <li>Recording and occasionally checking on your top 404 URLs can highlight interesting cases of broken links or client bugs.</li> </ul> <p>The specific cases that matter to your application will vary, and there will be noise that you want to ignore too, but ignoring all client errors completely is rarely the right balance.</p> <p>Lastly, in addition to monitoring these on the server, where possible it's of course also good to have tests in place for your client applications, and to monitor them in production too. In many cases that isn't possible though (for 3rd-party clients and applications merely using your SDK), it may come with serious privacy risks (for clients running on user devices), and reporting client errors from the server can make issues directly available to the team best placed to deal with them. 
Collecting these on the server side is easy, and solves this for all possible clients out of the box.</p> <h2 id="aworkedexample">A Worked Example</h2> <p>To wrap up, let's see how this looks in practice.</p> <p>In my own case, I've been integrating HTTP client error reporting into <a href="https://httptoolkit.com/">HTTP Toolkit</a>. HTTP Toolkit intercepts HTTP connections for debugging, and already highlights common cases like TLS errors (to easily spot clients that don't trust the MITM certificate yet), but I recently discovered that many of the client errors listed here were hidden, or not fully reported, making it hard to inspect all client behaviour. This is <a href="https://github.com/httptoolkit/mockttp/commit/1083d61f75d57b937d5a84821453ad0b4c271428">now fixed</a> in the underlying open-source <a href="https://github.com/httptoolkit/mockttp">proxy library</a>, so all of these errors will be fully surfaced in the next HTTP Toolkit server update.</p> <p>How does this work?</p> <p>For TLS errors, we just listen for <code>tlsClientError</code> events on the HTTP server. That's super simple:</p> <pre><code class="js language-js">server.on('tlsClientError', (error) =&gt; recordClientError(error));
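
// recordClientError above is left up to you - it just needs to feed your
// logging/metrics. As a rough sketch (names here are illustrative, not
// part of any API), you could simply count errors by code:
const clientErrorCounts = {};
function recordClientError(error) {
    const key = error.code || error.message || 'unknown';
    clientErrorCounts[key] = (clientErrorCounts[key] || 0) + 1;
}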
</code></pre> <p>As mentioned above, there's also the case of TLS connections that reject silently, by connecting 'successfully' then disconnecting immediately without sending anything. This is a fairly common pattern for clients who don't trust your HTTPS certificate for some reason. To spot those, you'll want something like this:</p> <pre><code class="js language-js">// Takes a new TLS socket, calls the error listener if it's silently closed
function ifTlsDropped(socket, errorCallback) {
    new Promise((resolve, reject) =&gt; {
        socket.once('data', resolve);
        socket.once('close', reject);
        socket.once('end', reject);
    })
    .catch(errorCallback); // Called if 'close'/'end' happens before 'data'
}

// Check for this on all new connections:
server.on('secureConnection', (tlsSocket) =&gt;
    ifTlsDropped(tlsSocket, () =&gt;
        recordClientError(new Error("TLS connection closed immediately"))
    )
);
</code></pre> <p>Those two quick checks should let you record and report on most HTTPS issues.</p> <p>You'll also want to catch non-TLS client errors. To do so, you're looking for the <a href="https://nodejs.org/api/http.html#http_event_clienterror">clientError</a> event:</p> <pre><code class="js language-js">server.on('clientError', (error, socket) =&gt; {
    recordClientError(error);

    // By listening for this, you take responsibility for cleaning
    // up the client socket. Here's the equivalent of Node's default
    // implementation for that:

    if (socket.writable) {
        if (error.code === 'HPE_HEADER_OVERFLOW') {
            socket.write(Buffer.from(
                "HTTP/1.1 431 Request Header Fields Too Large\r\n" +
                "Connection: close\r\n\r\n"
            , 'ascii'));
        } else {
            socket.write(Buffer.from(
                "HTTP/1.1 400 Bad Request\r\n" +
                "Connection: close\r\n\r\n"
            , 'ascii'));
        }
    }
    socket.destroy(error);
});
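
// You probably don't want a notification for every error recorded here,
// but as discussed above, some codes (like header overflows) are usually
// worth alerting on individually. A rough sketch - the list of codes is
// an illustrative assumption, tune it for your own application:
const ALWAYS_ALERT_CODES = ['HPE_HEADER_OVERFLOW'];
function shouldAlertImmediately(error) {
    return ALWAYS_ALERT_CODES.indexOf(error.code) !== -1;
}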
</code></pre> <p>Easy peasy.</p> <p>Make sense? Agree, disagree, or hate the whole concept? Get in touch <a href="https://twitter.com/pimterry">on Twitter</a> or <a href="https://httptoolkit.com/contact/">send a message directly</a> and let me know.</p>]]></description>
            <link>https://httptoolkit.com/blog/how-and-why-to-monitor-http-client-errors/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/how-and-why-to-monitor-http-client-errors/</guid>
            <pubDate>Thu, 16 Apr 2020 17:45:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[HTTP 555: User-Defined Resource Error]]></title>
            <description><![CDATA[<p>Does the rise of serverless mean we need a new HTTP status code?</p> <p>The team at Oracle think so. They've submitted <a href="https://datatracker.ietf.org/doc/draft-divilly-user-defined-resource-error/">a draft specification</a> to the HTTP Working Group, defining a new HTTP status code (initially suggesting 555) to be used for server-side errors caused by user-supplied resources.</p> <p>[Note: I'm going to use 555 to refer to the new proposed code everywhere here, but this is <em>not</em> standardized; even if it is standardized in future, it will probably use a different code, and you 100% should not start building anything that uses this code for real <em>anywhere</em>. Nobody needs another <a href="https://en.wikipedia.org/wiki/Hyper_Text_Coffee_Pot_Control_Protocol#Save_418_movement">418 I'm A Teapot</a> battle.]</p> <p>Anyway, let's talk about what this means, and whether it's a good idea.</p> <h2 id="statuscodesarefresher">Status codes: a refresher</h2> <p>First let's recap the background. Status codes are 3-digit codes, included in every response from an HTTP server, which summarize the result of a request.</p> <p>There are common examples you'll have heard of, like 404 (the resource you requested could not be found) or 200 (your request was successful, and this response represents the result). There's then a long list of less common examples, like 302 (the resource you requested is temporarily stored elsewhere, you should go there instead), 410 (the resource you requested was here, but now it's gone, and there's no new address available), or 100 (yes, please continue sending the rest of the request you're already sending).</p> <p>And many more. 
They're categorized into a few classes:</p> <h3 id="1xxinformation">1XX: Information</h3> <p>These are provisional responses, which typically don't fit into the simple HTTP request/response flow, and describe unusual behaviors like interim responses or switching the connection to a different protocol.</p> <h3 id="2xxsuccess">2XX: Success</h3> <p>The request you asked for was successful in some way. Perhaps you're getting the resource you asked for (200), the server has accepted and started asynchronously processing your operation (202), or your request was successful but the server doesn't have any data about it for you (204).</p> <h3 id="3xxredirection">3XX: Redirection</h3> <p>Your request is valid, but the response you requested requires you to take some action elsewhere. Perhaps the resource you want is currently only available elsewhere (301/302/307/308), or your request indicated that your cache is up to date, and already has the data you need (304).</p> <h3 id="4xxclienterror">4XX: Client error</h3> <p>You, the client sending the request, have done something wrong. Maybe the server can't understand you at all (400), you're not authenticated for the thing you're asking for (401), there was a conflict between your request and the current state of the server (409), or you've made too many requests recently and the server is rate limiting you (429).</p> <h3 id="5xxservererror">5XX: Server error</h3> <p>Your request seems valid, but the server receiving your request can't deal with it for some reason. The entire service might be unavailable (503), an upstream server the server wants to pass your request to might have failed (502), or the server might have broken in some way it can't explain (500).</p> <p>These classes are well defined and widely understood nowadays, and very unlikely to change in future. 
It is however possible and likely that new individual codes within each class will be needed, and there are some details of how that should work <a href="https://httpwg.org/specs/rfc7231.html#considerations.for.new.status.codes">in RFC 7231</a>.</p> <p>These were designed to be extensible from early on, and any client that receives an unrecognized YXX status is required to treat it as a Y00. For example, clients that receive 555 responses and don't understand them are required to treat them as 500 responses. Whether they do in practice of course is a separate question…</p> <h2 id="errorsasaservice">Errors as a service</h2> <p>Back to the proposed status code 555. Why do the Oracle team want it?</p> <p>Oracle are building a service called <a href="https://www.oracle.com/database/technologies/appdev/rest.html">Oracle REST Data Services</a>, a developer platform designed to let you quickly create REST APIs for your Oracle databases. You define the endpoints and the SQL, and they generate the API (I'll carefully avoid discussing whether this is a good idea).</p> <p>They're in good company here - the developer market for cloud-hosted software platforms has exploded in the last couple of years, with a massive range of serverless providers and related tools appearing everywhere, from <a href="https://aws.amazon.com/lambda/">AWS Lambda</a> to <a href="https://firebase.google.com/docs/functions">Firebase Cloud Functions</a> to <a href="https://workers.cloudflare.com/">Cloudflare Workers</a> to <a href="https://www.openfaas.com/">OpenFaaS</a>.</p> <p>In each case, developer platforms like these hide all server concerns and mechanics from you, and provide you with a clear interface and set of expectations against which to write code. You provide the core business logic for your API, and they do all the server management &amp; heavy lifting.</p> <p>Sometimes though, this can go wrong. 
Your code can fail <em>completely</em>: not just fail to run an operation and return an explicit error status, but entirely fail to fulfill the core expectations required by the platform. Perhaps your SQL statement for ORDS is fundamentally broken, or your code crashes before registering the handler that Lambda expects, or calls the callback with gibberish, or your worker totally runs out of memory.</p> <p>In these cases the platform needs to tell the client sending the request that something has gone wrong. That is definitely some kind of 5xx error. It's not the client's fault, and something has gone wrong on the server end. But which 5xx error?</p> <p>Here's the list of standardized codes we have to pick from:</p> <ul> <li>500 Internal Server Error</li> <li>501 Not Implemented</li> <li>502 Bad Gateway</li> <li>503 Service Unavailable</li> <li>504 Gateway Timeout</li> <li>505 HTTP Version Not Supported</li> <li>506 Variant Also Negotiates</li> <li>507 Insufficient Storage (WebDAV)</li> <li>508 Loop Detected (WebDAV)</li> <li>510 Not Extended</li> <li>511 Network Authentication Required</li> </ul> <p>500, the maximally unspecific option, is the only real candidate.</p> <p>Unfortunately, the platform can also fail in unexpected ways, and the most appropriate status code for those is <em>also</em> 500.</p> <p>What Oracle are arguing is that these two cases (the platform failing and the platform customer's logic failing) are generic &amp; widely relevant cases, and that they are semantically different from one another in an important way. They want to differentiate these cases by status code. 
It's a good idea to standardize a status code to do so if the specific case is often going to affect clients' interpretation of the response, and it's a widely relevant distinction, such that many other services hit equivalent cases and would use the status code to differentiate them.</p> <p>The <a href="https://datatracker.ietf.org/doc/draft-divilly-user-defined-resource-error/?include_text=1">spec itself</a> and <a href="https://lists.w3.org/Archives/Public/ietf-http-wg/2020JanMar/0241.html">the email submitting it</a> have more detail on their reasoning here, and how they propose this works.</p> <p>So, the big question: do we really need a new status code for this?</p> <h2 id="isthiswidelyrelevant">Is this widely relevant?</h2> <p>I think there's a strong argument that these are types of error that are relevant to a huge &amp; growing set of services and clients.</p> <p>There are many PaaS providers now where this could happen, and they're increasingly popular. As of 2018, Lambda was running <a href="https://aws.amazon.com/blogs/aws/firecracker-lightweight-virtualization-for-serverless-computing/">trillions of executions every month</a>. DataDog did an analysis of their customers' infrastructure in Feb 2020, and <a href="https://www.datadoghq.com/state-of-serverless/">half of their AWS users are using Lambda</a>, an adoption rate that increases to 80+% in large and enterprise environments. At the end of 2019, more than 2 million mobile apps were communicating with the Firebase platform <a href="https://firebase.googleblog.com/2019/09/Whats-new-at-Firebase-Summit-2019.html">every month</a>. 
Cloudflare launched Workers in 2018, and according to their <a href="https://www.sec.gov/Archives/edgar/data/1477333/000119312519222176/d735023ds1.htm">S-1 filing</a> by mid-2019, more than 20% of all new Cloudflare customers were using it.</p> <p>These specific platforms won't last forever, but the running-your-code-within-a-managed-platform model seems likely to be a long-term fixture.</p> <p>There's a lot of these platforms around, a lot of services running on them, and a lot of HTTP requests communicating with those services. <a href="https://status.firebase.google.com/incident/Functions/18018">All</a> <a href="https://docs.aws.amazon.com/step-functions/latest/dg/bp-lambda-serviceexception.html">of</a> <a href="https://stackoverflow.com/questions/46183587/google-cloud-functions-not-working">these</a> <a href="https://www.cloudflarestatus.com/incidents/vxjgtxqyncqw">platforms</a> <a href="https://github.com/firebase/firebase-tools/issues/1222">fail</a> <a href="https://forums.aws.amazon.com/thread.jspa?threadID=234129">sometimes</a>. Errors from the platforms themselves are a real &amp; widespread issue.</p> <p>While it's tough to get hard numbers, it's also easy to be confident that the code hosted by these platforms often fails too. It's pretty clear that both hosted code errors and platform errors are real cases that are widely relevant to a lot of modern HTTP traffic.</p> <h2 id="isthissemanticallyimportanttodifferentiate">Is this semantically important to differentiate?</h2> <p>This is less clear. Even if these errors do happen widely, do we really need a separate HTTP status code for hosted logic errors and platform errors?</p> <p>There's definitely an argument that it's a pure implementation detail, and the client doesn't care. The server they hit sent an error and something broke. 
HTTP status codes shouldn't depend on the infrastructure choices used by the service, they should just tell the client details about their request.</p> <p>At the same time, there are other existing 5xx codes that explicitly tell us about implementation details of the failing service, when it's widely useful to do so. 502 and 504 both declare that the service is internally dependent on another server, and the second server has failed (502) or timed out (504), but the server you're talking to is functioning correctly. Meanwhile 506 tells us that the internal configuration of content negotiation in the server is broken, placing the blame on that specific part of the server's implementation.</p> <p>The gateway errors are a pretty similar case to these platform errors, but directing blame at the "which server" level, rather than the "which organization" or "which level of the stack" level that we're considering here. It's common that requests go through a CDN or reverse proxy of some sort, and when that fails it's often useful to know whether it's the gateway server that has failed, or the actual service behind it, so we have error codes to distinguish those cases. This would be similar.</p> <p>In practice though, would this really be useful?</p> <p>The <a href="https://forums.aws.amazon.com/thread.jspa?threadID=234129">AWS Lambda outage thread</a> above has a nice quote:</p> <blockquote> <p>Same here too! Getting "Service error." when I make requests to my functions.. Not good aws! I spent a good amount of time thinking it was my mistake since I was working on some of my functions :(</p> </blockquote> <p>This is the situation we're trying to disambiguate. Is the platform failing somehow, or is the hosted code broken?</p> <p>This is clearly a meaningful distinction for the developers of the service (i.e. the customers of the platform), like the commenter above. 
When their service starts serving up errors, their understanding of the response and their next steps are completely different depending on which part is failing. Clearer status codes mean fewer sad emojis.</p> <p>It's also an extremely important distinction for the platform provider (i.e. AWS/Oracle/Cloudflare/Google/etc). They'd like to be able to monitor the health of their platform. To do so, they're very interested in failures caused by the platform, but largely uninterested in failures caused by the hosted code within. It's easier to set up monitoring tools and automation to report on status codes than it is to parse detailed error information from the response itself. It's also valuable to them because it clarifies the situation to their customers (as in the quote above), and so avoids unnecessary support requests.</p> <p>Oracle dig into this in <a href="https://lists.w3.org/Archives/Public/ietf-http-wg/2020JanMar/0241.html">their submission</a>:</p> <blockquote> <p>When such a resource raises an error the only appropriate HTTP status code to use is 500 Internal Server Error. This causes confusion as it looks like there is something wrong with ORDS, where in fact there is only something wrong with the user supplied SQL. Despite explanatory text to clarify this, operators and users very often miss this distinction, and file support issues against ORDS. Further, automated tools that monitor the access log only see the 500 status code, and thus cannot differentiate between 'real' 500 errors in ORDS itself that need remediation versus 500 errors in the user supplied REST APIs that probably do not need any remediation.</p> </blockquote> <p>Still, the developers of a service &amp; the platform hosting the service are not the main clients of a server.</p> <p>I do think differentiating these two cases is also arguably useful as a client of an API though, uninvolved in the implementation behind it.</p> <p>This is a debatable point. 
It is really only relevant to API clients, as a technical audience, rather than browser visitors, but API clients are still important consumers of HTTP responses. For those clients, a platform failing entirely is a meaningfully different case from the service they want to talk to failing. It affects who they should contact to report the issue, how they should categorize it in their own error logs, which status pages to monitor to know when the issue is resolved, and what kind of expectations they can have for the resolution of the issue.</p> <p>As with gateway errors: when multiple parties are involved in a failing response, it's useful for HTTP clients to be able to tell whose fault it is from the outside.</p> <h2 id="isthistherightsolutiontotheproblem">Is this the right solution to the problem?</h2> <p>Ok, let's take as a given that this is a widespread case that it's often important to distinguish. Is the 555 status code described the right way to do that?</p> <p>One alternative would be to distinguish these cases in an HTTP header or response body of a 500 response. That's not quite as easy to integrate into most automation though, and less visible for something that is (Oracle would argue) an important distinction in the failure you're receiving. As a platform, if you want your customers to more clearly understand where errors come from, you want it to be obvious.</p> <p>Unfortunately, there's one big reason the 555 status code as proposed isn't viable anyway: for most platforms, it doesn't make 500 errors any less ambiguous.</p> <p>The issue is that for many of these platforms it's possible for hosted code to <em>explicitly</em> return a 500. This is a problem. If 555 is defined to mean "the hosted code crashed", that means that 500 now means either "the hosted code explicitly returned a 500 error" or "the platform crashed". That makes it useless. 
Users can't spot platform issues by looking for 500 errors, and similarly platforms can't monitor their own status by doing so, which means the differentiation is pointless. This is bad.</p> <p>It's fixable though. Instead, we can just flip the proposal on its head, and reserve 555 for platform errors, rather than errors in the hosted logic. I.e. if the platform fails in any unknown way, it should return a 555. Platforms just need to watch their monitoring for 555 errors, and developers &amp; API clients can know that 555 errors are always caused by the service's platform, not the service itself, so everything except 555 is semantically related to the service.</p> <p>I suspect in Oracle's case they missed this simply because it's not relevant to their platform; their hosted code doesn't appear to be able to directly set the status, just the data, so it can never explicitly return a 500. It's definitely relevant for other platforms though, from Lambda to Firebase, so without this the spec is probably unusable.</p> <h2 id="dowereallyneedanewstatuscode">Do we really need a new status code?</h2> <p>Even if we flip this proposal to define a 555 "Unknown Platform Error", given all the above: do we <em>really</em> need this?</p> <p>It's hard to definitively answer. I do think there are legitimate arguments for and against, and I don't think it's 100% clear cut either way.</p> <p>The real test is whether the rest of the ecosystem displays any interest. If this is a status code that only Oracle care about, then it really doesn't need formal standardization. On the other hand, if AWS or other platforms or API clients do start displaying interest, then maybe it's honestly a widespread and semantically meaningful class of errors. 
You can debate the theory all you like, but HTTP, like most standards, is intended to be defined by what's important for real use cases in the wild, not just what one company wants to implement today.</p> <p>We'll have to wait and see.</p> <p>In the meantime, if you want to keep an eye on this and other HTTP developments, subscribe to the IETF HTTP Working Group <a href="https://lists.w3.org/Archives/Public/ietf-http-wg/">mailing list</a> for more thrilling specs and debate, or just subscribe to this blog below, and I'll write up the interesting parts.</p>]]></description>
            <link>https://httptoolkit.com/blog/new-http-status-code-555/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/new-http-status-code-555/</guid>
            <pubDate>Fri, 27 Mar 2020 13:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Chrome 79+ no longer shows preflight CORS requests]]></title>
            <description><![CDATA[<p>Chrome 79 brings some important changes in its CORS implementation, rolling out now, which mean that CORS preflight OPTIONS requests will no longer appear in the network tab of the Chrome developer tools.</p> <h2 id="cors">CORS?</h2> <p>Cross-Origin Resource Sharing (<a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS">CORS</a>) allows web servers to tell browsers which web applications are allowed to talk to them.</p> <p>This applies when a web application tries to send a request to a server with a different origin, for example when a page hosted at https://example.com tries to make a request to https://api.mybank.com. For simple requests that are defined to not cause side effects, the browser will make the request, but examine the <code>Access-Control-*</code> headers on the response from the server before allowing the web application to read that data.</p> <p>For more dangerous requests, which could trigger an action on the server, the browser sends a so-called "preflight" request. Before sending the real request, it sends an OPTIONS request to the server that includes <code>Access-Control-Request-*</code> headers describing the method and any restricted headers that the application would like to send. The server then responds with its own <code>Access-Control-*</code> headers, which tell the browser whether or not this is allowed.</p> <p>If it's allowed, the browser goes on to send the real request; if not, the application isn't allowed to make that request, so it fails.</p> <p>Phew, make sense? This is just an outline of CORS, there's quite a bit more detail available in <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS">MDN's docs</a>. It trips up quite a few people, and checking that you've done it securely on the server side (i.e. 
you're not allowing other malicious web applications to do or read things they shouldn't) is harder still.</p> <h2 id="changesinchrome79">Changes in Chrome 79</h2> <p>In Chrome 79, a new flag was added:</p> <p><img src="https://httptoolkit.com/images/posts/chrome-79-cors.png" alt="The &quot;Out of blink CORS&quot; chrome flag, which moves CORS handling out of blink"></p> <p>If you're running 79+, you can see this on the <code>chrome://flags</code> page. It appears that this was disabled by default at the release in December 2019, but it's <a href="https://dev.chromium.org/Home/loading/oor-cors">intended</a> to be enabled incrementally over the weeks from January 6th 2020, which brings us to approximately today, where <a href="https://twitter.com/__jakub_g/status/1227889797584302080">people</a> are seeing this for themselves.</p> <p>When this flag is enabled, the CORS handling logic is moved entirely out of the core Blink browser engine. In general this is a good thing - CORS is a core security feature, browser engines are very exposed to untrusted remote inputs, and trying to isolate the two from one another is a great move for security.</p> <p>In practice, the main visible change from this is that <strong>CORS preflight requests will no longer appear in the Chrome developer tools network tab</strong>. That means debugging CORS - already tricky - just got quite a bit harder, because these requests are going to be completely invisible to you.</p> <p>They'll also no longer be considered as a separate entry by <a href="https://developer.mozilla.org/en-US/docs/Web/API/Resource_Timing_API/Using_the_Resource_Timing_API">the resource timing API</a>.</p> <p>There's a bit more background on this from <a href="https://twitter.com/mikewest/">Mike West</a> on the Chrome security team:</p> <blockquote> <p>We moved CORS checks out of our renderer process to (among other things) ensure that we’re not exposing cross-origin data to Spectre, et al. 
In the short-term, this is a pain in the ass for developers, and I’m sorry for that. I do hope it’s temporary. - https://twitter.com/mikewest/status/1227918108242989056</p> </blockquote> <p>Judging from the <a href="https://bugs.chromium.org/p/chromium/issues/detail?id=941297">bug discussion</a> there's a bit of an outline on how this might be resolved in future whilst keeping CORS outside Blink itself, but not a lot of progress or detail yet, so I wouldn't bet on this changing any time soon.</p> <h2 id="whatcanidoaboutthis">What can I do about this?</h2> <p>Cheeky plug: you could debug Chrome's HTTP traffic with <strong><a href="https://httptoolkit.com/">HTTP Toolkit</a></strong> instead. HTTP Toolkit lets you collect <em>all</em> traffic the browser sends, even for CORS requests (or any other requests) that happen outside the core renderer process.</p> <p>One-click setup to start intercepting Chrome, and then you can see literally everything, with a far nicer UI than the network tab to boot:</p> <p><img src="https://httptoolkit.com/images/posts/httptoolkit-cors-screenshot.png" alt="The HTTP Toolkit UI"></p> <p>There are other options too though:</p> <ul> <li>You can manually disable this flag in your browser on the <code>chrome://flags</code> page, but do be aware that this non-Blink CORS implementation does have some different behaviour compared to the Blink one (see the <a href="https://dev.chromium.org/Home/loading/oor-cors">design doc</a>). 
If you want to see the same thing as your users, you probably don't want to leave this disabled all the time.</li> <li>You can take <a href="https://www.chromium.org/for-testers/providing-network-details">a NetLog dump</a> from Chrome, to log the full requests and examine them elsewhere.</li> <li>You can test with another browser, like Firefox.</li> <li>You can use hosted HTTP request recording &amp; reporting tools, like <a href="https://www.webpagetest.org/">WebPageTest</a>.</li> <li>You can use any other standalone HTTP debugging tools, like Fiddler or Charles, which should also still be able to collect this traffic.</li> </ul> <p>When you do start seeing CORS requests failing for no good reason though, none of these are quite as convenient as being able to check the preflight inline…</p> <p><strong>Want to see &amp; explore <em>all</em> your HTTP traffic? Get started with <a href="https://httptoolkit.com">HTTP Toolkit</a> now</strong>.</p>]]></description>
            <link>https://httptoolkit.com/blog/chrome-79-doesnt-show-cors-preflight/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/chrome-79-doesnt-show-cors-preflight/</guid>
            <pubDate>Thu, 13 Feb 2020 16:25:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Global developer CAs considered harmful]]></title>
            <description><![CDATA[<p>Certificate infrastructure is built on trust. If you trust the wrong thing, it all falls down. Unfortunately, we developers do it all the time anyway. YOLO.</p> <p>A remarkable number of dev tools &amp; practices encourage or require you to globally trust a certificate authority (CA) that they provide or generate locally. If you do, anybody with access to the key for that CA can rewrite any HTTPS traffic between you and anything, and take almost complete control of your internet traffic.</p> <p>We don't need to do this. These tools could easily work without globally installed CAs, and they open you to unnecessary risks by not doing so. We can do better.</p> <h2 id="whodoesthis">Who does this?</h2> <p>Lots of different dev tools, for a couple of different reasons.</p> <p>First, there's a selection of tools that generate HTTPS CAs &amp; certificates for local development servers, and trust them for you globally &amp; automatically. That lets you easily run a local HTTPS server on a hostname where certs aren't otherwise available, like <code>localhost</code> or other local hostnames (<code>my-docker-container:8080</code>). That's often useful because more and more web features are limited to only HTTPS origins.</p> <p>The tools doing this include:</p> <ul> <li><a href="https://github.com/FiloSottile/mkcert">mkcert</a> - a Go-based CLI tool that generates a CA and then automatically trusts it everywhere, recommended for HTTPS setup with everything from <a href="https://www.scottbrady91.com/ASPNET/Using-mkcert-for-ASPNET-Core-Development">ASP.NET</a> to <a href="https://gist.github.com/h007/196575c0d68d3832d246f1a32c07e6fd">Python</a>. <code> $ mkcert -install Created a new local CA at "/Users/filippo/Library/Application Support/mkcert" 💥 The local CA is now installed in the system trust store! ⚡️ The local CA is now installed in the Firefox trust store (requires browser restart)! 
🦊 </code></li> <li><a href="https://github.com/davewasmer/devcert">Devcert</a> - an npm module which creates a self-signed CA, and automatically trusts this CA globally in the OS keystore and in all browsers it finds.</li> <li><a href="https://www.gatsbyjs.org/docs/local-https/">Gatsby</a> - a static site framework. Running Gatsby with <code>--https</code> will generate a CA, prompt you for your password, and then trust it automatically &amp; globally, in both your system cert store and every browser store it can find.</li> <li><a href="https://create-react-app.dev/docs/using-https-in-development/">Create-React-App</a> - the official toolchain for setting up React single-page app projects. To their credit, they don't explicitly tell you to trust the cert system-wide, but they don't tell you what to do instead, and the internet is full of <a href="https://medium.com/@danielgwilson/https-and-create-react-app-3a30ed31c904">guides</a> telling you how to trust it globally.</li> <li><a href="https://github.com/OfficeDev/generator-office">Generator-Office</a> - a popular template for building MS Office add-ins. Generates &amp; prompts you to install a system-wide root CA for local HTTPS development, without the slightest warning about what that means.</li> </ul> <p>In addition to local HTTPS servers, there's also a world of HTTPS debugging tools, in a similar space to <a href="https://httptoolkit.com/">HTTP Toolkit</a> itself. These tools let you intercept, inspect &amp; rewrite HTTPS traffic between a client and a server, for testing, debugging &amp; prototyping. 
They typically intercept HTTPS traffic from your whole system, and require/strongly encourage you to trust their CA certificates globally to do so.</p> <p>There's a host of examples of these, from <a href="https://docs.telerik.com/fiddler/Configure-Fiddler/Tasks/TrustFiddlerRootCert">Fiddler</a> on Windows, to <a href="https://www.charlesproxy.com/documentation/using-charles/ssl-certificates/">Charles</a> on Mac, or <a href="https://support.portswigger.net/customer/portal/articles/1783075-installing-burp-s-ca-certificate-in-your-browser">Burpsuite</a> (Java, cross platform), all following that same pattern.</p> <p>A few of these server &amp; debugging tools do recognize that there are risks inherent in this (although they do it anyway). Warnings abound:</p> <p><strong>Mkcert:</strong></p> <blockquote> <p><strong>Warning</strong>: the rootCA-key.pem file that mkcert automatically generates gives complete power to intercept secure requests from your machine. Do not share it.</p> </blockquote> <p><strong>Devcert:</strong></p> <blockquote> <p>This exposes a potential attack vector on your local machine: if someone else could use the devcert certificate authority to generate certificates, and if they could intercept / manipulate your network traffic, they could theoretically impersonate some websites, and your browser would not show any warnings (because it trusts the devcert authority).</p> </blockquote> <p><strong>Burp Suite:</strong></p> <blockquote> <p>If you install a trusted root certificate in your browser, then an attacker who has the private key for that certificate may be able to man-in-the-middle your SSL connections without obvious detection, even when you are not using an intercepting proxy. To protect against this, Burp generates a unique CA certificate for each installation, and the private key for this certificate is stored on your computer, in a user-specific location. 
If untrusted people can read local data on your computer, you may not wish to install Burp's CA certificate.</p> </blockquote> <p>If only we could avoid these warnings somehow, and do something safer instead…</p> <h2 id="whatcouldpossiblygowrong">What could possibly go wrong?</h2> <p>Doing this but showing warnings isn't the right choice. Firstly because it's rarely necessary at all (we'll get to that), but mainly because nobody reads warnings. Silently creating new risks on your machine and then waving away concerns with "well we warned you" is a touch rude. We should all encourage our users to do the safe thing by default, so far as possible.</p> <p>As a case in point, both Preact-CLI &amp; Webpack-Dev-Server support automatic local HTTPS, so that users can easily use HTTPS locally by trusting their CA certificates. In 2017 it was <a href="https://medium.com/@mikenorth/webpack-preact-cli-vulnerability-961572624c54">discovered</a> by Mike North that both projects were using a shared default CA whose HTTPS key &amp; certificate was published as part of the tool.</p> <p>Anybody who trusted the certs from either tool before June 2017 and hasn't heard about this now trusts a compromised CA, which means they 100% trust pretty much anybody to send them anything, until that CA expires. For webpack-dev-server, that CA cert is valid until 2026, 10 years after it was issued. Oops.</p> <p>It's hard to get precise numbers on how many users this affected, but the Webpack-Dev-Server package was installed 27 million times before the fix for this was released, so even with conservative estimates this was Very Bad.</p> <p>To quote Mike's disclosure:</p> <blockquote> <p>As a result of this vulnerability, an attacker could very easily and reliably eavesdrop on, and tamper with HTTPS traffic across ALL DOMAINS, undetected. 
Essentially HTTPS is completely compromised, and no data is secret or safe anymore.</p> </blockquote> <p>This happened because the private key was shared &amp; fully public. All the tools above generate fresh CA certificates &amp; keys on every developer's machine, so they're not immediately vulnerable to this specific issue. They avoid the worst case, but this shows what happens if users' keys are exposed, and just because we've fixed the worst case, that doesn't make that risk go away.</p> <p>In fact, in some ways it's got even worse since 2017, with the new <a href="https://developers.google.com/web/updates/2018/11/signed-exchanges">Signed HTTP Exchanges</a> feature (live in Chrome 73). In the past, abusing a trusted CA certificate would require you to intercept a machine's HTTPS traffic to the target domain. Nowadays you can instead generate signed HTTP exchanges from domain X using an exposed CA, then host that exchange on your own domain Y. Anybody visiting &amp; trusting the CA will treat that as real traffic from X. Domain Y can now run JS in domain X's origin, or poison its cache to inject code later. Drive-by attacks, no interception of traffic necessary.</p> <p>And then there's everything else on your system that trusts your installed CA certificates. I wouldn't be surprised if this worked to defeat code signing checks, attack many apps' automated update processes, and make you vulnerable in lots of other places. You can be too trusting.</p> <h2 id="whatrethechances">What're the chances?</h2> <p>Once you have a CA certificate like this installed, all certificate-based security on your machine becomes contingent on you keeping the private key of that CA secret. 
That's easier said than done, especially given that many users are unaware of the risks, where the key is stored, or even that it exists.</p> <p>To use a CA like this locally you generally need the CA private key in a user-readable file, so you can use it (or you need to run your dev tools as root, which brings fun new risks). That means you have a user-readable file on your computer that is a catastrophic security risk if it's ever exposed.</p> <p>If somebody ever gets access to your machine, if you accidentally commit this file to a project, if you back it up to the wrong place, or if a rogue app on your machine quietly reads it, you now have a huge &amp; invisible problem.</p> <p>As an industry we spend a lot of effort elsewhere avoiding exactly this kind of risk. We password protect our SSH keys, we salt &amp; hash passwords in databases, and we encrypt our password manager's database on disk. You don't want a "if you can read this, you own my computer" user-readable file on your computer. You <em>definitely</em> don't want tools to generate one for you and quietly &amp; automatically store it somewhere.</p> <p>To be fair, it is possible in the HTTPS devserver case to have a CA key that's only root-readable, and to cache user-readable certificates &amp; keys per-domain instead. A tool then needs to prompt for sudo/admin rights for initial setup for each domain, and when certs expire. 
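</p>
<p>As a sketch of that split (all filenames, subjects and lifetimes here are illustrative, and the CA generation is inlined just for demonstration - in practice the CA key would be created once and owned by root):</p>

```shell
# Create the CA. In the scheme above this key would be root-owned, mode 600:
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.pem \
  -subj "/CN=Local Dev CA" -days 30 2>/dev/null
chmod 600 ca.key

# Issue a short-lived per-domain leaf cert - the only key your user account
# (and your dev server) ever needs to read. This signing step is the part
# that would run via sudo:
openssl req -new -newkey rsa:2048 -nodes -keyout localhost.key \
  -out localhost.csr -subj "/CN=localhost" 2>/dev/null
openssl x509 -req -in localhost.csr -CA ca.pem -CAkey ca.key \
  -CAcreateserial -out localhost.pem -days 7 2>/dev/null

# The leaf verifies against the CA, so clients that trust ca.pem accept it:
openssl verify -CAfile ca.pem localhost.pem
```

<p>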
That definitely helps, but only for servers; it's not practical for debuggers, where you might talk to many domains unpredictably.</p> <p>Even for servers this isn't perfect: only root-readable doesn't mean "100% secure forever", each per-domain key still needs to be user-readable so remains vulnerable, and your dev tools need to prompt you at intervals for authorization as your certs expire, which isn't great either (and potentially opens a UX hole for others to steal the root-readable CA key).</p> <p>That brings me to one of the consequences of this whole mess: because refreshing global CAs is hard, managing global CAs is much easier with long-lived certificates, which makes things even worse if they ever do get exposed. Your dev tools do not need certificates that last 10 years.</p> <p>Even if the CA key is never exposed, at the very least you're giving random developer tools you just downloaded permission to inspect &amp; rewrite everything you ever see and do on the internet for years into the future. That should sound scary - as we saw with Webpack &amp; Preact-CLI, great dev tool authors are not necessarily security experts.</p> <p>Fundamentally, the problem here is that globally installing CA certificates for local development tools violates <a href="https://en.wikipedia.org/wiki/Principle_of_least_privilege">the principle of least privilege</a>. Nobody uses these tools intending to redefine their long-term definition of trust for the whole internet, but that's what they're doing. Instead, we should trust these tools to verify traffic to the specific domains we care about, for the client we're using, whilst we're using the tool.</p> <p>Fortunately, we can do exactly that.</p> <h2 id="howcouldthisworkbetter">How could this work better?</h2> <p>For 99% of cases you don't need to trust these CA certificates globally. When running an HTTPS server, you only need to trust it in your HTTP client whilst the server is running, and only for that one domain. 
When debugging a client, you only need to trust the certificate within that client, during that one debugging session.</p> <p>We can and should do exactly that. The only real argument against this I've seen is that it's hard, but that's really not true, even before you compare this with the complexity of automatically elevating privileges &amp; globally installing certificates for every possible browser &amp; OS.</p> <p>Let's talk about browsers, for example. If you're running a local HTTPS server for development, that's probably your client of choice. To trust a CA in Chrome temporarily for that one Chrome process, without affecting other or future Chrome processes, and without trusting anything system-wide, you need to:</p> <ul> <li>Get the certificate's fingerprint.<ul> <li>Easy to do in lots of tools &amp; libraries, from the <a href="https://security.stackexchange.com/a/188059/99885">openssl CLI</a> to <a href="https://github.com/httptoolkit/mockttp/blob/2c10ae1/src/util/tls.ts#L88-L97">node-forge</a></li></ul></li> <li>Start Chrome with <code>--ignore-certificate-errors-spki-list=$FINGERPRINT</code></li> </ul> <p>This is not that hard. That same option also works out of the box for other Chromium-based browsers, from Edge to Brave.</p> <p>Firefox doesn't have one single option to trust this certificate, but you can create a new Firefox profile, trust it manually there (or automatically using <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/tools/NSS_Tools_certutil"><code>certutil</code></a>), and then use that profile just when you need it. Setting this up and selecting the profile is automatable, and in fact devcert &amp; mkcert already do most of the setup (but against your global user profile).</p> <p>For non-browser HTTPS clients, there are good options too. In most cases, certificate trust can be configured just with an environment variable. 
That means you can set this variable in the script you use to run your tests or start your client, and you're done.</p> <p>For starters, for anything that uses the OpenSSL defaults you can set the <code>SSL_CERT_FILE</code> environment variable to point to the certificate path. This covers many apps and popular tools out of the box, from curl to apt-get, notably including most code running in Ruby, Python, PHP, and other similar languages.</p> <p>There are a few other special cases, depending on what you're building. Node.js ignores this variable, but provides its own <code>NODE_EXTRA_CA_CERTS</code> env var which works similarly. A few other specific libraries need their own configuration too, like Python's popular Requests library (<code>REQUESTS_CA_BUNDLE</code>), and Perl's LWP (<code>PERL_LWP_SSL_CA_FILE</code>). For Java you need to build a truststore for your app that includes the cert, which means running <a href="https://stackoverflow.com/a/2893932/68051">one command</a>.</p> <p>While this isn't trivial, it wouldn't be hard to bundle it into a zero-config package that did make this effortless. For any of these languages, it's very possible to temporarily trust a given certificate for effectively all HTTPS traffic, with no global side effects or long-term risk.</p> <p>As a last step, we could also limit the power of the CA certificates themselves. Support for the <a href="https://tools.ietf.org/html/rfc5280#section-4.2.1.10">Name Constraints</a> certificate extension <a href="https://nameconstraints.bettertls.com/#!view">is rapidly growing</a>. This lets you create a CA that's only trusted to sign certificates for a whitelisted range of domains, so that trusting a CA doesn't mean giving them a blank cheque for the whole internet.</p> <p>With improvements like this, in 99% of cases we can directly trust the certificate only when and where we need it, from browsers to backend code to shell scripts to CLI tools. 
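</p>
<p>As a sketch of what a name-constrained dev CA could look like (requires OpenSSL 1.1.1+ for <code>-addext</code>; the <code>.test</code> domain choice is illustrative):</p>

```shell
# A CA that's only valid for signing certs under .test - trusting it is no
# longer a blank cheque for the whole internet:
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.pem \
  -subj "/CN=Constrained Dev CA" -days 30 \
  -addext "nameConstraints=critical,permitted;DNS:.test" 2>/dev/null

# Confirm the constraint is baked into the certificate:
openssl x509 -in ca.pem -noout -text | grep -A1 "Name Constraints"
```

<p>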
It's simply not necessary to globally trust certificate authorities just to use simple dev tools.</p> <h2 id="whatshouldwedo">What should we do?</h2> <p>Ok, so this is happening, it's bad, and it's unnecessary. How do we fix it?</p> <ul> <li>Tools that set up a local HTTPS server by globally trusting the CA should drop that, and instead add commands to start browsers that trust the cert directly, and encourage users to use that instead by default.</li> <li>Automated tests &amp; scripts that talk to local HTTPS servers should not require installing global certs, but should instead trust the certs just by configuring their clients, and only during the test.</li> <li>HTTPS debuggers should take control of the clients they care about, and inject settings directly into them, rather than requiring global trust &amp; intercepting all traffic from the entire system.</li> <li>Anybody building logic that decides which certificates to trust should include an escape hatch like the above, rather than only trusting the OS store, and should implement name constraints.</li> <li>Tools should aim to generate CA certificates with name constraints &amp; short lifetimes by default. For many use cases you could even go further, and generate a fresh CA for every session.</li> <li>Developers (and everybody else) should stop trusting &amp; installing CA certificates globally without a really really rock-solid good reason that they fully understand.</li> </ul> <p>Phew. Sound good?</p> <p>Of course, a big part of why I'm writing this is my work in HTTP Toolkit solving the exact same problem. <a href="https://httptoolkit.com/">HTTP Toolkit</a> is an open-source HTTPS debugger that has tools to intercept traffic from single clients, by injecting proxy settings &amp; trusting certificates only where they're necessary, doing exactly the above. 
With this, you capture &amp; rewrite only the traffic you're interested in, you don't have any of the global trust problems we've talked about here, and you never need to give anything any extra permissions.</p> <p>That implementation is all <a href="https://github.com/httptoolkit">open source</a>, so if you're interested or working on something similar then go take a look at how HTTP Toolkit <a href="https://github.com/httptoolkit/httptoolkit-server/blob/v0.1.29/src/interceptors/fresh-chrome.ts#L45-L100">launches Chrome with interception</a>, or all the env vars it uses <a href="https://github.com/httptoolkit/httptoolkit-server/blob/master/src/interceptors/terminal/terminal-env-overrides.ts">to intercept arbitrary CLI commands</a>.</p> <p>Have any thoughts on this? Get in touch <a href="https://twitter.com/pimterry">on Twitter</a>, <a href="https://httptoolkit.com/contact/">by email</a>, or join the discussion on <a href="https://news.ycombinator.com/item?id=22044381">HN</a> &amp; <a href="https://www.reddit.com/r/programming/comments/eol0il/global_developer_cas_considered_harmful/">reddit</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/debugging-https-without-global-root-ca-certs/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/debugging-https-without-global-root-ca-certs/</guid>
            <pubDate>Tue, 14 Jan 2020 13:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Free as in Beer]]></title>
            <description><![CDATA[<p>HTTP Toolkit is a bootstrapped commercial open-source project. It takes work from me, and it needs to make money to survive, but it's also directly powered by the feedback, testing &amp; contributions of its users.</p> <p>As you might imagine, this is a complicated balancing act. I want to talk a little about how and why this works now, and the next step in this direction.</p> <h2 id="philosophy101">Philosophy 101</h2> <p>To make this work, I've taken a few philosophical positions on how HTTP Toolkit should operate. First on the business model, and second on the licensing.</p> <h3 id="freemium">Freemium</h3> <p>HTTP Toolkit aims to be profitable by charging for features that power &amp; enterprise users need, not by charging everybody. It doesn't hook you then cut you off, it doesn't nag you into submission, and it doesn't integrate itself into your life to force you to upgrade.</p> <p>Instead I separate features by use cases &amp; audience. The aim is to let free users get on with things unhindered, but ensure advanced users reward the project for the value they get from it.</p> <p>This is a practical stance, rather than a purely charitable one. As a small development team, if your goal is to build great &amp; popular software, you need a lot of marketing, and a <em>lot</em> of user feedback. Free &amp; widely used software provides more feedback &amp; testers, especially early on. As long as the project is sustainable enough that somebody can actually work on that feedback, this builds better software for everybody. <em>Good</em> free software then lends itself to word of mouth growth, some of the most effective long-term marketing possible for a small team.</p> <p>Altogether, this is an automated virtuous cycle. More feedback makes better software makes more growth makes even more feedback.</p> <p>As a model, bootstrapped freemium lets you pair sustainable businesses with a deep understanding of &amp; focus on your users. 
If more developer tools followed this, rather than gambling on VC for funding or going 100% freeware for growth, we might even trade shiny &amp; transient for reliable &amp; effective. Better tools for everybody.</p> <p>That's not to say I've invented this by any means. I have a lot of respect for the others out there taking a similar approach, from <a href="https://ghost.org/">Ghost</a> to <a href="https://insomnia.rest">Insomnia</a> to <a href="https://basecamp.com/">Basecamp</a>, and many many others. It's a powerful model, especially when paired with open source.</p> <h3 id="opensource">Open Source</h3> <p>On top of all that, HTTP Toolkit's audience is very technical, which makes open-source uniquely valuable. For many of us technical people software is a changing and collaborative work, not a delivered appliance. We want to be able to fix issues we find, and change the tools we use to fit into our workflow.</p> <p>Customization &amp; plugins can help with this, but with limitations. Complete freedom to change your software removes those entirely.</p> <p>Feedback from users of open-source code is often more valuable too, so this pairs nicely with the freemium model above. Users can dig into underlying causes themselves, and even offer feedback in the form of failing test code, bug fixes or feature implementations. This completely closes the traditional feedback loop, collecting feedback, implementations &amp; feedback on the result all in one go.</p> <p>Unfortunately, commercializing software often involves adding other restrictions (in licensing and implementation), which limit this. HTTP Toolkit aims to avoid that, and be commercial software without restrictions. Of course the benefits are equally valid for the free version and the Pro version, so it's open-source all the way down.</p> <p>To avoid wholesale copycats, I've licensed HTTP Toolkit with a mixture of AGPL (for the top-level app code) and Apache 2 (for all standalone reusable libraries within). 
That mix means anybody can use the shared libraries (from the <a href="https://github.com/httptoolkit/mockttp">internal proxy</a> to the <a href="https://github.com/httptoolkit/react-reverse-portal">react infrastructure</a> to the <a href="https://github.com/httptoolkit/openapi-directory-js">indexed OpenAPI directory</a>), and anybody can read &amp; contribute to the main app. At the same time, any separate projects building directly on the core functionality of the tool must go AGPL too, and share all their code back in turn.</p> <h2 id="newdecaderesolutions">New Decade Resolutions</h2> <p>So far, this has been going pretty well! I want to take it further though, and for 2020 onwards I want to get the community (you!) more involved. It's time to more directly reward those users who contribute, and involve them in the project itself.</p> <p>What does that mean? It means that from now on <strong>HTTP Toolkit Pro is free for contributors to the open-source project</strong>.</p> <p>More specifically, I'm giving out free Pro subscriptions in return for any accepted contributions that help HTTP Toolkit to develop &amp; expand. 
That means things like:</p> <ul> <li>Writing articles or blog posts about HTTP Toolkit (guest authors for <a href="https://httptoolkit.com/blog/">this very blog</a> welcome too!)</li> <li>Contributing fixes, features or internal improvements to the core codebases, e.g.:</li> <li>The HTTP Toolkit <a href="https://github.com/httptoolkit/httptoolkit-ui">UI</a></li> <li>The HTTP Toolkit <a href="https://github.com/httptoolkit/httptoolkit-server">server</a></li> <li>The HTTP Toolkit <a href="https://github.com/httptoolkit/httptoolkit-desktop">desktop shell</a></li> <li>The HTTP Toolkit <a href="https://github.com/httptoolkit/httptoolkit-website">website &amp; docs</a></li> <li><a href="https://github.com/httptoolkit/mockttp">Mockttp</a> (the HTTP Toolkit proxy itself)</li> <li>Or any other repo in the <a href="https://github.com/httptoolkit/">HTTP Toolkit github organisation</a></li> <li>Reporting new bugs or security issues</li> <li>Suggesting new &amp; useful features</li> </ul> <p>This is not an exhaustive list, and it's intentionally not limited to code contributions. Documentation, bug reporting &amp; marketing are some of the most important contributors to any project's success. 
<strong>My goal is to reward <em>anything</em> that helps drive HTTP Toolkit development or bring it to new people.</strong> I'd love to involve anybody else who wants to contribute in any way, so if you're not sure, <a href="https://httptoolkit.com/contact/">get in touch</a>.</p> <p>Length of Pro subscription for different contributions will vary according to my whims, but as a guideline:</p> <ul> <li>Big code changes (a new feature or bug fix) get a year's free Pro subscription.</li> <li>Small code changes (small text tweaks &amp; typo fixes, useful dependency updates) get 1 month's free Pro subscription.</li> <li>Writing an article somewhere with a large audience (a popular blog or developer community) is a year's free Pro subscription.</li> <li>Writing an article somewhere with a small audience (a little-known blog) is a month's free Pro subscription.</li> <li>Reporting a substantial security issue is a year's free Pro subscription.</li> <li>Reporting bugs &amp; suggesting features varies between a month &amp; a year depending on the details.</li> </ul> <p>To get your subscription set up, just send me an email with the details, from the email account you'd like associated with your new subscription, and I'll make it happen.</p> <p>If you'd like ideas for contributions, take a look through <a href="https://github.com/httptoolkit/httptoolkit">the feedback repo</a> or the issues attached to the above repos, play around with <a href="https://httptoolkit.com/">HTTP Toolkit</a> for yourself to find rough edges or missing features, or feel free to <a href="https://httptoolkit.com/contact/">get in touch</a> for suggestions.</p> <p>If you look closely, all this is just an extension of the philosophy above! It's an expansion of the free tier - if you're a contributor to the project, HTTP Toolkit is now 100% FOSS for you - and a use of open source to feed that freemium virtuous cycle, and push the project further &amp; higher.</p> <p>Make sense? 
Have any thoughts on this? Get in touch <a href="https://twitter.com/pimterry">on Twitter</a> or <a href="https://httptoolkit.com/contact/">by email</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/free-as-in-beer/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/free-as-in-beer/</guid>
            <pubDate>Mon, 06 Jan 2020 18:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[How to Debug Anything]]></title>
            <description><![CDATA[<p>Debugging is an important skill for any developer. Arguably the most important skill, if you consider debugging in the general sense: exploring a system, explaining its behaviour, and working out how to change it.</p> <p>Nonetheless, most of us are bad at it. We don't methodically work through a process. Instead, we guess wildly, sprinkle logging at random and change things blindly, until the problem disappears.</p> <p>Fortunately, we can improve! I've been talking to a whole range of brilliant developers about their top debugging advice to put together a superguide to debugging, drawn from all their years of expertise, and my own software development experiences.</p> <p>So, without further ado: how do I debug?</p> <h2 id="narrowdownyourproblem">Narrow down your problem</h2> <p>In most debugging situations, you start with a mismatch between what you expect and reality.</p> <ul> <li>The user clicked a link to our website. I expected it to load for them, but it never did.</li> <li>Server A sent a request to server B. I expected it to receive a result within 100ms, but server B took 20 seconds.</li> <li>The customer entered their credit card details correctly. I expected them to be charged, but they never were.</li> </ul> <p>To be able to fix a problem, we need to understand it sufficiently that there's a clear &amp; acceptable fix available. Note that 'sufficient' is contextual: sometimes "server B is broken, we can just reset it" is fine, sometimes "server B has disk corruption caused by our pattern of IO usage" is just the beginning.</p> <p>Let's look at how to isolate the location of the problem first, and we'll explore how to explain it (and fix it) later.</p> <blockquote> <p>Narrow down the area where things can be going wrong: it's not working here, so is everything correct at x earlier point? Yes? How about y point between here and x? 
- <a href="https://twitter.com/elielicole">Nicole Williams</a></p> </blockquote> <p>Your first step is to focus your debugging down to a sufficiently small area that you can start thinking about fixes.</p> <p>In effect, you're running tests to separate the parts of your system that are working correctly from the parts that aren't, repeatedly. This is an incremental process, and I'd highly recommend taking careful notes as you go, to keep track.</p> <p>Sometimes the separating lines are clear. If you have a single function that gets the correct inputs but produces the wrong outputs, your next step is to examine the values at points within that function, and work out at which point they go wrong.</p> <p>Sometimes, it's less clear. Some tips:</p> <h3 id="ifthesystemisinabrokenstate">If the system is in a broken state</h3> <ul> <li>Which state exactly is 'broken'?<ul> <li>If your data store breaks due to inconsistent data, which of the inconsistent parts is incorrect?</li> <li>If your server runs but stops responding: do all endpoints stop responding, or just a few?</li></ul></li> <li>When was it last in a good state?</li> <li>Exactly what event changes it from a good to a bad state?</li> <li>Which part of its state breaks <em>first</em>?<ul> <li>For non-trivial state, it's useful to untangle the dependencies within, to understand how they affect one another.</li> <li>This is useful to work out whether variables A &amp; B were both set wrong, or variable A was set wrong and variable B was set correctly, but based on that wrong data.</li> <li>Applies to low-level state like variables but also high-level state like 'is the server responding'. 
If server A &amp; B sometimes both go down, do they both crash for the same reason, or does one crashing kill the other?</li></ul></li> </ul> <h3 id="ifthesystemismadeupofcommunicatingparts">If the system is made up of communicating parts</h3> <ul> <li>Which one makes the first mistake?</li> <li>For example, if your app fails to load data from the server:<ul> <li>Is it requesting the right data from the server?</li> <li>Is the server returning the right data?</li></ul></li> <li><a href="https://httptoolkit.com">HTTP Toolkit</a> is perfect for doing this, if you're using HTTP!</li> <li>If you can answer those questions, you immediately know whether the server or the app is at fault (assuming only one is broken…)</li> </ul> <h3 id="iftheissueisintermittent">If the issue is intermittent</h3> <ul> <li>Do anything you can to narrow it down to a specific &amp; consistently reproducible error.</li> <li>Are there any common factors between the times it appears?<ul> <li>The user's operating system/browser, the time of day, size of your data and system load are all good candidates.</li></ul></li> <li>Try to decide if it's caused by race conditions, or specific rare inputs.<ul> <li>If you can take a set of safe &amp; reliable operations, run a lot of them rapidly, and they then consistently fail, it's <em>probably</em> a race condition.</li> <li>If there's any common factors in when it appears that aren't just correlated with system load, it's <em>probably</em> something else.</li></ul></li> <li>Once you have a clear race condition, you can remove and shrink the parallel operations until you can find what's racing.</li> <li>Once you have specific inputs that cause a failure, you can reproduce the issue and investigate why.</li> </ul> <p>You don't just have to search the code for the cause!</p> <p>Sometimes you can search within your set of servers or users, to narrow down where the bug appears to a specific machine that's doing the wrong thing, which can provide a wealth of 
clues.</p> <p>Sometimes it can also be useful to narrow down the problem in time, to find the moment this behaviour changed (if you're confident it worked in the past) so you can see which parts of the system were changed at a similar time, and investigate those more closely.</p> <blockquote> <p>Isolate as many variables as you can and test each one by itself until you figure out the issue - <a href="https://stacycaprio.com/">Stacy Caprio</a> from <a href="http://acceleratedgrowthmarketing.com/">Accelerated Growth Marketing</a></p> <p>If something is null/missing, where was it meant to be coming from? Follow the flow back, don't just look at the place where the error is - <a href="https://twitter.com/elielicole">Nicole Williams</a></p> </blockquote> <p>In many of these cases, there's multiple dimensions: where the problem occurs, and what input causes it.</p> <p>Once you've found the point that things go wrong, it's these inputs that become important. You need to work through the same kind of process to narrow down which part of your input causes the problem.</p> <p>'Input' here is very general: it might be an HTTP request you receive, a database record that you're processing, a function parameter, the current time of day, or your network latency. It's any variable that affects your application.</p> <p>The process here is much the same, but for complex inputs this is one place where a good diff can be very valuable. Find a working input, find a bad input, and compare them. Somewhere in that set of differences is your problem! You can replace parts of the bad input with the good input (or vice versa) until it fails, to pin down the minimal bad data required to trigger the issue.</p> <p>For simple inputs it's easier, but there's still some comparison required. For example, if your UI throws an error when trying to display some prices: for which prices does that happen? 
Is it an issue with too large inputs, unexpected negative inputs, or certain inputs that cause calculation errors later on?</p> <blockquote> <p>Binary search all the things; if you can rule out half the possibilities in one step then do it - <a href="https://twitter.com/TomNomNom">Tom Hudson</a></p> </blockquote> <p>Once you have a range of possibilities for what could cause your problem, test in the middle of that range. Your intuition for where the problem lies is probably wrong. Given that, the middle is going to get you to the right answer faster and more reliably than anything else.</p> <p>This is solid advice for everything from debugging a single broken function (examine the state of your values in the middle) to an entire codebase (where it's well worth learning <a href="https://flaviocopes.com/git-bisect/">how to <code>git bisect</code></a>).</p> <h2 id="getvisibility">Get visibility</h2> <p>High-level processes are all very well, but sometimes you can't clearly see inside a broken part of your system, and you're unable to dig any deeper.</p> <p>Production servers are a classic example, along with issues that only hit certain customers' devices but never yours, or intermittent issues that you can't reliably reproduce. You need to get information on what the system is doing at each step, so that you can narrow down the problem.</p> <p>The first &amp; best option is to somehow reproduce the issue in an environment that you do have visibility into (e.g. locally). Collect whatever you can, and try to reproduce it. If you can do this everything is great! You win. This should definitely be your first port of call.</p> <p>Often you can't though, either because it's hard to reproduce in other environments, or because you can't see the details even in your environment of choice. 
Fortunately, there's a lot of tools to help you get that visibility:</p> <h3 id="loggingobservabilitytools">Logging &amp; observability tools</h3> <blockquote> <p>If I've got the offending function at hand, I have a good logger like Chrome DevTools, and my build doesn't take long, logging variable contents is my quick-and-dirty first step. - Aaron Yoshitake from <a href="https://pickakit.com/">Pick a Kit</a></p> <p>Error logging is the first thing I google when playing with a new programming language.<br> While comprehensive debugging and benchmarking toolkits exist for most platforms out there, simple error logging can do miracles both during local testing and in production environments.<br> Depending on the application, logs can be generated in a file, passed to a third party, or even stored in the database. Navigating through the user flow can be facilitated with a simple logging framework that every language out there supports out of the box. - <a href="https://mariopeshev.com/">Mario Peshev</a></p> </blockquote> <p>There are built-in logging tools for every language &amp; tool out there. They're easily accessible, simple &amp; flexible, and can quickly help you narrow down the issue enough to understand it, or at least get closer to reproducing it.</p> <p>Logging can be a blunt instrument though: difficult to do usefully at scale, or to get deep context into what's going on, and often not in place in the one spot where you need it.</p> <p>As a first step beyond these, it can be worth looking at automated logging tools. 
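</p>

<p>Even plain console logging goes much further with a little structure attached: give every log line a timestamp and some shared context, and scattered output becomes easy to filter and correlate afterwards. A minimal sketch in JavaScript (the names here are illustrative, not from any particular library):</p>

```javascript
// A tiny structured logger: every entry carries a timestamp plus some
// fixed context (e.g. which request it belongs to), so log lines can
// be correlated when you're debugging later.
function createLogger(context) {
  return (message, data = {}) => {
    const entry = {
      time: new Date().toISOString(),
      ...context,
      message,
      ...data
    };
    console.log(JSON.stringify(entry));
    return entry;
  };
}

const log = createLogger({ service: "checkout", requestId: "req-123" });
log("charging card", { amountCents: 500 });
```

<p>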
Error monitoring services like <a href="https://sentry.io">Sentry</a> for example will record errors automatically, along with the stacktrace &amp; context of the system, and the details of interesting events shortly before the error, from HTTP requests to console messages.</p> <p>Meanwhile, tools like <a href="https://logrocket.com">LogRocket</a> let you replay user sessions to see what they saw, and understand issues you can't reproduce accurately yourself. This is powerful, but recording user sessions can also come with privacy concerns.</p> <p>Finally, there's also more heavy-duty observability tools available, such as <a href="https://www.honeycomb.io/">Honeycomb</a> and <a href="https://newrelic.com/">New Relic</a>.</p> <p>These tools take substantially more setup, but can offer you far more data, and more power to explore it: from checking all SQL queries triggered by a given incoming HTTP request to exploring the exact distribution of latency between each of your servers on Tuesdays. They'll collect some data automatically, but also require you to log data points at relevant points in your application, so there's some effort involved. If you're running a large system though, and frequently debugging issues in production, then it's well worth the investment.</p> <p>For all kinds of tools like this, it's best if you've set them up beforehand! There's often still value in setting them up as you're investigating an issue though, so don't discount them if you haven't.</p> <h3 id="debuggers">Debuggers</h3> <p>Your language will have proper debugging tools, which allow you to walk through your system, line by line. 
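</p>

<p>In JavaScript, for instance, you can even improvise a conditional breakpoint in code: a guarded <code>debugger</code> statement only pauses (when dev tools are attached) on the exact case you're hunting. A sketch, with made-up function names:</p>

```javascript
// An improvised conditional breakpoint: the `debugger` statement is a
// no-op normally, but pauses execution when a debugger is attached -
// and the guard means you only stop on the suspicious input.
function applyExchangeRate(amountCents, rate) {
  if (!(rate > 0)) {
    debugger; // only pause for the bad case we're investigating
  }
  return Math.round(amountCents * rate);
}
```

<p>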
That's mostly relevant in local environments, but tricks like sending your customer a debug build of the app to reproduce the issue can help in other cases too.</p> <p>Either way, debuggers are most useful once you're reasonably sure <em>where</em> the problem is, and you want to examine it up close to work out the exact details.</p> <p>Being very familiar with the standard debugging tools for your environment is extremely valuable. Don't just learn the basics; many will go beyond just adding breakpoints and examining variables, and include more powerful features that you can use to more quickly &amp; effectively find your issue:</p> <ul> <li>Conditional breakpoints, which pause execution at a point in the code only when some condition is met.</li> <li>The ability to manipulate state or even the code itself while it's running, to reproduce issues &amp; test fixes.</li> <li><a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/time-travel-debugging-overview">Time travel</a>, allowing you to fully explore the execution flow of your process.</li> </ul> <p>Don't shy away from debugging into your platform itself too. Even if the bug is caused by your code, sometimes stepping through the built-in functions that you're using can show you how you're using them wrong.</p> <h3 id="interactioninspectors">Interaction inspectors</h3> <p>Often, issues will appear in the interactions between systems, and being able to see and interact directly with the communications powering these interactions can quickly solve them.</p> <p><a href="https://httptoolkit.com">HTTP Toolkit</a> fits in perfectly here. 
HTTP Toolkit makes it easy to intercept &amp; view HTTP or HTTPS traffic from clients, between backend servers, or sent to APIs, and to then edit that traffic live too, so you can test edge cases and narrow down which part of your inputs is causing you trouble.</p> <p>Alternatively, if you're working with a different protocol, or you need to inspect at the raw TCP level, <a href="https://www.wireshark.org/">Wireshark</a> can be a godsend. Wireshark lets you capture &amp; view raw packet data, and provides tools to interpret &amp; filter packets with an understanding of a variety of protocols, although that can mean it has a steep learning curve.</p> <p>Interactions often happen between networked systems of course, but there are other interactions you can inspect too. <a href="https://strace.io/">Strace</a> or <a href="http://www.brendangregg.com/DTrace/dtruss">DTruss</a> allow you to inspect &amp; modify the interactions between a process and the kernel, for example. They trace system calls, including each individual file &amp; socket operation, and many others. This can help you to understand low-level OS issues, to see exactly which files or sockets a program is trying to use, or to explore very complex performance or deadlock problems.</p> <h3 id="interactivecodeexploration">Interactive code exploration</h3> <p>For debugging knotty algorithmic code, exploration of the data and its processing can be very effective.</p> <p>Tools exist to let you do this interactively, turning code into something conceptually more like a spreadsheet: where you can see each of the intermediate values all at once, and change one value or calculation to see how it affects everything else.</p> <p><a href="https://quokkajs.com/">Quokka.js</a> does this for JavaScript &amp; TypeScript, as a plugin to a variety of different editors. 
<a href="http://lighttable.com/">Light Table</a> meanwhile is a fully-fledged IDE built for exactly this workflow, originally designed for Clojure, but now with plugins available for other languages too.</p> <h2 id="explaintheproblem">Explain the problem</h2> <p>Hopefully at this point, after using your visibility into your system to incrementally narrow down your problem, you have a good idea where &amp; how things are going wrong.</p> <p>The next step is to work out why.</p> <p>In many cases, once you narrow down the specific part of your system or state that's incorrect, the mistake will be obvious. You're adding instead of multiplying a value for example, or you've forgotten to validate clearly bad input data before using it. In other cases it's not though, and explaining the issue so you can fix it can be a big challenge in itself.</p> <h3 id="checkyourassumptions">Check your assumptions</h3> <blockquote> <p>Validate all assumptions. Does that function <em>really</em> return what you think it does? Read documentation <em>carefully</em>. Check spelling, casing, punctuation. Actually read the error message instead of glancing at it. - <a href="https://twitter.com/TomNomNom">Tom Hudson</a></p> </blockquote> <p>As humans, we make assumptions and build abstractions around how things work, to avoid constantly thinking about every possible detail.</p> <p>Sometimes these are wrong.</p> <p>It's very easy for this to end up causing major bugs that are hard to unpick, even for the simplest assumptions. 
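</p>

<p>One cheap way to flush out a wrong assumption is to state it explicitly in code, so the program fails fast at the first broken expectation rather than at a distant symptom. A sketch (the helper and the specific checks here are purely illustrative):</p>

```javascript
// Turn implicit assumptions into explicit checks: a failure here points
// at the real cause, instead of surfacing later as a confusing symptom.
function assertAssumption(condition, description) {
  if (!condition) {
    throw new Error(`Broken assumption: ${description}`);
  }
}

function chargeCustomer(order) {
  assertAssumption(order != null, "order exists");
  assertAssumption(Number.isFinite(order.totalCents), "total is a finite number");
  assertAssumption(order.totalCents > 0, "total is positive");
  return { charged: order.totalCents };
}
```

<p>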
If a problem seems inexplicable, as if the computer is just doing the 'wrong' thing, you've almost certainly run into this, and you're making the wrong assumption somewhere.</p> <ul> <li>Check the right function is being called, or the right server is being talked to.</li> <li>Check you're running the version of the code you think you are.</li> <li>Check that a value you checked elsewhere hasn't been mutated later.</li> <li>Check you actually return your values, and properly wait on asynchronous events.</li> <li>Check the error in the logs that you're trying to explain is the <em>first</em> unexpected error, not just a symptom of a previous issue.</li> </ul> <h3 id="searchforanswers">Search for answers</h3> <p>Searching the internet for explanations of confusing behaviour is a time-worn debugger tradition.</p> <p>Often you'll struggle though, and it's not as easy as it sounds for complex issues. If just searching from your description of the problem doesn't work, there's a few things you can try:</p> <ul> <li>Search for potential answers to your problem, not just the question. Rather than "fetch request to X fails", try "X doesn't support gzipped requests" or "fetch can't send JSON".</li> <li>Search for snippets of any error messages you can see, or any other related logging, even if it's not the problem itself.</li> <li>Search StackOverflow directly, filtering questions by tags to hone your results.</li> <li>Search the issue tracker for the tools involved, to find bug reports related to your issue.</li> <li>Search for examples of working projects that might include similar code, compare your approach to theirs, and look very closely at the places where they differ.</li> </ul> <h3 id="checkontheusualsuspects">Check on the usual suspects</h3> <p><a href="https://twitter.com/TomNomNom">Tom Hudson</a> has a good list of common things to watch out for:</p> <blockquote> <p>Common reasons for weird behaviour:</p> <ul> <li>No disk space (or no free inodes!)</li> <li>Network issues (especially DNS)</li> <li>System time set wrong (that one's caused some really weird issues for me)</li> <li>Antivirus interfering</li> <li>Filename casing (e.g. case-sensitive on Linux, but not Mac or Windows)</li> </ul> </blockquote> <p>Any of these can cause strange errors elsewhere that are seemingly unrelated, and extremely hard to trace down!</p> <p>It's useful to collect your own list of these kinds of issues. Some common problems will be very general, like these, but there'll also be common culprits unique to your own platform or system. Keep a list somewhere, note down the cause of each problem you have to debug, and you'll quickly build a library of things to watch out for in future.</p> <h3 id="talkaboutit">Talk about it</h3> <blockquote> <p>Explain the issue to someone else. - <a href="https://twitter.com/_kifki/">Veronika Milic</a></p> </blockquote> <p>Still stuck? If all else fails, sometimes <a href="https://en.wikipedia.org/wiki/Rubber_duck_debugging">rubber duck debugging</a> is the best solution. Talk to a colleague, ask a question on Stack Overflow, or post on Twitter. Asking for help is important, and there's a surprising number of people who'd love to explore your problem.</p> <p>Try to explain everything you understand about what's currently happening, and what the inexplicable part is. Half the time you'll end up solving it yourself along the way, and the other half of the time at least you have somebody else who'll try to help!</p> <h2 id="fixit">Fix it</h2> <p>Hopefully, you can now explain which part of your system is broken, and why that's happened. The last step is up to you I'm afraid: fix it. 
Fortunately, if you understand where the code is broken and why it's wrong, that's normally a fairly clear process (though not necessarily a quick one).</p> <p>Once you do fix your issue though, do yourself a favour and remember to:</p> <ul> <li>Thoroughly retest the fix after writing it, rather than assuming it works based on your understanding of the problem. Failing to do this is very painful, wastes a load of your time, and yet is remarkably common.</li> <li>Write some notes on how you debugged the issue, and your best understanding of the underlying problem &amp; how it happened. At the very least this will help you debug similar issues in future, and in some important cases this highlights that your fix doesn't actually make any sense given your explanation, so one of the two is wrong.</li> </ul> <p>Good luck!</p> <p><em>Still stuck? Have questions or comments on this article? Have any great debugging tips of your own? Get in touch <a href="https://twitter.com">on Twitter</a>.</em></p>]]></description>
            <link>https://httptoolkit.com/blog/how-to-debug-anything/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/how-to-debug-anything/</guid>
            <pubDate>Mon, 02 Dec 2019 15:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[A Crash Course in Price Localization]]></title>
            <description><![CDATA[<p>Naming a price for your latest magnum opus is one of the hardest decisions in releasing a product. In theory you price on the value you're providing, but wow that's hard to measure. Instead it's better (and often easier) to experiment: try a price, see how it converts and what that means for the bottom line, try another, compare, repeat.</p> <p><strong>If you're selling software on the internet (like me) then your pricing experiments are probably missing a key variable: the customer's local market.</strong></p> <p>In this post, I'm going to show you why that's important for your bottom line, explain how to think about this kind of pricing, and give you the tools I'm using to fix this for HTTP Toolkit (my product), starting next week.</p> <h2 id="whylocalizedpricingmatters">Why Localized Pricing Matters</h2> <p>Right now <a href="https://httptoolkit.com">HTTP Toolkit</a> costs €5/month, and I'm selling to software engineers. In the US (average software engineer annual salary <a href="https://www.payscale.com/research/US/Job=Software_Engineer/Salary">$84,000</a> = €76,000 = €200/day) that's extremely cheap. In India however (average software engineer salary <a href="https://www.payscale.com/research/IN/Job=Software_Engineer/Salary">₹502,000</a> = €6,400 = €17/day) it's completely unaffordable.</p> <p>This is a real problem. These are two very different markets, but they're both full of my target audience (professional software developers). They're also actually very similar-sized markets: India is predicted to have more developers than the entire US <a href="https://www.financialexpress.com/industry/technology/india-to-have-worlds-largest-app-developers-base-by-2024/1704821/">by 2024</a>. 
Although it's likely the US will be more profitable regardless, it is a crowded &amp; expensive market to enter, and the majority of developers in the world don't live there (notably this is true for <em>every</em> country).</p> <p>If you're building a paid software product that isn't fundamentally tied to a specific location, you have this same dilemma. You can price high and ignore big parts of the world, or price lower and undervalue your product in the richest parts of the world. Regardless, <strong>if you pick one price worldwide, you'll pick the wrong price for most of your potential customers</strong>.</p> <p>One classic example of this: Netflix vs Amazon in India. Netflix were charging $7.50 per month, similar to their US pricing, and limiting themselves to a small audience. Meanwhile Amazon charged $15 for an entire year and <a href="https://www.wired.com/2017/01/how-netflix-lost-big-to-amazon-in-india/#.ea7yv24cw">won big</a>. More recently, Netflix have now started localizing too, with a new <a href="https://thenextweb.com/apps/2019/07/24/netflix-mobile-plan-india/">$3/month mobile-only subscription</a>.</p> <p>In addition to the financial impact of all this, it also artificially limits the reach and value of your software. For me, developers in India (and much of the rest of the world) will get just as much use out of my product, and it's not fair or rational to price them out of it.</p> <p>Finally, even for users where your fixed price is 'right', having sensible local prices improves your checkout experience. Prices shown in the fixed currency (€5) are harder to interpret for customers that don't use that currency day to day, and create currency conversion distractions. 
Meanwhile if you simply show the directly converted price in the local currency ($5.54) it looks ugly, which <a href="https://www.sciencedirect.com/science/article/abs/pii/S0010880401900084">measurably affects</a> how customers perceive the quality of your product.</p> <p>This is not good, but we can fix it.</p> <h2 id="pickingthecountriesyoucareabout">Picking the countries you care about</h2> <p>The first step is working out which countries to localize for.</p> <p>There's a diminishing return here, which means you probably don't want to localize for every country on earth. Instead, you want to pick the set of countries most likely to buy from you. The local economy and overall size of your target audience in the country do have some bearing too, so there's a little art to this. Some important metrics to dig into:</p> <ul> <li>Which countries visit your site most?</li> <li>Which countries buy your product most?</li> <li>(For freemium/trial) Which countries most try for free, but don't upgrade?</li> <li>Which countries are most likely to <em>start</em> buying your product, but not finish?</li> </ul> <p><img src="https://httptoolkit.com/images/posts/analytics-map.png" alt="A map of countries of the world by traffic"></p> <p>These should give you a good set to start with. The users currently converting in earnest are clearly interested and getting a good (for them) price, which should perhaps be higher. Meanwhile the countries where users consistently visit your site and show interest, but don't convert from free to paid, are the audience you're missing out on.</p> <p>These are the users for whom you most want to customize your pricing. Pick a small set of countries depending on how much time you have. The most interesting 80% of your customers is probably a good starting point. 
There's a lot of countries in the world, but there should be a few huge ones that stick out.</p> <p>Personally, I've picked 13 <em>currencies</em> (mostly an implementation feature/limitation - more on that later): the Euro, US Dollar, Brazilian Real, Canadian Dollar, Chinese Yuan, British Pound, Indian Rupee, Swedish Krona, Czech Koruna, Russian Ruble, Hong Kong Dollar, Swiss Franc, and Danish Krone.</p> <h2 id="benchmarkingyourprices">Benchmarking your prices</h2> <p>Once you have a list of countries, the next step is to decide a price for each. To do so, we're going to relate them to a market metric.</p> <p>When talking about pricing here, I'm assuming you're not doing this 100% from scratch! You need some initial signal from a real set of customers on what general level of pricing is appropriate in at least one market, before you can focus on per-location pricing. If not, talk to your customers more, confidently pick a number out of a hat and try it, or copy your competitors, to work out how the market reacts to some prices and get an initial starting point.</p> <p>If you do have that, great! Pick your biggest customer country where it seems like your price is about right: not so high that nobody buys, and not so low that nobody complains. This is going to be our benchmark. That price works in this market, and we're going to extrapolate from here.</p> <p><strong>The key hypothesis here is that the value your customers get from your product is related to another known market metric.</strong></p> <p>For example, my product is usually a business cost. It saves developers time &amp; effort. It's thus related to the value of software developer work, and so should be related to the software developer salaries in a region. If your product is similarly tied to a job, you're probably in the same place. 
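</p>

<p>The arithmetic behind this benchmarking is simple enough to sketch directly: hold the price-to-metric ratio constant across markets, and derive each local price from it. All the figures below are invented for illustration:</p>

```javascript
// Derive a local price that keeps the price-to-metric ratio from your
// benchmark market constant. Salary figures here are illustrative only.
function localizedPrice(benchmarkPrice, benchmarkMetric, localMetric) {
  const targetRatio = benchmarkPrice / benchmarkMetric;
  return targetRatio * localMetric;
}

// E.g. a €5/month price benchmarked against a €200/day salary is a 2.5%
// ratio; applied to a €17/day salary, that suggests roughly €0.43/month.
const localPrice = localizedPrice(5, 200, 17);
```

<p>In practice you'd then round each result to an attractive local price point by hand (and sanity-check it against the local market), rather than charging the raw ratio directly.</p>

<p>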
For other products your pricing might be sensitive to the cost of food, or transport, or tourist accommodation, or disposable income.</p> <p>Regardless of the metric, we can use this to directly derive equivalent prices. In my case, my existing prices suggest that if you're willing to pay X/day for a software developer, then you're willing to pay about 6% of X/month for my product. I can take that, look up the average developer salary in a country, and immediately get an appropriate local price.</p> <h2 id="runningthenumbers">Running the numbers</h2> <p>You've now got a set of countries, a base price, and a local pricing metric, and it's time to break out the spreadsheets.</p> <p>I'll walk you through the process below, but if you just want to see the result, then <strong><a href="https://docs.google.com/spreadsheets/d/1ujH8cCb09YNXL1EtIrLEp04BYR-I_LSQqDNxJeiJAt4/edit?usp=sharing">take a look at my full pricing calculations for HTTP Toolkit</a></strong>.</p> <p>Here's the general idea:</p> <ol> <li>Set up your country data: per country, the currency conversion rate &amp; the value of the metric you care about (in a common currency, if it's monetary).</li> <li>Create another sheet to track your current prices.</li> <li>For each product you have, for each country, add your current product price there, and add another column using the conversion rates to convert these prices to one common currency.</li> <li>For each row there, calculate the ratio between your price and your metric.</li> <li>For your base country, this ratio is your target ratio. Every other country that isn't close to that is currently mis-priced. 
In my case for example, I'm charging 5.8% of developer daily salaries in the eurozone, 3% in the USA, and 35% in India - pretty clearly not optimal.</li> <li>Add any other metrics you need to be aware of, e.g: overheads/margin, relationships to any other potential market metrics, comparative discount between multiple products, how annual prices will appear per-month, etc.</li> <li>That's your current pricing. Duplicate this sheet to start changing it.</li> <li>Add one more column: the ratio of the soon-to-be new prices here vs the current price in the previous sheet.</li> <li>Edit the prices for each row here, optimising the metrics shown, to make your market metric ratios consistent, and preferably get nice round numbers too.</li> </ol> <p>If that doesn't make sense, or you're not a big spreadsheet fan, do take a look at <a href="https://docs.google.com/spreadsheets/d/1ujH8cCb09YNXL1EtIrLEp04BYR-I_LSQqDNxJeiJAt4/edit?usp=sharing">the HTTP Toolkit sheet</a>, which should help with a lot of this.</p> <p>Do bear in mind that while these metrics are a useful indicator, they don't tell the whole story. The levels of competition in each market, relative preferences for your different products, and many other factors are all relevant to your pricing - but hopefully the metrics here should provide a good baseline.</p> <p>If you're confident, you can try to extend this to model more factors and explore the data. You can add in your real sales data, for example, to see how it might affect revenue, or put the prices of multiple products together so you can work out what your pricing page will look like for users from a given country.</p> <h2 id="implementinglocalizedpricing">Implementing localized pricing</h2> <p>Ok, we have prices. Let's talk mechanics. Now you know what you want to charge, how do you actually do that for customers in different countries? 
There's two tricks here:</p> <ul> <li>Credit cards are registered to an address in a specific country, and this is non-trivial to forge, so this is fairly easy to enforce at payment time.</li> <li>Most of the time, IP geolocation is accurate enough that you can also accurately predict a customer's country beforehand, so showing the correct pricing up front is doable too.</li> </ul> <p>That means it's generally quite practical to charge for the right country. This likely isn't foolproof, but it's good enough in practice. The details of actually implementing this depend on your payment setup.</p> <p>For my case I'm using <a href="https://paddle.com">Paddle</a>, who have a built-in UI for this:</p> <p><img src="paddle-country-pricing.png" alt="Paddle's per-currency pricing UI"></p> <p>Note that this only works per <em>currency</em>, not per country. That's not perfect (the euro particularly covers a lot of different countries and markets), but it's a pretty good approximation.</p> <p>Paddle also have an API to <a href="https://developer.paddle.com/api-reference/checkout-api/prices/getprices">query prices</a>, automatically appropriately localized, and they handle the checkout country bit too, so if you're using them it's really just a matter of setting your prices in their dashboard, reading your prices from the API instead of hardcoding them, and you're good to go.</p> <p>If you're not using Paddle, you're probably using <a href="https://stripe.com">Stripe</a>. 
I'm not, so I can't comment in detail (<a href="https://twitter.com/pimterry">though I'd love pointers if you have them</a>), but subscription plans are all tied to a specific currency, so you'd likely want to make a plan per currency, priced appropriately, then handle country detection yourself and set up the subscription against the corresponding plan.</p> <h2 id="pricingupdatesforhttptoolkit">Pricing updates for HTTP Toolkit</h2> <p>As an end result then, in my case, what does all this mean for HTTP Toolkit?</p> <p>First, I'm not changing any prices for existing customers, and I have no plans for price hikes any time in the foreseeable future.</p> <p>However, one week from now, for all new customers, I am going to rebalance a lot of prices. They're not exclusively going up though - for some countries, this is going to be a dramatic discount. There will be price increases for some of the more well-off countries (Canada, USA, Sweden, Denmark, Hong Kong &amp; Switzerland), but price cuts for others (Brazil, China, India, Czechia &amp; Russia).</p> <p>Those are all the clear-cut cases for now, so everybody else is staying the same. If you think I do need to customize pricing for your country though, <a href="https://httptoolkit.com/contact/">let me know</a>, and I'll take a look.</p> <h2 id="conclusion">Conclusion</h2> <p>It's easy to sell software to the world nowadays, but it's also easy to do it badly.</p> <p><strong>If you sell to everybody worldwide at one fixed price, then you're pricing your product wrong for the majority of your potential customers.</strong></p> <p>With a couple of hours and a little spreadsheet-fu you can fix this, and boost revenues while giving more customers more value.</p> <p>Don't forget that this isn't a one-off process though! Look at this as just one more experiment, and go back to the data again to tweak it further once you've tried it out.</p> <p>Have any questions? 
<a href="https://twitter.com/pimterry">Let me know</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/international-product-pricing/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/international-product-pricing/</guid>
            <pubDate>Mon, 28 Oct 2019 14:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[The 5 Big Features of TypeScript 3.7 and How to Use Them]]></title>
            <description><![CDATA[<p>The TypeScript 3.7 release is coming soon, and it's going to be a big one.</p> <p>The target release date is November 5th, and there's some seriously exciting headline features included:</p> <ul> <li>Assert signatures</li> <li>Recursive type aliases</li> <li>Top-level await</li> <li>Null coalescing</li> <li>Optional chaining</li> </ul> <p>Personally, I'm super excited about this, they're going to whisk away all sorts of annoyances that I've been fighting in TypeScript whilst building <a href="https://httptoolkit.com">HTTP Toolkit</a>.</p> <p>If you haven't been paying close attention to the TypeScript development process though, it's probably not clear what half of these mean, or why you should care. Let's talk them through.</p> <h2 id="assertsignatures">Assert Signatures</h2> <p>This is a brand-new &amp; little-known TypeScript feature, which allows you to write functions that act like <a href="https://www.typescriptlang.org/docs/handbook/advanced-types.html#user-defined-type-guards">type guards</a> as a side-effect, rather than explicitly returning their boolean result.</p> <p>It's easiest to demonstrate this with a JavaScript example:</p> <pre><code class="javascript language-javascript">// In JS:

function assertString(input) {
    if (typeof input === 'string') return;
    else throw new Error('Input must be a string!');
}

function doSomething(input) {
    assertString(input);

    // ... Use input, confident that it's a string
}

doSomething('abc'); // All good
doSomething(123); // Throws an error
</code></pre> <p>This pattern is neat and useful, but you can't use it in TypeScript today.</p> <p>TypeScript can't know that you've guaranteed the type of <code>input</code> after it's run <code>assertString</code>. Typically people just make the argument <code>input: string</code> to avoid this, and that's good, but that also just pushes the type checking problem somewhere else, and in cases where you just want to fail hard it's useful to have this option available.</p> <p>Fortunately, soon we will be able to:</p> <pre><code class="typescript language-typescript">// With TS 3.7

function assertString(input: any): asserts input is string { // &lt;-- the magic
    if (typeof input === 'string') return;
    else throw new Error('Input must be a string!');
}

function doSomething(input: string | number) {
    assertString(input);

    // input's type is just 'string' here
}
</code></pre> <p>Here <code>asserts input is string</code> means that if this function ever returns, TypeScript can narrow the type of <code>input</code> to <code>string</code>, just as if it was inside an if block with a type guard.</p> <p>To make this safe, that means if the assert statement isn't true then your assert function must either throw an error or not return at all (kill the process, infinite loop, you name it).</p> <p>That's the basics, but this actually lets you pull some really neat tricks:</p> <pre><code class="typescript language-typescript">// With TS 3.7

// Asserts that input is truthy, throwing immediately if not:
function assert(input: any): asserts input { // &lt;-- not a typo
    if (!input) throw new Error('Not a truthy value');
}

declare const x: number | string | undefined;
assert(x); // Narrows x to number | string

// Also usable with type guarding expressions!
assert(typeof x === 'string'); // Narrows x to string

// -- Or use assert in your tests: --
const a: Result | Error = doSomethingTestable();

expect(a).is.instanceOf(Result); // 'instanceOf' could 'asserts a is Result'
expect(a.resultValue).to.equal(123); // a.resultValue is now legal

// -- Use as a safer ! that throws immediately if you're wrong --
function assertDefined&lt;T&gt;(obj: T): asserts obj is NonNullable&lt;T&gt; {
    if (obj === undefined || obj === null) {
        throw new Error('Must not be a nullable value');
    }
}
declare const maybeStr: string | undefined;

// Gives y just 'string' as a type, but could still throw elsewhere later:
const y = maybeStr!;

// Gives z 'string' as a type, or throws immediately if you're wrong:
assertDefined(maybeStr);
const z = maybeStr;

// -- Or even update types to track a function's side-effects --
type X&lt;T extends string | {}&gt; = { value: T };

// Use asserts to narrow types according to side effects:
function setX&lt;T extends string | {}&gt;(x: X&lt;any&gt;, v: T): asserts x is X&lt;T&gt; {
    x.value = v;
}

declare let box: X&lt;any&gt;; // box is { value: any };

setX(box, 123);
// box is now { value: number };
</code></pre> <p>This is still in flux, so don't take it as the definite result, and keep an eye on the <a href="https://github.com/microsoft/TypeScript/pull/32695">pull request</a> if you want the final details.</p> <p>There's even <a href="https://github.com/microsoft/TypeScript/pull/32695#issuecomment-523733928">discussion</a> there about allowing functions to assert something <em>and</em> return a type, which would let you extend the final example above to track a much wider variety of side effects, but we'll have to wait and see how that plays out.</p> <h2 id="toplevelawait">Top-level Await</h2> <p><a href="https://basarat.gitbooks.io/typescript/docs/async-await.html">Async/await</a> is amazing, and makes promises dramatically cleaner to use.</p> <p>Unfortunately though, you can't use them at the top level. This might not be something you care about much in a TS library or application, but if you're writing a runnable script or using TypeScript in a <a href="https://www.npmjs.com/package/ts-node">REPL</a> then this gets super annoying. It's even worse if you're used to frontend development, since top-level <code>await</code> has been working nicely in the Chrome and Firefox console for a couple of years now.</p> <p>Fortunately though, a fix is coming. This is actually a general stage-3 JS <a href="https://github.com/tc39/proposal-top-level-await">proposal</a>, so it'll be everywhere else eventually too, but for TS devs 3.7 is where the magic happens.</p> <p>This one's simple, but let's have another quick demo anyway:</p> <pre><code class="javascript language-javascript">// Today:

// Your only solution right now for a script that does something async:
async function doEverything() {
    ...
    const response = await fetch('http://example.com');
    ...
}
doEverything(); // &lt;- eugh (could use an IIFE instead, but even more eugh)
</code></pre> <p>With top-level await:</p> <pre><code class="typescript language-typescript">// With TS 3.7:

// Your script:
...
const response = await fetch('http://example.com');
...
</code></pre> <p>There's a notable gotcha here: if you're <em>not</em> writing a script, or using a REPL, don't write this at the top level, unless you <em>really</em> know what you're doing!</p> <p>It's totally possible to use this to write modules that do blocking async steps when imported. That can be useful for some niche cases, but people tend to assume that their <code>import</code> statement is a synchronous, reliable &amp; fairly quick operation, and you could easily hose your codebase's startup time if you start blocking imports for complex async processes (even worse, processes that can fail).</p> <p>This is somewhat mitigated by the semantics of imports of async modules: they're imported and run in <em>parallel</em>, so the importing module effectively waits for <code>Promise.all(importedModules)</code> before being executed. Rich Harris wrote <a href="https://gist.github.com/Rich-Harris/0b6f317657f5167663b493c722647221">an excellent piece</a> on a previous version of this spec (before that change, when imports ran sequentially and this problem was much worse), which makes for good background reading on the risks here if you're interested.</p> <p>It's also worth noting that this is only useful for module systems that support asynchronous imports. There isn't yet a formal spec for how TS will handle this, but it likely means a very recent <code>target</code> configuration, and either ES Modules or Webpack v5 (whose alphas have <a href="https://github.com/webpack/webpack/pull/9177">experimental support</a>) at runtime.</p> <h2 id="recursivetypealiases">Recursive Type Aliases</h2> <p>If you've ever tried to define a recursive type in TypeScript, you may have run into StackOverflow questions like this: https://stackoverflow.com/questions/47842266/recursive-types-in-typescript.</p> <p>Right now, you can't. Interfaces can be recursive, but there are limitations to their expressiveness, and type aliases can't. 
That means right now, you need to combine the two: define a type alias, and extract the recursive parts of the type into interfaces. It works, but it's messy, and we can do better.</p> <p>As a concrete example, this is the <a href="https://github.com/microsoft/TypeScript/issues/3496#issuecomment-128553540">suggested</a> type definition for JSON data:</p> <pre><code class="typescript language-typescript">// Today:

type JSONValue =
    | string
    | number
    | boolean
    | JSONObject
    | JSONArray;

interface JSONObject {
    [x: string]: JSONValue;
}

interface JSONArray extends Array&lt;JSONValue&gt; { }
</code></pre> <p>That works, but the extra interfaces are only there because they're required to get around the recursion limitation.</p> <p>Fixing this requires no new syntax, it just removes that restriction, so the below compiles:</p> <pre><code class="typescript language-typescript">// With TS 3.7:

type JSONValue =
    | string
    | number
    | boolean
    | { [x: string]: JSONValue }
    | Array&lt;JSONValue&gt;;
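
// e.g. a value like this (a hypothetical example) now typechecks directly,
// with no helper interfaces required:
// const data: JSONValue = { name: 'test', values: [1, [2, 3]] };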
</code></pre> <p>Right now that fails to compile with <code>Type alias 'JSONValue' circularly references itself</code>. Soon though, soon…</p> <h2 id="nullcoalescing">Null Coalescing</h2> <p>Aside from being difficult to spell, this one is quite simple &amp; easy. It's based on a JavaScript stage-3 <a href="https://github.com/tc39/proposal-nullish-coalescing">proposal</a>, which means it'll also be coming to your favourite vanilla JavaScript environment soon too, if it hasn't already.</p> <p>In JavaScript, there's a common pattern for handling default values, and falling back to the first valid result of a defined group. It looks something like this:</p> <pre><code class="javascript language-javascript">// Today:

// Use the first of firstResult/secondResult which is truthy:
const result = firstResult || secondResult;

// Use configValue from provided options if truthy, or 'default' if not:
this.configValue = options.configValue || 'default';
</code></pre> <p>This is useful in a host of cases, but due to some interesting quirks in JavaScript, it can catch you out. If <code>firstResult</code> or <code>options.configValue</code> can meaningfully be set to <code>false</code>, an empty string or <code>0</code>, then this code has a bug. If those values are set, then when considered as booleans they're <a href="https://developer.mozilla.org/en-US/docs/Glossary/Falsy">falsy</a>, so the fallback value (<code>secondResult</code> / <code>'default'</code>) is used anyway.</p> <p>Null coalescing fixes this. Instead of the above, you'll be able to write:</p> <pre><code class="typescript language-typescript">// With TS 3.7:

// Use the first of firstResult/secondResult which is *defined*:
const result = firstResult ?? secondResult;

// Use configSetting from provided options if *defined*, or 'default' if not:
this.configValue = options.configValue ?? 'default';
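
// For example, given a falsy-but-meaningful value like 0:
const count = 0;
const withOr = count || 10;       // 10 - 0 is falsy, so || falls back
const withNullish = count ?? 10;  // 0 - still defined, so ?? keeps it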
</code></pre> <p><code>??</code> differs from <code>||</code> in that it falls through to the next value only if the first argument is null or undefined, not falsy. That fixes our bug. If you pass <code>false</code> as <code>firstResult</code>, that will be used instead of <code>secondResult</code>, because while it's falsy it is still defined, and that's all that's required.</p> <p>Simple, but super useful, and takes away a whole class of bugs.</p> <h2 id="optionalchaining">Optional Chaining</h2> <p>Last but not least, optional chaining is another stage-3 <a href="https://github.com/tc39/proposal-optional-chaining">proposal</a> which is making its way into TypeScript.</p> <p>This is designed to solve an issue faced by developers in every language: how do you get data out of a data structure when some or all of it might not be present?</p> <p>Right now, you might do something like this:</p> <pre><code class="javascript language-javascript">// Today:

// To get data.key1.key2, if any level could be null/undefined:
let result = data ? (data.key1 ? data.key1.key2 : undefined) : undefined;

// Another equivalent alternative:
result = ((data || {}).key1 || {}).key2;
</code></pre> <p>Nasty! This gets much much worse if you need to go deeper, and although the 2nd example works at runtime, it won't even compile in TypeScript since the first step could be <code>{}</code>, in which case <code>key1</code> isn't a valid key at all.</p> <p>This gets still more complicated if you're trying to get into an array, or there's a function call somewhere in this process.</p> <p>There's a host of other approaches to this, but they're all noisy, messy &amp; error-prone. With optional chaining, you can do this:</p> <pre><code class="typescript language-typescript">// With TS 3.7:

// Returns the value if it's all defined &amp; non-null, or undefined if not.
let result = data?.key1?.key2;

// The same, through an array index or property, if possible:
array?.[0]?.['key'];

// Call a method, but only if it's defined:
obj.method?.();

// Get a property, or return 'default' if any step is not defined:
let resultOrDefault = data?.key1?.key2 ?? 'default';
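
// With some hypothetical partially-missing data, for illustration:
const partial: { key1?: { key2?: string } } = { key1: {} };
const found = partial?.key1?.key2 ?? 'default'; // 'default' - key2 is missing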
</code></pre> <p>The last case shows how neatly some of these dovetail together: null coalescing + optional chaining is a match made in heaven.</p> <p>One gotcha: this will return undefined for missing values, even if they were null, e.g. in cases like <code>(null)?.key</code> (returns undefined). A small point, but one to watch out for if you have a lot of <code>null</code> in your data structures.</p> <p>That's the lot! That should outline all the essentials for these features, but there's lots of smaller improvements, fixes &amp; editor support improvements coming too, so take a look at the <a href="https://github.com/microsoft/TypeScript/issues/33352">official roadmap</a> if you want to get into the nitty gritty.</p> <p>Hope that's useful - if you've got any questions let me know on <a href="https://twitter.com/pimterry">Twitter</a>.</p> <p><strong>While you're here, if you like JavaScript & want to supercharge your debugging skills, <a href="https://httptoolkit.com/javascript/">try out HTTP Toolkit</a>. One-click HTTP(S) interception & debugging for any JS page, script, or server (plus lots of other tools too).</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/5-big-features-of-typescript-3.7/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/5-big-features-of-typescript-3.7/</guid>
            <pubDate>Thu, 12 Sep 2019 12:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Unblocking Node With Unref()]]></title>
<description><![CDATA[<p>Node.js runs on an <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop">event loop</a>. It holds a queue of tasks to run, and runs them, one by one. New tasks appear on the queue while it runs, added by your code (setTimeout) or outside events (a network connection), and the process simply continues until the queue is empty.</p> <p>That's all great, until it isn't.</p> <p>Occasionally you want to break out of that model. What happens if you want to run a scheduled task on a fixed interval indefinitely? Typically, life gets difficult: you need to include &amp; manage an explicit shutdown process for that interval, and if you ever forget to shut it down then the process will keep running forever, with no explanation. Ouch.</p> <p>I ran into this whilst working on <a href="https://github.com/httptoolkit/mockttp">Mockttp</a> (the HTTP interception &amp; testing library behind HTTP Toolkit). Mockttp needs to keep track of your current local IP addresses, to help detect and warn about request loops. That data can change occasionally, so it needs to poll it on an interval, but it's very annoying to have to remember to carefully shut that process down in addition to everything else.</p> <p>Fortunately, it turns out you can fix this easily! Enter unref:</p> <h2 id="timeoutunref">Timeout.Unref()</h2> <p>Timer functions like <code>setInterval</code> and <code>setTimeout</code> in Node.js return a <a href="https://nodejs.org/api/timers.html#timers_class_timeout">Timeout object</a>, representing the ongoing timer.</p> <p>These can be passed to <code>clearInterval</code> or <code>clearTimeout</code> to shut down the timer entirely, but they also have a little-used <code>unref()</code> method. This does something magical: it keeps running your code, but stops it from keeping the process alive. Like so:</p> <pre><code class="js language-js">// Update my data every 10 seconds
const interval = setInterval(() =&gt; updateMyData(), 10000);
// But don't let that keep the process alive!
interval.unref();

// Log a message if the app is still running 10 seconds from now
const timeout = setTimeout(() =&gt; console.log('Still going'), 10000);
// But still shutdown cleanly if it wants to stop before then:
timeout.unref();
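
// hasRef() (Node 11+) shows whether a timer will hold the process open:
const timer = setTimeout(function () {}, 10000);
const wasBlocking = timer.hasRef();   // true - this timer blocks exit
timer.unref();
const stillBlocking = timer.hasRef(); // false - it no longer does
clearTimeout(timer);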
</code></pre> <p>This functions like a flag you can set on your timers, marking them as tasks that Node doesn't need to wait for. They'll run as normal while the process is alive, but if the rest of the event queue is empty then they're ignored, and the process exits anyway.</p> <p>You can also mark the timer as important again with <code>timer.ref()</code> or (in Node 11+ only) check whether it's currently configured to block exit of the process with <code>timer.hasRef()</code>.</p> <p>If you want to see this in action, you can check out the fix for Mockttp over here: https://github.com/httptoolkit/mockttp/blob/master/src/util/socket-util.ts#L58-L71</p> <h2 id="gotchas">Gotchas</h2> <p>There are three last things worth noting here:</p> <ul> <li>Although this can let you skip complicated cleanup processes, it doesn't make them worthless. Especially if your timer is doing something expensive, it's very often useful to provide an explicit shutdown command instead. This isn't a substitute for cleaning up after yourself!</li> <li>This can come with a small performance cost, as it's actually implemented using a separate scheduled task. Using a few is fine, but if you're creating very large numbers of these you might see a performance impact.</li> <li>You shouldn't be using this everywhere. If you use this on a timeout you care about, you'll discover that your app is unexpectedly exiting halfway through, way before you're expecting. This is similar to weak maps: it's a tool for specific situations, not an option for every day.</li> </ul> <p>While you're here, if you like Node &amp; want to supercharge your debugging skills, take a look at <strong><a href="https://httptoolkit.com/javascript/">HTTP Toolkit</a></strong>. One-click HTTP(S) interception &amp; debugging for any Node.js script, tool or server (and lots of other tools too).</p>]]></description>
            <link>https://httptoolkit.com/blog/unblocking-node-with-unref/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/unblocking-node-with-unref/</guid>
            <pubDate>Wed, 11 Sep 2019 13:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Notarize your Electron App with Electron Forge]]></title>
            <description><![CDATA[<p>In the next release of macOS (10.15), if your app isn't notarized, your users can't run it. If you're distributing Mac apps, you need to deal with this. Fortunately, for Electron apps the process is fairly easy.</p> <p>The final release date for 10.15 isn't yet announced, but there are many betas available to developers for testing. It's planned for fall 2019, and the hard requirement for notarization is already the reality in 10.14.5 for all new developer accounts (i.e. anybody who has never distributed software associated with their Apple developer id).</p> <h2 id="whatisnotarization">What is notarization?</h2> <p>Notarization is designed to allow Apple to quickly ensure that your software is safe to run on users' computers. The full details are available in <a href="https://developer.apple.com/documentation/security/notarizing_your_app_before_distribution">Apple's own developer documentation</a>.</p> <p>In short, it's a process where:</p> <ul> <li>You upload your new app build to Apple's notary service,</li> <li>The notary service automatically checks it for malicious content and other issues</li> <li>The notary service returns you a ticket showing that this check has been done on this build</li> <li>You attach ('staple') this ticket to the software</li> <li>When the app is installed, the Gatekeeper software on your users' Macs can check the stapled ticket, and thereby know that the software has already been examined by Apple.</li> </ul> <p>In addition this means that every distributable version of your software comes with an attached ticket, which can be revoked later if necessary. 
If you discover that malicious code has somehow entered your application, or that your signing key has been leaked and other people are distributing unauthorized versions, you can remotely revoke the tickets and make that software uninstallable.</p> <p>In addition, the requirements for notarization are stricter than the existing code signing restrictions. Since notarization is now mandatory, this effectively represents Apple tightening their application security rules.</p> <p>Note that this is <em>not</em> app review: there's no human process here, it's automated scanning of your distributable app versions, and an audit trail of those scans.</p> <h2 id="whatsinvolved">What's involved?</h2> <p>There are a few main steps required for a typical Electron app build:</p> <ol> <li>Ensure your app build conforms to the requirements for notarization. That means you need to:<ul> <li>Build with Xcode 10+, on macOS 10.12+</li> <li>Build against the macOS 10.9 or later SDK</li> <li>Code sign your builds with your Developer ID (local development or Mac Distribution certificates aren't enough)</li> <li>Include a secure timestamp with your code signing signature (in most cases this already happens automatically)</li> <li>Enable the 'Hardened Runtime' capability</li> <li>Give your app the <code>com.apple.security.cs.allow-unsigned-executable-memory</code> entitlement, or Electron will fail to run when hardened runtime is enabled</li></ul></li> <li>Notarize all your builds before they're distributed:<ul> <li>Build the executable, but don't package it into a DMG/etc yet</li> <li>Submit the app build to Apple's notary service</li> <li>Wait for the notary service to give you a ticket</li> <li>Attach that to the executable</li> <li>Continue with your packaging process</li></ul></li> </ol> <h2 id="howdoidothatinpractice">How do I do that in practice?</h2> <p>If you'd like a worked example, I recently added notarization to <a href="https://httptoolkit.com">HTTP Toolkit</a>, and you can see 
the commits involved here:</p> <ul> <li><a href="https://github.com/httptoolkit/httptoolkit-desktop/commit/d8c55a6b42fa9ab67475c03cd497d8eb6d0d5d90">Update to XCode 10</a></li> <li><a href="https://github.com/httptoolkit/httptoolkit-desktop/commit/c67896837fb50cb635a0a9589052e4fafc48dd64">Complete notarization requirements</a></li> <li><a href="https://github.com/httptoolkit/httptoolkit-desktop/commit/956327cad3a6d2367470fc7a4ffb6600d8cc7c28">Enable notarization</a></li> </ul> <p>Let's walk through it, step by step, for a typical app built with Electron Forge v5. I'm assuming that you have code signing set up already, but nothing else, and that you're building the app on Travis. If you're not using Travis this should translate easily to other environments, but if you don't have code signing in place you'll have to set that up first.</p> <ol> <li><p>Make sure you're using OSX 10.12+ and Xcode 10+</p> <ul> <li>For Travis, you just need to set <code>osx_image</code> to at least <code>xcode10</code>.</li></ul></li> <li><p>Record the Apple ID login details required</p> <ul> <li>Save your username (your Apple developer account email address) in a secure environment variable called <code>APPLE_ID</code>.</li> <li>Create an app-specific password for your developer account, following the instructions at https://support.apple.com/en-us/HT204397.</li> <li>Store the app-specific password in a secure environment variable called <code>APPLE_ID_PASSWORD</code>.</li></ul></li> <li><p>Set <code>hardened-runtime: true</code> and <code>gatekeeper-assess: false</code> in your <a href="https://github.com/electron/electron-osx-sign">electron-osx-sign</a> configuration</p> <ul> <li>For Electron Forge v5, this is in your forge config under <code>osxSign</code>, within <code>electronPackagerConfig</code>.</li> <li><code>hardened-runtime</code> is clear enough: this enables <a href="https://developer.apple.com/documentation/security/hardened_runtime_entitlements">hardened runtime</a>.</li> 
<li>Disabling <code>gatekeeper-assess</code> is required because otherwise electron-osx-sign will ask Gatekeeper to sanity check the build, and in new MacOS versions this will fail as it's not yet notarized. Fortunately, notarization will make these same checks for us later on anyway, so this is safe to skip.</li></ul></li> <li><p>Create an entitlements file, and set the <code>entitlements</code> and <code>entitlements-inherit</code> config properties of electron-osx-sign to use it</p> <ul> <li>The minimal entitlements file for an Electron app looks like this:</li></ul> <pre><code class="xml language-xml">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"&gt;
&lt;plist version="1.0"&gt;
    &lt;dict&gt;
        &lt;key&gt;com.apple.security.cs.allow-unsigned-executable-memory&lt;/key&gt;
        &lt;true/&gt;
    &lt;/dict&gt;
&lt;/plist&gt;
</code></pre> <ul> <li>The <code>entitlements</code> and <code>entitlements-inherit</code> configuration properties should be a relative path to this file (e.g. <code>src/entitlements.plist</code>), in the same <code>osxSign</code> configuration as the previous step.</li></ul></li> <li><p>Install <a href="https://www.npmjs.com/package/electron-notarize">electron-notarize</a></p></li> <li><p>Create a script that will perform the notarization</p> <ul> <li>This needs to call the <code>notarize</code> function from electron-notarize, and wait for it to complete.</li> <li>An example script might look like:</li></ul> <pre><code class="js language-js">const { notarize } = require('electron-notarize');

// Path from here to your build app executable:
const buildOutput = require('path').resolve(
    __dirname,
    '..',
    'out',
    'HTTP Toolkit-darwin-x64',
    'HTTP Toolkit.app'
);

module.exports = function () {
    if (process.platform !== 'darwin') {
        console.log('Not a Mac; skipping notarization');
        return;
    }

    console.log('Notarizing...');

    return notarize({
        appBundleId: 'tech.httptoolkit.desktop',
        appPath: buildOutput,
        appleId: process.env.APPLE_ID,
        appleIdPassword: process.env.APPLE_ID_PASSWORD
    }).catch((e) =&gt; {
        console.error(e);
        throw e;
    });
}
</code></pre> <ul> <li>Don't forget to update the <code>buildOutput</code> path &amp; <code>appBundleId</code> in the above to match your codebase!</li></ul></li> <li><p>Run this script, after the executable is built but before it is packaged into a DMG or similar.</p> <ul> <li>Confusingly, the correct forge hook for this is called <code>postPackage</code>.</li> <li>To set that up in Electron Forge v5, you need to add the below at the top level of your forge config:</li></ul> <pre><code class="js language-js">"hooks": {
    "postPackage": require("./src/hooks/notarize.js")
}
</code></pre></li> </ol> <h2 id="getnotarizing">Get notarizing!</h2> <p>Once this is in place, your builds should immediately start notarizing your OSX Electron executable. You'll receive an email from Apple each time a notarization is completed; these might be useful to audit notarization in your processes, but they can be very noisy, so you'll probably want to filter them out of your inbox.</p> <p>You can check that notarization has worked by opening the resulting app on a Mac; on the first run after downloading it, you should see a popup saying something like:</p> <blockquote> <p>"YourApp.app" is an app downloaded from the Internet. Are you sure you want to open it? Chrome downloaded this file on August 6, 2019. <strong>Apple checked it for malicious software and none was detected.</strong></p> </blockquote> <p>That final line is the key here: your Mac has detected the stapled ticket, it's happy with it, and you're all good.</p> <p>Don't forget to actually run the app though, and confirm it all works happily under the tightened hardened runtime requirements! If you have any issues there you may want to look at including <a href="https://developer.apple.com/documentation/bundleresources/entitlements">extra entitlements</a>, or reducing your use of protected APIs.</p> <p>One last note: that message above is what you'll see if you download a built version from the internet, e.g. from your CI build output. If you built it locally, or need to manually confirm the notarization for some other reason, take a look at <a href="https://developer.apple.com/library/archive/documentation/Security/Conceptual/CodeSigningGuide/Procedures/Procedures.html#//apple_ref/doc/uid/TP40005929-CH4-SW25">Apple's Gatekeeper testing instructions</a>.</p> <p>That's it! Good luck, and happy notarizing.</p>]]></description>
            <link>https://httptoolkit.com/blog/notarizing-electron-apps-with-electron-forge/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/notarizing-electron-apps-with-electron-forge/</guid>
            <pubDate>Tue, 06 Aug 2019 13:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[X-Ray Debugging for HTTP]]></title>
            <description><![CDATA[<p>HTTP Toolkit is a suite of open-source &amp; cross-platform tools for developing, debugging &amp; testing anything using HTTP. It lets you intercept HTTP(S) with one click, and explore, examine &amp; understand all your traffic, to spot bugs, fix bugs, and build better software.</p> <p>The free release of the first version (HTTP View) has been available for a little while, and today I've got some new killer features launching as part of <a href="https://httptoolkit.com/">HTTP Toolkit Pro</a>, to let you look deep inside your HTTP traffic, with all the context you need to understand everything your application is doing.</p> <h2 id="apiintegrationsfor1400apis">API integrations for 1400+ APIs</h2> <p>Understanding your traffic takes more than just the raw data. Using <a href="https://swagger.io/docs/specification/about/">OpenAPI</a> and the <a href="https://github.com/APIs-guru/openapi-directory">OpenAPI directory</a>, HTTP Toolkit can work out exactly which API every single request is talking to, for APIs from AWS to Stripe to Github, and a whole lot more.</p> <p>With that, there's a lot of cool things we can do. For example:</p> <p><img src="../images/understand-screenshot.png" alt="API metadata for a request to the YouTube API"></p> <p>Here we've taken a request to the YouTube API, and immediately worked out what operation it's doing, interpreted the parameters to provide inline documentation, pointed out that one parameter has an invalid value, and spotted another required parameter that's missing.</p> <p>Debugging tools with real context - tools that really understand what you're <em>trying</em> to do - let you take your development skills to a whole new level.</p> <h2 id="performanceanalysistipswarnings">Performance analysis, tips &amp; warnings</h2> <p>Performance is hard. 
There's a huge number of ways to tweak &amp; tune the speed of your application with HTTP, a lot of confusing specs (what's the difference between a <code>no-store</code> and <code>no-cache</code> cache-control header?), and not a lot of advice.</p> <p>For most applications though, the two most important things to focus on are compression and caching. Transferring large uncompressed responses hugely slows down client applications, while caching lets them avoid request round trips entirely.</p> <p><img src="../images/accelerate-screenshot.png" alt="Performance analysis for an HTTP request"></p> <p>You can now get automated performance analysis for all HTTP responses, covering not just the response time itself, but also the details of the request &amp; response body compression (with a comparison to other content encodings you could've used), and a breakdown of the response's cacheability.</p> <p>The caching details come with a summary &amp; detailed explanation of whether &amp; why the request is cacheable, and also which future requests it'll match, who can cache it (just browsers, or CDNs and proxies too?), and when it'll expire.</p> <p>Caching is hard, and HTTP Toolkit has your back to help you really understand what your response headers mean. 
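To answer that <code>no-store</code> vs <code>no-cache</code> question, here's a rough sketch of how a cache treats those directives (simplified for illustration; this is not HTTP Toolkit's actual implementation):

```javascript
// Rough sketch: how a cache decides what to do with a Cache-Control header.
// no-store: never store the response at all.
// no-cache: the response may be stored, but must be revalidated with the
// server before every reuse.
function cachePolicy(cacheControl) {
    const directives = cacheControl
        .split(',')
        .map((d) => d.trim().toLowerCase());

    if (directives.includes('no-store')) return { store: false };
    if (directives.includes('no-cache')) return { store: true, revalidate: true };

    const maxAge = directives.find((d) => d.startsWith('max-age='));
    return {
        store: true,
        revalidate: false,
        // Seconds the stored response stays fresh, if specified:
        maxAgeSeconds: maxAge ? parseInt(maxAge.split('=')[1], 10) : undefined
    };
}
```

So <code>cachePolicy('public, max-age=3600')</code> allows storing and reusing the response for an hour without revalidation, while <code>no-cache</code> allows storing it but forces a revalidation round trip before each reuse. 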
In addition, if you've made mistakes like missing directives that would make your caching more consistent &amp; reliable, or conflicting configuration for the same property (like <code>Expires</code> &amp; <code>max-age</code>), you'll get tips and warnings to help you fix them easily.</p> <h2 id="lotsmore">Lots more</h2> <p>On top of all that, we've now got light, dark &amp; high-contrast themes, inline documentation for every HTTP header &amp; status code, one-click man-in-the-middle setup for terminals (on all platforms) in addition to browsers, and everything else you might need to quickly &amp; easily understand your HTTP traffic.</p> <h2 id="asustainablefuture">A sustainable future</h2> <p>HTTP Toolkit is fundamentally an open-source project. It's been hugely driven by the hard work and many contributions to <a href="https://github.com/httptoolkit/mockttp">Mockttp</a>, and user feedback from the community over the past couple of months has been essential.</p> <p>Releasing this paid version doesn't change that, and the entire Pro code is open-source too: <a href="https://github.com/httptoolkit">github.com/httptoolkit</a>. The aim is to make the project sustainable though, by encouraging professional developers &amp; power users to help support ongoing development, to drive the project forward into the future.</p> <p>If you like the sound of this, help fund it! <strong><a href="https://httptoolkit.com/get-pro/">Get HTTP Toolkit now</a></strong>, and supercharge your software debugging. If you're on the fence, you can also get started with the existing <a href="https://httptoolkit.com/">free release</a>.</p> <p><strong>We're also launching on Product Hunt today! Take a look at the reviews and leave your feedback at <a href="https://producthunt.com/posts/http-view">producthunt.com/posts/http-view</a>.</strong></p>]]></description>
            <link>https://httptoolkit.com/blog/xray-debugging-for-http/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/xray-debugging-for-http/</guid>
            <pubDate>Wed, 24 Apr 2019 12:00:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Bundling Remote Scripts with Webpack]]></title>
            <description><![CDATA[<p>As a JavaScript developer nowadays, almost everything you use comes from npm. Unfortunately, not absolutely everything: there's still a small subset of scripts that expect to be included from a remote CDN somewhere, and when bundling your application these pose a problem.</p> <p>You could use these scripts from the CDN, as intended. If you do so you'll lose opportunities for bundling benefits like tree shaking, but more importantly you now have to independently load scripts from one more domain at the same time as your other bundle(s). That means another point of failure, and means you need logic in your main app to wait until the remote script has loaded before using it, and to potentially handle loading failures too.</p> <p>Instead, you could download the script directly, save it into your codebase ('vendor' it), and treat it like your own source. What if it changes though? Many of these CDN scripts change frequently, so you'll need to repeatedly update this, and every change is extra noise and mess in your codebase &amp; git history.</p> <p>I hit this recently working on <a href="https://httptoolkit.com">HTTP Toolkit</a> trying to use the JS SDK for a 3rd party service, which is only available from a CDN, and isn't published on npm. Fortunately, there's another option: webpack can solve this for us.</p> <h2 id="valloader">Val Loader</h2> <p>Webpack's little-known <a href="https://github.com/webpack-contrib/val-loader">val loader</a> allows you to easily define your own loading logic that is run at build time. 
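As a minimal illustration before we get to the real use case (this <code>build-info.js</code> example is hypothetical, not from the original post), a val-loader module is just a node module exporting a function that returns the code to bundle:

```javascript
// build-info.js - a hypothetical minimal val-loader module.
// Webpack runs this function at build time; the returned `code` string
// is what importing modules receive in the final bundle.
function buildInfoLoader() {
    const buildInfo = { builtAt: new Date().toISOString() };
    return { code: 'module.exports = ' + JSON.stringify(buildInfo) + ';' };
}

module.exports = buildInfoLoader;
```

Elsewhere in your codebase, <code>require('val-loader!./build-info')</code> would then bake the build timestamp into your bundle as a plain object. 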
When you load a file with most webpack loaders they read the file, transform the content somehow, and add some content to your bundle, which will later be returned from the initial import/require statement.</p> <p>When you load a file with val loader however it:</p> <ul> <li>Executes the file contents as a node module</li> <li>Looks for an exported function or promise from the module</li> <li>Waits on the promise/calls the function (which may in turn return a promise)</li> <li>Takes the <code>code</code> property from the final result, and uses this as the content to be bundled and returned by the original import/require</li> </ul> <p>This means you can write a simple node script that dynamically generates content, you can require that script elsewhere, and webpack will pre-generate the content for you at build time, totally automatically. Magic!</p> <h2 id="fetchingremotescripts">Fetching Remote Scripts</h2> <p>You can probably see where this is going. Putting this together: we need to write a module that fetches our remote script at build time, and returns it to val loader.</p> <p>In practice, this looks something like this:</p> <ul> <li>Install val loader: <code>npm install --save-dev val-loader</code></li> <li>Create a <code>fetch-script.js</code> loader script:</li> </ul> <pre><code class="js language-js">// I'm using fetch here, but any HTTP library will do.
const fetch = require('node-fetch');

const SCRIPT_URL = 'https://cdn.example.com/your-script.js';

module.exports = function () {
    return fetch(SCRIPT_URL)
    .then((response) =&gt; {
        if (!response.ok) {
            throw new Error('Could not download ' + SCRIPT_URL);
        }
        return response.text();
    })
    .then((remoteScript) =&gt; ({ code: remoteScript }));
}
</code></pre> <ul> <li>In the rest of your codebase, require the module like any other, but using val loader:</li> </ul> <pre><code class="js language-js">const scriptExport = import('val-loader!./fetch-script');
</code></pre> <p>That's it! No extra config, just a tiny node script.</p> <p>With that in place, any code that needs the remote script can import our module via val loader, and get the remote script as if it were a normal dependency. It gets properly bundled with the rest of your app, and is always immediately available, like any other bundled script. At the same time, it still keeps up to date automatically: every build, we pull down the latest version from the CDN. You don't need to commit the script into your own repo, or manually check for updates.</p> <p>One thing to watch out for here: the loader script does <em>not</em> get built by webpack before it's run. That means it needs to be natively runnable by node, so no TypeScript/babel/etc. It's a very simple script though, and this is node, not browsers, so you can use modern JS regardless.</p> <h2 id="acceptingchange">Accepting change</h2> <p>Depending on the script of course, <em>safely</em> pulling in changes is another article in itself. In general most remote scripts like these have some kind of compatibility guarantees (otherwise using them remotely would be impossible), but you may still want some kind of locking mechanism.</p> <p>If there's versioning available in the remote URL, that's trivial; if not, you'll need to check for changes manually.</p> <p>One reasonable approach would be to include &amp; check a hash of the remote file in your loader script, and to fail the build if it changes, or perhaps just send yourself a notification. Failing the build forces you to manually review the remote script whenever it changes, and then update the hash, which does at least ensure that you won't see unpredictable changes in your application. You'll need to play around, but there are many options here, depending on how flexibly you want to handle new changes.</p> <h2 id="puttingitalltogether">Putting it all together</h2> <p>Enjoy! 
If you'd like to see a working example, take a look at how HTTP Toolkit's UI loads paddle.js. Check out <a href="https://github.com/httptoolkit/httptoolkit-ui/blob/1aa71b9/src/model/account/paddle.js">the paddle.js loading script</a>, and the <a href="https://github.com/httptoolkit/httptoolkit-ui/blob/1aa71b9/src/model/account/subscriptions.ts#L3">code that imports it</a>.</p> <p>Have any thoughts or ideas about this? Just love/hate webpack? Let me know <a href="https://twitter.com/httptoolkit">on twitter</a>, or join the discussion <a href="https://www.reddit.com/r/javascript/comments/ao51z2/bundling_remote_scripts_with_webpack">on reddit</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/bundling-remote-scripts-with-webpack/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/bundling-remote-scripts-with-webpack/</guid>
            <pubDate>Thu, 07 Feb 2019 16:45:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Debugging Netlify Function Errors with Sentry]]></title>
            <description><![CDATA[<p><a href="https://www.netlify.com/docs/functions/">Netlify functions</a> are a quick, easy &amp; powerful tool, but like most serverless platforms, they can be even more difficult to debug &amp; monitor than traditional server applications. It's a hard environment to precisely recreate locally, there's no machine you can SSH into in a pinch, and no built-in error notifications.</p> <p>Your code is going to break eventually, and you need the tools to fix it.</p> <p><a href="https://httptoolkit.com">HTTP Toolkit</a> uses Netlify functions under the hood to manage user account information and Paddle checkout callbacks. If we hit errors here, people's payments will fail, or they'll stop being given access to paid features, so this can be pretty bad! I need to be able to catch errors immediately, debug and work out why they're happening, and confirm that my fixes work.</p> <p>Debugging &amp; fixing issues here is a big topic, but one of the first steps is knowing exactly when &amp; how errors happen. There's a few tools for this, but personally I've had a lot of success on projects recently with <a href="https://sentry.io">Sentry</a>. They've got a generous free plan (5k errors a month), built-in integrations for almost everything, and some good &amp; detailed error reporting tools too.</p> <p>If you can get Sentry set up, you'll get emails every time there's an error in your function, and you can explore the errors themselves in detail to work out exactly what failed. Perfect, but the setup for Netlify functions has a few extra steps.</p> <h2 id="startreportingerrorstosentry">Start reporting errors to Sentry</h2> <p>I'm going to be using JS here, and I'm assuming you've already got a working Netlify function set up. 
To add Sentry reporting from there, you need to:</p> <ul> <li>Create a Sentry account</li> <li>Create a Sentry project in that account for your functions</li> <li>Take the DSN for your Sentry project and set it as a SENTRY_DSN variable in your Netlify build</li> <li><code>npm install --save @sentry/node</code> (the examples here require <code>^4.6.0</code>)</li> <li>Initialize your error logging logic:</li> </ul> <pre><code class="js language-js">const Sentry = require('@sentry/node');

const { SENTRY_DSN } = process.env;

let sentryInitialized = false;
export function initSentry() {
    if (SENTRY_DSN) {
        Sentry.init({ dsn: SENTRY_DSN });
        sentryInitialized = true;
    }
}

// Import &amp; call this from your function handlers:
initSentry();
</code></pre> <p>With just this in place, uncaught errors &amp; rejections are now reported automatically!</p> <p>Unfortunately though, errors/rejections in handlers are caught and swallowed, so we'll need to catch those too. In addition, AWS Lambda (the service behind Netlify functions) doesn't behave exactly as you'd expect, so some error reports will be delayed or lost when your function is shut down after execution.</p> <h2 id="catchinghandlererrors">Catching handler errors</h2> <p>Let's detect handler function errors first. To start with, create a convenient <code>reportError</code> method you can call to report errors to Sentry, which will wrap the extra logic that we'll need in a minute.</p> <pre><code class="javascript language-javascript">// Don't use this example quite yet! It's not complete - see below.
function reportError(error) {
    console.warn(error);
    if (!sentryInitialized) return;

    if (typeof error === 'string') {
        Sentry.captureMessage(error);
    } else {
        Sentry.captureException(error);
    }
}
</code></pre> <p>Then add a wrapper around each of your function handlers. The wrapper needs to call the function handler as normal, but catch any errors or promise rejections, and report them to Sentry. It then needs to rethrow the error too, so that an HTTP error is still returned:</p> <pre><code class="js language-js">// Don't use this example quite yet! It's not complete - see below.
function catchErrors(handler) {
    return async function() {
        try {
            return await handler.call(this, ...arguments);
        } catch(e) {
            // This catches both sync errors &amp; promise
            // rejections, because we 'await' on the handler
            reportError(e);
            throw e;
        }
    };
}

// Use the wrapper on each of your handlers like so:
exports.handler = catchErrors(function (event, context) {
    ...
});
</code></pre> <p>This assumes you're using <a href="https://aws.amazon.com/blogs/compute/node-js-8-10-runtime-now-available-in-aws-lambda/">promises</a> in your handlers, instead of callbacks. If you're using a callback-based approach, you'll need to capture and wrap the callback in your <code>catchErrors</code> function.</p> <h2 id="reliablereportingwithsentryawslambda">Reliable reporting with Sentry &amp; AWS Lambda</h2> <p>A Lambda function runs until completion, and then will be frozen. Later calls may start it up again, or it might be disposed of, and the whole process created afresh. That means that any Sentry requests that haven't been sent when your function responds might be lost. Fortunately, we can fix this. We need to do two things: wait for reported errors to be fully sent, and ensure that Sentry doesn't interfere with normal Lambda shutdown.</p> <p>The latest Sentry SDK now supports <a href="https://github.com/getsentry/sentry-javascript/issues/1449">a flush() method</a> (as of 4.6.0). This allows us to report errors, and then explicitly wait for them to be fully completed before our function ends.</p> <p>To use it, change your report error function to the below:</p> <pre><code class="js{1,11} language-js{1,11}">async function reportError(error) {
    console.warn(error);
    if (!sentryInitialized) return;

    if (typeof error === 'string') {
        Sentry.captureMessage(error);
    } else {
        Sentry.captureException(error);
    }

    await Sentry.flush();
}
</code></pre> <p>Lastly, to stop Sentry callbacks interfering with normal Lambda lifecycle, we need to set <code>context.callbackWaitsForEmptyEventLoop</code> to false.</p> <p>We can do this in our handler wrapper, and we also need to update that wrapper to wait on the <code>reportError</code> call too, to make sure that it's completed.</p> <p>Change your <code>catchErrors</code> wrapper to:</p> <pre><code class="js{3,9} language-js{3,9}">function catchErrors(handler) {
    return async function(event, context) {
        context.callbackWaitsForEmptyEventLoop = false;
        try {
            return await handler.call(this, ...arguments);
        } catch(e) {
            // This catches both sync errors &amp; promise
            // rejections, because we 'await' on the handler
            await reportError(e);
            throw e;
        }
    };
}
</code></pre> <p>All done! With this in place, all handler errors will be reliably reported to Sentry, and you can rest safe in the knowledge that your functions are working nicely (or at least, that you know exactly how much they're failing).</p> <h2 id="bonusextensions">Bonus Extensions</h2> <p>There are two optional extra steps I'd like to mention, to help you debug your issues more easily.</p> <p>First, extra reporting is super useful. <a href="https://docs.sentry.io/platforms/javascript/enriching-events/breadcrumbs/"><code>Sentry.addBreadcrumb</code></a> for example lets you record extra events that will be included in any later exceptions. You can also call our <code>reportError</code> function from anywhere else in your code to immediately report errors, even if you don't actually throw them and fail (but do remember to wait on the returned promise).</p> <p>Second, include your function's git commit as your Sentry release, so you always know which version of the code threw which errors. Netlify provides this as a <code>COMMIT_REF</code> environment variable, but this sadly isn't available in the runtime Lambda environment, so we need to make sure we bake it in at build time. To do that, first extend the default webpack config:</p> <pre><code class="js language-js">// webpack.js:
const webpack = require('webpack');
const { COMMIT_REF } = process.env;

module.exports = {
    plugins: [
        new webpack.DefinePlugin({
            "process.env.COMMIT_REF": JSON.stringify(COMMIT_REF)
        })
    ]
};
</code></pre> <p>You'll need to change your build script to pass <code>-c ./webpack.js</code> to your <code>netlify-lambda build</code> command to use this.</p> <p>Then, change the initial Sentry setup to pass this variable on to Sentry:</p> <pre><code class="js{3,8} language-js{3,8}">const Sentry = require('@sentry/node');

const { SENTRY_DSN, COMMIT_REF } = process.env;

let sentryInitialized = false;
export function initSentry() {
    if (SENTRY_DSN) {
        Sentry.init({ dsn: SENTRY_DSN, release: COMMIT_REF });
        sentryInitialized = true;
    }
}

// Import &amp; call this from your function handlers:
initSentry();
</code></pre> <p>And voila, automated error reports for Netlify functions:</p> <p><img src="https://httptoolkit.com/images/posts/sentry-function-error.png" alt="A screenshot of Sentry, showing a Netlify function error"></p> <p>Want to see a complete example of this in action? Take a look at <a href="https://github.com/httptoolkit/accounts/tree/master/src">HTTP Toolkit's accounting internals</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/netlify-function-error-reporting-with-sentry/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/netlify-function-error-reporting-with-sentry/</guid>
            <pubDate>Thu, 31 Jan 2019 19:30:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[One-click HTTP debugging for any CLI tool]]></title>
            <description><![CDATA[<h2 id="debughttpsfromgitnpmaptgetoranyotherclitool">Debug HTTP(S) from git, npm, apt-get, or any other CLI tool.</h2> <p>The command line is powerful, but can be hard to understand, and extremely hard to debug. Ever run a command, see it fail with a cryptic error, and have no idea why?</p> <p>Better tools can help you understand what's really going on. They can let you see inside the command you're running to understand how it's communicating, see what data it's working with, and debug it in depth.</p> <p>Excitingly I've just shipped one-click terminal interception to do exactly this with HTTP Toolkit, for HTTP and HTTPS, to show you everything your CLI is sending and help you work out what the hell is going on.</p> <p><em>HTTP Toolkit is free & open-source, so if you want to try this yourself, go <a href="https://httptoolkit.com/">download it now</a> and dive right in.</em></p> <h2 id="howdoitryit">How do I try it?</h2> <ul> <li>Install HTTP Toolkit (if you haven't already).</li> <li>On the Intercept page, click the 'Terminal' button.</li> <li>In the terminal that opens, make some HTTP requests (try <code>curl example.com</code>, <code>sudo apt-get update</code>, <code>git clone https://...</code>, or anything!)</li> <li>Open the View page in HTTP Toolkit, and take a look through the full details of every request sent.</li> </ul> <p><img src="https://httptoolkit.com/images/posts/terminal-requests.png" alt="A series of requests from an intercepted terminal"></p> <h2 id="whatcanidowiththis">What can I do with this?</h2> <h3 id="debugfailingapplications">Debug failing applications</h3> <p>Let's imagine you're using a CLI tool, and it hates you. For whatever reason, it refuses to do the perfectly reasonable thing you ask for. It exits with some cryptic error, or just fails silently, and you're stuck. 
Maybe you just downloaded it, or maybe it's your code and you've just forgotten how it works (we've all been there).</p> <p>Internally perhaps this script is talking to an HTTP service somewhere, and failing later on. Unfortunately, you have no idea what it's asking for, what it gets in response, or why the result doesn't work. Most applications aren't designed to be debugged, and can be painfully opaque.</p> <p>If you can intercept all HTTP from the app, then you can see everything that's being sent, spot the error, and fix the root cause directly. Is your script requesting a file from GitHub and then crashing trying to use it? When you see that GitHub is returning unexpected HTML instead of the download, because GitHub is down <em>again</em>, the issue gets a little clearer.</p> <h3 id="spotcreepyapptrackingmonitoring">Spot creepy app tracking &amp; monitoring</h3> <p>The future is a dark place. Think your CLI tool might be sending your private data back to analytics &amp; tracking services? If you can see all HTTP it sends, you can see exactly what's being reported and know for sure.</p> <p>Live editing is coming soon too, so you can block/edit these requests in flight as well.</p> <h3 id="learnhowyourtoolswork">Learn how your tools work</h3> <p>Ever wondered how Git actually works? No problem - open an intercepted terminal, clone a repo over HTTPS, and immediately read through every request it sends and receives to make that happen.</p> <p>Apt-get <a href="https://whydoesaptnotusehttps.com">doesn't use HTTPS</a> and instead distributes packages over HTTP, but with signatures that you can validate locally. That means every request your client makes is publicly readable though. Open an intercepted terminal, run a quick <code>sudo apt-get update</code>, and see exactly what that shares with the world.</p> <p>Want to see how many requests your <code>npm install</code> is making under the hood? 
…you get the idea.</p> <h2 id="howdoesthiswork">How does this work?</h2> <p>Automatic terminal interception works by starting a new terminal window on your machine, and ensuring it starts with various environment variables set. This doesn't strictly enforce HTTP interception, but these variables are observed by almost every language &amp; HTTP library you use on the CLI, and it's enough to ensure that 90% of tools work out of the box.</p> <p>These variables need to do two things: send all HTTP(S) traffic via the proxy, and then ensure that all HTTPS clients trust the interception certificate authority (CA) used by HTTP Toolkit. The variables used to make this happen are:</p> <ul> <li><p><code>HTTP_PROXY</code> (and <code>http_proxy</code>) - the full URL for the proxy to use for HTTP traffic (e.g. <code>http://localhost:8000</code>)</p></li> <li><p><code>HTTPS_PROXY</code> (and <code>https_proxy</code>) - the full URL for the proxy to use for HTTPS traffic (e.g. <code>http://localhost:8000</code>)</p></li> <li><p><code>SSL_CERT_FILE</code> - the path to a file containing the certificate authorities (CA) certificates that OpenSSL should trust</p></li> <li><p><code>NODE_EXTRA_CA_CERTS</code> - the path to the extra CA certificates that Node.js (7.3.0+) should trust</p></li> <li><p><code>REQUESTS_CA_BUNDLE</code> - the path to the CA certificates that Python's <a href="http://docs.python-requests.org/en/master/">Requests</a> module should trust</p></li> <li><p><code>PERL_LWP_SSL_CA_FILE</code> - the path to the CA certificates that Perl's <a href="https://metacpan.org/pod/LWP">LWP</a> module should trust</p></li> <li><p><code>GIT_SSL_CAINFO</code> - the path to the CA certificates that Git should trust</p></li> </ul> <p>Out of the box, across Windows, Linux &amp; Mac, this immediately intercepts:</p> <ul> <li>Classic HTTP clients like Curl, Wget, and Lynx.</li> <li>More powerful tools built on HTTP(S), including Git, Apt-Get, and <a 
href="https://httpie.org/">HTTPie</a>.</li> <li>Almost all Ruby, Perl, Go, Bash, or Python scripts.</li> <li>All Node.js tools that correctly observe <code>HTTP_PROXY</code> (unlike most tools, <a href="https://github.com/nodejs/node/issues/15620">Node.js doesn't do this automatically</a>). This does include npm though, and any requests made with libraries like Axios (0.14+) or Request (2.38+), or using <a href="https://github.com/np-maintain/global-tunnel">global-tunnel</a>.</li> <li>Probably much more. <a href="https://github.com/httptoolkit/httptoolkit/issues/new/choose">Reports of issues/other working clients are very welcome!</a></li> </ul> <p>I suspect there'll be extra cases that could be caught with a few more env vars. If you have one, I'd love to hear from you. Either <a href="https://github.com/httptoolkit/httptoolkit/issues/new/choose">file some feedback</a>, or just open a PR <a href="https://github.com/httptoolkit/httptoolkit-server/blob/v0.1.3/src/interceptors/fresh-terminal.ts#L91-L110">on the interceptor directly</a>.</p> <p><img src="https://httptoolkit.com/images/posts/lynx-interception.png" alt="A Lynx browser window being intercepted"> <img src="https://httptoolkit.com/images/posts/lynx-intercepted.png" alt="The HTTP Toolkit view of intercepted Lynx traffic"></p> <p>If this looks interesting, go <a href="https://httptoolkit.com/">download HTTP Toolkit now</a> and try it out for yourself. Have feedback? File it <a href="https://github.com/httptoolkit/httptoolkit/issues/new/choose">on Github</a>, or get in touch <a href="https://twitter.com/httptoolkit">on Twitter</a>.</p>]]></description>
            <link>https://httptoolkit.com/blog/announcing-terminal-interception/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/announcing-terminal-interception/</guid>
            <pubDate>Tue, 22 Jan 2019 14:50:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Welcome to the HTTP Toolkit blog]]></title>
            <description><![CDATA[<h2 id="watchthisspacefornewfeaturesideasandprogressupdates">Watch this space for new features, ideas and progress updates.</h2> <p>The first release of HTTP Toolkit (codename HTTP View) recently went live, and it's time to start planning for the next steps, sharing those plans &amp; progress, and pushing forwards. This blog is one of the first steps toward that. I'm going to be sharing updates and ideas directly, along with the details of new features, and writing about the process of building &amp; using HTTP Toolkit in practice.</p> <h2 id="the1strelease">The 1st Release</h2> <p>The HTTP View release went fantastically, much better than I'd expected. Great level of attention &amp; usage straight out of the gate, both for downloads but also mailing list signups for actual usage too.</p> <p>Headline numbers: around 4000 visitors in the first 24h, and conversion rate to app download of nearly 20% (!!!). If you're interested, I posted a thread breaking down the numbers in full <a href="https://twitter.com/pimterry/status/1083352529138790400">on Twitter</a>.</p> <p>It's been a long road to get here, but there's a few great bits of recent progress that're going to help, and progress on top is already accelerating rapidly:</p> <ul> <li>As of late November 2018, I'm now working full-time on HTTP Toolkit! This is very scary, but it's allowed me to seriously focus on this, and made progress 1000x faster.</li> <li>With this release, the core setup for the app and app distribution is all in place, so I can work almost 100% on shipping features from here on.</li> <li>HTTP Toolkit is now live and open-source, so there's a lot of feedback coming in, and <a href="https://github.com/httptoolkit">it's open for contributions</a> from you and the community too.</li> </ul> <h2 id="whatsnext">What's Next?</h2> <p>So, given that, where do we go from here? 
My current plan is:</p> <ul> <li>Set up a blog to keep people updated (check)</li> <li>A quick new automatic interceptor for fresh terminal windows</li> <li>Releasing the Pro version, to make development sustainable</li> <li>Automatic Docker interception</li> </ul> <p>That's all for now, but you can subscribe below for future updates &amp; posts! Watch this space…</p>]]></description>
            <link>https://httptoolkit.com/blog/new-blog/</link>
            <guid isPermaLink="false">https://httptoolkit.com/blog/new-blog/</guid>
            <pubDate>Thu, 17 Jan 2019 19:08:00 GMT</pubDate>
        </item>
    </channel>
</rss>