<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Michael Hart on Medium]]></title>
        <description><![CDATA[Stories by Michael Hart on Medium]]></description>
        <link>https://medium.com/@hichaelmart?source=rss-9536d5fb3e06------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*fxet4v5FOaPyYxgo.jpeg</url>
            <title>Stories by Michael Hart on Medium</title>
            <link>https://medium.com/@hichaelmart?source=rss-9536d5fb3e06------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 22 Apr 2026 10:20:51 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@hichaelmart/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Using container images with AWS Lambda]]></title>
            <link>https://hichaelmart.medium.com/using-container-images-with-aws-lambda-7ffbd23697f1?source=rss-9536d5fb3e06------2</link>
            <guid isPermaLink="false">https://medium.com/p/7ffbd23697f1</guid>
            <dc:creator><![CDATA[Michael Hart]]></dc:creator>
            <pubDate>Tue, 01 Dec 2020 17:45:09 GMT</pubDate>
            <atom:updated>2025-04-29T23:25:57.366Z</atom:updated>
            <content:encoded><![CDATA[<p><em>The serverless landscape just got massively expanded</em></p><h3>Introduction</h3><p><a href="https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/">Container Image Support</a> has just been announced for AWS Lambda and it’s a pretty big deal — I’m very excited because it’s something I’ve wanted for years!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RDlVQeZ93spiJHVCtGRvVg.png" /><figcaption><a href="https://x.com/hichaelmart/status/735877688943124480">https://x.com/hichaelmart/status/735877688943124480</a></figcaption></figure><p>I maintain a distribution of thousands of packages called <a href="https://github.com/lambci/yumda">yumda</a> that I created specifically to deal with the problem of bundling native binaries and libraries for Lambda — I’m happy to now say that AWS has essentially made this project redundant 😄</p><h4>Not Docker</h4><p>To be clear, what’s been announced is not <em>actually</em> Lambda running Docker per se — it’s specifically using <strong>container images</strong> as the packaging mechanism. A little like zipping up your OS image, which then gets unzipped in Lambda’s environment when your function executes. 
But it’s <em>very</em> similar — the image is actually the main feature I (and many others) have wanted — because <strong>it allows you to bundle all your dependencies</strong>, including native binaries and libraries, using any (Linux) OS you want, into a portable package that can easily be tested locally and deployed remotely.</p><h4>Loads of room</h4><p>You also get a <strong>massive 10GB</strong> to do it all in, which overcomes another pain point many have had with trying to get large functions running on Lambda (eg <a href="https://medium.com/@hichaelmart/massively-parallel-hyperparameter-optimization-on-aws-lambda-a7a24b1970c8">those with ML models</a>) — and is a huge step up from the previous 250MB limit.</p><h4>Use standard tools</h4><p>In this post I’ll show you how to use Container Image Support in Lambda with regular docker commands and versatile images like <a href="https://hub.docker.com/_/alpine">Alpine</a>. We’ll see how you can test your images locally, and deploy them to Lambda.</p><p>The code for this tutorial can be found at <a href="https://github.com/mhart/pdf2png-demo">https://github.com/mhart/pdf2png-demo</a>. This post assumes you have <a href="https://docs.docker.com/get-docker/">docker installed on your machine</a>.</p><h3>Background</h3><p>Let’s say we’re a web publisher and we want to create a service that can convert PDF files to PNGs whenever they’re uploaded, so we can use them as images in our web pages. 
In this case, we’ve found a PDF-to-PNG converter tool called <a href="http://manpages.ubuntu.com/manpages/trusty/man1/pdftocairo.1.html">pdftocairo</a> which does just that, so we want to use it in our Lambda function.</p><p>Before Container Image Support, we would’ve needed to find a public <a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html">Lambda Layer</a> that already had this tool, or use <a href="https://github.com/lambci/yumda">yumda</a>, or compile it ourselves for a Lambda environment with a particular directory structure. However, now that we can package our Lambda functions using container images, we can just install it using whichever Linux OS and package manager we like.</p><h4>Bring your own OS</h4><p>In this case I’ve chosen to use <a href="https://alpinelinux.org/">Alpine Linux</a> to base the container image on. It’s a popular distribution for container images because it has a very small footprint and a strong track record with security.</p><p>I’ve also chosen to use Go as the language to develop the Lambda handler. Go has the advantage of compiling quickly and it can produce binaries that can talk to the <a href="https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html">Lambda Runtime API</a> so can be used directly as the entrypoint in any container image without needing any extra software.</p><h3>Getting our PDF conversion working</h3><p>First we’ll create a container image we can run locally with docker to check that our tool works. We’ll create a program that will take a file at /tmp/input.pdf and turn it into a PNG file per page in /tmp/output, eg /tmp/output/converted-1.png, /tmp/output/converted-2.png, etc. We’ve chosen /tmp as it’s the only directory under which we can write files in Lambda (something to keep in mind if you’re used to a Docker environment where you can write to any OS path). 
Once we’ve confirmed this works, we can add the functionality we need to turn it into a Lambda handler and transfer the input/output files to/from S3.</p><p>Here’s the first version of our Go program, which we’ll save in src/main.go, that will execute a short shell script including the pdftocairo converter tool:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/be9e78394ad1a04eea3000b4bc712992/href">https://medium.com/media/be9e78394ad1a04eea3000b4bc712992/href</a></iframe><p>Then we create src/Dockerfile that contains the instructions to build our container image. It’s a <a href="https://docs.docker.com/develop/develop-images/multistage-build/">multi-stage Dockerfile</a>, with the first stage (labelled build-image) compiling our Go source files, and the final stage becoming the image we’ll test (and eventually deploy on Lambda).</p><h4>We can use package managers</h4><p>The package manager on Alpine Linux is called <a href="https://wiki.alpinelinux.org/wiki/Alpine_Linux_package_management">apk</a> (similar to yum on Amazon Linux or apt on Ubuntu) — so we use apk add to install the tools we need, and then we copy over our program from the build image step.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c69a732b825136a401936580acef985a/href">https://medium.com/media/c69a732b825136a401936580acef985a/href</a></iframe><p>We can then build the image (giving it the name pdf2png) by running docker build from the top-level of our project:</p><p>docker build -t pdf2png src</p><p>Then we’ll place a PDF file named input.pdf in our test directory and we can mount this directory as /tmp when we run the container:</p><p>docker run -v &quot;$PWD/test&quot;:/tmp pdf2png</p><p>Now we should see a new directory, test/output, that contains the converted PNG files. Our tool works! With regular old docker commands! 
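The conversion program itself is embedded above as a gist; as a minimal sketch of the approach (not the repo's exact code — the pdftocairo flags follow its man page, and the function names and error handling here are illustrative):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// convertArgs builds the pdftocairo arguments: -png renders one PNG per
// page, writing /tmp/output/converted-1.png, converted-2.png, and so on.
func convertArgs(input, outDir string) []string {
	return []string{"-png", input, outDir + "/converted"}
}

// convert runs the pdftocairo tool we installed with apk.
func convert(input, outDir string) error {
	if err := os.MkdirAll(outDir, 0o755); err != nil {
		return err
	}
	cmd := exec.Command("pdftocairo", convertArgs(input, outDir)...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	if err := convert("/tmp/input.pdf", "/tmp/output"); err != nil {
		fmt.Fprintln(os.Stderr, "conversion failed:", err)
		os.Exit(1)
	}
}
```

Mounting test/ as /tmp via docker run, as shown above, exercises exactly this path.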
Now we just need to wrap it in a Lambda handler.</p><h3>Adding a Lambda handler</h3><p>To turn this into an image that Lambda can execute, we can just modify our Go program to execute a handler function in the same way we would <a href="https://docs.aws.amazon.com/lambda/latest/dg/golang-handler.html">for the Go Lambda runtime</a>.</p><p>We’ll keep the existing test functionality, but we’ll only run it if we pass in --test as a command-line argument. Otherwise we’ll execute our Lambda handler.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2bf53be4d924a70f3ead363d6b453b52/href">https://medium.com/media/2bf53be4d924a70f3ead363d6b453b52/href</a></iframe><p>The handler will respond to S3 object-created events — it will download the created S3 object locally, then run our pdf2png conversion function that we’ve already tested, then upload the PNG files to the same S3 bucket, under a different key prefix (see the <a href="https://github.com/mhart/pdf2png-demo">demo repo</a> for details of the download/upload functions).</p><p>The build instructions in our Dockerfile need to download the Go dependencies we need, and build our program for a Lambda environment, but the commands for the deployment image stay the same.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/fdb75aca48bf852aa078fa1c4bedec93/href">https://medium.com/media/fdb75aca48bf852aa078fa1c4bedec93/href</a></iframe><p>We build it in the same way we did before:</p><p>docker build -t pdf2png src</p><p>Thanks to Alpine, our image is actually pretty small, <strong>only 40MB</strong>! 
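The --test dispatch described above can be sketched like this; the real entrypoint would call lambda.Start from the aws-lambda-go library (shown as a comment to keep the snippet dependency-free), and the helper names are illustrative:

```go
package main

import (
	"fmt"
	"os"
)

// isTestRun reports whether the container was invoked with --test,
// in which case we run the local conversion instead of the Lambda loop.
func isTestRun(args []string) bool {
	for _, a := range args {
		if a == "--test" {
			return true
		}
	}
	return false
}

func main() {
	if isTestRun(os.Args[1:]) {
		fmt.Println("running local conversion test")
		// runLocalTest() — the /tmp/input.pdf conversion we verified earlier
		return
	}
	// lambda.Start(handler) — hands control to the Go Lambda runtime
	// (github.com/aws/aws-lambda-go/lambda) when running on Lambda
	fmt.Println("waiting for Lambda events")
}
```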
This has the advantage of being quick to build, and quick to upload and download when deploying.</p><p>And we can still test the conversion on a local file as we did before, now just passing in the --test flag:</p><p>docker run -v &quot;$PWD/test&quot;:/tmp pdf2png --test</p><h3>Testing our Lambda handler end-to-end</h3><p>So far, we haven’t been able to check the rest of our function’s implementation though — the Lambda handler and the S3 upload/download.</p><p>Luckily, we can also test the full Lambda handler functionality with high fidelity, including our interaction with S3, using the <a href="https://github.com/aws/aws-lambda-runtime-interface-emulator">Runtime Interface Emulator</a>. This will run our program in an emulated Lambda environment and expose the same endpoint as the <a href="https://docs.aws.amazon.com/lambda/latest/dg/API_Invoke.html">Lambda Invoke API</a> so we can call it locally using an http client like curl or the <a href="https://docs.aws.amazon.com/cli/latest/reference/lambda/invoke.html">AWS CLI</a>.</p><p>There are a few ways we can get the aws-lambda-rie emulator binary onto our image for testing:</p><ol><li>Include it in the same image we deploy to Lambda</li><li>Create a new image for testing</li><li>Copy the emulator locally and mount it during docker run</li></ol><p>In this case we’ll go with option 2 and create a new image that extends the image we already created, but adds the emulator and sets it as the entrypoint. 
So test/Dockerfile looks like this:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/de93fa172ec799624906a747fedc6527/href">https://medium.com/media/de93fa172ec799624906a747fedc6527/href</a></iframe><p>And we can build it with:</p><p>docker build -t pdf2png-test test</p><p>When running live on Lambda, the function will get AWS credentials from the environment, in the form of AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN variables, but we’re going to use a slightly more convenient method here. We’ll mount our ~/.aws directory into the container which will allow the SDK to use the same credentials as we would use with the AWS CLI. We can then use the AWS_REGION and AWS_PROFILE environment variables to choose our region and profile. As the emulator runs on port 8080 in the container, we’ll also need to expose that to our local machine, so we’ll map it to port 9000 locally.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d94cfc8aa63c37efd857a6c61e3977e8/href">https://medium.com/media/d94cfc8aa63c37efd857a6c61e3977e8/href</a></iframe><p>We’ve now started the emulator running locally, and it’s listening for events. Let’s open up another terminal to send events to it.</p><p>We’ll need to create an event payload that matches the signature of our Lambda handler. There’s an example in the project repository that you can use. You can also generate one using the SAM CLI with the <a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-cli-command-reference-sam-local-generate-event.html">sam local generate-event</a> command.</p><p>As this test will actually download the S3 object we specify, and upload the converted files, we need to pass in a real S3 bucket and key with the event. 
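For reference, the handful of fields our handler reads from that event can be modeled with a couple of nested structs — a trimmed-down sketch that follows the real S3 notification shape (every other field omitted):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// s3Event models only the fields our handler needs from an S3
// object-created notification: the bucket name and the object key.
type s3Event struct {
	Records []struct {
		S3 struct {
			Bucket struct {
				Name string `json:"name"`
			} `json:"bucket"`
			Object struct {
				Key string `json:"key"`
			} `json:"object"`
		} `json:"s3"`
	} `json:"Records"`
}

// bucketAndKey extracts the upload location from a raw event payload.
func bucketAndKey(payload []byte) (string, string, error) {
	var evt s3Event
	if err := json.Unmarshal(payload, &evt); err != nil {
		return "", "", err
	}
	if len(evt.Records) == 0 {
		return "", "", fmt.Errorf("no records in event")
	}
	rec := evt.Records[0].S3
	return rec.Bucket.Name, rec.Object.Key, nil
}

func main() {
	payload := []byte(`{"Records":[{"s3":{"bucket":{"name":"my-bucket"},"object":{"key":"upload/123/test-file.pdf"}}}]}`)
	bucket, key, err := bucketAndKey(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(bucket, key) // my-bucket upload/123/test-file.pdf
}
```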
So choose an existing S3 bucket you have, or create a new one (aws s3 mb) and upload a PDF file to test:</p><p>aws s3 cp test/input.pdf s3://my-bucket/upload/123/test-file.pdf</p><p>Then edit test/event.json to change the bucket to my-bucket and the key to upload/123/test-file.pdf.</p><p>Then you can invoke your function by calling the local endpoint with this event payload:</p><p>curl -d @test/event.json http://localhost:9000/2015-03-31/functions/function/invocations</p><p>You can then see it execute in the terminal where you ran the docker command. The converted file should be uploaded to that same S3 bucket and you can use ctrl-c to exit the emulator.</p><p>Great! We’ve verified that our container image works with the Lambda runtime emulator and now we’re ready to deploy it.</p><h3>Deploying</h3><p>Container support requires your Lambda function code to point to an image URI from an <a href="https://docs.aws.amazon.com/AmazonECR/latest/userguide/Repositories.html">ECR Repository</a>. The demo repo includes an infrastructure stack that will set this up for you, but here’s a guide if you want to do it manually:</p><p>Create an ECR repository, eg called pdf2png-app. The full name of this repository will be something like &lt;accountid&gt;.dkr.ecr.us-east-1.amazonaws.com/pdf2png-app</p><p>Then you can tag your image with this repository name:</p><p>docker build -t &lt;accountid&gt;.dkr.ecr.us-east-1.amazonaws.com/pdf2png-app:v1 src</p><p>You’ll need to ensure docker is authenticated with ECR before you can push this image:</p><p>aws ecr get-login-password | docker login --username AWS --password-stdin &lt;accountid&gt;.dkr.ecr.us-east-1.amazonaws.com</p><p>And now you can push your tagged image to your repository:</p><p>docker push &lt;accountid&gt;.dkr.ecr.us-east-1.amazonaws.com/pdf2png-app:v1</p><p>Now that we have an image URI, we can use it as the source of our Lambda function. 
Here’s the full CloudFormation template for our app that creates a container image Lambda triggered by an S3 bucket:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/866c9f6f3627b604bbf8187d750cfbdb/href">https://medium.com/media/866c9f6f3627b604bbf8187d750cfbdb/href</a></iframe><p>We can deploy this by passing in the image URI we just created:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/30a424b70286ea7215969a95cc40d369/href">https://medium.com/media/30a424b70286ea7215969a95cc40d369/href</a></iframe><p>Once this is deployed, you have an Image Function!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*M4SgFhUZPPjBsuqIer3Dag.png" /></figure><h3><strong>Using our live function</strong></h3><p>Congratulations! Now we can check that it works.</p><p>The S3 bucket created should be called: pdf2png-app-&lt;accountid&gt;. You can find the exact name in the Outputs tab of your pdf2png-app CloudFormation stack.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/977/1*GM8oYliUBGg1_FFMK5wRGQ.png" /></figure><p>Then you can test the functionality by copying a PDF file to the S3 bucket in the appropriate prefix:</p><p>aws s3 cp test/input.pdf s3://pdf2png-app-&lt;accountid&gt;/upload/2020/12/01/123.pdf</p><p>And (a second or so later) observe the file has been converted into one or more PNG files!</p><p>aws s3 ls s3://pdf2png-app-&lt;accountid&gt;/converted/2020/12/01/123/</p><p>If there are downstream services that need to be notified when the conversion has completed, we could have our Lambda function publish notifications using SNS or EventBridge too.</p><h3>Conclusion</h3><p>In this post I’ve shown you how container image support in Lambda makes it easy to create complex applications that rely on binary tools.</p><p>I’ve shown how you can use existing docker tooling to create and test your container images locally — including a full 
integration test talking to S3 — and deploy Lambda functions packaged using these images.</p><p>I’m looking forward to using container image support for other use cases too. With 10GB of image size, we can now deploy tools on Lambda that were previously difficult, if not impossible, such as <a href="https://medium.com/@hichaelmart/massively-parallel-hyperparameter-optimization-on-aws-lambda-a7a24b1970c8">machine learning</a> and <a href="https://github.com/lambci/serverless-actions">serverless continuous integration</a>. It really opens up a new world of possibilities.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Shave 99.93% off your Lambda bill with this one weird trick]]></title>
            <link>https://hichaelmart.medium.com/shave-99-93-off-your-lambda-bill-with-this-one-weird-trick-33c0acebb2ea?source=rss-9536d5fb3e06------2</link>
            <guid isPermaLink="false">https://medium.com/p/33c0acebb2ea</guid>
            <category><![CDATA[cold-start]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[lambda]]></category>
            <dc:creator><![CDATA[Michael Hart]]></dc:creator>
            <pubDate>Mon, 09 Dec 2019 16:23:15 GMT</pubDate>
            <atom:updated>2025-04-30T00:05:29.493Z</atom:updated>
<content:encoded><![CDATA[<p><em>AWS Solutions Architects hate him.</em></p><h3>Unprovisioned</h3><p>AWS launched <a href="https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/">Provisioned Concurrency for Lambda</a> at re:Invent 2019 last week — essentially a way to keep warm Lambdas provisioned for you so you don’t experience any cold start latency in your function invocations. It may also save you money if you happen to have the ideal workload for it, as it’s priced at $0.05/hr (for 1 GB of memory) instead of the usual $0.06/hr.</p><p>This theoretical 16.67% saving is not what this article’s about though — it was only as I was exploring this new feature that I was reminded of an interesting quirk of Lambda I discovered a couple of years ago.</p><p>Before I dive in, I’ll preface this with: this is an FYI, to explore some aspects of Lambda that you may be unaware of. It is not something I’ll be releasing code for. You’ll see why.</p><h3>Fast Boot</h3><p>The thing I noticed with Provisioned Concurrency Lambdas was related to global work done outside of the function handler, before it’s invoked — let’s call it the init stage. For Provisioned Lambdas, this is executed in the background whenever you configure your provisioned concurrency settings, and then every hour or so after that. Work done during this stage seemed to be executing at <em>the same performance</em> as the work done during the handler function on invocation — plus, you’re also charged for this time. That would seem unsurprising if it weren’t for the fact I was reminded of: that normal Lambda “containers”, unlike Provisioned Lambdas, actually get a <em>performance boost</em> when they’re in the init stage. This is presumably to aid cold starts, especially in runtimes like Java and .NET that typically have slow process start times and large class assemblies to load.</p><p>What do I mean by a perf boost? Well, we can measure it. 
Let’s code up an in-no-way-contrived Node.js 12.x function where we see how many PBKDF2-100k-iteration password hashes we can calculate per second. We’ll do it once outside the handler (initHashes below will get run only when the Lambda container cold starts), and once inside the handler (handlerHashes below).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*1ZA3LBihNymOTyfR3uME_Q.png" /></figure><p>For those not aware, Lambda memory and CPU for your handler are linked (with some nuances for multi-core), so let’s play with the memory setting. If we do it at the highest setting, 3008 MB, we see there’s little difference in performance, getting just over 12 hashes/sec both during the init stage and the handler stage:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/992/1*QZeaW90AlqmEBLjvrNBdgw.png" /><figcaption>No difference at 3008 MB</figcaption></figure><p>These numbers are about the same until we get down to around (exactly?) 1792 MB. The multi-core nuance I mentioned above is that above this point, instead of CPU increasing as memory does, you instead get an extra core — but as this code’s single-threaded, we didn’t see any difference.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/996/1*D1DX4iUfDscQMqMLT5YM6w.png" /><figcaption>No difference at 1792 MB</figcaption></figure><p>Below this memory setting is where it gets interesting. 
We find the init performance stays the same, even when we get all the way down to 128MB, but the handler performance degrades in direct proportion to the memory.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/974/1*uoNb5asP-To_du2hYHy7jA.png" /><figcaption>Half the performance at 896 MB</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/978/1*A0ZHZGGwmnetDI8bowxXRw.png" /><figcaption>Almost exactly 1/14th at 128 MB</figcaption></figure><h4>128 MB init = 1792 MB performance</h4><p>So essentially, we’ve established that the init stage has the same performance as a 1792 MB Lambda, even if we’re only running a 128 MB one.</p><p>Maybe you can see where I’m going with this… If we can do <strong>all</strong> of our work outside of the function handler, we get $0.105/hr (1792 MB) performance for only $0.0075/hr (128 MB)— a 14x cost saving 🎉</p><h3>But how</h3><p>Hang on a minute, I hear you cry. Firstly, how are we supposed to do all of our work outside the handler, if any subsequent time we invoke that Lambda, it’s already warm and that code won’t even run? Secondly, how are we supposed to pass anything to the init stage if only the handler receives events? And finally, 1/14th is “only” a 92.86% cost saving, not the 99.93% you promised 💸</p><h4>Always cold</h4><p>Let’s tackle that first point. There are some basic ways to ensure we always hit a cold Lambda, such as modifying any aspect of the function that would cause existing warm containers to be out-of-date. We were doing exactly that when we were fiddling with the memory settings above — each time we change that number and invoke, it’ll be fresh containers that get hit. Modifying environment variables, deploying new code, and other function config settings would achieve the same thing. The APIs to do these are probably rate-limited at a fairly strict rate though, so YMMV.</p><p>Another way to achieve this is to just exit the process in the handler. 
The Lambda supervisor will need to restart the process when the next invoke comes in and the init code will run again. The downside to this is that the function will always return an error.</p><h4>Getting data in and out</h4><p>To the second point, you basically can’t pass any events in outside of the handler. If you’re just doing some sort of fixed job that doesn’t require events, then this isn’t a problem. You could try to do it via environment variables I guess, but you’d need to modify the function’s config with each invocation.</p><p>One thing you can do, though, is make HTTP calls, API calls, etc. You can read from an SQS queue, a DynamoDB table, S3, <a href="https://twitter.com/quinnypig/status/1120653859561459712">Route53</a>, or maybe even use something crazy like <a href="https://read.acloud.guru/https-medium-com-timawagner-serverless-networking-the-next-step-in-serverless-evolution-95bc8adaa904">Serverless Networking</a>. (Also, if you’re using Node.js, you’ll need to spawnSync/execSync another node process to do any async work.)</p><p>If you needed your Lambda to respond synchronously, you’d have to have another normal 128 MB one (or another <em>something</em>) in front of it. This function could post the event to an SQS queue, invoke the cold-starting Lambda, and then wait to get a response from a second SQS queue. The cold-starting function reads from the first queue and responds to the second. Pretty messy, wouldn’t recommend, but you know, we’re talking mad science here.</p><h3>100x developers</h3><p>Alright, here’s where it gets even more far-fetched. If you actually ran the code from earlier, you may have noticed another interesting thing: the billed duration didn’t match the entire duration of work done. In fact, the init duration isn’t included in the billed duration <em>at all</em>. <strong>The init stage is free.</strong></p><p>At least, up to a point. 
Technically you can do up to 10 seconds of work before it starts getting included in the billed duration. Also, you’ll always have to pay <em>something </em>for the handler execution — the minimum billed execution time is 100ms.</p><p>To illustrate, let’s first modify our code from above and do as many hashes as we can in our handler within 10 seconds on a 1792 MB Lambda. We subtract a little buffer to make sure we definitely stay under 10 secs, though it’ll be rounded up when billed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*shKutqqTMIl_pgZwbkXTjg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SSXs2-drl-6yoaBIxbyG_A.png" /></figure><p>So we calculated 122 hashes in 10 secs. Let’s say we wanted to calculate <em>a billion</em> hashes this way. <strong>Using 1792 MB Lambdas, this would cost us $2,390.71.</strong></p><p>Now let’s try it outside the handler on a 128 MB Lambda. We use the container start time (as measured by /proc/1) to accurately calculate our deadline as some time would have already been used up with the process starting, requiring Node.js modules, etc. We also exit the process so our init code will always run, as we mentioned earlier.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*sYqaMZmEsxFUMCPvQGdgcQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*g2VnfGekH38w1ut9OaUMNA.png" /></figure><p>Here we calculated 121 hashes — one less, we needed to be a little more cautious so as not to hit the 10 second limit. 
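Those dollar figures can be reproduced from Lambda's 2019 duration price of $0.0000166667 per GB-second (the post's $0.06/hr for 1 GB), ignoring the $0.20-per-million request charge, which applies equally to both cases:

```go
package main

import "fmt"

// lambdaCost returns the duration cost in dollars of running
// `invocations` Lambda calls, each billed for `billedSecs` at `memMB`
// of memory, at the 2019 price of $0.0000166667 per GB-second.
func lambdaCost(invocations, billedSecs, memMB float64) float64 {
	const pricePerGBSecond = 0.0000166667
	return invocations * billedSecs * (memMB / 1024) * pricePerGBSecond
}

func main() {
	// In the handler: 122 hashes per invocation, each billed 10s at 1792 MB.
	inHandler := lambdaCost(1e9/122, 10, 1792)
	// In the init stage: 121 hashes per invocation, billed only 100ms at 128 MB.
	inInit := lambdaCost(1e9/121, 0.1, 128)
	fmt.Printf("handler: $%.2f  init: $%.2f\n", inHandler, inInit)
}
```

This lands on ≈$2,390.7 and ≈$1.72 for a billion hashes, matching the post's figures to within a cent.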
Still, it was in a 128 MB Lambda <strong>and we were only billed for 100ms</strong>, 100x less than the 10 seconds we ran for.</p><p><strong>Calculating 1 billion hashes this way would cost us $1.72 — that’s 1,390x cheaper: a saving of 99.93%</strong></p><h3>Responsible disclosure</h3><p>I first noticed this in Jan 2018 and I was a little worried as it wasn’t documented anywhere and I thought it may be a resource abuse vulnerability. I contacted AWS security (<a href="mailto:aws-security@amazon.com">aws-security@amazon.com</a>), was told the relevant teams would be contacted to investigate, and heard no more.</p><p>Since then it’s been mentioned a number of times by different AWS people as a feature, not a bug. A little thank-you-for-using-Lambda if you will.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*R2htnVf7nZ_FzPrXdApurw.png" /><figcaption><a href="https://x.com/edjgeek/status/1192126550235369472">https://x.com/edjgeek/status/1192126550235369472</a></figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AeTjXPXpwmQVBxA7y18z4A.png" /><figcaption><a href="https://x.com/kecabojan/status/1192252416248254464">https://x.com/kecabojan/status/1192252416248254464</a></figcaption></figure><h3>Takeaways</h3><p>Obviously you shouldn’t code your app like this. It’s a proof of concept that involves lots of hoop-jumping and who knows, you may very well get a slap on the wrist from AWS if you start abusing it.</p><p>However, it is a good illustration of just how much you should leverage the init stage. Do as much work as you can outside of your handler: it’s fast and cheap. 
Even with Provisioned Lambdas where you don’t get any perf boost or cost saving, at least it’s work that doesn’t need to happen in your handler, which will leave them nice and responsive.</p><p>Happy hacking everyone!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Staying warm with docker-lambda]]></title>
            <link>https://hichaelmart.medium.com/staying-warm-with-docker-lambda-aff1b365677b?source=rss-9536d5fb3e06------2</link>
            <guid isPermaLink="false">https://medium.com/p/aff1b365677b</guid>
            <category><![CDATA[docker]]></category>
            <category><![CDATA[lambda]]></category>
            <category><![CDATA[aws]]></category>
            <dc:creator><![CDATA[Michael Hart]]></dc:creator>
            <pubDate>Sun, 24 Nov 2019 21:55:53 GMT</pubDate>
            <atom:updated>2025-04-29T23:34:17.109Z</atom:updated>
            <content:encoded><![CDATA[<p>Sometime in the last few days, docker pulls of <a href="https://hub.docker.com/r/lambci/lambda/">lambci/lambda</a> hit 35 million.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Pz9A-arbvHolvd51IrKc_Q.png" /><figcaption>Hockey anyone? 🏒</figcaption></figure><p>Which is more than twice what it was six months ago:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fnG6d2ZELpkWujaYX8YRpg.png" /><figcaption><a href="https://x.com/hichaelmart/status/1128697320374251520">https://x.com/hichaelmart/status/1128697320374251520</a></figcaption></figure><p>I don’t know where serverless is on the hype cycle these days, but if local testing and building of Lambda services is anything to go by, doubling every six months ain’t bad 📈</p><h3>Strict growth</h3><p>I have no doubt <a href="https://github.com/lambci/docker-lambda">docker-lambda</a>’s growth is fueled by Amazon’s decision to use it as the base of their <a href="https://github.com/awslabs/aws-sam-cli">AWS SAM CLI</a> tool for local testing — as well as tools like <a href="https://github.com/serverless/serverless">Serverless Framework</a>, <a href="https://github.com/localstack/localstack">localstack</a> and many others.</p><p>Part of its appeal I think lies in the fact that it reproduces the Lambda environment incredibly strictly. This means that developers spend less time needing to deploy their software just to see how it will run—even if they have unusual requirements that go beyond simple hello-worlds.</p><p>But there is some friction with this approach. In my stringency, I’ve tried to add as little extra software as possible to the images, so developers don’t rely on things that just don’t exist in production. However, that means the images have been stuck in a “run-once” model: you run the docker container, it executes the runtime, including your Lambda handler, and exits. 
Every execution is a cold start, as there’s been no coordinating service to allow for multiple invocations.</p><h3>…Until now</h3><p>Well screw that stringency, I’m excited to announce that docker-lambda <a href="https://github.com/lambci/docker-lambda#running-in-stay-open-api-mode">now supports warm invocations for all runtimes</a> 🎉</p><p>You can run each runtime in a “stay-open” mode, and it will start an API server that listens for events in exactly the same way the <a href="https://docs.aws.amazon.com/lambda/latest/dg/API_Invoke.html">production Lambda API</a> does. You can use the AWS CLI to invoke them, or you can just use curl or any other http client (there’s no authentication). The first invocation will still be a “cold start”, but all subsequent invocations will be warm.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-nId0KrXmwrDpShzujdTkg.png" /></figure><p>It still keeps the concurrency model that Lambda has — each container won’t handle more than one request at once — but testing on my MacBook Pro yields around 130 req/s for a fast handler. This should be fast enough for local testing, and certainly an order of magnitude faster than executing the docker container plus a cold start each time.</p><p>Internally it uses the same mechanism each runtime does in production, even the legacy ones. Each has a loop and a “receive-invoke” function that waits for incoming events, and I hook into this. The only difference is I’ve added a mock server (written in Go) responsible for dispatching the events, instead of the native socket methods used in production. I needed to add this as an extra piece of software on the older runtimes (at /var/runtime/mockserver), but aside from that, I kept the intrusion pretty minimal.</p><p>Along the way, I refactored a lot (especially the dotnetcore runtimes), and the behavior of the runtimes should be more standardized across each language. 
It’s become a veritable Pantone wheel of a repo.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hzCW4OUyOuuxK7KvzrJhAQ.png" /></figure><h3>Features, features</h3><p>This feature follows a line of smaller ones I’ve been rolling out in my spare time over the past few weeks.</p><ul><li>Support for the new runtimes from day one: nodejs12.x, python3.8 and java11</li><li>The images now support <a href="https://docs.docker.com/engine/security/trust/content_trust/">Docker Content Trust</a>, so you can have some assurance about the provenance of the software you’re running</li><li>I launched a related project, <a href="https://github.com/lambci/yumda">yumda</a> — which makes it a lot easier to install system dependencies for your Lambdas, and is especially relevant for the newer runtimes running on Amazon Linux 2 (nodejs10.x, nodejs12.x, python3.8 and java11). I chatted with <a href="https://twitter.com/jeremy_daly">Jeremy Daly</a> about this and other Lambda deep-dives on his Serverless Chats podcast <a href="https://www.serverlesschats.com/18">here</a> and <a href="https://www.serverlesschats.com/19">here</a>.</li></ul><h3>The future</h3><p>The obvious use case for keeping runtimes warm is local API Gateway testing. I’m hoping support in AWS SAM CLI and Serverless Framework will be added for this soon (or maybe it’s the excuse I need to finally write a docker-lambda CLI).</p><p>Please test away and file issues if you run into any behavior that differs from production.</p><p>If anyone has suggestions around how the <a href="https://github.com/lambci/docker-lambda#build-examples">build images</a> or <a href="https://github.com/lambci/yumda">yumda</a> can reduce their development friction even further, I’m all ears!</p><p>Here’s to 70 million, see you in six months 🍸😸</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=aff1b365677b" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AWS Lambda nodejs10.x = FIXED ]]></title>
            <link>https://hichaelmart.medium.com/aws-lambda-nodejs10-x-fixed-e99ba0575bd2?source=rss-9536d5fb3e06------2</link>
            <guid isPermaLink="false">https://medium.com/p/e99ba0575bd2</guid>
            <category><![CDATA[aws-lambda]]></category>
            <dc:creator><![CDATA[Michael Hart]]></dc:creator>
            <pubDate>Tue, 25 Jun 2019 18:13:19 GMT</pubDate>
            <atom:updated>2025-04-29T23:33:04.849Z</atom:updated>
            <content:encoded><![CDATA[<p>A month ago I <a href="https://medium.com/@hichaelmart/reviewing-the-aws-lambda-nodejs10-x-runtime-84de1e500aac">dug into the nodejs10.x runtime</a> and highlighted some issues with it — including some bugs and style problems. I’m glad to now report that all the issues I raised have been addressed in the latest runtime code, which should be running on all nodejs10.x Lambdas as of the time of writing.</p><p>There are also a couple of changes that will break functions using relative handler paths or relying on logs preserving newlines — I cover these further down.</p><p>To summarize briefly how the issues were addressed:</p><ul><li>The env vars in bootstrap are now named correctly, AWS_LAMBDA_FUNCTION_MEMORY_LIMIT ⇒ AWS_LAMBDA_FUNCTION_MEMORY_SIZE, --max_semi_space_size ⇒ --max-semi-space-size, --max_old_space_size ⇒ --max-old-space-size</li><li>String quotation marks are now standardized (they chose double-quotes, boo… #team-single-quotes). There are a number of other formatting changes that suggest they’re now using a linter like eslint and a formatter like prettier, which I highly recommend to any JS team — the fewer style arguments (and diversions) the better.</li><li>An unhandledRejection handler has been added — and top-level errors are now logged as they were in previous runtimes, which caught a few people out.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lAbzQn_1KjpkIakjkJKNQQ.png" /><figcaption><a href="https://x.com/brianleroux/status/1131974411609812993">https://x.com/brianleroux/status/1131974411609812993</a></figcaption></figure><ul><li>The beforeExit bug in index.js regarding work done outside the handler has been fixed — essentially runtime.scheduleIteration() is now explicitly called on startup as well as on beforeExit, so it doesn’t matter if the event loop never becomes empty and beforeExit never fires.</li><li>Handlers that use a callback will no longer erroneously use the return
value of the handler function instead of the value passed in the callback. The explicit call to callbackContext.succeed(result) in Runtime.js after function invocation was removed, leaving the callback to handle success/failure itself.</li><li>All asynchronous functions now use callbacks correctly 🎉 There are no more cases of functions being called serially without waiting for the previous result, and all callback functions now have callback parameters. This is GREAT to see, and should avoid any hard-to-track-down bugs in the HTTP handling code.</li></ul><h4>Breaking changes</h4><p>As I mentioned before, there are SOME changes that may break (or already have broken?) existing functions if you were relying on either of the following:</p><ul><li>Relative handlers (eg, ../../opt/index.handler) — these no longer work and will throw an exception. Alternatively you can use an absolute handler, eg /opt/index.handler.</li><li>Newlines in log output. This is an unfortunate change IMO — newlines (\n) in console.log() statements are now replaced with carriage returns (\r). This means any object logging, or formatted JSON strings you were outputting will now be transformed, potentially irreversibly. If you’re parsing these logs using CloudWatch, you’ll need to update your parsing code, OR add the following to your lambda to revert to the log formatting from previous runtimes:</li></ul><p>const { logger, appenders, layouts } = require(&#39;lambda-logging&#39;)<br>logger.appender = new appenders.ConsoleAppender(<br> new layouts.Node4LegacyLayout()<br>)</p><p>There are a lot of other changes to the lambda-logging module, which AFAICT isn’t documented anywhere. I personally think log formatting should be left up to individual functions, and the way the runtime messes with console.log and the overhead of including this module in everyone’s Lambda isn’t worth it.
Hopefully AWS will document this module soon in any case.</p><h4>Conclusion</h4><p>I’m glad to see that AWS has updated this runtime, and the code is in much better shape for it.</p><p>I’m now lifting my previous caution and suggest everyone give it a shot — watch out for the logging changes, but aside from that, launch away! 🚀</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e99ba0575bd2" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Reviewing the AWS Lambda nodejs10.x runtime]]></title>
            <link>https://hichaelmart.medium.com/reviewing-the-aws-lambda-nodejs10-x-runtime-84de1e500aac?source=rss-9536d5fb3e06------2</link>
            <guid isPermaLink="false">https://medium.com/p/84de1e500aac</guid>
            <category><![CDATA[nodejs]]></category>
            <category><![CDATA[lambda]]></category>
            <category><![CDATA[javascript]]></category>
            <dc:creator><![CDATA[Michael Hart]]></dc:creator>
            <pubDate>Mon, 20 May 2019 19:31:12 GMT</pubDate>
            <atom:updated>2019-06-25T18:16:07.948Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>Update 2019–06–25: All of the issues I raise here have been addressed in the latest runtime! I’ll leave this story here for posterity, but it no longer reflects the current state of affairs. See </strong><a href="https://medium.com/@hichaelmart/aws-lambda-nodejs10-x-fixed-e99ba0575bd2"><strong>my latest blog post</strong></a><strong> on how AWS addressed the issues I raise here.</strong></p><p>Last week <a href="https://aws.amazon.com/about-aws/whats-new/2019/05/aws_lambda_adds_support_for_node_js_v10/">AWS announced official support for Node.js v10 on Lambda</a>, which is great! Or at least, it will be once it stabilizes a bit… Here’s what I found after digging into the code.</p><h3>Background</h3><p>I run the <a href="https://github.com/lambci/docker-lambda">docker-lambda</a> project which allows you to execute a docker container that’s a replica of the Lambda environment (also used in the <a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-cli-command-reference-sam-local-invoke.html">AWS SAM CLI</a> and <a href="https://serverless.com/framework/docs/providers/aws/cli-reference/invoke-local/">Serverless framework</a> among others) — so I have a detailed understanding of the Lambda runtimes and the code they run on. I also maintain a <a href="https://github.com/lambci/node-custom-lambda">custom runtime implementation of Node.js v10 (and v12)</a>, which we’ve used for hundreds of millions of invocations at <a href="https://bustle.company/">BDG</a>. 
All this to say: when I looked under the hood at the code of the official v10 runtime, I had some idea of what it should look like — and what I found gives me the impression you might want to hold off using it for now.</p><h3>The code</h3><p>The code which calls your Lambda function can be found in /var/runtime on a nodejs10.x Lambda, which is laid out like this:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/7a06786d2575837b804ebed72063f60f/href">https://medium.com/media/7a06786d2575837b804ebed72063f60f/href</a></iframe><p>bootstrap is a shell script that launches node, spawned by /var/rapid/init in the same way that <a href="https://docs.aws.amazon.com/lambda/latest/dg/runtimes-custom.html">custom runtimes are</a>. It sets up the NODE_PATH env var and is mostly unremarkable except that it currently contains a little bug:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/918c360b36a30668848426abe27d269f/href">https://medium.com/media/918c360b36a30668848426abe27d269f/href</a></iframe><p>The bug being that the environment variable that it’s launched with on Lambda is actually AWS_LAMBDA_FUNCTION_MEMORY_SIZE, not AWS_LAMBDA_FUNCTION_MEMORY_LIMIT — so that bit of logic to set up the node args of --max_semi_space_size and --max_old_space_size does nothing, and the args aren&#39;t passed at all (unlike the other 8.10, 6.10, etc runtimes that do pass them). The AWS team are aware of this and I&#39;ve been told a fix should be pushed out soon.</p><p>Onto index.js!</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/1423d56d37dd33388c4c2b1f0c3965e0/href">https://medium.com/media/1423d56d37dd33388c4c2b1f0c3965e0/href</a></iframe><p>The log patching done by enableLogPatch() is similar to the log patching done in the previous 8.10, etc runtimes — but not quite the same.
As with the other runtimes, it patches the native console.(log|error|warn|info) functions to prepend tab-separated metadata to each call (not a huge fan of patching core functions like this, but Node.js Lambdas have done this since 0.10). The nodejs10.x runtime inserts another field though (for log level, INFO/ERROR/WARNING) so any existing parsers will break (and indeed the AWS web console doesn&#39;t expand JSON objects in the same way anymore).</p><p>The mixed quote styles (some single, some double) put me off a bit — easy to set up a linting rule for that. This continues throughout the codebase with no pattern that I can see — both &#39;utf-8&#39; and &quot;utf-8&quot; appear in RAPIDClient.js. Unlikely to cause bugs, just rare to see in well-written JavaScript code these days and a sign that perhaps there&#39;s no linting tool in use here (or that it&#39;s very light on rules — some other inconsistencies in the codebase like !== vs != for non-nulls, typeof(x) vs typeof x, {x} vs { x }, etc).</p><p>Given there’s an uncaughtException handler, an <a href="https://nodejs.org/api/process.html#process_event_unhandledrejection">unhandledRejection handler</a> could also be added — it would clean up some of the Promise error handling code as we&#39;ll see later on.</p><p>The main issue I have with this entrypoint though is the final process.on(&#39;beforeExit&#39;, ... line — that is, <a href="https://nodejs.org/dist/latest-v10.x/docs/api/process.html#process_event_beforeexit">it waits for the event loop to be empty</a> before processing the user&#39;s handler function. This is intended to deal with some complex callback logic we&#39;ll see later, but it means the handler function won&#39;t be called until any top-level async work has completed.
In certain circumstances it means that the handler function won&#39;t be called at all!</p><p>Let’s say we had this handler file:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/fc89424c355112bb1634f9e19989ff62/href">https://medium.com/media/fc89424c355112bb1634f9e19989ff62/href</a></iframe><p>The setInterval call will start running as soon as the handler file is required by the UserFunction.load() call above — but it also means that beforeExit will never fire, because the event loop will <em>never</em> be empty. If you try this out on Lambda, your handler will never be called and will just time out eventually. <a href="https://forums.aws.amazon.com/thread.jspa?threadID=303295&amp;tstart=0">AWS has acknowledged this issue</a> and a fix should be on its way.</p><p>This is basically the beginning of the awkward async state-machine dance that happens in the runtime, continuing with the runtime.scheduleIteration() call:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2c97e52031ac61efaaa00160cd8e2030/href">https://medium.com/media/2c97e52031ac61efaaa00160cd8e2030/href</a></iframe><p>The setTimeout(..., 0) is there (presumably) to schedule this function at the start of the next event loop (I must admit it&#39;s unclear to me why this is necessary). Arrow functions inherit this from their enclosing scope, so there&#39;s no need for the that = this dance. You also <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/then#Parameters">don&#39;t need to pass functions to </a><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/then#Parameters">then if you don&#39;t need them</a>, so you could just use null as the first argument instead of allocating an empty function.
Or you could just get rid of the whole thing and let unusual errors like this be handled by unhandledRejection.</p><p>We’ve already waited for beforeExit, now we&#39;ve waited for a setTimeout(), finally we get to call the handler:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/116e61ccd06b18fd1c0649f45275e29f/href">https://medium.com/media/116e61ccd06b18fd1c0649f45275e29f/href</a></iframe><p>It sets up the context and callback, calls the user’s handler function and then deals with the result. At this point I should point out the <a href="https://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-context.html">callbackWaitsForEmptyEventLoop property that exists on the Lambda context</a>. IMO this is a very unfortunate legacy — for some reason, when callbacks were initially introduced in Node.js Lambdas (the original API was just context.(done|succeed|fail)), AWS decided that by default your Lambda function wouldn&#39;t return when you call the callback — it would only return when the event loop is empty. Why? I&#39;m not sure — maybe to deal with poorly written callback code? It&#39;s certainly very un-Node.js-like in any case. To turn off this unusual behavior you explicitly need to set context.callbackWaitsForEmptyEventLoop = false.</p><p>So with the code above, if the result isn’t a Promise and callbackWaitsForEmptyEventLoop is false, then callbackContext.succeed() is called immediately with the result of the handler() and anything passed to the callback asynchronously (which is the usual way callbacks are called) will be ignored completely.</p><p>Putting it in plainer terms, if you had a Lambda function like this:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/e1ec12bca8fd20ac4f12c51efb87650f/href">https://medium.com/media/e1ec12bca8fd20ac4f12c51efb87650f/href</a></iframe><p>It would be completely broken.
As intended, it calls the callback with the result it wants to return in the Lambda (the handler function itself returns nothing, ie, it implicitly returns undefined). The problem is, handleOnce() will use the handler function&#39;s return value as the result and call callbackContext.succeed(undefined) before the callback is even handled — this is eventually stringified into &quot;null&quot; and that will be the result of the Lambda invocation, not { success: true } as we wanted. This has caused some <a href="https://github.com/awslabs/aws-serverless-express/issues/234">other bugs yet to be acknowledged by AWS</a>.</p><p>The way the callback function is created in CallbackContext.build in CallbackContext.js also bears scrutiny:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2d8dfb96e109086f50e67a18af9f3a67/href">https://medium.com/media/2d8dfb96e109086f50e67a18af9f3a67/href</a></iframe><p>Looking at that last comment we can immediately see why the buggy beforeExit handler was set up in index.js — it&#39;s to cover cases where waitForEmptyEventLoop is true (which maps to context.callbackWaitsForEmptyEventLoop).</p><p>But the more worrying thing that jumps out to me about this code is that it appears to be just executing asynchronous commands (postError(), scheduleNext()) one after the other, as if they were synchronous.</p><p>Popping open RAPIDClient.js (where the client.post...() functions reside) shows this (<em>trimming mine</em>):</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/06f312df394c9f54cf7d1d05e6dbf7a7/href">https://medium.com/media/06f312df394c9f54cf7d1d05e6dbf7a7/href</a></iframe><p>Lots of alarm bells going off in my head with this code:</p><ul><li>Errors will be rethrown in a different context, so can’t be handled by any caller of this function.</li><li>There’s no checking of the HTTP status code of the response (as there is in <a 
href="https://github.com/awslabs/aws-lambda-cpp/blob/master/src/runtime.cpp#L385-L389">other runtime implementations</a>), so if there’s a problem handling the response, the user wouldn’t know.</li><li>There’s no callback, nor does it return a Promise — so there&#39;s no way to know when this call has completed — the no-op response handler reveals as much. This means that multiple calls to this function in succession could very well end up overlapping, with HTTP calls potentially ending up out of order if some are sent faster than others.</li></ul><p>Now you could argue that this would be less worrisome if it were used for something non-critical, like pushing logs out-of-band or similar — but in this case it’s used to post back the response of your Lambda function. No wonder the scheduling code with setTimeout() etc is complicated! If you ever write an HTTP client in Node.js, <em>please</em> don&#39;t write it like this.</p><h3>How <em>should</em> it be written then?</h3><p>I honestly think the best way to write easy-to-understand async-heavy code in Node.js these days is to use async/await and Promises.</p><p>In the specific case of needing to cover the legacy behavior of callbackWaitsForEmptyEventLoop, context.done(), etc, Promises give you a way of ensuring that only one of these mechanisms “wins” each invocation.
Coupled with async/await, they also allow you to build up an event processing loop that’s easier to understand than using events and callbacks (or not even using those!).</p><p>The entire processing and callback logic from <a href="https://github.com/lambci/node-custom-lambda/blob/0e3f2133bb2b667fa29aa4adfc30fab22166f6e4/v10.x/bootstrap.js">node-custom-lambda</a>, supporting all legacy use cases, can be boiled down to:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/366c34093bd6f97120c68e5f6dea3844/href">https://medium.com/media/366c34093bd6f97120c68e5f6dea3844/href</a></iframe><p>By wrapping the user’s handler in a Promise, we can always rely on awaiting it, which vastly simplifies the surrounding logic. The only awkwardness comes from the handling of callbackWaitsForEmptyEventLoop where we need to restart the loop on beforeExit, but there’s little way around that and it’s vastly simpler than many of the dead-ends in the (current) official runtime.</p><h3>Conclusion</h3><p>I’m hoping that, in fixing some of the bugs users have been finding in the runtime, AWS will take the opportunity to also simplify some of the bootstrap code.</p><p>Until then, I’d hold off on updating to this runtime — especially if you currently rely on the legacy callback behavior (I’ve seen fewer issues if you’re only using Promises).</p><p>Nothing like thousands of users hammering your code to get it into shape, huh? 😉</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=84de1e500aac" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Massively Parallel Hyperparameter Optimization on AWS Lambda]]></title>
            <link>https://hichaelmart.medium.com/massively-parallel-hyperparameter-optimization-on-aws-lambda-a7a24b1970c8?source=rss-9536d5fb3e06------2</link>
            <guid isPermaLink="false">https://medium.com/p/a7a24b1970c8</guid>
            <category><![CDATA[hyperparameter-tuning]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[aws-lambda]]></category>
            <dc:creator><![CDATA[Michael Hart]]></dc:creator>
            <pubDate>Thu, 07 Feb 2019 21:58:07 GMT</pubDate>
            <atom:updated>2025-04-29T23:31:17.139Z</atom:updated>
            <content:encoded><![CDATA[<p>I’m excited to share a <a href="https://en.wikipedia.org/wiki/Hyperparameter_optimization">hyperparameter optimization</a> method we use at <a href="https://www.bustle.com">Bustle</a> to train text classification models on <a href="https://aws.amazon.com/lambda/">AWS Lambda</a> incredibly quickly — an implementation of the recently released <a href="https://arxiv.org/abs/1810.05934">Asynchronous Successive Halving Algorithm paper</a> by <a href="https://liamcli.com/">Liam Li</a> et al, which proved more effective than <a href="https://ai.google/research/pubs/pub46180">Google’s own internal Vizier tool</a>. We extend this method using <a href="https://en.wikipedia.org/wiki/Evolutionary_algorithm">evolutionary algorithm</a> techniques to fine-tune likely candidates as the training progresses.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HChwaVwkysZrOspSvghDBg.png" /><figcaption><a href="https://x.com/hichaelmart/status/1055925313362890762">https://x.com/hichaelmart/status/1055925313362890762</a></figcaption></figure><p>(Coincidentally, there’s a talk on the ASHA paper at the <a href="https://www.meetup.com/Deep-Learning-NYC/events/zztcfqyzdbkb/">AWS Loft in NYC tonight</a>, 7 Feb)</p><h3>Background</h3><p>We use text classification extensively at Bustle to tag and label articles, freeing our editors up to create awesome content. There are now a number of excellent services for this — notably <a href="https://cloud.google.com/natural-language/automl/docs/predict">Google’s AutoML</a> and <a href="https://docs.aws.amazon.com/comprehend/latest/dg/how-document-classification.html">Amazon Custom Comprehend</a>. However, if you need to backfill custom tags on hundreds of thousands of articles, these offerings aren’t cheap at scale — classifying 300k articles at 1,500 words each would cost you $2,000 (and would take potentially days of API calls).
Such tasks lend themselves better to training your own machine-learning model to run locally that essentially costs you nothing at classify-time.</p><p>Recent deep learning models such as <a href="https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html">BERT</a> are state-of-the-art in this domain, but lightweight alternatives like <a href="https://fasttext.cc/">fastText</a> deliver similar results for <em>much</em> less overhead. Regardless of the model you’re using though, there will always be parameters you need to tune to your dataset — the learning rate, the batch size, etc. This tuning is the realm of hyperparameter optimization, and that can be slowwwww.</p><p>Services like <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html">AWS SageMaker’s automatic model tuning</a> take a lot of pain out of this process — and are certainly better alternatives to a <a href="https://en.wikipedia.org/wiki/Hyperparameter_optimization#Grid_search">grid search</a> — but they tend to use <a href="https://en.wikipedia.org/wiki/Bayesian_optimization">Bayesian optimization</a> which doesn’t typically lend itself to parallelization, so it still takes hours to tune a decent set of hyperparameters.</p><h3>Enter Lambda</h3><p>We’re <a href="https://aws.amazon.com/solutions/case-studies/bustle/">big fans of AWS Lambda</a> at Bustle and we’ve actually been doing our own ad-hoc machine-learning tuning on Lambda for a while now, so when I stumbled onto the ASHA paper, it seemed like an even better fit. 
To illustrate:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rL2lwuz3LXvYKbvSrhe7ZQ.png" /><figcaption>Dashed lines represent min and max scores of the trials</figcaption></figure><p>The chart above shows our implementation of ASHA running with 300 parallel workers on Lambda, tuning <a href="https://fasttext.cc/">fastText</a> models with 11 hyperparameters on the <a href="https://github.com/facebookresearch/fastText/blob/7842495a4d64c7a3bb4339d45d6e64321d002ed8/classification-results.sh#L25">ag_news data set</a> and reaching state-of-the-art precision within a few minutes (seconds?). We also show a comparison with AWS SageMaker tuning <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html">blazingtext</a> models with 10 params: the Lambda jobs had basically finished by the time SageMaker returned its first result! This is not surprising, given that SageMaker has to spin up EC2 instances and is limited to a maximum of 10 concurrent jobs — whereas Lambda had access to 300 concurrent workers.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/684/1*RQYf5juYKz0yLqP6NxDfOg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/618/1*UwkfjKfnAsWAROiIWwN_pA.png" /><figcaption>Default limits for Lambda on left, SageMaker on right: Not exactly a fair fight</figcaption></figure><p>This really highlights the power that Lambda has — you can deploy in seconds, spin up literally thousands of workers and get results back in seconds — and only get charged for the nearest 100ms of usage. 
SageMaker took 25 mins to complete 50 training runs at a concurrency of 10, even though each training job took a minute or less — so the startup/processing overhead on each job isn’t trivial, and even then it still wasn’t approaching the same accuracy as ASHA on Lambda.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Tv4URMEQh9AMB-01UNc17Q.png" /><figcaption>Not getting much better…</figcaption></figure><p>It would be remiss of me not to point out that the overhead of SageMaker becomes less important if the jobs you’re training take hours anyway. Just that, for this particular problem, the job time is dwarfed by the overhead.</p><h3>ASHA</h3><p>The <a href="https://blog.ml.cmu.edu/">CMU ML blog</a> has a <a href="https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/">great description</a> of ASHA — a relatively intuitive technique that trains model configurations for small amounts of time (or some other resource) and allows them to proceed to train for longer if they look to be doing well.
The diagram below illustrates this with the bottom “rung” containing configurations that are trained for a short time, progressing (if they do well) to higher rungs where they can be trained for longer.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LtGVc9Gqu9VR4Yei5-7D4A.gif" /><figcaption>(credit: <a href="https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/">https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/</a>)</figcaption></figure><p>It’s particularly suited to problems that can be solved incrementally, eg training a neural network for a certain number of iterations or epochs.</p><p>As I was reading, I found myself furiously nodding at this snippet from the post:</p><blockquote>We argue that tuning computationally heavy models using massive parallelism is the new paradigm for hyperparameter optimization.</blockquote><p>I couldn’t agree more, and Lambda has some properties that make it a perfect platform for a problem like this:</p><ul><li>it <a href="https://docs.aws.amazon.com/lambda/latest/dg/scaling.html#scaling-behavior">immediately scales</a> to thousands of concurrent jobs</li><li>jobs start in seconds, at most</li><li>deployments also take seconds — and machine learning involves <em>a lot</em> of tweaking</li></ul><p>There are also some serious disadvantages to using Lambda for machine learning — at least, and this is important to stress — with its <em>current</em> limitations:</p><ul><li>250MB function/layer size + 512MB disk</li><li>3GB memory</li><li>15 min time limit</li><li>No GPU access</li></ul><p>Given their track record, I have no doubt AWS will increase at least some of these limits soon — however, deep learning architecture searches on Lambda might need to wait for the future.</p><p>A library like fastText, on the other hand, is well-suited to working within such limitations for medium-sized data sets as it uses <a 
href="https://arxiv.org/pdf/0902.2206.pdf">“hashing tricks”</a> to maintain fast and memory-efficient representations of <a href="https://en.wikipedia.org/wiki/N-gram">n-gram</a> models — so we have no problem training data sets with hundreds of thousands of articles within these limits. And it’s easy to compile and get running in the Lambda environment.</p><h3>Our implementation</h3><p>In the <a href="https://arxiv.org/abs/1810.05934">original paper</a>, the learning rate is adjusted according to a schedule during a single training session. Instead of modifying fastText, we take the approach of just decaying the learning rate slightly every rung of the algorithm — that way, high learning rates that will likely do well in a small number of epochs will be decreased as the number of epochs increases, allowing the algorithm to train slower but (hopefully) more accurately.</p><p>We also combine the selection mechanisms of ASHA with evolutionary algorithm techniques as outlined in <a href="https://arxiv.org/abs/1901.11117">So et al</a> and similar papers — when generating new candidates, we choose either to generate a new random configuration (as in the original ASHA paper), or to perform a tournament selection and mutation of an existing parent — increasing the probability of the latter as time progresses.</p><p>As ASHA is an <a href="https://en.wikipedia.org/wiki/Anytime_algorithm">anytime algorithm</a>, we add a stopping criterion of a certain number of total epochs completed (which should roughly translate to, eg, dollars spent on AWS Lambda), which gives us a schedule to increase the likelihood of evolution.
We also decrease the selection size over time, and decrease the amount of mutation, both in terms of the number of parameters mutated (as in <a href="https://arxiv.org/abs/1811.10636">Piergiovanni et al.</a>) and the magnitude of the Gaussian noise added to each (real number) parameter.</p><p>The rung-promotion technique in ASHA gives us the advantage of being able to select from models that have already been promoted, so we can choose to select parents from the top-most rung. This also means that evolution only begins after a certain amount of the search space has been explored, as there will be no configurations in the top rung in the early stages.</p><p>Intuitively, this sort of technique makes sense — as ASHA progresses, the probability of a completely random configuration finding a new global minimum decreases, so (given finite resources) fine-tuning existing configurations to their local optima becomes a better strategy.</p><p>There’s no reason that this fine-tuning needs to occur on a schedule though — it would be trivial to keep the anytime aspect of the algorithm, and interactively choose to fine-tune at a time when you perceive that the configurations are no longer improving.</p><h3>Future Extensions</h3><p>While models requiring GPU access might be out of bounds on Lambda for now, other libraries like <a href="https://github.com/dmlc/xgboost">XGBoost</a>, <a href="https://github.com/Microsoft/LightGBM">LightGBM</a> and <a href="https://catboost.ai/">CatBoost</a> should perform well on Lambda — again, on medium-sized data sets. This is something we hope to explore.</p><p>Lambda might be a good environment to try out reinforcement learning problems — depending on how long it takes to run each problem — but <a href="https://blog.openai.com/evolution-strategies/">evolutionary strategies like those outlined by OpenAI</a> might be good candidates for this sort of algorithm.</p><p>ES might be a good method for fine-tuning candidates, as opposed to just simple mutation.
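</p><p>For reference, the simple mutation described in our implementation above boils down to something like the following sketch (the noise scale and decay schedule are illustrative):</p>

```python
import random


def mutate(parent, progress):
    """Mutate a parent config; `progress` in [0, 1] is the fraction of the
    total epoch budget spent so far. Both the number of (real-valued)
    parameters mutated and the magnitude of the Gaussian noise shrink
    as the search progresses."""
    child = dict(parent)
    numeric = [k for k, v in child.items() if isinstance(v, float)]
    n_mutate = max(1, round(len(numeric) * (1 - progress)))
    sigma = 0.1 * (1 - progress)  # illustrative noise scale
    for key in random.sample(numeric, n_mutate):
        child[key] = child[key] * (1 + random.gauss(0, sigma))
    return child
```

<p>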
And we also haven’t tried any form of <a href="https://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)">crossover</a>, which might also improve the performance of the fine-tuning stage.</p><p>Finally, for problems that really can’t fit on Lambda, it would be straightforward to instead invoke them on <a href="https://aws.amazon.com/fargate/">AWS Fargate</a> or <a href="https://aws.amazon.com/batch/">AWS Batch</a> — invocation times are an order of magnitude slower, but again, these may pale in comparison to the job time.</p><h3>Conclusion</h3><p>I’ve personally always enjoyed <a href="https://medium.com/@hichaelmart/lambci-4c3e29d6599b">pushing technologies like Lambda to their limits</a>, and while Lambda might not be the exact environment the ASHA authors had in mind when they wrote the paper, it’s been very fun getting it all running and being blown away by its capabilities. I have no doubt that as limits get lifted, serverless environments will become commonplace for machine learning problems.</p><p>If you have any questions, <a href="https://twitter.com/hichaelmart">hit me up on Twitter</a>.</p><p>And if you like these sorts of problems, <a href="https://jobs.lever.co/bustle/e1ecc3d1-6bdf-4a87-a1a1-399e43da13d8">we’re hiring</a>!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a7a24b1970c8" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Introducing LambCI — a serverless build system]]></title>
            <link>https://hichaelmart.medium.com/lambci-4c3e29d6599b?source=rss-9536d5fb3e06------2</link>
            <guid isPermaLink="false">https://medium.com/p/4c3e29d6599b</guid>
            <category><![CDATA[serverless]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[continuous-integration]]></category>
            <category><![CDATA[aws-lambda]]></category>
            <dc:creator><![CDATA[Michael Hart]]></dc:creator>
            <pubDate>Wed, 06 Jul 2016 21:02:59 GMT</pubDate>
            <atom:updated>2025-04-29T23:29:59.105Z</atom:updated>
            <content:encoded><![CDATA[<p>I’m excited to announce the first release of <a href="https://github.com/lambci/lambci">LambCI</a>, an open-source continuous integration tool built on <a href="https://aws.amazon.com/lambda/">AWS Lambda</a> 🎉</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/386/1*ozPwawIEoJHzn-GFlum2ng.png" /></figure><p>LambCI is a tool I began building over a year ago to run tests on our pull requests and branches at <a href="https://m.uniqlo.com/us/">Uniqlo Mobile</a>. Inspired at the inaugural <a href="http://serverlessconf.io/">ServerlessConf</a> a few weeks ago, I recently put some work into hammering it into shape for public consumption.</p><p>It was borne of a dissatisfaction with the two current choices for automated testing on private projects. You can either pay for it as a service (Travis, CircleCI, etc) — where 3 developers needing their own build containers might set you back a few hundred dollars a month. Or you can setup a system like Jenkins, Strider, etc and configure and manage a database, a web server and a cluster of build servers .</p><p>In both cases you’ll be under- or overutilized, waiting for servers to free up or paying for server power you’re not using. And this, for me, is where the advantage of a serverless architecture really comes to light: 100% utilization, coupled with instant invocations.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*kLDBkyTs-ghU9XkVF5eS0w.png" /><figcaption><a href="https://twitter.com/adrianco/status/736553530689998848">https://twitter.com/adrianco/status/736553530689998848</a></figcaption></figure><p>Systems built on solutions like AWS Lambda and <a href="https://cloud.google.com/functions/">Google Cloud Functions</a> essentially have per-build pricing. 
You’d pay the same for 100 concurrent 30-second builds as you would for 10 separate 5-minute builds.</p><h4>The LambCI Advantage</h4><p>From an ops perspective, all of the systems and capacity are managed by Amazon (SNS, Lambda, DynamoDB and S3), so LambCI is far simpler to set up and manage than Jenkins — especially given that you get 100 concurrent builds out of the box.</p><p>From a cost perspective, it’s typically far cheaper for private builds than the various SaaS offerings because you only pay for the time you use (and the first 4,444 mins/mth are free):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/911/1*lBa176N3CqOl2su9iTyCAw.png" /><figcaption>(Assumes 7 days/wk — with LambCI running on fastest 1.5GB Lambda option)</figcaption></figure><p>So if you had 2 developers, each simultaneously running sixty 4-min builds per day (ie, 4 hrs each), LambCI would be more than 8 times cheaper per month than Travis ($15 vs $129).</p><p>It’s only if you need to be running builds 24/7 that SaaS options become more competitive — and of course if you’re wanting to run builds for your open source projects, then Travis and CircleCI and others all have great (free) options for that.</p><p>Performance-wise, Lambda reports as a dual Xeon E5-2680 @ 2.80GHz. If you have checked-in dependencies and fast unit tests, builds can finish in single-digit seconds — but a larger project like <a href="https://github.com/mhart/dynalite">dynalite</a>, with 941 HTTP-to-localhost integration tests, builds in about 70 seconds. 43 secs of that is actually running the tests, with the remainder being mostly npm installation.
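</p><p>As an aside on the cost claim above, here’s roughly how the $15 figure falls out of Lambda’s pricing for the 1.5GB option (constants taken from AWS’s published pricing at the time of writing):</p>

```python
# Rough sanity check of the cost comparison above, using Lambda's
# published pricing for the 1.5GB option
GB_SECOND_PRICE = 0.00001667  # USD per GB-second
FREE_GB_SECONDS = 400_000     # monthly free tier (~4,444 mins at 1.5GB)
MEMORY_GB = 1.5


def monthly_cost(build_mins_per_day, days=30):
    gb_seconds = build_mins_per_day * 60 * MEMORY_GB * days
    billable = max(0, gb_seconds - FREE_GB_SECONDS)
    return billable * GB_SECOND_PRICE


# 2 developers x sixty 4-min builds per day = 480 build-minutes/day
print(round(monthly_cost(480)))  # ~$15/month, vs $129/month for two Travis containers
```

<p>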
On my 1.7GHz i7 MacBook Air the npm install and tests complete about 20% faster, so there’s definitely an element of “cloud” speed to keep in mind.</p><p>The public Travis option takes only a few seconds longer than LambCI to run dynalite’s npm install and tests, but the overall build time is larger due to worker startup time (22 secs) and waiting in the queue (up to several mins — I assume this only happens if you don’t have enough concurrency).</p><h4>What does it look like?</h4><p>Here’s what it looks like in action — this is building a project with only a handful of tests and checked-in dependencies, so this is definitely faster than it is when building our typical projects, but I promise this is real and all running remotely on AWS Lambda:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/651/1*iDvvGWqabBXxLXHK1NgxDA.gif" /><figcaption>Build time includes DB lookup, git cloning, etc — Amazon’s network is fast!</figcaption></figure><p>It comes as a <a href="https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=lambci&amp;templateURL=https://lambci.s3.amazonaws.com/templates/lambci.template">CloudFormation stack</a> that will deploy quickly (about 3 mins) and cost you nothing when you’re not using it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/775/1*8_0UrGLmnjUyqNGCVMZFWQ.png" /><figcaption>Setup everything during stack creation thanks to <a href="http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-custom-resources-lambda.html">Lambda-backed Custom Resources</a></figcaption></figure><p>The stack consists of:</p><ul><li>an SNS Topic to listen to GitHub events and forward to Lambda</li><li>a Lambda function with a bundled git binary to clone the PR/branch, run the build, update Slack and GitHub, and store the results</li><li>two low-capacity DynamoDB tables for config settings and build results</li><li>an S3 bucket to store HTML pages of the results and any other build 
artifacts (optional)</li><li>a bit of IAM glue</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RBr3brkfQSgfdZFx2Va1UA.png" /><figcaption>Boxes and arrows! Must be an architecture diagram</figcaption></figure><p>There’s also a <a href="https://github.com/lambci/cli">command-line tool</a> to perform setup and configuration — so you don’t need to manage everything from the AWS console if you don’t want to.</p><h4>No <a href="https://aws.amazon.com/api-gateway/">API Gateway</a>?</h4><p>Nope. Not… yet, anyway. Having SNS as the sole entry point means that you have a well-defined surface area which you expose to the world — and a single user who just needs permissions to publish to an SNS Topic. It’s entirely possible that API Gateway endpoints will be added in the near future to enable a richer UI, but for now it definitely makes the stack simpler without it.</p><h4>There’s gotta be a downside?</h4><p>There are definitely some limitations that may be showstoppers for you, depending on your requirements. The two largest in my opinion are:</p><ul><li>no root access</li><li>5 minute max build time</li></ul><p>The latter may be something that AWS extends — also, given there’s such a low barrier to concurrent execution, this limit encourages you to split up your builds into parallel jobs. Making this more straightforward is definitely on the list of features for LambCI v1.0.</p><p>In terms of root access, this means you cannot run any software that requires root (eg, Docker), or install software in default system locations. You only have access to <em>/tmp</em>, so any extra tools need to be able to be installed in non-standard locations. 
A surprising number of tools <em>can</em> be installed in <em>/tmp</em>, and there’s a <a href="https://github.com/lambci/lambci#language-recipes">growing collection of recipes</a> for how to get them running in Lambda/LambCI.</p><h4>Containers to the rescue</h4><p>Given not every project can fit within these limits, LambCI has the optional ability to run build tasks for particular projects on an <a href="https://aws.amazon.com/ecs/">ECS cluster</a>.</p><p>Now hang on a minute, I hear you well-actuallying, that’s not serverless! Of course, you’re absolutely right. However, there is still a huge advantage in the fact that the instances in the cluster are stateless and homogeneous — they all run the same <a href="http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html">stock-standard Amazon image</a> and they can be spun up or down whenever you like, so the maintenance overhead is still very low. You can have zero instances running whenever you don’t need them, and you can auto-scale them based on time of day or current load.</p><p>The <a href="https://github.com/lambci/ecs">lambci/ecs</a> project has <a href="https://github.com/lambci/ecs/blob/master/cluster.template">a stack</a> with a task that will look for a Dockerfile.test file in the cloned repository, and build and run all the commands specified in that Dockerfile (Docker-in-Docker!) This makes it very straightforward to specify all of your dependencies, leverage Docker’s layer caching, use any language you want, run the build for as long as you want, and have root access in the container the build is running in.</p><p>Here’s how that setup looks:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5kbWr0mFt58eZFhyc1EEKA.png" /><figcaption>LambCI with a dash of ECS</figcaption></figure><h4>The road to v1.0</h4><p>LambCI is feature-complete inasmuch as it can respond to GitHub events, clone repositories, run build commands and update GitHub and Slack statuses. 
It can run different versions of Node.js, Java, Go, Ruby, PHP, Rust, Python 2.7, native compilation with gcc, and tools like <a href="https://github.com/Medium/phantomjs">phantomjs</a> for automated UI testing.</p><p>However, there are a few features that would be great to have for an impending v1.0 release, a number of which just boil down to “what’s the right incantation to get this to work”:</p><ul><li>Recipes on how to build other languages — <a href="https://github.com/lambci/lambci/issues/12">other versions of Python</a> and other languages will probably work just fine too, they just need to be tested.</li><li>More solidified configuration on <a href="https://github.com/lambci/lambci/issues/2">how to run parallel builds</a>.</li><li><a href="https://aws.amazon.com/codepipeline/">AWS CodePipeline</a> integration for continuous delivery.</li><li>Support for other notification services — as well as Slack, LambCI can publish statuses to an SNS topic, so email and SMS are already covered, but it might be nice to support services like HipChat, Yammer, etc out of the box.</li><li>Support for other repository sources like BitBucket, GitLab, AWS CodeCommit, etc — although this is more likely a post-v1.0 goal.</li><li>Support for running on other cloud services like <a href="https://cloud.google.com/functions/">Google Cloud Functions</a> and <a href="https://azure.microsoft.com/en-us/services/functions/">Azure Functions</a> — probably also post-v1.0 goals.</li><li>A hosted service with pay-per-build pricing — for those who don’t have/want an AWS account and want to get up and running easily, with the ability to move to their own LambCI stack with the same configuration if they wanted later.</li></ul><h4>The future of serverless ops is bright</h4><p><a href="https://github.com/lambci/lambci">LambCI</a> is just one example of the sort of tools that are now possible to build without needing to wait for servers, instances, dynos, etc to start or worry about keeping them 
running and up-to-date.</p><p>As container-based systems like <a href="https://new-console.ng.bluemix.net/openwhisk/">OpenWhisk</a> become more production-ready, we’ll start to see even more flexibility in this space — who knows, maybe AWS will offer a way to run containers on Lambda too.</p><p>So take <a href="https://github.com/lambci/lambci">LambCI</a> for a spin, <a href="https://twitter.com/lamb_ci">hit us up on Twitter</a> and <a href="https://github.com/lambci/lambci/issues">GitHub</a>, and let us know if there are any features you think might be great to add. I’d love to get some community feedback on what works and what doesn’t, what languages people want to see supported, and what interesting ways we can push our automated build setups to take advantage of this newfound concurrency!</p><p>Enjoy!</p><p>Many thanks to <a href="https://twitter.com/jedschmidt">Jed Schmidt</a> and <a href="https://twitter.com/tomdale">Tom Dale</a> for nudging me to get this out and providing feedback on this post 🙇</p><h4>PS: Hating on “serverless”?</h4><p>Well. Look. I’m not going to defend it to the death, but I don’t think it’s anywhere near as bad as some suggest.
It’s a term people are using to describe <em>architectures</em> in which you don’t deal with anything resembling a server, or an instance, or similar.</p><p>Think of the term as being akin to “stateless” in a “stateless architecture” — <em>of course the underlying infrastructure has state —</em> and no one would pretend otherwise, just as no one is suggesting there aren’t literal physical servers powering Amazon, Google or Microsoft’s serverless products — it’s just that <em>you</em> don’t deal with anything that resembles one and it doesn’t appear in a logical representation of your system.</p><p><a href="http://martinfowler.com/articles/serverless.html">Mike Roberts has a great write-up over at Martin Fowler’s site</a> on the whole landscape which I think lays things out a little more clearly for those who are new to the space.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=4c3e29d6599b" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>