<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Judoscale Dev Blog</title>
    <description>The Judoscale Dev Blog</description>
    <link>https://judoscale.com/</link>
    <language>en-us</language>
    <item>
      <title>Blue Ridge Ruby: A Couple of Reflections</title>
      <description>Reflections from Blue Ridge Ruby: the irreplaceable value of in-person community and how AI supercharges experienced Rubyists to build faster.</description>
      <pubDate>Mon, 4 May 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/blue-ridge-reflections</link>
      <guid>https://judoscale.com/blog/blue-ridge-reflections</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p>Oh Asheville! How wonderful, weird, and full of art you are! Aside from returning with sore shins (central Ohio isn’t known for its hills…), Adam and I head home from North Carolina with two primary reflections we feel worth sharing here.</p>

<p><figure>
  <img alt="Photo of the Blue Ridge Mountains in the distance with a town landscape in the foreground" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/1589723c-52f8-4436-4c5a-d6398d7a4300/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/1589723c-52f8-4436-4c5a-d6398d7a4300/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/1589723c-52f8-4436-4c5a-d6398d7a4300/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            Ye old Blue Ridge Mountains
          </figcaption>

</figure>
</p>

<p><strong>1.</strong> “<strong>Humans are the X-factor</strong>”. There’s something irreplaceable about a gathering of humans in-person; something unmistakably creative and promising. Something you feel more than you can describe; a moment where the whole is indeed greater than its constituent components. Small Ruby conferences ooze this feeling! My point here is simply that <strong>there is no replacement for in-person gatherings</strong>, and <strong>you will leave having gained something</strong>. Friendships, ideas, insights, and a sense of identity in community <em>way</em> beyond a Slack workspace or forum. <em>There is so much value there</em>! We all already know this — there’s a reason remote-only meetups can feel chore-ish and paradoxically disconnecting. Humans are the X-factor. They have to show up, but the rewards never fail to arrive when they do.</p>

<p>To that end, we can’t help but feel great joy at the resurgence of the small, single-track, regional Ruby conference. There’s a place for RubyConf and Rails World and the “big show”, for sure. But that human X-factor doesn’t scale linearly. Smaller regional conferences don’t need much production, keep everyone in the same room, and clearly relay the genuine love the conference planners have for their community. Yet they always have insights just as deep and compelling as the bigger conferences… and you can grab lunch with the speakers right after their talk! If you haven’t been to <a href="https://blueridgeruby.com" target="_blank" rel="noopener">Blue Ridge</a>, <a href="https://rockymtnruby.dev" target="_blank" rel="noopener">Rocky Mountain</a>, <a href="https://www.blastoffrails.com" target="_blank" rel="noopener">Blastoff</a>, or <a href="https://west.railscamp.us" target="_blank" rel="noopener">RailsCamp</a>, seriously consider making the trip!</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">

<p>The quote “Humans are the X-factor” comes from <a href="https://joshuawenning.com" target="_blank" rel="noopener">Joshua Wenning</a>’s lightning talk at BRR. Josh only began writing Ruby in January! And gave a lightning talk at a Ruby conference in April! How cool is that?!</p>

  </div>
</div>

<p><strong>2. There has never been a better time to build</strong>. There are a <em>lot</em> of takes floating around about AI and software development right now. BRR hosted a two-hour round-table discussion on exactly that! Adam and I have had more than our own share of discussions around it and what the future looks like. We don’t know. Nobody does! But one <a href="https://speakerdeck.com/etagwerker/teaching-claude-code-to-upgrade-rails-at-blue-ridge-ruby-26?slide=21" target="_blank" rel="noopener">slide</a> from <a href="https://www.linkedin.com/in/etagwerker/" target="_blank" rel="noopener">Ernesto Tagwerker</a> at the conference contained the simple equation:</p>

<p><figure>
  <img alt="Simple words on a white background reading “experience plus LLMs equals speed”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/453a53c0-64cf-4ddb-76f1-525ee639b800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/453a53c0-64cf-4ddb-76f1-525ee639b800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/453a53c0-64cf-4ddb-76f1-525ee639b800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>And I think that sums up my current outlook on AI for Ruby/Rails right now. If you have experience in the stack, come in with opinions on what you want and exactly how you want it shaped, and opt to use Claude or Codex to execute and implement, you’re going to find that <strong>you can build at unprecedented speed</strong>. But the quality of the output, at this point, is going to be governed more by how detailed and particular your input is than by the quality of the model. The models are shockingly good. They can follow your opinions, build your ideas, and refactor entire codebases in minutes.</p>

<p>Let me rewind and repeat that: if you can leverage your experience and knowledge of the stack to write clear and concise specifications about what you’d like built, it can be built in minutes <em>for you</em>, with stunning accuracy. Truly, <em>there has never been a better time to build things for those who want to build</em>.</p>

<p>This was inadvertently on display at Blue Ridge. <a href="https://github.com/Kitkatnik" target="_blank" rel="noopener">Katya Sarmiento</a> (a wonderful human!) scaffolded up a fully custom app just for Blue Ridge, “[what] started as an app for BRR Ruby Embassy&hellip; turned into a whole companion app called ‘My Blue Ridge’” — complete with user-specific scheduling, group meal coordination, and even a fun <a href="https://app.blueridgeruby.com/report" target="_blank" rel="noopener">data-report</a> that was added minutes before her lightning talk about that very thing! This sort of thing was absolutely possible before AI-assisted software development came on the scene, but it would’ve had a <em>much</em> higher human-time cost (and likely wouldn’t have been feasible for a volunteer venture like Blue Ridge). If your desire is to build things, the landscape is <em>wide open</em>.</p>

<p>Anyway, to the Blue Ridge team of <a href="https://twitter.com/jeremysmithco" target="_blank" rel="noopener">Jeremy</a>, <a href="https://twitter.com/marklocklear" target="_blank" rel="noopener">Mark</a>, and <a href="https://peckyeah.com/" target="_blank" rel="noopener">Joe</a>, thanks for another wonderful conference!</p>

<p><figure>
  <img alt="An AI-generated image of a pencil-sketch style scene with a large building that reads “Blue Ridge Ruby” on the front with many mountains sketched into the background, a simple scene and representation of the conference within the Asheville NC mountains" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/1a3fcaae-4748-4527-682c-e5b7e5292000/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/1a3fcaae-4748-4527-682c-e5b7e5292000/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/1a3fcaae-4748-4527-682c-e5b7e5292000/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Dear Heroku: Uhh... What’s Going On?</title>
      <description>An open letter urging Heroku to clarify its roadmap, define “sustaining engineering,” and communicate honest business intent to developers.</description>
      <pubDate>Mon, 6 Apr 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/heroku-whats-going-on</link>
      <guid>https://judoscale.com/blog/heroku-whats-going-on</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p>Dear Heroku,</p>

<p>As friends, long-time marketplace partners, supporters, and customers, we here at Judoscale — along with <em>so many</em> others from the developer community — are a bit confused about what’s going on over there in San Francisco. Frustrations aside, we’re having a hard time figuring out what to think about this whole ‘Heroku thing’.</p>

<p><strong>First</strong> came the <a href="https://www.heroku.com/blog/an-update-on-heroku/" target="_blank" rel="noopener">blog post</a> on February 6th that, sort of out of nowhere, announced that:</p>
<blockquote><p>Heroku is transitioning to a sustaining engineering model focused on stability, security, reliability, and support… with an emphasis on maintaining quality and operational excellence rather than introducing new features…</p>

<p>Enterprise Account contracts will no longer be offered to new customers…</p>
</blockquote>
<p>Which, full disclosure, sounds an <em>awful lot</em> like a product going into “Maintenance Mode”, even if that specific phrase isn’t said. Speaking of specific phrases, what exactly <em>is</em> a “sustaining engineering” model? Respectfully, that feels like a phrase cooked up in a corporate PR meeting.</p>

<p><strong>But then</strong> came new features? 🤔 In mid-March we saw <a href="https://www.heroku.com/blog/bigger-slugs-and-greater-build-timeout-flexibility/" target="_blank" rel="noopener">slug sizes increase</a> for the first time I can remember (which is genuinely helpful, thank you!), updates to the automatic SSL cert provisioning pipeline so it <a href="https://www.heroku.com/blog/preparing-for-shorter-ssl-tls-certificate-lifetimes/" target="_blank" rel="noopener">runs more frequently</a> (cool!), and… the entire CLI was <a href="https://www.heroku.com/blog/modernizing-the-command-line-heroku-cli-v11/" target="_blank" rel="noopener">rebuilt</a> (!?). Um… these feel like new features and platform progress, not maintenance mode… 😵‍💫</p>

<p><strong>Add to that</strong> a <a href="https://www.heroku.com/blog/march-2026-update/" target="_blank" rel="noopener">round-up post</a> from the Head of Product and Engineering that doubled down on “Sustaining Engineering” (capitalized this time!):</p>
<blockquote><p>We will continue releasing features and functionality that align with our Sustaining Engineering goals…</p>
</blockquote>
<p>…but I thought “sustaining engineering” was</p>
<blockquote><p>rather than introducing new features</p>
</blockquote>
<p>I’m not trying to be pedantic here, I’m just confused. <strong>What are we developers/customers/teams supposed to expect from Heroku at this point</strong>?</p>

<p>Heroku and/or Salesforce leadership/PR/etc. team(s), can I humbly offer you a few thoughts? </p>

<h3 id="just-tell-us-straight">Just Tell Us Straight</h3>

<p>Developers notoriously hate corporate verbiage, PR plays, and implications. “Sustaining engineering” feels like the embodiment of all three of those. If Heroku is going into maintenance mode, cool — the platform is pretty great and if it never improved again, there’s still <em>many</em> years of life in it. Just call it that, then. What do the actual devs at Heroku think of all of the recent news, terms, and reactions? I’d be curious for their insights.</p>

<h3 id="open-the-roadmap">Open The Roadmap</h3>

<p>And, as noted above, if that roadmap is “do nothing, just keep it running”, that’s <em>fine</em>. But if there <em>are</em> things on the roadmap, it’d be really great if that was made public. Wasn’t Fir going to ship to the common runtime? Aren’t there Postgres updates to be made? There must be some cost optimizations to be made on several fronts… developers would be thrilled to actually see what Heroku’s priorities are, even if they can’t interact with them or influence them.</p>

<h3 id="clarify-business-intent">Clarify Business Intent</h3>

<p>Look, we get it. Developers are pragmatic people. We understand there are seasons to products and platforms and businesses and companies. Not everything lasts forever — we’ve all had Node packages that we regret installing. From where I’m standing, I’m guessing Heroku as a product is in one of two places:</p>

<ol>
<li>We just want to focus on direct-billing customers and maintain a smaller budget and team to do that. We’ll continue to move forward, but direct-billing is our outlook now; enterprise isn’t our goal anymore</li>
<li>We’re looking to exit major investing in the platform altogether, but obviously it will keep running and we’ll keep fixing issues and bugs for many years to come</li>
</ol>

<p>It’d be really helpful if someone could inform us which of those two roads Heroku leadership is choosing.</p>

<p>Again, this is coming from a (small) team of people who have loved and hosted on Heroku nearly as long as Heroku has existed. We care deeply about the platform and its future! We, and all those that we’ve talked to thus far, would just like to know what that future is.</p>

<p>Sincerely,</p>

<p>The Judoscale Team</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-purple-50 dark:bg-purple-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-purple-900 dark:text-purple-400">
    ❗ Important
  </h4>
  <div class="mt-2.5 text-purple-800 prose-a:text-purple-900 dark:prose-a:text-white prose-code:text-purple-400 dark:text-purple-200 dark:prose-code:text-gray-300">

<p>Adam adding a little P.S. here (👋)</p>

<p>I just got back from RBQ (a lovely <a href="https://rbqconf.com" target="_blank" rel="noopener">Ruby conference in Austin</a>) a couple of weeks ago. I spoke with several teams hosting on Heroku, and <em>every single one of them</em> is making plans to migrate due to these recent communications. I echo Jon’s thoughts here. <em>Please</em> give us some clarity!</p>

  </div>
</div>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Judoscale on Tour: An Ode To Heroku</title>
      <description>Heroku is shifting to a sustaining engineering model. Here’s what that means, whether you should migrate, and how the top alternatives compare.</description>
      <pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/judoscale-on-tour-ode-to-heroku</link>
      <guid>https://judoscale.com/blog/judoscale-on-tour-ode-to-heroku</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p><strong>Judoscale &lsquo;On Tour&rsquo; Series</strong></p>

<ol>
<li>&ldquo;The Friction Model&rdquo; &amp; <strong>Heroku</strong> <em>(This page!)</em></li>
<li><strong>Render</strong> (Coming soon&hellip;)</li>
<li><strong>Railway</strong> (Coming soon&hellip;)</li>
<li><strong>Fly</strong> (Coming soon&hellip;)</li>
<li><strong>Northflank</strong> (Coming soon&hellip;)</li>
<li><strong>Digital Ocean</strong> (Coming soon&hellip;)</li>
<li><strong>Amazon ECS Fargate</strong> (Coming soon&hellip;)</li>
</ol>

<hr>

<p><figure>
  <img alt="Black-and-white, hand-drawn pencil illustration of a single server rack on the right with a slightly crooked hanging sign that reads “Still running fine.” On the left, large bold hand-lettered text reads “Judoscale on Tour: Heroku.” The scene is minimal, with soft cross-hatching and a lightly sketched server room background." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/105c4b73-220b-4444-d0d2-03c8ba13b500/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/105c4b73-220b-4444-d0d2-03c8ba13b500/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/105c4b73-220b-4444-d0d2-03c8ba13b500/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Thus we begin our tour! As we mentioned in our <a href="/blog/heroku-whats-next">last post</a>, we’re going to take our production app on a hosting tour to fully experience what each option in today’s hosting marketplace looks like, feels like, and runs like. But before we do any of that, we need a baseline and a strategy.</p>

<p>Judoscale has been on Heroku since its origins ten years ago. Adam, Judoscale’s founder, has been using Heroku since its earliest startup days! All that to say, we’ve been around the Heroku block many, many times. We’re <em>too</em> close. We need to intentionally zoom out and look at Heroku the way a brand-new user would. We need to put words to the things we take for granted <em>before</em> we jump ship so we know what to look for somewhere else.</p>

<p>And we’re not alone! Many folks are starting to put out feelers for alternative hosting platforms as Heroku’s moved into <del>maintenance mode</del> a <a href="https://www.heroku.com/blog/an-update-on-heroku/" target="_blank" rel="noopener">“Sustaining Engineering model”</a> and they too need a pragmatic view of the features and toolkits that make a hosting platform fantastic. So we asked for <a href="https://www.linkedin.com/posts/adamlogic_idea-the-judoscale-platform-tour-we-activity-7428137395641827328-4ZwZ" target="_blank" rel="noopener">their input</a> as well:</p>

<p><figure>
  <img alt="Screenshot of Adam’s LinkedIn post asking folks for input on a ‘Judoscale tour’ concept and what specifics they’d want to see" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5678fdf7-6b9a-41e8-9721-6cda33273900/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5678fdf7-6b9a-41e8-9721-6cda33273900/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5678fdf7-6b9a-41e8-9721-6cda33273900/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Our goal here is to build a community rubric of sorts. A set of baseline standards for performance, developer experience, and complexities-vs-niceties on hosting platforms. </p>

<p>With all of that said, this article is our attempt at outlining many of the features that have made Heroku so great (and occasionally difficult!) to build on over the last fifteen years. We’ve come up with an assessment strategy that we think will work for all platforms and we’ll apply it here to Heroku first.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p>It’s worth calling out: not everyone uses, sets up, or runs their Heroku-based applications in the same ways. Even just asking if folks <a href="https://www.linkedin.com/posts/adamlogic_how-common-is-it-these-days-for-teams-to-activity-7432055190750412801-H5ql/" target="_blank" rel="noopener">use a staging-server setup</a> brought <em>many</em> different opinions into the mix. <strong>That’s okay</strong>! There may be things noted here in this write-up that simply don’t apply to you or you don’t care about. Just keep an open mind: there’s no single way to do simple app hosting, and you might even find new ideas here!</p>

  </div>
</div>

<h2 id="the-friction-model">The Friction Model</h2>

<p><figure>
  <img alt="Black-and-white, hand-drawn cartoon of a smiling airplane flying to the right, slowed by four sideways parachutes trailing behind it. Each parachute is labeled “Shipping,” “Debugging,” “Infrastructure,” and “Organizational,” representing different sources of friction. The parachutes are stretched taut with motion lines, clearly pulling against the plane’s forward movement. The scene includes light clouds, subtle cross-hatching, and a soft paper texture, giving it a clean, minimal, sketch-style infographic feel." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/067b8759-54e0-4376-7a57-e9b77be40a00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/067b8759-54e0-4376-7a57-e9b77be40a00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/067b8759-54e0-4376-7a57-e9b77be40a00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            How fast can you fly?
          </figcaption>

</figure>
</p>

<p>After spending quite a bit of time brainstorming, then asking other experienced devs for input, we’ve got a <em>long</em> list of features that we know make Heroku great. From the simple stuff (<code>git push heroku</code>) to the more complicated (“What are all the buildpacks I need for <code>vips</code>, again?”), there are plenty of things. But a plain list is only good for making our eyes glaze over. We need to see these features in logical groupings that help us understand intent and perspective.</p>

<p>Given Heroku’s history and philosophy over the years, we believe this is best captured as a <em>friction</em> model. From the beginning, Heroku’s value proposition was always about friction: removing it. Anti-friction = anti-pain. Anti-pain = peaceful shipping. And, of course, peaceful shipping nets developers that are excited to build (“productivity”). It’s always been about friction.</p>

<p>As it’s said: you don’t know what you’ve got until it’s gone… (which <a href="/blog/heroku-whats-next">it’s not</a>, but we can try to simulate the feeling) so let’s try to peel back the anti-friction layers and discover what Heroku’s been silently handling for so many years.</p>

<p>We’re considering four feature groups here:</p>

<p><strong>Shipping friction</strong>. As in, “how many steps are there between my local code and production?” This vector covers things like deploys and releases, migrations, pipelines, review apps, setup, CI/CD (is that phrase still popular?), how <em>long</em> it takes to deploy, and zero-downtime deploys ✨.</p>

<p><strong>Debugging friction</strong>. As in, “WTF IS GOING ON WITH PRODUCTION RIGHT NOW?!” This vector covers a lot of visibility and speed-of-access: logs, metrics, dashboards, production consoles/terminals, scaling, and some cron/scheduled jobs concepts. Also, reaching actual customer-service help when necessary!</p>

<p><strong>Infrastructure friction</strong>. As in, “how much <em>platform’y</em> stuff do we have to own and maintain? How often do I have to (re-learn how to) fix this stuff?” Things like environment variables and secrets, SSL configurations, domains and DNS, multi-region / replication, compliance (scary, I know), and networking/routing. Oh, also, are the servers <em>actually</em> fast / performant?</p>

<p>And finally, <strong>Organizational friction</strong>. As in, “the stuff my manager probably cares more about than me, but therefore still impacts me indirectly”. How much does the platform <em>actually cost</em>? How many nines? Do I need to hire an Ops team? How much is my CTO going to hear the word “Heroku”?</p>

<p>We believe the friction model helps to paint a clear and personal picture of what hosting on <em>any</em> platform will feel like. It’s not just a feature-list table with checks and x’s; our goal is to capture the subjective experience of using platforms at various moments in a commercial developer’s workflow.</p>

<p>Okay, enough setup! Let’s dive in and see how Heroku fits into this model.</p>

<h2 id="heroku-shipping-friction">Heroku: Shipping Friction</h2>

<p><figure>
  <img alt="Black-and-white, hand-drawn pencil-style illustration of a wide canyon viewed from the near cliff. The ground on the viewer’s side is labeled “LOCALHOST” near the edge. A long bridge stretches across the canyon into the distance, connecting to a far cliff where a sign reads “PRODUCTION.” The canyon walls are shaded with cross-hatching, and the scene includes light clouds and a subtle paper texture, giving it a clean, minimal, infographic-like appearance." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/74549ce1-cb88-40a8-6ad4-55fdd9dff400/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/74549ce1-cb88-40a8-6ad4-55fdd9dff400/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/74549ce1-cb88-40a8-6ad4-55fdd9dff400/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Heroku essentially taught an entire generation of developers that shipping could be as simple as <code>git push heroku</code>, so it’s fair to say that they’ve optimized for low shipping friction from the beginning. In fact, Heroku’s pioneering of the PaaS <em>concept</em> was mostly rooted around low shipping friction. While most applications these days probably use automatic deployments off <code>main</code> via GitHub connection rather than pushing from local, Heroku’s done a great job of keeping that setup just about as simple to configure as good old <code>git push heroku</code>. A few clicks and you’re off to the races.</p>

<p>Heroku was also built with “release” processes in mind. We might take them for granted now, but a dedicated short-lived process that runs a command once only when deploying a new commit is both <em>very</em> helpful and somewhat complicated! Unless your host has this specific workflow supported and pre-setup in their platform, trying to do it yourself can be a real pain in the rear. Heroku simply built and gave us a perfect home for <code>db:migrate</code>.</p>
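<p>For reference, here’s what that entire workflow looks like on Heroku’s side: one <code>release</code> line in the <code>Procfile</code>. (A sketch of the common Rails arrangement; your process commands will vary.)</p>

<pre><code>release: bundle exec rails db:migrate
web: bundle exec puma -C config/puma.rb
</code></pre>

<p>Heroku runs the <code>release</code> process once per deploy, after the build but before the new <code>web</code> dynos boot, and a non-zero exit aborts the release entirely.</p>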

<p>Skimming through some of the other features here, the story is broadly the same: we only think about these features and/or know about them because Heroku brought them to the masses. Automatic PR-review apps, pipeline setups to go from review app to staging to production, direct-deploy <code>main</code> to production (after CI passes!) — I’d wager these concepts are familiar to most developers because Heroku pioneered them.</p>

<p>If we boil the category down to a single question, “how hard is it to take an app I have running locally and get it running in the cloud?”, then we’d give Heroku an <strong>A</strong>. To this day, Heroku sets the bar for low-friction shipping.</p>

<h2 id="heroku-debugging-friction">Heroku: Debugging Friction</h2>

<p><figure>
  <img alt="Black-and-white, hand-drawn pencil-style illustration of a server rack standing on a tiled floor. Several cartoon-style bugs with rounded bodies and large, friendly eyes are peeking out from the sides and edges of the rack, suggesting hidden software “bugs.” Some small bugs hover nearby with dotted motion lines. The scene uses soft cross-hatching and a subtle paper texture, with a clean, minimal, and slightly whimsical infographic style." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/13b18556-8c84-430b-3444-ff4233d68000/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/13b18556-8c84-430b-3444-ff4233d68000/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/13b18556-8c84-430b-3444-ff4233d68000/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            Bugs will be bugs&hellip;
          </figcaption>

</figure>
</p>

<p>Figuring out why your prod app is on fire isn’t ever <em>easy</em> per se, but there are things a platform can do to (hopefully) make it <em>easier</em>. We’d normally split this concept into two camps:</p>

<ol>
<li>The platform’s <em>native</em> tooling for viewing and searching logs, seeing metrics, and assessing what’s failing</li>
<li>How easy the platform makes it to add third-party software (APMs, scalers, etc.) which can provide even more visibility</li>
</ol>

<p>But Heroku sort of has a third — or maybe a <code>2a</code>. Add-ons. Entire third-party software <em>suites</em> that can bolt onto your application with (typically) <em>zero</em> extra configuration required, with a single click. That’s neat!</p>

<p>But let’s start with #1. How good is Heroku’s <em>native</em> tooling for figuring out why production is on fire? Eh 🤷‍♂️. There’s good and bad.</p>

<p>Being able to fire up <code>heroku run</code> ad-hoc at any point <em>is</em> handy, but it can take a little while to spin up and runs on its own separate VM. Heroku also allows SSH’ing into running dynos, but there’s an ephemerality you need to keep in mind: a dyno that’s currently ‘on fire’ may well restart and shut you out at any time if the platform control plane decides it’s on fire <em>enough</em>. Essentially, there are times when the control plane feels more authoritative than the actual resources! That’s helpful sometimes, but harmful others.</p>

<p>Heroku’s approach for giving you helpful log parsing and tooling is essentially just to not do that. The CLI allows you to tail your real-time logs (as does the web UI) but you’ll have to pipe that into other tools if you want to do anything more than just read logs whizzing by. It’s accessible quickly enough that it can be useful, just hit <code>heroku logs -t</code>, but depending on your app’s RPS it may be <em>way</em> too much info to be useful to human eyes on a terminal.</p>
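<p>That “pipe it into other tools” workflow is plain Unix plumbing. A minimal sketch (the sample lines below are made up to mimic Heroku router output; in real use you’d pipe <code>heroku logs --tail</code> instead of <code>printf</code>):</p>

<pre><code># Filter a log stream down to router errors only.
# Sample lines are illustrative; replace printf with `heroku logs --tail`.
printf '%s\n' \
  'heroku[router]: at=info method=GET path="/" status=200 service=12ms' \
  'heroku[router]: at=error code=H12 desc="Request timeout" path="/slow"' \
  'app[web.1]: Completed 200 OK in 11ms' \
| grep 'at=error'
</code></pre>

<p>Swap <code>grep</code> for <code>awk</code>, or point a log drain at a hosted logging service, once raw tailing stops being enough.</p>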

<p>On the metrics front, Heroku’s dashboard metrics display is… fine. Heroku isn’t an APM and doesn’t install a package into your code, so it really doesn’t have access to the sort of application-level stats we might be interested in these days. But it <em>is</em> a reasonable readout of throughput, memory, errors, and <em>dyno load</em>, though they lose points for the last one. “Dyno load” is an opaque and unhelpful metric derived from opaque resource-sharing algorithms for their <code>Standard</code> dynos. How much dyno load <em>should</em> you use? 🤷‍♂️</p>

<p>Lastly in the native-tooling group, Heroku <em>does</em> provide a cron-ish scheduling system that’s first-party (even though it’s installed as an add-on), but it’s just not great. We actually used it for years before deciding to <a href="/blog/heroku-scheduled-jobs">move away</a>. It’s <em>fine</em> for very small apps and/or non-critical jobs, but it’s not something that scales well with a growing application. Though, to be fair, we wouldn’t consider a “heavy duty scheduler” a responsibility of a hosting platform. That’s something you <em>should</em> implement inside your application layer one way or another.</p>

<p>On to #2: how easy Heroku makes it to add third-party software. The short answer is that Heroku makes it <em>very</em> easy. Heroku decided early on to invest in infrastructure for an “add-on” system that made installing third-party libraries as easy as installing a new app on your phone. In fact, Judoscale was born on the <a href="https://elements.heroku.com/addons/judoscale" target="_blank" rel="noopener">Heroku Marketplace</a> before we branched out to supporting many platforms! Maybe you want Scout for your APM, a MySQL DB for your data, and ElasticSearch for a search index across that data. All of those can be set up with just a click or two from the <a href="https://elements.heroku.com/addons" target="_blank" rel="noopener">Heroku Marketplace</a>. Handy!</p>

<p>Aside from the Marketplace, it’s also easy to install third-party libraries or software the old-fashioned way: <a href="/heroku-pricing#why-are-these-prices-different-from-the-heroku-elements-page">signing up directly</a>. Heroku doesn’t restrict dynos’ <em>outbound</em> internet access by default, so third-party libraries that need to verify license keys, send data somewhere, or otherwise talk to some server configure and work fine. Given that environment variable control on Heroku is quite simple too, the “DIY” third-party software path is nearly as simple as the Marketplace path. The only friction we’ve ever experienced is the occasional tool wanting dynos to have static IPs… but there are add-ons that <a href="https://elements.heroku.com/addons#network" target="_blank" rel="noopener">do just that</a>!</p>

<p>Beyond #1 and #2, we also need to consider Heroku’s customer support for when we experience real platform issues. How long does it take to get helpful, human support when something breaks? Well, it looks and feels just about like:</p>

<p><figure>
  <img alt="Black-and-white pencil-style cartoon of an empty “Heroku Support” kiosk with cobwebs, while a simple figure seen from behind shrugs in confusion at the unattended desk." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e4b64ddb-2700-431b-0e02-c3bc386aae00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e4b64ddb-2700-431b-0e02-c3bc386aae00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e4b64ddb-2700-431b-0e02-c3bc386aae00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            Uhh&hellip;. Help?
          </figcaption>

</figure>
</p>

<p>Unless you pay <em>hefty</em> fees for an elevated support tier on an enterprise contract, Heroku’s support response times can be <em>rough</em>. You can be paying thousands per month for all the dynos you’d like but still have to wait actual <em>days</em> before you get a response to a critical issue you submitted a ticket for. If you’ve ever tried, you know. That stinks.</p>

<p>If we boil the concept of ‘debugging friction’ down to a single question, “how hard is it to figure out what’s on fire?” (which, to be fair, is an enormous question that your hosting provider holds just a slice of the responsibility for), we’d give Heroku a <strong>B</strong>. The tooling is mature and reliable, as both points #1 and #2 above cover, but we can’t deny the awful experience of their customer service ticketing. Nonetheless, the likelihood of needing to open a ticket remains low, so we have to balance the weight there.</p>

<h2 id="heroku-infrastructure-friction">Heroku: Infrastructure Friction</h2>

<p><figure>
  <img alt="Black-and-white, hand-drawn pencil-style illustration of a server rack labeled “HEROKU” positioned in a mechanic’s shop bay. The rack stands on a tiled floor, surrounded by tools such as wrenches, a toolbox on wheels, and a mechanic’s creeper, suggesting it is about to be worked on. A workbench with additional tools and containers sits in the background, along with a hanging shop light and a closed garage door. The scene uses soft cross-hatching and a subtle paper texture, with a clean, minimal, infographic-like style that conveys infrastructure as something requiring hands-on maintenance." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f793469c-9200-44c5-47ee-dc833857a700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f793469c-9200-44c5-47ee-dc833857a700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f793469c-9200-44c5-47ee-dc833857a700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            Get your tools ready.
          </figcaption>

</figure>
</p>

<p>Our application is shipped, prod is running smoothly, and we’re starting to think about week-over-week maintenance and upkeep. At this point we need to consider how much of our platform footprint requires our own active involvement across months and seasons — how much friction our infrastructure causes in our day-to-day for an existing app: “How often do I have to (re-learn how to) fix this stuff?” Let’s enumerate the basics.</p>

<p>When it comes to environment variables, Heroku’s model is almost aggressively simple. There’s no separate secret manager, no multiple layers of injection depending on build vs. runtime, and no ambiguity about where a value is coming from. It’s a single, flat interface per application; you set it once, and your app has it. Done. It’s not the most flexible system in the world, but it’s extremely predictable, and predictability reduces friction.</p>
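<p>A minimal sketch of that flat model (the app name and config key here are hypothetical, not anything Heroku prescribes):</p>

```shell
# Setting and reading a config var via the Heroku CLI (requires the CLI;
# "my-app" and SCOUT_KEY are hypothetical):
#   heroku config:set SCOUT_KEY=abc123 --app my-app
#   heroku config:get SCOUT_KEY --app my-app
#
# Inside the dyno there's no secret manager to query; the value arrives as an
# ordinary environment variable, simulated here locally:
SCOUT_KEY=abc123 sh -c 'echo "Scout key is $SCOUT_KEY"'
# -> Scout key is abc123
```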

<p>The same story shows up in domains and SSL. Adding a custom domain and getting HTTPS working is deliberately designed to be a few clicks, then something you just don’t think about again. They even do some <a href="/blog/heroku-ssl-revisited">special magic</a> we’ve written about — some neat tricks for correctly configuring SSL when you’ve opted to use Cloudflare in front of Heroku… it all just works! No provisioning your own certs, no renewal tasks every few months (<a href="https://www.heroku.com/blog/preparing-for-shorter-ssl-tls-certificate-lifetimes/" target="_blank" rel="noopener">or more</a>&hellip;), just a 🔒 in your browser address bar that you will (lovingly) ignore for the rest of your app’s lifetime.</p>

<p>Networking and routing is a <em>similar</em> story but has its own tradeoffs. When you run an app on Heroku there are no ports to configure and no connections to set up; you don’t own the routing or load balancing layers at all. <em>But</em>, Heroku’s “load balancer” actually isn’t one. As we’ve mentioned in <a href="/blog/request-queue-time">“Understanding Queue Time: The Metric that Matters”</a>, Heroku’s router uses a <em>random</em> routing algorithm. There’s no load balancing! So, while Heroku does grant the wonderful simplicity of ‘your app listens for requests on a port, Heroku handles the rest’, the one caveat is that you should take just a few minutes and read a primer on how random routing might impact your app. The article I just linked is exactly that 😜.</p>

<p>There are, of course, a couple of rough edges. Buildpacks can get tricky when you need system-level dependencies. Performance characteristics of dynos are frustratingly opaque, especially when you start caring about CPU vs memory (and please don’t get me started on <a href="/blog/shared-hardware-how-bad-can-it-get">noisy neighbors</a>). And while Heroku’s abstractions are usually quite helpful, they can be limited once you venture into multi-region replication, strict compliance requirements, and truly private networking. Heroku has a lot of features and capabilities in those spaces, but some of the “it just works” shine might fade.</p>

<p>But the point here isn’t to judge whether or not the platform can do <em>everything</em> — most can if you’re willing to fiddle enough. The point is about assessing how <em>often</em> the platform makes us think about these configurations and setups in the first place. Infrastructure friction is about the recurring cost of the platform in terms of our own time.</p>

<p>Heroku’s abstractions let us think about our infrastructure configuration and maintenance, year over year, less than just about any other host. For that reason, we give it an <strong>A-</strong> on infrastructure friction. Points lost for “dyno” resource opaqueness! </p>

<h2 id="heroku-organizational-friction">Heroku: Organizational Friction</h2>

<p>Finally, let’s talk about the stuff managers and owners usually care about more than boots-on-the-ground developers. This is less about the mechanics of actually building on the platform and more about the ripple effects of that platform up the chain-of-command. Remember, <em>someone</em>’s got to actually pay the bill!</p>

<p>And we might as well start with the bill. Heroku is notoriously the “worst” deal in PaaS’s. Just about any way you slice the performance-per-dollar, Heroku is more expensive than everyone else. We built a <a href="https://judoscale.com/tools/paas-pricing-calculator" target="_blank" rel="noopener">PaaS Price Calculator</a> that makes that much clear:</p>

<p><figure>
  <img alt="Screenshot of the PaaS Pricing Calculator with default values set showing Heroku at about $1850 per month while Render is at $850 and Fly is closer to $600" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/ab709813-e4dc-42ea-4067-84e26032e600/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/ab709813-e4dc-42ea-4067-84e26032e600/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/ab709813-e4dc-42ea-4067-84e26032e600/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            Based on default app values in our calculator
          </figcaption>

</figure>
</p>

<p>And that is perhaps the greatest knock against Heroku in this entire rundown. While maybe not the first thing that jumps to mind when thinking about a “friction model”, there is <em>absolutely</em> friction here: cost friction! Friction incurred when having to hand over all those dollars every month instead of, in some cases, <em>half</em>! We’ll save the deeper discussion for another day, but just understand that the balance here is a higher cost vs. all of the low-friction abstractions we described in all of the paragraphs above. Heroku’s <em>schtick</em> is paying for simplicity. It always has been, it likely always will be. You’re buying back time that you might otherwise have to spend on infrastructure and hosting tasks.</p>

<p>Along with that idea, your platform choice has implications on team structure. To be frank, you shouldn’t need dedicated operations engineers if you’re running on Heroku. So maybe that potential savings accounts for some of the cost, but we’ll leave that to your own discretion. The truth is that <em>many</em> large applications and businesses, Judoscale included, began with <em>one</em> developer building and deploying an app on Heroku. Heroku allows the “one dev shop” to scale enormously in ways that more complex platforms would not — Adam talked about this quite a bit in <a href="/blog/black-box-hosting-vs-glass-box-hosting-an-interview-with-adam">our interview</a> — most teams can go a <em>long</em> time before needing more complexity and control than Heroku gives. We’ve seen companies doing <em>billions</em> in annual revenue humming along perfectly fine on a cluster of Perf-L’s!</p>

<p>When it comes to uptime concerns (depending on how you count them and consider magnitude), Heroku has had 2-4 major outages in the last two years, each lasting at least a few hours and impacting most of their customers. That’s still somewhere in the “three nines” region, and frankly, I don’t know of any hosting platform shooting much higher. Heroku outages just tend to be more prominent in developer news given <em>how many</em> applications run on Heroku. As for the earlier question, “How often is my CTO going to hear the word ‘Heroku’?” (outages being the main driver), the answer is probably once or twice a year. Therefore, we’d consider their uptime to be solidly “good”.</p>

<p>If we boil the concept of organizational friction down to simply, “how does using this platform impact my business beyond my developers?”, then there are a few plain answers: it’s going to cost a lot, it’s going to save you from hiring quite a bit, and it’s going to be boring and unmentioned <em>almost</em> all the time. Generally speaking, most organizations are into that tradeoff, which is why Heroku has been so successful. Nonetheless, given that Heroku is no longer the only fish in the sea, <em>and</em> that its competitors have kept up with modern compute hardware and pricing adjustments so much better, we give Heroku a <strong>C</strong> for organizational friction. It’s a very good service, but it just shouldn’t be as expensive as it is in 2026.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>There’s one last bit of “organizational friction” that’s more amorphous and would be difficult to give a grade because it’s entirely subjective: trust in the platform and company itself. Do you trust that the platform is moving in the right direction? Building according to your needs and interests? Has the best intentions? Looks out for the customer’s needs?</p>

<p>We’re not going to factor it into our grade here (a <strong>C</strong> already is what it is), but this is perhaps the single biggest pain point for us with Heroku right now. Heroku has absolutely <em>burned</em> a whole lot of developer trust in the last two months. Unclear messaging, vague direction, <del>a “Sustaining Engineering” model</del> lots of corporate hand-wavy verbiage… trust in Heroku is at a many-year low.</p>

<p>Tangibly, we’d love to see platforms provide public development roadmaps, transparent communication when things go wrong (or right!), and open spaces for developers to provide feedback that’s taken seriously. Heroku currently fails on all three fronts.</p>

  </div>
</div>

<h2 id="let-s-wrap-it-up">Let’s Wrap It Up</h2>

<p><strong>Heroku</strong>:</p>

<ul>
<li>Shipping Friction: <strong>A</strong></li>
<li>Debugging Friction: <strong>B</strong></li>
<li>Infrastructure Friction: <strong>A-</strong></li>
<li>Organizational Friction: <strong>C</strong> (for ‘<strong>C</strong>ost’ 😆)</li>
</ul>

<p>Perhaps the primary opinion we gave in our “<a href="/blog/heroku-whats-next">Heroku: What’s Next</a>” article was, “Heroku’s still fine, we’ve got years”. That opinion feels worth reiterating here because it’s hard to look at these grades and <em>not</em> choose Heroku. It’s not a perfect platform by any means, but it still sets quite a high bar in 2026, cost aside.</p>

<p>Our “friction model” grading mechanism and rubric, if anything, might be helping to remind us why we chose Heroku in the first place. It wasn’t flashy features or landmark architectures… Heroku just removes a <em>lot</em> of friction in a lot of different places. You can ship quickly and easily, debug reasonably, focus on building your product, and largely not worry about Heroku at all. That’s nice!</p>

<p>But, of course, this sets the stage for what’s coming: it’s time to find out if this experience and ease of use holds up elsewhere. It’s time to go on tour and move our production application to each of the competitors; time to see how much friction exists on other platforms. We want to find out what’s harder, what’s easier, what’s faster, and what’s rough. That’s the real test, but now we have our baseline!</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p><strong>Reminder</strong>: This article is the first post in our “Judoscale going on tour” series, where we put our money where our mouth is and migrate Judoscale to various platforms. No holding back, no keeping background jobs somewhere else, no splitting traffic.</p>

<p>Judoscale is a 24/7 real-time reactive production application. We receive well over 3,000 RPS every moment of every day. Our downtime is <em>exceedingly</em> rare (generally only when Cloudflare or Heroku themselves have issues), but then, it darn well should be! We’re an autoscaler! We <em>need</em> to be online, regardless of traffic load, so that we can reactively scale our clients’ applications correctly and appropriately any time of day.</p>

<p>Sign up for our newsletter to join us on this tour as we discover the nooks and crannies of 2026’s available PaaS’s. If you’ve been thinking about moving, let us feel the pain first — we’ll tell you all about it 😆.</p>

  </div>
</div>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Heroku: What’s Next</title>
      <description>Heroku is shifting to a sustaining engineering model. Here’s what that means, whether you should migrate, and how the top alternatives compare.</description>
      <pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/heroku-whats-next</link>
      <guid>https://judoscale.com/blog/heroku-whats-next</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p>In a move that surprised many of us — and one which I still can’t determine the business sense in making at all — Salesforce <a href="https://www.heroku.com/blog/an-update-on-heroku/" target="_blank" rel="noopener">officially announced</a> last week that Heroku will be moving into a “<em>sustaining engineering model</em>”. That’s essentially giant-software-corporation-speak for, “we’re putting this into maintenance mode”. The platform that taught a generation of developers to “push to deploy” has reached its investment limit from its owners 😕.</p>

<p><figure>
  <img alt="An AI-generated image of a pencil-sketch style scene, with a single server rack in a large space, and a sign hanging on that server rack which reads “Heroku Servers”, while several wrenches and tools are on the ground next to the rack, likely to be left there and not used again" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e62be00b-a705-4806-f10b-d9bde603fd00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e62be00b-a705-4806-f10b-d9bde603fd00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/e62be00b-a705-4806-f10b-d9bde603fd00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            "Our work here is done"
          </figcaption>

</figure>
</p>

<p>Now, before you jump straight to “abandon ship!!”, there are real questions we should think about when looking ahead. Heroku is still an excellent platform, runs very stably, and, to this day, has the smoothest DX for getting an application into production. For those of us with production apps currently running on Heroku, we need to be pragmatic about what this announcement means for our present, our future, and our time! </p>

<p>Salesforce’s announcement should ultimately drive a calm, collected conversation around both timing and execution. Heroku isn’t a sinking ship, it’s just done shipping new features.</p>

<h2 id="let-s-be-honest-about-urgency">Let’s Be Honest About Urgency</h2>

<p>Urgency itself is a function of two inputs: having a thing to do and believing that you must do that thing <em>soon</em>. The sooner you believe you must do it, the more urgent it will feel. So allow me to reiterate the point I made above and mix in some urgency:</p>

<p><strong>Heroku is not dying today, tomorrow, next month, or next year.</strong></p>

<p><strong>It is <em>not</em> urgent that you migrate away from Heroku</strong>.</p>

<p><figure>
  <img alt="An AI-generated image of a pencil-sketch drawing depicting a person taking a deep breath, with arrows that indicate ‘inhale’ and ‘exhale’, while they have a smile on their face as air leaves them" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2ac05764-25ef-438a-c3a2-d127f4901a00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2ac05764-25ef-438a-c3a2-d127f4901a00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2ac05764-25ef-438a-c3a2-d127f4901a00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            "Breathe"
          </figcaption>

</figure>
</p>

<p>The Salesforce announcement might serve to give you the first component of urgency: we’ll all have a ‘thing to do’ at some point: migrate to another platform. But it certainly does <em>not</em> give the second component (‘do that thing soon’). Heroku isn’t going anywhere. And, if you recall the late two-thousand-teens, this isn’t even the first time that Heroku will spend some years running without major feature improvements! We sincerely believe it’ll be a few <em>years</em> before there’s any real pressing need to migrate off Heroku if you’re already successfully running your production app there.</p>

<p>I don’t want to come off like a Heroku shill here, so let me clarify why I’m pushing back against the hype and panic. It has nothing to do with Heroku’s bottom line or expensive servers. It has to do with your team’s time spent shipping useful features that will grow the value of your app and/or business.</p>

<p>Even in the best of circumstances and setups, migrating platforms takes time. It requires testing, planning, mapping, and careful execution to ensure that you’re not dropping traffic or upsetting customers along the way. It’s <em>work</em>. All of this work has opportunity cost: you <em>won’t</em> be building and shipping the features and enhancements that your customers want. You <em>won’t</em> be improving your application or business. At the end of the day, your customers don’t care how or where you host your app. They just want it to work and provide them value!</p>
<blockquote><p>Okay fine but give me an actual recommendation here?</p>
</blockquote>
<p>Sure. Deep breath. Let the panic subside: most applications currently running on Heroku <em>shouldn’t worry about migrating until next year</em> (2027) at the earliest. If you have an enterprise contract, you should renew it in 2026.</p>

<p><figure>
  <img alt="An AI-generated image depicting a simple block-lettered message as a pencil sketch on paper, reading: “don’t worry about migrating yet.”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/44757617-96bc-4e18-0168-f5a2ae4c8700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/44757617-96bc-4e18-0168-f5a2ae4c8700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/44757617-96bc-4e18-0168-f5a2ae4c8700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>You already chose Heroku, you’re already set up on Heroku, your app is already running <em>fine</em> on Heroku. You should try to capitalize on <em>those</em> gains as long as possible (especially if you have enterprise/discount pricing!). “Heroku isn’t going to get any new major features” doesn’t actually prevent you from realizing the value of your initial investment into “I want managed hosting I don’t have to worry about”. Moving to another PaaS would still satisfy the “I want managed hosting&hellip;” but the migration itself is an additional investment and cost that you simply don’t need to make yet. Take a deep breath and go build your app / business! That <em>does</em> reap value <em>today</em>.</p>

<p>😮‍💨</p>

<h2 id="looking-at-the-alternatives">Looking at the Alternatives</h2>

<p>Nonetheless, I know many readers are still going to queue up migrations in the coming months. Maybe that’s discomfort, simply having time available to migrate, or a bad taste in the mouth. I get it! Even as I wrote the paragraphs above I felt some of those same tensions. Honoring those thoughts (and knowing that the future will come eventually) it feels worthwhile to talk through some of the migration paths an existing Heroku app has ahead.</p>

<p>We’re going to evaluate each option in three primary lenses:</p>

<ul>
<li><strong>Migration effort</strong>: how painful it would be to migrate a full production Heroku app to this new setup</li>
<li><strong>Ongoing operational load</strong>: how it <em>feels</em> (subjectively) to use over time — things like CLI, “hop into prod console”, control and tweakability, etc.</li>
<li><strong>Cost structure</strong>: how expensive is this new setup compared to Heroku, and how is it billed differently?</li>
</ul>

<p>Then we’ll give our general take on each path outside of those three parameters. Today’s challengers:</p>

<ul>
<li>Render</li>
<li>Fly.io</li>
<li>Railway</li>
<li>Run-it-Yourself Systems</li>
</ul>

<p>But today’s look isn’t our one-time “here’s the truth” post, it’s just a preview. We’ll give you our opinions here today based on our work integrating with most of these platforms and running various apps on them over the last three years, but we’re planning on going deeper in the coming months: Judoscale is going <a href="#judoscale-on-tour">on tour</a>. More on that below, but we’ll be moving our 3,000+ RPS production app to each of these platforms to <em>really</em> feel out what it looks like for a production app that can’t go down!</p>

<h2 id="render-the-obvious-choice">Render: The Obvious Choice</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward the Render logo on the right" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/6cabdc59-2557-4533-b412-54752a5ba900/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/6cabdc59-2557-4533-b412-54752a5ba900/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/6cabdc59-2557-4533-b412-54752a5ba900/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>If you asked me for a simple, single-sentence recommendation for most teams, it’s going to be Render. <em>Many</em> folks have described Render as, essentially, “the natural progression of Heroku” — perhaps what Heroku could’ve become had it never been acquired by Salesforce. I think this is mostly due to Render sharing many of the same philosophies as Heroku (fully managed PaaS, auto build detection, etc.) but just having been built fresh many years <em>after</em> Heroku: the Render devs had the chance to reimagine the Heroku UX from the ground up with plenty of Heroku experience to draw from.</p>

<p><strong>Migration effort</strong>. Any migration is going to take effort, but things are pretty smooth here. Heroku to Render is a <em>well</em>-trod path at this point and Render’s own team offers <a href="https://render.com/docs/migrate-from-heroku" target="_blank" rel="noopener">migration assistance</a> for those coming from Heroku. The mental model is broadly the same and you’ll feel at home within a few minutes of logging into the Render dashboard. The only gotcha to keep in mind is around buildpacks and system dependencies. Render does supply some base-level buildpacks that should cover most apps, but if your app requires specific system dependencies beyond their <a href="https://render.com/docs/native-runtimes#tools-and-utilities" target="_blank" rel="noopener">included set</a>, you may need to build out a Dockerfile. Where on Heroku buildpacks themselves can be composable, Render’s approach is simply, “stay on the rails or bring your own <code>Dockerfile</code>” (more <a href="https://render.com/docs/docker#docker-or-native-runtime" target="_blank" rel="noopener">here</a>). </p>

<p><strong>Ongoing operational load</strong>. Again here, this one’s going to feel just like Heroku. They handle the infrastructure, you just merge to <code>main</code>. Metrics and web dashboard UI are all friendly and available, logs can be pushed wherever you need, manual rollbacks are simple and accessible, there’s a broad CLI for control if you prefer that style, you can take your favorite <a href="/render">autoscaler</a> with you… the list goes on. Essentially everything you love about Heroku exists in Render in parallel or enhanced form.</p>

<p><strong>Cost structure</strong>. Of all the platforms and paths we’ll look at today, Render’s cost structure and setup matches Heroku’s the most. Like Heroku, their pricing revolves around pre-set, <a href="https://render.com/pricing#services" target="_blank" rel="noopener">per-month pricing</a> depending on which instance types (e.g. “dyno type”) you need. <em>Unlike</em> Heroku, they’re actually clear about how many vCPU cores you’re paying to hold (🎉). In terms of real cost, our rough estimate is that, depending on the composition of your app and resources you need, you’ll likely save 20-30% off your current Heroku bill for similar resources on Render.</p>

<p>Our general takeaway on Render is that it’s the right choice for the grand majority of currently-on-Heroku apps. It’s a near-seamless transition, the billing operates the same, the operational overhead for engineers learning the new platform is very low, and most apps will be able to get up-and-running within a day.</p>

<h2 id="fly-io-a-little-more-complicated-a-little-more-interesting">Fly.io: A Little More Complicated, A Little More Interesting</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward the Fly.io logo on the right" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5df41937-3257-41df-e17c-19462c7fc300/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5df41937-3257-41df-e17c-19462c7fc300/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5df41937-3257-41df-e17c-19462c7fc300/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Still mostly on the high-level-PaaS layer, Fly.io was built to accomplish a different goal. Fly’s whole <em>thing</em> is distributing your app geographically so that your users will always hit an application server close-by, and doing so with “Fly machines” — micro-VMs with much smaller footprints than full-on Docker containers. Fly is also heavily optimized for its powerful CLI and config tooling. Fly is <em>tremendously</em> flexible and configurable, but comes with the cost of complexity: a steep learning curve!</p>

<p><strong>Migration effort</strong>. Like Render, Fly has written <a href="https://fly.io/docs/getting-started/migrate-from-heroku/" target="_blank" rel="noopener">guides</a> specifically for those migrating from Heroku, including framework-specific guides in many cases (<a href="https://fly.io/docs/rails/getting-started/existing/" target="_blank" rel="noopener">Rails</a>, <a href="https://fly.io/docs/django/getting-started/existing/" target="_blank" rel="noopener">Django</a>, <a href="https://fly.io/docs/python/frameworks/fastapi/" target="_blank" rel="noopener">FastAPI</a>, <a href="https://fly.io/docs/python/frameworks/flask/" target="_blank" rel="noopener">Flask</a>, etc.) to help explain nuances. And these guides are certainly helpful, but there’s no getting around the paradigm shift: Fly is a fundamentally different platform from Heroku and doesn’t operate quite the same. There <em>is</em> going to be a learning lift as you get familiar with its UI tooling and <code>flyctl</code> CLI tool — the latter of which you <em>absolutely will</em> want to become highly familiar with.</p>

<p><strong>Ongoing operational load</strong>. Like other PaaS’s, Fly can absolutely be configured to do the simple deploy-on-<code>main</code> thing and includes built-in metrics dashboards, logging basics, and standard machine health checks, but you’ll find a lot of utility in <code>flyctl</code>. Restarting instances, changing environment variables, spinning up secondary production instances&hellip; all simple <code>flyctl</code> commands once you learn them! If you’re not already a heavy terminal user, dive on in. Fly exposes more primitives and control around lower-level constructs than most PaaS’s (think: direct VM controls, volumes, storage, regions, etc.) and most of that is controlled via <code>flyctl</code>. So there’s more flexibility, but again, a steeper learning curve. Oh, also, you can still take your favorite <a href="/fly">autoscaler</a> with you!</p>
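<p>As a sketch of what that day-to-day looks like, here are a few common <code>flyctl</code> invocations (the app name <code>my-app</code> and the secret value are placeholders, and exact subcommands can vary between <code>flyctl</code> versions, so treat these as illustrative rather than canonical):</p>

```shell
# Check the current state of your app's machines
fly status -a my-app

# Change an environment variable (secrets are Fly's env-var mechanism)
fly secrets set SOME_FLAG=enabled -a my-app

# Scale out to three machines
fly scale count 3 -a my-app

# Tail production logs
fly logs -a my-app
```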

<p><strong>Cost structure</strong>. Fly walks a sort of middle-ground between resource tier-based pricing and metered usage, which makes it easy to jump around to different scale sizes, tweak your RAM levels, and scale vertically as needed. Prices are <a href="https://fly.io/docs/about/pricing/#started-fly-machines" target="_blank" rel="noopener">per second</a> of machine runtime, extra RAM can be added wherever you want (very cool), and Fly offers everyone a (massive) <a href="https://fly.io/docs/about/pricing/#machine-reservation-blocks" target="_blank" rel="noopener">40% discount</a> when you opt to pre-reserve compute time — no enterprise contract required. If that sounds like a lot of levers to pull and tweak, that’s because it is. Again, Fly’s schtick here is configurability.</p>

<p>My take: if you’re the kind of person who was driving an automatic Honda Civic and already felt for years like you just wanted more of a car-person’s kind of car, then it’s probably true that Heroku’s recent announcement didn’t change anything for you — your Civic is still a Civic. But it’s understandable that Salesforce has, in some way or another, shaken you into realizing your dream. If you’re after that ‘69 Big Block Mustang with a four-barrel carb that you can tune <em>juuuuust</em> right&hellip; then Fly might be for you. This metaphor may have gone too far. Fly is complex. There are neat value-adds with that flexibility, but they come at a cost: there’s more to learn, more to understand, and more to manage.</p>

<h2 id="railway-not-exactly-our-way">Railway: Not Exactly Our Way</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward the Railway logo on the right" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f08c1223-608c-43d6-ddab-c45f65abc100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f08c1223-608c-43d6-ddab-c45f65abc100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f08c1223-608c-43d6-ddab-c45f65abc100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We’re not out to bash any hosting providers, especially ones that <a href="/railway">we support</a> autoscaling on, but we also need to be honest: our experience with Railway has been pretty lackluster. All other bells and whistles aside, we had the worst actual system performance on Railway. It wasn’t because of dependent services or database latencies or anything like that; we just found our real, pure compute performance to be worse on Railway than on any other platform. <strong>It was just plain slower</strong>.</p>

<p>We can’t tell you why that’s the case, and at the same time, we love that Railway’s schtick is running their own metal in datacenters rather than reselling metal they rent from the big three. That’s awesome! But we suspect that economies of scale are a relevant factor here.</p>

<p><strong>Overall</strong>, we would not recommend Railway at this time. We love the mission and the goal, but we had a less-than-great time. For the sake of being positive-outlook community members, we’ll simply leave it at that!</p>

<p>Oh, and we <em>do</em> still plan on taking another full crack at Railway when we go <a href="#judoscale-on-tour">on tour</a> — see more below.</p>

<h2 id="the-more-hiy-stuff">The More HIY Stuff!</h2>

<p><figure>
  <img alt="An AI-generated image of a simple sketch, the Heroku logo on the left, and an arrow in the middle pointing toward a small rack of servers with the simple label “Your Servers” above them" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fa4d1951-78f4-4102-1364-4f3e80321700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fa4d1951-78f4-4102-1364-4f3e80321700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fa4d1951-78f4-4102-1364-4f3e80321700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We live in a wonderful time of <em>options</em>! There are so many great options in the <strong>H</strong>ost-<strong>i</strong>t-<strong>Y</strong>ourself world, in many flavors and at many levels of hosting-it-yourself control. Bring-your-own-VPS tools like <a href="https://dokku.com" target="_blank" rel="noopener">Dokku</a>, <a href="https://hatchbox.io" target="_blank" rel="noopener">HatchBox</a>, <a href="https://coolify.io/" target="_blank" rel="noopener">Coolify</a>, and <a href="https://caprover.com" target="_blank" rel="noopener">CapRover</a> offer lightweight PaaS-like experiences with great flexibility, each with its own distinct tradeoffs and workflows. Going even more complex, container orchestrators (i.e. they coordinate the Kubernetes for you) like <a href="https://northflank.com" target="_blank" rel="noopener">Northflank</a>, <a href="https://www.porter.run" target="_blank" rel="noopener">Porter</a>, and <a href="https://www.qovery.com" target="_blank" rel="noopener">Qovery</a> let you “bring your own cloud” (be it your own metal, rented Hetzner boxes, AWS API keys, etc.) while still handling most of the complexities of Kubernetes cluster orchestration for you. And, of course, the big world of AWS itself — “Hop onto ECS Fargate!” or “Elastic Beanstalk, baby!” among other choices. There have truly never been so many ways to run the “Heroku experience” yourself!</p>

<p>Honestly, there are a <em>dizzying</em> number of ways to make the technologies at this level of hosting control work. For the sake of this article not turning into a book, we’re going to mostly leave them unmentioned here. The reality is that <strong>if you’ve been a happy Heroku customer, you shouldn’t go looking down this path</strong>. I know that’s a strong statement that might make a few of the “come to the DIY-side!” folks upset, but it’s a pragmatic truth. These are two wholly different worlds, demanding different levels of time and skill. Going ‘down’ a single layer in the hosting stack (as we perceive it) and getting into <em>Fly’s</em> ecosystem is already going to add overhead to your workflow as you learn to understand and handle its config complexity. Going all the way down to the HIY tooling is only going to add more ops time (or people!) to your app’s needs. If you’re happy with your PaaS level at Heroku, stay up there!</p>

<h2 id="the-real-answer">The Real Answer</h2>

<p>Let’s zoom out and take a deep breath. I still <em>fully</em> stand by my original sentiment above: Heroku isn’t going anywhere and will remain stable for years to come. There’s no urgency to move, and doing so will only detract from the hours you could be spending on your product itself at this point. Moving takes work. We can’t ignore that reality amidst the hype here.</p>

<p>Then, of course, conceding to those who are <em>for sure</em> going to move soon out of principle, spite, or otherwise disdain for Salesforce (which… I get), we covered some options. Render is the clearest, clean-cut, easy choice. Fly is more complicated but more interesting. Railway isn’t recommended at the moment. Host-it-yourself and bring-your-own-cloud solutions are more effort than a happy-on-Heroku team should take on.</p>

<p>So&hellip;. move to Render and call it a day? <strong>Not exactly</strong>.</p>

<p>As your resident auto-scaling experts for the last decade, who have integrated deeply with and provide autoscaling services for nearly all of the platforms previously mentioned, we have some opinions.</p>

<p><figure>
  <img alt="" src="https://media2.giphy.com/media/v1.Y2lkPTc5MGI3NjExYno0cHd1cGt4b2VuYWZjZmZ1NmxiamQ3MDVydnc5YmV4YmQwb2MwZyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/LpkBAUDg53FI8xLmg1/giphy.gif">
  
</figure>
</p>

<p>But our opinions are from last year (or prior). And they’re based on integration work with Judoscale. And, who knows, they might just be wrong. So we’re going to do something that we haven’t seen done before: <strong>we’re going on tour</strong>.</p>

<h2 id="judoscale-on-tour">Judoscale On Tour</h2>

<p>As much as I wish that meant a music tour around the US with <a href="https://www.linkedin.com/posts/adamlogic_railsconf2025-activity-7349094411043037185-vWMQ" target="_blank" rel="noopener">our kazoos</a>, we actually hatched up a better idea. Judoscale is a 24/7 real-time reactive production application. We receive well over 3,000 RPS every moment of every day. Our downtime is <em>exceedingly</em> rare (generally only when Cloudflare or Heroku themselves have issues), but then, it darn well should be! We’re an auto-scaler! We <em>need</em> to be online, regardless of traffic load, so that we can reactively scale our clients’ applications correctly and appropriately any time of day.</p>

<p>Sounds like the perfect app to move to each of these platforms / services to test some things out.</p>

<p>To be clear: our “going on tour” means that we’re going to migrate the Judoscale production application, including all traffic, DNS, configs, background workers, etc., to each of Heroku’s competitors, one at a time, and document every step along the way for you all.</p>

<p><figure>
  <img alt="" src="https://media3.giphy.com/media/v1.Y2lkPTc5MGI3NjExbW9udHZvNjJuNTc5cHY3c2g0NW5iajQzbWZvM3F4aGxjZXpjZjEzNiZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/J0BRQ3cXBycPm/giphy.gif">
  
</figure>
</p>

<p>So, again, our real recommendation here is simply to hang tight on Heroku. We’re going to take the plunge for you (many times over) and move our real-time, high traffic application ourselves. We’re going to find the rough edges. We’re going to feel the performance bottlenecks. We’re going to foot the literal bill and feel the DX each of these new platforms provides compared to ol’ purple.</p>

<p>If that sounds exciting to you, make sure you subscribe to our newsletter below. We’ll start with a full breakdown of all the things we love and use on Heroku, which will set forth our rubric for how to evaluate other platforms.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Latency-based Celery Queues in Python</title>
      <description>If you plan your Celery task queues around latency, you'll have more predictable (and scalable) results. Learn how to plan your Python queues around latency!</description>
      <pubDate>Tue, 17 Feb 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/latency-based-celery-queues-in-python</link>
      <guid>https://judoscale.com/blog/latency-based-celery-queues-in-python</guid>
      <author>Jeff Morhous</author>
      <content:encoded>
        <![CDATA[<p>If you’ve worked with Celery in production with real traffic, you’ve probably hit one of its many sharp edges. Maybe you’ve watched a simple background job silently pile up in an unmonitored queue.</p>

<p>Or maybe you’ve built out a tidy set of queues only to find your high-priority jobs are getting stuck behind slow (and unimportant) ones. Celery gives you powerful tools, but few guardrails.</p>

<p>These pain points usually stem from <strong>queue planning problems</strong>. Most teams slap labels like <code>high_priority</code> or <code>emails</code> on queues without defining what those mean.</p>

<p>If you plan your <a href="/blog/choose-python-task-queue">Python task queues</a> around latency, you&rsquo;ll have more predictable (and scalable) results. Ready to get started?</p>

<h2 id="the-basics-of-celery-queues">The basics of Celery Queues</h2>

<p>Before we get into queue planning, let’s clarify some Celery terminology. If you already have a great understanding of how Celery works, feel free to skip to the next section.</p>

<p><figure>
  <img alt="Celery queue diagram, showing a Celery queue, full of tasks, with worker processes" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/fc5eacfb-840c-4636-e0a6-e7a5b018cb00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/fc5eacfb-840c-4636-e0a6-e7a5b018cb00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/fc5eacfb-840c-4636-e0a6-e7a5b018cb00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h3 id="celery-tasks">Celery tasks</h3>

<p>In Celery, a <strong>task</strong> is a single unit of work. For example, <code>send_email_task</code> might send a welcome email.</p>

<h3 id="celery-queues">Celery queues</h3>

<p>A <strong>queue</strong> in Celery refers to a named channel on the broker (like a Redis list or RabbitMQ queue) where tasks wait to be processed. By default, Celery uses a queue named <code>&quot;celery&quot;</code> (if you don’t specify one).</p>

<h3 id="celery-workers">Celery workers</h3>

<p>A <strong>worker</strong> is a Celery process that runs tasks. A worker can run multiple tasks concurrently, depending on its concurrency setting.</p>

<h3 id="celery-concurrency">Celery concurrency</h3>

<p><strong>Concurrency</strong> refers to the number of tasks a worker can process at the same time. In prefork mode, this is the number of child processes (often defaults to the number of OS-reported CPUs).</p>

<h3 id="decisions-you-have-to-make-when-using-celery">Decisions you have to make when using Celery</h3>

<p>In a typical deployment, you must decide <strong>how many queues</strong> to use and what they are called, <strong>which tasks go to each queue</strong>, and <strong>how many worker processes</strong> will consume each queue.</p>

<p>You also choose how many threads/processes each worker has (concurrency) and how many total containers to run (horizontal scaling). That’s a lot of decisions!</p>

<p>So let&rsquo;s dig into how you can make these decisions with scaling in mind.</p>

<h2 id="why-celery-queues-run-into-problems-at-scale">Why Celery queues run into problems at scale</h2>

<p>Out of the box, Celery will use a single queue (usually named <code>&quot;celery&quot;</code> by default). If a task doesn’t specify a queue, it goes to the default queue. If you start a worker without specifying <code>-Q</code>, it will consume the default queue. </p>
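<p>For instance (assuming an app module named <code>proj</code>; the name is a placeholder), the difference between a default worker and a queue-pinned worker is just the <code>-Q</code> flag:</p>

```shell
# Consumes only the default "celery" queue
celery -A proj worker

# Consumes only the named queue, with 4 child processes
celery -A proj worker -Q urgent --concurrency=4
```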

<p>Could you build an app with just one queue? <strong>Sure.</strong>  But please don&rsquo;t.</p>

<h3 id="not-every-task-is-created-equal">Not every task is created equal</h3>

<p>For a brand-new project, one queue might work fine for a short while. But very soon, you’ll encounter scenarios that push you to create additional queues:</p>

<ul>
<li>You have a task that needs to run <strong>quickly</strong> (a high-priority job), so you want it processed before other tasks.</li>
<li>You have a task that takes a long time to run (perhaps several seconds or minutes), and you want it to have <strong>lower priority</strong> or even separate handling so it doesn’t block faster tasks.</li>
</ul>

<p>In response, teams might eventually create ad-hoc queues like <code>&quot;urgent&quot;</code> for high priority and <code>&quot;low&quot;</code> for slow tasks.</p>

<h3 id="ambiguous-queue-names">Ambiguous queue names</h3>

<p>However, there’s a big problem. <strong>Those queue names are ambiguous</strong>.</p>

<p>How urgent is “urgent”? What does “low” mean, exactly? As your application grows, you’ll find there are varying degrees of priority. One developer might add <code>very_urgent</code> or <code>critical</code> queues; another might introduce a queue for a specific feature like <code>reports</code> or <code>emails</code>.</p>

<p>Before you know it, you have a <strong>sprawl of Celery queues</strong> without a clear hierarchy or expectations.</p>

<h2 id="latency-based-queues">Latency-based queues</h2>

<p>Take a step back and consider what metrics define the “health” of a task queue. Three key metrics are commonly used:</p>

<ul>
<li>Worker CPU: How taxed is the CPU for worker processes?</li>
<li>Queue depth: How many tasks are waiting in the queue (queue length).</li>
<li>Queue latency: How long a task waits in the queue before a worker starts processing it (sometimes called queue time).</li>
</ul>

<p>CPU can be used, but it doesn&rsquo;t actually tell you everything about <em>the queue</em>. It simply gives an indication (and often a trailing one) of how busy the worker process is during an individual task. And task queues often back up without spiking CPU at all, giving a false sense of worker health.</p>

<p>Queue depth is easy to visualize (a simple count of jobs), so many people focus on it. But queue depth can be very misleading: the number of tasks doesn’t tell you how <em>long</em> they’ll take to clear.</p>

<p>For example, imagine two queues, each handled by one worker process:</p>

<ul>
<li>Queue A has 10 jobs enqueued, and each job takes ~1 second to run.</li>
<li>Queue B has 10,000 jobs enqueued, but each job takes ~1 <em>millisecond</em> to run.</li>
</ul>

<p>Queue B might look “backed up” at a glance, but in reality, both queues will finish their work in about 10 seconds. <strong>The <em>latency</em> (wait time) for jobs in both queues is the same ~10 seconds</strong>, which is the metric that truly matters.</p>
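<p>The arithmetic behind that claim is worth making explicit: with one worker per queue, the wait for the <em>last</em> job is roughly the number of jobs ahead of it multiplied by the seconds each takes. A quick sketch in plain Python:</p>

```python
# One worker per queue: the last job waits for everything ahead of it.
queue_a = {"jobs": 10, "seconds_per_job": 1.0}        # few, slow jobs
queue_b = {"jobs": 10_000, "seconds_per_job": 0.001}  # many, fast jobs

def worst_case_latency(queue):
    """Approximate wait time (seconds) for the last enqueued job."""
    return queue["jobs"] * queue["seconds_per_job"]

print(worst_case_latency(queue_a))  # 10.0
print(worst_case_latency(queue_b))  # ~10.0 as well
```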
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p><strong>Queue latency</strong> tells the real story about how well a queue is doing.</p>

  </div>
</div>

<p>So, is a 10-second wait time good or bad? <strong>It depends.</strong></p>

<p><figure>
  <img alt="It depends meme, showing Celery queue latency health is a complicated decision" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b71aa293-84a1-4982-edee-567358874700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b71aa293-84a1-4982-edee-567358874700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/b71aa293-84a1-4982-edee-567358874700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>The acceptable latency for a queue is a business decision. It depends on what the tasks are doing and how quickly that work needs to begin. This brings us back to the notion of “urgency”, but now we can quantify it. Instead of calling a queue &ldquo;urgent&rdquo; in a vague sense, we decide what latency is acceptable for that queue’s tasks.</p>

<h2 id="latency-sla-queue-names">Latency SLA queue names</h2>

<p>If you&rsquo;re convinced <strong>queue latency</strong> is the right metric to measure performance, you should fix the ambiguity in your queue names. Naming your queues after their latency targets (SLAs) is a great way to set yourself up for success.</p>

<p>For example:</p>

<ul>
<li>“urgent” becomes <code>within_5_seconds</code> (tasks should start within 5 seconds)</li>
<li>“default” becomes <code>within_5_minutes</code> (tasks should start within 5 minutes)</li>
<li>“low” becomes <code>within_5_hours</code> (tasks should start within 5 hours)</li>
</ul>

<p>If I push a task to the <code>within_5_seconds</code> queue, I’m explicitly saying I expect that job to begin processing within five seconds. The name of the queue communicates the expectation.</p>

<p>You can choose whatever latency thresholds make sense for your app; the specifics aren’t as important as the explicitness of the naming.</p>
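<p>In Celery, routing tasks into these queues is just configuration. Here’s a minimal sketch (the module and task names are hypothetical, but <code>task_routes</code> and the <code>queue</code> argument to <code>apply_async</code> are standard Celery APIs):</p>

```python
# celeryconfig.py (sketch)
task_routes = {
    # Route each task, by name, to a latency-SLA queue
    "myapp.tasks.send_welcome_email": {"queue": "within_5_seconds"},
    "myapp.tasks.sync_crm_contact": {"queue": "within_5_minutes"},
    "myapp.tasks.rebuild_report": {"queue": "within_5_hours"},
}

# Or override the queue at call time:
#   rebuild_report.apply_async(args=[report_id], queue="within_5_hours")
```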

<p>By communicating latency expectations in the queue names, we get a few important things.</p>

<p>First, <strong>you&rsquo;ll end up with fewer queues</strong>. You’re far less likely to create a new queue per feature or whim. Almost every new task will fit into an existing latency category. This should remove the temptation of one-off queues that don&rsquo;t serve a strategic purpose.</p>

<p>Second, each queue now has a <strong>performance target</strong> (its name). This gives clarity for monitoring. If the <code>within_5_minutes</code> queue starts seeing 10-minute latencies, you have an unambiguous problem.</p>

<p>Of course, naming queues “within_X” doesn’t magically make tasks start within X time – <strong>you have to ensure enough worker capacity to meet those targets</strong>. That’s where scaling comes in.</p>

<p>Fortunately, this strategy makes it crazy easy to decide when to spin up more (or fewer) workers to scale, but we&rsquo;ll talk more about that later.</p>

<p><figure>
  <img alt="Diagram showing latency-based celery queues with different tasks in each queue" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d67d1259-b77a-4f82-9f4a-bff4267fa800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d67d1259-b77a-4f82-9f4a-bff4267fa800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d67d1259-b77a-4f82-9f4a-bff4267fa800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="simple-ways-to-scale-celery-queues">Simple ways to scale Celery queues</h2>

<p>Typically, you scale a Celery worker pool with the goal of avoiding a queue backlog.</p>

<p>Now that our queue names encode latency expectations, we can define a clear scaling goal for each queue:</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p>Each queue’s latency should stay within its target (as named), without overprovisioning resources.</p>

  </div>
</div>

<p>For most people, traffic and job volumes fluctuate too much to maintain this manually. You’ll want to <strong>autoscale</strong> your workers based on queue latency. With autoscaling in place, meeting those latency targets becomes trivial.</p>

<p>When jobs start waiting too long, spin up more workers; when the queues are empty, spin them down.</p>

<p>For example, if the <code>within_5_seconds</code> queue’s jobs are waiting &gt;5 seconds, your autoscaler should add another worker process (or increase concurrency) for that queue. If the queue’s latency stays under 5 seconds, you can maybe scale down. We’ll talk about how to assign workers to queues next, which affects how you set up autoscaling triggers.</p>
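<p>The decision itself is simple enough to sketch in a few lines of stdlib Python (this is illustrative, not any autoscaler's actual implementation): a queue’s latency is how long its oldest waiting task has been sitting there, and you scale up when that exceeds the queue’s named target.</p>

```python
import time

def queue_latency_seconds(enqueue_timestamps):
    """Latency = age of the oldest task still waiting in the queue."""
    if not enqueue_timestamps:
        return 0.0
    return time.time() - min(enqueue_timestamps)

def should_scale_up(latency_seconds, target_seconds):
    """True when the queue has blown past its named SLA."""
    return latency_seconds > target_seconds

# e.g. a "within_5_seconds" queue whose oldest task is 12s old:
# should_scale_up(12.0, 5.0) -> True, so add a worker
```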
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p>Built-in autoscalers default to CPU usage for scaling. <a href="https://judoscale.com/python" target="_blank" rel="noopener">Judoscale</a> is a great autoscaler add-on that can scale your queues based on queue latency!</p>

  </div>
</div>

<p>Speaking of queue assignment, how should we split up queues across Celery workers? I have a few opinions!</p>

<h2 id="your-options-for-matching-workers-to-queues">Your options for matching workers to queues</h2>

<p>When it comes to queue-to-worker assignment, you have a couple of options. At one extreme, you have <em>one set of workers pulling from all queues</em>. At the other, you have <em>dedicated workers for each queue</em>.</p>

<p>In between these two extremes, you might run some workers that each handle a subset of queues.</p>

<h3 id="running-a-single-worker-pool-for-all-queues">Running a single worker pool for all queues</h3>

<p>Running a single worker pool for all queues is the simplest setup. It’s resource-efficient since any free worker can work on any task, and you don’t need to worry about balancing workers between queues.</p>

<p><figure>
  <img alt="Diagram showing a single Celery worker pool consuming from multiple queues" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/34499f20-7103-4537-e2f0-7c7e38a83a00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/34499f20-7103-4537-e2f0-7c7e38a83a00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/34499f20-7103-4537-e2f0-7c7e38a83a00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>However, the downsides are significant. You risk <strong>long-running tasks blocking high-priority tasks</strong>, plus it’s harder to autoscale effectively for all latency goals at once.</p>

<p>For example, suppose one Celery worker (with concurrency 4) is consuming <code>within_5_seconds</code>, <code>within_5_minutes</code>, and <code>within_5_hours</code> queues. If it picks up several very slow <code>within_5_hours</code> tasks (say tasks that each take minutes to execute) on all its worker processes, and then a bunch of new <code>within_5_seconds</code> tasks arrive, those fast tasks <strong>can’t start until a process is free</strong>.</p>

<p>All processes are busy churning on slow jobs, so even though the <code>within_5_seconds</code> queue is the highest priority, it’s effectively blocked. This defeats the purpose of having a fast queue!</p>

<h3 id="dedicated-workers-per-queue">Dedicated workers per queue</h3>

<p>In this setup, each queue gets its own Celery worker process (or pool).</p>

<p>For example, you might start one set of workers with <code>-Q within_5_seconds</code>, another with <code>-Q within_5_minutes</code>, and so on. This <em>completely isolates</em> each latency tier.</p>

<p>The slow jobs in the 5-hour queue can never block the 5-second jobs, because they’re handled by different workers on possibly different machines.</p>
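<p>Concretely, dedicated workers are just separate worker processes, each pinned to one queue with <code>-Q</code> (the app name <code>proj</code>, the <code>-n</code> node names, and the concurrency values are placeholders to adjust for your app):</p>

```shell
celery -A proj worker -Q within_5_seconds -n fast@%h --concurrency=4
celery -A proj worker -Q within_5_minutes -n default@%h --concurrency=4
celery -A proj worker -Q within_5_hours -n slow@%h --concurrency=2
```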

<p>Autoscaling becomes much cleaner because you can <strong>scale each worker deployment based on <em>that queue’s</em> latency threshold.</strong> The <code>within_5_minutes</code> workers only care about keeping that queue under 5 minutes latency, and if they’re idle, you can scale them down without affecting the queue time of unrelated queues.</p>

<p>The mental model is simpler, and each queue’s performance can be managed separately. The primary downside is the <strong>cost</strong> of running more separate processes.</p>

<p>The cost difference between one big worker vs. multiple smaller dedicated workers is often minor, and it’s far outweighed by the performance improvements. With dedicated per-queue workers, you also avoid starving out fast tasks with long-running ones.</p>

<h3 id="a-bit-of-both">A bit of both</h3>

<p>One strategy is to try to group certain queues together on workers and isolate others. For example, maybe combine the <code>within_5_seconds</code> and <code>within_5_minutes</code> queues on one worker type, but keep the <code>within_5_hours</code> queue separate.</p>

<p>While this can work, any time you put multiple latency tiers on one worker, you reintroduce the possibility of interference. It also complicates autoscaling (which latency do you scale on for that combined worker?).</p>

<h3 id="my-recommendation">My recommendation</h3>

<p>In summary, I <strong>recommend dedicated Celery workers per latency-based queue</strong>. It makes it straightforward to maintain each queue’s SLA.</p>

<p><figure>
  <img alt="Diagram showing Celery workers dedicated to their own queues" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/636416f4-eda8-403a-7a52-82e2c5e2fd00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/636416f4-eda8-403a-7a52-82e2c5e2fd00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/636416f4-eda8-403a-7a52-82e2c5e2fd00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>If you’re on an autoscaling platform, set each worker deployment to scale up whenever its queue latency exceeds the target. To mitigate the <em>potentially</em> higher resource usage of this setup, I also recommend autoscaling your lower-priority workers (5 minutes, 5 hours, etc.) down to zero when the queues are idle. (Of course Judoscale makes this super easy 😁.)</p>

<p>If you’re doing this manually, you still benefit from clarity: you can monitor each queue’s wait time and add resources accordingly without guessing which queue is starved.</p>

<p>You should also look into other ways to <a href="/blog/scaling-python-task-queues">effectively scale Python task queues</a>, like fanning out large jobs.</p>

<h2 id="one-thing-to-keep-in-mind-for-celery-queues">One thing to keep in mind for Celery queues</h2>

<p>One Celery-specific consideration that doesn&rsquo;t apply to every queuing system is task acknowledgment timing. By default, Celery acknowledges a task as &ldquo;received&rdquo; when a worker picks it up. If the worker crashes mid-task, that task is dropped.</p>

<p>Setting <code>acks_late=True</code> (either globally or per-task) delays acknowledgment until the task <em>completes</em>. This means crashed tasks get redelivered, but it also means <strong>your tasks need to be idempotent</strong>, since they might run more than once.</p>

<p>If you&rsquo;re using <code>acks_late</code> with Redis as your broker, pay attention to the <code>visibility_timeout</code> setting. This controls how long Redis waits before assuming a task was lost and redelivering it. The default is one hour. If you have tasks that need to run longer than your visibility timeout, they&rsquo;ll get redelivered while still running.</p>
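<p>Both settings together look roughly like this. This is a sketch, not a drop-in config: the broker URL, timeout value, and task are illustrative.</p>

```python
# Sketch: late acknowledgment plus a raised visibility timeout for a
# Redis-backed Celery app. Broker URL, timeout, and task are illustrative.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

# Redeliver tasks whose worker crashed before completing them.
app.conf.task_acks_late = True

# Give long-running tasks 3 hours before Redis assumes they were
# lost and redelivers them (the default is 1 hour).
app.conf.broker_transport_options = {"visibility_timeout": 3 * 60 * 60}

@app.task(acks_late=True)  # acks_late can also be set per-task
def generate_report(report_id):
    # Must be idempotent: with late acks it may run more than once.
    ...
```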

<p>For latency-based queue planning, the practical advice is that tasks in your fast queues (like <code>within_5_seconds</code>, <code>within_5_minutes</code>) should be short enough that the visibility timeout is irrelevant. For your slow queue, make sure your longest-running tasks finish well under the visibility timeout, or increase the timeout accordingly.</p>

<h2 id="shipping-performant-celery-queues">Shipping performant Celery queues</h2>

<p>This opinionated guide for setting up your Celery queues is very much inspired by the <a href="/blog/planning-sidekiq-queues">strategies we know work well in the Sidekiq world</a>. I hope this gives you some fresh ideas and a solid game plan for taming your Celery queues.</p>

<p>Remember, planning your queues boils down to:</p>

<ul>
<li>Name queues by expected latency.</li>
<li>Isolate latency tiers on separate workers to avoid cross-interference.</li>
<li>Monitor and autoscale by latency.</li>
</ul>

<p>Follow these steps, and you’ll avoid most of the common background job headaches that plague teams as they scale up.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Node.js Hosting Options</title>
      <description>Choosing where to host a Node.js app is a high-stakes decision. This guide will show you how to pick the best hosting option for your app AND your team.</description>
      <pubDate>Wed, 4 Feb 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/node-js-hosting-options</link>
      <guid>https://judoscale.com/blog/node-js-hosting-options</guid>
      <author>Jeff Morhous</author>
      <content:encoded>
        <![CDATA[<p>Choosing the right hosting environment for a Node.js application will define much of both your development workflow and application performance. The hosting option you choose directly affects the developer experience (how easy deployments and updates are), the cost model of running your app, its scalability under load, and how much control (and responsibility) you have over your infrastructure.</p>

<p>For example, a fully managed platform can eliminate server maintenance at the cost of less flexibility and more money, whereas running your own server gives maximum control but demands more operational work.</p>

<p>Your goal in deciding where to host a Node app is to align your hosting choice with your app’s <strong>technical requirements</strong> and your <strong>team’s capacity</strong> to manage the underlying infrastructure.</p>

<h2 id="different-types-of-node-apps-have-different-needs">Different types of Node apps have different needs</h2>

<p>APIs built with Node are typically stateless request/response services and are a good fit for most hosting models. A Node.js API can run on anything from a cheap VPS to serverless functions, since each request is independent and typically short-lived.</p>

<p>Real-time apps (like those with WebSockets), on the other hand, need persistent connections. Things like chat apps or live dashboards require hosting that supports long-lived network sockets. Traditional servers or container-based platforms are often necessary here, since pure serverless platforms typically don’t allow WebSockets or persistent connections. For example, Vercel’s serverless functions cannot hold always-on WebSocket connections, but they do support WebSockets through their Edge Runtime.</p>

<p>Server-rendered apps (think Next.js) are certainly a special case. Frameworks like Next.js generate (most) pages server-side and often do well with serverless deployment. <strong>Next.js is tightly integrated with Vercel</strong>, which offers zero-configuration deployment, serverless functions for API routes, and edge caching for static assets. Many teams choose serverless platforms for these SSR apps to leverage features like automatic CDN distribution and on-demand scaling without managing servers. However, this serverless approach comes with tradeoffs in execution time limits and statefulness, which we’ll discuss later.</p>

<p>First, let&rsquo;s talk about the option that demands the most of you.</p>

<h2 id="hosting-node-apps-on-a-vps-or-similar-cloud-service">Hosting Node apps on a VPS (or similar cloud service)</h2>

<p>Running a Node.js app on a VPS (Virtual Private Server), Amazon EC2, or cloud virtual machine gives you <strong>maximum control</strong> over the environment. But with that comes maximum responsibility.</p>

<p>On a VPS, you get root access to install any OS packages, configure the stack exactly as you want, and run any background processes you need. This flexibility is powerful for custom setups, but the maintenance burden on you or your team is high. You are in charge of everything under the hood.</p>

<p>Applying OS security patches, monitoring disk and CPU usage, setting up firewalls, managing backups, and handling scaling manually are all things you should be prepared to manage if you go this route.</p>

<p>Using infrastructure-as-code and containers can ease some pain, but won’t eliminate ops work. Tools like <a href="/blog/kamal-vs-paas">Kamal can simplify deploying a containerized app</a> to a VPS. However, <strong>Kamal doesn’t handle the surrounding infrastructure needs</strong>. You still need to set up things like load balancers, databases with backups, log aggregation, and system monitoring yourself.</p>

<p>Containers help by packaging your Node.js app with its dependencies, making it portable and consistent across environments. But the VPS still needs to have everything the container needs. You’ll still be responsible for orchestrating containers, scaling them, and managing the host VM’s health.</p>

<p><figure>
  <img alt="Hosting a Node.js app on a VPS" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/ebf2bc38-68a6-455e-a28a-5d9eeac9a300/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/ebf2bc38-68a6-455e-a28a-5d9eeac9a300/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/ebf2bc38-68a6-455e-a28a-5d9eeac9a300/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Hosting on a VPS or cloud VM is fine if you need fine-grained control or have specialized requirements that platforms don’t support. But it&rsquo;s not an option I can recommend unless you have a dedicated ops team (or you just really love that sort of thing). I&rsquo;ve hosted small projects on a VPS, and it&rsquo;s always been more headache than the cost savings were worth.</p>

<h2 id="hosting-your-node-app-on-a-paas">Hosting your Node app on a PaaS</h2>

<p>Platform-as-a-Service (PaaS) offerings strike a middle ground by handling most infrastructure concerns while still letting you run a “server-like” app. Platforms like Heroku, Render, Amazon ECS with Fargate, and Fly.io are PaaS leaders.</p>

<p>They allow you to push your Node.js code (via Git or container image) and then they build, run, and serve your application in a managed environment. Platforms abstract away the server (or VPS) management.</p>

<p><figure>
  <img alt="Hosting Node.js apps on a platform" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/aa8a522b-9748-44f4-fe05-6d4436b80a00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/aa8a522b-9748-44f4-fe05-6d4436b80a00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/aa8a522b-9748-44f4-fe05-6d4436b80a00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Most platforms give you the option between using containers or not, so the above image could be even simpler, with you only managing the app itself.</p>

<p>With platforms, there&rsquo;s very little manual configuration and management. You get a deployment platform that automates scaling, security updates, and (some) monitoring, usually through a web dashboard or CLI. Developers can focus on code and let the platform handle the “ops” heavy lifting.</p>

<p>Using a PaaS still provides you with the flexibility to run long-running processes and <a href="/blog/node-task-queues">async job queues like BullMQ or Bee-Queue</a>, which are things that pure serverless platforms don’t support.</p>

<p>The general-purpose nature of PaaS means it doesn’t matter whether you’re deploying a frontend, a Node API, or a background worker. This makes platforms the best option for <em>most</em> Node apps.</p>

<p>You get persistent Node.js processes that can maintain state in memory, hold database connection pools, handle WebSocket connections, and even schedule cron jobs without worrying about hitting an execution timeout or some vendor constraint. Essentially, it offers the convenience of managed hosting <em>without the severe limitations on process lifespan</em> that come with serverless function environments. </p>

<p>You get a managed environment that dramatically reduces your operations overhead, but you <strong>keep quite a bit of control.</strong></p>

<p>But serverless <em>is</em> right for some apps! Let&rsquo;s look into that next.</p>

<h2 id="hosting-serverless-node-apps-on-vercel-or-netlify">Hosting serverless Node apps on Vercel or Netlify</h2>

<p>Serverless platforms like <strong>Vercel and Netlify</strong> have gained popularity, especially for frontend-oriented and Jamstack applications. Vercel hired much of the React core team away from Meta and has stewarded the development of both React and Next.js, which positions them well to support Next apps in particular.</p>

<p>In a serverless model, you don’t maintain a running server process. Instead, your Node.js code is deployed as functions that execute on demand in response to requests (or events) and then terminate. This model brings <strong>automatic scaling per request</strong> – every incoming request can spin up a new isolated function instance if needed, so capacity can increase seemingly without bound, and you never pay for idle time.</p>
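<p>As a sketch, a function in this model is just a <code>(req, res)</code> handler that the platform invokes per request. The mock invocation below only illustrates the call shape; on the platform, the runtime supplies real request and response objects.</p>

```javascript
// Sketch of a Vercel-style serverless function: a (req, res) handler
// with Express-like helpers. There's no server here -- the platform
// spins up an instance per request and may tear it down afterward.
// (On Vercel, this function would be the module's default export.)
function handler(req, res) {
  res.status(200).json({ ok: true, path: req.url });
}

// Simulate one invocation with a minimal mock, just to show the shape:
const calls = [];
const mockRes = {
  status(code) { calls.push(code); return this; },
  json(body) { calls.push(body); return this; },
};
handler({ url: "/api/hello" }, mockRes);
```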

<p>Vercel and Netlify both provide an experience where you connect a Git repo, and they build and deploy your site with serverless functions backing any dynamic endpoints or API routes. This gives a fantastic developer experience for certain use cases. Frontend-heavy apps get static hosting plus dynamic capabilities without ever thinking about servers, and things like CI/CD, CDN distribution, and SSL are handled for you out of the box.</p>

<p>I host my personal site and a few simple projects on Vercel and am quite happy with how hands-off it&rsquo;s made hosting. For my simple Next.js app, Vercel is a very good fit and also free.</p>

<p><figure>
  <img alt="Hosting a Node app on vercel" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/bfc0a270-5518-4a41-59e3-58061a143700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/bfc0a270-5518-4a41-59e3-58061a143700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/bfc0a270-5518-4a41-59e3-58061a143700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>That being said, if I want to expand this application to include more functionality, I&rsquo;d probably run into some limitations.</p>

<p>The first major limitation is that <strong>serverless functions on these platforms have hard time limits.</strong> This means you cannot do long processing jobs directly. If your Node app needs to generate a large report or process a big file, you’ll likely exceed these limits and the platform will kill the function.</p>

<p>Long-running tasks have to be offloaded to external services or broken into much smaller jobs. But Vercel and Netlify do <strong>not allow running arbitrary background worker processes</strong>. You can’t have a worker listening to a queue or a scheduler that continuously runs in the background. “Background Functions” on Netlify simply allow a single function invocation to run longer (<a href="https://docs.netlify.com/build/functions/overview/#default-deployment-options" target="_blank" rel="noopener">up to 15 minutes</a>) asynchronously, but they are not equivalent to an always-on worker process.</p>

<p>Vercel recently introduced scheduled functions, which are cron-like triggers, but these are just periodic invocations of serverless functions, not persistent jobs. Any asynchronous or delayed work in a serverless architecture has to be handed off to another system (using an external job queue service, or triggering an AWS Lambda via event).</p>

<p>This is a fundamental design difference. Traditional platforms (like Heroku, Render, etc.) let you run a worker indefinitely, whereas on Netlify/Vercel, you might schedule a function to run every few minutes, but it will start fresh and then terminate each time.</p>

<p>Both Vercel and Netlify abstract away containers and don’t let you deploy a custom Docker image to their platform. You are limited to the runtimes and languages they support and the build process they provide. While the support is often sufficient, the platform’s provided environment is the only environment. Vercel and Netlify focus on source-based deployment and static assets, not running arbitrary containers.</p>

<p>They are great at what they do (fast frontend deployments), but aren’t general-purpose hosting for any kind of app.</p>

<h2 id="autoscaling-a-node-app">Autoscaling a Node app</h2>

<p>Scalability is a big question for web developers, and different platforms scale Node apps in different ways. Understanding your autoscaling options and their implications for performance and cost matters when choosing a host.</p>

<p>On traditional setups like VPS or self-managed servers, scaling is usually manual unless you build your own scripts or use cloud vendor tools to spin up new VMs. By contrast, PaaS platforms typically offer some form of horizontal autoscaling for Node apps, but the responsiveness to load can vary.</p>

<p>Heroku, for example, has a built-in autoscaler (available on certain tiers) that can add or remove dynos based on response time thresholds. The caveat with response time as a metric is that it can react sluggishly or trigger scaling at the wrong times.</p>

<p>This is why third-party solutions like <a href="https://judoscale.com/node" target="_blank" rel="noopener">Judoscale</a> have emerged. Judoscale focuses on <strong>request queue time</strong> as the metric to decide scaling, which directly measures if requests are backing up due to a lack of capacity. </p>

<p><figure>
  <img alt="Scaling Node.js apps" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/08389d89-bccb-4f95-e8aa-987a06e35e00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/08389d89-bccb-4f95-e8aa-987a06e35e00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/08389d89-bccb-4f95-e8aa-987a06e35e00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Judoscale will add more web processes as capacity demands it, and we also watch your job queues to autoscale worker processes. If you want reliable autoscaling on a PaaS, you want Judoscale.</p>

<h3 id="scaling-on-serverless-is-weird">Scaling on serverless is weird</h3>

<p>Serverless platforms scale very differently.</p>

<p>Essentially, they scale <em>per request by default</em>. There’s no “instance” for you to add.</p>

<p>Every incoming event will find capacity by the provider launching more copies of your function as necessary. This leads to effectively unlimited concurrency out of the box, which is great for absorbing traffic spikes without any configuration. The flip side is limited control over this scaling.</p>

<p>Normally, every request that comes in will result in a new Node.js runtime starting if the existing ones are all busy. This is an awesome way to ensure reliability in a scenario where your traffic increases quickly.</p>

<p>However, there are two big tradeoffs: cold starts and cost unpredictability.</p>

<p>When serverless scales, many of those new function invocations might incur a cold start delay (a few hundred milliseconds or more to initialize a Node environment). In a high-traffic scenario, you could have lots of functions cold-starting, which might cause latency for some requests. More importantly, from a cost perspective, serverless billing is usually metered by time and memory per execution, plus any external service calls (like database or bandwidth).</p>

<p>If you get 1000 concurrent requests frequently, you pay for 1000 function runs in parallel, which can add up quickly. I see <a href="https://x.com/mattwelter/status/1949850488654143932" target="_blank" rel="noopener">developers on X</a> and Reddit all the time complaining that their Vercel bills ballooned under heavy load.</p>

<p><figure>
  <img alt="A post on X complaining about a big increase in their Vercel bill" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d18bd154-fcd2-4736-2ca7-1ec9ce343600/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d18bd154-fcd2-4736-2ca7-1ec9ce343600/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/d18bd154-fcd2-4736-2ca7-1ec9ce343600/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>This isn’t to say serverless can’t be cost-effective. For super volatile but low-average traffic, it can be the cheapest option.</p>

<p>If you require tight control and predictability, a PaaS with the right autoscaling tool might be preferable. If you need to handle unpredictable surges and are okay with the stateless function model, serverless will do it out of the box. Just keep an eye on those usage metrics!</p>

<h2 id="picking-your-hosting-option-based-on-developer-experience">Picking your hosting option based on developer experience</h2>

<p>I&rsquo;ve thrown a bunch of information at you, but I don&rsquo;t want to make my opinion unclear.</p>

<p>I think you should prioritize developer experience. Whether you&rsquo;re trying to decide where to host a solo project or influence a decision for an enterprise, put real weight behind the developer cost that comes with the &ldquo;cheaper&rdquo; options.</p>

<p>Beyond that, the decision comes down to your application’s type and its traffic profile.</p>

<p>Ask yourself a few questions about your Node.js app:</p>

<p>Does your app require persistent connections or background processes? If it does, then a serverless platform (Vercel/Netlify) likely <em>won’t</em> serve you well. You’d <strong>lean towards a PaaS</strong> or even your own VPS if you&rsquo;re okay being pretty hands-on.</p>

<p>How much ops work are you (or your team) willing to take on? If you have a strong DevOps skillset or an ops team, hosting on VPS or some pure cloud solution might be a good fit. You’ll get full flexibility to tailor the environment and potentially save on high-volume costs by squeezing more out of each server. But if you’d rather <em>not</em> deal with server management, then PaaS or serverless is attractive.</p>

<p>What are your scaling and traffic patterns? For relatively steady, predictable traffic, it can be more cost-effective and simpler to run a fixed number of servers (or dynos) on a PaaS or VPS. You won’t get surprises in the bill, and you can ensure they’re always warm and performant. For spiky or highly variable traffic, serverless is an option.</p>

<p><strong>Choose the platform that fits the shape of your app and your team.</strong> For a typical web API or monolithic Node app that has a mix of web requests and background jobs, a PaaS will provide the least friction. If you’re building a highly interactive frontend-heavy app (especially with Next.js), deploying the frontend on Vercel or Netlify can be great for the static+serverless benefits, possibly complemented by a separate backend for any heavy lifting. </p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Choosing the Right Node.js Job Queue</title>
      <description>So you've got a Node.js app, and you know what needs to be passed off to a job queue. But do you know what job queuing system to use? Learn how to choose the right one for your needs.</description>
      <pubDate>Mon, 5 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/node-task-queues</link>
      <guid>https://judoscale.com/blog/node-task-queues</guid>
      <author>Jeff Morhous</author>
      <content:encoded>
        <![CDATA[<p>Modern Node.js apps often need to perform background jobs. Offloading to a job queue is a great way to preserve web performance when faced with sections of code that are too slow or resource-intensive to handle during an HTTP request. If your app needs to send emails, generate PDFs, process images, or aggregate data, you probably need background jobs.</p>

<p>Offloading these jobs (sometimes called <em>tasks</em>) to a <strong>job queue</strong> ensures your web process remains responsive and keeps latency down. A typical setup is to have your web processes enqueue jobs to an external system, and one or more <strong>worker</strong> processes consume and execute those jobs asynchronously.</p>

<p><figure>
  <img alt="Node job queues" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/c6f68ade-abd2-48de-cfbf-fcc2b0f1b600/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/c6f68ade-abd2-48de-cfbf-fcc2b0f1b600/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/c6f68ade-abd2-48de-cfbf-fcc2b0f1b600/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>This works well for keeping your web processes free and performant.</p>

<p>So you&rsquo;ve got a Node.js app, and you know what needs to be passed off to a job queue. But do you know what job queueing system to use?</p>

<p>If you&rsquo;re looking for a quick answer, I won&rsquo;t make you wait. BullMQ is right most of the time. But let&rsquo;s take a look at our options!</p>

<h2 id="bull-and-bullmq-for-job-queues">Bull and BullMQ for job queues</h2>

<p><a href="https://bullmq.io/" target="_blank" rel="noopener">BullMQ</a> is definitely the <strong>most popular Node.js job queue</strong> (especially if you also consider Bull).</p>

<p>It is a powerful queue library backed by <em>Redis</em>, known for its high performance and rich feature set. BullMQ can process a large volume of jobs quickly by leveraging Redis and an efficient implementation under the hood.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p><strong>Understanding Bull vs BullMQ:</strong> One really important thing to note is that <strong>Bull’s original library is now in maintenance mode</strong>. The authors have moved efforts to <strong>BullMQ</strong>, a modern TypeScript rewrite that will receive new features going forward.</p>

  </div>
</div>

<p>Jobs are persisted in Redis, so they won’t be lost if a worker crashes. BullMQ provides job persistence, automatic retries, error handling, and priority queues. Together, this gives you a strong foundation of reliability.</p>

<p>BullMQ also supports multiple workers consuming the same queue, and you can configure concurrency (the number of jobs a single worker can process in parallel). This horizontal scaling ability means BullMQ can handle a lot of load and is also perfect for autoscaling, which we&rsquo;ll get into later.</p>
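<p>Wired together, the producer/worker split looks roughly like this. This is a sketch that assumes a running Redis instance; the queue name, payload, and retry options are illustrative.</p>

```javascript
// Sketch: a producer and a worker sharing one BullMQ queue.
// Assumes Redis is running; names and options are illustrative.
const { Queue, Worker } = require("bullmq");

const connection = { host: "localhost", port: 6379 };

// Producer side (e.g. inside a web request handler): enqueue and return.
async function enqueueWelcomeEmail(to) {
  const emailQueue = new Queue("email", { connection });
  await emailQueue.add("welcome", { to }, {
    attempts: 3, // retry up to 3 times
    backoff: { type: "exponential", delay: 1000 },
  });
}

// Worker side (run as a separate process): consume jobs concurrently.
const worker = new Worker(
  "email",
  async (job) => {
    // send the email here; retries and backoff are handled by BullMQ
    console.log(`sending ${job.name} to ${job.data.to}`);
  },
  { connection, concurrency: 5 } // up to 5 jobs in parallel per worker
);
```

Running more copies of the worker process is all it takes to scale horizontally, since every worker pulls from the same Redis-backed queue.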

<p><figure>
  <img alt="Scaling BullMQ" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e1b189e9-baab-4161-bb9b-e964f1757300/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e1b189e9-baab-4161-bb9b-e964f1757300/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://imagedelivery.net/g5ziwLsypgTqGag7aYHX0w/e1b189e9-baab-4161-bb9b-e964f1757300/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>BullMQ is essentially a new (major) version of <a href="https://github.com/OptimalBits/bull" target="_blank" rel="noopener">Bull</a>, with mostly the same API and using Redis, but with improved internals. If you&rsquo;re already using Bull, that&rsquo;s fine. But if you&rsquo;re starting fresh, consider BullMQ so you get long-term support and benefit from the improvements.</p>

<p>Since they&rsquo;re Redis-based, Bull and BullMQ are naturally suited for modern web apps that may run across multiple processes. It&rsquo;s no surprise <a href="https://judoscale.com/blog/ultimate-guide-scaling-sidekiq" target="_blank" rel="noopener">Ruby&rsquo;s Sidekiq uses Redis too</a>.
All workers connect to the same Redis instance, so adding more worker processes (whether permanently or by autoscaling) increases the throughput of job processing. Jobs will be pulled by any available worker.</p>

<p>BullMQ includes mechanisms to detect stalled jobs and requeue them, along with retry handling for failed jobs. For most web applications, a single Redis-backed queue can coordinate dozens of workers reliably. If your app already uses Redis, BullMQ fits in nicely. If not, you&rsquo;ll need to introduce Redis just for the queue, which is probably a worthwhile tradeoff for the reliability it provides in most cases.</p>

<h2 id="bee-queue-for-job-queues">Bee-Queue for job queues</h2>

<p><a href="https://github.com/bee-queue/bee-queue" target="_blank" rel="noopener">Bee-Queue</a> is another popular Redis-backed job queue for Node. It&rsquo;s designed with a focus on simplicity and speed, inspired by the shortcomings of older libraries. Like BullMQ, Bee-Queue requires a Redis instance to operate, a common theme we&rsquo;ll continue to see.</p>

<p>Bee-Queue intentionally has a smaller feature set than BullMQ, trading breadth of features for low complexity and high performance. It gives us all of the core job queueing capabilities, but leaves out some of the advanced features of BullMQ.</p>

<p>This tradeoff is right for some people, as it&rsquo;s notably easier to get started.</p>

<p>The library’s API is relatively straightforward. You create a queue, define a job processor function, and enqueue jobs. My time reading Bee-Queue’s examples and documentation has been stress-free as they&rsquo;re very easy to understand. This can translate to faster initial setup and less overhead in learning the tool, something that&rsquo;s really underrated in medium-sized software projects.</p>

<p>Despite being lightweight, Bee-Queue does include essentials for production. You get persistence in Redis, job completion callbacks, and even rate limiting and retry logic. It supports job timeouts, retry attempts, and will handle <em>“stalled job”</em> detection.</p>

<p>What it lacks is some features of Bull and BullMQ, like built-in priority levels or repeatable (scheduled) jobs.</p>

<p>Multiple Bee-Queue worker processes can consume from the same queue even if they&rsquo;re on different machines, making scaling as simple as running more workers. This makes it a great fit for autoscaling scenarios.</p>

<p>In practice, you’d run one or more worker processes with Bee-Queue. If you need more throughput, just increase the number of workers, and jobs will be distributed across them. If you’re okay with using Redis (and most Node apps can add Redis via a managed service fairly easily), Bee-Queue provides a nice balance of <strong>simplicity and performance</strong>.</p>

<p>Still, it&rsquo;s been 2 years since the last release of Bee-Queue, and the lack of recent maintenance/development may put off a lot of developers.</p>

<h2 id="agenda-for-job-queues">Agenda for job queues</h2>

<p><a href="https://github.com/agenda/agenda" target="_blank" rel="noopener">Agenda</a> is a different breed of job queue for Node when compared to BullMQ and Bee-Queue. It is primarily a job scheduler built on <a href="https://www.mongodb.com/" target="_blank" rel="noopener">MongoDB</a>, <em>not Redis!</em> It focuses on scheduling jobs (think cron jobs and delayed jobs), but it also supports immediate job queuing with concurrency control.</p>

<p>Agenda is a popular choice, especially for teams already using MongoDB, since it uses your MongoDB database to store job information. If I were in a project not already using MongoDB, this wouldn&rsquo;t be my first choice.</p>

<p>Agenda’s features overlap with BullMQ and Bee-Queue in some areas, but it has its own philosophy. Agenda stores jobs in a MongoDB collection, so if your application already uses MongoDB, you don’t need an extra infrastructure component for the queue. Jobs are persisted to the database, which ensures durability.</p>

<p>Agenda can also work with other databases (it supports a few Mongo-like interfaces), giving <em>some</em> flexibility in persistence. Still, it shines in scheduling future or recurring jobs. It offers a human-readable syntax (but still supports cron syntax) and the ability to schedule jobs at specific dates or intervals.</p>

<p>For example, you can schedule a job to run every day at 8 am, or run once a week, all using cron patterns or (close to) plain English. This makes Agenda ideal for background jobs that need to run on a schedule.</p>

<p>Agenda runs as a single-process scheduler. It pulls jobs from Mongo and processes them in the same process. It does support concurrency (multiple jobs at once in one process) and can be scaled to multiple processes using MongoDB’s locking mechanism (to ensure two processes don’t run the same job).</p>

<p>However, scaling horizontally with Agenda is not as straightforward as with Redis queues. Agenda is generally single-master, meaning one instance should be scheduling to avoid duplicate scheduling of recurring jobs, though multiple workers can cooperate on different jobs. It&rsquo;s not impossible to scale horizontally, of course, but the path isn&rsquo;t as straightforward.</p>

<p>Agenda is probably best suited for applications that need cron-like scheduling and already use MongoDB. If you have a Node app in production that&rsquo;s already using Mongo, you can use Agenda to schedule jobs without introducing Redis. It’s great for things like daily reports, periodic cleanup jobs, or any job that must run X times a day/week without needing to support another infrastructure piece.</p>

<h2 id="using-a-message-broker-like-rabbitmq">Using a message broker like RabbitMQ</h2>

<p>Instead of using a Node-specific library, you can opt for a <strong>message broker service</strong> such as <a href="https://www.rabbitmq.com/" target="_blank" rel="noopener">RabbitMQ</a>, <a href="https://aws.amazon.com/sqs/" target="_blank" rel="noopener">Amazon SQS</a>, or <a href="https://docs.cloud.google.com/tasks/docs" target="_blank" rel="noopener">Google Cloud Tasks</a>. These are not Node.js libraries. They&rsquo;re external systems that Node can interface with through their APIs or client libraries.</p>

<p>For example, RabbitMQ is a robust open-source message queue that many large systems use. In a Node app, you might use a package to publish and consume messages from RabbitMQ.</p>
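<p>For instance, a publish/consume round-trip with the widely used <code>amqplib</code> client might look like this (a sketch, assuming a RabbitMQ broker on localhost; the queue name and payload are invented for illustration):</p>

```javascript
// Minimal publish/consume sketch with RabbitMQ via the `amqplib`
// package — assumes a broker running on localhost.
const amqp = require("amqplib");

async function main() {
  const conn = await amqp.connect("amqp://localhost");
  const channel = await conn.createChannel();
  const queue = "emails";

  // Durable queues survive a broker restart.
  await channel.assertQueue(queue, { durable: true });

  // Publisher side: enqueue a job as a JSON message.
  const payload = Buffer.from(JSON.stringify({ to: "user@example.com" }));
  channel.sendToQueue(queue, payload, { persistent: true });

  // Worker side: process messages and acknowledge on success, so
  // unacked messages are redelivered if the worker crashes mid-job.
  await channel.consume(queue, (msg) => {
    const job = JSON.parse(msg.content.toString());
    console.log("sending email to", job.to);
    channel.ack(msg);
  });
}

main().catch(console.error);
```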

<p>The advantage of brokers like RabbitMQ is primarily reliability and advanced messaging patterns like acknowledgments and dead-letter queues.</p>

<p>Similarly, cloud services like AWS SQS or even Google Cloud Tasks are fully managed queues. They remove the need to run Redis or RabbitMQ yourself, which is attractive to a lot of people. These can scale virtually indefinitely and handle autoscaling scenarios by design.</p>

<p>The trade-off with using external cloud queues is that you’ll have to implement some features in your application code, like scheduling logic or retry handling. There’s also a bit more latency, since every call goes over the network. The developer experience might not be as seamless as using a Node library, but if you prefer not to manage any infrastructure, they are a very reasonable option.</p>

<h2 id="autoscaling-your-workers">Autoscaling your workers</h2>

<p>Scaling Node job queues is a necessary part of running them in production. Offloading intensive jobs to a queue protects your web process, but those jobs still have to be processed somewhere, and a single worker can only do so much.</p>

<p>There are two big levers you can pull to scale your Node job queues. <strong>Vertical scaling</strong> means using more powerful workers with more threads/processes. Meanwhile, <strong>horizontal scaling</strong> increases the number of worker processes or machines. Comprehensive solutions require attention to both.</p>
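<p>Using BullMQ as an example, the two levers look roughly like this (a sketch, assuming the <code>bullmq</code> package and a local Redis; the queue name is illustrative):</p>

```javascript
// Sketch: the two scaling levers, using BullMQ as an example.
// Assumes the `bullmq` package and a Redis server on localhost.
const { Worker } = require("bullmq");

const processJob = async (job) => {
  // ...do the actual work for one job...
};

// Vertical lever: a single, beefier worker process handling more
// jobs at once via the `concurrency` option.
const worker = new Worker("video-encoding", processJob, {
  connection: { host: "127.0.0.1", port: 6379 },
  concurrency: 10, // up to 10 jobs in parallel in this one process
});

// Horizontal lever: start more copies of this same process (more
// containers, dynos, or machines); BullMQ coordinates the workers
// through Redis so a job isn't processed twice.
```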

<p>As we talked about above, the major Node job queues support horizontal scaling without too much hassle, so it&rsquo;s worth putting some effort into. You can do this manually, but it&rsquo;s best practice to set up an autoscaler.</p>

<p>This lets you keep your hands off: the autoscaler adds worker processes when your existing processes can&rsquo;t keep up with demand and removes them when demand allows, which saves you money. Still, most autoscalers leave much to be desired. Heroku&rsquo;s autoscaler doesn&rsquo;t work for workers at all, and other major platforms that do offer autoscaling use CPU as the scaling metric, which is not an optimal way to measure demand on asynchronous worker processes.</p>

<p>Judoscale is a powerful autoscaler that you can add to most any hosting setup. The autoscaling algorithm <strong>scales based on queue latency</strong>, which is a much better indicator of queue well-being than CPU usage. If you&rsquo;re running a Node app in production, try <a href="https://judoscale.com/node" target="_blank" rel="noopener">Judoscale&rsquo;s free plan</a> to see if it&rsquo;s right for you.</p>
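<p>To make the metric concrete, here&rsquo;s a toy sketch (not Judoscale&rsquo;s actual implementation) of what queue latency means: the age of the oldest job still waiting to be picked up.</p>

```javascript
// Illustrative only: queue latency is how long the oldest *waiting*
// job has been sitting in the queue — a direct measure of whether
// your workers are keeping up with demand.
function queueLatencyMs(waitingJobs, now = Date.now()) {
  if (waitingJobs.length === 0) return 0; // empty queue → no latency
  const oldest = Math.min(...waitingJobs.map((job) => job.enqueuedAt));
  return now - oldest;
}

// A queue whose oldest waiting job was enqueued 30s ago has 30s of
// latency, regardless of how busy the CPU happens to be.
const jobs = [{ enqueuedAt: 1000 }, { enqueuedAt: 4000 }];
console.log(queueLatencyMs(jobs, 31000)); // 30000
```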

<h2 id="comparing-node-job-queue-options-and-making-a-decision">Comparing Node job queue options and making a decision</h2>

<p>My opinion here is somewhat controversial in that I think you should value developer experience <em>a lot</em> in your decision-making. That means <strong>using BullMQ</strong> unless you <em>really need</em> a ton of extra features, in which case use a message broker like RabbitMQ.</p>

<p>If your app environment already includes a certain datastore, leaning into that can simplify setup. For instance, if you use Redis, Bull or BullMQ will be straightforward to add. If you use MongoDB, Agenda might integrate more naturally. A solution that fits your existing stack usually means less friction for you, which I think you should place a premium on.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Black Box Hosting vs. Glass Box Hosting: An Interview With Judoscale's Adam</title>
      <description>Founder interview comparing Heroku vs Fly/Render/Railway for bootstrapped SaaS: cost, control, portability, third-party services, simple rules.</description>
      <pubDate>Fri, 2 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/black-box-hosting-vs-glass-box-hosting-an-interview-with-adam</link>
      <guid>https://judoscale.com/blog/black-box-hosting-vs-glass-box-hosting-an-interview-with-adam</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p>Greetings, Judoscale readers! While we usually write our posts as a team, I (Jon) wanted to take a novel approach this time around. I wanted to interview Adam, Judoscale’s founder and still the head of our tiny team, to get his outlook on the marketplace of hosting as we begin 2026.</p>

<p>The goal here wasn’t to host a cage match between the various PaaS vendors currently on the market. It was to set up a scenario:</p>
<blockquote><p>Let’s frame this conversation as a thought experiment: if you were starting a new startup today — something like Judoscale, but fresh — would you still choose Heroku? We’ll look at that decision through the lens of a founder building a real business, not a hobby app — meaning time to profitability, team velocity, cost structure, and technical tradeoffs all matter.</p>

<p>This isn’t a bashing session; I want to explore how the landscape has evolved and changed over the years, and what you might do today.</p>
</blockquote>
<p>Then we simply chatted through it. I think we ended up with some interesting and valuable insights at both the technical layer and the business-leadership layer (e.g. a solo dev trying to start a profitable app).</p>

<p>That said, I didn’t want to post a typical back-and-forth style Q&amp;A article. Instead you’ll find concepts grouped together below, each with a little context beforehand. Enjoy!</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>One last note before we dive in — one of my (Jon) express goals in this interview was to be deliberately antagonistic. In reality, Adam and I believe mostly the same things (sorry Adam, I’m still not sold on <a href="https://www.phlex.fun" target="_blank" rel="noopener">Phlex</a>&hellip;), but the goal was to tease out some reasoning by prodding and gentle pushing.</p>

  </div>
</div>

<p>Okay, let’s dive into this thing!</p>

<h2 id="the-black-box-dividend">The Black-Box Dividend</h2>

<p>Possibly the most important thing when spinning up a new bootstrapped business is actually <em>making money</em>. That is, getting your product running and live — providing value for people that are willing to pay for it — as soon as possible. When it comes to your application architecture and hosting, then, paved roads will get you to your destination faster than carving out your own from scratch.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>So&hellip; you’ve mentioned before that Heroku can be thought of as a “black box”, where I think you’re describing the lack of fine-grain control that Heroku gives, right? When you started Judoscale back in 2016, what did the black box buy you — and would it still buy the same thing today?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Heroku’s value was super simple: <code>git push</code> and you get a URL. No server naming exercises, no AMIs to patch, no cluster ceremony. Buildpacks detected my Rails app and just… did the right thing.</p>

<p>I was building a product nights and weekends; I didn’t want to think about deployment or scaling. The black box let me ignore everything that wasn’t shipping. If I were starting that same kind of small, bootstrapped SaaS today, the black box still buys the same thing: focus.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Okay, but more specifically, what did it actually remove from your plate?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Whole categories of work. TLS is handled. Rollbacks are boring and reliable. Runtime upgrades don’t feel like heart surgery. Logs show up where I expect them. Scaling from one dyno to a handful doesn’t require a new playbook. You <em>do</em> pay a tax for that, but you’re buying back <strong>time</strong>. For a solo dev or tiny team, that trade is almost always worth it early on. I just didn’t have that much time to spend.</p>

  </div>
</div>

<h2 id="the-glass-box-leverage">The Glass-Box Leverage</h2>

<p>Of course, here in 2026 the landscape isn’t simply Heroku vs. run-your-own-hardware-at-home. <a href="https://fly.io" target="_blank" rel="noopener">Fly</a>, <a href="https://render.com" target="_blank" rel="noopener">Render</a>, <a href="https://railway.com" target="_blank" rel="noopener">Railway</a>, and <em>several</em> other platform-based hosting services exist now. There’s competition! And there’s nuance. Many of these platforms are more open to complexity: bringing your own Docker images, choosing far more granular server resource tiers, and selecting geographical constraints, among <strong>so many</strong> other choices. That transparency (and complexity) can be good or bad.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Let’s contrast the “black box” with the “glass box” — platforms that give you far more control and allow you to get inside the box and tweak things. Do you think these ‘glass box’ platforms can actually beat the black box?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>I think the glass box is going to win if you really need portability and/or really specific resource granularity. Most of the glass-box options right now are built around, or at least support, Docker containers. Docker containers are sort of the common denominator between all of them. But that can be helpful because it means it’s easy to switch from one platform to another — you own the build script and take it with you. That leads to the second point. When you can switch providers fairly seamlessly, you can take advantage of whoever has the best price and/or resource tiers that your specific application needs. Just switch to another platform with your same Docker container and you’ll likely save some money.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Buuuuuut what’s the price of that flexibility?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Well, it’s more surface area. Images, volumes, networking, health checks — you own more of it. Day-2 operations take more intention. You can absolutely beat Heroku on cost and control, but you’ll pay for it in time. And everything I just described is probably all more time than I’d want to spend on production infrastructure when bootstrapping a new app. I have features I need to build for my customers! But it’s nice that these platforms and strategies are all available right now in case I did want, or need, to go that route.</p>

  </div>
</div>

<h2 id="unbundle-the-risk">Unbundle the Risk</h2>

<p>One thing I know Adam’s been a pretty big advocate for the last few years is using disparate third-party service providers <em>detached</em> from your hosting solution. So I wanted to dive into that here with a historical view: what he did previously vs. today.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Okay, let’s pivot to auxiliary hosting tooling. If you were starting fresh again today, would you still use add-ons from a PaaS marketplace, or would you buy direct from vendors?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>That one’s changed a bit over time. I now avoid marketplace add-ons whenever I can. Judoscale is a direct customer for almost all of our services: Sentry for exceptions, Scout for monitoring, BetterStack for logs and uptime, etc. Two reasons for that, really. First, it’s usually cheaper. Second, it’s portable. When our third party services are separated from our compute, we don’t have to worry about moving them when we move our compute.</p>

<p>Same with databases: I want a third-party provider, be it CrunchyData, PlanetScale, Tiger Data, etc. The teams behind those database services only care about their database services. It’s not a side-product for them. The UIs, metrics, and controls are <em>way</em> better than the bolted-on database services offered by most hosting providers.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>But doesn’t adding a bunch of third-party providers and connections inevitably add a lot of complexity to your mental understanding of your app?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>I think it tends to add an account and a connection string. But at the same time, it removes a migration nightmare if you should ever want to move your compute. If compute and data are decoupled, you can move one without detonating the other. I think that’s worth it.</p>

  </div>
</div>

<h2 id="on-leaving-the-black-box">On Leaving the Black Box</h2>

<p>We’ve covered some of the nuances of the “black box” and “glass box”, but I’m still curious what might drive people to actually migrate across the chasm, auxiliary services aside&hellip;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Adam, what actually pushes people <em>off</em> Heroku?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Granularity. The jump from a $50 dyno to a $250-ish dyno is harsh, and it’s often just to buy memory headroom. Fly/Render give you more intermediate steps. If you’re scaling on thin revenue—which is normal early on—it’s hard to justify that cliff. That’s the moment teams start looking over the fence.</p>

  </div>
</div>

<p>Interjecting here for a moment — Adam’s referencing the lack of options <em>between</em> Heroku’s <strong>std-2x</strong> dyno type and their <strong>perf-m</strong>. For many users, <strong>std-2x</strong> dynos lead to headaches when trying to process large files and/or data, while jumping to <strong>perf-m</strong> feels like overkill both in terms of capacity and cost.</p>

<p>If that’s something that resonates with you, we actually just published a strategy for getting the best of <em>both</em> worlds: <a href="/blog/priced-out-of-heroku">“Dealing With Heroku Memory Limits and Background Jobs”</a>.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>So does that make Heroku the wrong choice?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>No, it makes Heroku a great early choice and a question <strong>later</strong>. If you’re pre-revenue and traffic is modest, the black-box dividend (focus) is worth the tax. If you’re high-traffic/low-ARPU (Average Revenue Per User), the math flips fast. That’s when a glass-box platform’s pricing steps feel sane.</p>

  </div>
</div>

<h2 id="compute-is-commodity-dx-is-not">Compute Is Commodity; DX Is Not?</h2>

<p>One thing that all PaaS’s obviously have in common, regardless of what we call them or how we pay for them, is raw compute power. But how we developers can efficiently <em>leverage</em> that compute, and how fast we can do so might be a different question altogether.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Do you care what the container is called (“dyno”, “machine”, “pod”, whatever) and/or how it’s built?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Call it whatever&hellip; It’s all compute. What I care about is: how much work do I have to do to set up and maintain it?</p>

<p>Heroku’s buildpack approach is still a great default for Rails. Docker is great for portability — especially on platforms that want you to bring an image. All that to say, I don’t obsess over containers or their construction; I optimize for how much developer energy managing them consumes.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Sure but it’s 2026 — many years after you started Judoscale. If you were starting again today, like we said, would you go Docker/Dockerfile from day one?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Honestly I’m not sure. I really like the “cloud native buildpacks” that seem to be cropping up, and having moved Judoscale across Heroku, Render, Fly, Railway, and ECS, I’ll be the first to tell you that having a Dockerfile ready to go is <em>extremely</em> handy.</p>

<p>I’d probably recommend keeping a Dockerfile ready even if you don’t use it. It feels like a good spare tire.</p>

  </div>
</div>

<h2 id="support-sales-and-the-human-stuff">Support, Sales, and the Human Stuff</h2>

<p>We’d be remiss to ignore the soft edges (support and sales) because they become hard edges during incidents and procurement.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Any lingering frustrations with Heroku?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Two. First is compute granularity, which we already covered. Second: <strong>support and enterprise sales</strong> have a reputation for being slow and not particularly helpful. We run a small team and prefer transparent, self-serve pricing; I don’t want to talk to sales to get a number&hellip; I don’t want an enterprise contract to just <em>use</em> the service. Anecdotally, other teams have had rough experiences there. It’s not a deal-breaker for a small shop on self-serve, but it’s part of the picture.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Have you found better elsewhere?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>I don’t have enough firsthand experience with Fly/Render support to compare. What I do know is that the <em>product</em> choices—granular compute, Docker-first—have reduced the number of times I’d need support in the first place.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Fair point!</p>

  </div>
</div>

<h2 id="simple-rules-we-actually-use">Simple Rules We Actually Use</h2>

<p>Let’s start wrapping this whole thing up! I wanted to ask Adam to summarize some of the topics above into a straightforward path&hellip; <em>specifically</em> how he’d go about launching Judoscale if he were starting it from scratch today:</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Okay, let’s say that you were launching Judoscale again today: trying to bootstrap a real, profitable business from scratch, just you. What’s the plan?</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>My default choice is going to be to start on Heroku and optimize for time-to-first-dollar. I want to get the app built and delivering value as soon as possible, and I don’t want to waste time on infrastructure details. The only caveat there is if I <em>know</em> I’m going to have high traffic and thin margins from the start. In that case, I might choose Fly. Either way, the goal is to get to first-dollar <strong>fast</strong>.</p>

<p>Otherwise I’d unbundle my services: third-party, direct account for DB, logs, error-tracking, etc.</p>

<p>Finally, I’d take a strong stance of <a href="https://martinfowler.com/bliki/Yagni.html" target="_blank" rel="noopener">YAGNI</a> around most scaling and infra concerns. I wouldn’t build for scaling issues I don’t have yet — I’d flip on a simple autoscaler (like <a href="/">Judoscale</a>!) and move on to my next feature.</p>

<p>Oh, also, no Kubernetes. Hard line here. It’s way too much surface area and a waste of time for small teams just getting their footing.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-indigo-50 dark:bg-indigo-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-right text-indigo-900 dark:text-indigo-500">
    🕵️‍♂️ Jon
  </h4>
  <div class="mt-2.5 text-indigo-800 prose-a:text-indigo-900 dark:prose-a:text-white prose-code:text-indigo-900 dark:text-gray-300 dark:text-indigo-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>That last one is going to ruffle some feathers.</p>

  </div>
</div>

<p>&nbsp;</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    👨‍💻 Adam
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>That’s fine. Complexity makes us feel important as developers. It also makes us slow. Keep it simple until reality—paying customers, not theoretical scale—forces your hand.</p>

  </div>
</div>

<h2 id="wrap-up">Wrap Up</h2>

<p>I started this interview assuming we’d land on a winner. I thought for sure, after all these years, Adam would still land on Heroku! But Adam nudged me to a better question: How much of the machine do you need to control <em>right now</em>? Early on, “black box” hosting buys momentum you can’t afford to lose. As traffic grows and dollars stay stubborn, “glass box” hosting might make the math worth looking at again&hellip; especially if you’re already unbundling other services and can spin up a Docker image quickly.</p>

<p>Anyway, thanks for joining us for this candid conversation with Adam, and we hope it lends some clarity as you navigate your own hosting choices and business journeys! As always, keep building and keep questioning, because sometimes the best answers come from challenging the assumptions we hold most dear.</p>

<p><em>Totally disagree with us? Think Adam’s way off base about something? Let us know over on Reddit, <a href="/">here</a>.</em></p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Process Utilization: How We Actually Track That</title>
      <description>Deep dive into Judoscale’s utilization autoscaling: sampling pitfalls, edge-based tracking, thread safety, and accurate low-overhead metrics.</description>
      <pubDate>Tue, 25 Nov 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/process-utilization-in-rails-how-we-actually-track-that</link>
      <guid>https://judoscale.com/blog/process-utilization-in-rails-how-we-actually-track-that</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<p>Over the last few months we’ve published a couple of articles talking about our new “Utilization”-based autoscaling option. The first talked through the use-cases for this new option — when it’s useful and who it’s for (“<a href="/blog/introducing-proactive-autoscaling">Autoscaling: Proactive vs. Reactive</a>”). The second was a bit more nitty-gritty, explaining the high-level concept for how we’re tracking this ‘utilization’ metric (“<a href="/blog/how-utilization-works">How Judoscale&rsquo;s Utilization-Based Autoscaling Works</a>”)&hellip;</p>

<p>This post is the nerdy sequel to the latter: the actual boots-on-the-ground / nuts-and-bolts of how we first attempted to track process utilization, how that proved to be a bad setup, and the clever idea that led us to a <em>way better</em> v2. This is the story of low-level measurement with sampling, thread safety, and lackluster results leading to new ideas 😅. </p>

<h2 id="the-job-to-be-done">The job to be done</h2>

<p>As per our second post in this saga, our definition of ‘utilization’ is based around a process’s idle state. Paraphrased, it’s essentially:</p>
<blockquote><p>Measure the fraction of time a web-server process is handling at least one request, then aggregate that across all processes over time.</p>
</blockquote>
<p>Two constraints forced us to think carefully:</p>

<ol>
<li><strong>Extremely low overhead</strong>. Judoscale is a performance tool; it’s an autoscaler that’s intended to help your application soar. It is <em>not</em> something whose client code should impact your application! The Judoscale package should have a perceivably <em>invisible</em> performance impact on the app running it. Full stop. No compromises.</li>
<li><strong>Correct values in a multi-threaded world</strong>. While Ruby, Python, and Node can operate in an asynchronous fashion, and that asynchronicity <em>can</em> be valuable for serving many web requests at once, we need to be <em>very</em> careful in collecting values. It’s easy to accidentally collect <em>thread</em>-level metrics which then overlap and become <em>very</em> confusing. We need to be careful to stay at the <em>process</em> level.</li>
</ol>

<p>So&hellip; now we need to actually write some code: how do you actually <em>capture</em> the idyllic “idle time” of a process in a real application receiving real traffic?</p>

<h2 id="attempt-1-background-sampling">Attempt 1: Background Sampling</h2>

<p>Our first proof-of-concept was built around running a mostly dormant background thread. It would essentially wake up every few hundred milliseconds, ask “is this process handling any requests right now?”, record that yes-or-no, then go back to sleep. Voilà: utilization!</p>
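<p>As a rough sketch (with hypothetical class and method names — this is illustrative, not Judoscale’s actual code), that first approach looked something like:</p>

```ruby
# Hypothetical sketch of the v1 sampling approach (illustrative names only,
# not the real Judoscale implementation).
class SamplingTracker
  def initialize(interval: 0.25)
    @interval = interval # seconds between wakeups
    @samples = []
    @mutex = Mutex.new
  end

  # Record one busy/idle observation (1 = busy, 0 = idle)
  def record_sample(busy)
    @mutex.synchronize { @samples << (busy ? 1 : 0) }
  end

  # The mostly-dormant background thread: wake, sample, sleep, repeat
  def start(&busy_check)
    @thread = Thread.new do
      loop do
        record_sample(busy_check.call)
        sleep @interval
      end
    end
  end

  # Fraction of samples that caught the process busy
  def utilization
    @mutex.synchronize do
      @samples.empty? ? 0.0 : @samples.sum.to_f / @samples.size
    end
  end
end

# Simulate four wakeups directly, without spinning the thread
tracker = SamplingTracker.new
[true, false, true, true].each { |busy| tracker.record_sample(busy) }
puts tracker.utilization # => 0.75
```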

<p>It was easy to ship, but it had issues. Notably&hellip;</p>

<p><strong>Aliasing difficulties.</strong> Bursty traffic and short requests can fall between samples. Imagine a process that handles a flurry of 30–50 ms requests. With a 250 ms sample rate, many bursts are invisible; you under‑count busyness simply because you looked away at the wrong moments. Whoops!</p>

<p><strong>Jitter vs. overhead trade‑off.</strong> If we increased the sampling rate to reduce aliasing, we’d <em>immediately</em> hike CPU wakeups, heap churn, and lock contention (on every process, 24/7!) even when your app is idle. Oof ☹️</p>

<p><strong>Low signal‑to‑noise.</strong> Inherently, sampling produces a staircase approximation of a curve. Real utilization is a smooth “busy/idle timeline.” Our samples were a blurry thumbnail of a scene that actually mattered.</p>

<p>I personally tend to visualize this, oddly enough, as a mathematical curve on a chart (oh how my high-school math teacher would be proud). Imagine we have some <em>real</em> curve of data, perhaps like this:</p>

<p><figure>
  <img alt="Example chart with a curve going up and down in various sections, labeled “ACTUAL Data Over time”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/09890d90-25a4-4f2c-fb63-f50e826f7100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/09890d90-25a4-4f2c-fb63-f50e826f7100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/09890d90-25a4-4f2c-fb63-f50e826f7100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Okay, great. Now let’s pretend we don’t actually know what that curve looks like and we’re taking a sampling-based approach to figuring it out. What we end up with is a bunch of samples. That might look like this:</p>

<p><figure>
  <img alt="Same example chart now shown without the original data curve and instead with a handful of sample-points (dots) that are spread out a bit; you can no longer see the nuance or details of the curvature as the samples are too far apart to have captured that curve in high detail" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/a5508055-f044-4205-206d-acaf7a6e6e00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/a5508055-f044-4205-206d-acaf7a6e6e00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/a5508055-f044-4205-206d-acaf7a6e6e00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Which might be fine for some cases, but we’ve clearly lost several details from the original curve — the fast spikes and drops, in particular. Thus the issue of sampling rates is seen: sample too slowly relative to how fast your data <em>actually changes</em> and you won’t capture a high-detail image. Sample too quickly&hellip;</p>

<p><figure>
  <img alt="Same example chart now shown without the original data curve and instead with a ton of sample-points (dots) that are tightly packed and follow every detail of the original curve" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/53ce13f0-fb5c-47a2-484a-db2ade14be00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/53ce13f0-fb5c-47a2-484a-db2ade14be00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/53ce13f0-fb5c-47a2-484a-db2ade14be00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>You end up with a great representation of the curve, but you took up <em>way</em> too much horsepower constantly waking up and reading those samples. It’s hard for an app to actually serve its requests when the thread scheduler is <em>constantly</em> switching back to a background thread asking “HEY ARE YOU SERVING A REQUEST?!” (“I’M FREAKING TRYING TO, THANK YOU VERY MUCH!!!”).</p>

<p>When we’re talking about requests that might take 5ms, 50ms, or 150ms to fully handle and deliver, a sample rate of 250+ms just doesn’t capture the details. And a faster sample rate feels heavy-handed. This wasn’t going to work&hellip;</p>

<h2 id="attempt-2-event-edges-a-tiny-counter">Attempt 2: Event Edges + a Tiny Counter</h2>

<p>Okay, to be fair, the smooth curve I gave above misrepresents the actual type of data we’re trying to track. Utilization, as we’ve defined it, isn’t a curve with smooth radii and roller-coaster-esque waves. As we’ve defined it, instantaneous utilization is either a zero or a one. A process is either busy, or it is not. If we were to plot that on a chart, it would actually look more like this:</p>

<p><figure>
  <img alt="Example chart where the line observed is not a curve but a straight line which jumps between 0 and 1 on the Y axis with straight, vertical jumps; more like a state-representation over time line" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2b038993-ae8d-4a96-1ba7-6c558726f700/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2b038993-ae8d-4a96-1ba7-6c558726f700/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/2b038993-ae8d-4a96-1ba7-6c558726f700/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>That is, a square wave representing a binary signal. Unfortunately, a square wave signal can actually make sampling results even <em>worse</em>. Check out how wrong an ill-timed sampling pattern can get:</p>

<p><figure>
  <img alt="Example chart similar to the above, a square wave line, but with sampling dots only landing on where the signal is in the ‘1’ / ‘on’ position, leaving the impression that the line is always 1" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f699a679-8bc7-4897-3d26-5e9745fbff00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f699a679-8bc7-4897-3d26-5e9745fbff00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/f699a679-8bc7-4897-3d26-5e9745fbff00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            I left the green line slightly opaque for reference
          </figcaption>

</figure>
</p>

<p>If you believed your sample data in that case, you’d think the signal is almost always “on”, but that’s not true.</p>
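<p>A toy simulation (with made-up numbers) shows just how badly an ill-timed sampler can misread a square wave. Here a 50&nbsp;ms request starts every 250&nbsp;ms, so true utilization is 20% — yet a sampler that also wakes every 250&nbsp;ms happens to land on every busy stretch:</p>

```ruby
# True signal: busy for the first 50ms of every 250ms period => 20% busy
busy_at = ->(t_ms) { (t_ms % 250) < 50 }

# A sampler waking every 250ms observes t = 0, 250, 500, ...
samples  = (0...40).map { |i| busy_at.call(i * 250) ? 1 : 0 }
estimate = samples.sum.to_f / samples.size

puts estimate # => 1.0 -- the sampler concludes the process is ALWAYS busy
```

Shift the sampler’s phase by 100&nbsp;ms and it would instead conclude the process is <em>never</em> busy — the estimate is entirely at the mercy of timing.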
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>Fun math fact: the fewer <em>possible</em> points on a Y-axis there are, the worse the infrequent-sampling-effect (observing statistically incorrect data because you’re sampling too infrequently) can become. When your Y-axis range is just <code>0-1</code> you actually <em>need</em> to sample far more frequently to capture the binary signal with any real integrity. It’s much harder than a flowing curve!</p>

<p>If you’re curious for more of the math here, read up on Bernoulli distributions and binomial variance 🤓</p>

  </div>
</div>

<p>Anyway, the novel idea ended up being beautifully boring: don’t poll at all, just record state transitions. If we simply track the timestamps of when a process leaves and returns to idle, we can compute the <em>exact</em> amount of time it was non-idle. That looks like this:</p>

<p><figure>
  <img alt="Example chart showing the same square wave line now with arrows pointing to where the wave goes high or low, indicating “leaving idle” and “returning to idle”, respectively, and blue shading underneath the “busy” portions of the line: the boxes created when the line shifts up to ‘high’ state then back down to ‘low’ state" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/123fce1e-d6c6-4ad1-dff7-8ae55730f400/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/123fce1e-d6c6-4ad1-dff7-8ae55730f400/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/123fce1e-d6c6-4ad1-dff7-8ae55730f400/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>And once we have the blue blocks, we can simply add them all together for a given timespan, then say <code>utilization = blue_block_total / total_time</code>. Sum the rectangles! Boom!</p>
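<p>In code, “sum the rectangles” is about as simple as it sounds. A sketch (illustrative only — the real tracker records edges as they happen rather than storing interval pairs):</p>

```ruby
# Each pair is [left_idle_at, returned_to_idle_at], in seconds
busy_intervals = [[0.0, 0.4], [1.0, 1.1], [2.5, 3.0]]
window_seconds = 3.0 # total observed timespan

busy_time   = busy_intervals.sum { |start, finish| finish - start }
utilization = busy_time / window_seconds

puts utilization.round(2) # => 0.33 (1.0s busy out of 3.0s)
```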

<h3 id="the-benefits-of-edge-tracking">The Benefits of Edge-Tracking</h3>

<p>Tracking the state-changes (we’ll call them “edges” for math’s sake) has some really fantastic benefits over polling.</p>

<ul>
<li><strong>Computational cost</strong>: instead of constantly waking up a thread to check in on current requests (which requires stack shifting, single-threaded locking switches, etc.), we instead can simply read and/or write against a process-global timestamp register when any request starts or ends.</li>
<li><strong>Correctness</strong>: instead of hoping a reasonable sample rate provides a decent guess at the actual curve being modeled, we instead know the <em>exact</em> amount of time that a given process is non-idle! There’s no guess. </li>
<li><strong>Reliable for all traffic shapes</strong>: Sudden request waves, thin bursts, long I/O waits — they all work. If a worker is non‑idle, it gets counted correctly and appropriately.</li>
</ul>

<p>Once we landed on this approach, we quickly understood that it was all upside. There’s no catch here! A purely better approach born of the realization that we’re tracking binary signals, not smooth curves.</p>

<h2 id="let-s-see-some-code">Let’s See Some Code</h2>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>Just a note before we dive into the code: we developed our utilization-based tracking and scaling in Ruby first, so these examples are going to be in Ruby. But since this new approach is agnostic to any language specifics, we have the same implementations for Node and Python 🎉 it’s all the same when you’re just tracking edges!</p>

  </div>
</div>

<p>The great news with this new approach is that it’s so simple I can share the real code that implements it here in a blog post. This code is taken straight from the <a href="https://github.com/judoscale/judoscale-ruby" target="_blank" rel="noopener"><code>judoscale-ruby</code></a> Github repository, which houses all of the Ruby packages Judoscale publishes.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>One caveat in this code: while my diagram and example above focused on showing that we track “busy time”, our actual implementation is inverted: we track “idle time” rather than “busy time”.</p>

<p>Tracking “busy time” is slightly easier to grok (and build diagrams for!), but in reality our code does this:</p>

<p><figure>
  <img alt="Example chart showing the same square wave line now with no shading “under the boxes” as above, but instead with arrows pointing to the segments of the line that are in the ‘low’ state, highlighted as “Idle Time”" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/36ad2aeb-bc77-412e-9f42-043184c44f00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/36ad2aeb-bc77-412e-9f42-043184c44f00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/36ad2aeb-bc77-412e-9f42-043184c44f00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>It’s the inverse, so the math still all checks out, but understanding both “busy time” and “idle time” are useful for us! We just went with idle-side tracking for our code because it ended up slightly simpler. Check it out!</p>

  </div>
</div>

<p>First, we have a <a href="https://github.com/judoscale/judoscale-ruby/blob/c59a52025c4843506c915d85eb0f7c97f6d89d4a/judoscale-ruby/lib/judoscale/utilization_tracker.rb#L6" target="_blank" rel="noopener"><code>Judoscale::UtilizationTracker</code></a> class. It has a few methods and helpers in it, but the important parts start with the <code>incr</code> method (short for “increment”):</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">UtilizationTracker</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">incr</span>
      <span class="vi">@mutex</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
        <span class="k">if</span> <span class="vi">@active_request_counter</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="vi">@idle_started_at</span>
          <span class="c1"># We were idle and now we're not - add to total idle time</span>
          <span class="vi">@total_idle_time</span> <span class="o">+=</span> <span class="n">get_current_time</span> <span class="o">-</span> <span class="vi">@idle_started_at</span>
          <span class="vi">@idle_started_at</span> <span class="o">=</span> <span class="kp">nil</span>
        <span class="k">end</span>

        <span class="vi">@active_request_counter</span> <span class="o">+=</span> <span class="mi">1</span>
      <span class="k">end</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>First, keep in mind that this method is going to run every time a request <em>comes in</em> (starts). So, since we’re going to be incrementing a request counter and idle-time timer across multiple threads, we <em>do</em> need to use a simple Mutex (<code>@mutex</code> is simply a <code>Mutex.new</code> from the Ruby standard library). Once we’re certain that we can safely update our process-level variables, we need to do two things: mark that our “idle time” has ended, and increment our active-requests counter.</p>

<p>Pretty straightforward, there! Since this block may run as a multi-threaded application server picks up a request on thread #2 or #3, we’re careful to only end our “idle” timer if there aren’t <em>already</em> any requests being processed (<code>if @active_request_counter == 0</code>). </p>

<p>On the flip side, we have a <code>decr</code> method that runs every time a request <em>finishes</em> (ends):</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">UtilizationTracker</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">decr</span>
      <span class="vi">@mutex</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
        <span class="vi">@active_request_counter</span> <span class="o">-=</span> <span class="mi">1</span>

        <span class="k">if</span> <span class="vi">@active_request_counter</span> <span class="o">==</span> <span class="mi">0</span>
          <span class="c1"># We're now idle - start tracking idle time</span>
          <span class="vi">@idle_started_at</span> <span class="o">=</span> <span class="n">get_current_time</span>
        <span class="k">end</span>
      <span class="k">end</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>This one’s even simpler: decrement the count of active requests by one and, if that was the last request in flight, mark that our “idle time” has begun — the process is now idle!</p>

<p>The end result of these two functions working together is an accurate value stored into <code>@total_idle_time</code> which, in real time, tells us the number of milliseconds the process was idle.</p>

<p>The last piece of the puzzle, then, is to report that ratio and reset that variable/register! We do that in one last method on <code>Judoscale::UtilizationTracker</code>:</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">UtilizationTracker</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">get_idle_ratio</span>
      <span class="vi">@mutex</span><span class="p">.</span><span class="nf">synchronize</span> <span class="k">do</span>
        <span class="n">total_report_cycle_time</span> <span class="o">=</span> <span class="n">get_current_time</span> <span class="o">-</span> <span class="vi">@report_cycle_started_at</span>

        <span class="c1"># Capture remaining idle time</span>
        <span class="k">if</span> <span class="vi">@idle_started_at</span>
          <span class="vi">@total_idle_time</span> <span class="o">+=</span> <span class="n">get_current_time</span> <span class="o">-</span> <span class="vi">@idle_started_at</span>
          <span class="vi">@idle_started_at</span> <span class="o">=</span> <span class="n">get_current_time</span>
        <span class="k">end</span>

        <span class="n">idle_ratio</span> <span class="o">=</span> <span class="vi">@total_idle_time</span> <span class="o">/</span> <span class="n">total_report_cycle_time</span>
        <span class="vi">@total_idle_time</span> <span class="o">=</span> <span class="mf">0.0</span>
        <span class="n">idle_ratio</span>
      <span class="k">end</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>Some background here: Judoscale packages report back to Judoscale servers every 10 seconds (using a zero-performance-impact background POST) with a handful of capacity metrics about the application. In this case, <code>@report_cycle_started_at</code> represents the timestamp at the <em>start</em> of that 10-second bucket. Since we’re trying to figure out the idle <em>ratio</em>, we need to divide the idle time over the total time. “The beginning of the bucket until now” is that “total time”.</p>

<p>Once we have that, we have a special case for when this code runs while the process is <em>actively</em> idle, to prevent over-counting or under-counting idle time. Since our “report cycle” observation window might start or end <em>during</em> an idle period, we need to handle that carefully. Visually, that’d look like this:</p>

<p><figure>
  <img alt="Example chart showing the same square wave line but now with two large rectangles over the whole line; both rectangles sharing an edge, showing that the first 10-second bucket “observation window” and the second, which share the same border in time, can leave an edge during an idle phase." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fc5aac73-5495-41cb-485e-ad2f4adb8b00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fc5aac73-5495-41cb-485e-ad2f4adb8b00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/fc5aac73-5495-41cb-485e-ad2f4adb8b00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>Finally, we compute the idle ratio (a decimal, like <code>0.88</code> or <code>0.37</code>), reset <code>@total_idle_time</code> back to <code>0.0</code>, and return that idle ratio as the result. ✨</p>

<p>The last piece of code I’ll highlight is a layer up — the request middleware itself. This class, <a href="https://github.com/judoscale/judoscale-ruby/blob/c59a52025c4843506c915d85eb0f7c97f6d89d4a/judoscale-ruby/lib/judoscale/request_middleware.rb#L20" target="_blank" rel="noopener"><code>Judoscale::RequestMiddleware</code></a>, is essentially what wraps <em>every</em> Rack request before and after it’s handed down to the Rack application itself. I’m chopping out a lot here, but the bits pertinent to our discussion remain:</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="k">module</span> <span class="nn">Judoscale</span>
  <span class="k">class</span> <span class="nc">RequestMiddleware</span>
    <span class="c1"># ...</span>
    <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
      <span class="c1"># ...</span>
      <span class="n">tracker</span> <span class="o">=</span> <span class="no">UtilizationTracker</span><span class="p">.</span><span class="nf">instance</span> <span class="c1"># Singleton</span>
      <span class="n">tracker</span><span class="p">.</span><span class="nf">incr</span>

      <span class="c1"># ... lots of other code</span>

    <span class="k">ensure</span>
      <span class="n">tracker</span><span class="p">.</span><span class="nf">decr</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div>
<p>Essentially we’ve created a two-part contract:</p>

<ol>
<li><em>Every</em> time a request starts, we guarantee we’re going to call <code>#incr</code> on the Process-level singleton instance of <code>UtilizationTracker</code></li>
<li><em>Every</em> time a request ends, regardless of how or why it ends, we guarantee we’re going to call <code>#decr</code> on that same singleton instance (thanks, <code>ensure</code>!)</li>
</ol>

<p>This is the glue that ensures our data inside of <code>UtilizationTracker</code> is consistent and accurate over the lifespan of the process. Isn’t it great?!</p>
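<p>Putting the pieces together, here’s a condensed, self-contained version of that contract (simplified from the snippets above; the clock lambda is injected purely so the example is deterministic — the real gem reads a clock internally):</p>

```ruby
# Condensed, self-contained sketch of the tracker contract described above.
# The injectable clock exists only to make this example deterministic.
class MiniTracker
  def initialize(clock)
    @clock = clock
    @mutex = Mutex.new
    @active_request_counter = 0
    @total_idle_time = 0.0
    @idle_started_at = clock.call
    @report_cycle_started_at = clock.call
  end

  def incr
    @mutex.synchronize do
      if @active_request_counter == 0 && @idle_started_at
        @total_idle_time += @clock.call - @idle_started_at # bank idle time
        @idle_started_at = nil
      end
      @active_request_counter += 1
    end
  end

  def decr
    @mutex.synchronize do
      @active_request_counter -= 1
      # Last in-flight request just finished: idle clock restarts
      @idle_started_at = @clock.call if @active_request_counter == 0
    end
  end

  def get_idle_ratio
    @mutex.synchronize do
      now = @clock.call
      total = now - @report_cycle_started_at
      if @idle_started_at # still idle: bank the partial idle stretch
        @total_idle_time += now - @idle_started_at
        @idle_started_at = now
      end
      ratio = @total_idle_time / total
      @total_idle_time = 0.0
      @report_cycle_started_at = now
      ratio
    end
  end
end

# Timeline: idle 0-2s, one request 2-6s, idle 6-10s => 60% idle
now = 0.0
tracker = MiniTracker.new(-> { now })
now = 2.0; tracker.incr  # request starts
now = 6.0; tracker.decr  # request ends
now = 10.0
ratio = tracker.get_idle_ratio
puts ratio # => 0.6
```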

<h2 id="aggregate-it-together">Aggregate It Together</h2>

<p>Zooming out a little bit, we’ll conclude the deep-dive with a sense of how the aggregation works beyond a single process. Let’s say that you’ve got 2 production web services/dynos/containers/etc. running, and each runs 4 web processes. Since each <em>process</em> POSTs back its own metrics every 10 seconds, our back-end is going to get 8 data-points about your application’s overall web-process idleness/busyness. Maybe for a given 10-second bucket, process #1 on server #1 showed an idle ratio of <code>0.66</code> (that is, it was idle for two-thirds of that 10-second window), while process #4 on server #2 read a ratio of <code>0.22</code> (meaning it was busy handling at least one request for nearly 80% of the bucket).</p>

<p>Once we have all of the data points, the aggregate is actually simple: we average them together. For example, then, if we received these data points:</p>

<table><thead>
<tr>
<th>Server</th>
<th>Process</th>
<th>Idle Ratio</th>
</tr>
</thead><tbody>
<tr>
<td>1</td>
<td>1</td>
<td>0.56</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>0.77</td>
</tr>
<tr>
<td>1</td>
<td>3</td>
<td>0.48</td>
</tr>
<tr>
<td>1</td>
<td>4</td>
<td>0.39</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>0.81</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>0.44</td>
</tr>
<tr>
<td>2</td>
<td>3</td>
<td>0.52</td>
</tr>
<tr>
<td>2</td>
<td>4</td>
<td>0.62</td>
</tr>
</tbody></table>

<p>For that bucket, our average idle ratio would be:</p>
<div class="highlight"><pre class="highlight plaintext"><code>(0.56 + 0.77 + 0.48 + 0.39 + 0.81 + 0.44 + 0.52 + 0.62)/8
</code></pre></div>
<p>Which comes out to roughly <code>0.57</code>. So then, that application was idle 57% of the time (for that bucket) and, inversely, busy 43% of the time. Thus, that’d be a 43% utilization metric for that bucket, as we’ve defined it. Gathered, collected, and aggregated simply.</p>
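<p>That aggregation step, in code (using the same numbers from the table above):</p>

```ruby
# Average the per-process idle ratios, then invert for utilization
idle_ratios = [0.56, 0.77, 0.48, 0.39, 0.81, 0.44, 0.52, 0.62]

avg_idle    = idle_ratios.sum / idle_ratios.size
utilization = 1.0 - avg_idle

puts avg_idle.round(2)    # => 0.57
puts utilization.round(2) # => 0.43
```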

<h2 id="wrapping-it-up">Wrapping It Up</h2>

<p>If there’s a theme to this little blog-post saga, it’s that the simplest model that matches reality tends to win. We started by trying to <em>guess</em> at busyness with background sampling, only to discover all the usual traps: aliasing, jitter, and overhead. Then we reframed the problem to match the truth on the ground: a process is either idle or it isn’t. Record the edges. Sum the rectangles. Report the ratio. Done.</p>

<p>That shift gave us three things you actually feel in production: lower overhead, correctness across weird traffic shapes (long I/O, tiny bursts, mixed workloads), and numbers you can trust enough to automate against. When an autoscaler acts on a metric, the worst feeling in the world is, “ehh, it’s <em>probably</em> fine.” Edge-tracking turns “probably” into confidence.</p>

<p>And the aggregation story is intentionally boring, too. Each process tells us how idle it was in the last 10 seconds; we average those into an application-level picture. No fancy weighting, no black-box magic. If your fleet spends 57% of a bucket idle, that’s 43% utilized. That’s a number you can reason about, chart, alert on, and scale from.</p>

<p>So if you’ve been skeptical of utilization-based autoscaling because it felt hand-wavey or weird, we hope this demystifies it. The implementation is small on purpose, tested in the sharp edges of real apps (including our own!), and designed to vanish into the background until you need it. Watch your utilization settle into patterns you recognize, set the thresholds that reflect your own tolerance for headroom vs. cost, then enable utilization autoscaling.</p>

<p>In other words: measure what matters, measure it honestly, and keep the math simple enough that you’ll actually use it.</p>
]]>
      </content:encoded>
    </item>
    <item>
      <title>Scaling Sideways: Why You Might Want To Run Two Production Apps</title>
      <description>Learn how running a second Rails app by subdomain can cut p95 latency, stabilize SEO-facing pages, and offload slow endpoints safely.</description>
      <pubDate>Wed, 5 Nov 2025 00:00:00 +0000</pubDate>
      <link>https://judoscale.com/blog/scaling-sideways-why-you-might-want-to-run-two-production-apps</link>
      <guid>https://judoscale.com/blog/scaling-sideways-why-you-might-want-to-run-two-production-apps</guid>
      <author>Jon Sully</author>
      <content:encoded>
        <![CDATA[<blockquote><p>We’re really trying to optimize for our public website’s performance for SEO reasons&hellip;</p>
</blockquote>
<p>&hellip;was the core theme of our meetings with one of our customers a few weeks ago. They run a Rails application with several different ‘sectors’ — a public website, two different user portals, and an admin ‘backend’ with several internal tools. It’s not an extremely <em>complex</em> application, but it is diverse in its traffic. After chatting with them for a few hours, we had a great solution ready for them — one we use ourselves but feel isn’t talked about enough! Running a second prod app.</p>

<p><figure>
  <img alt="A simple diagram showing two boxes and an arrow between them, the first being “prod”, the second being labeled “Also prod?” And the title “Scaling Sideways” above both" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>Did you know that we love meeting and chatting performance, strategies, and scaling? Whether you’re a Judoscale customer or not, we’d love to hop on a call, screen-share, or whatever, and chat it out — just <a href="/chat">set up a call</a> with us! Totally free.</p>

  </div>
</div>

<p>We’re going to dive into that story and our clever suggestions for scaling sideways, but before we do, let’s clarify some terms so this doesn’t all become terribly confusing! We’ll use “<strong>main app</strong>” to describe the existing, single production application instance. We’ll then use “<strong>second app</strong>” to describe the new, separate clone of the main app — an instance still running all of the production app code (with all the same environment configs, etc.) but which is separate (more on that in a moment). Alright, let’s dive in!</p>

<h2 id="what-we-re-solving-for-here">What We’re Solving For Here</h2>

<p>This particular customer has a very SEO-driven business. That means that their public website, which is served by their core Rails application, needs to be excellent: fast, steady, predictable, burst-ready. But the app houses several other sectors which are older, slower, and less performance-friendly — we all have ’em!</p>

<p><figure>
  <img alt="A diagram of the customer app showing each ‘sector’ as its own box with emojis representing the sort of desired speed of each sector; freight truck for “internal tools”, typical consumer cars for “user portal”, and a race car for “public website”." loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5b85df7a-094c-4b34-dd25-66a3c9b81800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5b85df7a-094c-4b34-dd25-66a3c9b81800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/5b85df7a-094c-4b34-dd25-66a3c9b81800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            We see you, Google!
          </figcaption>

</figure>
</p>

<p>Unfortunately, in a multi-threaded world (hello, Puma), those slower endpoints don’t just take longer for the people who hit them; they raise the waterline for everyone by occupying threads that subsequent would-be-faster requests must wait on. The result is a p50 (median) request time that looks pretty reasonable&hellip; but a p95 that’s much worse and erratic. Oh, and a support channel that pings for performance issues when there seemingly aren’t any.</p>

<p>From a telemetry and metrics standpoint, we’ve seen this issue plenty of times: CPU saturation is nonexistent and database resources look boring, but request queue time (<a href="/blog/request-queue-time">the metric that matters</a>) spikes randomly and p95s are all over. In the case of our customer, it’s not that their public website got slower, per se; it’s that the requests for those public site pages had to <em>wait</em>. Thus we’ve met an old truth: multi-threading increases throughput but amplifies latency (something we dissected in <a href="/blog/puma-default-threads-changed">“Why Did Rails&rsquo; Puma Config Change?!”</a>). Boil it way down and it’s hosting costs vs. p95s.</p>

<p><figure>
  <img alt="A screenshot of a chart showing spiky, erratic p95 response times while the average is much lower" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/92ea22fb-3780-42bc-c297-09cad08ee800/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/92ea22fb-3780-42bc-c297-09cad08ee800/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/92ea22fb-3780-42bc-c297-09cad08ee800/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            Spiky p95s and a WAY lower p50
          </figcaption>

</figure>
</p>

<p>But the reality for this customer is that they needed to tame and stabilize their p95 response times for their public website. Appeasing the finicky beast that is Google Search Ranking is a broadly unknown game, but stable performance does seem to be a factor.</p>

<p>The good news here is that we’ve got a creative solution. We call it “scaling sideways” — slightly different from ‘horizontal scaling’, yet still horizontal in concept: running a second, but subdomain-separated, instance of your production application.</p>

<h2 id="scaling-sideways">Scaling Sideways</h2>

<p>Let’s expand on the specifics of this strategy, since “scaling” can be a bit of an overloaded term. What we’re describing here isn’t “scaling” in the sense we’re likely all used to these days: changing the number of webserver or worker instances your production application is running at any given time (the core premise of <a href="/">Judoscale</a> itself). Instead we’re talking about “scaling” in a much slower, more methodical sense: running a second production application, which is essentially a clone of the main app, on a separate subdomain with separate infrastructure. It’s still the same code-base, same deployment branch, and really should have all of the same environment and configuration variables&hellip; just a different place to request the same data and/or pages.</p>

<p><figure>
  <img alt="A somewhat complex diagram showing two application servers, both powered by the same underlying dependency services (e.g. databases and file service providers), both deployed from the same code repo and branch, but on different subdomains" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/d8e4bd8e-a658-499f-cb8e-4583f6115600/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/d8e4bd8e-a658-499f-cb8e-4583f6115600/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/d8e4bd8e-a658-499f-cb8e-4583f6115600/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>The key to this strategy is offloading traffic to slower and less consistent endpoints to the second app (via its subdomain) so that your main app can handle its own traffic more consistently and quickly. The main app becomes the home for predictable, latency-sensitive endpoints; the second app absorbs the messier stuff without letting it bleed into the public experience.</p>

<p>Luckily, we don’t need a microservices migration plan to do this. We’re not decomposing the domain model; we’re just decomposing our runtime. One deliberate split is enough: the fast path (main) and the heavy/volatile path (second). The payoff is that your main app’s thread pool stops babysitting slow requests and blocking higher-priority endpoints. Queue time stabilizes. Tails compress. (&hellip;Engineers stop arguing about whether going single-threaded everywhere is “worth it.”)</p>

<h2 id="when-is-it-the-right-move">When Is It The Right Move?</h2>

<p>We should recognize first that this strategy isn’t perfect for every application. It shines when at least one of a few conditions is true:</p>

<p><figure>
  <img alt="A visual depiction of the four cases given below" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/262f9728-c54f-4942-f6d5-6efe01c19100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/262f9728-c54f-4942-f6d5-6efe01c19100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/262f9728-c54f-4942-f6d5-6efe01c19100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
            <figcaption class="text-center text-sm">
            Really channeling my inner XKCD here&hellip;
          </figcaption>

</figure>
</p>

<p><strong>Your traffic has distinct “shapes.”</strong> If one slice of your app is bursty, slow, or just unpredictable (admin pages, CSV exports, report builders, portals, ‘real time’ (polling) dashboards), while another slice must feel instant and boring (marketing site, signup flow, product pages), you’re a great candidate. Sideways scaling lets you build a fast-lane for the steady stuff and a truck/carpool-lane (or two) for everything else.</p>

<p><strong>You have different SLAs for different routes.</strong> Some requests just matter more. If a public route missing its p95 target is business-critical (SEO, ad landing pages, checkout, conversions), prioritize it on the main app and give it a calmer thread pool. If an authenticated portal can tolerate higher p95s without harming KPIs or other business targets, move it to the second app.</p>

<p><strong>You can influence where traffic goes.</strong> This sounds obvious, but you need a lever. Many teams already have it: front-end <code>fetch()</code> calls, Turbo Frames/Streams, HTMX targets, or API clients you control. If you can change hostnames in those calls, you can steer traffic to the second app with minimal risk and no user-visible disruption. Especially if these calls are transparent to a browser’s address-bar.</p>
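<p>If your lever is a URL you build server-side, the swap can be as small as rewriting the host. Here’s a minimal sketch; the <code>2.example.com</code> hostname and the <code>second_app_url</code> helper are illustrative names of our own, not anything your framework provides:</p>

```ruby
require "uri"

# Assumed second-app hostname; swap in your own subdomain.
SECOND_APP_HOST = "2.example.com"

# Rewrite a main-app URL so the same path is requested from the second app.
def second_app_url(url)
  uri = URI.parse(url)
  uri.host = SECOND_APP_HOST
  uri.to_s
end
```

<p>Feed that helper to your front-end <code>fetch()</code> targets, Turbo Frame <code>src</code> attributes, or API client base URLs, and the browser’s address bar never changes.</p>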

<p><strong>SEO is part of the story.</strong> If Google’s crawlers matter a great deal to your business, you might consider splitting your public site from your other application chunks. Instead of the classic “let’s just rewrite the marketing site to static”, you get a lot of the benefits of a dedicated marketing site system (the main app) while retaining all of the comforts of a unified code base and singular mental/domain model.</p>

<h2 id="judoscale-does-it-too">Judoscale Does It, Too!</h2>

<p>As it turns out, <a href="/">Judoscale</a> itself satisfies three of those bolded conditions above. The Judoscale architecture is built around customers installing the Judoscale <a href="https://github.com/judoscale" target="_blank" rel="noopener">package</a>, which is essentially just a lightweight monitor for request and job queues within the app. Those metrics ultimately get POSTed back to Judoscale servers for processing and aggregation. Nice! But those POSTs happen every ten seconds for every <em>process</em> across thousands of applications. We have a <em>ton</em> of API traffic. As in, 3000-3500 requests <em>per second</em> 24/7.</p>

<p>Then, of course, there’s the Judoscale dashboard and user UI where you can see your metric charts, tune your scaling configuration, and do standard SaaS things. While those charts <em>do</em> have automatic 10-second update polling built-in, the traffic for that entire sector of the app trends much closer to about 50 RPS.</p>

<p>So&hellip; we (1) definitely have different ‘shapes’ — our API traffic is tiny payload and ultra-fast response whereas our dashboard traffic is small-to-medium payload and variable response. Additionally, we (2) definitely have different SLAs for these two shapes. Our API needs to be available, but response times can fluctuate (there&rsquo;s no human waiting)&hellip; whereas our dashboard needs to be as fast as possible since it&rsquo;s customer-facing. Finally, we (3) <em>can</em> control where the majority of our traffic goes by tweaking the client packages to POST somewhere else (and/or some smart routing with Cloudflare).</p>

<p><figure>
  <img alt="A diagram showing a high level split of Judoscale’s two applications; the second app handling the massive volume of API traffic" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/81e7f776-9332-46bd-7ad2-fa6ddf8b7100/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/81e7f776-9332-46bd-7ad2-fa6ddf8b7100/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/81e7f776-9332-46bd-7ad2-fa6ddf8b7100/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We’ll get to the implementation specifics below, but hopefully this gives you an idea of the versatility of scaling sideways: applications completely non-SEO focused can still benefit <em>greatly</em> from segmenting traffic in this style. </p>

<h2 id="how-you-actually-do-it">How You Actually Do It</h2>

<p>Spin up a clone of your main prod app. Same repo, same deploy pipeline, same environment variables (with a couple exceptions we’ll note). Point it at a sibling subdomain — <code>ww2.example.com</code>, <code>api2.example.com</code>, or simply <code>2.example.com</code> all work. The goal is sameness: both apps should boot the same code and talk to the same primary dependencies (database, cache, queue, file storage [S3 et al.]). Differences should be intentional and minimal: web process counts, thread counts, and possibly instance sizes.</p>

<p>From there:</p>

<ol>
<li><strong>DNS &amp; routing.</strong> Create the new subdomain and point it to the second app’s router/load balancer/DNS target.</li>
<li><strong>Environment parity.</strong> Duplicate secrets and env vars (including <code>SECRET_KEY_BASE</code>/equivalents so session cookies work across hosts if necessary — more on this below). Consider different Puma thread counts between apps (more on this below too!).</li>
<li><strong>Traffic split.</strong> Start by moving non-navigational traffic: API calls from your front-end, background polling, Turbo Frames/Streams targets. These won’t change the URL in the address bar, so the move is low-risk.</li>
<li><strong>Progressively offload.</strong> Next, migrate heavier, authenticated pages and long-running endpoints to the second app. Be deliberate about which URLs users might see in their browser’s address bar!</li>
<li><strong>SEO guardrails.</strong> Add canonicals on anything public your second app might serve, ensure robots blocking is in place for that host, and keep sitemaps + social meta rooted on the main app.</li>
<li><strong>Observability.</strong> Watch queue time and p95s on both apps. You should see the main app flatten out quickly.</li>
</ol>
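<p>For the traffic-split step, one lever lives right in the router: gate the offloaded sectors by request host so a single routes file can serve both deployments. A hedged sketch — the constraint class and hostname below are our assumptions, not anything Rails ships with:</p>

```ruby
# A routing constraint that matches only requests arriving on the second
# app's subdomain (hostname is an assumption; use your own).
class SecondAppConstraint
  SECOND_APP_HOST = "2.example.com"

  def matches?(request)
    request.host == SECOND_APP_HOST
  end
end

# In config/routes.rb, both apps can then share one file:
#
#   constraints SecondAppConstraint.new do
#     namespace :admin do
#       # ...offloaded sectors here...
#     end
#   end
```

<p>Because both apps deploy the same code, the constraint simply no-ops on the main app, keeping the “sameness” principle intact.</p>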

<p>Most importantly, treat this like a runtime composition change, not an architecture rewrite. You can ship it safely in small patches and keep rolling forward.</p>
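<p>The “SEO guardrails” step can also live in the shared codebase itself. Here’s a hedged Rack-middleware sketch (the class name and hostname are ours) that marks every response served from the second app’s host as non-indexable, so crawlers never rank the duplicate subdomain:</p>

```ruby
# Rack middleware: add a noindex header to anything served on the second
# app's host ("2.example.com" is an assumed hostname).
class NoindexSecondApp
  SECOND_APP_HOST = "2.example.com"

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    headers["X-Robots-Tag"] = "noindex, nofollow" if env["HTTP_HOST"] == SECOND_APP_HOST
    [status, headers, body]
  end
end
```

<p>Pair this with canonicals pointing at the main app and a host-aware <code>robots.txt</code> and the second app stays invisible to Google.</p>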

<p><figure>
  <img alt="A somewhat complex diagram showing two application servers, both powered by the same underlying dependency services (e.g. databases and file service providers), both deployed from the same code repo and branch, but on different subdomains" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/52bd2851-dcd1-437e-ffa5-e8a111cb6d00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/52bd2851-dcd1-437e-ffa5-e8a111cb6d00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/52bd2851-dcd1-437e-ffa5-e8a111cb6d00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="what-actually-moves">What Actually Moves</h2>

<p>A practical rule of thumb:</p>

<ul>
<li><strong>Stays on the main app:</strong> canonical public pages, sitemaps/robots, OpenGraph/Twitter cards, landing pages, docs/blog, marketing flows, and any route that shapes your public narrative or crawlability.</li>
<li><strong>Moves to the second app:</strong> authenticated portals, JSON APIs, front-end-driven fragments (Turbo/HTMX/Stimulus/etc.), polling endpoints, file uploads/exports, batchy or I/O-heavy controllers, and admin tooling.</li>
</ul>

<p>For navigations, you have options but need to be intentional. Keep in mind that browser address bars remain highly useful for users copying or pasting URLs in/out and potentially sharing those URLs with others. For intra-portal / authenticated endpoints it may not matter that a user sends a colleague <code>https://2.example.com/portal/book/5</code> (especially if the colleague would’ve ended up forced over to the second app to log in to the portal anyway!).</p>

<p>But for resources and endpoints where the goal is speed and public accessibility, we’ll want to keep those endpoints pointed at the main app.</p>

<p>The good news is that we can be clever. For instance, if an endpoint is slow and synchronous (not recommended but we get it, it happens) <em>yet must result in a public URL</em>, we can still POST to the second app and do the work synchronously in that controller. We just need to make sure the response from the second app redirects back to the first. And since they share the same database, you can fluidly (for example) do an expensive <code>create</code> operation in the second app, then immediately redirect to the now-existing record on the main app with confidence. There’s no delay in data propagation between the two applications!</p>
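<p>That create-then-redirect dance only needs one small piece of glue: building the main-app URL from within the second app. A minimal sketch, assuming hypothetical hosts and a helper name of our own choosing:</p>

```ruby
# Assumed canonical origin for the main app.
MAIN_APP_ORIGIN = "https://example.com"

# Build an absolute main-app URL for a record created on the second app.
def main_app_url_for(path)
  "#{MAIN_APP_ORIGIN}#{path}"
end

# In a second-app Rails controller you might then (hypothetically) write:
#
#   report = Report.create!(report_params)  # the slow, synchronous work
#   redirect_to main_app_url_for("/reports/#{report.id}"), allow_other_host: true
```

<p>Note the <code>allow_other_host: true</code>: modern Rails blocks cross-host redirects by default, so you must opt in when bouncing users back to the main app.</p>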

<p>In the case of our customer, this meant offloading most of their user portals and internal admin tools to the second app. Their public marketing site stayed put and immediately got calmer metrics. Problem solved!</p>

<p><figure>
  <img alt="A digram showing our customer’s ultimate break-out of their traffic across two apps" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9b671bd0-1009-49cd-ac8b-7ee187627200/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9b671bd0-1009-49cd-ac8b-7ee187627200/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/9b671bd0-1009-49cd-ac8b-7ee187627200/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<h2 id="judoscale-s-setup">Judoscale’s Setup</h2>

<p>We mentioned earlier that <a href="/">Judoscale</a> also runs a dual-prod-app setup, but we arrived at our split for different reasons — and with a different emphasis. We’re sharing that to underscore there isn’t one “right” pattern. For us, it was more about cost and UX than isolating slow paths&hellip; most of our endpoints are already fast!</p>

<p>Rather than sending volatile endpoints to a second app, we split by human interface. Our main app (<code>app.judoscale.com</code>) is the customer dashboard, so we tune it for UX: snappy, steady, predictable. Our API app (<code>api.judoscale.com</code>) serves the bulk of our traffic, but it’s non-human-facing and can tolerate small, occasional latency blips. The machines don’t mind! But people do. It’s not the fast-vs-volatile split we describe above (which is still the right path for this customer), but it delivers similar benefits: each runtime is optimized for what matters most to it.</p>

<p>Practically, this lets us fine-tune the API runtime for throughput and cost (concurrency, process counts, aggressive autoscaling) while keeping the main app conservative for a consistently great feel. The net effect: a calmer UX and lower hosting spend (more on cost below). For many, the canonical split paradigm might be “fast vs. volatile” but for us it was “UX vs. cost”. It’s a different motive but the same playbook: split out a second prod app.</p>

<h2 id="a-caveat-on-cookies-auth-and-subdomains">A Caveat on Cookies, Auth, and Subdomains</h2>

<p>If you’re going to use a second app for a disparate, separate API or fully segmented authentication mechanism (like Judoscale did), feel free to skip this section. If instead you’ll be cleverly (and carefully) shuttling users between the two apps, we need to discuss shared authentication across subdomains.</p>

<p>The simplest way to accomplish this is to set up both applications with the <em>exact</em> same secret key base (or equivalent) so that cookie and session cryptographic signing validates to/from <em>both</em>. That is, if you log in on the main app, a subsequent request to the second app will see that you’re logged in. This strategy upholds the “keep both apps the <em>exact</em> same” principle by keeping sessions transparent between them. Both apps will read and write to the same session/cookie.</p>

<p>Once both applications are running the same <em>keys</em>, you’ll need to ensure that the actual cookie policies are set up correctly for both apps. Essentially we need to make sure that both apps are emitting cookies with the same sharing configuration so that browsers will send the same cookie to both apps. In Rails that might look something like this (for session storage via cookie):</p>
<div class="highlight"><pre class="highlight ruby"><code><span class="no">Rails</span><span class="p">.</span><span class="nf">application</span><span class="p">.</span><span class="nf">config</span><span class="p">.</span><span class="nf">session_store</span><span class="p">(</span>
  <span class="ss">:cookie_store</span><span class="p">,</span>
  <span class="ss">key: </span><span class="s2">"_my_app_shared_session_key"</span><span class="p">,</span>
  <span class="ss">domain: </span><span class="s2">".example.com"</span><span class="p">,</span>      <span class="c1"># explicit eTLD+1; covers example.com + subdomains</span>
  <span class="ss">expire_after: </span><span class="mi">1</span><span class="p">.</span><span class="nf">year</span><span class="p">,</span>
  <span class="ss">secure: </span><span class="kp">true</span><span class="p">,</span>                <span class="c1"># if this fails in specs/tests, switch to `!Rails.env.test?`</span>
  <span class="ss">same_site: :lax</span><span class="p">,</span>             <span class="c1"># mitigates CSRF while allowing subdomains</span>
  <span class="ss">httponly: </span><span class="kp">true</span>
<span class="p">)</span>
</code></pre></div>
<p>But, as with all things security-related, make sure you understand <em>every</em> config component here and are confident in your security strategy amidst sharing cookies between the two apps. YMMV.</p>

<h2 id="magic-p95-s-and-threads">Magic, P95’s, and Threads</h2>

<p>It’s worth taking a little detour here to assess the supposed magic of what we’re presenting: there’s no real magic at play here — this is just simple queueing theory with friendlier furniture. We’ve talked about queueing theory broadly in <a href="/blog/request-queue-time">“Queue Time: The Metric that Matters”</a> but the mechanism at play in scaling sideways isn’t radical. When slow requests leave the main thread pool, fast requests stop waiting behind them. That means lower overall variance in request speeds (e.g. lower p95s) and an app that users will probably describe as feeling “snappier”.</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-sky-50 dark:bg-sky-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-sky-900 dark:text-sky-400">
    👀 Note
  </h4>
  <div class="mt-2.5 text-sky-800 prose-a:text-sky-900 dark:prose-a:text-white prose-code:text-sky-400 dark:text-sky-200 dark:prose-code:text-gray-300">
    <p></p>

<p>Of course the slowness has to go <em>somewhere</em>&hellip; but we can be much more relaxed around the variance and volatility of our second app. When the slowness is going somewhere <em>made</em> to be slow, it feels much better.</p>

  </div>
</div>

<p>In fact, we can use our “keep the fast app fast” and “keep the slow app slow” mindset in tweaking our thread counts in each app. For a main app we recommend three Puma threads. That’s Rails’ <a href="/blog/puma-default-threads-changed">new standard</a> and proves to be an excellent tradeoff: increased throughput with a reasonable, low tail-latency increase (especially after you move all of the slow requests to the second app!). That said, we recommend deliberately choosing a higher thread count on the second app. Maybe five, maybe six. Your mileage will vary on specifics, but when we design and spin up an application <em>specifically</em> to handle our slower (likely I/O-bound) requests, <em>especially</em> when we aren’t as worried about response times, we can really leverage the power of a large thread pool. This should allow us to keep our instance-count low — a single server instance running five or six threads should be able to handle quite a bit of stuff! </p>
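<p>One way to keep a single <code>config/puma.rb</code> across both deployments is to branch on an environment variable. This is a sketch under assumptions: <code>SECOND_APP</code> is a variable name we made up (set it only on the second app), and the counts simply mirror the recommendations above:</p>

```ruby
# Pick a thread count per deployment: the main app stays at Rails'
# default of 3, while the second app leans on a larger pool for its
# slower, likely I/O-bound work. SECOND_APP is an assumed env var.
def thread_count(second_app: ENV.key?("SECOND_APP"))
  second_app ? 5 : 3
end

# In config/puma.rb you would then call Puma's DSL:
#
#   threads thread_count, thread_count
```

<p>Same file, same deploy pipeline, two tuned runtimes.</p>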

<h2 id="autoscaling-two-applications">Autoscaling Two Applications</h2>

<p>Finally, the last major topic to cover for scaling sideways is indeed autoscaling. First, you should use <a href="/">Judoscale (👋)</a>. Okay, obvious plug aside, there’s a little nuance here: you’re going to want both apps to autoscale. But they’re going to do so with different parameters and goals.</p>

<p><strong>Main app:</strong> now that variance is down and your endpoints are consistently performant, we’ll want to clamp our queue-time thresholds a bit tighter. The target is a flat, boring queue-time line very near zero. In Judoscale, you should see low enough metrics that an upscale threshold between 5 and 10ms feels very stable and scales nicely with your actual traffic curves (not erratically)!</p>
<div class="my-8 rounded-3xl px-6 py-1 bg-green-50 dark:bg-green-900/30 dark:ring-1 dark:ring-gray-300/10" >
  <h4 class="font-medium text-green-900 dark:text-green-500">
    ✅ Tip
  </h4>
  <div class="mt-2.5 text-green-800 prose-a:text-green-900 dark:prose-a:text-white prose-code:text-green-900 dark:text-gray-300 dark:text-green-100/80 dark:prose-code:text-gray-300">
    <p></p>

<p>If your app has burstable traffic loads at known times, you should still define <a href="/docs/leveraging-schedules">a schedule</a> for your autoscaling. If it has burstable traffic loads at <em>unknown</em> times, consider autoscaling to guarantee a <a href="/blog/introducing-proactive-autoscaling">certain level of headroom</a>.</p>

  </div>
</div>

<p><strong>Second app:</strong> still scale on queue time but <em>expect</em> volatility and small spikes that self-resolve. We’d recommend a moderately high upscale threshold like 80ms as well as reducing upscale sensitivity to 20 seconds so brief jitters don’t cause thrashing (AKA ping-pong scaling, which we discussed <a href="/blog/autoscale-tuning-part-3-settings">here</a>). We want to upscale when necessary, but wait a moment to be sure that upscaling is, in fact, necessary.</p>

<p>So, all of that to say, queue time is still absolutely the metric to watch for scaling on both applications. And Judoscale is still absolutely the tool to use. But refining our scaling parameters for each app in their own context is the real path to success here! We want tight bounds and strict expectations on the main app with looser, workload-aware settings on the second.</p>

<h2 id="a-note-on-cost">A Note on Cost</h2>

<p>To address the potential elephant in the room: scaling sideways this way <em>may</em> cost a little more in your overall hosting bill. That’s true. But keep in mind that our first goal here was to optimize and speed up a sector of an application without refactoring the whole application. This <em>is</em> a “Can we throw money at the problem?” solution.</p>

<p>But there’s actually better news: <u>it’s likely that this strategy won’t actually cost much more than your current hosting bill</u>. Remember that the main app is likely going to run fewer instances the more surface area you move away from it. That’s savings. And the second app should make broader use of multi-threading, so it too may need fewer instances than you expect. That’s cheap!</p>

<p>At the end of the day, snappier user experiences and conversions tend to yield more sales, and more sales means you probably have a little more space in your hosting budget. We’re not advocating for going wild here — you should still autoscale both applications to keep things efficient — but this strategy is a reasonable cost-path forward for powerful performance gains.</p>

<h2 id="scale-sideways">Scale Sideways</h2>

<p><figure>
  <img alt="A simple diagram showing two boxes and an arrow between them, the first being “prod”, the second being labeled “Also prod?” And the title “Scaling Sideways” above both" loading="lazy" x-init="$el.removeAttribute(&#39;loading&#39;)" src="https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20" x-data="{&quot;src&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/quality=1,blur=20&quot;,&quot;fullResSrc&quot;:&quot;https://judoscale.com/cdn-cgi/imagedelivery/g5ziwLsypgTqGag7aYHX0w/3c1991f7-9a6b-4fc8-ef8f-96638358ba00/public&quot;}" :src="src" x-intersect="src = fullResSrc">
  
</figure>
</p>

<p>We started with a simple ask: “optimize the public site for SEO”, and a familiar constraint: one app serving very different kinds of traffic. That’s why we reached for the often-overlooked move of running a second production app. It squarely addressed this customer’s need: keep the public face fast and predictable while letting portals and internal tools be as spiky and complex as they need to be. We should know; we do the same thing (though not for SEO purposes)!</p>

<p>The path there doesn’t require a big‑bang migration. Stand up the second app, put guardrails in place, and move traffic in slices. Begin with front-end calls, shift over some API traffic, then gradually migrate entire user portals when you’re confident in your URL sharing&hellip; all while feature-flagging shifts to build confidence.</p>

<p>What you get for that incremental effort is real performance gain with little added domain complexity or cost. The main app’s thread pool narrows to the fast paths, queue time flattens, and p95s stop lurching. The second app absorbs the messy variance without leaking it into the public experience. Same codebase, two runtimes, each excellent at a different job. If your intro sounds like our customer’s (“we’re optimizing public performance for SEO”), or ours (“we really need to optimize our API for throughput and reliability”), this is the strategy that keeps the promise without rewriting the product or doubling your spend.</p>
]]>
      </content:encoded>
    </item>
  </channel>
</rss>
