Visit the New Academy: Learn the Basics with Steadybit 101

Chart the reliability of your applications

Reveal and validate reliability risks before they impact performance and frustrate customers

Chaos Engineering & Reliability Testing Platform

Ready to get started? Book a Demo →

Take a tour

TRUSTED BY COMPANIES WORLDWIDE

mano mano logo
stackstate logo
salesforce - logo
kaizen gaming logo
mercado libre logo
valliant group logo
rewe digital logo

Test system resilience proactively with controlled experiments

Steadybit is a reliability platform that helps teams assess and improve the resilience of their services. With automated issue discovery and controlled experiments, you can find and validate system weaknesses before they become outages.

Unlike other chaos engineering tools, Steadybit uses an open source extension framework to quickly connect to popular tools across your tech stack. Need a custom integration? You can easily add it yourself using our Extension Kits.

We’ve supported both SaaS and On-Prem deployments since Day 1.

Validate Monitoring Alerts

Run scenarios to check your alert coverage and accuracy

Reduce Reliability Risks

Catch reliability issues and fix them before they reach production

Resolve Incidents Faster

Train your team to handle incidents quickly

Build experiments with no-code actions & templates

Drag-and-drop actions into the Steadybit experiment editor to create new reliability tests and iterate quickly.

Network
Kubernetes
Cloud Services
Physical & Virtual Hosts
Applications
Observability

Delete Pod

This attack allows you to delete one or multiple pods to test the resilience of your application.

Cause Crash Loop

This action continuously kills specified containers in a selected pod.

Rollout Restart Deployment

Simulate the rollout of a Kubernetes deployment using a kubectl command.

Pause Docker Container

Run this action to pause one or more containers for a certain amount of time.

Taint a Node

Use this attack to taint one or multiple nodes for a given duration.

Drain Node

Use this attack to drain one or multiple nodes and check performance degradation.

Stop Container

Check the exit behavior and restart process by terminating one or more containers.

Explore the Action Library

Browse open source actions that you can easily add to experiments.

Explore and select targets for experiments

When you install our agent on your network, Steadybit will automatically discover any potential experiment targets and pull in related metadata from your testing environment. Our intuitive query language makes it easy to group and filter your targets however you want.

Get advice on what experiments to run first

To help you get started fast, our Reliability Advice feature highlights any common reliability issues it detects in your targets.

You’ll see instructions on how to fix any issues in your code, and then we’ll recommend which experiments would be valuable to run next.

Design, customize, and run experiments

Design full experiments in seconds using templates for popular use cases and our drag-and-drop editor. With our open source framework, you can easily add custom actions and extensions to run any type of experiment you want.

Once you’re happy with an experiment, you can automate your test executions with the Steadybit API or CLI.

Why SRE & platform teams choose us

Our customers inspire us every day with new experiment types and custom extensions that push their systems to the limit.

  • Salesforce
    "With Steadybit, we identified issues and corrective measures, improving our overall system resilience. The efficiency of finding these weak spots has vastly increased with Steadybit, and the time to deliver a solution has significantly decreased. We're moving closer to achieving our target of 99.99% uptime."

    Krishna Palati

    Director of Software Engineering

  • REWE digital
    “Steadybit makes it easy to inject faults and really test our system reliability. Their team delivered a new Kafka extension for us that has unlocked new testing possibilities. They are a supportive partner that has made introducing the platforms to new teams easy.”

    Jan Rundshagen

    Cloud Platform Engineer

  • "Steadybit is helping us move from reactive incident handling to proactive reliability engineering, which is a significant shift for an organization of our size. The Steadybit team is highly responsive, technically strong, and genuinely invested in our success."

    Ilias Tsakiridis

    Site Reliability Engineering Team Lead

Shift to a proactive reliability approach

Measure your current reliability posture and validate system behaviors with controlled tests.

Reliability Advice

Automatically detect vulnerabilities

Assess whether your targets are compliant with reliability best practices.

Experiment Editor

Run actions with a timeline-based editor

Start quickly with templates for common use cases or build fully custom tests.

Assign Teams & Roles

Set guardrails & fine-grained permissions 

Define access and permissions for users to ensure safe testing.

Extend Steadybit to perfectly fit your systems

To get started, you will need to install the Steadybit agent on your network and add any of our open source extensions that match your tech stack. Then, you can use the Steadybit platform to view targets, design experiments, and run tests.

FAQs

Evaluating chaos engineering tools? Here are the most common questions we get from teams.

Can we deploy Steadybit in On-Prem or air-gapped environments?

Yes, of course! From Day 1, Steadybit has offered SaaS and On-Prem deployment options with full feature parity. Install the control plane and extensions in any environment seamlessly and start improving your reliability.

To learn more about our On-Prem support, you can read the installation details here.

How can we evaluate Steadybit to see if it's right for us?

If you’re not sure about the best way to get started, a quick call with us can help. We can answer your technical questions and share what we’ve seen work best. You can schedule that here.

If you want to get into the platform and start playing around right away, we offer a free 30-day trial. You can either install agents and extensions directly on your systems or use our provided sample data to see how each of our features works. Sign up here.

If none of these sounds right, just fill out our contact form and tell us more. We’re here to help!

How do we add custom actions and extensions?

Steadybit is the most extensible reliability platform because it has a hybrid architecture that supports open source extensions.

Our ExtensionKits enable you to add custom actions, templates, targets, advice, and extensions. Write in your preferred coding language and start to customize Steadybit to fit your specific use cases and tech stack.

How does Steadybit automatically detect reliability vulnerabilities?

Our Reliability Advice feature continually analyzes all of your discovered targets and checks whether they are compliant with the best practices outlined in the “Advice” settings.

When you get started with Steadybit, there are 13 Advice checks out-of-the-box, based on the best practices outlined by the open source tool kube-score.

If you want to add checks based on internal standards or other best practices, our AdviceKit provides instructions on how to write your own custom Advice.
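
To make the idea of a custom check concrete, here is a minimal sketch of the kind of rule an internal standard might encode. It is not the AdviceKit API: the `check_min_replicas` function, the target dictionaries, and the status strings are all hypothetical and chosen only for illustration.

```python
def check_min_replicas(target, minimum=2):
    """Hypothetical advice check: flag deployments that run too few replicas."""
    # Assumption: each target is a plain dict with "name" and "replicas" keys.
    replicas = target.get("replicas", 0)
    if replicas >= minimum:
        return {"target": target["name"], "status": "ok"}
    return {
        "target": target["name"],
        "status": "action needed",
        "hint": f"scale to at least {minimum} replicas (currently {replicas})",
    }


# Illustrative data standing in for discovered targets.
deployments = [
    {"name": "checkout", "replicas": 3},
    {"name": "cart", "replicas": 1},
]

findings = [check_min_replicas(d) for d in deployments]
flagged = [f["target"] for f in findings if f["status"] != "ok"]
print(flagged)
```

A real check would follow the structure described in the AdviceKit instructions; the sketch only shows the shape of the logic, a rule applied to every discovered target with a fix hint attached to each finding.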

What prevents experiments from causing unintended damage?

To start, we have RBAC user permissions that let you limit the actions and targets that users can interact with. Group targets into defined testing environments and assign only the relevant teams to ensure least privilege access.

When designing experiments, you can select a blast radius for your targets. For example, you could specify that you only want to target 10% of the pods in a cluster. This is an easy way to ensure that your experiments start small with limited impact.
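
As a rough illustration of what a percentage-based blast radius means, the sketch below picks a random 10% of a target list. It does not show how Steadybit selects targets internally; the `select_blast_radius` helper and the pod names are assumptions made for this example.

```python
import math
import random


def select_blast_radius(targets, percentage, seed=None):
    """Pick a random subset covering `percentage` percent of `targets`."""
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    if not targets:
        return []
    # Round up so a small, non-empty target set still yields one target.
    count = max(1, math.ceil(len(targets) * percentage / 100))
    rng = random.Random(seed)  # seeded for repeatable selection in tests
    return rng.sample(list(targets), count)


# Hypothetical pod names standing in for the pods of one cluster.
pods = [f"pod-{i}" for i in range(50)]
selected = select_blast_radius(pods, 10, seed=42)
print(len(selected), "of", len(pods), "pods selected")
```

Starting with a small percentage and raising it across runs keeps the first experiments limited in impact.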

Before an experiment runs, you can configure pre-flight webhooks. These customizable checks allow you to ensure that all conditions are ready for your experiment to begin running.
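
The decision logic behind such a check could look like the sketch below. The payload fields (`environment`), the staffed-hours rule, and the `preflight_check` function are hypothetical; the actual webhook request and response format is defined by the Steadybit documentation, not by this example.

```python
from datetime import datetime, time


def preflight_check(request, now):
    """Return (allowed, reason) for a hypothetical pre-flight payload."""
    # Assumption: the payload names the environment the experiment targets.
    if request.get("environment", "") == "production":
        return False, "experiments against production are blocked"
    # Only allow runs on weekdays (Monday is 0, Saturday is 5).
    if now.weekday() >= 5:
        return False, "outside staffed hours (weekend)"
    # Only allow runs between 09:00 and 17:00.
    if not time(9, 0) <= now.time() < time(17, 0):
        return False, "outside staffed hours"
    return True, "ok"


allowed, reason = preflight_check(
    {"environment": "staging"},
    datetime(2024, 1, 8, 10, 0),  # a Monday morning
)
print(allowed, reason)
```

A webhook endpoint would wrap a function like this and translate the result into whatever response the platform expects.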

When experiments are running, anyone in your organization can hit the “Emergency Stop” button. This immediately rolls back changes so you can respond fast.

With all of these features, you can set up controls and guardrails to enable experimenting with confidence.

Want to learn more?

We’re here to answer any questions you have!

Get a Personalized Demo

Ready to hear more about Steadybit?

Schedule a demo with our team to see a platform walk-through and get your questions answered.
