From “Testing Everything” to “Testing on Demand”

Hi, I’m Mykola Panasiuk, a QA Team Lead at MOJAM. Our product is a platform for esports athletes and CS2 players with over 5.7 million users from 168 countries. Interactivity, responsiveness, and system resilience are our top priorities.

I’m responsible for building and implementing Quality Assurance processes at all levels of the company, leading and managing the QA team, functional test automation, as well as developing testing frameworks and tools.

In this article, I’ll share the transformation journey we’ve been on since I joined the company: from a chaotic “test everything” approach to a structured “test on demand” model. I’ll describe the starting point and the results we achieved. And no, there won’t be a single word about artificial intelligence or how it started doing all the work for us.

This publication will be equally useful for members of development teams (QA and developers alike) as well as managers at all levels. If you want your QA team to actually focus on Quality Assurance activities rather than just testing tickets, you’re in the right place. Let’s go! 

How It All Started

In May 2024, I basically “talked my way into” an interview at MOJAM (hi, Roman Lepetyuhin👋). Even back then, the company’s CTO asked for my opinion on how to raise the level of ownership over one’s work, reduce testing overhead, and speed up code delivery to production.

In other words, how to make developers test instead of QA, remove one “extra” link from the process, and, most importantly, understand why and when that link actually becomes excessive. 

And of course, there was the question of whether I was ready to build such processes in the company myself (to which I said yes).

The very idea of developers testing their own code without a QA specialist (read as a “tester”) is not new. There is a well-known book in the QA community, How Google Tests Software, which describes how Google has been successfully releasing an endless number of products for many years without having what we traditionally call QA Engineers on staff.

Yes, some readers might point out that Google does have Software Engineers in Test (SETs). In essence, though, these are still developers who simply work at a neighboring “workbench.” It was largely because of this book that QA specialists started to fear that their profession would soon die out and that everyone urgently needed to learn how to code. As we can see, that extinction never happened 🙃

State of Affairs at the Starting Point

Just like in classic school-level physics problems (ah… what great times those were, when the grass seemed greener), we have our initial conditions:

  • Several product teams (3-4 developers, a designer, and a product owner per team)
  • A separate DevOps team
  • A separate QA team (a few General QAs, mostly manual), which had been working without a team lead for quite some time
  • Development based on the Trunk-Based Development approach with Feature Flags (FF)
  • A standard development pipeline: “task definition → code written by a developer → task tested by a QA engineer → release to production”

Yes, at that time, the QA team was effectively operating as a “service” team (albeit unofficially), and product teams did not have dedicated QA engineers. And all of this was in a product company, not in outsourcing. One relatively small QA team serving multiple product teams automatically meant a high workload and a constant flow of tasks.

And it’s probably already obvious where the bottleneck usually appeared. A regular QA engineer simply didn’t have time to think about any “higher-level” Quality Assurance activities. Add to this an outdated and not very reliable UI automation framework (Python + Selenium), the lack of clear testing processes suited to Trunk-Based Development, and the sheer amount of tedious manual work required to perform a single release (believe me, there really was a lot of it).

So there were more than enough challenges for me and the team to tackle.

Shaping the 'QA Required' Concept, or “Testing on Demand”

It was obvious that something had to change, or at least that we needed to choose a clear direction. There were two options: either scale the QA team both quantitatively and qualitatively, or optimize the processes. As you can tell from the article’s title, we chose the process optimization path. This initiative was called 'QA Required' or “testing on demand.” If you want to reach a goal, visualize it.

That said, I should admit that later on we still hired one additional QA engineer, and that decision was justified: the number of developers grew, as did the number of contexts that needed to be kept in mind.

So what were we aiming for? To remove, or at least significantly reduce, the “extra” step of testing every task in the release candidate: in other words, to simplify the pipeline (Fig. 1).

Fig. 1. SDLC pipeline simplification

We also needed to:

  • Reduce the overhead of context handoff between a developer and a tester within a specific task. If a task is simple (hi, decomposition) and the developer is capable of testing/validating it on their own, it is simply much “cheaper” than involving another person. Even if the time spent by the developer and the QA is roughly the same, we still win in terms of communication. Yes, one could argue that a developer’s time is usually more expensive. But the key question is different: do we win in the long run, and do we speed up product delivery?
  • Reduce the number of bugs being “thrown over the fence.” This point is inseparably linked to the previous one. Logging bugs that you notice in the first two minutes of testing a task requires a significant amount of a QA engineer’s time (simply clicking through Jira bug report forms), time that could be directed toward more constructive work. So why shouldn’t the developer spend those same two minutes testing the task and avoid the bureaucracy altogether?
  • Add more responsibility for one’s own work (read as “raise a development culture”). When both a developer and a QA engineer work on the same task, the boundaries of responsibility become blurred: it’s harder to understand who is ultimately accountable, and the level of uncertainty increases.
  • One might object that similar logic exists in product–developer collaboration as well. Responsibility between them is also not perfectly clear, but that doesn’t mean the product manager should write code themselves. What matters here is not to fall into extremes, which is exactly what I ask of the reader. I believe it is necessary to separate task definition from task execution.

In short, our goal sounded like this: “Achieve higher speed and efficiency without sacrificing quality.”

Yes, this goal can be achieved in other dimensions as well. For example, by introducing various code quality control tools in CI/CD, improving the Code Review process, or integrating AI agents. But I want to emphasize that this article describes work specifically in the Quality Assurance domain. One does not preclude the other.

Preparation for Implementing the Concept

I think it’s obvious that you can’t just walk in and declare that, with a wave of a magic wand, developers will start testing tasks instead of QA from today on. So, we first had to define the key elements and stages needed to launch the 'QA Required' concept.

We identified the following items that had to be implemented or refined:

  • A proper testing pyramid
  • Reducing bureaucracy or automating the release process
  • Adapting processes and tools to a concept that was new to us

Next, I’ll go through the key details of our preparation.

Implementing a Proper Testing Pyramid

At the starting point, what we had was more of a testing trapezoid than a pyramid. Unit and integration tests were written by developers, and things were in fairly good shape. On my side, I added Code Coverage metrics collection with visualization in Grafana and also exposed test coverage results directly in Merge Requests.

The top of the pyramid, which is usually the responsibility of QA engineers, was not in a critical state, but there was still a lot of work to be done.

So what exactly was wrong with the top of the testing pyramid? First of all, it simply wasn’t ready for rapid scaling and was barely keeping up with the pace we had at the time. If we talk about specifics, there were two main problems:

  1. Unreliable E2E (UI) automated tests
  2. Uncertainty and chaos in testing new functionality

New E2E (UI) Automated Tests

Here are a few issues we had with the old framework and the existing E2E test suite:

  • Long execution time for a single run (10-15 minutes per run for ~60 tests)
  • No built-in support for horizontal scaling in the framework itself
  • A lot of hardcoding, high coupling, and minimal use of dynamic data generation
  • A large number of flaky tests that were “fixed” by retrying everything
  • An incomplete and inconvenient reporting and logging system
  • Very basic CI/CD integration
  • The framework was developed in parts by many different people
  • A lot of dead code
  • Use of Selenium WebDriver instead of Playwright (forgive me, old-school fans)

There were more minor issues as well, but I think the overall picture is already clear. As a QA Team Lead, I had the authority to decide what to do next: go for a deep refactor or rewrite everything from scratch. Surprise, surprise: I chose the second option.

Two factors influenced this decision:

  • My primary programming language is C#. I’ve been through fire and water with it, so I was ready to quickly and efficiently build a solid base architecture in C#. The cost of moving away from Python was minimal 🙈
  • I didn’t want to work with Selenium WebDriver anymore, because I had already tried Playwright

So I sketched out a high-level architecture (Fig. 2), wrote a PoC, defended the solution in front of the CTO (thanks here for the trust), and within three months we had fully migrated to a new framework with rewritten tests.

Fig. 2. High-level architecture of the new testing framework
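To give a feel for the new stack before listing the wins, here is a minimal sketch of what a test might look like with Playwright for .NET. The URL, selectors, and the test itself are illustrative, not our actual code:

```csharp
using System.Threading.Tasks;
using Microsoft.Playwright;
using Microsoft.Playwright.NUnit;
using NUnit.Framework;

// PageTest (from Microsoft.Playwright.NUnit) creates an isolated browser
// context and a fresh Page for every test, which helps a lot against flakiness.
public class SmokeTests : PageTest
{
    [Test]
    public async Task UserCanOpenTheirProfile()
    {
        await Page.GotoAsync("https://app.example.com"); // placeholder URL
        await Page.GetByRole(AriaRole.Button, new() { Name = "Sign in" }).ClickAsync();
        await Expect(Page.GetByTestId("profile-menu")).ToBeVisibleAsync();
    }
}
```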

As you might have guessed, all the drawbacks of the old framework were turned into advantages in the new one. The main achievements include:

  • Test execution time reduced to ~3 minutes 30 seconds for 80 tests (Fig. 3). The tests themselves run for about 2 minutes 30 seconds. The remaining time is overhead, which we tried to minimize.
  • Added horizontal scaling and significantly improved stability (Fig. 4; a configuration sketch follows the figures below).
  • Created the best reporting system we could :) (Fig. 5—a test was deliberately failed to showcase the reporting).

Fig. 3. Testing time reduction
Fig. 4. Horizontal scaling of tests
Fig. 5. Test results reporting system
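The article doesn’t tie the framework to a specific runner, so treat this as an assumption: with NUnit, in-process horizontal scaling comes down to two assembly-level attributes, one to run fixtures in parallel and one to cap the worker count:

```csharp
using NUnit.Framework;

// Run test fixtures in parallel...
[assembly: Parallelizable(ParallelScope.Fixtures)]
// ...with at most 8 workers, i.e. up to 8 browser sessions at once.
[assembly: LevelOfParallelism(8)]
```

Scaling beyond a single machine is then a matter of sharding the suite across CI jobs.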

A big advantage of the new reporting solution is that all artifacts are hosted in AWS S3 storage, and we can access them directly via URL. The Playwright Trace Viewer, which we host on our own infrastructure, automatically loads the ZIP archive, parses it, and lets us work with it directly in the browser. No need to download anything locally or share files in work chats.
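A rough sketch of that flow, with made-up bucket and host names (the Trace Viewer accepts a remote trace through its trace query parameter):

```csharp
using System;
using Amazon.S3;
using Amazon.S3.Transfer;

// Upload the Playwright trace of a failed run to S3 (AWSSDK.S3 package).
var transfer = new TransferUtility(new AmazonS3Client());
await transfer.UploadAsync("artifacts/trace.zip", "qa-artifacts", "runs/1234/trace.zip");

// Build a link that opens the archive straight in a hosted Trace Viewer.
var traceUrl = "https://qa-artifacts.example.com/runs/1234/trace.zip";
var viewerLink = $"https://trace.internal.example.com/?trace={Uri.EscapeDataString(traceUrl)}";
Console.WriteLine(viewerLink); // ready to paste into a report or Slack
```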

I hope I’ve convinced the reader that we accomplished the first part of our plan properly. By the way, I didn’t use BDD (SpecFlow or Cucumber) because I’m not a fan of having an extra textual abstraction layer. Now, let’s move on to the next stage.

Creating “Clarity”

How can you push for accountability if there’s still a lot of work left in your own team? The biggest issue for the QA team was a lack of clarity. Everyone was responsible for everything and, at the same time, for nothing in particular. Being pulled out of context (or the so-called workflow) was routine.

Responsible QA. The first thing we did was introduce the concept of Responsible QA. It covered everything: from managing releases to handling requests from the support team or from teammates asking for help. The same approach was applied to testing new functionality (i.e., large change blocks structured as Epics in Jira). This gave us a clear answer to the question: “Who is responsible for what?” We no longer wasted resources on unnecessary communication, didn’t wait on each other, and stayed in context. Of course, we also started sharing this approach with others.

Traceability Matrix. Next step. Initially, it was quite hard to answer the question, “What and how are we testing within an Epic?” There was no clear strategy, tasks were often tested without considering context, and work was duplicated at the final stage of new feature development. We would take the technical specification and check on the fly what had been done, what was missing, or what was implemented incorrectly.

As a lead, resolving this was critical for me. We addressed it by implementing a tool called the Traceability Matrix (Fig. 6).

Fig. 6. Traceability Matrix example

Before my reader reaches for the keyboard, I agree with you—this is more like a good old checklist on steroids. But I was genuinely inspired by classic Traceability Matrices when I created this Google Sheets template. That’s why we kept the name. We create such an artifact for each Epic at the early stages of developing new functionality. Our sources are:

  • The technical specification
  • The design document (technical solution from the development team)
  • UI/UX from the design team

It’s also worth explaining why I chose Google Sheets as the tool for creating and maintaining these matrices, instead of a “monster” like TestRail or our existing Allure TestOps. And it’s not about money. Here are my arguments:

  • Extremely low barrier to entry; practically everyone can use it. For example, try inviting a developer or a product manager to log in to TestRail. How long will they stay there, and how often will they actually check it?
  • Very powerful functionality, always on the cutting edge.
  • Easy to administer: just a corporate email, and you’re good to go.
  • Uncluttered interface, everything is right in front of you.

We’re not launching rockets here, nor running a legacy e-commerce with tons of corner cases and scenarios that would require meticulous documentation and hundreds of test cases. The pace of our development would have killed all our test cases in a matter of weeks, and the cost of keeping them up to date would be too high. That said, we still maintain test documentation in Confluence, describing complex functionality for our own reference in the future.

And now we’ve reached a stage where we’ve put things in order “at home” (at least, I really want to believe that). We turned the trapezoid into a proper testing pyramid. I also invested a lot into the technical growth of my QA team, but that’s a whole other story, so let’s not get distracted and move forward.

Reducing Bureaucracy or Automating the Release Process

Honestly, I lied a bit here. I wasn’t able to reduce bureaucracy. In fact, my changes probably even made it a little worse. Sure, you could just create tickets, push them to production, and not worry about anything else. But when questions about metrics, analysis, post-mortems, reporting, and so on come up, all this bureaucracy suddenly feels very necessary. It kind of sets the “rules of the game,” keeps things in check, and maintains discipline. To clarify, I’m talking about the part of bureaucracy that arises immediately after a developer finishes a task, merges it into the master/main branch, and the code reaches the test environment. From there, all the delivery work was handled by the QA team.

Now, for you to have a complete picture, let’s look at the main actions that Responsible QA performed for each release:

  • Manually creating the release in Jira
  • Manually adding Fix Versions for each ticket
  • Manually updating ticket statuses on the Jira board
  • Manually running E2E tests and sharing the results
  • Manually testing all tasks before deployment to production
  • Manually notifying about release stages in Slack, in the so-called Release Candidate Message
  • Manually updating production
  • Manually posting release notes (changelog) to a dedicated Slack channel
  • Manually updating ticket statuses in Jira and closing the release

One might think, “Well, okay, it’s not that bad.” If so, I suggest taking a look at the release dynamics (Fig. 7).

Fig. 7. Development dynamics

I wasn’t flagging all that “manual” work in the list above for nothing. Now try to weigh the QA team’s workload and the amount of manual work against these dynamics: three to six releases per day, 10-50 tasks. Not so easy and fun anymore. And I won’t even start on the number of errors and delays a person can make under such a load.

Yes, we still couldn’t completely eliminate manual work, and probably never will. But we did manage to significantly (actually significantly!) reduce its volume.

So, what did we do? We automated all the routine steps in the release process, from automatically creating releases in Jira to posting release notes in the Slack channel.

The QA team acted as the “client” here and created the technical requirements—that is, the specifications for the automated release process. Of course, the requirements went through all approval stages before being implemented. As of now, approximately 80% of the initial scope has been implemented. The crown jewel of this effort is the Release Candidate Message in Slack for each release (Fig. 8).

Fig. 8. Release Candidate Message on Slack

This way, we track and automatically record every step of the release—no more manually moving tickets across statuses on the Jira board. The planned changelog is immediately posted in the release message thread, along with a series of automated updates as the process unfolds. Behind the scenes (in Jira, in GitLab), everything is also automated and synchronized with what we see in Slack.
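Under the hood this kind of notification is usually a small service call; here is a deliberately simplified sketch of posting a Release Candidate Message through a Slack incoming webhook (the webhook URL and message contents are placeholders):

```csharp
using System.Net.Http;
using System.Net.Http.Json;

var http = new HttpClient();
var message = new
{
    text = "Release Candidate frozen", // fallback text for notifications
    blocks = new object[]
    {
        new
        {
            type = "section",
            text = new
            {
                type = "mrkdwn",
                text = "*RC frozen* • 12 tickets • E2E: passed ✅"
            }
        }
    }
};
// Incoming webhooks accept a plain JSON payload with text/blocks.
await http.PostAsJsonAsync("https://hooks.slack.com/services/T000/B000/XXXXXXXX", message);
```

Note that threaded follow-ups would need the Slack Web API (chat.postMessage with thread_ts) rather than a bare webhook; the sketch above covers only the top-level message.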

To be honest, at the time we launched the 'QA Required' concept, the release process was not yet automated. The technical requirements had been drafted, and the work had just begun. However, after a short while, we received the first and most important portions of automation. It was an investment in the future.

Adapting Processes and Tools

The final part of our preparation for implementing the 'QA Required' concept focused on processes and tools.

We use a fairly standard project management tool for the IT industry—Jira. So it’s no secret that 70-80% of our processes revolve around it, and there’s no need to go into detail about other tools. Hence, Jira was the tool we needed to adapt first.

We formed a working group (actually, several at different times), whose task was to align all the details about who tests what and when. I won’t bore you with all the rounds of discussion, so here are the final agreements.

In Jira tickets, we added two new fields:

QA Required:

  • 🟡 QA: yes
  • 🟤 QA: no

Testing Status:

  • ⬜️ not tested
  • 🟪 in progress
  • 🟩 passed
  • 🟦 safe
  • 🟥 failed

You may have seen this set of emojis earlier in the Release Candidate Message thread. Why emojis? Because they make it easy and quick to navigate the process (Fig. 9). The vertical color palette on the cards reflects the Testing Status.

Fig. 9. Adapting Jira to 'QA Required'

Each ticket had a default state: 🟤 QA: no + ⬜️ not tested. That is, by default, every individual task as a unit of change (this is a very important clarification!) is tested by the developer. However, the developer has the right, for a specific task, to “require” testing by a QA engineer (hence the name 'QA Required').
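In practice, defaults like this are the kind of thing Jira automation rules or a small script can enforce. A hedged sketch via the Jira REST API, with hypothetical custom field IDs (every Jira instance assigns its own) and simplified auth:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;

var http = new HttpClient();
// Jira expects Basic auth with base64("email:api_token"), prepared elsewhere here.
http.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Basic", Environment.GetEnvironmentVariable("JIRA_AUTH"));

var update = new
{
    fields = new Dictionary<string, object>
    {
        ["customfield_10101"] = new { value = "QA: no" },     // QA Required (hypothetical ID)
        ["customfield_10102"] = new { value = "not tested" }  // Testing Status (hypothetical ID)
    }
};
await http.PutAsJsonAsync("https://jira.example.com/rest/api/2/issue/PROJ-123", update);
```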

Above, I have already described our outlook on what falls under “🟡 QA: yes” and what does not. Thus, at the stage of forming the Release Candidate (status Frozen for us), we already know which tickets have been tested or validated by developers, and which require QA testing.

It’s also worth saying a few words about the two values of Testing Status:

🟩 passed—the final value, indicating “the task is fully tested and requires no further attention”;

🟦 safe—also a final value, indicating “the task is safe to deploy to production” (here it is important to remember the work with Feature Flags). In practice, such a task usually has “deferred” testing, but not as a separate unit of change, rather as part of a full new feature. For example, an early task from an Epic that the Responsible QA will verify within the Traceability Matrix.

Technically, 🟩 passed and 🟦 safe carry the same weight, but for us humans, it’s easier to work with them this way.
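To make the 🟦 safe case concrete: while a Feature Flag is off, the new code path is unreachable in production, so the ticket can ship without immediate testing. A minimal illustration with a made-up flag name and wiring:

```csharp
public interface IFeatureFlags
{
    bool IsEnabled(string flag);
}

public class LobbyPage
{
    private readonly IFeatureFlags _flags;
    public LobbyPage(IFeatureFlags flags) => _flags = flags;

    public string Render() =>
        _flags.IsEnabled("new-tournament-lobby")
            ? "new lobby"      // shipped but dormant; verified later with the whole Epic
            : "legacy lobby";  // what production users see in the meantime
}
```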

For the sake of contrast, how was it before? Practically all tasks went through the hands of a QA engineer, whether it was necessary or not. It’s also worth noting that we had a label called qa_skipped. The essence is in the name itself: a task with this status could safely skip QA testing. You might notice that this is essentially the same idea we implemented, but in the opposite way—and you would be right.

So why didn’t it work? Simply put: there were no clear agreements, the label had to be added manually, and it was often neglected. Moreover, at that time, the QA team itself couldn’t provide better “safety guarantees.” And so everything got stuck.

All our agreements were documented in Confluence after several rounds of discussions, and we were ready for implementation.

Launching 'QA Required' and Results

Each member of the development team received a calendar invite for a general kickoff meeting introducing the new 'QA Required' concept. Everything was ready on my side:

  • The plan was approved by representatives of the management team
  • The audience was “warmed up” via the back-end and front-end team leads
  • The QA team was briefed during the prior discussions
  • Documentation was written with detailed explanations of all aspects
  • All steps I described earlier were completed, except for the full automation of the release process

However, you can’t prepare for everything 🥲. A few clarifying questions came from the “audience”—this was the denial stage. After the presentation, we had several discussions in personal chats (which was great, as it showed people’s engagement!), and a positive decision was reached. Everything moved toward the inevitable—acceptance.

The biggest fear or risk people highlighted was roughly: “I’m used to being backed up, someone else tests for me, there’s another point of view…” The most important message to developers was: “You are not losing a QA engineer. You have the right to request their help, but do so consciously and effectively. Then you yourself will be more effective, your team will be more effective, and your company will be more effective.”

We didn’t impose the process from the top down in a totalitarian way. We listened to people and made targeted adjustments here and there. In fact, the concept itself was built on team feedback before implementation (I had literally run around to teammates beforehand to gather their feedback and wishes). Development team leads represented their respective areas.

And so, we started. The launch went much more smoothly than I expected: a few small tweaks, chat discussions, and work was in full swing.

First results looked like this:

First month:

  • 🟡 QA: yes — 26%
  • 🟤 QA: no — 74%

Second month:

  • 🟡 QA: yes — 35%
  • 🟤 QA: no — 65%

Overall, we consider this to be a good start. We didn’t set a target ratio for these metrics. Instead, we decided to evaluate results retrospectively.

Initially, statistics were collected manually by analyzing Jira requests. Later, the QA team developed a service for collecting Quality Metrics, with Jira as one of the data sources (one of the charts was already shown above in Fig. 7). Now we can evaluate how our concept works over time and respond to events promptly (Fig. 10). The service itself was fully launched at the beginning of September.

Fig. 10. 'QA Required' Dynamics
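For the curious, collecting such statistics from Jira is straightforward; a hedged sketch with placeholder project and host names (maxResults=0 makes the search return only the total count):

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;

var http = new HttpClient(); // auth omitted for brevity
var jql = Uri.EscapeDataString(
    "project = PROJ AND \"QA Required\" = \"QA: yes\" AND resolved >= -30d");
var result = await http.GetFromJsonAsync<JsonElement>(
    $"https://jira.example.com/rest/api/2/search?jql={jql}&maxResults=0");
Console.WriteLine(result.GetProperty("total").GetInt32()); // tickets in the last 30 days
```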

Overall, we’ve been living with this concept for over 8 months now. Over this period, it has proven itself and shown real effectiveness. We reached 34% “🟡 QA: yes” during this time.

Okay, you might ask:

“How has the release pace changed?”—it increased from 40-50 to 50-70 releases per month

“How has development throughput changed?”—it rose from 200-250 to 250-350 tasks per month

“What about bugs, did quality drop?”—no, we didn’t lose quality. We have our own quality coefficient (number of high-priority bugs in production / number of released tasks * 100%). Compare (a worked sketch of the calculation follows the list below):

  • Six months before implementation: min 2.86, max 4.78
  • Six months after implementation: min 1.15, max 4.47
  • Last two months: 1.85 (Fig. 11). Reminder: live metrics have been collected only over the last two months, so the metric started from zero.
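The coefficient itself is trivial to compute; the numbers here are illustrative only:

```csharp
using System;

// High-priority production bugs per released task, as a percentage.
static double QualityCoefficient(int highPriorityProdBugs, int releasedTasks) =>
    (double)highPriorityProdBugs / releasedTasks * 100;

Console.WriteLine(QualityCoefficient(3, 260).ToString("F2")); // 1.15
```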

“Did you survey developers anonymously to assess the new concept?”—yes, we conducted an anonymous survey a few months after launch. We asked two questions:

  • Your evaluation of the effectiveness of QA Required?—8.7/10
  • Your satisfaction with QA Required?—8.0/10

“Are you sure your metrics improved solely because of ‘QA Required’?”—of course not. Not everything is due to the new concept alone. We operate as an open system with many input parameters, each influencing the final result in one way or another. For example, I’ll be honest: a major factor in the increase in development throughput (releases + task count) was the automated release process and the correct testing pyramid. Nevertheless, I consider them an integral part of our concept.

Fig. 11. Development quality dynamics

These are our results. If I haven’t anticipated any of your questions, feel free to ask them in the comments. It’s time to wrap up the article and move on to the conclusions.

Conclusions & Afterword

We didn’t start a revolution, nor did we “hack” the world. Perhaps we even invented our own wheel. But we built it carefully—for ourselves, not for sale. That makes it enjoyable and safe to roll.

Yes, in the end, we test everything, but no longer multiple times like before. Everyone tests within their own zone of responsibility, at their own level. We eliminated everything unnecessary. That’s what allowed us to achieve the sense of “lightness” we now enjoy.

The article would be criminally incomplete if I didn’t emphasize that I am not a lone D’Artagnan who did everything myself. These were collective discussions, our shared work and achievements. A huge thank you to Dmytro O., Serhiy, Ihor, Yulian, Pavlo, Roma, Kostyantyn, Mykyta, Dmytro K., Tymofiy, and everyone else who contributed. And “Merci beaucoup” 🫰 to my Anastasiia!

