<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by AutoExplore on Medium]]></title>
        <description><![CDATA[Stories by AutoExplore on Medium]]></description>
        <link>https://medium.com/@autoexplore?source=rss-24cd8d14e27b------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*o_UsvrRI1zAco4cb4fp2CA.png</url>
            <title>Stories by AutoExplore on Medium</title>
            <link>https://medium.com/@autoexplore?source=rss-24cd8d14e27b------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 11 Apr 2026 02:08:42 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@autoexplore/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Fastest open source Git GUI: Why We Built GitComet]]></title>
            <link>https://autoexplore.medium.com/fastest-open-source-git-gui-why-we-built-gitcomet-4b2eddf5a670?source=rss-24cd8d14e27b------2</link>
            <guid isPermaLink="false">https://medium.com/p/4b2eddf5a670</guid>
            <category><![CDATA[rust]]></category>
            <category><![CDATA[git]]></category>
            <category><![CDATA[gpui]]></category>
            <category><![CDATA[open-source]]></category>
            <category><![CDATA[desktop-app]]></category>
            <dc:creator><![CDATA[AutoExplore]]></dc:creator>
            <pubDate>Thu, 19 Mar 2026 11:05:24 GMT</pubDate>
            <atom:updated>2026-03-19T11:05:24.238Z</atom:updated>
<content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*SeJs9mCqOnnROjyy6yWlEg.gif" /><figcaption>GitComet in action: browsing the Linux repository and opening the Chromium repo</figcaption></figure><h4>How frustration with huge repositories, deep histories, and hundred-line diffs turned into a new Git GUI.</h4><p>To be fair, there is no shortage of Git GUIs to choose from. On small repositories, many of them work great. But once we pointed them at massive codebases and large diffs, the experience changed fast, and not in a good way.</p><p>Some tools froze. Some consumed gigabytes of memory. Some turned simple code review into a system stress test.</p><p>We were looking for something fairly basic: a Git client that was free to use for both personal and commercial work, didn’t stray too far from a familiar workflow, and did not make the computer unresponsive or consume gigabytes of memory while opening history or reviewing changes.</p><p>We tried SmartGit, GitKraken, and several others. But after a few days of testing, we reached an uncomfortable conclusion: the tool we wanted did not really exist.</p><p>The problem was not that every existing product was bad. Some were polished. Some were powerful. But for our taste, they either pushed workflows that felt strange, or they stopped being responsive once the repository or diff got large enough.</p><h4>Could we build the tool?</h4><p>Around the same time, we had been trying out Zed. It stood out immediately because it felt fast, responsive, and visually clean. Zed runs on its own UI framework, GPUI, and we decided to see whether GPUI could serve a Git GUI built around the same principles.</p><p>It was a good decision, but it also taught us something quickly: a fast UI framework alone does not make a fast Git client. 
You can build the most responsive shell, but that alone does not really matter if the Git layer underneath it does not scale.</p><p>That is where <a href="https://github.com/GitoxideLabs/gitoxide">gix/gitoxide</a> comes in. We tried to use it wherever possible because it had already optimized many of the Git operations that matter in real use: cloning, fetching, status checks, and reading and writing Git data. When a required feature was not yet supported, we fell back to the local Git installation.<br>This hybrid approach allowed for performance without giving up compatibility.</p><h4>Designing for scale</h4><p><strong>The first problem was large Git histories.</strong></p><p>Rendering huge commit lists naively was painfully slow. The solution will sound familiar to web developers: virtualization. Instead of rendering the entire history, GitComet renders only the visible area and reuses UI elements as you scroll. As a result, it looks like one long list, but the application only does work for what is actually on screen.<br>This solved part of the problem.</p><p><strong>The next big hurdle was large file diffs.</strong><br> <br>One of our stress tests was a 50 MB HTML file with roughly 500,000 lines and conflicts on nearly every line. This test was designed to expose every weak assumption in our diff viewer. Diffing is expensive. Syntax highlighting is expensive. Rendering is expensive. And doing all three naively is a good way to crash.</p><p><strong>In early iterations, cases like that crashed the application with an out-of-memory exception.</strong></p><p>The turning point came when we noticed that Zed could open very large files without completely falling apart. We looked into how it did that, and streaming kept coming up as the key idea. So we pushed that idea through the whole pipeline. Not just file reading. Streaming the diff. Streaming syntax highlighting. Streaming content. Streaming rendering. The important lesson was that partial streaming is not enough. 
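As a toy, std-only illustration of how virtualization and lazy stages combine (`visible_window` and its placeholder formatting are our own sketch, not GitComet’s actual code): only the visible slice of a huge input is ever pulled through the expensive stages.

```rust
// Toy sketch (std-only): every stage is a lazy iterator, and only the
// visible window is ever pulled through the expensive stages.
fn visible_window<I: Iterator<Item = String>>(lines: I, offset: usize, rows: usize) -> Vec<String> {
    lines
        .skip(offset) // lazily discard everything above the viewport
        .take(rows)   // stop pulling once the viewport is full
        .map(|l| format!("+ {l}")) // placeholder for real diffing + highlighting
        .collect()
}

fn main() {
    // Stand-in for a 500,000-line file: lines are generated on demand.
    let lines = (0..500_000).map(|i| format!("line {i}"));
    let visible = visible_window(lines, 499_995, 5);
    assert_eq!(visible.len(), 5);
    assert_eq!(visible[0], "+ line 499995");
}
```

The point of the sketch is only that no stage ever asks for the whole input at once; the real pipeline applies the same rule to diffing, highlighting, and rendering.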
If one stage still assumes it can fully materialize everything in memory, the bottleneck just moves somewhere else. To make huge diffs workable, the entire pipeline has to respect streaming from start to finish.<br>After a few iterations, it started to work.</p><h4>Why GitComet Exists</h4><p>We did not build it to be a Git GUI that looks good in a screenshot but struggles when the repository is real. We built it for the moment when the history is deep, the diff is ugly, the files are large, and you still need a reliable tool.</p><p><a href="https://gitcomet.dev/">GitComet</a> is our attempt to make Git tooling fast, familiar, local-first, open source, and free to use for both individuals and organizations.<br>We are now bringing GitComet to the public in its early form to gather feedback and ideas.<br> <br>We know there is still a lot to improve, and we are very much at the beginning. But the core idea is clear: speed is a feature, and Git tooling should never freeze.<br> <br>If this problem sounds familiar, we hope you will try GitComet and tell us where it helps, where it breaks, and what we should build next.</p><p>Website: <a href="https://gitcomet.dev/">https://gitcomet.dev/</a><br>Repository: <a href="https://github.com/Auto-Explore/GitComet">https://github.com/Auto-Explore/GitComet</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=4b2eddf5a670" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Hidden Performance Killers in Axum, Tokio, Diesel, WebRTC, and Reqwest]]></title>
            <link>https://autoexplore.medium.com/hidden-performance-killers-in-axum-tokio-diesel-webrtc-and-reqwest-8b9660ad578d?source=rss-24cd8d14e27b------2</link>
            <guid isPermaLink="false">https://medium.com/p/8b9660ad578d</guid>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[web-development]]></category>
            <category><![CDATA[rust]]></category>
            <dc:creator><![CDATA[AutoExplore]]></dc:creator>
            <pubDate>Wed, 08 Oct 2025 21:01:52 GMT</pubDate>
            <atom:updated>2025-10-09T17:06:50.146Z</atom:updated>
<content:encoded><![CDATA[<p><strong>I want to emphasize that all the technologies used in this article are great; the performance issues were caused by my own code and how I integrated them together.</strong></p><p>This month, I have been hunting mysterious performance issues in the AutoExplore stream functionality, and what I found surprised me.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/487/1*FQg_Itj6-FAijgzM3ZfPxg.png" /></figure><blockquote>I’m writing this article to share my findings with the community to make software better for everyone.</blockquote><p>It all started innocently enough…</p><h3>The Setup</h3><p>I had built a custom screencasting pipeline:<br><strong>Chromium WebRTC (video producer) → Rust WebRTC fan-out server → WebRTC browser (viewer)</strong>.</p><p>It worked beautifully.<br>Compared to Chromium’s default image-based streaming, latency dropped from seconds to milliseconds.<br>Perfect, right?</p><p>Well… almost.</p><h3>The First Symptom: Random Black Screens</h3><p>Everything looked smooth until the WebRTC stream started disconnecting randomly.<br>To users, this appeared as an occasional black screen.</p><p>Digging into the issue, I discovered the WebRTC protocol has a “NACK” / PLI feedback mechanism. The idea is simple:<br>When a client loses a keyframe, it sends a Picture Loss Indication (PLI) request to the producer, asking for a new one.</p><p>I added PLI support end-to-end (viewer → server → producer), and it helped.<br>The black screen now recovered automatically.</p><p>But I wasn’t satisfied. I wanted to eliminate the black screens entirely. 
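A hypothetical sketch of the fan-out step (std-only Rust; `PliForwarder` and its throttling window are our own illustration, not the webrtc-rs API): when many viewers lose the same keyframe at once, coalescing their PLIs is a common refinement so the producer is not flooded with keyframe requests.

```rust
use std::time::{Duration, Instant};

// Toy model of PLI forwarding in a fan-out server (hypothetical, std-only).
// Many viewers may request a keyframe at once; forwarding every PLI would
// flood the producer, so upstream requests are rate-limited.
struct PliForwarder {
    last_forwarded: Option<Instant>,
    min_gap: Duration,
}

impl PliForwarder {
    fn new(min_gap: Duration) -> Self {
        Self { last_forwarded: None, min_gap }
    }

    /// Called whenever any viewer sends a PLI. Returns true if the server
    /// should forward a PLI upstream to the producer.
    fn on_viewer_pli(&mut self, now: Instant) -> bool {
        match self.last_forwarded {
            Some(t) if now.duration_since(t) < self.min_gap => false, // coalesce the burst
            _ => {
                self.last_forwarded = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut fwd = PliForwarder::new(Duration::from_millis(500));
    let t0 = Instant::now();
    assert!(fwd.on_viewer_pli(t0));                               // first PLI goes upstream
    assert!(!fwd.on_viewer_pli(t0 + Duration::from_millis(100))); // burst coalesced
    assert!(fwd.on_viewer_pli(t0 + Duration::from_millis(700)));  // later loss forwarded
}
```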
It’s not a good user experience if there are black screens in the middle of a stream.</p><h3>Scaling Up… and Breaking Again</h3><p>As I increased the number of viewers, the problem returned, worse than before.<br>This time, even PLIs couldn’t recover the stream.</p><p>The Chromium producer logs revealed this line:</p><blockquote>Timeout: No RTCP RR received.</blockquote><p>That led me to Chromium’s source <a href="https://source.chromium.org/chromium/chromium/src/+/refs/tags/142.0.7393.8:third_party/webrtc/modules/rtp_rtcp/source/rtcp_receiver.cc;l=294-300">https://source.chromium.org/chromium/chromium/src/+/refs/tags/142.0.7393.8:third_party/webrtc/modules/rtp_rtcp/source/rtcp_receiver.cc;l=294-300</a>, where I learned that RTCP Receiver Reports (RR) are crucial feedback packets that inform the sender what data has been received.</p><p>At first, I blamed my custom Chromium producer; maybe it wasn’t reading these packets properly.<br>But after some tests, I realized the real issue was on the Rust server side.</p><h3>The Culprit: `tokio::time::interval`</h3><p>The Rust WebRTC receiver was missing ticks and sending Receiver Reports (RR) too late, in bursts.<br>That behavior came from <strong>tokio::time::interval</strong>, which defaults to <strong>MissedTickBehavior::Burst</strong>.</p><p>After switching to <strong>Skip</strong>, the bursts disappeared.<br>I sent a PR to webrtc-rs to patch it. 🎉 <a href="https://github.com/webrtc-rs/webrtc/pull/745">https://github.com/webrtc-rs/webrtc/pull/745</a></p><p>Progress! 
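With tokio, the switch itself is one line: `interval.set_missed_tick_behavior(MissedTickBehavior::Skip)`. To see why the default matters, here is a toy, std-only model of the two behaviors (`ticks_fired_on_wakeup` is our own simplification, not tokio’s implementation):

```rust
// Toy model of tokio's MissedTickBehavior (std-only, not the real thing).
// Given a tick period and how long a stalled worker delayed us, return how
// many ticks fire immediately when the task finally wakes up.
fn ticks_fired_on_wakeup(period_ms: u64, stall_ms: u64, burst: bool) -> u64 {
    let missed = stall_ms / period_ms; // deadlines that passed during the stall
    if burst {
        // Burst (the default): fire every missed tick back-to-back, which is
        // exactly how RRs ended up being sent late and in bursts.
        missed + 1
    } else {
        // Skip: drop the missed deadlines, fire once, and resume on the next
        // aligned deadline.
        1
    }
}

fn main() {
    // A 1000 ms RR interval stalled for 5 s by a blocked worker thread:
    assert_eq!(ticks_fired_on_wakeup(1000, 5000, true), 6);  // burst of 6 RRs at once
    assert_eq!(ticks_fired_on_wakeup(1000, 5000, false), 1); // one RR, back on schedule
}
```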
But still, under load, black screens persisted.</p><h3>Deep Dive: Network Layer &amp; Cryptography</h3><p>I started by optimizing the WebRTC layer:</p><p>- Switched to <strong>1-to-1 UDP4</strong> connections where possible<br>- Avoided unnecessary <strong>STUN/ICE</strong> servers<br>- Predefined codec negotiations</p><p>Still, no luck.</p><p>So I turned to <strong>profiling</strong> using <strong>perf + hotspot</strong> and <strong>valgrind + kcachegrind</strong>.<br>A huge chunk of time was being spent on HTTP/2 handshakes.</p><p>Well… it was sort of expected.</p><p>Even though this traffic was all internal (Azure VNET), it was encrypted due to zero-trust policies.<br>When I analyzed the traces with GPT-5, it suggested:<br>👉 Try Elliptic Curve (EC) certificates instead of RSA.</p><p>I replaced RSA-2048 with EC P-256, and boom: 50% faster handshakes. Thanks, GPT-5. 😄 (Elliptic Curve certificates are faster than RSA certificates but are not supported by some older clients.)</p><p>But sadly, the black screens persisted under load.</p><h3>Mystery of the Slow Reqwest Client</h3><p>Profiling again showed the <strong>reqwest</strong> client eating CPU cycles, even though it wasn’t part of the streaming logic!</p><p>It turned out I was creating a new client in a few places to handle custom certificate authentication and trust.<br>After reading the Reqwest source code, I realized that Client is just an `<strong>Arc&lt;ClientRef&gt;</strong>` internally: cheap to clone, expensive to create.</p><p>I refactored to use a single shared client, cloned wherever needed.</p><p>Result: CPU spikes were gone, handshakes were halved, and the app felt faster.<br>But… black screens? Still there.</p><h3>The Plot Twist: It Wasn’t WebRTC at All</h3><p>At this point, I suspected Azure networking. Maybe packet loss? Maybe VM throttling?<br>To isolate it, I ran everything offline locally, and it worked flawlessly…<br>Zero delay. Zero black screens. 
Even with many viewers.</p><p>Then, one late night, out of pure frustration, I started hammering F5 to refresh the browser UI while the stream was playing perfectly.<br>And suddenly: <strong>BOOM!</strong> Black screens appeared locally.</p><p>That’s when I caught it in the profiler.</p><p>Everything looked familiar: CPU spikes, TLS handshakes, encryption…<br>But one new name stood out:<br><strong>diesel::pg::connection::result::PgResult::get</strong></p><p>Wait, the database?</p><h3>The True Villain Revealed</h3><p>The streaming logic barely touched the database after startup.<br>But I realized that whenever the DB was busy auto-vacuuming, analyzing, or under load, everything else slowed down.</p><p>AutoExplore is a <strong>write-heavy</strong> system; AI agents are constantly generating data.<br>And <strong>Diesel’s blocking database queries</strong> were freezing the <strong>Tokio worker threads</strong>.</p><p>That single mistake caused a domino effect:</p><p>1. Diesel blocked a worker thread.<br>2. The Axum server couldn’t serve new requests.<br>3. The agent’s Reqwest client tried new handshakes.<br>4. More encryption, more CPU time spent.<br>5. More database connections waiting.<br>6. Tokio’s pool jammed → missed WebRTC ticks → black screens occurred.</p><p>All from synchronous DB calls inside an async runtime.</p><h3>The Fix: diesel_async</h3><p>I switched to <a href="https://github.com/weiznich/diesel_async"><strong>diesel_async</strong></a> and refactored all DB queries and connection handling, and it changed everything.</p><p>Now, when a query stalls, only that task waits.<br>The server stays fully responsive.<br>No more black screens. No more lag. 
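The underlying rule is general: a synchronous call parked on a shared worker stalls every task queued behind it, while an offloaded call stalls only itself. A toy, std-only illustration (threads stand in for the async runtime here; this is not diesel or tokio code):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Toy illustration: one "worker" services many tasks. A blocking call run
// on the worker itself stalls everything behind it; moving the blocking
// call to its own thread keeps the worker free for the next task.
fn tick_latency(block_on_worker: bool) -> Duration {
    let start = Instant::now();
    if block_on_worker {
        // Synchronous "DB query" executed on the worker itself:
        thread::sleep(Duration::from_millis(200)); // everything behind it waits
    } else {
        // Offloaded "DB query" (stand-in for an async driver like diesel_async):
        let handle = thread::spawn(|| thread::sleep(Duration::from_millis(200)));
        drop(handle); // detached here; in real code you would await its result
    }
    start.elapsed() // time until the worker could process its next task
}

fn main() {
    assert!(tick_latency(true) >= Duration::from_millis(200));
    assert!(tick_latency(false) < Duration::from_millis(100));
}
```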
No more mystery.</p><h3>Results</h3><p>My custom WebRTC solution now streams smoothly, with millisecond latency and no hiccups, even under heavy load.</p><p>And the side effect?<br><strong>The whole server became twice as fast.</strong></p><h3>TL;DR: Lessons Learned</h3><p>✅ Use one shared Reqwest client: it’s cheap to clone, expensive to create<br>✅ Prefer EC certificates (e.g., P-256) for faster internal HTTPS/TLS<br>✅ Never block Tokio threads: use <strong>diesel_async</strong> or equivalent for async DB access</p><p>What started as a simple black screen bug turned into a deep dive across the Rust async stack:<br>from WebRTC internals to cryptography, HTTP handshakes, and database drivers.</p><p>Sometimes, fixing performance killers feels like solving a detective mystery, one trace at a time. 😄</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8b9660ad578d" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AutoExplore issue reporting]]></title>
            <link>https://autoexplore.medium.com/autoexplore-issue-reporting-9a6ec0a46215?source=rss-24cd8d14e27b------2</link>
            <guid isPermaLink="false">https://medium.com/p/9a6ec0a46215</guid>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[testing]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[autonomous]]></category>
            <category><![CDATA[qa]]></category>
            <dc:creator><![CDATA[AutoExplore]]></dc:creator>
            <pubDate>Wed, 07 May 2025 13:11:29 GMT</pubDate>
            <atom:updated>2025-05-07T13:11:29.874Z</atom:updated>
<content:encoded><![CDATA[<p>Autonomous testing without specified test cases creates a unique challenge for issue reporting: how do you present information about the issues found in a way that is easy to understand and reproduce, without overloading the user with too much information?</p><h3>Report categories</h3><p>During test agent execution, the agent tests and analyzes many different aspects of the service under test in parallel. These analyzed aspects include HTML markup, network traffic, the rendered user interface, application logs, and many more. As the agent executes without breaks or pauses, it quickly generates a lot of information about the application.</p><p>The main reports view helps the user navigate this issue list by categorizing similar types of issues into groups.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*p9fxLhhKZ-6kOlfw.png" /></figure><p>These categories are:</p><ul><li><strong>Errors</strong>: Uncaught JavaScript exceptions, failed HTTP requests, and error logs from the browser console.</li><li><strong>Audit</strong>: Browser audit results, form validation issues, client-side validation, etc.</li><li><strong>Performance</strong>: Performance-related audits, network traffic performance, page load times, and slow JavaScript code.</li><li><strong>Deprecations</strong>: Deprecated API usage, deprecated HTML elements, and deprecated CSS properties.</li><li><strong>Accessibility</strong>: Accessibility issues, missing aria attributes, missing alt attributes, low text contrast, etc.</li><li><strong>Security</strong>: Security issues, missing security headers, insecure HTTP requests, and insecure cookies.</li><li><strong>External</strong>: Issues detected outside the configured application domain.</li><li><strong>Reasoning</strong>: LLM reasoning results from the service under test.</li><li><strong>Other</strong>: Issues that do not fit into any of the above categories, for example the agent getting stuck on a specific view.</li></ul><p>Each 
category can then be opened in a separate view that shows the issues belonging to that category. The list is again grouped by issue similarity, which allows counting how many times a similar issue has occurred and when it was first and last seen. Finally, selecting a group opens a detailed view of the issue.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*diqIWRprb7AqGrRs.png" /></figure><h3>Detailed view</h3><p>The detailed view shows the lowest level of detail about the reported issue. It allows navigating through different variations of the same issue, which helps in understanding its scope. If, for example, the same error has occurred on multiple pages, it can be related to a certain widget or user flow.</p><p>Below the occurrence navigation menu, there is a timeline view that shows what the agent did during the selected occurrence and what happened in the application before and after the issue occurred. The timeline view allows navigating the agent steps using screenshots from the application, as well as understanding network race conditions by seeing when each network request started and finished. Certain issues only occur when network requests finish in an unexpected order.</p><p>Following the agent steps in the timeline view can help to understand the issue better and reproduce it; the steps can be followed easily even without technical knowledge.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*W6t8pgtGY43btWlF.png" /></figure><p>Finally, the detailed view shows the information about the issue itself. This information varies depending on the type of issue.</p><p>At AutoExplore, we are committed to helping R&amp;D teams implement autonomous testing as part of their development processes. Would you like to see what issues AutoExplore finds in your application? 
Contact us for a demo to learn more.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9a6ec0a46215" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Measuring AI Agent Coverage]]></title>
            <link>https://autoexplore.medium.com/measuring-ai-agent-coverage-d02ec36c3927?source=rss-24cd8d14e27b------2</link>
            <guid isPermaLink="false">https://medium.com/p/d02ec36c3927</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[testing]]></category>
            <category><![CDATA[coverage]]></category>
            <dc:creator><![CDATA[AutoExplore]]></dc:creator>
            <pubDate>Thu, 17 Apr 2025 05:56:36 GMT</pubDate>
            <atom:updated>2025-04-17T05:56:36.927Z</atom:updated>
<content:encoded><![CDATA[<p>Using an autonomous testing agent introduces a new challenge: understanding what it has tested and what it has not. This information is needed for analyzing the effectiveness of the agent and identifying which areas of the target software are covered.</p><h3>Traditional test coverage metrics</h3><p>Typically, test coverage is measured by analyzing the code, requirements, or test cases.</p><ul><li><strong>Code coverage</strong> is typically calculated by analyzing the code and identifying which lines were executed during test runs. Tools instrument the code and keep track of code execution. Common code coverage metrics include branch coverage, statement coverage, function coverage, and line coverage.</li><li><strong>Test case coverage</strong> refers to how well test cases cover the system features. It is usually measured by mapping test cases to specific user stories using tracking tools.</li><li><strong>Requirement coverage</strong> measures which requirements are covered by test cases. Requirements can be functional or non-functional.</li></ul><p>An autonomous software agent that tests the target software through a user interface does not know about requirements, source code, or other test cases. This means that traditional code coverage measurements are not applicable here.</p><h3>UI coverage</h3><p>Since the agent interacts with the target software through the user interface by emulating keyboard and mouse events, it can only test the areas of the software that are visible in the user interface.</p><p>We used this gathered information as a basis for measuring test coverage. All the elements detected by <strong>The Eye</strong> are presented on a screenshot from the user interface. 
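The per-element status behind the coloring described below can be modeled roughly like this (a hypothetical Rust sketch; the names are ours, not AutoExplore’s internals):

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical sketch of per-element UI coverage status; the names are
// illustrative, not AutoExplore's actual code.
#[derive(Debug, PartialEq)]
enum Coverage {
    InteractedHere,      // shown green
    InteractedElsewhere, // shown yellow
    NotInteracted,       // shown gray
}

/// `interactions` maps a page URL to the ids of elements the agent used there.
fn element_coverage(
    interactions: &HashMap<String, HashSet<String>>,
    current_page: &str,
    element: &str,
) -> Coverage {
    if interactions
        .get(current_page)
        .map_or(false, |els| els.contains(element))
    {
        Coverage::InteractedHere
    } else if interactions.values().any(|els| els.contains(element)) {
        Coverage::InteractedElsewhere
    } else {
        Coverage::NotInteracted
    }
}

fn main() {
    let mut interactions: HashMap<String, HashSet<String>> = HashMap::new();
    interactions.insert("/home".into(), ["nav-menu", "search"].map(String::from).into());
    interactions.insert("/settings".into(), ["nav-menu"].map(String::from).into());

    assert_eq!(element_coverage(&interactions, "/settings", "nav-menu"), Coverage::InteractedHere);
    assert_eq!(element_coverage(&interactions, "/settings", "search"), Coverage::InteractedElsewhere);
    assert_eq!(element_coverage(&interactions, "/settings", "save-button"), Coverage::NotInteracted);
}
```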
Due to the large number of elements, the AutoExplore coverage page shows the elements categorized by URL, one screen at a time.</p><p>The elements are colored as follows:</p><ul><li><strong>Green</strong>: The element was interacted with by the agent on the current page.</li><li><strong>Yellow</strong>: The element was interacted with by the agent on another page.</li><li><strong>Gray</strong>: The element has not been interacted with by the agent on any page.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*bd9e-DrQJ64Udf8w.png" /></figure><p>Implementing visual coverage also gave us a way to verify that the agent recognizes elements correctly.</p><p>At AutoExplore, we are committed to helping R&amp;D teams implement autonomous testing as part of their development processes. Ready to transform your process? Contact us for a demo to learn more.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d02ec36c3927" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Comparing Autonomous Testing to Traditional Methods]]></title>
            <link>https://autoexplore.medium.com/comparing-autonomous-testing-to-traditional-methods-01385da5c254?source=rss-24cd8d14e27b------2</link>
            <guid isPermaLink="false">https://medium.com/p/01385da5c254</guid>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[autonomous-testing]]></category>
            <category><![CDATA[software-development]]></category>
            <dc:creator><![CDATA[AutoExplore]]></dc:creator>
            <pubDate>Thu, 27 Feb 2025 11:00:09 GMT</pubDate>
            <atom:updated>2025-02-27T11:00:09.624Z</atom:updated>
<content:encoded><![CDATA[<p>Typical modern software development lifecycle testing includes large amounts of automated testing and some manual testing.<br>Due to recent advancements in AI and machine learning, autonomous testing is becoming more popular.<br>In this article, we will compare autonomous testing to traditional testing methods in terms of time to test, coverage, cost, effectiveness, and the challenges each has.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/896/1*iuexePfnQsMu8SsslSZ7GQ.png" /></figure><h3>Automated testing</h3><h4>Unit Testing</h4><p>Unit tests form the cornerstone of all automated software testing.<br>Software developers write unit tests to test individual components of the software.<br>Unit tests cover the smallest testable parts of the software, usually functions or methods, in the UI or backend.<br>Unit tests are highly effective for regression testing because they run quickly and efficiently. Additionally, when a unit test fails, it is typically easy to diagnose and resolve.</p><p>- <strong>Time to test</strong>: Moderate. Writing unit tests takes time. 
However, this can be lowered by using gen-AI to generate unit tests.<br>- <strong>Coverage</strong>: High, can cover all functions and methods in the software.<br>- <strong>Cost</strong>: Low, unit tests are usually easy to write and maintain.<br>- <strong>Effectiveness</strong>: High, unit tests are very effective at catching bugs early in the development process.<br>- <strong>Challenge</strong>: Writing good unit tests requires a good understanding of the software and the ability to write testable code.</p><h4>API Testing</h4><p>API tests are used to test the communication between different parts of the software; they typically test the backend and database.<br>API tests cover higher-level functionality than unit tests and are still relatively easy to write and maintain.<br>However, API tests can be more difficult to reason about and resolve than unit tests, as they cover more complex functionality.</p><p>- <strong>Time to test</strong>: Moderate, similar to unit tests, and it can be lowered using gen-AI to generate API tests.<br>- <strong>Coverage</strong>: Moderate, can cover all API endpoints in the software but misses the user interface.<br>- <strong>Cost</strong>: Moderate, API tests are more complex than unit tests but still relatively easy to write and maintain.<br>- <strong>Effectiveness</strong>: High, as API tests also cover parts not covered by unit tests, e.g. 
database queries.<br>- <strong>Challenge</strong>: Covering scenarios more complex than a single endpoint requires orchestrating numerous requests, which causes maintainability issues.</p><h4>Specialized Testing</h4><p>Integration, performance, and security tests are specialized tests that cover specific aspects of the software.<br>These tests are typically more complex and time-consuming to write and maintain than unit and API tests.</p><p>- <strong>Time to test</strong>: High, specialized tests usually require custom tooling and a deep understanding of the software and the technology used.<br>- <strong>Coverage</strong>: Low, covers only very specific aspects of the software, such as integration, performance, and security.<br>- <strong>Cost</strong>: High, specialized tests require more effort to write and maintain.<br>- <strong>Effectiveness</strong>: Low, simulating a production-like environment for tests is usually difficult.<br>- <strong>Challenge</strong>: Test environments and production environments differ in configuration, data, and infrastructure. Specialized tests also require people capable of doing them.</p><h4>End-to-end testing</h4><p>End-to-end tests are used to test the software as a whole, from the user interface to the backend and database.<br>These tests are typically implemented using a tool like Selenium or Cypress, which can simulate user interactions with the software.<br>End-to-end tests cover the highest level of functionality but are also the most difficult to write and maintain.</p><p>- <strong>Time to test</strong>: High, writing these tests is slow due to the dynamic nature of applications and the number of interactions required to cover even a simple process.<br>- <strong>Coverage</strong>: High, can cover all parts of the software, from the user interface to the backend and database.<br>- <strong>Cost</strong>: High, dynamic views, changing environments, and evolving software cause unplanned maintenance of the tests. 
AI-assisted tools try to address this issue with self-healing tests, with mixed results.<br>- <strong>Effectiveness</strong>: Moderate, end-to-end tests are prone to fail for unintentional reasons.<br>- <strong>Challenge</strong>: Writing and maintaining end-to-end tests is very time-consuming and requires a high amount of maintenance effort, especially if the software is in an early development phase.</p><h3>Manual testing</h3><p>Manual testing is usually the last step in the testing process and is used to test the software as a whole.<br>Often this step is left to the QA team, or to domain experts, who are responsible for ensuring the software meets the requirements.<br>Typically the manual testing phase is labeled acceptance testing, as it is used to determine whether the software is ready for release.</p><p>- <strong>Time to test</strong>: High. Theoretically low, but in reality people have their schedules full of other work, and tickets can wait days for testing.<br>- <strong>Coverage</strong>: High, can cover all parts of the software, together or individually, from the user interface to the backend and database.<br>- <strong>Cost</strong>: High, manual testing is very time-consuming and requires human intervention. 
People need training and time to learn the software and testing skills.<br>- <strong>Effectiveness</strong>: Moderate, manual testing can be effective when it is not done too often and people have the right mindset for testing.<br>- <strong>Challenge</strong>: It depends on a person’s individual knowledge, it is very difficult to scale well, people get distracted easily, and human errors still happen.</p><h3>Autonomous testing</h3><p>Autonomous testing is a new approach to software testing that uses AI and machine learning to automate the testing process.<br>Autonomous testing agents can work 24/7 and run many different tests in parallel.<br>Unlike other testing methods, no test cases are planned before execution.<br>This makes the approach interesting, as the agent can notice issues that were not thought of at all during the development process.</p><p>- <strong>Time to test</strong>: Low, “click to run”.<br>- <strong>Coverage</strong>: Moderate, can cover the software from the user interface to the backend and database; however, its ability to test complex processes should be evaluated.<br>- <strong>Cost</strong>: Low, autonomous testing agents typically work without human intervention, and no test cases are written or maintained.<br>- <strong>Effectiveness</strong>: High, autonomous testing agents can run many different tests in parallel and learn from the past.<br>- <strong>Challenge</strong>: Autonomous testing agents need to keep the number of false-positive issues low to provide relevant feedback for the R&amp;D team.</p><p>Are you finding a high number of issues using manual testing? That might be an indication of issues earlier in the software development process. Autonomous testing can help you find these issues earlier and provide relevant feedback for the R&amp;D team.</p><p>At AutoExplore, we are committed to helping R&amp;D teams implement autonomous testing as part of their development processes. Ready to transform your process? 
Contact us for a demo to learn more.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Autonomous Exploratory Testing in Agile Development]]></title>
            <link>https://autoexplore.medium.com/autonomous-exploratory-testing-in-agile-development-dadff4032689?source=rss-24cd8d14e27b------2</link>
            <guid isPermaLink="false">https://medium.com/p/dadff4032689</guid>
            <category><![CDATA[testing]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[quality-assurance]]></category>
            <category><![CDATA[agile-development]]></category>
            <category><![CDATA[autonomous-agent]]></category>
            <dc:creator><![CDATA[AutoExplore]]></dc:creator>
            <pubDate>Sun, 09 Feb 2025 12:06:16 GMT</pubDate>
            <atom:updated>2025-02-10T08:04:12.800Z</atom:updated>
            <content:encoded><![CDATA[<p>Agile development has become the standard for software teams aiming to deliver value quickly and adapt to changing requirements. However, as market competition gets tougher, budgets are cut, and release cycles shorten, quality assurance methods often struggle to keep up.</p><p>In this article, we will explore how autonomous testing can help agile teams meet their goals and maintain high quality standards.</p><p>Agile methodologies prioritize iterative development, collaboration, and customer feedback. Teams work in sprints, delivering incremental updates to the product. However, this fast-paced environment often creates bottlenecks in the testing phase, where manual or scripted tests can’t keep up with the speed of development.</p><h4>Challenges</h4><ul><li>Automated tests are skipped. This can happen due to tight schedules, lack of knowledge, or simple negligence.</li><li>Code test coverage is not the same as use-case coverage. Code test coverage can be 100% while the real use cases are not fully covered.</li><li>When the whole team is not committed to ensuring high quality, or testing is delegated to another party, defects can come to be accepted as the norm.</li><li>If QA teams aren’t involved in software design, they may test without fully understanding how the program is meant to work.</li><li>Engineers use AI to generate code and the same AI to generate tests for that code. As a result, the team’s working agreement is easily fulfilled without paying attention to code quality.</li></ul><p>Usually, people in R&amp;D dislike defects and aim for high-quality software. Despite good will, defects still occur. In retrospect it is easy to argue that if we had been more careful, or had written one more test case to cover “X”, we could have prevented that specific defect. However, by that point the defect has usually already affected the client’s workflow and the damage is done.
Luckily, there is something we can improve.</p><h4>What is Autonomous Exploratory Testing</h4><p>Autonomous testing leverages machine learning to automate the testing process. Unlike traditional test automation, it autonomously adapts to changes in the codebase, learns from the past, and executes tests without human intervention.</p><p>Autonomous exploratory testing can be used as an added layer to prevent defects from slipping through to production. It can also be used to guide or pinpoint areas that need additional scripted tests to harden the software. To better understand why autonomous exploratory testing is beneficial, we can split defects into two categories.</p><h4>Technical defects</h4><p>Technical defects are those where it can, most of the time, be decided programmatically whether a given behavior is incorrect. Examples include page crashes, errors, missing security guards, broken UI layouts, infinite loops, major performance degradation, or cases where standards or best practices are not followed.</p><h4>Substance incorrect behavior</h4><p>These defects require substance or domain knowledge to determine whether the given functionality is incorrect in the given context. They could be something like a missing field, incorrect text, or an unintentional user flow. For a program to autonomously decide whether any of these is a defect, it needs more information. It is possible that the behavior was changed intentionally along with the software, which makes these types of defects more challenging for a computer to detect.</p><h4>Where Autonomous Testing Shines</h4><p>Autonomous exploratory testing can help detect technical defects. It runs without engineers having to write test cases. It automatically adapts to changes in the software without people spending their time writing or maintaining test cases. There are also areas where autonomous exploratory testing excels and even surpasses human capabilities.
For example, how many times did YOU as a developer go through the OWASP Top 10 while developing new features or changing existing functionality? Did YOU as a full-stack engineer also assert that all the views in your web UI have the required accessibility attributes in place and that keyboard control works as expected?</p><p>Autonomous testing can check for these types of issues automatically and continuously throughout the entire target software. While the robot explores the system’s user interface or APIs, it can scan for all sorts of issues in the background. It is not limited to testing one type of problem at a time. It can test your application’s entry form and at the same time verify that links, buttons, and other UI elements are working, analyze network traffic for security vulnerabilities, assert the accessibility attributes, check for missing access controls, look for unexpected errors, test keyboard input, validate HTML, measure performance statistics to avoid future regressions, and so on.</p><p>This issue data can then be aggregated and prioritized for the R&amp;D team for further analysis. It can be scheduled before the daily standup or provided ahead of a weekly release. It can also be used to alert the R&amp;D team at any time if there is a major change in issue trends.</p><h4>Limitations</h4><p>While this sounds great, there are also limitations. Autonomous exploratory testing is not well suited to regression testing. Because the robot continuously adapts to the latest changes, it needs to keep the number of false-positive issues as low as possible to ensure high-quality reports, and it is not easy for a program to autonomously decide whether a change in the user interface is a defect or not.
However, it can be used as a tool that provides input for the R&amp;D team, which can then add traditional automated tests as needed to cover possible regressions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/721/1*2SwsxWdVcf3iQpyRDYR0rA.png" /></figure><p>At AutoExplore, we are committed to helping R&amp;D teams implement autonomous testing as part of their development processes. Ready to transform your process? Contact us at <a href="https://www.autoexplore.ai">https://www.autoexplore.ai</a> to learn more.</p>]]></content:encoded>
        </item>
    </channel>
</rss>