What Happens When AI Reviews Its Own Code: Our Autonomous Pipeline Delivered Up to 10x Faster Development

How Softjourn's R&D team built a self-reviewing AI agent pipeline on a live client project, cutting individual task time by up to 10x and boosting overall delivery throughput by 125%.

About the Client:

Project: Fully Autonomous AI Development Loop
Industry: Technology Services
Headquarters: Fremont, California
Technologies: Leading LLM, DevOps Platform, AI Code Review Agent

The Challenge

A complex, high-volume codebase required faster delivery without scaling headcount. Context-switching, boilerplate work, and sequential code review were consuming senior engineering time.

The Solution

A fully autonomous ticket-to-Pull Request loop built on two collaborating AI agents, guided by Markdown skill files that carry institutional knowledge across every session.

The Benefits

5 to 10x speed increase on well-defined, clearly scoped tasks. 125% increase in client-side delivery throughput. Near-100% of coding tasks handled at approximately $120/month in compute costs.

Introduction

Most conversations about AI in software development stop at faster coding. What our R&D team wanted to know was whether you could close the loop entirely: a developer issues a single prompt, and a reviewed, ready-to-merge Pull Request comes out the other end, with two AI agents handling everything in between.

That question became an ongoing experiment on a real client engagement, supporting a software platform with a large, active codebase and a steady stream of tickets.

What our senior engineer built is not a collection of AI shortcuts. It is a structured, end-to-end workflow where AI agents not only write the code but also review each other's work before a human ever looks at it.

Here is how it works, what it delivers, and where human judgment still matters most.

The Challenge

Working on a complex, multi-service software platform means overhead compounds quickly. A single UI change can touch multiple services at once.

A new feature landing means test documentation, Pull Request descriptions, and story point estimates all need to be written before the next ticket can start. And when code is ready for review, it waits, slowing release cadence in ways that are hard to see but easy to feel.

Our team was supporting a client with a large, active codebase and a high volume of well-defined tickets covering UI changes, bug fixes, and minor feature work. The client needed a faster pace of delivery without proportionally growing the team, and our senior engineer started asking whether AI agents had matured enough to absorb that overhead reliably.

One constraint was non-negotiable from the start: the AI could not have deployment access to production environments. Any solution had to stay sandboxed and fully reviewable before anything reached the client's live platform.
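The case study does not describe how the no-production-access rule was enforced. As one illustration, here is a minimal Python sketch of a deny-by-default action allowlist that an agent harness could consult before acting. The action names, the `is_permitted` function, and the environment labels are hypothetical, not the team's tooling.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical action names covering the ticket-to-Pull Request loop.
    READ_TICKET = "read_ticket"
    EDIT_LOCAL_FILES = "edit_local_files"
    RUN_LOCAL_TESTS = "run_local_tests"
    PUSH_BRANCH = "push_branch"
    OPEN_PULL_REQUEST = "open_pull_request"
    DEPLOY = "deploy"


# Everything the loop needs, and nothing that ships code.
ALLOWED_ACTIONS = frozenset({
    Action.READ_TICKET,
    Action.EDIT_LOCAL_FILES,
    Action.RUN_LOCAL_TESTS,
    Action.PUSH_BRANCH,
    Action.OPEN_PULL_REQUEST,
})


def is_permitted(action: Action, environment: str) -> bool:
    """Allow only allowlisted actions, and never anything aimed at production."""
    if environment == "production":
        return False
    return action in ALLOWED_ACTIONS


if __name__ == "__main__":
    print(is_permitted(Action.OPEN_PULL_REQUEST, "sandbox"))  # True
    print(is_permitted(Action.DEPLOY, "sandbox"))             # False
```

Deny-by-default is the safer shape for this kind of rule: any new capability the agent gains stays blocked until someone deliberately adds it to the allowlist.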

The Challenge
"The traditional software engineering role is changing. Developers are transitioning from writing code to acting as managers who direct and validate AI." – Senior Engineer, Softjourn

The Solution

What our engineer built is best understood as a loop rather than a tool.

The workflow begins with a single prompt. Our engineer asks the AI agent to list active tickets in the project management platform, then issues a simple instruction: take this ticket and do everything.

From there, the agent reads the ticket description, opens a local browser to inspect and test the UI, identifies the issue, writes the code, verifies it locally, creates a branch, pushes a Pull Request, writes the Pull Request description, and assigns story points. The engineer does not write a single line of code.
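The steps above form a fixed sequence in which each step depends on the previous one succeeding. Below is a minimal Python sketch of that ordering, assuming a hypothetical `execute` callback that hands one stage to the agent and reports whether it succeeded. The stage names paraphrase the paragraph above; they are not the tool's actual commands.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stage names paraphrasing the workflow described above.
STAGES = [
    "read_ticket",
    "inspect_ui_in_local_browser",
    "identify_issue",
    "write_code",
    "verify_locally",
    "create_branch",
    "open_pull_request",
    "write_pr_description",
    "assign_story_points",
]


@dataclass
class TicketRun:
    ticket_id: str
    completed: list[str] = field(default_factory=list)


def run_ticket(ticket_id: str, execute: Callable[[str, str], bool]) -> TicketRun:
    """Run each stage in order and stop at the first stage that reports failure."""
    run = TicketRun(ticket_id)
    for stage in STAGES:
        if not execute(stage, ticket_id):
            break
        run.completed.append(stage)
    return run


if __name__ == "__main__":
    # Toy executor: pretend local verification fails, so nothing gets pushed.
    result = run_ticket("TICKET-1", lambda stage, ticket: stage != "verify_locally")
    print(result.completed)
```

Stopping at the first failed stage keeps a half-finished ticket from producing a Pull Request: if local verification fails, no branch is created or pushed.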

That is only the first half of the loop. Once the Pull Request exists, our engineer sends it to a second AI agent for independent code review. That agent reads the code and posts comments directly on the Pull Request; the first agent then reads those comments and automatically applies the suggested fixes.

The result is two AI agents checking each other's work before any human touches the output.
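The review hand-off can be read as a bounded loop: the reviewer comments, the author applies fixes, and the cycle repeats until no comments remain. Here is a minimal Python sketch, assuming hypothetical `review` and `apply_fixes` callables that wrap the two agents. The three-round cap is an illustrative choice, not a detail from this engagement.

```python
from typing import Callable


def review_loop(
    code: str,
    review: Callable[[str], list[str]],
    apply_fixes: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Alternate reviewer and author agents until the reviewer has no comments.

    Returns the final code plus any comments still open, which go to the human
    reviewer instead of letting the agents loop forever.
    """
    for _ in range(max_rounds):
        comments = review(code)               # reviewer agent posts comments
        if not comments:
            return code, []
        code = apply_fixes(code, comments)    # author agent applies the fixes
    return code, review(code)                 # leftovers are flagged for a human


if __name__ == "__main__":
    # Toy stand-ins: the "reviewer" flags TODO markers, the "author" deletes them.
    reviewer = lambda src: ["Resolve the TODO"] if "TODO" in src else []
    author = lambda src, comments: src.replace("  # TODO", "")
    print(review_loop("total = price * qty  # TODO", reviewer, author))
    # ('total = price * qty', [])
```

The cap matters: two agents that disagree could otherwise trade edits indefinitely, and surfacing leftover comments keeps the human as the final arbiter.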

Figure: AI agent review pipeline workflow

Skill Files: Giving AI Institutional Memory

One of the most transferable elements of this workflow is how the agent carries project knowledge from session to session.

Rather than re-explaining the platform's architecture, compliance requirements, and internal API patterns at the start of every session, our engineer encodes this into Markdown-based skill files: compact instruction documents that live directly in the codebase and load automatically for each task type.

A skill file might describe how to authenticate with internal APIs, how to apply the platform's design system correctly, or which cross-service dependencies to account for before making a specific type of change.
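The case study does not specify how skill files are organized or loaded. Below is a minimal Python sketch under one assumed layout: a `skills/common/` folder for guidance every task needs, plus one folder per task type (for example `skills/ui_change/`). The folder names and the `load_skills` function are hypothetical.

```python
from pathlib import Path


def load_skills(skills_dir: Path, task_type: str) -> str:
    """Concatenate shared skill files plus those for one task type.

    Assumed (hypothetical) layout:
        skills/common/*.md       loaded for every task
        skills/<task_type>/*.md  loaded only for that task type
    """
    sections = []
    for folder in ("common", task_type):
        directory = skills_dir / folder
        if not directory.is_dir():
            continue  # a task type without dedicated skills just gets the shared ones
        for path in sorted(directory.glob("*.md")):
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {folder}/{path.name}\n{body}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(load_skills(Path("skills"), "ui_change"))
```

Keeping skill files as plain Markdown inside the repository means they travel with the code under the same version control as everything else.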

Once written, these files give the agent consistent, reliable context without the engineer spending time to re-establish it each session. They are also shareable across teammates, meaning the workflow is not locked to the person who built it.

With this setup, our engineer runs up to three AI sessions in parallel, each in a separate development environment and each handling a different ticket. Senior engineering time, previously spent writing boilerplate and waiting on review queues, shifts almost entirely to architectural oversight and final Pull Request validation.
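Running several tickets at once is essentially a bounded worker pool. Here is a minimal Python sketch, assuming a hypothetical `run_ticket` callable that drives one full agent session in its own isolated checkout (the isolation itself is not modeled). The cap of three mirrors the figure above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_PARALLEL_SESSIONS = 3  # mirrors the "up to three" sessions described above


def run_sessions(tickets: list[str], run_ticket: Callable[[str], str]) -> dict[str, str]:
    """Work through tickets with at most three agent sessions in flight at once."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_SESSIONS) as pool:
        # map() preserves input order, so results line up with their tickets.
        results = pool.map(run_ticket, tickets)
        return dict(zip(tickets, results))


if __name__ == "__main__":
    print(run_sessions(["T-101", "T-102", "T-103", "T-104"], lambda t: f"PR opened for {t}"))
```

Threads are a reasonable fit on the assumption that each session spends most of its time waiting on model responses and tools rather than computing locally.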

"In the last month, a significant leap occurred. The AI can now effectively handle 100% of the required coding operations. My role has fundamentally changed: I am no longer a developer writing code. I am a manager who monitors, directs, and validates AI output." – Senior Engineer, Softjourn

How the Workflow Matures

Part of building a reliable production workflow is documenting where it fails, not just where it succeeds.

Early in the process, our engineer asked the agent to update a ticket description with a bug's root cause. The agent deleted the original description entirely rather than appending to it.

The recovery was straightforward: the project management platform keeps a full audit log of edits, so the previous description could be restored from the change history. The fix was not a major overhaul but a single, targeted prompt update explicitly instructing the agent never to remove existing content, only to append to or modify specific parts.

That addition has been sufficient to prevent the same issue from recurring. It is a useful illustration of how this workflow improves over time: not through large corrections, but through small, precise adjustments to how the agent is directed.
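The team's actual fix was the prompt instruction described above. A complementary safeguard, sketched here as a hypothetical addition rather than part of the team's workflow, is a check in the tooling that rejects any proposed description that drops text from the original. The function names are illustrative.

```python
def preserves_original(original: str, proposed: str) -> bool:
    """True if every non-blank line of the original survives, in order, in the proposal."""
    proposed_lines = iter(line.strip() for line in proposed.splitlines())
    original_lines = [line.strip() for line in original.splitlines() if line.strip()]
    # `in` on an iterator consumes it up to the match, so this is an ordered subsequence test.
    return all(line in proposed_lines for line in original_lines)


def apply_description_update(original: str, proposed: str) -> str:
    """Accept append-style edits; refuse edits that would delete existing content."""
    if not preserves_original(original, proposed):
        raise ValueError("Rejected: proposed description removes existing content")
    return proposed


if __name__ == "__main__":
    before = "Steps to reproduce:\nClick Save twice."
    appended = before + "\n\nRoot cause:\nThe form submits twice."
    print(preserves_original(before, appended))                                # True
    print(preserves_original(before, "Root cause:\nThe form submits twice."))  # False
```

This check is stricter than the prompt, which still allows the agent to modify specific parts; under this sketch, such edits would be routed to the engineer for approval instead.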

Hands on keyboard reviewing AI generated code

The Results

The productivity gains across the engagement have been consistent and significant.

Tasks that previously took one to two days are now completed in one to two hours. On well-defined, clearly scoped tasks such as UI changes and bug fixes with clear reproduction steps, the team estimates a 5 to 10x speed increase.

More complex or open-ended work sees smaller gains, and results vary significantly depending on task type.

Two specific examples stand out:

  1. A UI refactor of a complex internal dashboard, which would have taken approximately two days manually, was completed by the AI agent in roughly one hour.
  2. A cross-service migration issue involving an unexpected package compatibility error, the kind of problem that can consume two days of manual investigation, was identified and resolved immediately after the engineer fed the error logs to the agent.

On the client side, tracked metrics, including closed work items and Pull Requests created, show a 125% increase in delivery throughput.
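A percentage increase is easy to misread: a 125% increase means throughput reaches 2.25 times the baseline, not 1.25 times. Below is a minimal Python sketch of that arithmetic. The baseline of 20 work items per period is an illustrative figure, not data from this engagement.

```python
def throughput_after(baseline: float, increase_pct: float) -> float:
    """Apply a percentage increase to a baseline: +125% multiplies by 2.25."""
    return baseline * (1 + increase_pct / 100)


if __name__ == "__main__":
    # Illustrative baseline only; the case study reports the percentage, not raw counts.
    print(throughput_after(20, 125))  # 45.0
```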

The ongoing cost of running this workflow is notably low. Nearly 100% of coding tasks on the engagement are handled by the AI agent, at compute costs of approximately $120 per month, reflecting Softjourn's spend on the tooling.

| Task | Before AI | After AI |
|---|---|---|
| Typical feature work | 1 to 2 days | 1 to 2 hours |
| UI refactor (complex internal dashboard) | ~2 days | ~1 hour |
| Cross-service debug (package compatibility error) | Up to 2 days | Resolved immediately |
| Token spend (covers near-100% of coding tasks) | N/A | ~$120/month |

That said, our team is candid about one tradeoff worth naming directly: this approach can introduce more bugs than manual coding, which means local testing before merging has become more important, not less. QA involvement remains essential.

The efficiency gains are real, but they come with a responsibility to test more thoroughly at the validation stage. The AI handles implementation, while the engineer owns quality.
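Here is a minimal Python sketch of a merge gate that reflects this division of labor: the agent's Pull Request only counts as ready once local checks pass and both QA and the engineer have signed off. The `run_local_checks` and `ready_to_merge` functions are hypothetical, and the check commands (for example `["pytest", "-q"]` or `["npm", "test"]`) are placeholders for whatever the project actually runs.

```python
import subprocess
from typing import Sequence


def run_local_checks(commands: Sequence[Sequence[str]]) -> bool:
    """Run each check locally; True only if every command exits with status 0."""
    for command in commands:
        try:
            result = subprocess.run(list(command), capture_output=True, text=True)
        except FileNotFoundError:
            print(f"FAILED: {command[0]} not found")
            return False
        if result.returncode != 0:
            print(f"FAILED: {' '.join(command)}\n{result.stdout}{result.stderr}")
            return False
    return True


def ready_to_merge(
    commands: Sequence[Sequence[str]], qa_signed_off: bool, engineer_approved: bool
) -> bool:
    # The AI review loop is not sufficient on its own: tests, QA, and the engineer all gate the merge.
    return qa_signed_off and engineer_approved and run_local_checks(commands)
```

Checking the human sign-offs first means the local test suite only runs once a Pull Request is actually a merge candidate.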

AI Autonomous Pipeline case study – full width banner

Conclusion

What our engineering team built on this engagement is not a shortcut, and it is not experimental anymore. It is a production workflow, running daily on a real client project with real delivery timelines, and the results have held up over several months.

The workflow is still being refined. Formal guidelines around AI agent permission levels and broader team adoption are in progress. But the foundation, a closed, autonomous agent loop guided by skill files and validated by human oversight, is something we can now bring to other engagements with confidence.

For engineering teams working on complex, service-based platforms and looking to understand what a structured AI development workflow can deliver in practice, we are glad to share what we have built. Contact Softjourn to start the conversation.

Want to Know More?

Fill out the form to discuss your idea with us!