Code Quality Assurance in the Generative AI Era: Shifting from Human Reviews to System Design

Insights from a Month of Vibe Coding: Redefining Review Responsibility and Optimization Strategies

2025-03-27
23 min
ChatGPT
Generative AI
AI Collaborative Development
Code Review
Vibe Coding
Quality Assurance
Development Culture
Ryosuke Yoshizaki

CEO, Wadan Inc. / Founder of KIKAGAKU Inc.

The Traditional Role of Code Reviews and Signs of Change

When I first started programming, it was an iron rule that to write "high-quality code," you always needed someone else to review it. Code review wasn't just a process—it was the culture of software development itself, and more importantly, a critical ritual that clarified where "responsibility" resided.

"Human verification," "multiple-person checks," "insights from experienced developers"—these elements were unshakable pillars supporting software quality. For years, I adhered to this belief, sometimes spending hours meticulously reviewing even minor changes, convinced that this was my "responsibility" as a developer.

However, with the spread of AI-collaborative development—what's called "Vibe Coding"—this premise is gradually changing. AI-generated code is completed in minutes, and its volume and complexity are becoming increasingly difficult to handle within the framework of traditional code reviews. Moreover, trying to review AI-generated code using conventional methods may itself become a new bottleneck in development.

For the past month, I've thoroughly practiced collaborative development with AI. What emerged was a shift in the very concept of code review and the beginnings of a major transformation in "quality assurance responsibility." In this article, I want to share the lessons learned from this experience and new ways of thinking about quality assurance in the AI era.

Challenges of Traditional Code Reviews: Experiencing Bottlenecks and Limitations

About a month ago, my daily routine was to spend one to two hours meticulously reviewing minor code modifications that AI had generated in minutes, merging only after that review was complete. At the time, I unquestioningly believed this was the "correct development process." In retrospect, it's clear that the bottleneck was already on the human side: me.

Still, "the sense of responsibility that humans must ensure quality" held me captive. Although questions like "Am I misunderstanding this?" or "Could this just be my own assumption?" occasionally crossed my mind, the firmly rooted belief that "code absolutely must be reviewed" persisted within me.

Limitations of Traditional Code Reviews

The limitations of the traditional code review model can be summarized as follows:

  1. Scaling Problem: The volume and speed at which AI generates code easily exceed human review capacity.

  2. Time Constraints: Spending an hour reviewing code that an AI produced in a 15-minute collaborative session creates a temporal asymmetry that fundamentally disrupts the rhythm of development.

  3. Cognitive Load: When multiple components or files are generated or modified at once, grasping the whole picture exceeds human cognitive capacity.

  4. Psychological Conflict: Developers constantly struggle between "the responsibility to understand everything" and "realistic time constraints."

Continuing this process leads to the contradiction of humans being unable to keep up with AI productivity. And I found myself facing this contradiction head-on.

Insights from the Bottleneck Experience

One day, this contradiction forced a fundamental question on me: was there any point in continuing to deny the new reality? Rather than resist it, fully embracing this new style and then examining it in light of my own values seemed far more constructive.

This shift in thinking became the starting point for my month-long practice of AI-collaborative development. As I continued to commit daily and explored a new relationship with AI, I gradually began to realize that the essence of what we traditionally called "review" was perhaps really a matter of "understanding" and "trust."

In code reviews between humans, we understand each other's thinking and intentions, and verify that the implementation meets requirements. Ultimately, we're making a judgment about whether to "trust" the code and the person who wrote it.

How will this "understanding" and "trust" change in the AI era? This question became the axis of my month-long exploration.

The Turning Point in AI-Driven Quality Assurance: A New Paradigm

As I progressed with collaborative development with AI, my thinking about quality assurance began to change significantly. It wasn't merely about "streamlining the review process" but the possibility of a paradigm shift in how we ensure quality.

Traditional Paradigm: Post-Implementation Verification

The traditional quality assurance model can be succinctly described as "post-implementation verification." In this framework:

  1. Code is written
  2. Humans review that code
  3. If there are issues, they're fixed; if not, the code is approved

The premise of this model was that "human eyes are the ultimate guarantors of quality." However, the idea of humans reviewing all code written by AI is becoming increasingly unrealistic, both quantitatively and qualitatively.

New Paradigm: The Possibility of Design-Driven Quality Assurance

In AI-collaborative development, pre-implementation design and framework creation may be more important than post-implementation review.

The new paradigm that emerged from a month of practice is a shift in perspective from "what to review and how" to "what to design and how to verify."

Core Elements of Design-Driven Quality Assurance

1. Clear Specification Definition and Test Design

In collaboration with AI, clearly defining "what to build" becomes even more important than before. Since AI itself cannot set purposes, it's the human's role to define clear specifications and expected behaviors.

In my experience, approaches like BDD (Behavior-Driven Development) were particularly effective. Designing tests in advance using the "Given-When-Then" framework allowed me to clearly communicate what I expected from the AI, as the sketch below illustrates.
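
For instance, a scenario written before any implementation might look like the following Gherkin-style sketch. It is illustrative only, not taken from my actual project:

Feature: User registration
  Scenario: Registering with a valid email and password
    Given no account exists for "test@example.com"
    When the user registers with "test@example.com" and a valid password
    Then an account is created
    And a registration-complete message is shown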

2. The Importance of Automated Tests and CI/CD

In AI-collaborative development, verification through automated tests may play a more central role than human reviews. Verifying that AI-generated code performs as expected through tests can significantly reduce the burden of human review.

When I implemented this, test automation with GitHub Actions and pre-validation using husky proved very effective. Having tests automatically run before commits or pushes created a balance between quality assurance and development speed.
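
For reference, the husky side of this can be as small as two hook files. The following is a sketch; the npm script names (test:unit, test:all) are my assumed conventions, not anything husky prescribes:

# .husky/pre-commit: run the fast unit tests before every commit
npm run test:unit

# .husky/pre-push: run the fuller suite before code leaves the machine
npm run test:all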

3. Staged Verification Process

Rather than always running all tests, designing a staged verification process according to the development phase is also important.

For example:

  • Running only unit tests for small changes
  • Running integration tests for feature additions
  • Running all tests (including E2E) when merging to the main branch

This approach allowed me to maintain development speed while ensuring thorough verification at critical points.
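
Expressed as CI configuration, that staging might look like the following GitHub Actions sketch. The workflow layout, branch names, and npm script names are illustrative assumptions rather than my exact setup:

# .github/workflows/test.yml (sketch)
name: staged-tests
on:
  pull_request:            # feature branches: lightweight tests only
  push:
    branches: [main]       # merging to main: run everything
jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:unit
  e2e:
    # Heavy E2E tests run only on pushes to the main branch
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:e2e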

The Potential Qualitative Transformation of the Human Role

In AI-collaborative development, the human role may shift from "someone who reviews all code in detail" to "someone who designs the entire quality assurance system." This isn't merely about efficiency but a qualitative shift that focuses human intellectual resources on higher-level judgments and design.

The problem of tasks being declared complete without verification in AI-driven development is specific to this transitional period of the paradigm shift. At present, AI is strong at implementation but weak at thinking about verification, which means we humans need to design the "verification frameworks" all the more deliberately.

This paradigm shift may, in a sense, impose "deeper responsibility" on developers, though its nature will change significantly. Next, let's consider this redefinition of "responsibility."

The Possibility of Redefining Responsibility: Changing Human Roles

One of the most fundamental changes brought about by AI-collaborative development may be the redefinition of the concept of "responsibility for code quality." How might the form of "responsibility" that traditional development culture has built change with the advent of AI?

Traditional View of "Responsibility": The Obligation to Understand and Verify Everything

In traditional software development, responsibility in code review meant "understanding all code and finding potential problems." I myself had internalized this view of responsibility for a long time.

The person who wrote the code bears the "responsibility for implementation," while the reviewer bears the "responsibility for verification"—this division formed the foundation of quality assurance. And in many cases, senior developers were expected to perform reviews, providing deep verification based on experience.

Possible Transformation of "Responsibility" in the AI Era: System Design and Framework Creation

However, in an era where AI generates large volumes of code, this responsibility model of "understanding everything" is becoming increasingly unrealistic. The change I felt through a month of AI-collaborative development is that the locus of responsibility is beginning to shift from "verification of individual code" to "design of the entire quality assurance system."

Specifically, the area of responsibility seems to be shifting away from verifying individual pieces of code and toward designing the specifications, tests, and verification processes that surround it.

My Personal Experience with Responsibility Transformation

I noticed signs of this transformation when I significantly changed the test strategy for a project.

Initially, when I started collaborative development with AI, I took the traditional approach of "reviewing all code and maintaining test coverage above 80%." However, I quickly realized this approach was unsustainable due to the volume and complexity of AI-generated code.

As the project progressed, I found it increasingly difficult to grasp the overall picture, leading to confusion. In particular, once the test code expanded to nearly 1,000 lines, even a simple operation like having the AI read a file with read_file became a major bottleneck. Ironically, the test strategy intended to ensure quality had become the biggest barrier to AI-collaborative development.

From this experience, I began to fundamentally reconsider my approach to responsibility. I felt the possibility that my responsibility in the AI era might become "creating a framework that ensures the entire system functions as expected" rather than "understanding every line of code."

The Core of New Responsibility: Pre-Design and Trust Building

The core of the new responsibility model may lie in "pre-design" and "mechanisms for building trust."

1. Responsibility for Pre-Design

In collaboration with AI, the quality of pre-design determines everything that follows. Clearly defining "what to build," "what quality standards to meet," and "how to test"—these may become the center of new responsibility.

2. Mechanisms for Building Trust

If it's impossible for "humans to see everything," how do we build trust? It may be through the accumulation of small verifications.

The trust-building mechanisms I practiced were:

  • Test Trophy Strategy: Prioritizing integration and component tests over an abundance of unit tests
  • BDD Approach: Test design based on behavior, focusing on expected operations rather than technical implementation details
  • Memory Bank: Document management for context sharing with AI, formalizing tacit knowledge

Building these "mechanisms" may become the new form of responsibility.

Distribution and Sharing of Responsibility

Another important sign of change is that responsibility may be distributed and shared from "specific individuals (reviewers)" to "the entire system."

"The sense of responsibility that humans must ensure quality" weighs heavily on individuals. However, in the new model, the responsibility for quality assurance may be distributed across multiple elements including tests, CI/CD, documentation, and design principles, reducing the cognitive load on individuals.

This isn't "abandoning responsibility" but rather an approach for "more effective fulfillment of responsibility." Perhaps our responsibility lies not in understanding every line of code, but in establishing mechanisms that ensure the entire system functions correctly.

Practical Quality Assurance Approaches: Test-Driven Development and New Verification Strategies

Based on the theoretical possibility of redefining responsibility, what practices are effective? I'd like to share practical insights gained from a month of AI-collaborative development.

Test Trophy Strategy: Reconsidering the Balance of Tests

The traditional test strategy followed the concept of the "test pyramid," which places many unit tests, a moderate number of integration tests, and few E2E tests.

However, in collaboration with AI, the "test trophy" strategy proved effective. This approach places more emphasis on integration tests and component tests.

The reasons are simple:

  1. AI tends to write too many unit tests: AI's test-generation capabilities are remarkable, often producing tests at a granularity that outstrips my own understanding of the product.
  2. Ease of understanding: Integration tests (API and DB verification) that express expected behavior as imagined at the specification stage are easier to understand and manage than abstract unit tests.
  3. Practicality: E2E tests are intuitively easier to understand as they operate the actual UI, but since execution time is long, they should be kept to a minimum.

The test trophy strategy aims for the following balance:

  • E2E tests: Limit to 1-2 patterns of main flow success paths
  • Integration tests: Focus more on API and DB areas
  • Component tests: Verify the behavior of UI components
  • Unit tests: Limited to particularly complex logic only
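
To make the integration layer concrete, here is a minimal sketch of the kind of API-and-DB test the trophy shape emphasizes. The endpoint, the app module, and the resetDatabase helper are hypothetical stand-ins, not code from my project:

// Integration test sketch using Jest and supertest
import request from "supertest";
import { app } from "./app"; // hypothetical Express app under test
import { resetDatabase } from "./testUtils"; // hypothetical DB helper

describe("POST /users", () => {
  beforeEach(async () => {
    // Start every test from a known database state
    await resetDatabase();
  });

  test("creates a user and persists it", async () => {
    const res = await request(app)
      .post("/users")
      .send({ email: "test@example.com", password: "password123" });

    expect(res.status).toBe(201);

    // Confirm persistence through the public API rather than internal calls
    const fetched = await request(app).get(`/users/${res.body.id}`);
    expect(fetched.body.email).toBe("test@example.com");
  });
});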

BDD Approach: Clarifying Specifications Through Tests

BDD (Behavior-Driven Development) is not just a testing technique but "a development method that describes software behavior in a form close to natural language." While traditional tests focus on "what to test," BDD focuses on "how the system should behave for users."

The essence of BDD lies in these three points:

  1. Describing behavior in natural language: Expressing expected behavior in the form "when X happens, Y should occur"
  2. Given-When-Then structure: A clear three-step structure of "preconditions," "actions," and "expected results"
  3. Common understanding for all: Not just engineers but business stakeholders and even AI can understand requirements in the same language

// Traditional test description
// (userService is the module under test; the import path is assumed)
import { userService } from "./userService";

test("User registration function", async () => {
  // The name alone doesn't convey what behavior is expected
  const user = { email: "test@example.com", password: "password123" };
  const result = await userService.register(user);
  expect(result.success).toBe(true);
});
 
// BDD style test description
describe("User registration function", () => {
  // Scenario: Describing specifications in a form closer to natural language
  test("When registering with a valid email and password, an account should be created", async () => {
    // Given: Clearly state the test preconditions
    const validUser = {
      email: "test@example.com",
      password: "password123", // Valid password with at least 8 characters
    };
 
    // When: Execute the operation under test
    const result = await userService.register(validUser);
 
    // Then: Verify the expected results
    expect(result.success).toBe(true);
    expect(result.user.email).toBe("test@example.com");
    expect(result.message).toContain("Registration complete");
  });
 
  test("When registering with an email that is already in use, an error should occur", async () => {
    // Given: An existing user state
    await userService.register({
      email: "existing@example.com",
      password: "password123",
    });
 
    // When: Attempting to register again with the same email
    const result = await userService.register({
      email: "existing@example.com",
      password: "different123",
    });
 
    // Then: An error should be returned
    expect(result.success).toBe(false);
    expect(result.error).toContain("Already registered");
  });
});

The reason this BDD approach works well with AI-collaborative development is that it lets you communicate "expected behavior" to the AI rather than instructions on how to write code. The tests also function as specification documents that the AI can interpret and implement against. For humans too, tests written in near-natural language can be understood intuitively without reading the implementation.

Staged Verification and Clear Task Completion

When advancing AI-driven development, I faced the problem of tasks ending with insufficient verification despite AI reporting completed implementation. The following solutions proved effective for this challenge:

  1. Clarification of task completion conditions: Defining completion conditions in detail for each task wasn't realistic. The most effective approach was simply to define completion as "all tests pass."

  2. Automated verification through Git integration: I established a rule to always execute Git commands (add, commit, push) as the final step of a task. I introduced tools like husky for pre-commit and pre-push hooks to run automatic verification.

  3. Staged verification strategy: Initially, I adopted GitHub Flow, but later switched to Git Flow. I implemented a staged approach where only lightweight tests are executed on feature branches, with E2E tests and heavier API tests only running upon integration into the develop branch.

This staged approach made reliable verification possible while maintaining development speed.

Context Sharing with Memory Bank

A major challenge in AI-collaborative development was that information would reset with each task. The introduction of a mechanism called "Memory Bank" proved extremely effective against this.

Memory Bank is not a special technology or plugin but a surprisingly simple concept. Create a dedicated folder and continuously record project progress, decision-making, and trial-and-error history there, then have the AI reference this information at the start of each task. Initially I recorded entries manually, like taking meeting minutes, but later established a way for the AI to update the records itself.
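
As a concrete illustration, my Memory Bank was nothing more than a folder of Markdown files along these lines. The file names are examples of the kinds of records I kept, not a fixed convention:

memory-bank/
  projectBrief.md       # what we're building and why
  decisions.md          # decisions made and their rationale
  progress.md           # what's done and what's in flight
  troubleshooting.md    # trial-and-error history and known pitfalls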

The effect of introducing this mechanism was tremendous, dramatically reducing stress. I found myself wondering why I hadn't introduced it sooner.

This experience revealed that in collaboration with AI, "context sharing management" is more important than "individual code reviews." This may indicate a shift from traditional "review-centered" quality assurance to "context sharing-centered" quality assurance.

Reconsidering the Positioning of Refactoring

In AI-collaborative development, the importance of refactoring and its timing also need reconsideration.

There's an important lesson about refactoring in the AI era: refactoring without tests is impossible even for AI. I actually attempted refactoring without tests, but even capable AI couldn't maintain the structure and intentions of the original code. The essence of refactoring as defined—"improving internal structure while maintaining externally observable behavior"—was lost.

In collaboration with AI, the approach of "first get the implementation passing, then refactor" proved effective. With AI, it works better not to interfere while the implementation gets to green, even if the code grows large, and then to perform test-backed refactoring afterwards; this strikes a balance between code quality and development speed.

Future Outlook: The Evolving Concept of Quality Assurance

The changes in quality assurance that emerged from a month of AI-collaborative development seem like signs of a deep transformation concerning the essence of software development, rather than mere "efficiency." How might this change progress in the future?

Decentralization and Automation of Quality Assurance

First, quality assurance is likely to become more "decentralized" and "automated." Code review may shift from intensive human work to a combination of various automation tools and processes.

Specifically:

  • Continuous verification: Environments where verification automatically occurs whenever code is written
  • Self-healing systems: AI-based mechanisms that not only detect problems but also propose automatic corrections
  • Distributed review: Mechanisms for many eyes to see small changes rather than concentrated review by a few experts

Evolution of Human Roles

The prediction that "humans will become unnecessary" in the AI era is likely incorrect. Rather, human roles may concentrate on higher-level judgment and design.

Specific changes in human roles:

  1. Building design philosophy: Answering the fundamental question of "what should be created"
  2. Defining quality standards: The responsibility of establishing quality standards for AI to follow
  3. Curating experiences: The responsibility of designing not just functionality but the overall user experience
  4. Ethical judgment: The responsibility of evaluating the ethical impact of implementations proposed by AI

Further Evolution of the "Responsibility" Concept

The concept of "responsibility" itself may continue to evolve. It may develop from traditional "individual responsibility" to "system responsibility," and eventually to "distributed responsibility."

This isn't "abandoning responsibility" but rather a process of pursuing "more effective forms of responsibility." Instead of concentrating responsibility on some "reviewers," sharing it across the entire team or system may enable more robust quality assurance.

Creation of a New Development Culture

Ultimately, these changes may lead to the creation of a new development culture. There may be a shift in values from "code review culture" to "system design culture," from "individual skill" to "team mechanisms."

This cultural transformation won't happen overnight. However, as we practice collaborative development with AI, we seem to be gradually shaping this new culture.

Personal Outlook: As a Quality Assurer in the AI Collaboration Era

From my own month-long experience, I've glimpsed the skills and attitudes needed for quality assurance in the AI collaboration era:

  1. Systems thinking: A perspective that sees the entire system rather than individual lines of code
  2. Hypothesis verification cycle: Evolution through repeated small experiments and learnings
  3. Design orientation: A mindset that prioritizes design over implementation
  4. Adaptive sense of responsibility: An attitude that flexibly changes the nature of responsibility according to the situation

As AI-collaborative development continues to evolve, these skills and attitudes will become increasingly important. And the "concept of quality assurance" and "redefinition of responsibility" will continue as an endless dialogue.

Conclusion

I've shared initial insights from a limited one-month experiment on "AI-driven transformation of quality assurance." While still in the trial-and-error stage requiring further verification and improvement, I hope this article serves as a reference for developers facing similar challenges.

The responsibility for quality assurance isn't disappearing. Rather, it may transform into a deeper, broader, and more sustainable form. Adapting to signs of this change and exploring new forms of "responsibility" may be the attitude required of developers in the AI era.

We're still in a transition period, and I want to continue deepening my learning through practice. I'd love to hear your approaches to quality assurance in AI-collaborative development and your thoughts on "responsibility" in the comments.

Ryosuke Yoshizaki

CEO, Wadan Inc. / Founder of KIKAGAKU Inc.

I am working on structural transformation of organizational communication with the mission of 'fostering knowledge circulation and driving autonomous value creation.' By utilizing AI technology and social network analysis, I aim to create organizations where creative value is sustainably generated through liberating tacit knowledge and fostering deep dialogue.
