
Test Trophy Strategy Guide for AI-Driven Development: Evolution from the Test Pyramid

Optimal Testing Approach for Rapid Prototype Development with Generative AI

2025-03-30
40 min
Test Driven Development
AI Collaborative Development
Test Trophy
Jest
Playwright
Next.js
Vibe Coding
Ryosuke Yoshizaki

CEO, Wadan Inc. / Founder of KIKAGAKU Inc.

The Turning Point in Test Strategy for AI-Driven Development

Since I started programming, I've believed in and practiced the maxim that "good code requires good tests." Test-driven development (TDD) and its derivative methodologies have long been accepted as the gold standard for building high-quality software. And the test pyramid strategy (many unit tests, a moderate number of integration tests, and a few E2E tests) has been treated as an almost unquestionable principle.

However, in the era of AI collaborative development (Vibe Coding), especially in prototype development, this established principle is no longer applicable. About a month ago, I launched a new prototype project in collaboration with AI. Initially, I adopted the conventional test pyramid strategy and had the AI write a large number of unit tests aiming for 80% test coverage.


The problem I faced was unexpected. AI is surprisingly good at writing tests: it generates large quantities of tests that cover even more edge cases than human-designed ones, and they're high quality. However, at the prototype stage of a project, this "too good" test code became an impediment to development.

In this article, I'll explain the transition from the test pyramid to the test trophy strategy and its implementation approach in the context of AI collaborative prototype development. It's important to recognize that this strategy is specialized for prototypes and rapidly evolving projects, and the traditional test pyramid remains a valid option for mature products.

The Relationship Between Prototype Development and Test Strategy

First, it's important to understand the characteristics of quality assurance in prototype development. The purpose and priorities of test strategies differ significantly between prototypes and mature products.

Test Objectives in Prototype Development

The main purposes of testing in prototype development are:

  1. Core functionality verification: Confirming that ideas or key functions work as expected
  2. Rapid feedback cycle: Supporting quick verification and course correction for changes
  3. Minimal safety assurance: Providing minimum protection to prevent basic functional failures

In contrast, the test objectives for mature products are:

  1. Comprehensive quality assurance: Ensuring the reliability and stability of all features
  2. Regression prevention: Preventing impacts on existing functionality when adding features or making fixes
  3. Long-term maintainability: Balancing ease of code changes with quality maintenance

Understanding this difference is the first step in selecting an appropriate test strategy.

Evolution of Projects and Changes in Test Strategy

Projects typically evolve through several stages, and the appropriate test strategy changes with each:

  1. Prototype Stage: Test Trophy Strategy (centered on integration and component tests)
  2. MVP Stage: Extended Test Trophy (adding basic unit tests)
  3. Growth Stage: Hybrid Approach (gradual transition to pyramid)
  4. Mature Stage: Test Pyramid Strategy (comprehensive unit tests)

In prototype development, code changes occur rapidly and frequently, making overly detailed unit tests more of an obstacle. This is the main reason I adopted the test trophy strategy for AI collaborative prototype development.

Challenges of the Test Pyramid in AI Collaborative Prototype Development

The traditional test pyramid strategy, popularized by Mike Cohn and Martin Fowler, has been adopted by many organizations. The basic structure of this strategy is:

  1. Base (most numerous): Unit tests (testing individual functions or methods)
  2. Middle layer: Integration tests (testing cooperation between multiple components)
  3. Top (fewest): E2E tests (testing the behavior of the entire system)

This structure was designed to balance test execution speed and test coverage. Unit tests are fast and verify detailed functionality, while E2E tests take longer to execute but verify important user flows.

However, in AI collaborative prototype development, I faced the following problems:

1. AI's Test Code Bloat

AI faithfully follows given instructions. When setting a goal of 80% test coverage, AI generates a large volume of tests covering all conceivable edge cases and input patterns. In my experience, test code sometimes reached nearly 1,000 lines for implementation code of around 200 lines.

A particular problem was AI's tendency to spend more time on test code than implementation code. This sacrificed the speed of feature implementation, which should be prioritized in prototypes.

2. Context Length Waste

Especially when using AI tools like Roo Code or Cline, a large amount of test code consumes context length whenever files are loaded with read_file, which significantly degrades the AI's judgment and implementation capabilities.

For example, situations like this frequently occurred:

Interaction with AI (before exceeding the context limit)
-----------------------------------
Me: "Can you add this functionality to the API endpoint?"

AI: "Yes, I'll propose three efficient implementation methods considering specific use cases..." (detailed analysis and implementation proposals)

Interaction with AI (after exceeding the context limit)
-----------------------------------
Me: "Can you add this functionality to the API endpoint?"

AI: "Yes, I can add it. What kind of functionality would you like to add?" (just basic questions)

As a temporary measure, I split the tests midway, but this wasn't a fundamental solution.

3. Rapid Increase in Token Consumption and Cost

A lot of test code significantly increases token consumption in dialogues with AI. This pushes up the cost of using AI APIs, which is an inefficient investment for prototype development.

In my case, the cost per task increased about 2.5 times after test code bloat. Spending this much on tests during the prototype stage is not optimal resource allocation.

4. Test Code Review Burden

It's not realistic for humans to review the large amount of test code generated by AI. In particular, there was a time asymmetry where reviewing code generated in a 15-minute AI collaboration session took more than an hour.

A dialogue I had while reviewing test code for a feature symbolizes this problem:

Interaction with AI (about the prototype test strategy)
-----------------------------------
Me: "This test is quite comprehensive. But do we really need this much for this prototype?"

AI: "Based on the goal of 80% test coverage, I'm covering all possible cases. Testing completely, including error handling and edge cases, ensures quality."

Me: "That's true... but at this prototype stage, couldn't we focus a bit more on the important parts?"

AI: "Of course, I can adjust the test strategy to focus on critical functionality. What kind of approach would you prefer?"

This exchange was the catalyst for changing my thinking. It was a shift to recognizing that in prototype development, "appropriate tests" are more important than "perfect tests".

Test Trophy Strategy: An Approach Suited for Prototype Development

The test trophy strategy is an alternative approach to the test pyramid proposed by Kent C. Dodds. Named after the shape of a trophy, this strategy emphasizes integration tests and component tests.

Structure and Ratio of the Test Trophy Strategy

  1. E2E Tests (10-20%): Verify only the main functional flows (e.g., login → data entry → submission)
  2. Integration Tests (50-60%): Verify cooperation between multiple components, such as API integration and database operations
  3. Component Tests (30-40%): Verify the behavior of UI components or functional units
  4. Unit Tests (minimal): Target only particularly complex logic or important calculation processes

Here are the specific differences between the traditional test pyramid and the test trophy strategy:

Aspect            | Test Pyramid                          | Test Trophy
------------------|---------------------------------------|------------------------------------------
Focus             | Verification of individual functions  | Verification of behavior and integration
Unit Tests        | 60-70% of total                       | Only for particularly important parts
Integration Tests | 20-30% of total                       | 50-60% of total
E2E Tests         | 5-10% of total                        | 10-20% of total
Use of Mocks      | Extensive use                         | Minimal use
Maintenance Cost  | High (updating many tests)            | Moderate
Execution Speed   | Very fast                             | Relatively fast
Reliability       | Partial function guarantee            | Overall behavior guarantee
Prototype Fit     | Low (excessive testing)               | High (sufficient testing)
Change Cost       | High (many updates required)          | Moderate (focused updates)

Benefits of the Test Trophy in AI-Driven Prototype Development

The main reasons I switched to the test trophy strategy in AI-driven prototype development are:

  1. Aligns with AI Characteristics: AI can generate more appropriate tests when it understands actual usage scenarios and integration points rather than individual functions

  2. Improved Review Efficiency: Integration tests and component tests can cover the entire functionality with fewer tests, significantly improving review efficiency

  3. Token Consumption Optimization: Achieves effective coverage with less test code, reducing token consumption in dialogues with AI

  4. Suitability for Prototype Development: The test trophy strategy is more suitable for prototype development, which emphasizes rapid iteration and function verification

  5. Maintaining AI Judgment: Doesn't waste context length, allowing maximum utilization of AI performance

  6. Resilient to Changes: In the prototype stage, frequent specification changes occur, but the test trophy strategy involves fewer tests affected by changes

In my project, after transitioning to the test trophy strategy, the total number of test lines decreased by about 65%, significantly reducing review time. Also, token consumption per task decreased by about 40%, improving cost efficiency.

Practical Implementation of the Test Trophy in AI Collaborative Prototype Development

Here's a specific approach to implementing the test trophy strategy in AI collaborative prototype development.

1. How to Communicate the Test Plan

Clear communication is essential to help AI understand the test trophy strategy. Here's a prompt example I found effective:

Interaction with AI (about the test trophy strategy)
-----------------------------------
Me: "I'd like some advice on the test strategy"

AI: "I recommend adopting the test trophy strategy for this project. This means:
1. Emphasize integration tests and component tests (70-80% of the total)
2. Limit unit tests to complex logic only
3. E2E tests for only 1-2 normal patterns of main flows

Since this is a prototype stage, focus on tests that confirm important functions work correctly rather than comprehensive tests."

With such clear instructions, AI can understand and implement an appropriate test composition.
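
To avoid restating this policy in every session, it can also help to persist it in the project itself. Here's a sketch of a rules file of the kind tools like Cline and Roo Code read automatically (the .clinerules file name follows Cline's convention; the wording is my own):

# .clinerules (excerpt: testing policy)
This project is a prototype: follow the test trophy strategy.
- Prioritize integration and component tests (70-80% of all tests).
- Write unit tests only for complex logic or critical calculations.
- Cover only 1-2 happy-path E2E flows.
- Do not chase coverage targets; keep edge-case tests to the minimum necessary.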

2. Prioritizing Test Implementation

The priorities for implementing the test trophy strategy run from integration tests first (API and database boundaries), to component tests for functional units, to E2E tests for the main flows, and finally unit tests for critical logic only. In actual test planning, the distribution of tests is determined based on these priorities. For example, for backend testing centered on APIs, a distribution like this is appropriate:

// api/parameter.test.ts
describe("Parameter API Tests", () => {
  // Priority 1: API Integration Tests (about 60%)
  describe("API Integration", () => {
    test("Should retrieve correct parameters from DB", async () => {
      // ...
    });
 
    test("Should save evaluation results to DB", async () => {
      // ...
    });
 
    test("Should return an error for invalid IDs", async () => {
      // ...
    });
  });
 
  // Priority 2: Service Component Tests (about 30%)
  describe("Parameter Service Component", () => {
    test("Parameter normalization should be done correctly", async () => {
      // ...
    });
 
    test("Batch processing of multiple parameters should work", async () => {
      // ...
    });
  });
 
  // Priority 4: Unit Tests for Important Logic (about 10%)
  describe("Critical Utility Functions", () => {
    test("Value normalization should work correctly even with boundary values", () => {
      // ...
    });
  });
});
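
To make the skeleton concrete, here's a hedged sketch of how the first integration test might be filled in. The app export, the supertest usage, and the /api/parameters/:id endpoint are assumptions for illustration, not the project's actual code:

// Sketch: one way the first integration test might be filled in.
// Assumes an Express app exported from ../app, supertest installed,
// and a test database seeded with parameters for "user123".
import request from "supertest";
import { app } from "../app";

describe("API Integration", () => {
  test("Should retrieve correct parameters from DB", async () => {
    // Given: a user whose parameters exist in the seeded test DB
    const userId = "user123";

    // When: calling the parameter retrieval endpoint
    const response = await request(app).get(`/api/parameters/${userId}`);

    // Then: the seeded parameters are returned with a success status
    expect(response.status).toBe(200);
    expect(response.body.success).toBe(true);
    expect(response.body.parameters.length).toBeGreaterThan(0);
  });
});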

For the frontend, component tests using React Testing Library are central:

// components/ParameterDisplay.test.tsx
import { render, screen } from "@testing-library/react";
import ParameterDisplay from "./ParameterDisplay";
 
describe("ParameterDisplay Component", () => {
  // Component Tests (about 70%)
  test("Parameters should be displayed correctly", () => {
    const parameters = [
      { id: "param1", name: "Technical Skill", value: 4.2, confidence: 0.8 },
      { id: "param2", name: "Team Skill", value: 3.8, confidence: 0.7 },
    ];
 
    render(<ParameterDisplay parameters={parameters} />);
 
    expect(screen.getByText("Technical Skill")).toBeInTheDocument();
    expect(screen.getByText("4.2")).toBeInTheDocument();
    expect(screen.getByText("Team Skill")).toBeInTheDocument();
    expect(screen.getByText("3.8")).toBeInTheDocument();
  });
 
  test("Visual display should change according to confidence", () => {
    const parameters = [
      { id: "param1", name: "Technical Skill", value: 4.2, confidence: 0.9 },
      { id: "param2", name: "Team Skill", value: 3.8, confidence: 0.3 },
    ];
 
    render(<ParameterDisplay parameters={parameters} />);
 
    // High confidence parameters are emphasized
    const highConfidenceItem = screen
      .getByText("Technical Skill")
      .closest("div");
    expect(highConfidenceItem).toHaveClass("high-confidence");
 
    // Low confidence parameters are displayed modestly
    const lowConfidenceItem = screen.getByText("Team Skill").closest("div");
    expect(lowConfidenceItem).toHaveClass("low-confidence");
  });
});

E2E tests use Playwright and cover only truly important flows:

// e2e/parameterFlow.spec.ts
import { test, expect } from "@playwright/test";
 
test("Basic flow of parameter evaluation", async ({ page }) => {
  // Login
  await page.goto("/login");
  await page.fill('input[name="email"]', "test@example.com");
  await page.fill('input[name="password"]', "password123");
  await page.click('button[type="submit"]');
 
  // Navigate to parameter evaluation page
  await page.click('a[href="/evaluate"]');
 
  // Text input and evaluation execution
  await page.fill('textarea[name="evaluationText"]', "Sample text...");
  await page.click('button:has-text("Run Evaluation")');
 
  // Confirm results display
  await page.waitForSelector(".evaluation-results");
  expect(await page.isVisible(".parameter-item")).toBeTruthy();
  expect(await page.textContent(".results-summary")).toContain(
    "Evaluation Complete"
  );
});

3. Modularization and Separation of Tests

I adopted the following file structure to make test code manageable:

__tests__/
├── unit/                   # Unit Tests (minimal)
│   ├── utils/              # Important utility functions
│   └── algorithms/         # Calculation logic and algorithms
├── components/             # Component Tests (30-40%)
│   ├── ui/                 # UI component tests
│   └── behaviors/          # Functional unit behavior tests
├── integration/            # Integration Tests (50-60%)
│   ├── api/                # API integration tests
│   ├── db/                 # Database integration tests
│   └── services/           # Service-to-service integration tests
├── e2e/                    # E2E Tests (10-20%)
│   └── flows/              # Main user flows
└── fixtures/               # Common test data
   ├── testUsers.ts        # Test user data
   └── testParameters.ts   # Test parameter data

This structure allows AI-generated tests to be appropriately classified and managed. Also, separating test data into common fixtures reduced redundancy in test code.

Here's an example of fixtures:

// __tests__/fixtures/testParameters.ts
export const sampleParameters = [
  { id: "param1", name: "Technical Skill", value: 4.2, confidence: 0.8 },
  { id: "param2", name: "Team Skill", value: 3.8, confidence: 0.7 },
  { id: "param3", name: "Market Size", value: 5.0, confidence: 0.9 },
  { id: "param4", name: "Profitability", value: 3.5, confidence: 0.6 },
  { id: "param5", name: "Growth Potential", value: 4.7, confidence: 0.8 },
];
 
export const lowConfidenceParameters = [
  { id: "param6", name: "Uniqueness", value: 2.8, confidence: 0.3 },
  { id: "param7", name: "Market Entry Barriers", value: 3.2, confidence: 0.4 },
];
 
export const mockParameterResponse = {
  success: true,
  parameters: sampleParameters,
  evaluationId: "eval-123456",
};

4. Introduction of a Gradual Verification Process

In prototype development, the efficiency of test execution is also an important consideration. If tests take too long to run, it disrupts the development rhythm. So I introduced a gradual verification process: lightweight tests on feature branches, component tests added on the develop branch, and the full suite including E2E tests only on main. In the CI/CD pipeline, I implemented this using GitHub Actions:

# .github/workflows/test.yml
name: Test
 
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]
 
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: "18"
      - name: Install dependencies
        run: npm ci
 
      # Feature branches (PRs) only run lightweight tests
      - name: Run unit and integration tests
        run: npm run test:quick
 
      # Develop branch includes component tests
      - name: Run component tests
        if: github.ref == 'refs/heads/develop' || github.base_ref == 'develop'
        run: npm run test:components
 
      # Only main branch runs full verification including E2E tests
      - name: Run E2E tests
        if: github.ref == 'refs/heads/main' || github.base_ref == 'main'
        run: npm run test:e2e

The corresponding package.json configuration looks like this:

{
  "scripts": {
    "test:quick": "jest --bail --testPathIgnorePatterns=e2e,components",
    "test:components": "jest --bail components",
    "test:e2e": "jest --bail e2e",
    "test": "jest"
  }
}

This gradual approach enables reliable verification while maintaining development speed. In Vibe Coding, where task granularity is small and work cycles are short at around 15 minutes, optimizing test execution time is particularly important.
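
For the local loop between pushes, a watch script that skips the slower suites keeps feedback nearly instant. A minimal sketch (the test:watch script name is my own addition, not from the project):

{
  "scripts": {
    "test:watch": "jest --watch --testPathIgnorePatterns='(e2e|components)'"
  }
}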

Fusion of BDD Approach and Test Trophy Strategy

In the test trophy strategy, the focus is more on "how" the system should behave rather than "what" to test. In this respect, it works very well with Behavior-Driven Development (BDD).

Basic Concepts and Benefits of BDD

BDD is an evolution of test-driven development that emphasizes clearer description of system behavior. The "Given-When-Then" structure is particularly characteristic:

Given (preconditions) → When (the action under test) → Then (the expected outcome)

The benefits of BDD include:

  1. Improved test readability: The purpose and preconditions of the test become clear
  2. Enhanced AI understanding: AI can better understand the intent of the test
  3. Functions as specification: Tests also serve as specification documentation
  4. Promotes communication: Expression that's easier to understand even for non-technical people

BDD Implementation Examples in the Test Trophy Strategy

In my project, I implemented BDD-style tests like this:

// Conventional test
test("Can send requests to API correctly", async () => {
  const client = new ApiClient();
  const result = await client.getParameters("user123");
  expect(result.success).toBe(true);
});
 
// BDD-style test
describe("Parameter Retrieval API", () => {
  test("Should retrieve parameters when requested with a valid user ID", async () => {
    // Given: A valid user ID and initialized API client
    const validUserId = "user123";
    const client = new ApiClient();
 
    // When: Sending a request to the parameter retrieval API
    const result = await client.getParameters(validUserId);
 
    // Then: A successful response and parameter data should be returned
    expect(result.success).toBe(true);
    expect(result.parameters).toBeDefined();
    expect(result.parameters.length).toBeGreaterThan(0);
  });
 
  test("Should return an error response when requested with a non-existent user ID", async () => {
    // Given: A non-existent user ID and initialized API client
    const invalidUserId = "nonexistent";
    const client = new ApiClient();
 
    // When: Sending a request to the parameter retrieval API
    const result = await client.getParameters(invalidUserId);
 
    // Then: An error response should be returned
    expect(result.success).toBe(false);
    expect(result.error).toBeDefined();
    expect(result.error.code).toBe("USER_NOT_FOUND");
  });
});

As you can see, BDD-style tests clearly state preconditions and describe in detail what should happen. This allows AI to understand the intent of the test more accurately and generate appropriate tests.

BDD Test Creation Workflow with AI

In AI collaborative development, the most effective workflow I found was to create a BDD-style test plan first, review it, and only then have the AI generate the test code. Here's a specific prompt example:

Please create a BDD-style test plan for the following functionality

Functionality: Parameter Evaluation System
- Users can input text and execute evaluation
- The system analyzes the text and calculates parameter values
- Parameter values are stored in the database
- Evaluation results are displayed in the UI

Based on the test trophy strategy, plan tests with the following ratio:
- Integration tests: 60%
- Component tests: 30%
- E2E tests: 10%

Please describe each test in Given-When-Then format.
Include not only the test names but also a summary of each Given-When-Then section.

Example AI response:


Test Plan: Parameter Evaluation System

1. Integration Tests (60%)

API and Service Integration Tests

Test 1: The text evaluation API should correctly calculate parameter values

  • Given: Evaluation target text is prepared
  • When: A request is sent to the text evaluation API
  • Then: A correct set of parameter values should be returned

Test 2: Calculated parameter values should be correctly saved to the database

  • Given: There is a set of evaluated parameter values
  • When: The parameter saving service is called
  • Then: Values should be saved to the database, and a save confirmation response should be returned

...(other integration tests)

2. Component Tests (30%)

Test 1: The parameter display component should correctly display values

  • Given: A set of parameter values is provided
  • When: The parameter display component is rendered
  • Then: Each parameter and its value should be displayed on the screen

...(other component tests)

3. E2E Tests (10%)

Test 1: Users should be able to complete the entire flow from text input to result display

  • Given: The user is logged in and accessing the evaluation page
  • When: The user enters text in the text input field and clicks the evaluation button
  • Then: After processing is complete, the calculated parameter values should be displayed on the screen

...(other E2E tests)

Creating such a test plan in advance allows you to confirm the outline of tests before implementation and enables efficient review. Also, AI can generate specific test code based on this plan.
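
As an illustration, the first integration test in that plan might come back from the AI as something like this (the endpoint path, the app export, and the response shape are assumptions for the sketch):

// Sketch: integration test generated from "Test 1" of the plan
import request from "supertest";
import { app } from "../../src/app"; // assumed app export

describe("Text Evaluation API", () => {
  test("Should correctly calculate parameter values for input text", async () => {
    // Given: evaluation target text is prepared
    const evaluationText = "Sample text describing a project...";

    // When: a request is sent to the text evaluation API
    const response = await request(app)
      .post("/api/evaluate") // assumed endpoint
      .send({ text: evaluationText });

    // Then: a correct set of parameter values should be returned
    expect(response.status).toBe(200);
    expect(response.body.parameters).toBeDefined();
    expect(response.body.parameters.length).toBeGreaterThan(0);
  });
});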

From Prototype to Product: Evolution of Test Strategy

As a project evolves from prototype to more mature phases, the test strategy needs to evolve too. Here, I'll explain how to adapt the test strategy to the growth of the project.

Changes in Test Strategy According to Project Stage

The appropriate test strategy changes with the maturity of the project:

  1. Prototype Stage

    • Test purpose: Verification of basic functionality, proof of concept
    • Optimal strategy: Test Trophy Strategy (centered on integration and component tests)
    • Test ratio: Integration tests 60%, component tests 30%, E2E tests 10%, minimal unit tests
  2. MVP Stage

    • Test purpose: Ensuring reliability of minimal product functionality
    • Optimal strategy: Extended Test Trophy Strategy (strengthening unit tests)
    • Test ratio: Integration tests 50%, component tests 25%, unit tests 15%, E2E tests 10%
  3. Growth Stage

    • Test purpose: Maintaining quality in parallel with feature expansion
    • Optimal strategy: Hybrid Approach (starting transition to pyramid)
    • Test ratio: Integration tests 40%, unit tests 35%, component tests 15%, E2E tests 10%
  4. Mature Stage

    • Test purpose: Ensuring long-term quality and stability
    • Optimal strategy: Test Pyramid Strategy (centered on unit tests)
    • Test ratio: Unit tests 60%, integration tests 25%, E2E tests 15%

Practical Approach to Test Migration

Here's a practical approach to gradually transitioning test strategy as the project grows.

1. Migration Trigger Points

Indicators to determine when to change test strategy:

  • Feature stability: Main feature specifications stabilize, and large changes become less frequent
  • User feedback: Feedback from actual users increases
  • Team size: The development team expands, and multiple people work on the same codebase
  • Release cycle: A regular release process is established
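
One concrete lever once these trigger points are hit is to ratchet coverage expectations up gradually rather than all at once. A sketch using Jest's coverageThreshold option (the numbers are illustrative, not recommendations):

// jest.config.js (excerpt)
module.exports = {
  collectCoverage: true,
  coverageThreshold: {
    // Start low at the MVP stage and raise these with each release
    global: {
      statements: 40,
      branches: 30,
      functions: 40,
      lines: 40,
    },
  },
};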

2. Gradual Migration Strategy

Rather than doing the migration all at once, proceed in stages:

  1. Identification of Important Components

    • Identify particularly important features or core logic
    • Strengthen unit tests for these components first
  2. Application of Test Pyramid to New Features

    • Apply the test pyramid strategy to newly added features
    • Maintain the test trophy strategy for existing features for the time being
  3. Gradual Refactoring of Legacy Tests

    • Gradually refactor existing tests over time
    • Determine refactoring priority based on the importance of the functionality

3. Specific Migration Example

For example, when migrating tests for an API endpoint:

// Prototype stage test (Test Trophy)
import request from "supertest"; // assumed: supertest for HTTP-level requests
import { app } from "../src/app"; // assumed app export

describe("User API", () => {
  test("Users should be retrieved correctly", async () => {
    // Integration test (testing the entire API request)
    const response = await request(app).get("/api/users/123");
    expect(response.status).toBe(200);
    expect(response.body.name).toBe("Test User");
  });
});
 
// Mature stage test (Test Pyramid)
// (userRepositoryMock and userServiceMock are assumed jest.mock() test doubles)
describe("User API", () => {
  // Unit test (mocking the user service)
  test("getUserById function should return the correct user", async () => {
    const mockUser = { id: "123", name: "Test User" };
    userRepositoryMock.findById.mockResolvedValue(mockUser);
 
    const result = await userService.getUserById("123");
    expect(result).toEqual(mockUser);
    expect(userRepositoryMock.findById).toHaveBeenCalledWith("123");
  });
 
  // Controller unit test
  test("getUserController should return the correct response", async () => {
    const mockUser = { id: "123", name: "Test User" };
    userServiceMock.getUserById.mockResolvedValue(mockUser);
 
    const req = { params: { id: "123" } };
    const res = { status: jest.fn().mockReturnThis(), json: jest.fn() };
 
    await getUserController(req, res);
 
    expect(res.status).toHaveBeenCalledWith(200);
    expect(res.json).toHaveBeenCalledWith(mockUser);
    expect(userServiceMock.getUserById).toHaveBeenCalledWith("123");
  });
 
  // Integration test (more limited scope)
  test("User API endpoint should function correctly", async () => {
    // Testing the actual API request
    const response = await request(app).get("/api/users/123");
    expect(response.status).toBe(200);
    expect(response.body.name).toBe("Test User");
  });
});

In this way, you gradually transition to the test pyramid strategy by decomposing and detailing the tests.

Lessons and Cautions from Practice

I'd like to share lessons and cautions about the test trophy strategy from my one-month Vibe Coding prototype development practice.

1. Selecting Test Strategy According to Project Maturity

One of the most important lessons is that test strategy should be selected according to the maturity and purpose of the project.

  • Prototype stage: The test trophy strategy is suitable
  • Proof-of-concept stage: A combination of the test trophy and limited pyramid strategy
  • Production preparation stage: Gradual transition to the test pyramid strategy as appropriate

Writing excessive tests at the prototype stage requires modifying a large number of tests with each specification change, significantly reducing development speed.

2. Understanding AI's Strengths and Weaknesses

AI is good at writing tests, but it also tends to write "excessively perfect tests." It's important to understand this characteristic and adjust it with clear instructions.

For example, instructions like these were effective:

Focus on covering the most important functions and integration points rather than aiming for perfect tests.
Keep edge cases to the minimum necessary.

Also, AI often forgets to write the code that sets up test preconditions, such as initializing the test database environment. To prevent this, it's good to explicitly instruct it to describe the preconditions for each test, as sketched below.
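
A minimal sketch of what such explicit preconditions look like for DB-backed integration tests (seedTestData and resetDatabase are hypothetical helpers standing in for whatever your project actually uses):

// Sketch: explicit setup/teardown the AI tends to omit
import { resetDatabase, seedTestData } from "./helpers/testDb"; // hypothetical helpers

beforeAll(async () => {
  // Given-setup: start every suite from a known, seeded database state
  await resetDatabase();
  await seedTestData();
});

afterAll(async () => {
  // Clean up so later suites also start from a known state
  await resetDatabase();
});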

3. Managing Test Execution Time

Test execution time has a significant impact on the developer experience. Especially in Vibe Coding prototype development, the balance between AI task completion timing and test execution time is important.

In my case, having E2E tests take 3-4 minutes when an AI task completes in about 15 minutes was a significant stress. Therefore:

  • Execute only lightweight tests on feature branches: Quick verification with only unit and integration tests
  • Execute all tests when merging to the develop branch: Complete verification including E2E tests
  • Optimize parallel test execution: Run only DB-related tests serially (via Jest's --runInBand or a serial runner) while the rest run in parallel

The following Jest configuration was particularly effective:

// jest.config.js
module.exports = {
  // Project settings
  projects: [
    // DB connection tests run in series
    {
      displayName: "db-tests",
      testMatch: ["<rootDir>/**/*.db.test.{ts,tsx,js,jsx}"],
      runner: "jest-serial-runner", // Serial execution
    },
    // Other tests run in parallel
    {
      displayName: "unit-tests",
      testMatch: ["<rootDir>/**/*.test.{ts,tsx,js,jsx}"],
      testPathIgnorePatterns: [
        "\\.db\\.test\\.",  // note: regexes, not globs; excludes DB tests
        "\\.e2e\\.test\\.", // excludes E2E tests
      ],
    },
    // E2E tests are a separate group
    {
      displayName: "e2e-tests",
      testMatch: ["<rootDir>/**/*.e2e.test.{ts,tsx,js,jsx}"],
      runner: "jest-serial-runner", // Serial execution
    },
  ],
};

4. Managing Common Test Data

Managing test data was also a major challenge. AI tends to generate new test data for each test, which leads to code duplication and reduced maintainability.

As a solution, I centralized common test data in a fixtures directory and made it reusable in all tests. This:

  1. Ensures test data consistency: Using the same data in all tests ensures consistency
  2. Improves maintainability: When data structure changes, modifications are concentrated in one place
  3. Simplifies test code: No need to define data in the test code, improving readability

For example:
// __tests__/fixtures/parameters.ts
export const sampleParameters = [
  { id: "param1", name: "Technical Skill", value: 3.5, confidence: 0.8 },
  { id: "param2", name: "Team Skill", value: 4.2, confidence: 0.7 },
  // ...
];
 
// Example use in integration test
import { sampleParameters } from "../fixtures/parameters";
 
test("Should process parameter sets correctly", async () => {
  // Given
  const parameters = sampleParameters;
 
  // When & Then
  // ...
});

5. Relationship Between Refactoring and Testing

Having few tests reduces the safety of refactoring. Interestingly, however, in AI collaborative prototyping, an approach of "first get the implementation through, then refactor later" was effective.

AI works more efficiently by first implementing functionality and then refactoring based on tests, rather than aiming for perfection in the initial code generation. This approach is similar to the "Red, Green, Refactor" principle, but is particularly effective in the context of AI.

However, this approach doesn't work well without tests. Minimum tests based on the test trophy strategy are necessary since safety cannot be ensured during refactoring without tests.

Conclusion: Optimal Test Strategy in AI-Driven Development

The choice of test strategy in AI-driven development requires different approaches depending on the project phase. The test trophy strategy introduced in this article is particularly suitable for Vibe Coding in the prototype stage.

Main Benefits of the Test Trophy Strategy

  1. Efficient review: Covers important functionality with fewer tests
  2. Compatibility with AI: Enables test generation that leverages AI characteristics
  3. Token consumption optimization: Effective testing with less code
  4. Maintaining development speed: Prevents impediments from excessive testing
  5. Reduced change costs: Minimizes the scope of modifications during specification changes

Guidelines for Final Test Ratio

Based on my one-month practice, the optimal test ratio for prototype development is:

  • E2E Tests: 10% (only main user flows)
  • Integration Tests: 50% (such as API and DB integration)
  • Component Tests: 30% (such as UI components)
  • Unit Tests: 10% (complex logic only)

This ratio needs to be adjusted according to the nature and purpose of the project.

Future Prospects: Evolution of Test Strategy

With the evolution of AI technology, test strategies will continue to change. Expected developments include:

  1. Improvement of AI self-verification capabilities: AI detecting and fixing issues in its generated code
  2. Specialization of test generation AI: Emergence of AI tools specialized for testing
  3. Deepening context understanding: Optimal test generation with understanding of the entire project

The test trophy strategy is a new approach compared to the traditional test pyramid, but it's very effective in the context of AI-driven prototype development. It has particular value as a flexible and efficient test strategy for prototype development and MVP (Minimum Viable Product) creation.

Finally, regardless of which test strategy you choose, judgment according to the maturity and purpose of the project is most important. For mature products, a more comprehensive test strategy may be more appropriate. The key to judgment is finding the optimal balance between development speed, quality assurance, and resource efficiency.

Ryosuke Yoshizaki

CEO, Wadan Inc. / Founder of KIKAGAKU Inc.

I am working on structural transformation of organizational communication with the mission of 'fostering knowledge circulation and driving autonomous value creation.' By utilizing AI technology and social network analysis, I aim to create organizations where creative value is sustainably generated through liberating tacit knowledge and fostering deep dialogue.