The "15 Minutes Per Task" Rule Discovered in AI Collaboration and Its Impact
Over my month of working with AI-driven development (also known as Vibe Coding), I came across one of my most surprising findings: when collaborating with AI, a single task can be completed in about 15 minutes.
This might sound far-fetched, but it's a consistent pattern I've observed over the past month. In traditional development, even a simple feature like "adding a button to a screen that displays a confirmation dialog when clicked" would typically take 1-2 hours to implement. Give the same task to the AI, however, and the code is, remarkably, complete in about 15 minutes.
The traditional development cycle for a task like this ran two to three hours from start to finish. In the AI collaborative development cycle, the same loop has been shortened to about 15 minutes.
I first observed this phenomenon while developing a prototype for an investment decision support system. When trying to implement a UI screen to display evaluation information for investment candidates, I asked the AI to "create a component to display evaluation parameters." Within about 8 minutes, the basic component was complete, and in another 4 minutes, styling and conditional displays were adjusted. Had I created a similar component myself, it would have taken at least an hour.
This dramatically shortened development cycle also has significant implications for testing strategy. While traditional development emphasized "test coverage" and "maintainability," in the 15-minute cycle world, test execution time and review efficiency become vastly more important factors.
In this article, I'd like to share insights from my experience on how to divide the work among Jest, React Testing Library, and Playwright in this "15 minutes per task" ultra-short-cycle development.
Collaboration Experiences with Various Testing Tools: Unexpected Issues and Solutions
Through three prototype development projects, I discovered how each testing tool functions in AI collaborative development and what unexpected pitfalls exist.
Experience with Jest: The Problem of Mock Overgeneration
Jest is a standard tool for unit testing, but the first issue I faced when collaborating with AI was "mock overgeneration." Let me provide a specific example.
In one project, I asked the AI to test a feature that calculates evaluation parameters from text input. I expected "simple code that tests parameter calculation with several input examples." However, what the AI generated was code like this:
// Initial test code generated by AI
describe("calculateEvaluationParameters", () => {
  // Mocking all dependency modules
  jest.mock("../services/textAnalysis");
  jest.mock("../api/evaluationApi");
  jest.mock("../utils/parameterNormalizer");

  // Setting detailed return values for each mock (lengthy)
  beforeEach(() => {
    textAnalysisService.analyzeText.mockReturnValue({
      sentiment: 0.8,
      keywords: ["technology", "innovation", "market"],
      // Many more properties...
    });
    // Similarly detailed settings for other mocks...
  });

  test("correctly calculates parameters from text analysis results", () => {
    // Test code spanning over 30 lines
  });

  // More than 10 additional test cases
});
The problems with this test code were obvious:
- All dependencies are mocked, making it impossible to verify if the actual processing works correctly
- The mock setup is too complex, making the test code longer than the implementation
- There are too many test cases, taking too much time to review
This defeats the simple purpose of prototype development: "confirming that basic functionality works." The AI was being too faithful to the principles of "good testing," generating excessive tests inappropriate for the prototype stage.
As a solution to this problem, I developed the following prompt pattern:
Please create Jest tests for the following function
calculateEvaluationParameters(text)
Test conditions:
1. Keep mocks to the absolute minimum (only external APIs)
2. Test only 3 cases (normal input, empty input, long text input)
3. Keep each test within 5 lines
4. Write as integration tests (use actual processing as much as possible)
By setting these specific constraints, the AI began generating simpler and more practical tests like this:
// Improved test code
describe("calculateEvaluationParameters", () => {
  // Only mock external APIs
  jest.mock("../api/externalApi");

  test("calculates appropriate parameters from normal input text", async () => {
    const result = await calculateEvaluationParameters(
      "Technology innovation affecting the market"
    );
    expect(result).toHaveProperty("technicalStrength");
    expect(result.technicalStrength).toBeGreaterThan(0);
  });

  test("returns default values for empty input", async () => {
    const result = await calculateEvaluationParameters("");
    expect(result).toEqual(DEFAULT_PARAMETERS);
  });

  test("properly calculates parameters even with long text input", async () => {
    const longText = "Long text...".repeat(100);
    const result = await calculateEvaluationParameters(longText);
    expect(result).toHaveProperty("marketPotential");
  });
});
This test focuses on the essential purpose of prototype development: "confirming basic functionality." It executes quickly and is easy to review.
The lesson from Jest: give the AI specific constraints, and it will generate practical test code efficiently.
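One way to make the speed expectation concrete in configuration rather than only in prompts is a low global test timeout. The following is a minimal sketch of my own; the 2-second limit is an assumed value, not a number taken from these projects:
// jest.config.js -- minimal sketch; the 2-second limit is an assumption
module.exports = {
  testTimeout: 2000, // tests that do not finish within 2 seconds fail with a timeout error
};
A slow test then fails loudly instead of quietly eroding the 15-minute rhythm.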
Discovery with React Testing Library: The Importance of Selectors
When testing components with React Testing Library (RTL), the biggest challenge was how the AI selects UI elements.
For example, when I asked the AI to test a component that displays evaluation results, here's what it initially generated:
// AI's initial code (problematic)
test("evaluation parameters are displayed correctly", () => {
  render(<EvaluationResults parameters={testParameters} />);

  // Selector that depends on test IDs rather than on what the user sees
  const items = screen.getAllByTestId(/parameter-item-\d+/);
  expect(items).toHaveLength(5);

  // Dependent on DOM tree structure and CSS classes
  expect(items[0].querySelector(".parameter-name").textContent).toBe(
    "Technical Strength"
  );
  expect(items[0].querySelector(".parameter-value").textContent).toBe("4.2");
});
This test code had serious issues:
- It depends on implementation details (DOM tree structure and CSS classes)
- Low maintainability (tests fail if the UI changes slightly)
- Contradicts RTL's philosophy of "testing from the user's perspective"
This is a common trap for developers unfamiliar with RTL, and the AI fell into the same error. To resolve this, I developed a prompt that explicitly explains RTL's philosophy to the AI:
Please write tests for the EvaluationResults component using React Testing Library.
Follow these principles:
1. Query by content that users actually see (text, labels, etc.)
2. Prioritize user-centric queries like getByText and getByRole
3. Use test IDs (data-testid) only as a last resort
4. Never use DOM manipulation like querySelector
5. Test the displayed results, not the internal implementation of the component
With this guidance, the AI began generating high-quality test code like this:
// Improved test code
test("evaluation parameters are displayed correctly", () => {
  render(<EvaluationResults parameters={testParameters} />);

  // Testing with text that users see
  expect(screen.getByText("Technical Strength")).toBeInTheDocument();
  expect(screen.getByText("4.2")).toBeInTheDocument();
  expect(screen.getByText("Market Potential")).toBeInTheDocument();
  expect(screen.getByText("3.8")).toBeInTheDocument();
});

test("warning icons are displayed for low confidence parameters", () => {
  render(<EvaluationResults parameters={testParameters} />);

  // Testing icons from an accessibility perspective
  const warningIcon = screen.getByRole("img", { name: /warning/i });
  expect(warningIcon).toBeInTheDocument();

  // Confirming the icon is in the right place
  const lowConfidenceParam = screen.getByText("Market Size");
  expect(lowConfidenceParam.parentNode).toContainElement(warningIcon);
});
Tests aligned with RTL's philosophy don't depend on UI implementation details and verify that components function correctly from the user's perspective. Surprisingly, once RTL's philosophy is spelled out explicitly, the AI can generate test code on par with that of experienced human developers.
Like unit tests, RTL tests run quickly and fit well into 15-minute cycle development. Testing from the user's perspective also aligns with the priority in prototype development: verifying basic functionality.
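The same user-perspective approach extends to interactions. Here is a minimal sketch of an interaction test with @testing-library/user-event; the "Sort by value" button and the list markup are my own assumptions for illustration, not features of the actual component:
// Hypothetical interaction test (the sort button and list markup are assumed)
import { render, screen } from "@testing-library/react";
import userEvent from "@testing-library/user-event";
import EvaluationResults from "./EvaluationResults";

const testParameters = [
  { name: "Technical Strength", value: 4.2, confidence: 0.8 },
  { name: "Team Capability", value: 3.8, confidence: 0.7 },
  { name: "Market Potential", value: 4.5, confidence: 0.9 },
];

test("clicking the sort button reorders parameters by value", async () => {
  const user = userEvent.setup();
  render(<EvaluationResults parameters={testParameters} />);

  // Find the control by its visible label and interact the way a user would
  await user.click(screen.getByRole("button", { name: /sort by value/i }));

  // Assert on what the user sees: the highest-valued parameter comes first
  const items = screen.getAllByRole("listitem");
  expect(items[0]).toHaveTextContent("Market Potential");
});
In a real project I would import the test data from the shared fixtures described later in this article rather than defining it inline.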
Struggles with Playwright: The Barrier of Relative Execution Time
Next, I'll share my experience with E2E testing using Playwright. The most significant discovery here was the "problem of relative execution time."
To be clear, Playwright is an excellent testing tool. Compared to traditional manual testing, it's incredibly efficient, and E2E tests in small projects can complete in just a few minutes. In my previous development work, I never considered Playwright's test execution time to be an issue.
However, I realized that in the context of 15-minute cycle AI collaborative development, this "few minutes" execution time becomes a significant burden. Here's a specific example:
To test the basic flow from login to result display in the evaluation system, I asked the AI to create a Playwright test. The AI generated the following code:
// Playwright test generated by AI
test("user can log in to the evaluation system and view results", async ({
  page,
}) => {
  // Access the login page
  await page.goto("http://localhost:3000/login");

  // Enter login information
  await page.fill('input[name="email"]', "test@example.com");
  await page.fill('input[name="password"]', "password123");
  await page.click('button[type="submit"]');

  // Confirm navigation to dashboard
  await expect(page.locator('h1:has-text("Dashboard")')).toBeVisible();

  // Navigate to evaluation page
  await page.click("text=Evaluation List");

  // Wait for evaluation items to display
  await page.waitForSelector(".evaluation-item");

  // Tests for various error cases (input validation, permission errors, etc.)
  // ...10+ cases continue

  // Responsive design tests (screen size changes, etc.)
  // ...several more cases continue
});
The problems with this test were:
- Too much pursuit of coverage (numerous error cases and responsive tests)
- Too many scenarios crammed into a single test file
- Resulting in long execution time (about 3-4 minutes)
In traditional development cycles (several hours), 3-4 minutes of test execution time was just a short wait, like going to get coffee. However, in 15-minute cycle AI collaborative development, these 3-4 minutes represent 20-25% of the total work time—a significant delay.
To address this issue, I developed the following prompt pattern:
Please create a Playwright test with the following conditions:
1. Test only the basic flow "Login → Display evaluation list → Check details"
2. Include no error cases (test only the happy path)
3. Describe only one test scenario per test file
4. Minimize waiting time such as waitForSelector
5. Aim for test execution time under 30 seconds
By setting clear constraints, the AI generated simpler tests with shorter execution times:
// Improved Playwright test
test("basic evaluation viewing flow", async ({ page }) => {
  // Access the login page
  await page.goto("http://localhost:3000/login");

  // Enter and submit login information
  await page.fill('input[name="email"]', "test@example.com");
  await page.fill('input[name="password"]', "password123");
  await page.click('button[type="submit"]');

  // Navigate to evaluation list page
  await page.click("text=Evaluation List");

  // Click on the first evaluation item
  await page.click(".evaluation-item:first-child");

  // Confirm that evaluation parameters are displayed on the details page
  await expect(page.locator("text=Technical Strength")).toBeVisible();
  await expect(page.locator("text=Market Potential")).toBeVisible();
});
This test focuses only on necessary function verification and reduced execution time to about 30 seconds.
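Prompt constraints help, but it also pays to bake the speed goal into the Playwright configuration itself. The following is a minimal sketch; the single-browser choice and the specific values are my own assumptions rather than settings taken from the project:
// playwright.config.js -- minimal sketch for keeping E2E runs short
const { defineConfig, devices } = require("@playwright/test");

module.exports = defineConfig({
  timeout: 30 * 1000, // each test must finish within 30 seconds
  fullyParallel: true, // run independent test files in parallel
  retries: 0, // no retries while prototyping
  use: {
    baseURL: "http://localhost:3000",
    trace: "off", // skip trace collection to save time
  },
  projects: [
    // one desktop browser is enough at the prototype stage
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
  ],
});
With the timeout enforced in configuration, a test that drifts past the 30-second target fails immediately instead of silently stretching the cycle.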
I also refined the timing of test execution. Specifically:
- Run only Jest and RTL unit/component tests during development
- Run Playwright E2E tests when a series of functionality implementations is complete
- Automatically run E2E tests in the CI/CD pipeline
This strategy allowed me to maintain the 15-minute development rhythm while still running important E2E tests periodically.
The lesson from this experience is that a testing tool's value is not absolute; it is determined by its relationship to the development cycle. Test execution time that was never an issue in traditional development can become a significant challenge in the ultra-short cycles of AI collaborative development.
Test Execution Strategy to Maintain Development Rhythm
Let's put numbers on the impact of test execution time in 15-minute cycle development. In a traditional cycle of a few hours, 3 minutes of test execution is only about 1.6% of the total; in a 15-minute AI collaborative cycle, the same 3 minutes accounts for 20% of the total. This is a proportion that cannot be ignored.
To address this challenge, I adopted the following strategy:
1. Test Hierarchization and Optimizing Execution Frequency
I organized tests into tiers based on execution time and adjusted how often each tier runs:
1. Ultra-fast tests (<100ms): Unit tests, simple component tests
→ Run on every change
2. Fast tests (<1s): Complex component tests, simple integration tests
→ Run when feature implementation is complete
3. Medium-speed tests (<3s): Integration tests with DB connections
→ Run when creating a Pull Request
4. Slow tests (>3s): E2E tests, performance tests
→ Run only in CI pipeline
This tiering allowed me to maintain the development rhythm by running only the ultra-fast tests while actively coding.
2. Optimizing Test Execution Scripts
I also optimized npm scripts as follows:
{
  "scripts": {
    "test:ultra-fast": "jest --findRelatedTests",
    "test:fast": "jest --testPathIgnorePatterns=integration,e2e",
    "test:integration": "jest integration",
    "test:e2e": "playwright test",
    "test:ci": "npm run test:fast && npm run test:integration && npm run test:e2e"
  }
}
The --findRelatedTests flag was particularly effective: given a list of source files (here, the files I was currently changing), it runs only the tests related to them.
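If husky and lint-staged are part of the project (an addition of my own; my projects did not necessarily use them), the same flag can run the related tests automatically against the staged files on every commit. A minimal package.json excerpt:
{
  "lint-staged": {
    "*.{js,jsx}": "jest --bail --findRelatedTests"
  }
}
lint-staged appends the staged file paths to the command, so only the affected tests run, and --bail stops the run at the first failing test suite.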
3. Optimizing Parallel Test Execution
I also adjusted Jest's parallel execution settings:
// jest.config.js
module.exports = {
  projects: [
    {
      displayName: "unit",
      testMatch: ["**/__tests__/unit/**/*.test.js"],
      maxWorkers: "50%", // Use half of CPU cores
    },
    {
      displayName: "components",
      testMatch: ["**/__tests__/components/**/*.test.js"],
      maxWorkers: "50%",
    },
    {
      displayName: "integration",
      testMatch: ["**/__tests__/integration/**/*.test.js"],
      maxWorkers: 2, // Limit simultaneous execution for integration tests
    },
  ],
};
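With the projects split out like this, individual tiers can also be targeted from the command line using Jest's --selectProjects flag, which matches the displayName values above. The script names here are hypothetical additions of mine:
{
  "scripts": {
    "test:unit-only": "jest --selectProjects unit",
    "test:components-only": "jest --selectProjects components"
  }
}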
These refinements helped minimize interruptions to development rhythm due to test execution.
"Test Planning First Approach" - Considering Test Content in Advance
Another major discovery in AI collaborative development was the importance of establishing a test plan before writing test code. In collaborating with AI, it proved highly effective to first clarify "what should be tested" and then move on to implementation.
The specific flow is as follows:
1. Ask the AI for a test plan only (tool selection and test cases, no code)
2. Review the plan and adjust its scope and priorities
3. Have the AI implement the tests based on the agreed plan
Let me share an example of this "test planning first approach." When creating tests for the evaluation results display component, I first asked the AI to create a test plan:
Please create a test plan for the "Evaluation Results display component (EvaluationResults)".
Just list the tool selection and basic test cases,
without including actual code.
The AI proposed the following plan:
Test Plan for EvaluationResults Component
Component Tests with RTL
- All parameter names and values are displayed correctly
- Warning icons are displayed for low confidence parameters
- Clicking on parameters displays detailed information
- Sort functionality works correctly
Logic Tests with Jest
- Parameter sorting function works correctly
- Filtering function by confidence level works correctly
Integration Tests with Playwright
- Basic flow from evaluation page to detailed display and PDF export
When I saw this plan, I decided some adjustments were needed. In the prototype stage, detailed display and PDF export were still low priority and I wanted to exclude them from testing. So I adjusted the plan as follows:
Please make the following adjustments:
1. We will not implement Playwright tests at this time
2. The third RTL test (click for detailed display) is low priority, so we'll omit it
3. Please add a test case for "appropriate message is displayed when there is no data"
The AI presented the adjusted plan, and after confirming it, I requested implementation:
Based on the adjusted test plan, please implement the test code using RTL and Jest.
Import test data from the existing __tests__/fixtures/evaluationData.js.
The advantages of this approach include:
- Pre-adjustment of test scope and granularity: Aligning direction before implementation
- Review efficiency: Plans are easier to review than code
- Clear guidance for AI: Setting appropriate constraints in advance
Since test code tends to be more complex than implementation code, this "plan-first" approach led to significant time savings. In AI collaborative development, it's more efficient to spend time reviewing test plans rather than test code.
Specific Methods to Maintain Test Data Consistency
Another challenge in AI collaborative development was "maintaining test data consistency." The AI tended to generate unique test data for each test file, which led to decreased maintainability.
For example, in the initial project, the AI generated slightly different test data for each test:
// Data in component test
test("parameters are displayed", () => {
  const testData = [
    { name: "Technical Strength", value: 4.2, confidence: 0.8 },
    { name: "Team Capability", value: 3.8, confidence: 0.7 },
  ];
  render(<EvaluationResults parameters={testData} />);
  // ...
});

// Similar data in another file
test("parameters are calculated", () => {
  const params = [
    { name: "Technical Strength", value: 4.0, confidence: 0.75 },
    { name: "Team Capability", value: 3.5, confidence: 0.65 },
  ];
  // ...
});
Using slightly different data made debugging test failures difficult and complicated the modification work when data structures changed.
To solve this problem, I adopted a "test data catalog" approach. This centralizes all data used in tests in one place and shares it between tests.
The specific implementation looks like this:
// __tests__/fixtures/evaluationData.js
export const standardParameters = [
  { name: "Technical Strength", value: 4.2, confidence: 0.8 },
  { name: "Team Capability", value: 3.8, confidence: 0.7 },
  { name: "Market Potential", value: 4.5, confidence: 0.9 },
];

export const lowConfidenceParameters = [
  { name: "Uniqueness", value: 2.8, confidence: 0.3 },
];

export const noParameters = [];

// Mock API response
export const mockApiResponse = {
  success: true,
  data: {
    parameters: standardParameters,
    evaluationId: "eval-123",
  },
};
I then explicitly instructed the AI:
When writing tests, always import data from __tests__/fixtures/evaluationData.js.
Do not define your own test data.
I further developed this approach to share test logic with "test presets":
// __tests__/presets/evaluationTests.js
import { render, screen } from "@testing-library/react";
import {
  standardParameters,
  lowConfidenceParameters,
} from "../fixtures/evaluationData";

// Reusable test functions
export function testParameterDisplay(Component) {
  test("all parameters are displayed correctly", () => {
    render(<Component parameters={standardParameters} />);
    standardParameters.forEach((param) => {
      expect(screen.getByText(param.name)).toBeInTheDocument();
      expect(screen.getByText(param.value.toString())).toBeInTheDocument();
    });
  });

  test("low confidence parameters have warning display", () => {
    const testParams = [...standardParameters, ...lowConfidenceParameters];
    render(<Component parameters={testParams} />);

    // Verify warning icon
    const warningIcon = screen.getByRole("img", { name: /warning/i });
    expect(warningIcon).toBeInTheDocument();

    // Verify warning icon is near the low confidence parameter
    const lowConfParam = screen.getByText(lowConfidenceParameters[0].name);
    expect(lowConfParam.parentNode).toContainElement(warningIcon);
  });
}
Calling this from test files significantly reduced duplication of test code:
// src/components/EvaluationDisplay/EvaluationDisplay.test.js
import { render, screen } from "@testing-library/react";
import { testParameterDisplay } from "../../../__tests__/presets/evaluationTests";
import EvaluationDisplay from "./EvaluationDisplay";

describe("EvaluationDisplay", () => {
  // Run common tests
  testParameterDisplay(EvaluationDisplay);

  // Add tests specific to this component
  test("appropriate message is displayed when there is no data", () => {
    render(<EvaluationDisplay parameters={[]} />);
    expect(
      screen.getByText("No evaluation data available")
    ).toBeInTheDocument();
  });
});
The benefits of this approach are:
- Ensuring test data consistency: Using the same data across all tests
- Eliminating test duplication: Reusing common test logic
- Improving maintainability: When data structures change, modifications are needed in only one place
This approach is now standard in my current projects, and the AI has come to understand this structure well and implement tests efficiently.
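One extension worth considering on top of the catalog (a hypothetical sketch, not something these projects used) is a small builder, so that tests needing a slight variation don't fall back into defining their own ad-hoc data:
// __tests__/fixtures/builders.js -- hypothetical helper built on the catalog
import { standardParameters } from "./evaluationData";

// Build a single parameter, overriding only the fields a test cares about
export function buildParameter(overrides = {}) {
  return { name: "Technical Strength", value: 4.2, confidence: 0.8, ...overrides };
}

// Build the standard parameter list with targeted per-index overrides
export function buildParameters(overridesByIndex = {}) {
  return standardParameters.map((param, index) => ({
    ...param,
    ...(overridesByIndex[index] || {}),
  }));
}
A test that needs, say, a low-value "Technical Strength" can call buildParameter({ value: 1.5 }) without redefining the whole object, so the catalog remains the single source of truth.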
Optimal Workflow for Mastering Three Libraries
After a month of trial and error, I established a workflow that optimally combines the three testing libraries. This is a specialized approach for AI collaboration, different from traditional development.
Role Division Between Humans and AI
First, I clarified the division of roles between human and AI in test creation. In short, it works best when the human owns the "what" and "why" of the test strategy, while the AI owns the "how." This split, in which the human adjusts test scope and granularity and the AI handles the concrete implementation, proved especially efficient.
Clear Criteria for Test Library Selection
The optimal use for each testing library also became clear. Here are the selection criteria I adopted:
- Jest (Unit Testing)
  - Optimal uses:
    - Utility functions
    - Data transformation logic
    - Complex calculation processes
  - Test execution time: a few milliseconds to a few hundred milliseconds
  - Guidance for AI: minimize mock usage and explicitly limit the number of test cases
- React Testing Library (Component Testing)
  - Optimal uses:
    - UI component display verification
    - User operations like form input and clicks
    - State change verification
  - Test execution time: a few hundred milliseconds to one second
  - Guidance for AI: use user-perspective queries throughout and focus on testing displayed content
- Playwright (E2E Testing)
  - Optimal uses:
    - User flows spanning multiple screens
    - Integration with the backend
    - Behaviors that can only be verified in actual browser environments
  - Test execution time: a few seconds to a few minutes
  - Guidance for AI: limit to minimal scenarios and test only happy paths
It's important to note that these selections are optimal solutions in the context of 15-minute cycle development and might lead to different judgments in traditional development cycles.
What struck me most was how differently "test execution time" weighs in traditional development versus AI collaborative development. Traditional development could afford to prioritize "test coverage"; in AI collaborative development, "review efficiency" and "execution time" become overwhelmingly important.
Optimal Test Execution Strategy
Finally, let me introduce the test execution strategy I adopted, optimized by development phase and scale of change (the comments in the snippet below are annotations for this article; an actual package.json cannot contain comments):
// package.json
{
  "scripts": {
    // Run only tests related to changed files (within seconds)
    "test:related": "jest --findRelatedTests $(git diff --name-only)",
    // Run only unit and component tests (within tens of seconds)
    "test:fast": "jest --testPathIgnorePatterns=integration,e2e",
    // Run integration tests (about 1 minute)
    "test:integration": "jest integration",
    // Run E2E tests (several minutes)
    "test:e2e": "playwright test",
    // Run all tests (for CI/CD)
    "test:ci": "npm run test:fast && npm run test:integration && npm run test:e2e"
  }
}
I chose which script to run based on the scale of the change:
- Small-scale changes (about 1 file): run only related tests with test:related
- Medium-scale changes (multiple files): run unit and component tests with test:fast
- Large-scale changes (an entire feature): run test:fast and test:integration
- Pre-release verification: run all tests with test:ci
What matters most is minimizing interruptions from test execution during 15-minute cycle development. I therefore relied mainly on test:related and test:fast while developing, leaving the E2E tests to the CI pipeline.
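One small addition I would consider here (not part of my original setup) is Jest's watch mode, which keeps re-running the tests related to changed files as you edit, so you avoid paying the startup cost in the middle of a cycle:
{
  "scripts": {
    "test:watch": "jest --watch"
  }
}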
Differences in Test Strategies Between Prototypes and Mature Products
I want to emphasize an important point here. The approach I've described is specialized for the prototype development phase and requires different testing strategies for mature products.
To understand this difference, compare the purpose of testing at the prototype stage with the purpose at the mature stage:
Test purposes in prototype development:
- Verification that core functions work as expected
- Supporting rapid hypothesis testing and pivoting
- Maximizing development speed
Test purposes in mature stage:
- Comprehensive quality assurance
- Prevention of regression (functional degradation)
- Ensuring long-term maintainability
My one-month experience was primarily in the prototype development stage, focusing on finding optimal solutions there. As the product matures, the testing strategy needs to evolve.
Specifically, as the project matures:
- Test coverage becomes more important
- The ratio of unit tests gradually increases
- Tests for edge cases and error conditions are added
In AI collaborative development as well, it's important to adjust instructions to the AI according to the project stage. While "speed priority, test only basic functions" is appropriate instruction for the prototype stage, it should change to "coverage priority, test including edge cases" in the mature stage.
The True Lesson: Appropriate Constraints Maximize AI Productivity
The greatest lesson from this month of experience is that by giving appropriate constraints to AI, you can maximize its productivity.
Initially, I gave the AI too much freedom when asking it to create tests, with vague instructions like "write good tests." As a result, the AI pursued perfection and generated overly complex test code with long execution times.
However, by setting specific constraints like these, the AI began generating more practical tests:
- Limiting the number of test cases: "Please test only 3 cases"
- Clarifying test focus: "Please test only happy paths"
- Constraining execution time: "Please ensure test execution time is within 30 seconds"
- Limiting code size: "Please keep each test within 5 lines"
These constraints have the effect of narrowing AI's infinite possibilities to a useful range. I call this the "principle of productive constraints."
This principle is similar to design systems. Just as a consistent, limited set of components produces better UIs than infinite choices, giving appropriate constraints to AI yields higher quality outputs.
Summary: Optimal Testing Strategy in 15-minute Cycle Development
Let me summarize the core lessons from my one-month experience of dividing work among these testing tools in 15-minute cycle AI-driven development:
- Relative importance changes in 15-minute cycles: test execution time that wasn't an issue traditionally can become a major bottleneck in AI collaborative development
- The test planning first approach is efficient: planning tests and agreeing with the AI before implementation saves significant time
- Ensure test data consistency: creating a test data catalog and sharing it across all tests improves maintainability
- Differentiate each tool based on its characteristics:
  - Jest: for unit testing of logic
  - RTL: for testing UI components
  - Playwright: only for important E2E flows, run at the CI stage
- Adjust the strategy to the development stage: testing strategies differ significantly between prototypes and mature products
- Give specific constraints to the AI: too much freedom leads it to generate excessive tests
Through this month's experience, I learned that testing strategies in AI collaborative development differ significantly from traditional wisdom. The old adage that "more tests are better" doesn't necessarily apply in 15-minute cycle development. Rather, "appropriate quantity and quality of tests" becomes the new standard.
Testing strategies vary greatly depending on project context, and in the new context of AI collaborative development, factors like test execution time and review efficiency become particularly important. Understanding this and selecting appropriate tools and methods will allow you to unleash the true power of AI collaborative development.
Finally, the strategies introduced here are based on my limited experience and might require different approaches for your project. What's important is to flexibly adjust strategies to fit your project's context. Remember that testing is not an end but a means, and its purpose is always "creating better software more efficiently."