I've Come to See That Process Evaluation Is Critical When Considering Quality Assessment
The Limitations of Black Boxes
I am currently working on quality assurance for written content, and I'd like to share what I've learned. The discussion starts with text, but it extends to a perspective that matters for process design in the AI era more broadly.
With AI evolution, automated evaluation methods like "LLM as a Judge" have become practical. However, the deeper I delve into this approach, the more I face the reality that evaluating only the deliverables cannot solve fundamental problems, no matter how advanced AI becomes. This is the main theme of this article.
When AI evaluates text, quality judgment is always limited because the generation process is an invisible black box. This is the same fundamental limitation that black-box testing faces in software testing. Evaluating deliverables from the outside alone cannot detect essential problems lurking in the internal structure and generation process.
The Struggles of the Invisible Evaluator
The hardest part of text evaluation is the constraint that comes with the evaluator's position as a third party. The evaluator is an "outsider" to the text generation process and lacks much of the context in which the text was produced.
Although in a slightly different situation, I strongly felt this barrier during my time serving as an outside director for a listed company. The position of an outside director is a typical example of a "third-party evaluator." In board meetings, only the results of decisions are presented, while the underlying review processes and decision-making materials are difficult to see.
It's hard to properly evaluate the quality of a decision without seeing why that decision was reached. Having AI evaluate text has exactly the same structure.
Breaking Free from the Captivity of Deliverables
Why do we try to evaluate based only on deliverables? Simply because deliverables exist in a visible form. Processes are invisible and elusive, unfolding along a timeline. They are rarely recorded and difficult to evaluate.
However, deliverables contain, in an inseparable form, the thought process of their creation, how they were verified, and how materials were gathered. Unless these elements are clearly decomposed and verified, true quality assurance is not possible.
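One way to make this decomposition concrete is to keep a small process record next to each deliverable. The following is only a minimal sketch: `ProcessRecord`, its fields, and `missing_process_evidence` are hypothetical names chosen for illustration, not part of any tool discussed in this article.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    """Hypothetical record of how a deliverable was produced."""
    deliverable: str                                     # the final text itself
    sources: list[str] = field(default_factory=list)     # how materials were gathered
    reasoning: list[str] = field(default_factory=list)   # the thought process, step by step
    checks: list[str] = field(default_factory=list)      # how the result was verified


def missing_process_evidence(record: ProcessRecord) -> list[str]:
    """Return the names of process elements that were never recorded."""
    elements = {
        "sources": record.sources,
        "reasoning": record.reasoning,
        "checks": record.checks,
    }
    return [name for name, items in elements.items() if not items]


record = ProcessRecord(
    deliverable="Quarterly summary draft",
    sources=["interview notes", "sales spreadsheet"],
    reasoning=["compared this quarter against last quarter"],
)
print(missing_process_evidence(record))  # ['checks']
```

An evaluator, human or AI, who receives such a record can at least see which parts of the process are undocumented before judging the text itself.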
The Essence of Process Evaluation Discovered Through Experience as an Outside Director
My experience as an outside director provided an opportunity to deeply consider the unique value and limitations that a third-party perspective brings. While there are things that can only be said from a third-party standpoint, there is also the dilemma of lacking specific internal information due to being external.
The Challenge of Decision-Making with Limited Information
There were times when I could not evaluate the validity of a judgment on a board agenda item from the submitted materials or deliverables alone. That's when I turned my attention to the decision-making process. When the process is described in abstract terms, it becomes easier for an outsider to understand, and evaluate, why a certain judgment was reached.
Of course, this required information gathering to understand the decision-making process and communication to understand the mental models of the decision-makers. While this incurred costs in the short term, once it reached a certain level, I could detect the judgment basis and potential biases behind decisions even from a third-party position.
Capturing the Essence by Separating Judgment Materials and Criteria
The greatest insight I gained from my experience as an outside director is the importance of clearly separating judgment materials from judgment criteria. Internal people have abundant judgment materials and tacit judgment criteria. In contrast, external people have limited judgment materials but can bring fresh judgment criteria.
By separating these two factors, the decision-making process becomes more transparent, which makes evaluation from a third-party perspective possible. It's often said that "even with good judgment ability, you can't make a judgment without judgment materials," and this is absolutely true. To judge well, you need to identify the judgment materials separately from the criteria applied to them.
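The separation can be illustrated in a few lines of code. This is a sketch under my own assumptions: the `judge` function, the criterion names, and the `board_packet` keys are all hypothetical. Materials are plain data, criteria are rules kept apart from that data, and a criterion whose material is missing yields "cannot judge" rather than a guess.

```python
from typing import Callable, Optional

# Judgment materials: the facts available to the evaluator (hypothetical keys).
Materials = dict[str, object]

# A judgment criterion: the material it needs, plus a rule applied to that material.
Criterion = tuple[str, Callable[[object], bool]]


def judge(materials: Materials, criteria: dict[str, Criterion]) -> dict[str, Optional[bool]]:
    """Apply each criterion; return None where the required material is missing."""
    verdicts: dict[str, Optional[bool]] = {}
    for name, (required_key, rule) in criteria.items():
        if required_key not in materials:
            verdicts[name] = None  # cannot judge without the material
        else:
            verdicts[name] = rule(materials[required_key])
    return verdicts


criteria = {
    "alternatives_considered": ("options_reviewed", lambda n: n >= 2),
    "risk_reviewed": ("risk_memo", lambda memo: bool(memo)),
}
board_packet = {"options_reviewed": 3}  # no risk memo was submitted

print(judge(board_packet, criteria))
# {'alternatives_considered': True, 'risk_reviewed': None}
```

Because the criteria live outside the materials, an outsider can bring a different set of criteria to the same packet, and a `None` verdict points to exactly which material has to be requested.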
Shifting from Black Box to White Box
From this experience, I keenly felt the need to shift from deliverable evaluation to process evaluation. While this is a matter of course in the world of software testing, black-box evaluation of deliverables is still overwhelmingly mainstream in areas such as management decisions and text evaluation.
My involvement in pharmaceutical data science during graduate school offers another useful example. The PMDA (Pharmaceuticals and Medical Devices Agency, Japan's equivalent to the FDA) emphasizes quality assurance throughout the entire process [1][2][3], even when AI-based judgments are included in drug approval reviews, and this stance has greatly influenced my current thinking. The concept of ensuring overall quality while accepting probabilistic errors will remain fundamentally important no matter how much AI evolves.
Competitive Advantage Through Process Innovation
We live in a saturated era in which almost anything people want is obtainable. As scarcity has decreased, differentiation has become difficult. In such a situation, process innovation can become the next source of competitive advantage.
Sustainable Value Created by Process Design
The role division of "people who create value, people who convert value into money" is an easy-to-understand example, but going further, I believe that value seeps out from processes. Enhancing process quality is the essential approach to increasing value in the long term.
Superficial quality improvements in deliverables may be effective in the short term but are unlikely to lead to long-term competitive advantage because the deliverables themselves can be imitated. In contrast, excellent processes are linked to organizational culture and tacit knowledge, making them difficult to imitate.
The Importance of Long-Term Perspective Beyond Short-Term Pain
As an entrepreneur, I've also experienced the difficulty and loneliness of working on process innovation. This transformation often involves short-term performance degradation. It's also challenging to explain its value to outsiders. When asked "what are you doing," it's hard to be understood without talking in terms of deliverables.
However, accepting this short-term pain and working with a long-term perspective leads to sustainable competitive advantage. Process innovation works on the deep layers of an organization, and while it takes time for its effects to appear, once it takes root, it becomes a powerful differentiating factor.
Building a Chain-Reactive and Cyclical Value Creation System
The true value of process innovation lies in building a chain-reactive and cyclical value creation system. What's important is not a one-time improvement but creating a mechanism that continuously generates value.
In talent development at Kikagaku, the company I founded, we achieved sustainable growth by reinvesting initial profits into human resource development and creating a cycle where developed talent nurtures the next generation. This too can be considered a type of process innovation.
The "Roomba-ble" World: Creating Paths for AI to Travel Easily
I recently heard the term "Roomba-ble" and thought it was spot on. It refers to reducing obstacles and preparing an environment in which the cleaning robot Roomba can move efficiently. In the IT industry, the same idea appears as "AI-compatible process design," and it offers important insights for organizational and process design in the AI era.
The Importance of AI-Friendly Process Design
The most important perspective in future process evaluation is creating pathways that AI can travel easily. For example, Notion has recently become popular, mainly among startups. Data stored in it may look Markdown-like at first glance, but I realized that it actually has a complex structure built on rich editor features, which is not necessarily AI-friendly.
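To show the gap between a rich editor structure and plain Markdown, here is a small sketch. The block shape used below (`type`, `spans`, `children`) is a simplified, hypothetical format invented for this example; it is not Notion's actual API or export format. The point is only that nested blocks with styled spans need an explicit flattening step before they read like Markdown.

```python
def blocks_to_markdown(blocks: list[dict], depth: int = 0) -> str:
    """Flatten a nested block tree into plain Markdown lines."""
    lines: list[str] = []
    for block in blocks:
        # Spans carry styling metadata; this sketch keeps only the visible text,
        # so styling such as bold is deliberately lost.
        text = "".join(span["text"] for span in block.get("spans", []))
        kind = block.get("type", "paragraph")
        if kind == "heading":
            lines.append("#" * block.get("level", 1) + " " + text)
        elif kind == "bullet":
            lines.append("  " * depth + "- " + text)
        else:
            lines.append(text)
        # Nested children are rendered one indentation level deeper.
        children = block.get("children", [])
        if children:
            lines.append(blocks_to_markdown(children, depth + 1))
    return "\n".join(lines)


page = [
    {"type": "heading", "level": 2, "spans": [{"text": "Review notes"}]},
    {"type": "bullet",
     "spans": [{"text": "Check ", "bold": False}, {"text": "sources", "bold": True}],
     "children": [{"type": "bullet", "spans": [{"text": "cite the memo"}]}]},
]
print(blocks_to_markdown(page))
```

Even in this toy case the conversion is lossy, which hints at why content that merely looks like Markdown in an editor can be harder for AI to consume than content stored as plain Markdown from the start.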
In the AI era, it will be essential to design AI-friendly, AI-enabled, and AI-ready processes, and to have people who can build highways on which AI can travel smoothly. In digitally implemented operations, AI is overwhelmingly faster than humans, so humans are likely to become the bottleneck.
The Human Role as a Bridge Between Analog and Digital
What is required of humans in the AI era is the ability to serve as a bridge between analog and digital. It's important to structure information in a way that is easily processable by AI and to design processes that optimize collaboration between AI and humans.
This is precisely process innovation itself, and while it may involve pain in the short term, it is an effort that leads to long-term competitive advantage.
Process Transformation to Eliminate AI Bottlenecks
By not only evaluating deliverables but also making the process itself AI-friendly, we can make full use of AI capabilities. The same is true for text evaluation: more advanced quality assurance becomes possible when we redesign not only the text being evaluated but also its generation process.
By identifying points where AI can provide support at each stage of the document creation process and clarifying the optimal role division between humans and AI, the efficiency and quality of the entire process can be enhanced.
Opening the Future Through Process Evaluation
What began as a discussion of text evaluation led to the realization that process evaluation matters, and from there to a future-oriented perspective on process design in the AI era. To close, I'd like to consider how to practice process evaluation and what lies ahead.
Practicing Process Evaluation Using a Third-Party Perspective
To practice process evaluation, it's important first to consciously incorporate a third-party perspective. Develop the ability to objectively view your own efforts and cultivate the habit of always being aware of how they appear from the outside.
What we recognize as "obvious" while working may not be obvious from a third-party perspective. The essence of the process becomes visible by consciously visualizing tacit knowledge such as context and biases.
The Art of Designing Highways for AI to Travel
The key points in process design for the AI era are structuring and standardizing information. How to structure unstructured information and how to standardize diverse formats are the keys.
Appropriate role division between humans and AI is also important. By discerning what AI excels at and what humans are good at, and designing mutually complementary processes, overall optimization can be achieved.
The Future Brought by Process Evaluation: Questions and Prospects
The realization that process should be emphasized in text evaluation came, surprisingly, from the completely different experience of being an outside director requiring third-party objectivity. And what emerged from reviewing processes was a new value creation paradigm for the AI era.
Finally, I'd like to ask: Is your organization or initiative creating paths that AI can easily travel? Process innovation will be the true source of competitive advantage in the coming era. The perspective of evaluating and improving not just deliverables but the processes leading to them will be the key to opening up the future.
References
1. Guidance for Appropriate and Prompt Approval and Development of Medical Device Programs Based on Their Characteristics (May 29, 2023) - Chapter 1 "Positioning of Guidance" indicates requirements for quality assurance throughout the entire lifecycle of medical devices (design → manufacturing → change management → post-marketing surveillance).
2. Report on Medical Device Programs Utilizing AI (August 28, 2023) - Chapter 3 "Implementation of Risk Management" specifically explains the consistent quality management process from hazard identification to risk assessment, risk control, and post-marketing monitoring for AI/ML-based medical devices.
3. Challenges and Recommendations for AI-based Medical Diagnostic Systems and Medical Devices 2017 (December 27, 2017) - Section 2 "Regulatory Science Perspective" organizes the framework of the lifecycle approach to risk management based on ISO 14971 (pre-development evaluation, approval review, post-marketing change management).