Technology

Implementing Uncertainty Modeling with Bayesian Updates in LLM Applications

Practical Methods for Probabilistic Parameter Implementation and Information Maximization

2025-03-30
24 min
LLM
Probabilistic Models
Bayesian Updates
AI Technology
Mathematics & Algorithms
Decision Science
Next.js
Ryosuke Yoshizaki

CEO, Wadan Inc. / Founder of KIKAGAKU Inc.


Introduction: The Challenge of Investment Interview Systems

I was developing an AI system as the first prototype created through Vibe Coding collaboration, designed to help investors evaluate entrepreneurs. This system analyzes entrepreneurs' PDF materials and then deepens the evaluation through dialogue.

The most challenging aspect of this project was "having AI make subjective evaluations." When we ask AI, "How would you score this entrepreneur's technical skills?", what are we actually expecting? Can we even quantify an abstract concept like "technical skills"? And if we can, how reliable would that evaluation be?

To find answers to these questions, I went through many trials and errors. What I eventually arrived at was a mechanism for uncertainty modeling and updates based on Bayesian statistics. In this article, I want to share the challenges I faced and the solutions I found.

Subjective evaluation by AI might seem contradictory at first, but many systems are already doing this every day. For example, systems that estimate satisfaction from restaurant reviews or evaluate customer support quality. However, performing these evaluations with high reliability is more difficult than you might imagine.

Challenges in Investment Evaluation Systems

In my system, I needed to evaluate entrepreneurs on criteria such as:

  • Market Analysis: Understanding of market size, growth rate, and competitive landscape
  • Technical Skills: Technical advantages, feasibility, and innovation
  • Team Composition: Team diversity, experience, and expertise
  • Business Strategy: Business model, profitability, and scalability
  • Track Record & Scalability: Past achievements and future growth potential

All of these include "subjective" elements. For instance, "technical skills" might be evaluated on different standards by Silicon Valley investors versus Japanese investors. Even the same investor might apply different levels of strictness depending on the field, whether it's AI or biotechnology.

More importantly, there's the question of how to express uncertainty in evaluations. There's a significant difference between "technical skills are 3 points, but I'm not confident" and "technical skills are definitely 3 points."

Another difficult problem was how to update evaluations when new information is obtained. If we evaluate "technical skills as 3 points" based on a PDF, how should we reflect additional information gained through Q&A?

I designed my AI evaluation system while facing these challenges.

Introducing Rubrics: Reducing Variability in Subjective Evaluations

The first thing I tackled was standardizing evaluation criteria. Rubrics proved to be very effective for this.

A rubric is a tool that sets specific criteria for each evaluation item and assesses achievement levels against those criteria. I learned about this concept when reading an educational theory book. In education, grading students is important, so there's a need to clearly define how many points to assign to each evaluation item.

In my case, I adopted Bloom's Revised Taxonomy as my evaluation criteria. This classifies depth of knowledge into 6 levels:

  1. Remember: Able to reproduce basic information (1 point)
  2. Understand: Can accurately explain the meaning of concepts (2 points)
  3. Apply: Can effectively use learned knowledge in practical situations (3 points)
  4. Analyze: Can break down information and clearly understand causal relationships (4 points)
  5. Evaluate: Can make logical judgments and determine validity (5 points)
  6. Create: Can produce new value or unique solutions (6 points)

Based on these 6 levels, I decided to assign values from 1.0 to 6.0 to each evaluation axis. For example, if "technical skills" were 4.5 points, it would mean the technical understanding and judgment were somewhere between "analyze" and "evaluate."

Using rubrics made my instructions to the AI clearer and significantly reduced the variability in evaluations. Instead of saying "please evaluate technical skills," I could instruct "please evaluate technical skills on a scale of 1.0 to 6.0 based on Bloom's Revised Taxonomy."
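As a sketch of how such a rubric instruction might be assembled in code (the `BLOOM_RUBRIC` dictionary and `build_rubric_prompt` function are illustrative names I've made up, not part of the actual system):

```python
# Hypothetical sketch: encoding the Bloom-based rubric as data that can be
# interpolated into an LLM evaluation prompt.
BLOOM_RUBRIC = {
    1.0: "Remember: able to reproduce basic information",
    2.0: "Understand: can accurately explain the meaning of concepts",
    3.0: "Apply: can effectively use learned knowledge in practical situations",
    4.0: "Analyze: can break down information and clearly understand causal relationships",
    5.0: "Evaluate: can make logical judgments and determine validity",
    6.0: "Create: can produce new value or unique solutions",
}

def build_rubric_prompt(axis: str) -> str:
    """Build an instruction asking the LLM to score one axis on a 1.0-6.0 scale."""
    levels = "\n".join(
        f"  {score:.1f} - {desc}" for score, desc in sorted(BLOOM_RUBRIC.items())
    )
    return (
        f"Evaluate the entrepreneur's {axis} on a scale of 1.0 to 6.0 "
        f"based on Bloom's Revised Taxonomy:\n{levels}\n"
        "Intermediate values (e.g., 4.5) are allowed."
    )
```

Anchoring the scale inside the prompt itself is what reduces run-to-run variability: the model scores against explicit level descriptions rather than its own implicit notion of "technical skills."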

Considering Methods for Updating Evaluations

When having AI evaluate something, we need to update the evaluation not just once but each time new information comes in. For example, when evaluating an entrepreneur's technical skills, we might first evaluate based on presentation materials, then update our evaluation with information gained through Q&A.

So how should we update these evaluations?

Problems with Simple Update Methods

The first method that came to mind was a simple average. If the previous evaluation was 5 points and a new evaluation based on new information is 4 points, we update to (5+4)/2=4.5 points. This is simple, but there's a problem.

For instance, if after 100 interactions the evaluation of "technical skills is 5 points" has been established, it seems unnatural to give equal weight to a 4-point evaluation from just one new interaction.

To address this problem, I considered a weighted average:

\text{updated evaluation} = \frac{w_1 \times \text{previous evaluation} + w_2 \times \text{new evaluation}}{w_1 + w_2}

For example, if there have been 100 previous interactions and one new interaction:

\text{updated evaluation} = \frac{100 \times 5 + 1 \times 4}{100 + 1} = \frac{504}{101} \approx 4.99

This way, evaluations accumulated over a long time won't change dramatically with one piece of new information. However, there's still a problem with this method: not all information has the same reliability.
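The running weighted average above can be sketched as follows (`weighted_update` is a hypothetical helper name):

```python
def weighted_update(prev_value: float, prev_weight: float,
                    new_value: float, new_weight: float = 1.0):
    """Running weighted average: heavily-evidenced evaluations move less."""
    updated = (prev_weight * prev_value + new_weight * new_value) / (prev_weight + new_weight)
    return updated, prev_weight + new_weight

# 100 prior interactions settled at 5 points, one new observation of 4 points
value, weight = weighted_update(5.0, 100, 4.0)
# value ≈ 4.99: the long-accumulated evaluation barely moves
```

Note that the weight here only counts observations; it says nothing about how reliable each observation was, which is exactly the gap discussed above.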

For example, a confident statement like "this company's technology is truly innovative" and a vague statement like "the technology seems good, though I don't understand the details" shouldn't be weighted the same.

Furthermore, there's the question of how to express uncertainty in evaluations. There's a big difference between "5 points with confidence" and "probably around 5 points."

Moving Toward Updates That Consider Uncertainty

From these considerations, I decided to give evaluations two elements:

  1. Evaluation value (e.g., 5 points)
  2. Uncertainty (e.g., standard deviation 0.5)

Lower uncertainty means more confidence in the evaluation. For example:

  • "5 points with confidence" → Evaluation value 5 points, standard deviation 0.2
  • "Probably around 5 points" → Evaluation value 5 points, standard deviation 1.0

So how should we update evaluations with these two elements? That's where I arrived at Bayesian updates.

Foundations of Bayesian Statistics

Interpretations of Probability: Frequentist vs. Subjective

In statistics, there are two major approaches to interpreting probability: the "frequentist" approach and the "Bayesian (subjective)" approach.

The frequentist approach defines probability as "the relative frequency when trials are repeated infinitely." For example, saying that the probability of a coin landing heads is 0.5 means that if you flip the coin infinitely, the proportion of heads will converge to 0.5.

On the other hand, the Bayesian approach interprets probability as "subjective degree of confidence in a proposition." In this interpretation, as data and evidence increase, this subjective degree of confidence (probability) gets updated.

The Bayesian approach felt very intuitive for my evaluation system. This is because investors have a "subjective degree of confidence" about entrepreneurs' abilities and update that confidence based on new information gained through materials and dialogue. In other words, Bayesian statistics models human thought processes closely.

Bayes' Theorem

At the center of Bayesianism is "Bayes' Theorem." This is a theorem about conditional probability, expressed by the following equation:

P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}

Where:

  • P(A|B) is the conditional probability of A given B (posterior probability)
  • P(B|A) is the conditional probability of B given A (likelihood)
  • P(A) is the unconditional probability of A (prior probability)
  • P(B) is the unconditional probability of B (marginal likelihood)

Interpreting this theorem in the context of Bayesian updates:

P(\theta|x) = \frac{P(x|\theta) \times P(\theta)}{P(x)}

Where θ is an evaluation parameter (e.g., "technical skills"), and x is a newly observed value (e.g., "technical skills 4.5 points").

In other words, the probability distribution of the parameter after obtaining a new observation (posterior distribution) equals the probability of obtaining that observation given the parameter (likelihood) multiplied by the probability distribution of the parameter before observation (prior distribution), divided by the marginal likelihood of the observation.
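As a minimal numeric illustration of the theorem (all probabilities here are made-up values for demonstration):

```python
# Discrete Bayes' theorem example: hypothesis A = "strong technical skills",
# observation B = "gave a clear technical answer". Numbers are illustrative.
p_a = 0.3            # prior P(A)
p_b_given_a = 0.8    # likelihood P(B|A)
p_b_given_not_a = 0.2

# Marginal likelihood P(B) by the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A|B): one clear answer roughly doubles our confidence
p_a_given_b = p_b_given_a * p_a / p_b
```

Here a single confident observation moves P(A) from 0.3 to about 0.63, which is the same mechanic the continuous update below performs with normal distributions.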

Prior and Posterior Distributions

In the Bayesian update framework, the concepts of "prior distribution" and "posterior distribution" are important.

The prior distribution is the probability distribution of the parameter before obtaining a new observation. This is set based on past experience, expert knowledge, or simple assumptions.

The posterior distribution is the probability distribution of the parameter after obtaining an observation. This is obtained by updating the prior distribution using Bayes' theorem.

In this framework, the posterior distribution obtained based on one data point becomes the prior distribution when observing the next data point. In other words, as data increases, the distribution is gradually updated.


This is the basic idea of Bayesian updates. Next, let's look at how to express and update uncertainty.

Normal Distributions and Bayesian Updates

Why Normal Distributions?

When modeling uncertainty, normal distributions (Gaussian distributions) are particularly useful for several reasons:

  1. Mathematical tractability: Normal distributions have additivity, meaning the sum of multiple normal distributions is also a normal distribution
  2. Central limit theorem: The sum of many random variables tends to form a normal distribution around the mean, regardless of their original distributions
  3. Information-theoretic properties: Under given mean and variance, the distribution that maximizes entropy (a measure of uncertainty) is the normal distribution
  4. Computational efficiency: When both the prior distribution and likelihood are normal distributions, the posterior distribution can be analytically derived

In my evaluation system, I decided to express each evaluation axis parameter using a normal distribution. Specifically, I represent parameter θ_i as a pair of mean μ_i and standard deviation σ_i:

\theta_i = (\mu_i, \sigma_i)

Here, μ_i represents the estimated value of evaluation axis i (the expected value of the evaluation), and σ_i represents its uncertainty (the inverse of confidence). A smaller σ_i means higher confidence in the evaluation.

Derivation of Bayesian Updates Between Normal Distributions

The greatest appeal of Bayesian updates using normal distributions is that if both the prior distribution and likelihood are normal distributions, the posterior distribution will also be a normal distribution. Moreover, its mean and variance can be analytically derived.

Let's mathematically derive this. Assume the prior distribution and likelihood are the following normal distributions:

  • Prior distribution: \theta \sim \mathcal{N}(\mu, \sigma^2)
  • Likelihood: x|\theta \sim \mathcal{N}(\theta, \sigma_x^2)

Here, θ is the true evaluation value, x is the observed value, and σ_x² is the uncertainty of the observation.

By Bayes' theorem:

P(\theta|x) \propto P(x|\theta) \times P(\theta)

The probability density function of a normal distribution is:

f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

Therefore:

\begin{aligned} P(\theta|x) &\propto \exp\left(-\frac{(x-\theta)^2}{2\sigma_x^2}\right) \times \exp\left(-\frac{(\theta-\mu)^2}{2\sigma^2}\right) \\ &= \exp\left(-\left[\frac{(x-\theta)^2}{2\sigma_x^2} + \frac{(\theta-\mu)^2}{2\sigma^2}\right]\right) \end{aligned}

Rearranging this with respect to θ (focusing on the quadratic terms in the exponent):

\begin{aligned} \frac{(x-\theta)^2}{2\sigma_x^2} + \frac{(\theta-\mu)^2}{2\sigma^2} &= \frac{\theta^2 - 2\theta x + x^2}{2\sigma_x^2} + \frac{\theta^2 - 2\theta\mu + \mu^2}{2\sigma^2} \\ &= \theta^2\left(\frac{1}{2\sigma_x^2} + \frac{1}{2\sigma^2}\right) - 2\theta\left(\frac{x}{2\sigma_x^2} + \frac{\mu}{2\sigma^2}\right) + C \end{aligned}

Where C is a term independent of θ.

The exponential part of a normal distribution has the form -\frac{(x-\mu)^2}{2\sigma^2} = -\frac{x^2 - 2\mu x + \mu^2}{2\sigma^2}. Comparing this with our equation:

\begin{aligned} \frac{1}{\sigma_{\text{posterior}}^2} &= \frac{1}{\sigma_x^2} + \frac{1}{\sigma^2} \\ \frac{\mu_{\text{posterior}}}{\sigma_{\text{posterior}}^2} &= \frac{x}{\sigma_x^2} + \frac{\mu}{\sigma^2} \end{aligned}

From these equations, we can derive the parameters of the posterior distribution:

\begin{aligned} \sigma_{\text{posterior}}^2 &= \left(\frac{1}{\sigma_x^2} + \frac{1}{\sigma^2}\right)^{-1} = \frac{\sigma_x^2 \sigma^2}{\sigma_x^2 + \sigma^2} \\ \mu_{\text{posterior}} &= \sigma_{\text{posterior}}^2 \left(\frac{x}{\sigma_x^2} + \frac{\mu}{\sigma^2}\right) = \frac{\sigma^2 x + \sigma_x^2 \mu}{\sigma_x^2 + \sigma^2} \end{aligned}

This is the general equation for Bayesian updates using normal distributions. Looking at this equation, you'll notice something interesting. The mean of the posterior distribution μ_posterior is a weighted average of the prior mean μ and the observed value x. Moreover, the weights are proportional to the precisions (inverses of the variances) of each distribution.

In other words, information from sources with high uncertainty (large variance) is weighted less, while information from sources with high confidence (small variance) is weighted more. This makes intuitive sense.
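This precision weighting is easy to verify numerically (the values below are arbitrary examples):

```python
# Check that the posterior mean is a precision-weighted average of the
# prior mean and the observation.
mu, var = 3.0, 1.0      # prior: uncertain
x, var_x = 5.0, 0.25    # observation: confident (small variance)

precision, precision_x = 1 / var, 1 / var_x
posterior_mean = (precision * mu + precision_x * x) / (precision + precision_x)
# The confident observation dominates: the posterior lands much closer to x
```

With these numbers the posterior mean is 4.6, far closer to the confident observation at 5.0 than to the uncertain prior at 3.0.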

Practical Bayesian Update Equations

When applying this to an actual evaluation system, the uncertainty of the observation σ_x² can be defined using the confidence of the observation c (a value from 0 to 1):

\sigma_x^2 = \frac{1}{c}

The higher the confidence, the lower the uncertainty. For example, when confidence c = 1.0, σ_x² = 1.0; when confidence c = 0.5, σ_x² = 2.0.

Using this, the Bayesian update equations become:

\mu^{(new)} = \frac{\sigma_x^2 \mu + \sigma^2 x}{\sigma^2 + \sigma_x^2} = \frac{\mu/c + \sigma^2 x}{1/c + \sigma^2}
\sigma^{2(new)} = \frac{\sigma^2 \sigma_x^2}{\sigma^2 + \sigma_x^2} = \frac{\sigma^2/c}{\sigma^2 + 1/c}

Here, μ and σ² are the parameters before the update, x is the observed value, and c is the confidence.

Characteristics of Bayesian Updates

This Bayesian update equation has several important characteristics:

  1. Higher confidence observations have greater influence: The higher the confidence c, the more the observed value x influences the updated mean μ^(new)

  2. Uncertainty always decreases: The updated variance σ²^(new) is always smaller than the prior variance σ²

  3. The higher the current uncertainty, the greater the influence of observations: The larger the current variance σ², the more the observed value x influences the updated mean μ^(new)

These characteristics make intuitive sense. Information with higher confidence has a greater impact on the evaluation, and uncertainty decreases as information accumulates. Also, if you lack confidence in your current evaluation, you'll give more weight to new information.
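These properties can be checked numerically with the closed-form update (a self-contained sketch; the values are arbitrary):

```python
def normal_update(mu, var, x, var_x):
    """Closed-form normal-normal Bayesian update on (mean, variance)."""
    post_var = var * var_x / (var + var_x)
    post_mu = (var_x * mu + var * x) / (var + var_x)
    return post_mu, post_var

mu, var, x = 3.0, 1.0, 5.0
hi_conf = normal_update(mu, var, x, var_x=0.5)  # confident observation
lo_conf = normal_update(mu, var, x, var_x=2.0)  # vague observation

# Property 1: the confident observation pulls the mean further toward x
# Property 2: both posterior variances are smaller than the prior variance
```

Running this, the confident observation yields a posterior mean of about 4.33 versus about 3.67 for the vague one, and both posterior variances shrink below the prior's 1.0.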

Implementation Example: Evaluation System Using Bayesian Updates

Here, I'll introduce an implementation example of an evaluation system using Bayesian updates. The basic implementation is simple, but its applications are wide-ranging.

Basic Implementation of Bayesian Updates

First, let's implement the core part of Bayesian updates:

import math
 
def bayes_update(prior_mean, prior_std, observed_value, confidence):
    """
    Update parameters using Bayesian updates
 
    Args:
        prior_mean: Mean of the prior distribution
        prior_std: Standard deviation of the prior distribution
        observed_value: Observed value
        confidence: Confidence in the observation (0.0 to 1.0)
 
    Returns:
        (posterior_mean, posterior_std): Updated mean and standard deviation
    """
    # Calculate observation variance from confidence
    observed_var = 1.0 / max(confidence, 0.001)  # Prevent division by zero
 
    # Prior distribution variance
    prior_var = prior_std ** 2
 
    # Calculate posterior distribution variance
    posterior_var = (prior_var * observed_var) / (prior_var + observed_var)
    posterior_std = math.sqrt(posterior_var)
 
    # Calculate posterior distribution mean
    posterior_mean = (
        (observed_var * prior_mean + prior_var * observed_value) /
        (prior_var + observed_var)
    )
 
    return posterior_mean, posterior_std

This function takes the parameters of the prior distribution, the observed value, and the confidence in the observation as inputs, and returns the parameters of the posterior distribution.

Application to an Actual Evaluation System

To incorporate this into an actual evaluation system, we need a data structure that manages parameters for each evaluation axis:

def initialize_parameters():
    """Initialize evaluation parameters"""
    return {
        'market_analysis': {'mean': 3.0, 'std': 1.0},
        'technical_skills': {'mean': 3.0, 'std': 1.0},
        'team_composition': {'mean': 3.0, 'std': 1.0},
        'business_strategy': {'mean': 3.0, 'std': 1.0},
        'track_record': {'mean': 3.0, 'std': 1.0}
    }
 
def update_parameter(parameters, axis, observed_value, confidence):
    """
    Update parameters for a specific evaluation axis
 
    Args:
        parameters: Parameter dictionary
        axis: Name of the evaluation axis to update
        observed_value: Observed value
        confidence: Confidence in the observation (0.0 to 1.0)
 
    Returns:
        Updated parameter dictionary
    """
    # Copy the nested dicts so the original parameters are not mutated
    # (a shallow .copy() would share the inner dicts with the original)
    updated_parameters = {name: dict(values) for name, values in parameters.items()}
 
    # Get current parameters
    prior_mean = parameters[axis]['mean']
    prior_std = parameters[axis]['std']
 
    # Apply Bayesian update
    posterior_mean, posterior_std = bayes_update(
        prior_mean, prior_std, observed_value, confidence
    )
 
    # Set updated parameters
    updated_parameters[axis]['mean'] = posterior_mean
    updated_parameters[axis]['std'] = posterior_std
 
    return updated_parameters

Using these functions, the evaluation system operates in the following flow:

  1. Set initial parameters (initialize_parameters)
  2. Update parameters each time a new observation is obtained (update_parameter)
  3. Output the final evaluation results
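The flow above can be sketched end to end. To keep the example standalone, a minimal copy of the update rule is inlined here, and the observation values are purely illustrative:

```python
import math

def bayes_update(prior_mean, prior_std, observed_value, confidence):
    """Minimal inline copy of the update rule so this sketch runs standalone."""
    observed_var = 1.0 / max(confidence, 0.001)
    prior_var = prior_std ** 2
    posterior_var = (prior_var * observed_var) / (prior_var + observed_var)
    posterior_mean = (
        (observed_var * prior_mean + prior_var * observed_value)
        / (prior_var + observed_var)
    )
    return posterior_mean, math.sqrt(posterior_var)

# 1. Initialize, 2. update per observation, 3. report
params = {'technical_skills': {'mean': 3.0, 'std': 1.0}}
observations = [  # (axis, observed_value, confidence) -- illustrative values
    ('technical_skills', 4.5, 0.7),
    ('technical_skills', 4.0, 0.9),
]
for axis, value, conf in observations:
    m, s = bayes_update(params[axis]['mean'], params[axis]['std'], value, conf)
    params[axis] = {'mean': m, 'std': s}
# The mean drifts toward the data and the uncertainty shrinks with each step
```

Each posterior becomes the prior for the next observation, which is the prior-to-posterior chaining described earlier.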

Concrete Example: Evaluating an Entrepreneur's Technical Skills

As an example, let's consider the "technical skills" evaluation axis for an entrepreneur. Suppose the current evaluation state and a new observation are as follows:

  • Current state: mean μ = 3.2, standard deviation σ = 0.8
  • New observation: evaluation value x = 4.5, confidence c = 0.7

Applying Bayesian updates:

\sigma_x^2 = 1/0.7 \approx 1.429
\sigma^{2(new)} = \frac{0.8^2 \times 1.429}{0.8^2 + 1.429} \approx 0.442 \Rightarrow \sigma^{(new)} \approx 0.665
\mu^{(new)} = \frac{1.429 \times 3.2 + 0.8^2 \times 4.5}{0.8^2 + 1.429} \approx 3.60

So, after the update, the mean is 3.60 and the standard deviation is about 0.66. The current evaluation of 3.2 was pulled toward the new observation of 4.5, updating to 3.60, but it didn't move all the way to 4.5. Also, the uncertainty decreased from 0.8 to about 0.66.

# Code for the concrete example
prior_mean, prior_std = 3.2, 0.8
observed_value, confidence = 4.5, 0.7
 
posterior_mean, posterior_std = bayes_update(
    prior_mean, prior_std, observed_value, confidence
)
 
print(f"Before update: Mean {prior_mean}, Standard deviation {prior_std}")
print(f"Observation: {observed_value}, Confidence: {confidence}")
print(f"After update: Mean {posterior_mean:.2f}, Standard deviation {posterior_std:.2f}")

Running this code would produce output like:

Before update: Mean 3.2, Standard deviation 0.8
Observation: 4.5, Confidence: 0.7
After update: Mean 3.60, Standard deviation 0.66

Effective Question Design: Maximizing Information Gain

Another advantage of an evaluation system using Bayesian updates is that it provides a theoretical foundation for deciding "what to ask next." Let me share the insights I gained from my trials and errors, applying concepts from information theory to design questions that extract the maximum information within limited time.

Question Design to Maximize Information Gain

Time is limited in investment interviews. That's why it's important for each question to extract as much information as possible. I focused on information gain as a concept to measure the "information value" of questions.

First, to quantitatively measure the uncertainty of each evaluation axis, I use the concept of entropy. Entropy is a measure of uncertainty; the larger the value, the higher the uncertainty. The entropy of a normal distribution can be calculated with the following equation:

H = \frac{1}{2} \log(2\pi e \sigma^2)

As you can see from this equation, as the standard deviation σ increases, the entropy H also increases.

Information gain is the expected amount of information obtained from a question, expressed by:

IG = H_{\text{before}} - E[H_{\text{after}}]

Here, H_before is the uncertainty (entropy) before the question, and E[H_after] is the expected uncertainty after the question. In other words, it's a measure of "how much uncertainty will be reduced by asking this question."
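The two formulas translate directly into code (the function names are mine, not from the original system):

```python
import math

def normal_entropy(std: float) -> float:
    """Differential entropy of a normal distribution with standard deviation std."""
    return 0.5 * math.log(2 * math.pi * math.e * std ** 2)

def expected_information_gain(std_before: float, expected_std_after: float) -> float:
    """IG = H_before - E[H_after]: expected reduction in uncertainty."""
    return normal_entropy(std_before) - normal_entropy(expected_std_after)
```

Because the entropy of a normal distribution depends only on σ, halving the expected standard deviation always yields an information gain of log 2, regardless of the mean.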

Trial and Error in Question Design

In my initial approach, I prepared question templates for each evaluation axis like "market analysis" and "technical skills," and chose questions about the axis with the highest uncertainty. However, I discovered major problems with this method:

  1. Time inefficiency: Each question could only evaluate one axis
  2. Lack of context: It was difficult to have natural conversations that built on previous answers
  3. Algorithmic limitations: Evaluating the "quality" of questions solely based on entropy was insufficient

As a more advanced approach, I also considered using PyTorch to optimize the information gain of questions as a latent variable, but decided it wasn't realistic considering the complexity of implementation and maintainability.

Shifting to Multi-Parameter Optimization

After much trial and error, I arrived at a method of "integrating multiple evaluation axes" in question design. This method aims to obtain more information with a single question by combining questions about multiple evaluation axes with high uncertainty.


Practical Approaches to Improve Question Quality

I found that incorporating the following elements was effective in improving the quality of questions:

  1. Strategic integration of multiple evaluation axes

    • Combine correlated axes like "technical skills" and "market analysis"
    • Prioritize combining axes with high uncertainty
  2. Integration with knowledge graphs

  3. Question design considering respondent psychology

    • Empathetic introductions ("I understand about ~. Next, about ~")
    • Explicit requests for specific examples or numbers (avoiding abstract answers)
    • Contextual continuity by referencing previous answers

Examples and Effects of Question Design

Here's an example of an integrated question designed with this approach:

"I understand your company's technology stack. Next, I'd like to ask about technical advantages and market differentiation. How does this core technology create differences from competitors? Specifically, how does it provide technical solutions to customer needs in your target market? If you have specific examples or numbers, please share them."

This question is designed to extract information about both "technical skills" and "market analysis" at once. When I actually used it, compared to single-axis questions:

  • Information density improved: Able to update multiple evaluation axes from a single answer
  • Conversation naturalness improved: AI dialogue became a "natural interview" rather than a "mechanical questionnaire"
  • Interview time reduced: Fewer questions needed to obtain the same amount of information

Practical Use of Entropy

I found that monitoring uncertainty (entropy) is useful not just for question selection but for optimizing the entire interview process.

  • Convergence detection: When entropy falls below a certain value, it can be judged that sufficient information has been gathered
  • Objective criteria for session ending: End when entropy for all evaluation axes falls below a threshold
  • Switching to exploration mode: Move to more exploratory questions once uncertainty about major evaluation axes is resolved

This allowed for optimal use of limited time, improving both the efficiency and quality of interviews.
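A convergence check along these lines might look like the following sketch; the threshold value and parameter layout are assumptions for illustration:

```python
import math

ENTROPY_THRESHOLD = 0.5  # illustrative value; tuned per application

def normal_entropy(std: float) -> float:
    """Differential entropy of a normal distribution."""
    return 0.5 * math.log(2 * math.pi * math.e * std ** 2)

def interview_converged(parameters: dict, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """End the session once every axis's entropy is below the threshold."""
    return all(normal_entropy(p['std']) < threshold for p in parameters.values())

# Example: one axis is still uncertain, so the interview continues
params = {
    'technical_skills': {'mean': 3.8, 'std': 0.3},
    'market_analysis': {'mean': 3.1, 'std': 0.9},
}
```

With these numbers, `technical_skills` has converged but `market_analysis` has not, so the next question would target market analysis (or an integrated question including it).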

For a deeper understanding of context using knowledge graphs, please refer to Knowledge Graph Design & Implementation Guide: Realizing Dialog Systems through Relationship Modeling.

Results and Insights from Implementation

After actually implementing this evaluation system using Bayesian updates, I gained several interesting results and insights.

  1. Improved evaluation stability: Compared to simple weighted averages, evaluations converged more stably

  2. Visualization of uncertainty: Being able to explicitly show confidence in evaluations through standard deviation enabled distinguishing between "certain evaluations" and "uncertain evaluations"

  3. Efficient information collection: Able to efficiently improve evaluation precision by prioritizing questions about evaluation axes with high uncertainty

  4. Evaluation transparency: Able to mathematically explain why an evaluation resulted as it did

What was particularly impressive was that even the same "3 points" can have greatly different uncertainties. For example, I could now distinguish between evaluations like "technical skills are 3 points, but responses to technical questions were vague, so uncertainty is high" and "technical skills are 3 points, and they could explain specific implementation methods, so confidence is high."

Also, when delegating evaluation to AI, it's important that evaluation processes can be explained in a way humans can understand. Thanks to its mathematical foundation, the evaluation system using Bayesian updates can clearly explain "why this evaluation resulted." This was very important for enhancing the reliability of AI evaluations.

Summary and Future Directions

Uncertainty modeling through Bayesian updates is a powerful approach to the challenge of quantifying subjective evaluations by AI. It's particularly strong in its ability to handle both evaluation and uncertainty simultaneously, and to update appropriately with new information.

While I applied this method to an investment evaluation system, its range of applications is wide. For example:

  • Customer support quality evaluation
  • Learner skill assessment
  • Skill matching evaluation for recruitment candidates
  • Analysis of customer satisfaction or product evaluations

As for future developments, the following extensions are possible:

  1. Extension to multivariate normal distributions: Modeling correlations between evaluation axes
  2. Integration with Bayesian networks: Modeling more complex causal relationships
  3. Combination with active learning: Efficient information collection based on uncertainty

Finally, I want to emphasize that the essence of this method is "not fearing uncertainty." Rather, by explicitly modeling uncertainty and utilizing it, we can build more reliable evaluation systems. This is an important perspective in collaboration between AI and humans.

While the mathematical models and implementation details may seem complex, the basic idea is intuitive. The principle of "giving more weight to information with high confidence and tracking uncertainty quantitatively" can be applied to various evaluation systems. I hope you'll try incorporating it into your own projects.
