Exploring "Understanding" That Cannot Be Expressed by Numbers Alone
What does it mean for an AI system to "understand" someone? This was the fundamental question I faced while developing an investment interview system.
From Education to Knowledge Graphs
I originally worked in educational technology research. In educational contexts, it's understood that measuring a learner's understanding with numerical scores alone is insufficient—the connections between concepts and knowledge structuring are crucial. For example, to understand the concept of "differentiation," one needs to grasp related concepts like "functions," "limits," and "derivatives."
In education, tools like knowledge maps and concept maps are used to visualize these relationships between concepts. I found these tools extremely effective in deepening learners' understanding. Rather than simply assigning a numerical evaluation like "differentiation comprehension: 80 points," structuring the connections with related concepts promotes a true understanding of the subject.
Application to Conversational Systems and My Realization
This experience directly led to my investment interview system development. As an investor interviewing entrepreneurs, I always felt something was missing. Does rating someone as "3 out of 5 for technical capability" or "4 points for market analysis" really mean we understand them?
The investment interview system I initially began creating also focused primarily on estimating latent trait values. However, this approach made conversations extremely mechanical. For example, exchanges would look like this:
System: "Please tell me about your technical capabilities."
Entrepreneur: "I have a background in educational technology, specializing in learning analytics and adaptive learning systems."
System: "Please tell me about your market size."
I had a moment of realization—this isn't a dialogue but an interrogation. Instead of listening to someone's response and asking related follow-up questions, I was simply asking a series of questions I wanted answers to. This approach would neither make the person comfortable nor draw out their true strengths and characteristics.
What I wanted to achieve was natural question flow based on the person's responses, like: "I see you have strengths in educational technology. How are you utilizing learning analytics data?"
Even AI Interviews Shouldn't Feel Impersonal
Even when the interviewer is AI, people wouldn't want to interact with something that only asks questions without genuinely trying to understand them. The user experience of those being interviewed certainly deserves consideration, and people give better responses when they feel comfortable answering. That's why it's important, while estimating someone's abilities, to also absorb the words, domains, and terms they value, and then use that knowledge to show empathy and choose the next best question.
The extreme questioning I mentioned earlier isn't a dialogue with the person. It's merely an interrogation that asks only what you want to know. Even when delegating work to AI, I want to properly design such dialogues and enable AI to create conversations that feel even better than human ones.
Knowledge Graphs as a Solution
That's when I focused on knowledge graphs. This method represents concepts (nodes) and their relationships (edges) in a graph structure and can be seen as an evolution of the knowledge maps used in education.
With knowledge graphs, I could structure concepts like "learning analytics," "education market," and "data utilization" along with their relationships. I believed this structured information could generate more natural and empathetic questions.
What I wanted to achieve was true understanding that captures the relationships between concepts behind someone's words, not just numerical evaluations. And based on that understanding, I wanted to create natural conversations.
Fundamentals of Knowledge Graphs and My Approach
From Graph Theory to Implementation
When implementing knowledge graphs, I first relearned the basics of graph theory. A graph is a mathematical structure defined as a collection of nodes (vertices) and edges.
Formally, a graph is represented as G = (V, E), where V is the set of nodes and E is the set of edges. To represent this in code, I designed the following interfaces:
```typescript
interface Node {
  id: string;
  label: string;
  type: string;          // 'Person', 'Technology', 'Market', etc.
  importance: number;    // Importance (0-1)
  sigma: number;         // Uncertainty
  description?: string;  // Description (optional)
}

interface Edge {
  id: string;
  source: string;  // Source node ID
  target: string;  // Target node ID
  label: string;   // Type of relationship
  weight: number;  // Strength of relationship (0-1)
  sigma: number;   // Uncertainty
}
```
The key point was including "importance" and "uncertainty" parameters for both nodes and edges. This was to maintain consistency with the latent trait estimation approach that formed the foundation of my investment interview system.
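For illustration, here is a minimal sketch of what a small fragment of such a graph might look like after a single answer; the IDs, labels, and values are hypothetical rather than actual system output.

```typescript
// Illustrative fragment only: one Technology node, one Market node, one relationship
const nodes: Node[] = [
  { id: "n1", label: "learning analytics", type: "Technology", importance: 0.9, sigma: 0.3 },
  { id: "n2", label: "education market", type: "Market", importance: 0.7, sigma: 0.5 },
];

const edges: Edge[] = [
  {
    id: "e1",
    source: "n1",
    target: "n2",
    label: "applies to",
    weight: 0.8, // a strong relationship
    sigma: 0.4,  // but still fairly uncertain after only one answer
  },
];
```

As the conversation continues, new answers add nodes and edges, and the sigma values can shrink as statements reinforce one another.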
The Concept of Centrality and Its Application
One concept in graph theory I found particularly interesting was centrality. This is a measure of a node's "importance" within a graph and can be calculated in various ways.
I particularly focused on the following four standard measures commonly used in social network analysis:
- Degree Centrality: Simply counts the number of edges connected to a node. Nodes connected to many concepts are considered important.
- Closeness Centrality: The reciprocal of the sum of a node's shortest-path distances to all other nodes. Related to how quickly information can spread from that node.
- Betweenness Centrality: The fraction of shortest paths between other node pairs that pass through a node. Indicates the ability to control information flow.
- Eigenvector Centrality: The idea that nodes connected to important nodes are themselves important. This forms the basis of Google's PageRank algorithm.
After experimentation, I found that eigenvector centrality best suited my purposes. This measure doesn't just reward concepts with many connections; it assigns higher value to concepts that are connected to other important concepts. For example, if the concept of "data analysis" appears frequently in an entrepreneur's conversation and connects to many other important concepts, it's likely at the center of that entrepreneur's strengths and interests.
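For reference, the standard definition: with adjacency matrix A, a node's eigenvector centrality is proportional to the sum of its neighbors' centralities, which makes the centrality vector the principal eigenvector of A.

```latex
x_v = \frac{1}{\lambda} \sum_{u} A_{vu} \, x_u
\qquad \Longleftrightarrow \qquad
A \mathbf{x} = \lambda \mathbf{x}
```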
Why Focus on Knowledge Graphs Instead of RAG?
Considering the Standard RAG Approach
Initially, I also considered the RAG (Retrieval Augmented Generation) approach, which is commonly used with LLMs. RAG divides large texts into chunks, vectorizes them, and retrieves relevant information through similarity searches.
However, even before extensive experimentation, I realized RAG had fundamental limitations for an investment interview system:
- Information Volume Constraints: RAG inherently requires a certain amount of data. In the early stages of an investment interview, we know very little about the person. Even if we have a PDF company profile, it's insufficient—we need to gather information through conversation.
- Context Fragmentation: When dividing text into chunks, the overall structure of the document and relationships between concepts are lost. This doesn't constitute "understanding."
- Searches Based Only on Surface Similarities: RAG searches based on word-level similarity, failing to capture semantic and structural relationships.
While RAG libraries and infrastructure are quite developed, making them convenient to use, I felt they weren't the fundamental approach needed. So I initially considered alternative approaches, though I haven't ruled out exploring RAG in the future.
Why Knowledge Graphs Were Suitable
Knowledge graphs, on the other hand, explicitly represent concepts and relationships, making them suitable for building a complete picture even from limited information. In investment interviews, we gradually extract important concepts and their relationships from conversations and use them to ask further questions. Knowledge graphs naturally model this process.
For me, the decisive factor was the realization that "understanding" isn't merely having information but grasping the relationships between pieces of information. Knowledge graphs explicitly represent these relationships.
Combining Estimation with Knowledge Graphs
In the investment interview system, we estimate a latent trait value for each of the entrepreneur's capabilities. This value is represented by a normal distribution N(μ, σ²), where the mean μ is the estimated value and the standard deviation σ represents uncertainty.
What I wanted to achieve was a hybrid approach leveraging the strengths of both estimation and knowledge graphs. With estimation alone, we only get impersonal information like "this entrepreneur's technical capability is 3.5 out of 5." Combined with knowledge graphs, however, we can gain structured, concrete understanding such as "they have strengths in educational technology, with particular innovation at the intersection of data analysis and personalized learning."
I believed this combination could achieve both the objectivity of numerical evaluation and the richness of conceptual understanding.
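As a rough sketch of what this hybrid looks like at the data level (the type names here are hypothetical, and Node/Edge are the interfaces shown earlier):

```typescript
// Hypothetical sketch: numerical trait estimates alongside the knowledge graph
interface TraitEstimate {
  trait: string;  // e.g. "technical capability"
  mean: number;   // estimated value (mean of the normal distribution)
  sigma: number;  // uncertainty (standard deviation)
}

interface EntrepreneurProfile {
  traits: TraitEstimate[];                  // the objective, numerical side
  graph: { nodes: Node[]; edges: Edge[] };  // the structured, conceptual side
}

// The numbers say how strong someone is; the high-centrality parts of the
// graph say where that strength lies and how it connects to other concepts.
```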
Knowledge Graph Design and Implementation: A Record of Trial and Error
Entity and Relationship Extraction: Prompt Design Trials
The first challenge in building knowledge graphs was extracting entities and relationships from text. I decided to use LLMs for this, but here I faced the difficulty of prompt design.
I first tried a very simple prompt:
```
Please extract important concepts and relationships from the following text.

Text: [text]
```
However, this method produced too much variation in output. The granularity of extracted concepts was inconsistent—sometimes extracting broad concepts like "AI technology," other times extracting detailed concepts like "neural networks," "deep learning," and "natural language processing." This made it impossible to maintain consistency across the entire knowledge graph.
I particularly struggled with aligning the granularity of concepts. Should they be divided at the word level, phrase level, or by meaningful segments of text? After much trial and error, I settled on the following more detailed prompt:
```
Extract important concepts (entities) and their relationships from the following text and output in JSON format.

[TEXT]
${text}

[EXTRACTION POINTS]
1. Identify important concepts and determine their types (Technology, Market, Person, etc.)
2. Identify relationships between concepts (e.g., "develops," "applies," "depends on")
3. Evaluate the importance and uncertainty of each concept, and the strength and uncertainty of relationships on a scale of 0.1 to 1.0
4. Express concepts as 2-3 word noun phrases, avoiding overly specific or broad concepts

[OUTPUT FORMAT]
{
  "concepts": [
    {
      "concept": "concept name",
      "type": "concept type",
      "importance": importance value 0.1-1.0,
      "sigma": uncertainty value 0.1-1.0
    },
    ...
  ],
  "relationships": [
    {
      "source": "source concept name",
      "target": "target concept name",
      "relation": "relationship type",
      "strength": relationship strength 0.1-1.0,
      "sigma": uncertainty value 0.1-1.0
    },
    ...
  ]
}
```
The key points of this prompt were:
- Explicitly specifying concept granularity: Including the instruction "2-3 word noun phrases"
- Classifying concept types: Promoting structure by requesting classifications like Technology, Market, and Person
- Evaluating both importance and uncertainty: Maintaining consistency with estimation
- Clearly specifying output format: Showing the exact JSON format for easier parsing
Even so, there was considerable variation in the outputs. Different concept sets would be extracted from the same text in different runs. The "importance" and "sigma" values in particular could change significantly between runs.
JSON Parsing Struggles and Reinventing the Wheel
I also struggled when parsing LLM outputs as JSON. Initially, I used a plain `JSON.parse()`, but I learned that LLMs don't always output perfectly valid JSON. So I implemented error handling like this:
```typescript
try {
  const knowledge = JSON.parse(response);
  return validateAndNormalizeKnowledge(knowledge);
} catch (error) {
  console.error("JSON parsing error:", error);
  // If parsing fails, remove extraneous characters and retry
  const extractedJson = extractJsonFromText(response);
  try {
    const knowledge = JSON.parse(extractedJson);
    return validateAndNormalizeKnowledge(knowledge);
  } catch (error) {
    console.error("Second JSON parsing error:", error);
    throw new Error("Failed to parse LLM response");
  }
}
```
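The `extractJsonFromText` helper isn't shown above. A minimal version, assuming the model wraps its JSON in explanatory text or a Markdown code fence, might simply slice out the outermost braces:

```typescript
// Hypothetical helper: pull the outermost JSON object out of a noisy LLM response
function extractJsonFromText(text: string): string {
  // Strip Markdown code fences if present
  const withoutFences = text.replace(/```(?:json)?/g, "");
  // Keep everything between the first "{" and the last "}"
  const start = withoutFences.indexOf("{");
  const end = withoutFences.lastIndexOf("}");
  if (start === -1 || end === -1 || end <= start) {
    throw new Error("No JSON object found in response");
  }
  return withoutFences.slice(start, end + 1);
}
```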
After implementing this parsing process, I discovered LangChain's `withStructuredOutput` feature and realized it was exactly the wheel I had been reinventing.
```typescript
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

// Zod schema definition
const extractedKnowledgeSchema = z.object({
  concepts: z.array(
    z.object({
      concept: z.string(),
      type: z.string(),
      importance: z.number().min(0.1).max(1.0),
      sigma: z.number().min(0.1).max(1.0),
    })
  ),
  relationships: z.array(
    z.object({
      source: z.string(),
      target: z.string(),
      relation: z.string(),
      strength: z.number().min(0.1).max(1.0),
      sigma: z.number().min(0.1).max(1.0),
    })
  ),
});

// Initialize the LLM with structured output bound to the schema
const model = new ChatOpenAI({
  modelName: "gpt-4",
  temperature: 0.1,
}).withStructuredOutput(extractedKnowledgeSchema);
```
This method uses a Zod schema to specify the output format and directly converts LLM output into type-safe objects. It was exactly the functionality I had been struggling to create. I strongly felt that reinventing the wheel is a waste of time.
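Usage then becomes a single call. This is a minimal sketch, assuming `model` is the structured-output model defined above and abbreviating the extraction prompt shown earlier:

```typescript
// Example answer text (taken from the sample dialogue earlier in this article)
const answerText =
  "I have a background in educational technology, specializing in learning analytics and adaptive learning systems.";

// The result is already parsed and validated against the Zod schema
const knowledge = await model.invoke(
  `Extract important concepts (entities) and their relationships from the following text...\n\n${answerText}`
);

console.log(knowledge.concepts.map((c) => c.concept));
console.log(knowledge.relationships.length);
```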
However, `withStructuredOutput` is separate from LangChain's knowledge graph functionality. LangChain has recently added modules specifically for knowledge graphs, but I wasn't aware of them during my initial implementation due to insufficient research. Therefore, I decided to use LangChain's functionality only for structuring JSON output while implementing the knowledge graph construction and utilization myself.
Updating Knowledge Graphs and Maintaining Consistency
What I particularly struggled with when creating knowledge graphs was updating the graph and maintaining consistency. The graph needs to be updated whenever new information is acquired. The main challenges were:
- Identifying Identical Concepts: How to handle cases where the same concept appears with different expressions, like "AI" and "artificial intelligence."
- Updates Considering Reliability: How to weight updates based on the reliability (inverse of uncertainty) of new and existing information.
- Maintaining Graph Consistency: How to ensure relationships across the entire graph remain consistent after updates.
Simple string matching was insufficient for identifying identical concepts. It was difficult to handle abbreviations like "machine learning" and "ML" or similar concepts like "education system" and "learning system." This area is still in the experimentation stage, but I'm considering adopting semantic matching. Ideally, I'd like to incorporate this into the prompt stage where meanings are extracted. I'm curious to see how LangChain handles this and plan to investigate and experiment.
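As one example of what semantic matching could look like (a sketch of the direction I'm considering, not something the system currently does; it assumes the `OpenAIEmbeddings` client from `@langchain/openai` and a similarity threshold that would need tuning):

```typescript
import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings();

// Cosine similarity between two embedding vectors
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
  const normB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
  return dot / (normA * normB);
}

// Treat two concept labels as the same node if their embeddings are close enough.
// The 0.9 threshold is a guess and would need tuning.
async function isSameConcept(labelA: string, labelB: string): Promise<boolean> {
  const [vecA, vecB] = await embeddings.embedDocuments([labelA, labelB]);
  return cosineSimilarity(vecA, vecB) > 0.9;
}
```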
Centrality Calculation and Application to Question Generation
After constructing the knowledge graph, what interested me most was calculating and utilizing centrality. I particularly believed that eigenvector centrality would be useful for identifying the entrepreneur's strengths and core interests.
Initially, I considered Python's NetworkX library for implementation simplicity, but ultimately decided on a TypeScript implementation for technical stack consistency. I chose Graphology as a library capable of calculating centrality in TypeScript. Looking back, I think I would adopt Neo4j in the future for better compatibility with LangChain's knowledge graph module. However, I had no complaints about Graphology itself—calculating centrality was very straightforward.
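To give a sense of how straightforward it was, here is a minimal Graphology sketch; the node labels and weights are illustrative, and the exact import path for `graphology-metrics` may differ between versions:

```typescript
import Graph from "graphology";
import eigenvectorCentrality from "graphology-metrics/centrality/eigenvector";

// Build a small graph (labels and weights are illustrative)
const graph = new Graph();
graph.addNode("data analysis", { importance: 0.9 });
graph.addNode("personalized learning", { importance: 0.8 });
graph.addNode("education market", { importance: 0.6 });
graph.addEdge("data analysis", "personalized learning", { weight: 0.9 });
graph.addEdge("personalized learning", "education market", { weight: 0.7 });

// Compute eigenvector centrality (returns a node -> score mapping)
const centrality = eigenvectorCentrality(graph);

// The highest-centrality concepts become candidate topics for the next question
const topConcepts = Object.keys(centrality).sort(
  (a, b) => centrality[b] - centrality[a]
);
console.log(topConcepts.slice(0, 2));
```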
By focusing on important knowledge through centrality, I was able to generate somewhat more natural questions focused on a person's strengths, like "How are you applying data analysis techniques in education systems to achieve personalization?" rather than impersonal questions like "How is your technical capability?"
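Concretely, the highest-centrality concepts can be fed into the question prompt. A hypothetical sketch (reusing `ChatOpenAI` from earlier; the concept list stands in for the centrality ranking above):

```typescript
import { ChatOpenAI } from "@langchain/openai";

// Concepts ranked highest by centrality in the sketch above (hypothetical values)
const topConcepts = ["data analysis", "personalized learning"];

const chatModel = new ChatOpenAI({ modelName: "gpt-4", temperature: 0.7 });

// Ask for one follow-up question anchored in the person's own strongest concepts
const question = await chatModel.invoke(
  `The entrepreneur's answers so far center on: ${topConcepts.join(", ")}.
Ask one natural follow-up question that builds on these strengths and shows
the previous answer was understood, instead of jumping to an unrelated topic.`
);

console.log(question.content);
```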
Lessons Learned from Practice
Prompt Design and Dealing with Output Variation
One of the most important lessons I learned through this development was the importance of LLM prompt design. Output variation becomes a significant problem, especially when generating data structures like knowledge graphs.
The countermeasures I found effective were:
- Detailed Instructions and Examples: Rather than simply saying "extract concepts," specifying the granularity and types of concepts and providing examples improved consistency.
- Optimizing the Temperature Parameter: Setting low values like `temperature: 0.1` increased output stability and reproducibility.
- Specifying Structured Output: Using JSON or Zod schemas to strictly specify output formats.
- Implementing Post-processing: Implementing normalization and validation as post-processing to handle remaining variations.
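For example, a hypothetical version of the `validateAndNormalizeKnowledge` step referenced earlier (using the type inferred from the Zod schema shown above) might look like this:

```typescript
import { z } from "zod";

type ExtractedKnowledge = z.infer<typeof extractedKnowledgeSchema>;

// Hypothetical post-processing: normalize labels, clamp values, drop dangling relationships
function validateAndNormalizeKnowledge(knowledge: ExtractedKnowledge): ExtractedKnowledge {
  const clamp = (v: number) => Math.min(1.0, Math.max(0.1, v));

  const concepts = knowledge.concepts.map((c) => ({
    ...c,
    concept: c.concept.trim().toLowerCase(), // align casing and whitespace across runs
    importance: clamp(c.importance),
    sigma: clamp(c.sigma),
  }));

  const known = new Set(concepts.map((c) => c.concept));

  // Drop relationships that reference concepts the LLM never actually listed
  const relationships = knowledge.relationships
    .map((r) => ({
      ...r,
      source: r.source.trim().toLowerCase(),
      target: r.target.trim().toLowerCase(),
    }))
    .filter((r) => known.has(r.source) && known.has(r.target));

  return { concepts, relationships };
}
```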
I researched and tried various best practices, but many challenges remained. In retrospect, customizing LangChain's existing prompts would likely have produced higher quality results and gotten me to my goal faster than writing prompts from scratch. Of course, making the detailed adjustments myself was a learning experience.
Prioritizing Conversation Quality and User Experience
What I valued most in this development was conversation quality and user experience. While the technical completeness of the system is important, I believed the key to success was whether the person could comfortably engage in conversation.
To achieve this, I kept the following points in mind:
- Context-Aware Questions: Generating questions based on the person's previous statements, not just extracting information.
- Focus on Strengths and Interests: Promoting positive dialogue by focusing on the person's strengths and interests.
- Natural Conversation Flow: Creating a natural conversation flow rather than mechanically listing questions.
- Empathetic Responses: Showing empathetic responses to the person's statements, not just recording information.
With these considerations, the system functions not merely as an "evaluation tool" but as a "conversation partner."
Lessons on Avoiding Reinventing the Wheel
What I strongly felt during the development process was the importance of avoiding reinventing the wheel. I had been implementing basic functions like JSON parsing and validation myself, but these could have been easily solved using existing tools like LangChain's `withStructuredOutput`.
From this experience, I learned the following lessons:
- Thorough Research Before Implementation: Before implementing anything, thoroughly investigate whether existing libraries or tools already provide similar functionality.
- Following Latest Trends: The AI field evolves rapidly, so regularly check the latest libraries and tools.
- Ensuring Modularity: Even when custom implementation is necessary, ensure high modularity for easy replacement with existing tools later.
- Focusing on Core Business Logic: Rather than spending time reinventing the wheel, focus on core logic that creates unique value for the project.
I want to keep these lessons in mind for future projects.
Future Prospects and Next Steps
Fusion of RAG and Knowledge Graphs
Through development, I became particularly interested in the possibility of fusing RAG and knowledge graphs. RAG excels at searching through large amounts of information, while knowledge graphs excel at expressing relationships. I believe combining these approaches could achieve a more advanced level of "understanding."
Specifically, I'm considering a hybrid approach that integrates RAG search results into knowledge graphs and generates questions or answers based on this structured knowledge.
Introduction of Temporal Dimension
The current knowledge graph is static, but I'm considering introducing a temporal dimension in the future. For example, capturing temporal changes like "previously focused on the education market, but recently expanding into the healthcare market" would enable deeper understanding.
This would require adding time information to each node and edge in the knowledge graph and tracking changes over time.
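One possible shape for this, extending the Node and Edge interfaces from earlier (hypothetical, not yet implemented):

```typescript
// Track when each piece of knowledge was observed and when it stopped being current
interface TemporalNode extends Node {
  observedAt: string;   // ISO 8601 timestamp of when the concept first appeared
  validUntil?: string;  // set when a later statement supersedes it
}

interface TemporalEdge extends Edge {
  observedAt: string;
  validUntil?: string;
}
```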
Utilizing LangChain's Knowledge Graph Functionality
I discovered this only after completing much of my implementation, but recent LangChain versions have enhanced their knowledge graph-related functionality. I believe replacing many parts of my custom implementation with LangChain modules would create a more robust and feature-rich system. After actually building the system, I found many areas where I would have preferred to rely less on custom implementation, so I'd like to try such libraries in the future. In particular, utilizing LangChain's Neo4j integration would enable more advanced graph queries and centrality calculations.
In Conclusion
Through developing this investment interview system, I was deeply prompted to consider the essence of "understanding." I realized that true understanding comes from capturing relationships between concepts, not merely numerical evaluations or surface similarities.
Knowledge graphs are powerful tools for achieving this "understanding," but they aren't perfect. Many challenges remain, such as the difficulty of prompt design and output variations. Nevertheless, I'm confident this approach has the potential to greatly improve the quality of human-AI dialogue.
I want to continue exploring ways to achieve more natural and empathetic conversations. And I aim to create systems that enable true dialogue, not mere interrogation.