Overview

The Knowledge Graph (KG) service is a foundational microservice in FlowX.AI’s AI Agent ecosystem that provides a distributed graph database solution for managing AI Agent state, enabling Retrieval-Augmented Generation (RAG), and facilitating multi-agent collaboration.
It uses DGraph as the underlying graph database and underpins AI Agent operations, conversation state management, and cross-agent data sharing.

Key Capabilities

AI Agent State Management

Persistent storage and retrieval of AI Agent states across horizontal scaling scenarios

RAG Support

Enhanced LLM prompts with domain-specific data through efficient graph-based retrieval

Multi-Agent Collaboration

Transparent agent integration with shared context and state management

Connected Data

Property graph implementation for scalable, queryable data relationships

Infrastructure Prerequisites

Before setting up the Knowledge Graph service, ensure the following components are installed and configured:
  • A DGraph cluster (Zero and Alpha nodes)
  • Kafka (the FlowX event backbone)
  • Redis (caching and shared state)
  • The FlowX process engine and advancing controller (for platform integration)

Configuration Parameters

Core Service Configuration

Core Service Settings
# Knowledge Graph Service
KG_SERVICE_NAME=knowledge-graph
KG_SERVICE_PORT=8080
KG_SERVICE_HOST=0.0.0.0
KG_HEALTH_CHECK_INTERVAL=30s

# Service Discovery
SPRING_APPLICATION_NAME=knowledge-graph
SERVER_PORT=8080

DGraph Cluster Configuration

DGraph Settings
# DGraph Endpoints
DGRAPH_ALPHA_ENDPOINT=http://dgraph-alpha:8080
DGRAPH_ADMIN_ENDPOINT=http://dgraph-alpha:8080/admin
DGRAPH_ZERO_ENDPOINT=http://dgraph-zero:5080

# Cluster Configuration
DGRAPH_ALPHA_REPLICAS=3
DGRAPH_ZERO_REPLICAS=3
DGRAPH_STORAGE_PATH=/dgraph/data

# Performance Tuning
DGRAPH_MEMORY_MB=8192
DGRAPH_CACHE_SIZE_MB=2048
DGRAPH_MAX_CONNECTIONS=100
DGRAPH_QUERY_TIMEOUT=300s
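
For a quick client-side sanity check of these settings, the sketch below (illustrative, not part of the shipped service) reads DGRAPH_ALPHA_ENDPOINT and DGRAPH_QUERY_TIMEOUT from the environment and calls the Alpha /health endpoint using only the JDK HTTP client.

Connectivity Check (illustrative sketch)
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class DgraphHealthProbe {

    public static void main(String[] args) throws Exception {
        // Values normally injected via the environment variables documented above.
        String alphaEndpoint = System.getenv()
                .getOrDefault("DGRAPH_ALPHA_ENDPOINT", "http://dgraph-alpha:8080");
        // DGRAPH_QUERY_TIMEOUT is a duration such as "300s"; parse just the seconds here.
        Duration timeout = Duration.ofSeconds(Long.parseLong(
                System.getenv().getOrDefault("DGRAPH_QUERY_TIMEOUT", "300s").replace("s", "")));

        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();

        // DGraph Alpha exposes /health on its HTTP port (8080 by default).
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(alphaEndpoint + "/health"))
                .timeout(timeout)
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("DGraph Alpha health: " + response.statusCode() + " " + response.body());
    }
}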

Integration Configuration

FlowX Integration
# Kafka Configuration
KAFKA_BOOTSTRAP_SERVERS=${KAFKA_BOOTSTRAP_SERVERS}
KAFKA_CONSUMER_GROUP_ID=kg-service-group
KAFKA_AUTO_OFFSET_RESET=earliest

# Kafka Topics
KAFKA_TOPICS_AI_AGENT_STATE=ai-agent-state
KAFKA_TOPICS_MULTI_AGENT_COLLAB=multi-agent-collaboration
KAFKA_TOPICS_RAG_QUERIES=rag-queries

# FlowX Service Integration
FLOWX_ENGINE_ENDPOINT=http://process-engine:8080
FLOWX_ADVANCING_CONTROLLER_ENDPOINT=http://advancing-controller:8080

# Redis Configuration
REDIS_HOST=${REDIS_HOST}
REDIS_PORT=${REDIS_PORT}
REDIS_PASSWORD=${REDIS_PASSWORD}
REDIS_DATABASE=5
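
As a reference for the Kafka side of this configuration, here is a minimal Spring Kafka listener sketch that subscribes to the ai-agent-state topic with the consumer group configured above; the listener class and payload handling are illustrative, not part of the shipped service.

Kafka Listener (illustrative sketch)
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class AgentStateListener {

    // Topic and group id resolve from the environment variables documented above
    // (KAFKA_TOPICS_AI_AGENT_STATE, KAFKA_CONSUMER_GROUP_ID).
    @KafkaListener(
            topics = "${KAFKA_TOPICS_AI_AGENT_STATE:ai-agent-state}",
            groupId = "${KAFKA_CONSUMER_GROUP_ID:kg-service-group}")
    public void onAgentState(String payload) {
        // Persist or update the corresponding agent state in the Knowledge Graph.
        // The actual message schema is defined by the producing service.
        System.out.println("Received agent state event: " + payload);
    }
}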

Security Configuration

Basic Auth (DGraph OSS)
# Shared token authentication
DGRAPH_AUTH_TOKEN=${DGRAPH_AUTH_TOKEN}
DGRAPH_AUTH_ENABLED=true

# TLS Configuration
DGRAPH_TLS_ENABLED=false
DGRAPH_TLS_CERT_PATH=/etc/ssl/certs/dgraph.crt
DGRAPH_TLS_KEY_PATH=/etc/ssl/private/dgraph.key
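
When DGRAPH_AUTH_ENABLED is true, clients must attach the shared token to every DGraph request. A minimal sketch, assuming the token travels in DGraph's X-Dgraph-AuthToken HTTP header (the mechanism used by DGraph's shared-token security option):

Authenticated Request (illustrative sketch)
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class DgraphRequests {

    // Builds a request to the Alpha /graphql endpoint carrying the shared auth token.
    // Endpoint and token come from the environment variables documented above.
    public static HttpRequest graphqlRequest(String jsonBody) {
        String endpoint = System.getenv()
                .getOrDefault("DGRAPH_ALPHA_ENDPOINT", "http://dgraph-alpha:8080");
        String token = System.getenv("DGRAPH_AUTH_TOKEN");

        HttpRequest.Builder builder = HttpRequest.newBuilder()
                .uri(URI.create(endpoint + "/graphql"))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody));

        if (token != null && !token.isBlank()) {
            builder.header("X-Dgraph-AuthToken", token);
        }
        return builder.build();
    }
}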

Deployment

Docker Compose Deployment

version: '3.8'

services:
  # DGraph Zero Nodes (Cluster Management)
  dgraph-zero-1:
    image: dgraph/dgraph:latest
    command: |
      dgraph zero 
      --my=dgraph-zero-1:5080 
      --replicas=3 
      --raft="idx=1"
    ports:
      - "5080:5080"
    volumes:
      - dgraph-zero-1:/dgraph
    networks:
      - dgraph-network

  dgraph-zero-2:
    image: dgraph/dgraph:latest
    command: |
      dgraph zero 
      --my=dgraph-zero-2:5080 
      --replicas=3 
      --raft="idx=2" 
      --peer=dgraph-zero-1:5080
    ports:
      - "5081:5080"
    volumes:
      - dgraph-zero-2:/dgraph
    networks:
      - dgraph-network
    depends_on:
      - dgraph-zero-1

  dgraph-zero-3:
    image: dgraph/dgraph:latest
    command: |
      dgraph zero 
      --my=dgraph-zero-3:5080 
      --replicas=3 
      --raft="idx=3" 
      --peer=dgraph-zero-1:5080
    ports:
      - "5082:5080"
    volumes:
      - dgraph-zero-3:/dgraph
    networks:
      - dgraph-network
    depends_on:
      - dgraph-zero-1

  # DGraph Alpha Nodes (Data Storage)
  dgraph-alpha-1:
    image: dgraph/dgraph:latest
    command: |
      dgraph alpha 
      --my=dgraph-alpha-1:7080 
      --zero=dgraph-zero-1:5080,dgraph-zero-2:5080,dgraph-zero-3:5080
      --security="whitelist=0.0.0.0/0"
    ports:
      - "8080:8080"
      - "9080:9080"
    volumes:
      - dgraph-alpha-1:/dgraph
    networks:
      - dgraph-network
    depends_on:
      - dgraph-zero-1
      - dgraph-zero-2
      - dgraph-zero-3

  dgraph-alpha-2:
    image: dgraph/dgraph:latest
    command: |
      dgraph alpha 
      --my=dgraph-alpha-2:7080 
      --zero=dgraph-zero-1:5080,dgraph-zero-2:5080,dgraph-zero-3:5080
      --security="whitelist=0.0.0.0/0"
    ports:
      - "8081:8080"
      - "9081:9080"
    volumes:
      - dgraph-alpha-2:/dgraph
    networks:
      - dgraph-network
    depends_on:
      - dgraph-zero-1
      - dgraph-zero-2
      - dgraph-zero-3

  dgraph-alpha-3:
    image: dgraph/dgraph:latest
    command: |
      dgraph alpha 
      --my=dgraph-alpha-3:7080 
      --zero=dgraph-zero-1:5080,dgraph-zero-2:5080,dgraph-zero-3:5080
      --security="whitelist=0.0.0.0/0"
    ports:
      - "8082:8080"
      - "9082:9080"
    volumes:
      - dgraph-alpha-3:/dgraph
    networks:
      - dgraph-network
    depends_on:
      - dgraph-zero-1
      - dgraph-zero-2
      - dgraph-zero-3

  # Knowledge Graph Service
  knowledge-graph:
    image: flowx/knowledge-graph:latest
    ports:
      - "8090:8080"
    environment:
      - DGRAPH_ALPHA_ENDPOINT=http://dgraph-alpha-1:8080
      - KAFKA_BOOTSTRAP_SERVERS=kafka:9092
      - REDIS_HOST=redis
      - SPRING_PROFILES_ACTIVE=docker
    networks:
      - dgraph-network
      - flowx-network
    depends_on:
      - dgraph-alpha-1
      - dgraph-alpha-2
      - dgraph-alpha-3

volumes:
  dgraph-zero-1:
  dgraph-zero-2:
  dgraph-zero-3:
  dgraph-alpha-1:
  dgraph-alpha-2:
  dgraph-alpha-3:

networks:
  dgraph-network:
    driver: bridge
  flowx-network:
    external: true

Kubernetes Deployment

apiVersion: v1
kind: Namespace
metadata:
  name: flowx-kg
  labels:
    name: flowx-kg

Performance Benchmarks

Based on synthetic testing with realistic AI Agent workloads:

Read Performance

20-45ms average response time
  • Query complexity: Multi-hop graph traversals
  • Concurrent threads: 10-15
  • Data scale: 4M+ nodes, 8M+ relationships

Write Performance

10-20ms per node average
  • Includes relationship creation
  • Batch operations supported
  • ACID transaction guarantees

Test Data Scale

Synthetic Data Stats
{
  "conversations": 4156,
  "threads": 41560,
  "messages": 831000,
  "actions": 4154000,
  "outcomes": 4155000,
  "feedback": 4155000
}

Health Checks and Monitoring

Service Health Endpoints

Health Check URLs
# Knowledge Graph Service Health
curl http://localhost:8090/actuator/health

# DGraph Cluster Health
curl http://localhost:8080/health

# DGraph Cluster State
curl http://localhost:8080/state

# Performance Metrics
curl http://localhost:8090/actuator/metrics

Key Metrics to Monitor

Schema Management

The Knowledge Graph service automatically manages schemas for AI Agent operations:

Core Schema Types

Conversation Types
type Conversation {
  id: ID!
  tenantId: String! @search(by: [exact])
  userId: String! @search(by: [exact])
  createdAt: DateTime!
  updatedAt: DateTime!
  threads: [Thread!]! @hasInverse(field: conversation)
  status: ConversationStatus!
}

type Thread {
  id: ID!
  conversation: Conversation! @hasInverse(field: threads)
  messages: [Message!]! @hasInverse(field: thread)
  executionPlan: [Action!]! @hasInverse(field: thread)
  createdAt: DateTime!
}

type Message {
  id: ID!
  thread: Thread! @hasInverse(field: messages)
  content: String!
  role: MessageRole!
  timestamp: DateTime!
  metadata: String
}
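
To illustrate how these types are queried, the sketch below posts a GraphQL query for a conversation, its threads, and its messages to the Alpha /graphql endpoint. It assumes the getConversation root field that DGraph auto-generates for the Conversation type; the conversation id is a placeholder.

Conversation Query (illustrative sketch)
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConversationQueryExample {

    public static void main(String[] args) throws Exception {
        // getConversation is the root query DGraph generates for the Conversation type above;
        // "0x1" is a placeholder node id.
        String query = "{ getConversation(id: \"0x1\") { id status "
                + "threads { id messages { role content timestamp } } } }";

        // Standard GraphQL-over-HTTP envelope: a JSON object with a "query" field.
        String body = "{\"query\": \""
                + query.replace("\\", "\\\\").replace("\"", "\\\"")
                + "\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://dgraph-alpha:8080/graphql"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}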

Troubleshooting

Security Best Practices

For production deployments, implement these security measures:
  1. Network Security
    • Use private networks for inter-node communication
    • Implement firewall rules for DGraph ports
    • Enable TLS for all communications
  2. Authentication & Authorization
    • Use DGraph Enterprise ACLs for fine-grained access control
    • Implement JWT-based authentication for API access
    • Rotate authentication tokens regularly
  3. Data Protection
    • Enable encryption at rest (DGraph Enterprise)
    • Implement backup encryption
    • Use secure communication protocols
  4. Monitoring & Auditing
    • Enable audit logging (DGraph Enterprise)
    • Monitor for suspicious query patterns
    • Set up alerts for security events

Backup and Recovery

Automated Backup (DGraph Enterprise)

Backup Configuration
# Environment variables for backup
DGRAPH_BACKUP_DESTINATION=s3://your-backup-bucket
DGRAPH_BACKUP_ACCESS_KEY=${AWS_ACCESS_KEY}
DGRAPH_BACKUP_SECRET_KEY=${AWS_SECRET_KEY}
DGRAPH_BACKUP_SCHEDULE="0 2 * * *" # Daily at 2 AM

# Manual backup command
curl -X POST localhost:8080/admin/backup \
  -H "Content-Type: application/json" \
  -d "{\"destination\": \"s3://your-backup-bucket/backup-$(date +%Y%m%d)\"}"

Export/Import (DGraph OSS)

Manual Export/Import
# Export data
curl -X POST localhost:8080/admin/export

# Import data (during cluster initialization)
dgraph bulk -r /path/to/export -s /path/to/schema.graphql

Developer Guidelines

Schema Contribution Standards

The Knowledge Graph service acts as “Database as a Service” for all FlowX microservices. Follow these guidelines when contributing schemas:

Schema Deployment Process

Local Schema Management
# Clear existing schema and data
make initialize-knowledge-graph-clear

# Apply latest schema changes
make initialize-knowledge-graph

# Verify schema deployment
curl http://localhost:8080/admin/schema
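
Equivalent in spirit to the make targets above, a GraphQL schema can also be applied by POSTing it to the Alpha /admin/schema endpoint; the schema file path below is illustrative.

Schema Push (illustrative sketch)
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class SchemaDeployer {

    public static void main(String[] args) throws Exception {
        // Path to the GraphQL schema file; adjust to your repository layout.
        Path schemaFile = Path.of("schema.graphql");

        // DGraph applies a new GraphQL schema when it is POSTed to /admin/schema on an Alpha node.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/admin/schema"))
                .POST(HttpRequest.BodyPublishers.ofFile(schemaFile))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Schema update response: " + response.body());
    }
}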

Data Contribution Patterns

The Knowledge Graph supports two data pipeline patterns:

Synchronous Pipeline

Use for: Critical data requiring immediate consistency
  • Real-time AI Agent state updates
  • User interaction data
  • Process execution state
Characteristics:
  • Strong consistency guarantees
  • Immediate availability
  • Higher latency tolerance required

Asynchronous Pipeline

Use for: Bulk data processing and analytics
  • Historical conversation data
  • Training data ingestion
  • Background embeddings generation
Characteristics:
  • Eventual consistency
  • Higher throughput
  • Lower resource impact
Note: only the synchronous pipeline is currently supported.

KAG RPC Interface

The Knowledge Graph provides a gRPC interface for cross-language compatibility:
KAG Service Definition
service KnowledgeGraphService {
  // Query operations
  rpc Query(QueryRequest) returns (QueryResponse);
  rpc QueryStream(QueryRequest) returns (stream QueryResponse);
  
  // Mutation operations  
  rpc Mutate(MutateRequest) returns (MutateResponse);
  rpc BatchMutate(BatchMutateRequest) returns (BatchMutateResponse);
  
  // Schema operations
  rpc GetSchema(SchemaRequest) returns (SchemaResponse);
  rpc UpdateSchema(UpdateSchemaRequest) returns (UpdateSchemaResponse);
}

message QueryRequest {
  string query = 1;           // GraphQL query
  map<string, string> variables = 2;
  string tenant_id = 3;
}
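
Below is a client-side usage sketch based on the standard Java classes that protoc and the gRPC plugin would generate from this definition (KnowledgeGraphServiceGrpc, QueryRequest, QueryResponse); the package, host, and port are assumptions.

gRPC Client (illustrative sketch)
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class KgGrpcClientExample {

    public static void main(String[] args) {
        // Host and port of the Knowledge Graph gRPC endpoint (illustrative values).
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("knowledge-graph", 9090)
                .usePlaintext()
                .build();

        // Blocking stub generated by protoc-gen-grpc-java from the service definition above;
        // import it from whatever java_package your proto declares.
        KnowledgeGraphServiceGrpc.KnowledgeGraphServiceBlockingStub stub =
                KnowledgeGraphServiceGrpc.newBlockingStub(channel);

        QueryRequest request = QueryRequest.newBuilder()
                .setQuery("{ getConversation(id: \"0x1\") { id status } }")
                .setTenantId("tenant-a")
                .build();

        QueryResponse response = stub.query(request);
        System.out.println(response);

        channel.shutdown();
    }
}

The streaming, mutation, and schema calls follow the same pattern using the other RPCs defined above.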

Client Integration Patterns

Interface-Based Query Resolvers:
public interface ConversationRepository {
    // Query methods
    Optional<Conversation> findById(String id, String tenantId);
    List<Conversation> findByUserId(String userId, String tenantId);
    Page<Conversation> findRecent(String tenantId, Pageable pageable);
    
    // Mutation methods
    Conversation save(Conversation conversation);
    void delete(String id, String tenantId);
    
    // Graph traversal methods
    List<Message> getConversationMessages(String conversationId);
    List<AIAgent> getParticipatingAgents(String conversationId);
}

@Component
public class DGraphConversationRepository implements ConversationRepository {
    // DGraph-specific implementation
}
Benefits of Interface-Based Approach:
  • Database Agnostic: Easy to switch between graph databases
  • Testable: Mock implementations for unit testing
  • Maintainable: Changes confined to specific implementations
  • Consistent: Same semantics across different backends
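
A short usage sketch of the repository abstraction from a calling service; the service class below is illustrative.

Repository Usage (illustrative sketch)
import java.util.List;
import org.springframework.stereotype.Service;

@Service
public class ConversationLookupService {

    private final ConversationRepository conversations;

    // Spring injects the active implementation, e.g. DGraphConversationRepository.
    public ConversationLookupService(ConversationRepository conversations) {
        this.conversations = conversations;
    }

    // Returns the messages of a conversation, enforcing tenant scoping first.
    public List<Message> messagesFor(String conversationId, String tenantId) {
        return conversations.findById(conversationId, tenantId)
                .map(c -> conversations.getConversationMessages(conversationId))
                .orElse(List.of());
    }
}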

Multi-Database Support Strategy

Direct query translation between graph databases (DGraph ↔ Neo4j ↔ JanusGraph) is not practical due to fundamental differences in query languages and capabilities.
Recommended Approach:
  1. Abstract Business Logic: Use repository interfaces for domain operations
  2. Database-Specific Implementations: Separate implementation for each graph DB
  3. Consistent Data Models: Maintain same semantic meaning across databases
  4. Configuration-Based Selection: Choose database implementation at runtime
Database Selection Config
knowledge-graph:
  provider: dgraph  # dgraph | neo4j | janusgraph
  dgraph:
    endpoint: "http://dgraph-alpha:8080"
  neo4j:
    uri: "bolt://neo4j:7687"
  janusgraph:
    hosts: ["cassandra:9042"]
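
One way to realize the configuration-based selection in Spring Boot is to guard each repository implementation with @ConditionalOnProperty keyed on knowledge-graph.provider; apart from DGraphConversationRepository, the bean and class names below are illustrative.

Provider Selection (illustrative sketch)
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class KnowledgeGraphProviderConfig {

    // Selected when knowledge-graph.provider=dgraph (the default in the config above).
    @Bean
    @ConditionalOnProperty(name = "knowledge-graph.provider", havingValue = "dgraph", matchIfMissing = true)
    public ConversationRepository dgraphConversationRepository() {
        return new DGraphConversationRepository();
    }

    // Selected when knowledge-graph.provider=neo4j; Neo4jConversationRepository is illustrative.
    @Bean
    @ConditionalOnProperty(name = "knowledge-graph.provider", havingValue = "neo4j")
    public ConversationRepository neo4jConversationRepository() {
        return new Neo4jConversationRepository();
    }
}

With this arrangement, switching graph databases only requires changing knowledge-graph.provider; calling code that depends on the repository interfaces stays untouched.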

Next Steps

After successfully deploying the Knowledge Graph service, verify the health endpoints listed above and begin integrating it with your AI Agents and FlowX services.
Need Help? Check the troubleshooting section above or contact the FlowX.AI support team for assistance with your Knowledge Graph deployment.