KB Enrichment setup

Available starting with FlowX.AI 5.9.1. ai-platform-kb-enrichment replaces the decommissioned di-platform service.

Overview

ai-platform-kb-enrichment provides design-time document intelligence over knowledge-base documents: it chunks and enriches them and prepares the content for indexing. It is a Python service in the AI Platform that exposes gRPC and consumes a Kafka enrichment-request topic. Like every other AI service, it deploys as a sub-chart of the AI Platform umbrella Helm chart (in Kubernetes it listens on port 9100, set via SERVICE_PORT), but it has a few extra deployment requirements, covered below.

Prerequisites

Infrastructure

AI Platform chart deployed (KB Enrichment is a sub-chart)
PostgreSQL instance (shared with the other AI Platform databases)
S3-compatible object storage (MinIO, AWS S3, …)
Kafka

Platform dependencies

doc-parser deployed (doc-parser.installed: true)
An organization TEXT_GENERATION LLM capability configured
Egress to huggingface.co on first start (or a pre-seeded model cache)

Deployment

KB Enrichment is enabled as part of the AI Platform chart. In standard deployments the FlowX Helm charts provision the dependencies below automatically; the notes under each one describe what it needs.

PostgreSQL database

KB Enrichment uses a dedicated kbenrichment database on the same instance and credentials as the other AI Platform databases (for example, next to the embedder database). The FlowX Helm dependencies chart provisions it automatically.

KB Enrichment uses PostgreSQL only for service/job state — vector embeddings are stored in Qdrant. The knowledgebases_design Qdrant collection is auto-created by the embedder service; no provisioning step is required.

Object storage

Parse offloads are stored in the kb-enrichment-bucket bucket (auto-created on MinIO). The bucket is shared with Knowledge Base Indexer v2: the indexer writes offload documents and KB Enrichment reads them back.

Environment Variable	Description	Default Value
`KB_ENRICHMENT_OFFLOAD_BUCKET`	Optional override for the offload bucket name. Leave unset to use the shared default.	`kb-enrichment-bucket`
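
As a rough illustration of the override semantics in the table above, the following shell sketch resolves the bucket name an operator script would end up using. The function name resolve_offload_bucket is hypothetical; only the variable name and the default value come from the table. Treating an empty value the same as an unset one is an assumption of this sketch, not confirmed service behavior.

```shell
# Hypothetical helper: prints the offload bucket name for the current environment.
# Falls back to the shared default when KB_ENRICHMENT_OFFLOAD_BUCKET is unset
# or empty (the empty case is an assumption of this sketch).
resolve_offload_bucket() {
  printf '%s\n' "${KB_ENRICHMENT_OFFLOAD_BUCKET:-kb-enrichment-bucket}"
}
```

Calling resolve_offload_bucket with the variable unset prints the shared default. Because the bucket is shared with Knowledge Base Indexer v2, an override presumably has to match whatever bucket the indexer writes to.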

Model cache and egress

On first start the service downloads its model2vec chunking model from huggingface.co. The Helm chart already points the model caches at writable paths, so no manual configuration is needed:

Environment Variable	Description	Default Value
`HF_HOME`	Hugging Face cache directory	`/tmp`
`XDG_CACHE_HOME`	XDG cache directory	`/tmp`
`NLTK_DATA`	NLTK data directory	`/tmp`

Air-gapped or egress-restricted environments must either allow the huggingface.co download on first start or pre-seed the model cache; otherwise the service cannot initialize its chunking model.
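
A preflight check along these lines can help confirm both conditions before the first start. This is a minimal sketch, assuming it runs inside the pod or an equivalent container and that curl is available there; the function names check_cache_dirs and check_hf_egress are hypothetical, and the /tmp fallbacks mirror the chart defaults listed above.

```shell
# Hypothetical preflight helpers; run inside the pod or an equivalent container.

# Succeeds only if every cache directory exists and is writable.
# Unset or empty variables fall back to /tmp, mirroring the chart defaults.
check_cache_dirs() {
  rc=0
  for dir in "${HF_HOME:-/tmp}" "${XDG_CACHE_HOME:-/tmp}" "${NLTK_DATA:-/tmp}"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      echo "ok: $dir is writable"
    else
      echo "fail: $dir is missing or not writable" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Succeeds when huggingface.co answers over HTTPS within 10 seconds.
# Assumes curl is installed in the environment where this runs.
check_hf_egress() {
  curl --silent --show-error --fail --max-time 10 --output /dev/null https://huggingface.co
}
```

If check_hf_egress fails and egress cannot be opened, the remaining option described above is to pre-seed the model cache.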

Kafka topic

KB Enrichment consumes enrichment requests produced by Knowledge Base Indexer v2:

Topic	Partitions	Direction
`ai.flowx.ai-platform.knowledgebase.internal.enrichment-request.v1`	3	Indexer v2 → KB Enrichment (consume)

The topic is provisioned automatically by the FlowX Kafka chart — no manual creation is required for standard deployments.

Required platform configuration

Turn on doc-parser

AI Base ships with doc-parser.installed: false. Document ingestion through KB Enrichment requires doc-parser to be deployed — set doc-parser.installed: true in your AI Platform values. See the Document Parser setup guide.
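
One way to apply this setting is a small values override file. The sketch below writes one from the shell. The file name is arbitrary, the key is assumed to sit at the top level of the AI Platform values as the text above suggests, and the release and chart names in the commented helm command are placeholders, not the actual FlowX names.

```shell
# Write a values override that turns on doc-parser (key taken from the text above).
cat > kb-enrichment-overrides.yaml <<'EOF'
doc-parser:
  installed: true
EOF

# Apply it with your usual upgrade command, for example
# (angle-bracket values are placeholders):
#   helm upgrade <release-name> <ai-platform-chart> --reuse-values -f kb-enrichment-overrides.yaml
```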

Seed the LLM configuration

The organization needs a TEXT_GENERATION capability with a default model and provider API key configured in the AI Providers UI. Without it, enrichment fails with a no capability TEXT_GENERATION error. See AI providers.

Service accounts and Keycloak clients follow the standard AI Platform pattern — nothing new is required for KB Enrichment.

Verification

Check the pod

kubectl get pods -l app=kb-enrichment   # on 5.9.1, use -l app=ai-platform-kb-enrichment

Confirm dependencies

Verify that the kbenrichment database exists, that the kb-enrichment-bucket bucket exists in object storage, and that the enrichment-request topic is present in Kafka.
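
The three checks could be scripted roughly as follows. This is a sketch under several assumptions: psql, the MinIO client (mc), and the Kafka CLI tools are installed where it runs, a postgres maintenance database exists, and an mc alias is already configured. The function names are hypothetical, and hosts, users, and aliases are supplied by the caller.

```shell
# Sketch only: connection details are placeholders supplied by the caller.
TOPIC="ai.flowx.ai-platform.knowledgebase.internal.enrichment-request.v1"

# Succeeds when a database named kbenrichment exists.
# Usage: check_database <host> <user>
check_database() {
  psql -h "$1" -U "$2" -d postgres -Atc \
    "SELECT 1 FROM pg_database WHERE datname = 'kbenrichment'" | grep -q '^1$'
}

# Succeeds when the bucket can be listed through a configured mc alias.
# Usage: check_bucket <mc-alias>
check_bucket() {
  mc ls "$1/${KB_ENRICHMENT_OFFLOAD_BUCKET:-kb-enrichment-bucket}" > /dev/null
}

# Succeeds when the enrichment-request topic can be described.
# Usage: check_topic <bootstrap-server>
check_topic() {
  kafka-topics.sh --bootstrap-server "$1" --describe --topic "$TOPIC" > /dev/null
}
```

On AWS S3, a command such as aws s3api head-bucket would take the place of the mc call.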

Review logs

On a healthy first start the logs show the model2vec model downloading and the service connecting to PostgreSQL and Kafka. A no capability TEXT_GENERATION error means the organization LLM configuration is missing (see Seed the LLM configuration).
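
To scan recent logs for that error, a filter like the one below may be enough. It is a sketch: the function name scan_enrichment_logs is hypothetical, and it assumes the error text appears in the logs verbatim as quoted above, which may not match the exact log format.

```shell
# Reads log text on stdin. Returns 1 if the missing-capability error appears,
# 0 otherwise. The matched string is assumed to appear verbatim in the logs.
scan_enrichment_logs() {
  if grep -q 'no capability TEXT_GENERATION'; then
    echo "LLM configuration missing: configure TEXT_GENERATION in AI Providers" >&2
    return 1
  fi
  echo "no TEXT_GENERATION capability errors found"
}

# Example (label taken from the Check the pod step; adjust to your release):
#   kubectl logs -l app=kb-enrichment --tail=500 | scan_enrichment_logs
```

The explicit --tail matters here because kubectl logs returns only a few lines per pod by default when a label selector is used.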

Related resources

AI Platform setup

Full AI Platform deployment and shared configuration

Document Parser setup

Required dependency for document ingestion

AI providers

Configure the TEXT_GENERATION LLM capability

Qdrant setup

Vector store for knowledge-base embeddings
