
Overview

The Knowledge Base testing interface allows you to validate operations and queries before adding them to production workflows. This helps you:
  • Verify that content is properly indexed and searchable
  • Test query parameters and relevance thresholds
  • Validate content source operations
  • Understand which chunks are returned for specific queries
Testing operations in isolation ensures that your Knowledge Base is configured correctly before integrating it into workflows.

Accessing the testing interface

1. Open your Knowledge Base: Navigate to FlowX Designer → Your Knowledge Base.
2. Go to the Operations tab: Click the Operations tab in the Knowledge Base interface.
3. Select operation type: Choose the type of operation you want to test.

[Image: Knowledge Base Testing Interface]

Available test operations

The testing interface supports the following operations:

  • Query Prompt: Test semantic search queries to see which chunks are returned
  • Append Content: Test adding new content to a content source
  • Replace Content: Test replacing all content in a content source
  • Delete Content Source: Test deleting a content source and its chunks

Testing query prompts

Query prompt testing is the most commonly used testing feature. It allows you to see exactly what information an AI agent would retrieve for a given query.

Query configuration

Query Prompt (string, required)
Enter a natural language question or search query. Examples:
  • “What are the system requirements?”
  • “How do I reset my password?”
  • “Explain the refund policy”

Content Source (select)
Filter by a specific content source. Options:
  • All (default): Search across all content sources
  • Specific content source: Search only in the selected source

Max. Number of Results (number)
Maximum number of chunks to return. Range: 1-10 chunks. Default: 5.

Min. Relevance Score (percentage)
Minimum relevance score threshold. Range: 0-100%. Default: 70%. Only chunks with relevance scores above this threshold are returned.

Metadata (object)
Filter chunks by metadata (exact match with AND logic). The first iteration includes only system metadata.
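
Taken together, these parameters describe a single test query. The payload below is a minimal sketch for orientation only; the exact field names and request structure used by FlowX are assumptions here, and in practice you fill in the corresponding form fields in the testing interface:

```json
{
  "queryPrompt": "How do I reset my password?",
  "contentSource": "All",
  "maxResults": 5,
  "minRelevanceScore": 70,
  "metadata": {
    "source": "manual_upload"
  }
}
```

With this metadata filter, only chunks whose system metadata matches source = manual_upload exactly would be considered; adding more keys narrows the filter further with AND logic.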

Running a test query

1. Enter your query: Type a natural language question in the Query Prompt field.
2. Set parameters: Configure the content source filter, max results, and minimum relevance score.
3. Run query: Click Test Query to execute the search.
4. Review results: Examine the returned chunks, their relevance scores, and content.

[Image: Test Query Results]

Understanding query results

Each returned chunk displays:
Relevance Score (percentage)
Indicates how relevant the chunk is to your query:
  • 90-100%: Highly relevant, exact or near-exact match
  • 70-89%: Relevant, strong semantic similarity
  • 50-69%: Moderately relevant, partial match
  • Below 50%: Low relevance, may not be useful

Content Source (string)
The content source that contains this chunk (clickable to view details).

Submitted Content (link)
Link to view the original document or JSON payload:
  • For uploaded files: Opens a preview in a new tab
  • For JSON payloads: Opens a modal with a JSON view

Chunk Content (text)
The actual text content of the chunk that was retrieved.

Chunk Metadata (object)
System metadata associated with the chunk:
  • source: manual_upload or from_workflow
  • path: Document file path or JSON payload path
  • chunk_id: Unique identifier from the vector database
  • knowledge_base: Knowledge Base ID
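
For orientation, a single returned chunk could be represented roughly as follows. The field names mirror the properties listed above, but the response layout and all sample values are illustrative assumptions:

```json
{
  "relevanceScore": 88,
  "contentSource": "Return Policy",
  "submittedContent": "return-policy.pdf",
  "chunkContent": "Products may be returned within 30 days of delivery provided they are unused and in their original packaging...",
  "chunkMetadata": {
    "source": "manual_upload",
    "path": "return-policy.pdf",
    "chunk_id": "c3f1a2b4-0000-4d2e-9a11-1234567890ab",
    "knowledge_base": "kb-product-docs"
  }
}
```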

Example test queries

Query: “How can I return a product?”

Configuration:
  • Content Source: “Return Policy”
  • Max Results: 3
  • Min Relevance: 75%

Expected Results:
  • Chunks about return procedures
  • Timeframes for returns
  • Required documentation
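
Expressed with the same assumed field names as the earlier query sketch, this example configuration would look like:

```json
{
  "queryPrompt": "How can I return a product?",
  "contentSource": "Return Policy",
  "maxResults": 3,
  "minRelevanceScore": 75
}
```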

Optimizing query parameters

Use test queries to find the optimal configuration for both settings below.

Max. Number of Results:

Too few results (1-3):
  • Pros: Focused, precise information
  • Cons: May miss relevant context
  • Best for: Simple, direct questions
Moderate results (4-6):
  • Pros: Balanced coverage and focus
  • Cons: May include some less relevant chunks
  • Best for: Most use cases
Many results (7-10):
  • Pros: Comprehensive coverage
  • Cons: May include irrelevant information, slower processing
  • Best for: Complex questions requiring broad context

Min. Relevance Score:

High threshold (80-100%):
  • Returns only highly relevant chunks
  • May return very few or no results
  • Use when precision is critical
Medium threshold (60-79%):
  • Balanced precision and recall
  • Good starting point for most applications
  • Recommended for general use
Low threshold (0-59%):
  • Returns many chunks, including marginally relevant ones
  • May include noise
  • Use for exploratory searches

Testing content operations

You can test content source operations to preview their effects before using them in workflows.
Important: Test operations are intended to let you preview results without affecting actual content; however, the exact behavior of test operations on live content is still to be determined (TBD).

Testing append content

Test adding new content to a content source:
1. Select operation: Choose Append Content from the operation dropdown.
2. Select content source: Choose an existing content source or enter a new name.
3. Enter content: Provide the JSON content to append (a sketch follows below).
4. Execute test: Click Test Operation.
5. Review preview: See how many new chunks would be created and their content.

Test output:
  • Number of chunks that would be created
  • Preview of chunk content
  • Estimated processing time
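
The JSON provided in the Enter content step can be any payload your workflow produces; no particular schema is prescribed here, so treat the fields below as placeholders in a hypothetical example:

```json
{
  "title": "Holiday return window",
  "body": "Purchases made between November 1 and December 31 may be returned until January 31."
}
```

The test preview would then report how many chunks this payload would produce, along with their content and an estimated processing time.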

Testing replace content

Test replacing all content in a content source:
1. Select operation: Choose Replace Content from the operation dropdown.
2. Select content source: Choose an existing content source.
3. Enter new content: Provide the JSON content that will replace the existing content (a sketch follows below).
4. Execute test: Click Test Operation.
5. Review changes: See which chunks would be deleted and which new chunks would be created.

Test output:
  • Number of existing chunks that would be deleted
  • Number of new chunks that would be created
  • Preview of new chunk content
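
Replace Content accepts JSON in the same way, except that the supplied payload becomes the entire new content of the source. Reusing the hypothetical shape from the append example:

```json
{
  "title": "Return policy (revised)",
  "body": "All returns must be initiated within 30 days of delivery and must include the original order number."
}
```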

Testing delete content source

Test deleting a content source:
1. Select operation: Choose Delete Content Source from the operation dropdown.
2. Select content source: Choose the content source to delete.
3. Execute test: Click Test Operation.
4. Review impact: See which chunks would be deleted.

Test output:
  • Number of chunks that would be deleted
  • List of chunks with their content
  • Confirmation required before actual deletion

Testing best practices

Query testing workflow

1. Start broad: Begin with a general query and default parameters to understand baseline behavior.
2. Analyze results: Review relevance scores and chunk content to identify patterns.
3. Refine parameters: Adjust max results and min relevance score based on the initial results.
4. Test edge cases: Try ambiguous queries, typos, and different phrasings.
5. Document findings: Record optimal parameters for different query types.

Common testing scenarios

Scenario: Testing queries on a newly created Knowledge Base
Expected: No results returned
Action: Add test content and verify it becomes searchable

Scenario: Verifying that new content is properly indexed
Expected: New chunks appear in query results
Action: Compare results before and after updates

Scenario: Finding the right relevance score cutoff
Expected: Different chunk sets for different thresholds
Action: Run the same query with varying min relevance scores

Scenario: Verifying content source filtering works correctly
Expected: Only chunks from the selected source are returned
Action: Run the query with “All” vs. a specific content source

Troubleshooting test results

No results returned

Possible causes:
  • Min relevance score is too high
  • Content source filter is too restrictive
  • Content hasn’t finished indexing
  • Query doesn’t match any content
Solutions:
  • Lower the min relevance score to 0% temporarily
  • Select “All” for the content source
  • Check the content source status (should be “Available”)
  • Try broader or different query terms

Too many irrelevant results

Possible causes:
  • Min relevance score is too low
  • Max results is set too high
  • Query is too generic
Solutions:
  • Increase the min relevance score (try 70-80%)
  • Reduce max results to 3-5
  • Make the query more specific

Inconsistent or unexpected relevance scores

Possible causes:
  • Content has significant overlap
  • Query is ambiguous
  • Content structure affects chunking
Solutions:
  • Review chunk content to understand why scores vary
  • Test different phrasings of the query
  • Consider reorganizing content sources

Interpreting test results

Relevance score interpretation

Understanding relevance scores helps you set appropriate thresholds:
| Score Range | Interpretation | Recommendation |
|-------------|----------------|----------------|
| 90-100% | Exact or near-exact match | High confidence, use this content |
| 75-89% | Strong semantic match | Good quality, likely relevant |
| 60-74% | Moderate match | Review content, may be useful |
| 40-59% | Weak match | Probably not relevant |
| 0-39% | Very weak match | Likely irrelevant |
Set your minimum relevance score based on your use case:
  • High precision needs (customer support): 75-80%
  • Balanced approach (general Q&A): 65-75%
  • High recall needs (exploratory search): 50-65%

Chunk content analysis

When reviewing chunks, consider:
  1. Completeness: Does the chunk contain enough context to be useful?
  2. Accuracy: Is the information correct and up-to-date?
  3. Redundancy: Are multiple chunks returning the same information?
  4. Relevance: Does it actually answer the query?

Metadata analysis

System metadata provides insights into chunk origin:
  • source: Understand whether chunks come from manual uploads or workflows
  • path: Trace chunks back to their source documents
  • chunk_id: Unique identifier for debugging
  • knowledge_base: Confirm correct Knowledge Base

Example testing session

Here’s a complete example of testing a Knowledge Base:
1. Initial setup

Knowledge Base: “Product Documentation KB”
Content Sources:
  • “Installation Guide” (Manual upload, PDF)
  • “API Reference” (Workflow ingestion, JSON)
  • “FAQ” (Manual upload, PDF)

2. Test Query 1: Installation

Query: “How do I install the product?”
Config:
  • Content Source: All
  • Max Results: 5
  • Min Relevance: 70%
Results:
  • 3 chunks returned from “Installation Guide”
  • Relevance scores: 92%, 88%, 73%
  • All chunks relevant
Conclusion: Good results, threshold appropriate

3. Test Query 2: API Endpoint

Query: “user authentication endpoint”
Config:
  • Content Source: All
  • Max Results: 5
  • Min Relevance: 70%
Results:
  • 2 chunks from “API Reference”
  • 1 chunk from “Installation Guide” (authentication setup)
  • Relevance scores: 95%, 89%, 71%
Conclusion: Good coverage, multiple sources helpful

4. Test Query 3: Specific Technical Detail

Query: “What port does the service use?”
Config:
  • Content Source: “Installation Guide”
  • Max Results: 3
  • Min Relevance: 80%
Results:
  • 1 chunk returned
  • Relevance score: 94%
  • Contains exact port information
Conclusion: Content source filtering effective

5. Optimization

Based on these tests:
  • Min relevance of 70-75% works well
  • Max results of 5 provides good coverage
  • Content source filtering is useful for specific questions

Testing checklist

Before deploying to production, verify:
  • Queries return relevant results for common questions
  • Relevance scores are appropriate (>70% for good matches)
  • Content source filtering works correctly
  • All content sources are in “Available” state
  • Chunks contain complete, useful information
  • Metadata is correct for traceability
  • Edge cases handled appropriately (typos, ambiguous queries)
  • Performance is acceptable (query response time)

Next steps

  • Knowledge Base Overview: Understanding Knowledge Base capabilities
  • Custom Agent Nodes: Using AI agents with Knowledge Bases