Overview
The Knowledge Base testing interface allows you to validate operations and queries before adding them to production workflows. This helps you:
- Verify that content is properly indexed and searchable
- Test query parameters and relevance thresholds
- Validate content source operations
- Understand which chunks are returned for specific queries
Accessing the testing interface
1. Open your Knowledge Base
2. Go to the Operations tab
3. Select the operation type you want to test

Available test operations
The testing interface supports the following operations:
- Query Prompt
- Append Content
- Replace Content
- Delete Content Source
Testing query prompts
Query prompt testing is the most commonly used testing feature. It allows you to see exactly what information an AI agent would retrieve for a given query.
Query configuration
Example queries to test:
- “What are the system requirements?”
- “How do I reset my password?”
- “Explain the refund policy”
Content source options:
- All (default) - Search across all content sources
- Specific content source - Search only in the selected source
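To make the parameters concrete, here is a minimal sketch of how a test query configuration might look if expressed as data. The field names (query, content_source, max_results, min_relevance) are illustrative assumptions, not a documented API.

```python
# Minimal sketch of a test query configuration.
# Field names are illustrative assumptions, not a documented API.
test_query = {
    "query": "What are the system requirements?",  # the question to test
    "content_source": "All",   # "All" (default) or a specific content source name
    "max_results": 5,          # maximum number of chunks to return
    "min_relevance": 0.70,     # only keep chunks scoring 70% or above
}
```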
Running a test query
1. Enter your query
2. Set parameters
3. Run query
4. Review results

Understanding query results
Each returned chunk displays:
Relevance score:
- 90-100%: Highly relevant, exact or near-exact match
- 70-89%: Relevant, strong semantic similarity
- 50-69%: Moderately relevant, partial match
- Below 50%: Low relevance, may not be useful
Source link:
- For uploaded files: Opens preview in new tab
- For JSON payloads: Opens modal with JSON view
System metadata:
- source: manual_upload or from_workflow
- path: Document filepath or JSON payload path
- chunk_id: Unique identifier from vector database
- knowledge_base: Knowledge Base ID
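For illustration, a single returned chunk could be pictured roughly as the structure below. The exact shape is an assumption based on the metadata listed above, and the values are made up.

```python
# Illustrative sketch of one returned chunk (shape and values are assumptions).
example_chunk = {
    "content": "The application requires 8 GB of RAM and two CPU cores...",
    "relevance_score": 0.92,            # 90-100%: highly relevant
    "metadata": {
        "source": "manual_upload",      # or "from_workflow"
        "path": "installation-guide.pdf",
        "chunk_id": "chunk_0042",       # unique identifier from the vector database
        "knowledge_base": "kb_example", # Knowledge Base ID
    },
}
```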
Example test queries
Typical scenarios to test include:
- Customer Support
- Technical Documentation
- Policy Information
For example, a customer support query about returns might use this configuration:
- Content Source: “Return Policy”
- Max Results: 3
- Min Relevance: 75%
Expected results:
- Chunks about return procedures
- Timeframes for returns
- Required documentation
Optimizing query parameters
Use test queries to find the optimal configuration.
Adjusting Max. Number of Results
Fewer results:
- Pros: Focused, precise information
- Cons: May miss relevant context
- Best for: Simple, direct questions
Moderate number of results:
- Pros: Balanced coverage and focus
- Cons: May include some less relevant chunks
- Best for: Most use cases
More results:
- Pros: Comprehensive coverage
- Cons: May include irrelevant information, slower processing
- Best for: Complex questions requiring broad context
Tuning Min. Relevance Score
Higher threshold:
- Returns only highly relevant chunks
- May return very few or no results
- Use when precision is critical
Moderate threshold:
- Balanced precision and recall
- Good starting point for most applications
- Recommended for general use
Lower threshold:
- Returns many chunks, including marginally relevant ones
- May include noise
- Use for exploratory searches
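To get a feel for how the threshold trades precision against recall, you can experiment offline with a simple filter like the sketch below (the chunks and scores are made up for illustration).

```python
# Sketch: filter hypothetical query results by a minimum relevance score.
results = [
    {"chunk_id": "a1", "relevance_score": 0.94},
    {"chunk_id": "b2", "relevance_score": 0.81},
    {"chunk_id": "c3", "relevance_score": 0.66},
    {"chunk_id": "d4", "relevance_score": 0.42},
]

def filter_by_relevance(chunks, min_relevance):
    """Keep only chunks at or above the threshold, most relevant first."""
    kept = [c for c in chunks if c["relevance_score"] >= min_relevance]
    return sorted(kept, key=lambda c: c["relevance_score"], reverse=True)

print(len(filter_by_relevance(results, 0.80)))  # 2 chunks: high precision
print(len(filter_by_relevance(results, 0.50)))  # 3 chunks: broader recall
```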
Testing content operations
You can test content source operations to preview their effects before using them in workflows.
Testing append content
Test adding new content to a content source:
1. Select operation
2. Select content source
3. Enter content
4. Execute test
5. Review preview
The preview shows:
- Number of chunks that would be created
- Preview of chunk content
- Estimated processing time
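If you want a rough sense of how many chunks an append might create before running the test, a naive fixed-size estimate like the one below can help. Real chunking is usually token- or sentence-aware, so treat the 1,000-character chunk size as a placeholder assumption.

```python
# Naive sketch: estimate how many chunks a block of appended text might produce.
# Assumes fixed-size chunks; the actual chunking strategy may differ.
import math

def estimate_chunk_count(text: str, chunk_size: int = 1000) -> int:
    return max(1, math.ceil(len(text) / chunk_size))

new_content = "Step 1: Download the installer for your platform. " * 60
print(estimate_chunk_count(new_content))  # approximate number of new chunks
```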
Testing replace content
Test replacing all content in a content source:
1. Select operation
2. Select content source
3. Enter new content
4. Execute test
5. Review changes
The preview shows:
- Number of existing chunks that would be deleted
- Number of new chunks that would be created
- Preview of new chunk content
Testing delete content source
Test deleting a content source:
1. Select operation
2. Select content source
3. Execute test
4. Review impact
The impact preview shows:
- Number of chunks that would be deleted
- List of chunks with their content
- Confirmation required before actual deletion
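As with append and replace, it can help to picture the delete impact as a small summary structure. The field names below are assumptions for illustration, not the tool's actual output format.

```python
# Illustrative sketch of a delete-content-source impact preview (field names assumed).
delete_impact = {
    "content_source": "FAQ",
    "chunks_to_delete": 12,
    "chunks": [
        {"chunk_id": "chunk_0001", "content": "Q: How do I reset my password? ..."},
        # ...remaining chunks listed for review before confirming
    ],
    "requires_confirmation": True,  # actual deletion still needs explicit confirmation
}
```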
Testing best practices
Query testing workflow
1. Start broad
2. Analyze results
3. Refine parameters
4. Test edge cases
5. Document findings
Common testing scenarios
Scenarios worth covering include:
- Testing with an empty Knowledge Base
- Testing after content updates
- Testing relevance thresholds
- Testing content source isolation
Troubleshooting test results
No results returned
Possible causes:
- Min relevance score is too high
- Content source filter is too restrictive
- Content hasn’t finished indexing
- Query doesn’t match any content
Solutions:
- Lower min relevance score to 0% temporarily
- Select “All” for content source
- Check content source status (should be “Available”)
- Try broader or different query terms
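One way to isolate the cause of an empty result set is to relax one parameter at a time. The sketch below assumes a run_test_query() helper that wraps however you issue test queries; it is a placeholder, not a real API.

```python
# Hypothetical diagnostic: progressively relax parameters until results appear.
# run_test_query() is a placeholder for your own way of issuing test queries.
def diagnose_no_results(run_test_query, query, content_sources=("All",)):
    for min_relevance in (0.80, 0.60, 0.40, 0.0):
        for source in content_sources:
            chunks = run_test_query(query=query,
                                    content_source=source,
                                    min_relevance=min_relevance)
            if chunks:
                print(f"Results appear at min_relevance={min_relevance}, source={source!r}")
                return chunks
    print("No results even at 0% relevance; check that the content source "
          "is 'Available' or try broader query terms.")
    return []
```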
Too many irrelevant results
Possible causes:
- Min relevance score is too low
- Max results is set too high
- Query is too generic
Solutions:
- Increase min relevance score (try 70-80%)
- Reduce max results to 3-5
- Make query more specific
Inconsistent relevance scores
Possible causes:
- Content has significant overlap
- Query is ambiguous
- Content structure affects chunking
Solutions:
- Review chunk content to understand why scores vary
- Test different phrasings of the query
- Consider reorganizing content sources
Interpreting test results
Relevance score interpretation
Understanding relevance scores helps you set appropriate thresholds:
| Score Range | Interpretation | Recommendation |
|---|---|---|
| 90-100% | Exact or near-exact match | High confidence, use this content |
| 75-89% | Strong semantic match | Good quality, likely relevant |
| 60-74% | Moderate match | Review content, may be useful |
| 40-59% | Weak match | Probably not relevant |
| 0-39% | Very weak match | Likely irrelevant |
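If you export test results for review, the bands in the table above can be expressed as a small helper. The cutoffs mirror the table; the function itself is just a convenience sketch.

```python
# Map a relevance score (0.0-1.0) to the interpretation bands in the table above.
def interpret_relevance(score: float) -> str:
    if score >= 0.90:
        return "Exact or near-exact match: high confidence, use this content"
    if score >= 0.75:
        return "Strong semantic match: good quality, likely relevant"
    if score >= 0.60:
        return "Moderate match: review content, may be useful"
    if score >= 0.40:
        return "Weak match: probably not relevant"
    return "Very weak match: likely irrelevant"

print(interpret_relevance(0.82))  # "Strong semantic match: good quality, likely relevant"
```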
Chunk content analysis
When reviewing chunks, consider:
- Completeness: Does the chunk contain enough context to be useful?
- Accuracy: Is the information correct and up-to-date?
- Redundancy: Are multiple chunks returning the same information?
- Relevance: Does it actually answer the query?
Metadata analysis
System metadata provides insights into chunk origin:
- source: Understand whether chunks come from manual uploads or workflows
- path: Trace chunks back to their source documents
- chunk_id: Unique identifier for debugging
- knowledge_base: Confirm correct Knowledge Base
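For traceability across a batch of test queries, grouping returned chunks by their metadata shows at a glance where answers are coming from. The sketch below assumes chunks shaped like the earlier example.

```python
# Sketch: summarize where returned chunks originate, assuming the chunk shape above.
from collections import Counter

def summarize_origins(chunks):
    by_source = Counter(c["metadata"]["source"] for c in chunks)
    by_path = Counter(c["metadata"]["path"] for c in chunks)
    return {"by_source": dict(by_source), "by_path": dict(by_path)}

# Example output: {'by_source': {'manual_upload': 2, 'from_workflow': 1},
#                  'by_path': {'installation-guide.pdf': 2, 'api-reference.json': 1}}
```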
Example testing session
Here’s a complete example of testing a Knowledge Base.
Initial setup
The Knowledge Base contains three content sources:
- “Installation Guide” (Manual upload, PDF)
- “API Reference” (Workflow ingestion, JSON)
- “FAQ” (Manual upload, PDF)
Test Query 1: Installation
Parameters:
- Content Source: All
- Max Results: 5
- Min Relevance: 70%
Results:
- 3 chunks returned from “Installation Guide”
- Relevance scores: 92%, 88%, 73%
- All chunks relevant
Test Query 2: API Endpoint
Parameters:
- Content Source: All
- Max Results: 5
- Min Relevance: 70%
Results:
- 2 chunks from “API Reference”
- 1 chunk from “Installation Guide” (authentication setup)
- Relevance scores: 95%, 89%, 71%
Test Query 3: Specific Technical Detail
Parameters:
- Content Source: “Installation Guide”
- Max Results: 3
- Min Relevance: 80%
Results:
- 1 chunk returned
- Relevance score: 94%
- Contains exact port information
Optimization
Findings from this session:
- Min relevance 70-75% works well
- Max results 5 provides good coverage
- Content source filtering is useful for specific questions
Testing checklist
Before deploying to production, verify:
- Queries return relevant results for common questions
- Relevance scores are appropriate (>70% for good matches)
- Content source filtering works correctly
- All content sources are in “Available” state
- Chunks contain complete, useful information
- Metadata is correct for traceability
- Edge cases handled appropriately (typos, ambiguous queries)
- Performance is acceptable (query response time)
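If you keep a list of representative queries, parts of this checklist can be turned into a repeatable script that flags weak results before deployment. As before, run_test_query() is a placeholder for your own test harness, and the 70% expectation is simply the guideline from the checklist.

```python
# Sketch: run representative queries and flag any whose best chunk scores below 70%.
# run_test_query() is a placeholder; adapt it to however you issue test queries.
REPRESENTATIVE_QUERIES = [
    "What are the system requirements?",
    "How do I reset my password?",
    "Explain the refund policy",
]

def run_checklist(run_test_query, min_expected_score=0.70):
    failures = []
    for query in REPRESENTATIVE_QUERIES:
        chunks = run_test_query(query=query, content_source="All",
                                max_results=5, min_relevance=0.0)
        best = max((c["relevance_score"] for c in chunks), default=0.0)
        if best < min_expected_score:
            failures.append((query, best))
    for query, score in failures:
        print(f"WEAK: {query!r} best score {score:.0%} (below {min_expected_score:.0%})")
    return not failures
```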

