Knowledge Base Query Report
A semantic benchmark run of 47 queries against the indexed collection has completed. The knowledge base covers 312 documents and 8,941 chunks, with retrieval averaging 22ms. Coverage is strong at 84%, but 6 query categories return low-confidence results, indicating content gaps or chunking issues.
Attention Required
6 query categories return low-confidence results. These reflect either missing content or documents that chunk poorly: long tables, scanned PDFs, or heavily formatted files.
- Leave entitlements — 3 queries, max similarity 0.41. Content may exist but is buried in HR handbook appendices.
- Emergency procedures — 2 queries, max similarity 0.38. Likely in a scanned PDF not processing correctly.
- Contractor onboarding — 4 queries, max similarity 0.44. No dedicated document found. Gap confirmed.
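A minimal sketch of how categories like these could be flagged from per-category benchmark results. The 0.45 cut-off is an assumption taken from the "< 0.45 similarity" remark in the gap analysis, and `CategoryResult` and `low_confidence` are hypothetical names, not part of the benchmark tool.

```python
from dataclasses import dataclass

# Assumed cut-off: the report describes scores below 0.45 as low confidence.
LOW_CONFIDENCE_THRESHOLD = 0.45


@dataclass
class CategoryResult:
    name: str
    query_count: int
    max_similarity: float


def low_confidence(results, threshold=LOW_CONFIDENCE_THRESHOLD):
    """Return categories whose best match is below the threshold, weakest first."""
    flagged = [r for r in results if r.max_similarity < threshold]
    return sorted(flagged, key=lambda r: r.max_similarity)


# The three categories listed in this report.
results = [
    CategoryResult("Leave entitlements", 3, 0.41),
    CategoryResult("Emergency procedures", 2, 0.38),
    CategoryResult("Contractor onboarding", 4, 0.44),
]

for r in low_confidence(results):
    print(f"{r.name}: {r.query_count} queries, max similarity {r.max_similarity:.2f}")
```

Sorting weakest first puts the category most likely to need re-ingestion or new content at the top of the list.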
Query Results by Category
HR, compliance, and finance show strong retrieval. Leave, emergency, and contractor categories are the problem areas.
Retrieval Performance Distribution
83% of queries complete under 25ms. The two outliers above 50ms both involve multi-document synthesis queries — expected behaviour.
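The distribution figures above could be derived from raw per-query timings roughly as follows. This is a sketch only: `latency_summary` is a hypothetical helper, the 25ms and 50ms limits mirror the ones quoted in this section, and the sample timings are made up for illustration rather than taken from the benchmark.

```python
def latency_summary(latencies_ms, fast_ms=25.0, slow_ms=50.0):
    """Summarise per-query latencies: share under fast_ms, outliers above slow_ms."""
    if not latencies_ms:
        raise ValueError("no latencies recorded")
    fast = sum(1 for t in latencies_ms if t < fast_ms)
    outliers = [t for t in latencies_ms if t > slow_ms]
    return {
        "share_fast": fast / len(latencies_ms),
        "outliers": outliers,
        "mean_ms": sum(latencies_ms) / len(latencies_ms),
    }


# Illustrative values only, not the benchmark's measurements.
sample = [12.0, 18.5, 21.0, 24.0, 31.0, 62.0]
summary = latency_summary(sample)
print(f"{summary['share_fast']:.0%} under 25ms, {len(summary['outliers'])} above 50ms")
```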
Document Processing Issues
Processing Well
- Standard Word documents (.docx)
- Plain text policies (.txt, .md)
- Native PDFs with selectable text
- Structured HTML exports
- Spreadsheets with clear headers
Processing Poorly
- Scanned PDFs (OCR quality variable)
- Documents with large embedded tables
- Files with heavy header/footer repetition
- Multi-column layouts losing reading order
- Password-protected files (skipped entirely)
Content Gap Analysis
No document covers contractor onboarding process. 4 queries all return < 0.45 similarity. Document needs to be created.
Emergency evacuation procedure exists but is a scanned PDF. Re-ingest as native PDF or Word document.
Leave entitlement content exists in the HR Handbook appendix (pages 34–41), but the appendix is a separate embedded file. Extract and index it separately.
Referenced in 3 other documents but no standalone policy document found.
IT Security document is 6 months out of date. Queries return the old content confidently. Flag for review.
Table-heavy finance approval document is chunking poorly. Split into separate documents by approval level.
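Stale content is hard to spot from similarity scores alone, because outdated chunks still match confidently. One way to catch it, sketched here under assumptions, is to compare each source file's last-modified time with the time it was last indexed. The field names, document names, and dates below are hypothetical.

```python
from datetime import datetime


def find_stale(docs):
    """Return (name, lag) pairs for documents edited after they were last indexed."""
    stale = []
    for doc in docs:
        lag = doc["source_modified"] - doc["indexed_at"]
        if lag.total_seconds() > 0:
            stale.append((doc["name"], lag))
    # Largest lag first, so the most outdated index entries are reviewed first.
    return sorted(stale, key=lambda item: item[1], reverse=True)


# Hypothetical records; a real run would read these from the index metadata.
docs = [
    {"name": "it-security-policy",
     "source_modified": datetime(2024, 7, 1),
     "indexed_at": datetime(2024, 1, 1)},
    {"name": "expenses-policy",
     "source_modified": datetime(2024, 1, 1),
     "indexed_at": datetime(2024, 2, 1)},
]

for name, lag in find_stale(docs):
    print(f"{name}: index is {lag.days} days behind the source")
```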
Recommended Actions
Draft a new contractor onboarding document covering induction, access provisioning, and compliance requirements. Index immediately.
Convert the scanned emergency procedures PDF to Word and re-index. Estimated max similarity improvement: 0.38 → 0.75+.
Split the HR Handbook appendix sections into individually indexed documents. Improves leave query coverage significantly.
Consolidate scattered references into a single authoritative document.
Reduce chunk size from 512 to 256 tokens for finance approval documents. Re-index these 4 files only.
Schedule automated re-indexing to catch updated documents. The IT Security doc is currently 6 months stale.
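To illustrate the chunk-size change recommended above, here is a minimal fixed-size chunking sketch. It assumes whitespace-separated words as a stand-in for tokens; real token counts depend on the tokenizer of the embedding model in use, and `chunk_tokens` is a hypothetical helper rather than the indexer's actual API.

```python
def chunk_tokens(tokens, chunk_size=256):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Stand-in document of 1,000 "tokens"; whitespace splitting is an assumption.
text = " ".join(f"word{i}" for i in range(1000))
tokens = text.split()

for size in (512, 256):
    chunks = chunk_tokens(tokens, chunk_size=size)
    print(f"chunk_size={size}: {len(chunks)} chunks, last has {len(chunks[-1])} tokens")
```

Smaller chunks keep fewer table rows in each embedding, which is the reasoning behind applying the change to the table-heavy files only rather than to the whole collection.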
Bottom Line (rating: strong)
The knowledge base is performing well where content exists. The 84% coverage score reflects genuine content gaps, not system failures. Three high-priority actions — creating the contractor onboarding document, re-processing the emergency procedures PDF, and extracting HR appendices — will push coverage above 95%. The retrieval speed is already strong at 22ms average and requires no tuning. Once gaps are addressed, this collection is ready for production use.