Enterprise Knowledge Base
The AccelOS Knowledge Base is a central repository for your team’s operational knowledge. It stores runbooks, procedures, and learnings that the AI agent can reference during incident investigations.
Why a Knowledge Base?
When an incident occurs, your team’s collective knowledge is often scattered across wikis, Slack threads, and tribal memory. The AccelOS Knowledge Base:
- Centralizes operational knowledge in one searchable location
- Enables AI to reference your team’s expertise during investigations
- Captures learnings from incidents to prevent future issues
- Standardizes procedures across your team
Content Types
Alert Runbooks
Step-by-step procedures for handling specific alerts and incident types.
Integrations
Service-specific troubleshooting guides tied to your integrations.
Generic Instructions
General operational procedures and best practices.
Memories
Learnings from past investigations and incidents.
How It Works
1. Create Documentation
Write runbooks and procedures in Markdown. Each document includes:
- Title - Clear, searchable name
- Category - Alert Runbook, Integration, Generic Instruction, or Memory
- Content - Step-by-step procedures, context, and commands
- Metadata - Associated alerts, integrations, or services
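To make these four fields concrete, here is a minimal Python sketch of a knowledge base document. The names `KnowledgeDocument` and `Category`, and the split of metadata into `alerts`, `integrations`, and `services` lists, are hypothetical and do not describe the AccelOS schema or API. The sample runbook is invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    """The four content types a document can belong to."""
    ALERT_RUNBOOK = "Alert Runbook"
    INTEGRATION = "Integration"
    GENERIC_INSTRUCTION = "Generic Instruction"
    MEMORY = "Memory"


@dataclass
class KnowledgeDocument:
    """Hypothetical model of a knowledge base document (not the AccelOS API)."""
    title: str            # clear, searchable name
    category: Category    # one of the four content types
    content: str          # Markdown body: steps, context, commands
    alerts: list[str] = field(default_factory=list)        # associated alerts
    integrations: list[str] = field(default_factory=list)  # associated integrations
    services: list[str] = field(default_factory=list)      # associated services

    def __post_init__(self) -> None:
        # A document without a title cannot be found by name, so reject it.
        if not self.title.strip():
            raise ValueError("title must not be empty")


doc = KnowledgeDocument(
    title="High CPU on checkout-api",
    category=Category.ALERT_RUNBOOK,
    content="1. Open the CPU dashboard.\n2. Check recent deploys.",
    alerts=["checkout-api high CPU"],
    integrations=["Datadog"],
    services=["checkout-api"],
)
```

Keeping associations as explicit lists, rather than mentioning them only in the body text, is what makes it possible to match a document to an alert or integration later.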
2. AI Agent Access
During investigations, the AI agent automatically:
- Searches the knowledge base for relevant documents
- References runbooks that match the incident type
- Suggests procedures based on your team’s documented knowledge
- Cites sources so you can verify recommendations
3. Continuous Improvement
After each incident:
- Document learnings in the knowledge base
- Update runbooks based on what worked
- Add new procedures for novel issues
- Build organizational memory over time
Document Structure
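Each knowledge base document follows a consistent structure: a title, a category, Markdown content with step-by-step procedures, and metadata linking it to alerts, integrations, or services.
Categories Explained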
Alert Runbooks
Alert runbooks are tied to specific alerts or incident types. They provide:
- Clear symptoms to identify the issue
- Step-by-step investigation procedures
- Remediation actions
- Escalation paths
Integrations
Integration docs contain service-specific knowledge for your connected tools:
- How to interpret specific metrics
- Common queries and commands
- Service-specific troubleshooting
- Architecture context
Generic Instructions
General procedures that apply across services:
- Incident response protocols
- Communication templates
- General debugging techniques
- Tool usage guides
Memories
Learnings captured from past investigations:
- Root cause analyses
- Novel problems and solutions
- Lessons learned
- Post-mortem insights
Features
Version History
All documents are version-controlled:
- Track changes over time
- See who made edits
- Revert to previous versions if needed
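One common way to provide this behavior is an append-only history in which a revert creates a new version instead of deleting the later ones. The sketch below assumes that design; `VersionedDocument` and its methods are hypothetical and do not describe how AccelOS stores versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    number: int
    author: str   # who made the edit
    content: str  # full document body at this version


class VersionedDocument:
    """Hypothetical append-only version history (not the AccelOS implementation)."""

    def __init__(self, author: str, content: str) -> None:
        self.versions: list[Version] = [Version(1, author, content)]

    @property
    def current(self) -> Version:
        return self.versions[-1]

    def edit(self, author: str, content: str) -> Version:
        # Every save appends a new version; nothing is overwritten.
        version = Version(self.current.number + 1, author, content)
        self.versions.append(version)
        return version

    def revert(self, number: int, author: str) -> Version:
        # Reverting copies an old version forward as a new version,
        # so the full history (including who reverted) is preserved.
        target = next((v for v in self.versions if v.number == number), None)
        if target is None:
            raise KeyError(f"no version {number}")
        return self.edit(author, target.content)


doc = VersionedDocument("alice", "Restart the worker.")
doc.edit("bob", "Scale the worker pool instead.")
doc.revert(1, "carol")
print([(v.number, v.author) for v in doc.versions])
# → [(1, 'alice'), (2, 'bob'), (3, 'carol')]
print(doc.current.content)  # → Restart the worker.
```

Because a revert is recorded as an edit, you can still see that Bob's change existed and that Carol rolled it back.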
AI-Generated Reviews
When you save a document, the system can:
- Suggest improvements for clarity
- Identify missing sections
- Flag potential issues
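The AI review itself is not something you can reproduce locally, but a basic "missing sections" check can be approximated with a simple rule. This hypothetical sketch assumes an alert runbook should contain Markdown headings named after the four things alert runbooks provide (symptoms, investigation, remediation, escalation). The heading names and the `missing_sections` helper are assumptions, and this rule-based check is a stand-in, not the AccelOS review.

```python
import re

# Assumed heading names, one per thing an alert runbook should provide.
REQUIRED_SECTIONS = ("Symptoms", "Investigation", "Remediation", "Escalation")

# A Markdown ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^#{1,6}\s+(.+)$")


def missing_sections(markdown: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return required section names with no matching heading (case-insensitive)."""
    headings = set()
    for line in markdown.splitlines():
        match = HEADING.match(line.strip())
        if match:
            headings.add(match.group(1).strip().lower())
    return [name for name in required if name.lower() not in headings]


runbook = """# High CPU on checkout-api
## Symptoms
CPU above threshold for 5 minutes.
## Investigation
1. Check recent deploys.
## Remediation
1. Roll back the last deploy.
"""

print(missing_sections(runbook))  # → ['Escalation']
```

Exact heading matching is deliberately strict; a real review would also need to recognize sections that use different wording.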
Search & Discovery
Find relevant documents through:
- Full-text search
- Category filters
- Integration/alert associations
- AI-powered recommendations
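As a rough illustration of how full-text matching combines with category and alert filters, the sketch below ranks documents by how often the query terms appear in their title and content. It is a hypothetical, naive term-counting search over plain dictionaries, not the AccelOS search engine, which may rank results quite differently.

```python
def search(docs, query, category=None, alert=None):
    """Rank documents by query-term frequency after applying optional filters."""
    terms = query.lower().split()
    scored = []
    for doc in docs:
        if category is not None and doc["category"] != category:
            continue  # category filter
        if alert is not None and alert not in doc.get("alerts", []):
            continue  # alert-association filter
        text = f"{doc['title']} {doc['content']}".lower()
        # Naive substring count per term; real search would tokenize and weight terms.
        score = sum(text.count(term) for term in terms)
        if score > 0:
            scored.append((score, doc["title"]))
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


DOCS = [
    {"title": "High CPU on checkout-api", "category": "Alert Runbook",
     "content": "Check CPU usage and recent deploys.",
     "alerts": ["checkout-api high CPU"]},
    {"title": "Reading the Datadog CPU dashboard", "category": "Integration",
     "content": "How to interpret CPU graphs."},
    {"title": "Incident communication template", "category": "Generic Instruction",
     "content": "Post a status update every 30 minutes."},
]

print(search(DOCS, "cpu"))
# → ['High CPU on checkout-api', 'Reading the Datadog CPU dashboard']
print(search(DOCS, "cpu", category="Integration"))
# → ['Reading the Datadog CPU dashboard']
print(search(DOCS, "cpu", alert="checkout-api high CPU"))
# → ['High CPU on checkout-api']
```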
Best Practices
Start with High-Impact Runbooks
Begin by documenting your most frequent or critical alerts:
- Alerts that page on-call regularly
- Incidents that take the longest to resolve
- Issues that require specialized knowledge
Keep Procedures Actionable
Write for the 3 AM on-call engineer:
- Use clear, numbered steps
- Include actual commands (with placeholders)
- Specify expected outputs
- Define clear escalation criteria
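A runbook step can carry its instruction, command, and expected output together. In this hypothetical sketch, placeholders use Python's `string.Template` `${name}` syntax (an assumption; your runbooks may use a different convention), and the `kubectl` commands are only examples of "actual commands". `Template.substitute` raises `KeyError` when a placeholder has no value, so an unfilled placeholder fails loudly instead of producing a broken command.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class Step:
    instruction: str
    command: str   # shell command with ${placeholders}
    expected: str  # what the on-call engineer should see


# Hypothetical example steps; replace with commands for your own services.
STEPS = [
    Step("Check the pod's recent logs",
         "kubectl logs ${pod} -n ${namespace} --since=15m",
         "no repeated error or panic messages"),
    Step("Confirm the pod is running",
         "kubectl get pod ${pod} -n ${namespace}",
         "the STATUS column shows Running"),
]


def render(steps, **values):
    """Number the steps and fill in placeholders.

    Template.substitute raises KeyError for any placeholder without a value.
    """
    lines = []
    for i, step in enumerate(steps, start=1):
        command = Template(step.command).substitute(values)
        lines.append(f"{i}. {step.instruction}\n   $ {command}\n   Expected: {step.expected}")
    return "\n".join(lines)


print(render(STEPS, pod="checkout-api-7d9f", namespace="prod"))
```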
Update After Incidents
Treat runbooks as living documents:
- Update procedures that didn’t work
- Add new steps discovered during investigation
- Remove outdated information
- Capture novel solutions as memories
Link to Integrations
Associate documents with relevant integrations:
- Tag Datadog alerts with corresponding runbooks
- Link service docs to GitHub repositories
- Connect procedures to monitoring dashboards
Getting Started
Setup Guide
Step-by-step guide to setting up your knowledge base.