Enterprise Knowledge Base

The AccelOS Knowledge Base is a central repository for your team’s operational knowledge. It stores runbooks, procedures, and learnings that the AI agent can reference during incident investigations.

Why a Knowledge Base?

When an incident occurs, your team’s collective knowledge is often scattered across wikis, Slack threads, and tribal memory. The AccelOS Knowledge Base:

Centralizes operational knowledge in one searchable location
Enables the AI agent to reference your team’s expertise during investigations
Captures learnings from incidents to prevent future issues
Standardizes procedures across your team

Content Types

Alert Runbooks

Step-by-step procedures for handling specific alerts and incident types.

Integrations

Service-specific troubleshooting guides tied to your integrations.

Generic Instructions

General operational procedures and best practices.

Memories

Learnings from past investigations and incidents.

How It Works

1. Create Documentation

Write runbooks and procedures in Markdown. Each document includes:

Title - Clear, searchable name
Category - Alert Runbook, Integration, Generic Instruction, or Memory
Content - Step-by-step procedures, context, and commands
Metadata - Associated alerts, integrations, or services

2. AI Agent Access

During investigations, the AI agent automatically:

Searches the knowledge base for relevant documents
References runbooks that match the incident type
Suggests procedures based on your team’s documented knowledge
Cites sources so you can verify recommendations
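
The matching and citation steps above can be pictured with a small sketch. This is an illustration only, not the agent's actual retrieval logic: the `Document` class, `find_runbooks`, and `cite` are hypothetical names, and the `generic_instructions` category value is an assumed slug (only `alert_runbooks` appears in the sample document on this page).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    """Hypothetical in-memory model of a knowledge base document."""
    title: str
    category: str  # e.g. "alert_runbooks", as in the sample front matter
    content: str
    alert_name: Optional[str] = None


def find_runbooks(docs: List[Document], alert_name: str) -> List[Document]:
    """Return alert runbooks whose alert_name matches, ignoring case."""
    wanted = alert_name.lower()
    return [
        d for d in docs
        if d.category == "alert_runbooks"
        and d.alert_name is not None
        and d.alert_name.lower() == wanted
    ]


def cite(docs: List[Document]) -> List[str]:
    """Build human-readable source citations for the matched documents."""
    return [f"{d.title} [{d.category}]" for d in docs]


docs = [
    Document("High CPU Usage Runbook", "alert_runbooks", "...", alert_name="HighCPUUsage"),
    Document("Incident Response Protocol", "generic_instructions", "..."),  # assumed slug
]
matches = find_runbooks(docs, "highcpuusage")
citations = cite(matches)
```

Keeping the citation step separate from the matching step mirrors the idea that every recommendation should be traceable to a document you can open and verify.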

3. Continuous Improvement

After each incident:

Document learnings in the knowledge base
Update runbooks based on what worked
Add new procedures for novel issues
Build organizational memory over time

Document Structure

Each knowledge base document follows a consistent structure: a metadata header delimited by --- lines, followed by the Markdown body. For example:

---
category: alert_runbooks
alert_name: HighCPUUsage
integration_id: <uuid>
---

# High CPU Usage Runbook

## Symptoms
- CPU usage above 90% for more than 5 minutes
- Increased latency on affected services

## Investigation Steps
1. Check which processes are consuming CPU
2. Review recent deployments
3. Check for traffic spikes

## Remediation
1. Scale horizontally if load-related
2. Restart affected pods if process is stuck
3. Roll back if caused by recent deployment

## Escalation
Contact the platform team if the issue persists for more than 30 minutes.
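
If you keep documents in this shape outside AccelOS (for example, in a Git repository), the metadata header can be separated from the body with a few lines of code. The following is a minimal sketch that assumes the header contains only flat `key: value` pairs, as in the sample above; `split_front_matter` is a hypothetical helper, not part of any AccelOS API, and it is not a full YAML parser.

```python
from typing import Dict, Tuple


def split_front_matter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a document into (metadata, body).

    Returns an empty metadata dict and the original text when the
    document does not start with a '---' delimited header.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        # Index of the closing '---' delimiter.
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return {}, text

    metadata: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not key: value pairs
            metadata[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body


EXAMPLE = """---
category: alert_runbooks
alert_name: HighCPUUsage
integration_id: <uuid>
---

# High CPU Usage Runbook

## Symptoms
- CPU usage above 90% for more than 5 minutes
"""

metadata, body = split_front_matter(EXAMPLE)
```

Here `metadata` holds the three header fields as strings and `body` starts at the runbook title.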

Categories Explained

Alert Runbooks

Alert runbooks are tied to specific alerts or incident types. They provide:

Clear symptoms to identify the issue
Step-by-step investigation procedures
Remediation actions
Escalation paths

Best for: Recurring alerts, known failure modes, standard operating procedures.

Integrations

Integration docs contain service-specific knowledge for your connected tools:

How to interpret specific metrics
Common queries and commands
Service-specific troubleshooting
Architecture context

Best for: Service-specific knowledge, tool documentation, integration guides.

Generic Instructions

General procedures that apply across services:

Incident response protocols
Communication templates
General debugging techniques
Tool usage guides

Best for: Team-wide procedures, best practices, onboarding materials.

Memories

Learnings captured from past investigations:

Root cause analyses
Novel problems and solutions
Lessons learned
Post-mortem insights

Best for: Building institutional knowledge, preventing repeat incidents.

Features

Version History

All documents are version-controlled:

Track changes over time
See who made edits
Revert to previous versions if needed

AI-Generated Reviews

When you save a document, the system can:

Suggest improvements for clarity
Identify missing sections
Flag potential issues
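
To give a feel for what "identify missing sections" could mean for an alert runbook, here is a rough sketch of a rule-based check. It is not how the AI-generated review works internally; it assumes the four section headings from the sample runbook (Symptoms, Investigation Steps, Remediation, Escalation) are the expected ones, and `missing_sections` is a hypothetical helper.

```python
import re
from typing import List, Sequence

# Section names taken from the sample runbook; treating them as required is an assumption.
EXPECTED_SECTIONS = ["Symptoms", "Investigation Steps", "Remediation", "Escalation"]


def missing_sections(markdown: str, expected: Sequence[str] = EXPECTED_SECTIONS) -> List[str]:
    """Return the expected level-2 headings that the document lacks."""
    found = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^##\s+(.+)$", markdown, flags=re.MULTILINE)
    }
    return [name for name in expected if name.lower() not in found]


draft = """# Disk Full Runbook

## Symptoms
- Disk usage above 95%

## Remediation
1. Clear old logs
"""

gaps = missing_sections(draft)
```

A check like this could run before saving, so a draft without an escalation path is caught before anyone needs it during an incident.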

Search & Discovery

Find relevant documents through:

Full-text search
Category filters
Integration/alert associations
AI-powered recommendations
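
As a rough mental model of how full-text search combines with a category filter, consider the sketch below. It is a simplified illustration and makes no claim about the actual search implementation: the `search` function, the dictionary layout, and the `memories` and `generic_instructions` category values are all assumptions.

```python
from typing import Dict, List, Optional


def search(docs: List[Dict[str, str]], query: str, category: Optional[str] = None) -> List[str]:
    """Return titles of matching documents, most term occurrences first.

    Scoring is a plain case-insensitive substring count over title and content.
    """
    terms = query.lower().split()
    scored = []
    for doc in docs:
        if category is not None and doc["category"] != category:
            continue  # category filter
        text = f"{doc['title']} {doc['content']}".lower()
        score = sum(text.count(term) for term in terms)
        if score > 0:
            scored.append((score, doc["title"]))
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored]


docs = [
    {"title": "High CPU Usage Runbook", "category": "alert_runbooks",
     "content": "Check which processes are consuming CPU. Review recent deployments."},
    {"title": "Postgres Failover Notes", "category": "memories",
     "content": "Failover took 4 minutes; CPU was not the bottleneck."},
    {"title": "Incident Communication Template", "category": "generic_instructions",
     "content": "Post status updates every 30 minutes."},
]

all_hits = search(docs, "cpu")
memory_hits = search(docs, "cpu", category="memories")
```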

Best Practices

Start with High-Impact Runbooks

Begin by documenting your most frequent or critical alerts:

Alerts that page on-call regularly
Incidents that take longest to resolve
Issues that require specialized knowledge

Keep Procedures Actionable

Write for the 3 AM on-call engineer:

Use clear, numbered steps
Include actual commands (with placeholders)
Specify expected outputs
Define clear escalation criteria
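
One way to keep commands copy-paste safe is to store them with named placeholders and fill the values in at incident time. The sketch below uses Python's standard `string.Template`; the `$deployment` and `$namespace` placeholder syntax and the `checkout` and `prod` values are illustrative assumptions, not an AccelOS convention.

```python
from string import Template

# A runbook command with named placeholders (the $name syntax is an assumption).
RESTART_COMMAND = Template("kubectl rollout restart deployment/$deployment -n $namespace")


def render_command(template: Template, **values: str) -> str:
    """Fill in every placeholder; raises KeyError if one is left unset."""
    return template.substitute(**values)


command = render_command(RESTART_COMMAND, deployment="checkout", namespace="prod")
```

Failing loudly on a missing value is deliberate: a half-filled command is worse than none when someone is working under pressure.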

Update After Incidents

Treat runbooks as living documents:

Update procedures that didn’t work
Add new steps discovered during investigation
Remove outdated information
Capture novel solutions as memories

Link to Integrations

Associate documents with relevant integrations:

Tag Datadog alerts with corresponding runbooks
Link service docs to GitHub repositories
Connect procedures to monitoring dashboards

Getting Started

Setup Guide

Step-by-step guide to setting up your knowledge base.
