What is OpsCompanion

A living map of your infrastructure that turns complexity into clarity.

OpsCompanion builds a living map of your infrastructure, capturing both technical connections and business context so your team can understand and navigate your systems.

The Problem

Infrastructure starts small, but then it grows. What used to be a clear picture becomes a sprawling ecosystem:

  • Services spread across AWS, GCP, Azure, and more
  • Hidden connections between systems you didn't know existed
  • Documentation that's outdated the day it's written
  • Critical knowledge that lives only in people's heads

When someone leaves, that tribal knowledge goes with them.

What OpsCompanion Does

To create a living map of your infrastructure, OpsCompanion:

  • Connects to your cloud providers and development tools
  • Automatically discovers resources and relationships
  • Lets you add business context that machines can't figure out
  • Provides AI-assisted investigation that understands your infrastructure, dependencies, and ownership

This is not a static diagram you draw once and forget. It's a dynamic, real-time view that builds itself.

What OpsCompanion Provides

  • Living map - Automatically updated as your infrastructure changes
  • Technical connections - Dependencies derived from actual configuration
  • Business context - Manual links and notes that capture tribal knowledge
  • Context-aware AI - An assistant that understands your specific systems

How It Works

  1. Connect - Install integrations with your cloud providers and tools
  2. Discover - OpsCompanion automatically maps resources and relationships
  3. Enrich - Add business context, ownership, and human knowledge
  4. Explore - Ask questions and investigate using AI that understands your infrastructure
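
To make the Discover step concrete, here is a minimal sketch of how dependencies might be derived from configuration. It is illustrative only: `Resource`, `discover_edges`, and the sample inventory are hypothetical names invented for this example, not OpsCompanion's API, and the matching rule (a config value that equals another resource's id) is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A hypothetical inventory record, as an integration might return it."""
    id: str
    kind: str
    config: dict = field(default_factory=dict)


def discover_edges(resources):
    """Derive (source, target) dependencies from configuration values.

    An edge is recorded whenever a string config value of one resource
    is exactly the id of another resource.
    """
    known_ids = {r.id for r in resources}
    edges = set()
    for r in resources:
        for value in r.config.values():
            if isinstance(value, str) and value in known_ids and value != r.id:
                edges.add((r.id, value))
    return sorted(edges)


# Hypothetical sample data standing in for what step 1 (Connect) would supply.
inventory = [
    Resource("checkout-api", "service", {"DB_HOST": "orders-db", "QUEUE": "payments-queue"}),
    Resource("orders-db", "database"),
    Resource("payments-queue", "queue"),
    Resource("billing-worker", "service", {"QUEUE": "payments-queue"}),
]

edges = discover_edges(inventory)
for source, target in edges:
    print(f"{source} -> {target}")
```

Real discovery would need richer matching (ARNs, hostnames, IAM references), but the idea is the same: edges come from what the configuration actually says, not from a hand-drawn diagram.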

Business Context

OpsCompanion lets you add the human knowledge that machines can't discover:

  • Link resources that are connected for business reasons
  • Add notes explaining why things exist
  • Capture tribal knowledge before it's lost
  • Document the "why" alongside the "what"

This turns abstract infrastructure into something the whole team can understand.
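
One way to picture business context is as a human-authored layer kept alongside the discovered map. The sketch below assumes a hypothetical in-memory model; `Link`, `InfraMap`, and the resource names are invented for illustration and do not describe how OpsCompanion stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    source: str
    target: str
    origin: str      # "discovered" (from config) or "manual" (added by a person)
    note: str = ""   # the "why" behind a manual link


@dataclass
class InfraMap:
    links: list = field(default_factory=list)
    notes: dict = field(default_factory=dict)

    def add_manual_link(self, source, target, note):
        """Record a connection that exists for business, not technical, reasons."""
        self.links.append(Link(source, target, "manual", note))

    def add_note(self, resource_id, text):
        """Attach free-form human knowledge to a resource."""
        self.notes.setdefault(resource_id, []).append(text)

    def context_for(self, resource_id):
        """Return everything a person has recorded about a resource."""
        manual = [
            link for link in self.links
            if link.origin == "manual" and resource_id in (link.source, link.target)
        ]
        return {"notes": self.notes.get(resource_id, []), "manual_links": manual}


infra = InfraMap()
infra.links.append(Link("checkout-api", "orders-db", "discovered"))
infra.add_manual_link("orders-db", "finance-reports", "Month-end close reads from this database")
infra.add_note("orders-db", "Kept on the older instance type for a contractual reason")

context = infra.context_for("orders-db")
print(context["notes"])
print([(link.source, link.target, link.note) for link in context["manual_links"]])
```

Keeping an `origin` on every link lets the discovered "what" and the human "why" sit in the same map without being confused for one another.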

Current Access Model

  • Integrations use read-only access
  • No write permissions are currently requested
  • OpsCompanion does not modify your infrastructure
  • You control what access is granted

Use Cases

  • Incident investigation - Follow connections to find root causes
  • Change impact - Understand blast radius before deploying
  • Onboarding - New team members can explore the map and understand how systems fit together
  • Knowledge capture - Document tribal knowledge before it's lost
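
The change-impact use case comes down to a graph question: what depends, directly or transitively, on the thing being changed? The following is a minimal sketch using a breadth-first search over reversed dependency edges. The function name `blast_radius` and the sample edges are hypothetical, and this is not a description of OpsCompanion's internals.

```python
from collections import defaultdict, deque


def blast_radius(edges, changed):
    """Return every resource that directly or transitively depends on `changed`.

    `edges` is an iterable of (source, target) pairs meaning
    "source depends on target".
    """
    # Reverse the edges so we can walk from a resource to its dependents.
    dependents = defaultdict(set)
    for source, target in edges:
        dependents[target].add(source)

    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, ()):
            if dep not in seen and dep != changed:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical dependency edges.
edges = [
    ("storefront", "checkout-api"),
    ("checkout-api", "orders-db"),
    ("checkout-api", "payments-queue"),
    ("billing-worker", "payments-queue"),
]

impacted = blast_radius(edges, "orders-db")
print(sorted(impacted))
```

The `seen` set keeps the search from looping forever on circular dependencies, which are common in real infrastructure.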

What Makes It Different

Most reliability tools break down as systems grow more complex.

Traditional approaches fail because:

  • Static diagrams are obsolete as soon as production changes
  • Documentation drifts away from reality
  • Critical operational context lives only in people's heads
  • Most tools surface signals, not understanding

As a result, teams respond to symptoms instead of preventing failure.

Why "AI SRE" Misses the Point

The current wave of AI SRE tools is built around a flawed assumption: that the primary problem in reliability is how fast you respond.

These systems focus on:

  • Faster detection
  • Faster triage
  • Faster remediation

But speed is not the bottleneck.

Most incidents are not caused by missing alerts or slow humans. They are caused by unknown dependencies, unclear ownership, and hidden blast radius.

AI SRE tools attempt to fix outcomes without fixing understanding.

Reliability Breaks Before the Incident

By the time an AI system is trying to resolve an issue, the real failure has already happened.

The failure was:

  • A change made without knowing what depended on it
  • A service owned by no one in particular
  • A system no single person fully understood anymore

No amount of automated remediation can fix that.

A Different Use of AI

OpsCompanion uses AI where it actually helps reliability:

  • Making system structure explicit
  • Preserving operational context over time
  • Explaining how services, infrastructure, and teams connect
  • Showing impact before changes are deployed

This is not about reacting faster. It is about removing the conditions that create incidents in the first place.

Proactive Reliability Requires Understanding

You cannot automate your way out of complexity.

Reliability improves when teams can:

  • See the system as it actually exists
  • Understand how changes propagate
  • Share context instead of rediscovering it during incidents

OpsCompanion exists to restore that understanding.

Next Steps
