What is OpsCompanion
A living map of your infrastructure that turns complexity into clarity.
OpsCompanion maps your infrastructure automatically and keeps that map current. It captures both technical connections and business context so your team can understand and navigate your systems.
The Problem
Infrastructure starts small, but then it grows. What used to be a clear picture becomes a sprawling ecosystem:
- Services spread across AWS, GCP, Azure, and more
- Hidden connections between systems you didn't know existed
- Documentation that's outdated the day it's written
- Critical knowledge that lives only in people's heads
When someone leaves, that tribal knowledge goes with them.
What OpsCompanion Does
OpsCompanion creates a living map of your infrastructure:
- Connects to your cloud providers and development tools
- Automatically discovers resources and relationships
- Lets you add business context that machines can't figure out
- Provides AI-assisted investigation that understands your infrastructure, dependencies, and ownership
This is not a static diagram you draw once and forget. It's a dynamic, real-time view that builds itself.
What OpsCompanion Provides
- Living map - Automatically updated as your infrastructure changes
- Technical connections - Dependencies derived from actual configuration
- Business context - Manual links and notes that capture tribal knowledge
- Context-aware AI - An assistant that understands your specific systems
How It Works
- Connect - Install integrations with your cloud providers and tools
- Discover - OpsCompanion automatically maps resources and relationships
- Enrich - Add business context, ownership, and human knowledge
- Explore - Ask questions and investigate using AI that understands your infrastructure
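To make the Discover and Enrich steps more concrete, here is a minimal sketch of a map that stores discovered links and manually added business context side by side. It does not show OpsCompanion's actual data model or API: `Resource`, `Link`, `InfraMap`, and the example resource names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """A node in the map (hypothetical shape, not OpsCompanion's schema)."""
    name: str
    kind: str
    owner: Optional[str] = None          # filled in during the Enrich step
    notes: list = field(default_factory=list)


@dataclass(frozen=True)
class Link:
    """An edge in the map, tagged with where it came from."""
    source: str
    target: str
    origin: str       # "discovered" (from configuration) or "manual" (added by a person)
    reason: str = ""  # the "why" behind a manual link


class InfraMap:
    def __init__(self):
        self.resources = {}
        self.links = []

    def add_resource(self, name, kind):
        self.resources[name] = Resource(name, kind)

    def link(self, source, target, origin, reason=""):
        # Refuse links to resources the map does not know about.
        for name in (source, target):
            if name not in self.resources:
                raise KeyError(f"unknown resource: {name}")
        self.links.append(Link(source, target, origin, reason))

    def links_for(self, name):
        """All links that touch a resource, in the order they were added."""
        return [l for l in self.links if name in (l.source, l.target)]

    def unowned(self):
        """Resources nobody has claimed yet, sorted by name."""
        return sorted(r.name for r in self.resources.values() if r.owner is None)


infra = InfraMap()
infra.add_resource("checkout-api", "service")
infra.add_resource("orders-db", "database")
infra.add_resource("billing-export", "cron job")

# Discover: a dependency read from configuration.
infra.link("checkout-api", "orders-db", origin="discovered")
# Enrich: a connection that exists for business reasons only.
infra.link("billing-export", "orders-db", origin="manual",
           reason="Finance reads nightly totals from this database")
infra.resources["checkout-api"].owner = "payments-team"

print(infra.unowned())  # ['billing-export', 'orders-db']
```

Keeping the `origin` on each link is one way to separate what was derived from configuration from what a person asserted, so both kinds of connection can be explored together without being confused.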
Business Context
OpsCompanion lets you add the human knowledge that machines can't discover:
- Link resources that are connected for business reasons
- Add notes explaining why things exist
- Capture tribal knowledge before it's lost
- Document the "why" alongside the "what"
This turns abstract infrastructure into something the whole team can understand.
Current Access Model
- Integrations use read-only access
- OpsCompanion does not currently request write permissions
- OpsCompanion does not modify your infrastructure
- You control what access is granted
Use Cases
- Incident investigation - Follow connections to find root causes
- Change impact - Understand blast radius before deploying
- Onboarding - New team members can explore the map and understand how systems fit together
- Knowledge capture - Document tribal knowledge before it's lost
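The change impact use case can be sketched as a graph traversal: starting from the resource being changed, walk outward to everything that depends on it, directly or indirectly. The example below is a minimal illustration under stated assumptions, not OpsCompanion's implementation. The `DEPENDS_ON` data and the `blast_radius` function are hypothetical.

```python
from collections import deque

# Hypothetical dependency data: each key depends on the resources listed.
DEPENDS_ON = {
    "checkout-api": ["orders-db", "auth-service"],
    "billing-export": ["orders-db"],
    "auth-service": ["users-db"],
    "admin-ui": ["auth-service"],
}


def blast_radius(changed, depends_on):
    """Return every resource that directly or transitively depends on `changed`."""
    # Invert the edges so we can look up "who depends on X".
    dependents = {}
    for resource, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(resource)

    # Breadth-first walk over the inverted edges.
    affected = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent != changed and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


print(sorted(blast_radius("users-db", DEPENDS_ON)))
# ['admin-ui', 'auth-service', 'checkout-api']
```

In this toy data, a change to `users-db` reaches `checkout-api` only through `auth-service`, which is the kind of indirect dependency that is easy to miss without a map.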
What Makes It Different
Most reliability tools break down as systems grow more complex.
Traditional approaches fail because:
- Static diagrams are obsolete as soon as production changes
- Documentation drifts away from reality
- Critical operational context lives only in people's heads
- Most tools surface signals, not understanding
As a result, teams respond to symptoms instead of preventing failure.
Why "AI SRE" Misses the Point
The current wave of AI SRE tools is built around a flawed assumption: that the primary problem in reliability is how fast you respond.
These systems focus on:
- Faster detection
- Faster triage
- Faster remediation
But speed is not the bottleneck.
Most incidents are not caused by missing alerts or slow humans. They are caused by unknown dependencies, unclear ownership, and hidden blast radius.
AI SRE tools attempt to fix outcomes without fixing understanding.
Reliability Breaks Before the Incident
By the time an AI system is trying to resolve an issue, the real failure has already happened.
The failure was:
- A change made without knowing what depended on it
- A service owned by no one in particular
- A system no single person fully understood anymore
No amount of automated remediation can fix that.
A Different Use of AI
OpsCompanion uses AI where it actually helps reliability:
- Making system structure explicit
- Preserving operational context over time
- Explaining how services, infrastructure, and teams connect
- Showing impact before changes are deployed
This is not about reacting faster. It is about removing the conditions that create incidents in the first place.
Proactive Reliability Requires Understanding
You cannot automate your way out of complexity.
Reliability improves when teams can:
- See the system as it actually exists
- Understand how changes propagate
- Share context instead of rediscovering it during incidents
OpsCompanion exists to restore that understanding.
Next Steps
- Quick Start - Get connected in minutes
- How It Works - Understand the living map