FixFast Architecture & System Overview

FixFast provides alert and incident intelligence with a deterministic, explainable pipeline. This overview describes the system components, data flow, and controls for multi-tenancy and access.

Architecture Overview

FixFast consists of these major components:

  • Alert sources
  • Ingestion layer
  • Alert processing and grouping
  • Incident correlation
  • Incident storage
  • Explainable summaries
  • Incident Pattern Intelligence
  • API layer
  • User interfaces and integrations
  • Security and access control

System data flow (Mermaid source):

flowchart LR
    A[Grafana Alertmanager] -->|Webhooks| B[FixFast Ingestion Layer]

    B --> C[Alert Processing & Grouping]
    C --> D[Incident Correlation Engine]

    D --> E[Incident Store]
    D --> F[Explainable Summaries]

    E --> G[Incident Pattern Intelligence]
    G --> H[Aggregated Metrics Store]

    F --> I[FixFast API]
    E --> I
    H --> I

    I --> J[FixFast Web UI]
    I --> K[Slack Integration]
    I --> L[Generic Webhooks]

    subgraph "Security & Access"
        M["Org Isolation (org_id)"]
        N["RBAC: Admin / Editor / Viewer"]
    end

    I --> M
    I --> N

Component Descriptions

1. Alert Sources

  • Primary supported source: Grafana Alertmanager
  • Alerts are sent via webhooks with severity, service, environment, and labels
  • Purpose: Supply reliable alert signals to FixFast

2. Ingestion Layer

  • Receives alerts from external systems
  • Validates payload structure and applies retry handling
  • Associates alerts with the correct organization (org_id)
  • Purpose: Ensure secure, reliable alert intake
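
The intake steps above can be sketched roughly as follows. This is a minimal illustration, not FixFast's implementation: the `ingest` function, the `TOKEN_TO_ORG` lookup, and the payload shape (an `alerts` list whose items carry a `labels` object) are all assumptions, and retry handling is left out.

```python
from dataclasses import dataclass

# Labels this overview says every alert carries.
REQUIRED_LABELS = ("severity", "service", "environment")

# Hypothetical token-to-organization lookup; a real system would use a
# database or secrets store.
TOKEN_TO_ORG = {"token-acme": "org_acme"}


class InvalidPayload(ValueError):
    """Raised when a webhook payload fails structural validation."""


@dataclass(frozen=True)
class IngestedAlert:
    org_id: str
    severity: str
    service: str
    environment: str
    labels: dict


def ingest(token: str, payload: dict) -> list[IngestedAlert]:
    """Validate a webhook payload and tag every alert with its org_id."""
    org_id = TOKEN_TO_ORG.get(token)
    if org_id is None:
        raise PermissionError("unknown webhook token")

    alerts = payload.get("alerts")
    if not isinstance(alerts, list) or not alerts:
        raise InvalidPayload("payload must contain a non-empty 'alerts' list")

    result = []
    for index, alert in enumerate(alerts):
        labels = alert.get("labels") if isinstance(alert, dict) else None
        if not isinstance(labels, dict):
            raise InvalidPayload(f"alert {index} has no 'labels' object")
        missing = [name for name in REQUIRED_LABELS if name not in labels]
        if missing:
            raise InvalidPayload(f"alert {index} is missing labels: {missing}")
        result.append(
            IngestedAlert(
                org_id=org_id,
                severity=labels["severity"],
                service=labels["service"],
                environment=labels["environment"],
                labels=dict(labels),
            )
        )
    return result
```

Resolving `org_id` from the credential rather than from the payload means a sender cannot claim to belong to another organization.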

3. Alert Processing & Grouping

  • Normalizes incoming alerts
  • Applies deterministic grouping rules using fingerprints, time windows, and shared context
  • Purpose: Reduce alert noise and prepare alerts for correlation
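
One way to picture deterministic grouping is a stable fingerprint combined with fixed time windows. The sketch below is illustrative only: the fingerprint fields, the 300-second window, and the `name` and `timestamp` keys are assumptions, not FixFast's documented rules.

```python
import hashlib
from collections import defaultdict

WINDOW_SECONDS = 300  # assumed window size, not a documented FixFast value


def fingerprint(alert: dict) -> str:
    """Stable hash over the fields assumed to identify 'the same' alert."""
    key = "|".join(
        [alert["org_id"], alert["service"], alert["environment"], alert["name"]]
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def group_alerts(alerts: list[dict]) -> dict[tuple[str, int], list[dict]]:
    """Bucket alerts by (fingerprint, time-window index)."""
    groups = defaultdict(list)
    # Sort first so the result does not depend on arrival order.
    for alert in sorted(alerts, key=lambda a: (a["timestamp"], a["name"])):
        window = int(alert["timestamp"] // WINDOW_SECONDS)
        groups[(fingerprint(alert), window)].append(alert)
    return dict(groups)
```

Because the grouping key is a pure function of the alert's fields and timestamp, the same input always yields the same groups. Fixed windows do split alerts that straddle a window boundary; a sliding window would avoid that at the cost of more state.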

4. Incident Correlation Engine

  • Groups related alerts into incidents
  • Identifies primary and supporting signals
  • Records grouping rationale for auditability
  • Purpose: Create explainable, trustworthy incidents
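
As a rough sketch of how a primary signal and an auditable rationale might be derived, the example below picks the most severe, earliest alert as primary and records why each remaining alert was attached. The severity ranking, the `Incident` type, and the rationale wording are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed severity ordering; lower rank means more severe.
SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}


@dataclass
class Incident:
    primary: dict
    supporting: list
    rationale: list = field(default_factory=list)


def correlate(alerts: list[dict]) -> Incident:
    """Build one incident from alerts that were already grouped together."""
    if not alerts:
        raise ValueError("cannot build an incident from zero alerts")

    # Deterministic ordering: severity, then time, then name as a tie-breaker.
    ordered = sorted(
        alerts,
        key=lambda a: (
            SEVERITY_RANK.get(a["severity"], len(SEVERITY_RANK)),
            a["timestamp"],
            a["name"],
        ),
    )
    primary, supporting = ordered[0], ordered[1:]

    rationale = [
        f"primary: {primary['name']} (severity {primary['severity']}, "
        f"t={primary['timestamp']})"
    ]
    for alert in supporting:
        shared = sorted(
            k for k in ("service", "environment") if alert.get(k) == primary.get(k)
        )
        context = ", ".join(shared) or "no context"
        rationale.append(f"supporting: {alert['name']} shares {context} with primary")
    return Incident(primary, supporting, rationale)
```

Storing the rationale as plain strings next to the incident keeps the grouping decision reviewable later without re-running the engine.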

5. Incident Store

  • Stores active and resolved incidents
  • Maintains alert-to-incident relationships
  • Enforces retention and deletion policies
  • Purpose: Provide a reliable source of incident data
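
A retention policy of this kind could be applied along the following lines. The 90-day period, the `resolved_at` field, and the rule that active incidents are never purged are assumptions for the sake of the example.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # assumed retention period


def purge_expired(
    incidents: list[dict], now: datetime
) -> tuple[list[dict], list[str]]:
    """Return the incidents to keep and the ids removed by retention."""
    cutoff = now - RETENTION
    kept, deleted_ids = [], []
    for incident in incidents:
        resolved_at = incident.get("resolved_at")
        # Active incidents have no resolved_at and are never purged here.
        if resolved_at is not None and resolved_at < cutoff:
            deleted_ids.append(incident["id"])
        else:
            kept.append(incident)
    return kept, deleted_ids


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    sample = [
        {"id": "old", "resolved_at": now - timedelta(days=120)},
        {"id": "active", "resolved_at": None},
    ]
    print(purge_expired(sample, now))
```

Passing `now` in as an argument, rather than reading the clock inside the function, keeps the purge reproducible and easy to test.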

6. Explainable Summaries

  • Generates structured summaries per incident
  • Includes what happened, why alerts were grouped, probable causes, and recommended actions
  • Purpose: Help teams understand incidents quickly and clearly
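
The four parts of a summary listed above map naturally onto a small record type. The `IncidentSummary` class and `render` function below are hypothetical names used only to show one possible structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentSummary:
    incident_id: str
    what_happened: str
    grouping_rationale: tuple[str, ...]
    probable_causes: tuple[str, ...]
    recommended_actions: tuple[str, ...]


def render(summary: IncidentSummary) -> str:
    """Render the summary as plain text with one section per field."""
    sections = (
        ("Why alerts were grouped", summary.grouping_rationale),
        ("Probable causes", summary.probable_causes),
        ("Recommended actions", summary.recommended_actions),
    )
    lines = [
        f"Incident {summary.incident_id}",
        f"What happened: {summary.what_happened}",
    ]
    for title, items in sections:
        lines.append(f"{title}:")
        lines.extend(f"  - {item}" for item in items)
    return "\n".join(lines)
```

Keeping the summary as structured fields, with rendering as a separate step, lets the same data feed the Web UI, Slack, and webhooks.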

7. Incident Pattern Intelligence

  • Operates on aggregated incident data only
  • Produces incident volume trends, exposure analysis, alert noise trends, and recovery speed (MTTR)
  • Does not retain raw alerts beyond the retention period
  • Purpose: Enable long-term learning and prevention
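
As an illustration of working from aggregated incident data only, the sketch below computes mean time to recovery (MTTR) per service from incident open and resolve times. The field names (`opened_at`, `resolved_at` as epoch seconds) and the per-service breakdown are assumptions.

```python
from collections import defaultdict
from statistics import mean


def mttr_minutes_by_service(incidents: list[dict]) -> dict[str, float]:
    """Mean time to recovery per service, in minutes.

    Only resolved incidents contribute; raw alerts are never consulted.
    """
    durations = defaultdict(list)
    for incident in incidents:
        if incident["resolved_at"] is None:
            continue  # still active, no recovery time yet
        seconds = incident["resolved_at"] - incident["opened_at"]
        durations[incident["service"]].append(seconds / 60)
    # Sort services so the output order is stable.
    return {service: mean(values) for service, values in sorted(durations.items())}
```

For example, two resolved incidents on one service lasting 10 and 30 minutes give an MTTR of 20 minutes for that service.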

8. API Layer

  • Provides programmatic access to FixFast data
  • Requires authentication; every request is scoped by org_id
  • Enforces RBAC permissions on each request
  • Purpose: Enable UI, integrations, and automation

9. User Interfaces & Integrations

  • Web UI for engineers and operators
  • Slack integration for notifications
  • Generic webhooks for external systems
  • Purpose: Deliver insights where teams work

10. Security & Access Control

  • Multi-tenancy: Each organization is fully isolated. All data is scoped by org_id, and cross-tenant access is not permitted.
  • Role-Based Access Control (RBAC): Three roles (Admin, Editor, Viewer); each request is validated against the caller's role permissions.
  • Purpose: Ensure secure, controlled access.
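
The two controls above can be combined into a single check that runs on every request. This is a sketch under stated assumptions: the document names the three roles but not their permissions, so the `ROLE_PERMISSIONS` matrix, the `Principal` type, and the `authorize` and `list_incidents` functions are hypothetical.

```python
from dataclasses import dataclass

# Assumed permission matrix; only the role names come from this overview.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "manage"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


class AccessDenied(Exception):
    """Raised when a request fails the tenant or role check."""


@dataclass(frozen=True)
class Principal:
    user_id: str
    org_id: str
    role: str


def authorize(principal: Principal, action: str, resource_org_id: str) -> None:
    """Reject cross-tenant access first, then check the role's permissions."""
    if principal.org_id != resource_org_id:
        raise AccessDenied("cross-tenant access is not permitted")
    if action not in ROLE_PERMISSIONS.get(principal.role, set()):
        raise AccessDenied(f"role '{principal.role}' may not '{action}'")


def list_incidents(principal: Principal, store: list[dict]) -> list[dict]:
    """Return only the incidents that belong to the caller's organization."""
    authorize(principal, "read", principal.org_id)
    return [i for i in store if i["org_id"] == principal.org_id]
```

Checking the tenant boundary before the role means that even an Admin in one organization cannot reach another organization's data.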

Architecture Principles

  • Deterministic behavior
  • Explainability over automation
  • Separation of real-time and historical analysis
  • Strong tenant isolation
  • Predictable retention and deletion