FixFast Architecture & System Overview

FixFast provides alert and incident intelligence with a deterministic, explainable pipeline. This overview describes the system components, data flow, and controls for multi-tenancy and access.

Architecture Overview

FixFast consists of these major components:

Alert sources
Ingestion layer
Alert processing and grouping
Incident correlation
Incident storage
Explainable summaries
Incident Pattern Intelligence
API layer
User interfaces and integrations
Security and access control

flowchart LR
    A[Grafana Alertmanager] -->|Webhooks| B[FixFast Ingestion Layer]

    B --> C[Alert Processing & Grouping]
    C --> D[Incident Correlation Engine]

    D --> E[Incident Store]
    D --> F[Explainable Summaries]

    E --> G[Incident Pattern Intelligence]
    G --> H[Aggregated Metrics Store]

    F --> I[FixFast API]
    E --> I
    H --> I

    I --> J[FixFast Web UI]
    I --> K[Slack Integration]
    I --> L[Generic Webhooks]

    subgraph "Security & Access"
        M["Org Isolation (org_id)"]
        N["RBAC: Admin / Editor / Viewer"]
    end

    I --> M
    I --> N

Component Descriptions

1. Alert Sources

Primary supported source: Grafana Alertmanager
Alerts are delivered via webhooks and carry severity, service, environment, and additional labels
Purpose: Provide reliable alert signals into FixFast
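The sketch below shows how severity, service, and environment could be read from an Alertmanager-style webhook body. It relies on the standard top-level `alerts` array. It assumes the three fields arrive as alert labels named `severity`, `service`, and `environment`. Those label names, the `AlertSignal` type, and `parse_alertmanager_webhook` are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AlertSignal:
    """Hypothetical normalized view of one alert from a webhook body."""
    status: str
    severity: str
    service: str
    environment: str
    labels: dict = field(default_factory=dict)


def parse_alertmanager_webhook(body: str) -> list[AlertSignal]:
    """Extract per-alert signals from an Alertmanager-style JSON body.

    The label names "severity", "service", and "environment" are
    illustrative assumptions; missing labels fall back to "unknown".
    """
    payload = json.loads(body)
    signals = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        signals.append(
            AlertSignal(
                status=alert.get("status", "unknown"),
                severity=labels.get("severity", "unknown"),
                service=labels.get("service", "unknown"),
                environment=labels.get("environment", "unknown"),
                labels=labels,
            )
        )
    return signals
```

A single webhook can batch several alerts, which is why the function returns a list rather than one signal.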

2. Ingestion Layer

Receives alerts from external systems
Validates payload structure and applies retry handling
Associates alerts with the correct organization (org_id)
Purpose: Ensure secure, reliable alert intake
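The three intake steps can be sketched as follows, under several assumptions:

- Each webhook carries a per-organization token that maps to an org_id. The `TOKEN_TO_ORG` table is hypothetical.
- "Retry handling" means making redelivered payloads idempotent, using a content hash per org.
- The required alert keys are illustrative.

FixFast's actual validation rules and retry semantics may differ.

```python
import hashlib
import json
from typing import Optional

# Hypothetical mapping from per-org ingestion tokens to org_id values.
TOKEN_TO_ORG = {"tok-acme": "org_acme", "tok-globex": "org_globex"}

# Assumed minimum structure for each alert in the payload.
REQUIRED_ALERT_KEYS = ("status", "labels")


class IngestionError(ValueError):
    pass


class Ingestor:
    def __init__(self) -> None:
        # (org_id, payload digest) pairs already accepted; used to drop redeliveries.
        self._seen: set = set()

    def ingest(self, token: str, body: str) -> Optional[tuple[str, list[dict]]]:
        """Validate a webhook body and attach it to an organization.

        Returns (org_id, alerts) for a new payload, or None when the same
        payload was already accepted for that org (a retried delivery).
        """
        org_id = TOKEN_TO_ORG.get(token)
        if org_id is None:
            raise IngestionError("unknown ingestion token")

        try:
            payload = json.loads(body)
        except json.JSONDecodeError as exc:
            raise IngestionError("body is not valid JSON") from exc

        alerts = payload.get("alerts") if isinstance(payload, dict) else None
        if not isinstance(alerts, list):
            raise IngestionError("missing 'alerts' array")
        for alert in alerts:
            if not isinstance(alert, dict) or any(k not in alert for k in REQUIRED_ALERT_KEYS):
                raise IngestionError("alert missing required keys")

        # Canonical JSON so key order does not change the digest.
        digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
        if (org_id, digest) in self._seen:
            return None
        self._seen.add((org_id, digest))
        return org_id, alerts
```

The org_id comes from the credential rather than from the payload, so a sender cannot claim another tenant's identity.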

3. Alert Processing & Grouping

Normalizes incoming alerts
Applies deterministic grouping rules using fingerprints, time windows, and shared context
Purpose: Reduce alert noise and prepare alerts for correlation
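The grouping rule above is illustrated by the deterministic sketch below. It makes the following assumptions, none of which are FixFast's configured values:

- The fingerprint is a SHA-256 hash over sorted, normalized label pairs.
- The shared context is the pair (service, environment).
- The time window is fixed at 5 minutes.
- The function names are illustrative.

```python
import hashlib
from dataclasses import dataclass

WINDOW_SECONDS = 300  # assumed grouping window (5 minutes)


@dataclass(frozen=True)
class Alert:
    labels: tuple    # normalized, sorted (key, value) pairs
    starts_at: int   # Unix seconds


def normalize(labels: dict, starts_at: int) -> Alert:
    """Lowercase keys and trim values so equivalent alerts compare equal."""
    clean = {str(k).strip().lower(): str(v).strip() for k, v in labels.items()}
    return Alert(labels=tuple(sorted(clean.items())), starts_at=starts_at)


def fingerprint(alert: Alert) -> str:
    """Stable hash of the normalized label set (start time excluded)."""
    raw = "\n".join(f"{k}={v}" for k, v in alert.labels)
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def group_alerts(alerts: list[Alert]) -> list[list[Alert]]:
    """Group alerts that share (service, environment) and start within
    WINDOW_SECONDS of the group's first alert.

    Input is sorted first, so the same alerts always yield the same groups.
    """
    ordered = sorted(alerts, key=lambda a: (a.starts_at, fingerprint(a)))
    groups: list[list[Alert]] = []
    open_groups: dict = {}  # context -> most recent group for that context
    for alert in ordered:
        labels = dict(alert.labels)
        context = (labels.get("service"), labels.get("environment"))
        current = open_groups.get(context)
        if current is not None and alert.starts_at - current[0].starts_at <= WINDOW_SECONDS:
            # Same context inside the window: attach unless it repeats a fingerprint.
            if fingerprint(alert) not in {fingerprint(a) for a in current}:
                current.append(alert)
        else:
            current = [alert]
            open_groups[context] = current
            groups.append(current)
    return groups
```

Sorting by start time before grouping makes the result independent of arrival order, which is one way to meet the "deterministic" requirement.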

4. Incident Correlation Engine

Groups related alerts into incidents
Identifies primary and supporting signals
Records grouping rationale for auditability
Purpose: Create explainable, trustworthy incidents
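The sketch below shows one way an incident could record its primary signal and grouping rationale. It assumes a severity ranking of critical > warning > info, with ties broken by earliest start time. The `Signal` and `Incident` types are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed severity ordering; a higher rank means more severe.
SEVERITY_RANK = {"critical": 3, "warning": 2, "info": 1}


@dataclass
class Signal:
    name: str
    severity: str
    starts_at: int  # Unix seconds
    service: str


@dataclass
class Incident:
    primary: Signal
    supporting: list
    rationale: list = field(default_factory=list)


def build_incident(group: list[Signal]) -> Incident:
    """Pick the most severe (then earliest) signal as primary and record
    a rationale line for every signal, so the grouping can be audited."""
    if not group:
        raise ValueError("cannot build an incident from an empty group")
    ordered = sorted(
        group,
        key=lambda s: (-SEVERITY_RANK.get(s.severity, 0), s.starts_at, s.name),
    )
    primary, supporting = ordered[0], ordered[1:]
    rationale = [
        f"primary: {primary.name} selected by highest severity, then earliest start"
    ]
    for s in supporting:
        delta = s.starts_at - primary.starts_at
        rationale.append(
            f"{s.name}: service '{s.service}', started {delta:+d}s relative to primary"
        )
    return Incident(primary=primary, supporting=supporting, rationale=rationale)
```

Storing the rationale as data alongside the incident, rather than recomputing it later, keeps the audit trail stable even if grouping rules change.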

5. Incident Store

Stores active and resolved incidents
Maintains alert-to-incident relationships
Enforces retention and deletion policies
Purpose: Provide a reliable source of incident data
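Retention enforcement is shown below as an in-memory sketch. It assumes a per-organization retention period in days and that only resolved incidents are eligible for deletion. The `IncidentStore` class and its policy shape are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StoredIncident:
    incident_id: str
    org_id: str
    opened_at: datetime
    resolved_at: Optional[datetime] = None
    alert_ids: list = field(default_factory=list)  # alert-to-incident links


class IncidentStore:
    def __init__(self, retention_days: dict[str, int]) -> None:
        self._retention = retention_days  # org_id -> days to keep resolved incidents
        self._items: dict = {}

    def put(self, incident: StoredIncident) -> None:
        self._items[incident.incident_id] = incident

    def purge_expired(self, now: datetime) -> list[str]:
        """Delete resolved incidents older than their org's retention period.

        Active (unresolved) incidents are never purged. Returns deleted IDs.
        """
        expired = [
            inc.incident_id
            for inc in self._items.values()
            if inc.resolved_at is not None
            and now - inc.resolved_at > timedelta(days=self._retention[inc.org_id])
        ]
        for incident_id in expired:
            del self._items[incident_id]
        return sorted(expired)

    def ids(self) -> list[str]:
        return sorted(self._items)
```

Measuring retention from the resolve time rather than the open time means long-running incidents are not deleted while they still matter.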

6. Explainable Summaries

Generates structured summaries per incident
Includes what happened, why alerts were grouped, probable causes, and recommended actions
Purpose: Help teams understand incidents quickly and clearly
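The four parts of a summary map naturally onto a structured record. This sketch shows only the shape of a summary and a plain-text rendering. The field names are assumptions, and it does not show how probable causes or recommended actions are derived.

```python
from dataclasses import dataclass


@dataclass
class IncidentSummary:
    what_happened: str
    why_grouped: list          # grouping rationale recorded during correlation
    probable_causes: list
    recommended_actions: list

    def render(self) -> str:
        """Render the summary as plain text with one section per part."""
        def section(title: str, items: list) -> str:
            return title + ":\n" + "\n".join(f"- {item}" for item in items)

        return "\n\n".join([
            "What happened:\n" + self.what_happened,
            section("Why alerts were grouped", self.why_grouped),
            section("Probable causes", self.probable_causes),
            section("Recommended actions", self.recommended_actions),
        ])
```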

7. Incident Pattern Intelligence

Operates on aggregated incident data only
Produces incident volume trends, exposure analysis, alert noise trends, and recovery speed (MTTR)
Does not retain raw alerts beyond the configured retention period
Purpose: Enable long-term learning and prevention
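Recovery speed and volume trends can be computed from per-incident aggregates without touching raw alerts. The sketch makes these assumptions:

- Each aggregate record holds only a service name, an open time, and an optional resolve time. This record shape is hypothetical.
- MTTR is the mean of (resolve time minus open time) over resolved incidents.
- Volume is bucketed by ISO week.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class IncidentAggregate:
    service: str
    opened_at: datetime
    resolved_at: Optional[datetime]


def mttr(records: list[IncidentAggregate]) -> Optional[timedelta]:
    """Mean time to recovery over resolved incidents; None if none resolved."""
    durations = [r.resolved_at - r.opened_at for r in records if r.resolved_at is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


def weekly_volume(records: list[IncidentAggregate]) -> dict:
    """Incident count per (ISO year, ISO week) of the open time."""
    counts = Counter(tuple(r.opened_at.isocalendar()[:2]) for r in records)
    return dict(sorted(counts.items()))
```

Because both functions need only timestamps and a service name, they can run after the raw alerts have been deleted under retention.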

8. API Layer

Provides programmatic access to FixFast data
Secured via authentication; all requests are scoped by org_id
Authorization enforced by RBAC
Purpose: Enable UI, integrations, and automation
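The request path implied above can be sketched in three steps. First, authenticate. Second, derive org_id and role from the credential rather than from request parameters. Third, run reads that are always filtered by that org_id. The API key table, the sample data, and the helper names are hypothetical. The role is carried in the context for a separate RBAC check.

```python
from dataclasses import dataclass

# Hypothetical credential table: API key -> (org_id, role).
API_KEYS = {"key-a": ("org_a", "viewer"), "key-b": ("org_b", "admin")}

# Hypothetical sample data spanning two tenants.
INCIDENTS = [
    {"id": "i1", "org_id": "org_a", "title": "Checkout latency"},
    {"id": "i2", "org_id": "org_b", "title": "Billing errors"},
]


class Unauthorized(Exception):
    pass


@dataclass(frozen=True)
class RequestContext:
    org_id: str
    role: str  # consumed by the RBAC check, not used for scoping


def authenticate(api_key: str) -> RequestContext:
    """Resolve the caller's org and role from the credential itself."""
    try:
        org_id, role = API_KEYS[api_key]
    except KeyError:
        raise Unauthorized("invalid API key") from None
    return RequestContext(org_id=org_id, role=role)


def list_incidents(ctx: RequestContext) -> list[dict]:
    """Every read is filtered by the caller's org_id; no parameter lets a
    caller name a different org."""
    return [inc for inc in INCIDENTS if inc["org_id"] == ctx.org_id]
```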

9. User Interfaces & Integrations

Web UI for engineers and operators
Slack integration for notifications
Generic webhooks for external systems
Purpose: Deliver insights where teams work
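The sketch below shows how an incident might be turned into outbound notification bodies. Slack incoming webhooks accept a JSON object with a `text` field. The incident fields (including `url`) and the generic event envelope (`event`, `org_id`, `incident`) are hypothetical, not a documented FixFast schema.

```python
import json
import urllib.request


def slack_message(incident: dict) -> dict:
    """Body for a Slack incoming webhook, which accepts a JSON "text" field."""
    return {
        "text": (
            f"[{incident['severity'].upper()}] {incident['title']} "
            f"({incident['service']}/{incident['environment']}) - {incident['url']}"
        )
    }


def generic_webhook_event(incident: dict, event: str) -> dict:
    """Hypothetical event envelope for generic webhook consumers."""
    return {"event": event, "org_id": incident["org_id"], "incident": incident}


def post_json(url: str, body: dict, timeout: float = 5.0) -> int:
    """POST a JSON body and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Keeping payload construction separate from delivery lets the formatting be tested without network access.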

10. Security & Access Control

Multi-tenancy: Each organization is fully isolated; data is scoped by org_id; no cross-tenant access.
Role-Based Access Control (RBAC): Admin, Editor, Viewer; each request is validated against role permissions.
Purpose: Ensure secure, controlled access.
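The three roles can be expressed as a permission matrix checked on every request, with tenant isolation checked first. The source names the roles but not their permissions, so the action names below are assumptions for illustration. So is the choice that each role inherits the permissions of the role below it.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    EDITOR = "editor"
    ADMIN = "admin"


# Assumed permissions: each role includes everything the role below it can do.
_VIEWER = {"incident:read", "summary:read", "metrics:read"}
_EDITOR = _VIEWER | {"incident:update", "incident:resolve"}
_ADMIN = _EDITOR | {"integration:manage", "member:manage", "retention:configure"}

PERMISSIONS = {Role.VIEWER: _VIEWER, Role.EDITOR: _EDITOR, Role.ADMIN: _ADMIN}


class Forbidden(Exception):
    pass


def authorize(role: Role, action: str, caller_org: str, resource_org: str) -> None:
    """Raise Forbidden unless the role allows the action within the caller's own org."""
    if caller_org != resource_org:
        # Tenant isolation applies to every role, including Admin.
        raise Forbidden("cross-tenant access denied")
    if action not in PERMISSIONS[role]:
        raise Forbidden(f"{role.value} may not {action}")
```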

Architecture Principles

Deterministic behavior
Explainability over automation
Separation of real-time and historical analysis
Strong tenant isolation
Predictable retention and deletion
