Incident Management

The Elementary Cloud Platform's Incidents page is a centralized hub for managing alerts and collaborating on issue resolution. Each incident consolidates related failures, which makes tracking easier and helps prevent alert fatigue. An incident is resolved automatically once the failing tests, monitors, or models pass again. Users can filter incidents by status, severity, assignee, or time frame to prioritize and manage them efficiently.
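
The auto-resolution and filtering behavior described above can be pictured with a minimal Python sketch. This is not Elementary's API or data model: the `Incident` class, its field names, the `"Resolved"` status label, and the `filter_incidents` helper are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass
class Incident:
    """Hypothetical incident record; names are illustrative, not Elementary's schema."""
    id: int
    severity: str                      # e.g. "Critical", "High"
    opened_at: datetime
    failing: Set[str] = field(default_factory=set)  # tests, monitors, or models still failing
    status: str = "Open"               # assumed labels: "Open", "Acknowledged", "Resolved"
    assignee: Optional[str] = None

    def record_success(self, asset: str) -> None:
        """Mark one asset as passing; resolve the incident once nothing is failing."""
        self.failing.discard(asset)
        if not self.failing:
            self.status = "Resolved"


def filter_incidents(
    incidents: Iterable[Incident],
    status: Optional[str] = None,
    severity: Optional[str] = None,
    assignee: Optional[str] = None,
    since: Optional[datetime] = None,
) -> List[Incident]:
    """Return the incidents that match every filter that was supplied."""
    return [
        inc for inc in incidents
        if (status is None or inc.status == status)
        and (severity is None or inc.severity == severity)
        and (assignee is None or inc.assignee == assignee)
        and (since is None or inc.opened_at >= since)
    ]
```

In this sketch an incident stays open while any of its assets is still failing, and omitted filters are simply ignored, so calling `filter_incidents(incidents)` with no filters returns everything.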

Key Features:

  • Grouping and Notifications: Incidents are automatically grouped by error type, and lineage-based grouping is planned to further streamline management. A notification is sent when an incident is resolved.
  • Assignment and Status Management: Users can assign incidents to team members and change an incident's status, for example to "Acknowledged" or "Open." Once resolved, an incident is marked as closed to prevent further modifications.
  • Severity Control and Bulk Actions: Incident severity (e.g., Critical, High) can be customized, and bulk actions allow quick updates, such as changing the status or assignee of multiple incidents at once.
  • Integrations: Elementary integrates with tools like PagerDuty and Opsgenie for on-call management. Upcoming integrations with Jira, Linear, and ServiceNow will enable direct ticket creation from incidents.
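
As a rough illustration of the grouping and bulk-action features listed above, the following Python sketch groups failures by error type and applies one update to several incidents at once. It is a simplified model under stated assumptions, not Elementary's implementation: `Failure`, `GroupedIncident`, `group_by_error_type`, `bulk_update`, and the `"Resolved"` status label are hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Failure:
    """Hypothetical failure event for a test, monitor, or model."""
    asset: str
    error_type: str


@dataclass
class GroupedIncident:
    """Hypothetical incident that bundles failures sharing one error type."""
    error_type: str
    failures: List[Failure]
    status: str = "Open"          # assumed labels: "Open", "Acknowledged", "Resolved"
    severity: str = "High"
    assignee: Optional[str] = None


def group_by_error_type(failures: Iterable[Failure]) -> List[GroupedIncident]:
    """Create one incident per distinct error type, in first-seen order."""
    grouped: Dict[str, List[Failure]] = defaultdict(list)
    for failure in failures:
        grouped[failure.error_type].append(failure)
    return [GroupedIncident(error_type=k, failures=v) for k, v in grouped.items()]


def bulk_update(
    incidents: Iterable[GroupedIncident],
    *,
    status: Optional[str] = None,
    severity: Optional[str] = None,
    assignee: Optional[str] = None,
) -> int:
    """Apply the given fields to every unresolved incident; return how many changed."""
    updated = 0
    for inc in incidents:
        if inc.status == "Resolved":
            continue  # assumption: resolved incidents are closed to modification
        if status is not None:
            inc.status = status
        if severity is not None:
            inc.severity = severity
        if assignee is not None:
            inc.assignee = assignee
        updated += 1
    return updated
```

Skipping resolved incidents in `bulk_update` mirrors the statement that a resolved incident is closed to further modifications; how the platform actually enforces this is not specified here.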

These features provide comprehensive oversight, helping teams manage recurring failures, assign ownership, and resolve data issues proactively. Full details are available in the official Elementary documentation pages on Incidents and Incident Management.