The Alert Fatigue Problem
DDoS detection systems that send every alert at the same priority level create a dangerous situation: on-call engineers start ignoring alerts. When a 500 PPS blip and a 500,000 PPS sustained attack both trigger the same page, the team learns to dismiss notifications. Eventually, a real attack escalates unchecked because the alert looked identical to the dozen false alarms that came before it.
The solution is severity-based escalation. Flowtriq classifies every detected event into one of four severity levels, and PagerDuty's escalation policies route each level differently. A low-severity anomaly creates an informational ticket. A critical attack pages the on-call engineer and auto-escalates to the team lead if the page is not acknowledged within 5 minutes.
Flowtriq's Severity Levels
Flowtriq assigns severity based on three factors: the magnitude of the anomaly relative to baseline, the duration of the event, and the attack classification confidence score. The four levels are:
- Info (P4): A brief anomaly that exceeded the fast baseline but returned to normal within 10 seconds. Likely a legitimate traffic spike. No page is sent; the event is logged in the Flowtriq dashboard and optionally sent to a low-priority PagerDuty service.
- Warning (P3): A sustained anomaly lasting more than 10 seconds or exceeding both fast and slow baselines. This could be a small attack or an unusual but legitimate traffic pattern. Sent to PagerDuty as a low-urgency incident that notifies via push notification but does not page.
- High (P2): A confirmed attack that exceeds baselines by 5x or more, or an event that matches a known attack IOC pattern. Sent to PagerDuty as a high-urgency incident that pages the primary on-call engineer.
- Critical (P1): An attack exceeding baselines by 10x or more, multiple simultaneous attack vectors detected, or traffic approaching the interface line rate. Sent as a critical PagerDuty incident with aggressive escalation and auto-notification to the incident commander.
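The four rules above amount to a small classification function. The sketch below is illustrative only, not Flowtriq's actual implementation; the 10-second, 5x, and 10x thresholds come from the list above, while the function name, parameters, and remaining cutoffs are assumptions:

```python
def classify_severity(ratio_over_baseline: float,
                      duration_s: float,
                      vectors: int = 1,
                      ioc_match: bool = False) -> str:
    """Illustrative severity classifier following the
    Info/Warning/High/Critical rules described above."""
    if ratio_over_baseline >= 10 or vectors > 1:
        return "critical"  # P1: 10x baseline or multiple attack vectors
    if ratio_over_baseline >= 5 or ioc_match:
        return "high"      # P2: 5x baseline or known IOC pattern
    if duration_s > 10:
        return "warning"   # P3: sustained anomaly
    return "info"          # P4: brief blip, logged only
```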
Setting Up the Integration
Connecting Flowtriq to PagerDuty takes about 5 minutes. Here is the step-by-step process:
Step 1: Create PagerDuty Services
We recommend creating two PagerDuty services for Flowtriq alerts: one for low-urgency events (Info and Warning) and one for high-urgency events (High and Critical). This separation allows you to assign different escalation policies to each service.
In PagerDuty, navigate to Services > New Service and create:
- Flowtriq - DDoS Monitoring (low urgency, email/push notification only)
- Flowtriq - DDoS Critical (high urgency, phone call + SMS escalation)
For each service, select "Events API v2" as the integration type and copy the resulting integration key.
Step 2: Configure Flowtriq Alert Channels
In the Flowtriq dashboard, navigate to Settings > Alert Channels and add a new PagerDuty channel for each service. Paste the corresponding integration key and assign the severity mapping:
{
  "channel": "pagerduty",
  "name": "PagerDuty - Critical",
  "integration_key": "your-integration-key-here",
  "severity_filter": ["high", "critical"],
  "dedup_key_template": "flowtriq-{{node_id}}-{{attack_id}}"
}

{
  "channel": "pagerduty",
  "name": "PagerDuty - Monitoring",
  "integration_key": "your-other-key-here",
  "severity_filter": ["info", "warning"],
  "dedup_key_template": "flowtriq-{{node_id}}-{{attack_id}}"
}
The dedup_key_template is important. It ensures that multiple alerts from the same attack on the same node are grouped into a single PagerDuty incident rather than creating a flood of duplicate pages. When the attack ends, Flowtriq sends a resolve event with the same dedup key, automatically closing the incident in PagerDuty.
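To see how deduplication works on the wire, here is a sketch of the trigger/resolve pair sent to PagerDuty's Events API v2. The endpoint URL and payload fields (`routing_key`, `event_action`, `dedup_key`, `payload.severity`) are PagerDuty's documented schema; the helper function, node and attack IDs, and summary text are illustrative assumptions:

```python
# PagerDuty Events API v2 endpoint (real); payload values below are examples.
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

def build_event(action, routing_key, node_id, attack_id,
                summary, severity="critical"):
    # Using the same dedup_key for "trigger" and "resolve" groups all
    # alerts from one attack into a single PagerDuty incident and
    # auto-closes it when the attack ends.
    return {
        "routing_key": routing_key,
        "event_action": action,  # "trigger" or "resolve"
        "dedup_key": f"flowtriq-{node_id}-{attack_id}",
        "payload": {
            "summary": summary,
            "source": node_id,
            # Events API v2 accepts: critical, error, warning, info
            "severity": severity,
        },
    }

trigger = build_event("trigger", "your-integration-key-here",
                      "edge-01", "a1b2", "UDP flood: 12x baseline")
resolve = build_event("resolve", "your-integration-key-here",
                      "edge-01", "a1b2", "Attack ended")
```

Because `trigger["dedup_key"]` and `resolve["dedup_key"]` are identical, PagerDuty treats the second event as closing the incident opened by the first.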
Step 3: Configure Escalation Policies
In PagerDuty, create an escalation policy for the critical service that matches your team's incident response process. A typical DDoS escalation policy looks like this:
- Level 1 (0 minutes): Primary on-call network engineer. Notified via phone call and SMS.
- Level 2 (5 minutes): If unacknowledged, escalate to the secondary on-call and the team lead. Notified via phone call.
- Level 3 (15 minutes): If still unacknowledged, escalate to the VP of Engineering and the incident commander. Notified via phone call and email to the engineering-incidents mailing list.
For the monitoring service, a single-level escalation policy with push-notification-only is usually sufficient. These events are informational and should be reviewed during business hours, not at 3 AM.
Fine-Tuning Severity Thresholds
Flowtriq's default severity thresholds work well for most deployments, but you can customize them per node or per node group. Common adjustments include:
- Lowering the Critical threshold for production databases. A database server that should never see significant traffic spikes might warrant a Critical alert at 3x baseline instead of the default 10x.
- Raising the Warning threshold for edge proxies. Servers that handle bursty client traffic can use a higher Warning threshold (e.g., 5x instead of the default 2x) to reduce noise.
- Disabling Info events for development environments. Non-production nodes often have erratic traffic patterns. Suppressing Info-level events for these nodes keeps the monitoring service clean.
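Following the channel-config format shown earlier, a per-node-group threshold override might look like the sketch below. The field names here are illustrative assumptions; consult the Detection Policies documentation for the exact schema:

{
  "node_group": "prod-databases",
  "severity_overrides": {
    "critical_multiplier": 3,
    "warning_multiplier": 2
  }
}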
All threshold adjustments are made through the Flowtriq dashboard under Settings > Detection Policies or via the API for programmatic configuration management.
What a Well-Configured Setup Looks Like
After deploying Flowtriq with PagerDuty integration across your infrastructure, the expected behavior is:
- Daily: A handful of Info events logged in the dashboard from normal traffic variance. No pages, no noise.
- Weekly: Occasional Warning events from traffic spikes (marketing campaigns, deployments, etc.). The on-call sees a push notification, confirms it is expected, and acknowledges the incident.
- Monthly: One or two High events from small-scale attacks or scanning activity. The on-call is paged, reviews the Flowtriq dashboard and PCAP, and takes appropriate action.
- Rarely: Critical events from significant attacks. The entire escalation chain is activated, and the team responds with the urgency the situation demands.
Result: Teams using Flowtriq's severity-based PagerDuty integration report an 80% reduction in after-hours pages while maintaining a 100% detection rate for attacks exceeding 2x baseline.
PagerDuty integration is available on all Flowtriq plans starting at $9.99/mo per node. Slack, Discord, and email channels are also included. Start your free trial to set up severity-based alerting for your infrastructure.