The Alert Fatigue Problem
DDoS detection systems that send every alert at the same priority level create a dangerous situation: on-call engineers start ignoring alerts. When a 500 PPS blip and a 500,000 PPS sustained attack both trigger the same page, the team learns to dismiss notifications. Eventually, a real attack escalates unchecked because the alert looked identical to the dozen false alarms that came before it.
The solution is severity-based escalation. Flowtriq classifies every detected event into one of four severity levels, and PagerDuty's escalation policies route each level differently. A low-severity anomaly creates an informational ticket. A critical attack pages the on-call engineer and auto-escalates to the team lead if the page is not acknowledged within 5 minutes.
Flowtriq's Severity Levels
Flowtriq assigns severity based on three factors: the magnitude of the anomaly relative to baseline, the duration of the event, and the attack classification confidence score. The four levels are:
- Info (P4): A brief anomaly that exceeded the fast baseline but returned to normal within 10 seconds. Likely a legitimate traffic spike. No page is sent; the event is logged in the Flowtriq dashboard and optionally sent to a low-priority PagerDuty service.
- Warning (P3): A sustained anomaly lasting more than 10 seconds or exceeding both fast and slow baselines. This could be a small attack or an unusual but legitimate traffic pattern. Sent to PagerDuty as a low-urgency incident that notifies via push notification but does not page.
- High (P2): A confirmed attack that exceeds baselines by 5x or more, or an event that matches a known attack IOC pattern. Sent to PagerDuty as a high-urgency incident that pages the primary on-call engineer.
- Critical (P1): An attack exceeding baselines by 10x or more, multiple simultaneous attack vectors detected, or traffic approaching the interface line rate. Sent as a critical PagerDuty incident with aggressive escalation and auto-notification to the incident commander.
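The four rules above amount to a small classification function. The sketch below is illustrative only, not Flowtriq's actual implementation; the 10-second, 5x, and 10x thresholds come from the list above, while the function name, parameters, and remaining cutoffs are assumptions:

```python
def classify_severity(ratio_over_baseline: float,
                      duration_s: float,
                      vectors: int = 1,
                      ioc_match: bool = False) -> str:
    """Illustrative severity classifier following the
    Info/Warning/High/Critical rules described above."""
    if ratio_over_baseline >= 10 or vectors > 1:
        return "critical"  # P1: 10x baseline or multiple attack vectors
    if ratio_over_baseline >= 5 or ioc_match:
        return "high"      # P2: 5x baseline or known IOC pattern
    if duration_s > 10:
        return "warning"   # P3: sustained anomaly
    return "info"          # P4: brief blip, logged only
```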
Setting Up the Integration
Connecting Flowtriq to PagerDuty takes about 5 minutes. Here is the step-by-step process:
Step 1: Create PagerDuty Services
We recommend creating two PagerDuty services for Flowtriq alerts: one for low-urgency events (Info and Warning) and one for high-urgency events (High and Critical). This separation allows you to assign different escalation policies to each service.
In PagerDuty, navigate to Services > New Service and create:
- Flowtriq - DDoS Monitoring (low urgency, email/push notification only)
- Flowtriq - DDoS Critical (high urgency, phone call + SMS escalation)
For each service, select "Events API v2" as the integration type and copy the resulting integration key.
Step 2: Configure Flowtriq Alert Channels
In the Flowtriq dashboard, navigate to Settings > Alert Channels and add a new PagerDuty channel for each service. Paste the corresponding integration key and assign the severity mapping:
{
  "channel": "pagerduty",
  "name": "PagerDuty - Critical",
  "integration_key": "your-integration-key-here",
  "severity_filter": ["high", "critical"],
  "dedup_key_template": "flowtriq-{{node_id}}-{{attack_id}}"
}

{
  "channel": "pagerduty",
  "name": "PagerDuty - Monitoring",
  "integration_key": "your-other-key-here",
  "severity_filter": ["info", "warning"],
  "dedup_key_template": "flowtriq-{{node_id}}-{{attack_id}}"
}
The dedup_key_template is important. It ensures that multiple alerts from the same attack on the same node are grouped into a single PagerDuty incident rather than creating a flood of duplicate pages. When the attack ends, Flowtriq sends a resolve event with the same dedup key, automatically closing the incident in PagerDuty.
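To see how deduplication works on the wire, here is a sketch of the trigger/resolve pair sent to PagerDuty's Events API v2. The endpoint URL and payload fields (`routing_key`, `event_action`, `dedup_key`, `payload.severity`) are PagerDuty's documented schema; the helper function, node and attack IDs, and summary text are illustrative assumptions:

```python
# PagerDuty Events API v2 endpoint (real); payload values below are examples.
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

def build_event(action, routing_key, node_id, attack_id,
                summary, severity="critical"):
    # Using the same dedup_key for "trigger" and "resolve" groups all
    # alerts from one attack into a single PagerDuty incident and
    # auto-closes it when the attack ends.
    return {
        "routing_key": routing_key,
        "event_action": action,  # "trigger" or "resolve"
        "dedup_key": f"flowtriq-{node_id}-{attack_id}",
        "payload": {
            "summary": summary,
            "source": node_id,
            # Events API v2 accepts: critical, error, warning, info
            "severity": severity,
        },
    }

trigger = build_event("trigger", "your-integration-key-here",
                      "edge-01", "a1b2", "UDP flood: 12x baseline")
resolve = build_event("resolve", "your-integration-key-here",
                      "edge-01", "a1b2", "Attack ended")
```

Because `trigger["dedup_key"]` and `resolve["dedup_key"]` are identical, PagerDuty treats the second event as closing the incident opened by the first.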
Step 3: Configure Escalation Policies
In PagerDuty, create an escalation policy for the critical service that matches your team's incident response process. A typical DDoS escalation policy looks like this:
- Level 1 (0 minutes): Primary on-call network engineer. Notified via phone call and SMS.
- Level 2 (5 minutes): If unacknowledged, escalate to the secondary on-call and the team lead. Notified via phone call.
- Level 3 (15 minutes): If still unacknowledged, escalate to the VP of Engineering and the incident commander. Notified via phone call and email to the engineering-incidents mailing list.
For the monitoring service, a single-level escalation policy with push-notification-only is usually sufficient. These events are informational and should be reviewed during business hours, not at 3 AM.
Fine-Tuning Severity Thresholds
Flowtriq's default severity thresholds work well for most deployments, but you can customize them per node or per node group. Common adjustments include:
- Lowering the Critical threshold for production databases. A database server that should never see significant traffic spikes might warrant a Critical alert at 3x baseline instead of the default 10x.
- Raising the Warning threshold for edge proxies. Servers that handle bursty client traffic can use a higher Warning threshold (e.g., 5x instead of the default 2x) to reduce noise.
- Disabling Info events for development environments. Non-production nodes often have erratic traffic patterns. Suppressing Info-level events for these nodes keeps the monitoring service clean.
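Following the channel-config format shown earlier, a per-node-group threshold override might look like the sketch below. The field names here are illustrative assumptions; consult the Detection Policies documentation for the exact schema:

{
  "node_group": "prod-databases",
  "severity_overrides": {
    "critical_multiplier": 3,
    "warning_multiplier": 2
  }
}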
All threshold adjustments are made through the Flowtriq dashboard under Settings > Detection Policies or via the API for programmatic configuration management.
What a Well-Configured Setup Looks Like
After deploying Flowtriq with PagerDuty integration across your infrastructure, the expected behavior is:
- Daily: A handful of Info events logged in the dashboard from normal traffic variance. No pages, no noise.
- Weekly: Occasional Warning events from traffic spikes (marketing campaigns, deployments, etc.). The on-call sees a push notification, confirms it is expected, and acknowledges the incident.
- Monthly: One or two High events from small-scale attacks or scanning activity. The on-call is paged, reviews the Flowtriq dashboard and PCAP, and takes appropriate action.
- Rarely: Critical events from significant attacks. The entire escalation chain is activated, and the team responds with the urgency the situation demands.
Result: Teams using Flowtriq's severity-based PagerDuty integration report an 80% reduction in after-hours pages while maintaining a 100% detection rate for attacks exceeding 2x baseline.
PagerDuty integration is available on all Flowtriq plans starting at $9.99/mo per node. Slack, Discord, and email channels are also included. Start your free trial to set up severity-based alerting for your infrastructure.