Incident Summary

In April 2024, OVHcloud published a detailed disclosure of DDoS attacks they had been observing and mitigating across their global infrastructure since the beginning of the year. The numbers were unprecedented — not in bandwidth, but in packet rate. Here is what we know.

| Parameter | Detail |
| --- | --- |
| Time period | January to April 2024 |
| Peak packet rate | 840 Mpps (million packets per second) |
| Peak bandwidth | ~500 Gbps |
| Primary attack vector | TCP ACK floods, small-packet UDP floods |
| Attack source | ~5,000 compromised MikroTik routers (active per attack) |
| Exploitable MikroTik pool | ~99,382 devices identified as potentially vulnerable |
| Previous PPS record | ~809 Mpps (reported by Akamai in 2020) |
| Target | OVHcloud customer infrastructure (bare-metal, VPS, cloud) |

The 840 Mpps figure is the critical number here. At roughly 500 Gbps of bandwidth, this attack would not have made the top 10 largest DDoS events by volume. But 840 million packets per second is a fundamentally different kind of threat — one that targets the processing plane, not the transport plane.

Why PPS Matters More Than Bandwidth

The DDoS mitigation industry has spent a decade competing on bandwidth capacity. Scrubbing centers advertise 10, 15, 20+ Tbps of mitigation throughput. Hosting providers list "DDoS protection up to X Gbps" on their marketing pages. But bandwidth is only one axis of a DDoS attack. The other axis — packets per second — is what actually kills servers, and it is chronically undermonitored.

Consider the math. An 840 Mpps attack at ~500 Gbps means each packet averaged roughly 74 bytes on the wire. Strip away the per-frame line overhead (preamble + SFD + inter-frame gap = 20 bytes) and each Ethernet frame averaged around 54 bytes: the minimum for a TCP ACK with no options and no payload (14-byte Ethernet header + 20-byte IP header + 20-byte TCP header). These are not large packets. They do not fill pipes. They fill processing queues.
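The averages can be sanity-checked in a few lines:

```python
# Average packet size implied by the disclosed figures
bandwidth_bps = 500e9   # ~500 Gbps peak bandwidth
packet_rate = 840e6     # 840 Mpps peak packet rate

avg_bytes_on_wire = bandwidth_bps / packet_rate / 8
avg_frame_bytes = avg_bytes_on_wire - 20   # minus preamble + SFD + IFG

print(f"Average on-wire size: {avg_bytes_on_wire:.1f} bytes")  # ~74.4
print(f"Average frame size:   {avg_frame_bytes:.1f} bytes")    # ~54.4
```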

Every packet that arrives at a server triggers a chain of operations:

  1. The NIC raises a hardware interrupt (or the NAPI polling loop picks up the packet from the ring buffer)
  2. The kernel's softirq handler runs the network stack: header parsing, checksum validation, netfilter/nftables rule evaluation
  3. If conntrack is active, the kernel looks up (or creates) a connection tracking entry
  4. The packet is matched against socket structures and either delivered to a socket or dropped

Each of these steps costs CPU cycles. The total cost per packet on a modern Linux server with iptables/nftables and conntrack enabled is approximately 1,500 to 5,000 nanoseconds, depending on ruleset complexity. At 840 Mpps:

# Per-packet processing cost calculation
#
# Assume 2,500 ns per packet (moderate iptables ruleset)
# 840,000,000 packets/sec x 2,500 ns = 2,100,000,000,000 ns/sec
# = 2,100 seconds of CPU time per second
#
# A 16-core server delivers one second of CPU time per core per
# second (clock speed changes cycles per second, not CPU-seconds):
# 16 cores x 1,000,000,000 ns/sec = 16,000,000,000 ns/sec of capacity
# = 16 seconds of CPU time per second
#
# Shortfall: 2,100 / 16 = ~131x more CPU required than available
#
# The bandwidth pipe (500 Gbps) might not even be full,
# but the server needs 131x its total CPU capacity to process every packet.

This is why a "small" 500 Gbps attack at 840 Mpps is far more destructive to end servers than a 2 Tbps attack using 1,400-byte packets. The 2 Tbps attack delivers ~178 Mpps. The "smaller" attack delivers 4.7x more packets. Each packet demands the same CPU processing regardless of its payload size.
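A quick check of that comparison:

```python
# Packet rate of a bandwidth-focused attack vs. the OVHcloud flood
large_pkt_pps = 2e12 / (1400 * 8)   # 2 Tbps of 1,400-byte packets
ack_flood_pps = 840e6               # 840 Mpps at ~500 Gbps

print(f"2 Tbps attack: {large_pkt_pps / 1e6:.1f} Mpps")   # ~178.6 Mpps
print(f"Packet ratio:  {ack_flood_pps / large_pkt_pps:.1f}x")  # ~4.7x
```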

Bandwidth-Focused vs. PPS-Focused Attacks

| Characteristic | Bandwidth-Focused | PPS-Focused |
| --- | --- | --- |
| Packet size | 1,200 - 1,500 bytes | 64 - 128 bytes |
| Typical vectors | DNS amplification, NTP monlist, memcached | TCP ACK floods, small UDP, SYN floods |
| Primary bottleneck | Link capacity / upstream pipe | CPU, NIC ring buffers, interrupt handling |
| Detection via bandwidth monitoring | Easy: large spike in Gbps | Often missed: bandwidth may look normal |
| Detection via PPS monitoring | Moderate spike | Massive spike: primary indicator |
| Conntrack impact | Moderate | Severe: rapid table exhaustion |
| Mitigation approach | Upstream scrubbing, blackholing | XDP/eBPF, NIC offload, kernel bypass |

If your monitoring only tracks bandwidth (Gbps), you are blind to the most dangerous class of DDoS attacks. Spread across many targets and links, a packet-rate flood may add only a modest number of Gbps to any single monitored link and never trip a bandwidth threshold, yet the per-packet processing load will bring a server to its knees in seconds.

The MikroTik Router Problem

What made the OVHcloud attacks uniquely powerful was not a new technique — it was the attack infrastructure. The packet floods originated from compromised MikroTik routers, and MikroTik devices are exceptionally good at generating packets.

Why MikroTik Devices Are Perfect Weapons

MikroTik's RouterOS runs on custom hardware with dedicated packet-forwarding silicon. The higher-end models (CCR series) use multi-core Tilera or ARM processors specifically designed for per-packet processing. A single MikroTik CCR1036, for instance, can forward 24+ million packets per second at line rate. That is a mid-range commodity router generating more PPS than most servers can handle.

In the context of a botnet, this means:

  • 5,000 compromised MikroTik CCR devices, each generating a conservative 150,000 pps of attack traffic, collectively produce 750 Mpps. Higher-end models with the Bandwidth Test tool enabled can push significantly more.
  • MikroTik's built-in /tool/bandwidth-test can generate UDP floods natively — the attacker does not need to upload custom malware. They just need RouterOS credentials or a working exploit.
  • The ARM/MIPS SoCs in these routers are optimized for small-packet throughput. Unlike general-purpose x86 servers, they handle minimum-size frames without performance degradation.
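The botnet arithmetic from the first bullet:

```python
# Aggregate packet rate from the compromised-router pool
devices = 5_000            # MikroTik routers active per attack
pps_per_device = 150_000   # conservative per-device rate

total_pps = devices * pps_per_device
print(f"Aggregate: {total_pps / 1e6:.0f} Mpps")   # 750 Mpps
```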

The Vulnerability Chain

The primary entry point for MikroTik compromise has been CVE-2018-14847, a critical vulnerability in the Winbox management interface. This vulnerability allows unauthenticated remote attackers to read arbitrary files from the router — including the user database, which stores credentials in a reversible format. From there, full RouterOS access is trivial.

Despite being disclosed in 2018, the vulnerability remains unpatched on a staggering number of devices. OVHcloud's investigation identified approximately 99,382 MikroTik devices accessible on the public internet with potentially exploitable firmware versions. Additional RouterOS CVEs from 2019 through 2023 expanded the attack surface further.

The timeline of compromise looks like this:

2018

CVE-2018-14847 disclosed. MikroTik releases patches. Thousands of devices are already compromised before patches are applied. Many are never patched.

2019-2022

Compromised MikroTik routers are used for cryptomining (Glupteba), spam relay, and proxy networks. The devices are "owned" by multiple threat actors simultaneously. Additional CVEs (CVE-2019-3943, CVE-2019-3924) provide alternate entry points.

2023

Threat actors begin weaponizing the Bandwidth Test tool on compromised MikroTik devices for high-PPS DDoS. The technique spreads through underground forums. Botnet operators realize these routers outperform IoT botnets at packet generation by 10-100x per device.

Early 2024

OVHcloud begins observing sustained packet-rate attacks exceeding 100 Mpps. Multiple attacks escalate through the spring, culminating in the 840 Mpps record. Source analysis confirms MikroTik infrastructure as the primary origin.

How Small-Packet Floods Work

The minimum Ethernet frame size on the wire is 64 bytes (excluding preamble and inter-frame gap). For TCP, a minimum-size ACK with no payload breaks down as:

# Minimum TCP ACK packet structure (no payload)
#
# Ethernet header:     14 bytes (dst MAC + src MAC + EtherType)
# IP header:           20 bytes (no options)
# TCP header:          20 bytes (no options, ACK flag set)
# Padding:              6 bytes (pad payload to the 46-byte minimum)
# FCS (CRC):            4 bytes
# -----------------------------------------
# Total on wire:       64 bytes
#
# With Ethernet preamble + SFD (8) + inter-frame gap (12):
# Total line rate:     84 bytes per frame
#
# Maximum PPS on 100 Gbps link:
# 100,000,000,000 bits/sec / (84 bytes x 8 bits) = 148,809,523 pps
# = ~148.8 Mpps per 100G interface
#
# To reach 840 Mpps, the attack used traffic spread across
# multiple ingress points — consistent with a distributed botnet.

These minimum-size packets are devastating because they bypass the assumption that most monitoring tools make: that attack traffic will be visible as a bandwidth spike. At 64 bytes per frame, 840 Mpps amounts to roughly 430 Gbps of frame data, only about 54% of, say, 800 Gbps of aggregate ingress capacity. Many threshold-based alerts would not fire. But every one of those 840 million packets per second demands individual processing.
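The utilization math behind that blind spot, using an illustrative 800 Gbps of aggregate ingress capacity (an assumption for the example, not a figure from the disclosure):

```python
# Bandwidth consumed by 840 Mpps of minimum-size (64-byte) frames
pps = 840e6
frame_bytes = 64            # on-wire frame, excluding preamble/SFD/IFG
capacity_gbps = 800         # illustrative aggregate ingress capacity

gbps = pps * frame_bytes * 8 / 1e9
print(f"Frame data:  {gbps:.2f} Gbps")                      # 430.08 Gbps
print(f"Utilization: {gbps / capacity_gbps * 100:.2f}%")    # 53.76%
```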

The Kernel Processing Bottleneck

When packets arrive faster than the kernel can process them, the failure cascade is:

  1. NIC ring buffer overflow — the NIC's RX ring buffer fills up. New packets are dropped at the hardware level before the kernel ever sees them. These appear as rx_missed_errors or rx_no_buffer_count in ethtool -S.
  2. softirq CPU saturation — the NET_RX softirq consumes 100% of one or more CPU cores. The NAPI polling budget (default: 300 packets per poll cycle via net.core.netdev_budget) is exhausted every cycle, starving other softirqs.
  3. Interrupt coalescing failure — adaptive interrupt coalescing (common on Intel/Mellanox NICs) tries to batch interrupts, but at extreme PPS rates, the coalescing window shrinks to near-zero, effectively creating per-packet interrupts again.
  4. Conntrack table exhaustion — each new ACK that does not match an existing connection creates a new conntrack entry. At 840 Mpps with randomized source addresses, the conntrack table fills in milliseconds.
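The step-4 failure mode is easy to quantify. Assuming the common nf_conntrack_max default of 262,144 entries (the actual default scales with RAM) and the worst case where every packet creates a new entry:

```python
# Worst-case conntrack exhaustion: every packet creates a new entry
conntrack_max = 262_144   # common default; actual default scales with RAM
flood_pps = 840e6         # randomized-source ACK flood

fill_time_ms = conntrack_max / flood_pps * 1000
print(f"Conntrack table full in {fill_time_ms:.2f} ms")   # ~0.31 ms
```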

Detect packet-rate attacks before they cause damage

Flowtriq monitors packet rate at one-second granularity, not just bandwidth. Attacks like OVHcloud's 840 Mpps flood are detected in under 2 seconds with automatic classification and alerting.

Start Free Trial →

Detection at the Server Level

If you are running bare-metal infrastructure — as many OVHcloud customers do — you need to detect packet-rate attacks at the server level. Bandwidth monitoring tools like SNMP-based graphs will show you a modest traffic increase that does not correlate with the severity of the impact. Here is what to watch instead.

1. NIC Ring Buffer Drops via ethtool

# Check for NIC-level packet drops (Intel NICs)
ethtool -S eth0 | grep -E "rx_dropped|rx_missed|rx_no_buffer|rx_discards"

# Example output during a PPS flood:
#   rx_dropped:          0
#   rx_missed_errors:    4829174    # <-- packets lost at the NIC
#   rx_no_buffer_count:  4829174    # <-- ring buffer was full

# For Mellanox/ConnectX NICs:
ethtool -S eth0 | grep -E "rx_discards|rx_out_of_buffer"

# Monitor rate of change (packets dropped per second):
while true; do
  BEFORE=$(ethtool -S eth0 | awk '/rx_missed_errors/{print $2}')
  sleep 1
  AFTER=$(ethtool -S eth0 | awk '/rx_missed_errors/{print $2}')
  echo "$(date +%T) NIC drops/sec: $(( AFTER - BEFORE ))"
done

2. Kernel Packet Drops via /proc/net/softnet_stat

# /proc/net/softnet_stat — one line per CPU core
# Column 1: total packets processed
# Column 2: packets dropped (queue overflow)
# Column 3: time_squeeze (softirq ran out of CPU budget)
#
cat /proc/net/softnet_stat

# Example during attack (8-core server):
# 0a3f21c8 00000000 00000047   <-- core 0: 71 time squeezes, 0 drops
# 09f8b412 000f42a1 000001a3   <-- core 1: 1,000,097 drops, 419 squeezes!
# 02183c90 00000000 00000012   <-- core 2: fine
# ...
#
# Non-zero column 2 = the kernel is dropping packets before processing
# Non-zero column 3 = softirq is running out of time budget

# Parse it into readable format (strtonum requires gawk):
gawk '{printf "CPU%-2d  processed=%-12d dropped=%-10d time_squeeze=%d\n", \
  NR-1, strtonum("0x"$1), strtonum("0x"$2), strtonum("0x"$3)}' \
  /proc/net/softnet_stat

3. IRQ Distribution via /proc/interrupts

# Check which CPUs are handling NIC interrupts
grep eth0 /proc/interrupts

# During a PPS flood, you will often see one or two cores
# handling a disproportionate share of interrupts:
#
#            CPU0       CPU1       CPU2       CPU3
#  47:  894201738   12847291   11293847   13928471  IR-PCI-MSI  eth0-TxRx-0
#  48:   12384719  891247381   12938471   11284719  IR-PCI-MSI  eth0-TxRx-1
#
# CPU0 and CPU1 are saturated. This is an RSS (Receive Side Scaling)
# distribution problem — the hash function maps attack traffic
# to only a few queues.

# Check current RSS configuration:
ethtool -x eth0 | head -20

4. PPS Monitoring via /proc/net/dev

# /proc/net/dev gives you raw packet counters per interface
# Poll it twice with a 1-second gap to calculate PPS:

BEFORE=$(awk '/eth0/{print $3}' /proc/net/dev)
sleep 1
AFTER=$(awk '/eth0/{print $3}' /proc/net/dev)
echo "Current RX PPS: $(( AFTER - BEFORE ))"

# Normal server: 1,000 - 50,000 pps
# Under PPS flood: 1,000,000+ pps
# The OVHcloud attack: 840,000,000 pps (distributed across targets)

This is exactly what Flowtriq automates. The agent reads /proc/net/dev, /proc/net/softnet_stat, and NIC counters every second, computes the per-second deltas, compares against a rolling baseline, and fires alerts when PPS anomalies are detected — typically within 1-2 seconds of attack onset. Bandwidth-only monitoring would have missed the OVHcloud attack pattern entirely.
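A minimal sketch of that detection loop might look like the following. This is not Flowtriq's actual implementation; the interface name, baseline window, and alert threshold are illustrative placeholders:

```python
#!/usr/bin/env python3
# Minimal sketch of per-second PPS monitoring with a rolling baseline.
# NOT Flowtriq's implementation; iface, window, and threshold are
# illustrative placeholders.
import time
from collections import deque

def parse_rx_packets(proc_net_dev_text, iface):
    """Extract the RX packet counter for `iface` from /proc/net/dev text."""
    for line in proc_net_dev_text.splitlines():
        name, sep, counters = line.partition(":")
        if sep and name.strip() == iface:
            # Field layout after the colon: rx_bytes rx_packets rx_errs ...
            return int(counters.split()[1])
    raise ValueError(f"interface {iface!r} not found")

def rx_packets(iface):
    with open("/proc/net/dev") as f:
        return parse_rx_packets(f.read(), iface)

def monitor(iface="eth0", factor=10, window_secs=600):
    """Alert when current PPS exceeds `factor` x the rolling-window average."""
    history = deque(maxlen=window_secs)   # last N one-second PPS samples
    prev = rx_packets(iface)
    while True:
        time.sleep(1)
        cur = rx_packets(iface)
        pps = cur - prev
        prev = cur
        if history:
            baseline = sum(history) / len(history)
            if pps > factor * max(baseline, 1.0):
                print(f"ALERT: {pps} pps (baseline {baseline:.0f} pps)")
        history.append(pps)
```

A real agent would also read NIC drop counters and softnet_stat, and use a longer baseline with time-of-day awareness, but the core loop is exactly this: poll, delta, compare.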

Mitigation Techniques for PPS Floods

Traditional iptables/nftables rules execute in the kernel network stack — after the packet has already consumed ring buffer space, triggered an interrupt, and passed through the early packet processing path. At 840 Mpps, even a simple DROP rule is too slow. Mitigation needs to happen earlier.

XDP/eBPF: Kernel-Bypass Filtering

XDP (eXpress Data Path) processes packets at the NIC driver level, before the kernel allocates an sk_buff structure. This eliminates per-packet memory allocation and cuts processing cost to roughly 100-200 nanoseconds per packet — an order of magnitude faster than netfilter.

// xdp_drop_small_acks.c — XDP program to drop small TCP ACK floods
// Compile: clang -O2 -target bpf -c xdp_drop_small_acks.c -o xdp_drop.o
// Load:    ip link set dev eth0 xdp obj xdp_drop.o sec xdp
//          (use "xdpgeneric" only if the driver lacks native XDP)

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_drop_ack_flood(struct xdp_md *ctx) {
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // Parse Ethernet header
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end) return XDP_PASS;
    if (eth->h_proto != __constant_htons(ETH_P_IP)) return XDP_PASS;

    // Parse IP header
    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end) return XDP_PASS;
    if (ip->ihl < 5) return XDP_PASS;  // reject malformed IHL before using it
    if (ip->protocol != IPPROTO_TCP) return XDP_PASS;

    // Parse TCP header
    struct tcphdr *tcp = (void *)ip + (ip->ihl * 4);
    if ((void *)(tcp + 1) > data_end) return XDP_PASS;

    // Drop TCP ACK packets with no payload (pure ACKs)
    // that are smaller than 60 bytes at the IP layer
    __u16 ip_total_len = __constant_ntohs(ip->tot_len);
    __u16 tcp_header_len = tcp->doff * 4;
    __u16 ip_header_len = ip->ihl * 4;
    __u16 payload_len = ip_total_len - ip_header_len - tcp_header_len;

    if (tcp->ack && !tcp->syn && !tcp->fin && !tcp->rst
        && payload_len == 0 && ip_total_len <= 60) {
        return XDP_DROP;  // Drop pure ACKs with no payload
    }

    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

nftables with notrack for High-PPS Mitigation

If XDP is not available (older kernels, driver limitations), the next best option is to bypass conntrack for attack traffic using nftables notrack. This eliminates the per-packet conntrack lookup cost:

# Bypass conntrack for small TCP ACK packets on high-traffic ports
# This prevents conntrack table exhaustion during PPS floods

nft add table inet raw_filter
nft add chain inet raw_filter prerouting \
  '{ type filter hook prerouting priority -300; policy accept; }'

# Skip conntrack for TCP ACKs with no payload (size <= 60 bytes).
# Quote the rule so the shell does not interpret '&' or '<='.
nft add rule inet raw_filter prerouting \
  'tcp flags & ack == ack meta length <= 60 notrack'

# Rate-limit small UDP packets (common in PPS floods)
nft add rule inet raw_filter prerouting \
  'udp length <= 64 limit rate over 100000/second drop'

# Verify conntrack is being bypassed (poll the entry count):
watch -n1 conntrack -C   # count should stop climbing during an attack

NIC Hardware Flow Steering and RPS/RFS

# Enable Receive Side Scaling (RSS) across all CPU cores
# This distributes NIC interrupts across cores instead of pinning to one

# Check current number of RX queues:
ethtool -l eth0

# Set to maximum (some drivers expose 'combined' channels instead of 'rx'):
ethtool -L eth0 rx 16

# If RSS alone is insufficient, enable RPS (Receive Packet Steering)
# to software-distribute packets across all cores:
echo "ffff" > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo "ffff" > /sys/class/net/eth0/queues/rx-1/rps_cpus

# Enable RFS (Receive Flow Steering) to improve cache locality:
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
echo 4096 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt

# Increase the softirq budget so the kernel processes more packets
# per NAPI poll cycle (default 300, increase during attacks):
sysctl -w net.core.netdev_budget=600
sysctl -w net.core.netdev_budget_usecs=8000

# Increase NIC ring buffer size to absorb packet bursts:
ethtool -G eth0 rx 4096

The Broader Problem: Edge Device Security Is Infrastructure Security

The OVHcloud incident exposes a systemic problem in internet infrastructure that goes far beyond one hosting provider. There are an estimated 2+ million MikroTik devices exposed to the public internet. OVHcloud identified roughly 99,382 with potentially exploitable firmware. The actual number of compromised devices is unknown, but the ones that participated in the 840 Mpps attack represented only about 5,000 — a fraction of the available pool.

Several structural factors make this problem difficult to solve:

  • The "forgotten router" problem. MikroTik devices are widely deployed by ISPs and small businesses as CPE (customer premises equipment). Once installed, they are rarely updated. The admin credentials set during initial configuration are often weak and never changed. Many of these routers have been online, unpatched, with Winbox exposed to the internet, for six or more years.
  • MikroTik's patching model. While MikroTik does release firmware updates, RouterOS does not auto-update by default. Updates require manual action by the device administrator — who may no longer be employed by the company that deployed the router, or may not know the device exists.
  • ISP indifference. The ISPs whose networks carry attack traffic from compromised MikroTik routers often have no economic incentive to identify and notify affected customers. BCP38 (ingress filtering to prevent IP spoofing) is still not universally implemented. The cost of outbound DDoS traffic is externalized to the target.
  • Multi-tenant compromise. Many compromised MikroTik devices are simultaneously used by multiple threat actors — cryptominers, spam relays, proxy networks, and now DDoS botnets. Removing one implant does not remove the others, and the underlying vulnerability remains if the firmware is not updated.

This is not a problem that any individual organization can solve. But it is a problem that every organization needs to be prepared to withstand. The MikroTik botnet that generated 840 Mpps in early 2024 still exists. The devices are still compromised. The next attack could be larger.

Lessons for Infrastructure Operators

The OVHcloud 840 Mpps incident makes several things clear for anyone running bare-metal or VPS infrastructure:

1. PPS-Based Detection Is Not Optional

Bandwidth monitoring alone will not catch packet-rate attacks. The OVHcloud attacks consumed only moderate bandwidth relative to their destructive potential. If your monitoring dashboard shows "Bandwidth: 487 Gbps — within normal range for this link" while your servers are dropping packets at the NIC level, your monitoring has failed at its primary job.

Per-second PPS monitoring is the baseline. You need to know, at one-second granularity, how many packets your server is receiving — and you need automated alerting when that number deviates from baseline. This is the core problem Flowtriq was built to solve. Every Flowtriq agent polls packet counters once per second, computes the instantaneous PPS, compares it against a rolling seven-day baseline, and fires an alert within 1-2 seconds of detecting an anomaly.

2. Conntrack Is the First Thing That Dies

Under a PPS flood, the Linux conntrack table will fill up long before the CPU is fully saturated. Once conntrack is full, all new connections are dropped — including legitimate ones. Plan for this: increase nf_conntrack_max, use notrack rules for known-good flows, and consider disabling conntrack entirely on servers that do not need stateful firewalling.

3. XDP Should Be Part of Your Mitigation Toolkit

If you are running bare-metal infrastructure with modern NICs (Intel X710/E810, Mellanox ConnectX-5+), XDP is available today and can drop attack packets 10x faster than nftables. Have XDP programs ready to deploy — do not wait until the attack is in progress to start writing eBPF code.

4. NIC Configuration Matters

Default NIC settings on most Linux distributions are optimized for throughput, not PPS resilience. Ring buffer sizes, RSS queue counts, interrupt coalescing parameters, and NAPI budgets all need to be tuned for your specific traffic profile. A server that handles 50,000 PPS normally might survive a 500,000 PPS spike with proper tuning — or die at 200,000 PPS with defaults.

5. The Threat Is Structural, Not Episodic

The MikroTik botnet is not going away. The vulnerable devices are not being patched. The exploit techniques are public knowledge and require minimal skill to execute. Packet-rate attacks are the new normal. Plan your infrastructure and monitoring accordingly — this is not a one-time incident but a persistent shift in the DDoS threat landscape.

PPS monitoring is Flowtriq's core feature

Per-second packet rate monitoring, automatic attack classification, NIC-level drop detection, PCAP forensics, and instant multi-channel alerts. Built for bare-metal infrastructure operators who need to detect what bandwidth monitoring misses. $9.99/node/month with a 7-day free trial.

Start your free 7-day trial →