
The Outage You Never Had

Most infrastructure teams have a reasonably good handle on the cost of a full outage. The site goes down, the on-call gets paged, a post-mortem gets written, and someone eventually runs the numbers: X hours of downtime at Y dollars per minute equals Z total cost. That math is uncomfortable but it is at least visible.

What almost no team measures is the cost of the outage they never had — the slow, grinding degradation caused by sub-threshold attacks, undetected congestion, and mystery latency spikes that never trigger a single alert. This is the "mystery slowdown tax," and for many organizations it dwarfs the cost of the incidents they actually respond to.

Direct Costs: The Numbers Are Worse Than You Think

Downtime Revenue Loss

Gartner's figure of $5,600 per minute of downtime gets cited constantly and is frequently dismissed as applying only to large enterprises. But the math scales down directly. A small e-commerce store doing $80,000 per month in revenue averages roughly $111 per hour around the clock ($80,000 spread over a 720-hour month). An outage during peak hours, say 7pm on a Friday, might run at 4-5x that rate, or about $445-555 per hour. Even a modest two-hour outage at peak costs roughly $900-1,100 in direct lost sales before you count a single support ticket or engineer hour.
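
The arithmetic above is easy to rerun with your own numbers. Here is a minimal sketch as a shell function; the name outage_cost, the 30-day month, and the peak multipliers are illustrative assumptions, not measured values.

```shell
# Rough outage-cost estimate. All inputs are illustrative assumptions.

# outage_cost MONTHLY_REVENUE PEAK_MULTIPLIER OUTAGE_HOURS
# Prints the estimated lost revenue in whole dollars.
outage_cost() {
  awk -v rev="$1" -v mult="$2" -v hours="$3" 'BEGIN {
    hourly = rev / (30 * 24)   # average hourly revenue over a 30-day month
    printf "%.0f\n", hourly * mult * hours
  }'
}

outage_cost 80000 4 2   # low end of the peak range, prints 889
outage_cost 80000 5 2   # high end of the peak range, prints 1111
```

Swap in your own monthly revenue and a multiplier taken from your real traffic curve; the point is the order of magnitude, not the exact dollar figure.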

For SaaS products, the math is different but no more forgiving. Downtime means churn. A B2B SaaS company charging $200/month per customer that loses three customers to a single high-profile outage has lost $600 in monthly recurring revenue, or $7,200 over a year had those customers stayed. That is before accounting for the cost of acquiring replacements.

SLA Penalties

If you sell SLAs, and many infrastructure operators do, even informally, downtime has a direct contractual cost. A standard 99.9% uptime SLA allows about 43.8 minutes of downtime per month (0.1% of an average 730-hour month). A 99.5% SLA allows 3.65 hours. Breaching those thresholds triggers credits that can represent 10-25% of a customer's monthly fee. For a managed hosting provider with 200 customers averaging $150/month, a single bad outage month affecting half the customer base at a 15% credit is $2,250 off the top (100 customers x $150 x 15%), and that is only the credit exposure, not the churn that follows.
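
To check the downtime budget and credit exposure for your own SLA tiers, a small sketch along these lines works. The function names are hypothetical, and the 730-hour default is an assumption (8,760 hours per year divided by 12) chosen to match the figures above.

```shell
# sla_budget_minutes UPTIME_PERCENT [HOURS_IN_MONTH]
# Prints the allowed downtime per month in minutes.
# The 730-hour default is an assumed average month (8760 / 12).
sla_budget_minutes() {
  awk -v pct="$1" -v hours="${2:-730}" 'BEGIN {
    printf "%.1f\n", (100 - pct) / 100 * hours * 60
  }'
}

# credit_exposure CUSTOMERS AVG_MONTHLY_FEE AFFECTED_FRACTION CREDIT_FRACTION
# Prints the total SLA credits owed for one bad month, in dollars.
credit_exposure() {
  awk -v n="$1" -v fee="$2" -v hit="$3" -v credit="$4" 'BEGIN {
    printf "%.2f\n", n * fee * hit * credit
  }'
}

sla_budget_minutes 99.9            # prints 43.8
sla_budget_minutes 99.5            # prints 219.0 (3.65 hours)
credit_exposure 200 150 0.5 0.15   # prints 2250.00
```

Pass an explicit second argument such as 720 or 744 to sla_budget_minutes if your contract defines the month by calendar length rather than an average.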

Engineering Hours

The most consistently underestimated cost is the fully-loaded hourly rate of engineers spent on network problems that should not require manual investigation. When a customer files a ticket saying "the site feels slow," what happens? A developer checks the application and finds nothing. An ops engineer looks at CPU and memory and finds nothing. Someone eventually stares at iftop for 20 minutes, suspects a traffic anomaly, and the issue resolves on its own. Total time invested: two to three hours across two engineers. At a fully-loaded rate of $150/hour, that is $300-450 per incident. If this happens twice a week, or about eight times a month, and for organizations running anything internet-facing it often does, you are spending $2,400-3,600 per month on unstructured network investigation alone.

The Sub-Threshold DDoS Problem

Here is a scenario that is far more common than most teams realize. An attacker — possibly automated botnet traffic with no specific target in mind — is sending a low-volume UDP flood at your infrastructure. Not enough to take you down. Not enough to saturate your link. Just enough to cause 2% packet loss and an additional 40ms of latency on average.

For many applications, 2% packet loss is catastrophic in slow motion. TCP retransmits. HTTPS connections stall waiting for dropped segments. Database queries time out at higher rates. Users in regions with already-marginal latency see their effective performance drop to unusable levels. They do not file a support ticket; they leave and do not come back.

The conversion rate math: Google's search latency experiments found that adding 400ms of delay reduced searches per user by 0.59%. Akamai found that a 100ms delay can reduce conversion rates by up to 7%. A persistent 40ms latency increase from a sub-threshold attack, compounded by the retransmission stalls that come with 2% packet loss, can plausibly represent a 3-5% ongoing revenue reduction: invisible, unalerted, and wrongly attributed to seasonal trends or product issues.

Put numbers to it: a SaaS doing $50,000/month in revenue running with a persistent 4% conversion reduction from undetected network degradation is losing $2,000/month. Over 12 months, that is $24,000 in lost revenue from an attack that never generated a single alert. The attacker spent almost nothing, since botnet time is cheap, and your business quietly bled for a year.
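
Because the 3-5% range is an estimate rather than a measurement, it helps to see how sensitive the annual figure is to it. The sketch below uses a hypothetical degradation_loss function and the $50,000/month example from above.

```shell
# degradation_loss MONTHLY_REVENUE CONVERSION_DROP_PERCENT MONTHS
# Prints revenue lost over the period in whole dollars.
# The conversion drop is an assumed input, not something this script measures.
degradation_loss() {
  awk -v rev="$1" -v drop="$2" -v months="$3" 'BEGIN {
    printf "%.0f\n", rev * drop / 100 * months
  }'
}

# Sweep the assumed 3-5% range over twelve months.
for drop in 3 4 5; do
  echo "${drop}% drop: \$$(degradation_loss 50000 "$drop" 12) per year"
done
```

Even the low end of the range costs far more per year than most teams would guess for a problem that never paged anyone.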

The Detection Gap

Most organizations only know about attacks that take them fully down. This is a structural problem with how monitoring is typically deployed. Nagios or Zabbix checks every 60 seconds whether a service responds. Cloudflare shows you a bandwidth graph. Your ISP might send a warning when your 95th percentile bandwidth crosses a billing threshold. None of these tell you about a 15-minute UDP flood at 60% of your link capacity, or a SYN flood that exhausted your connection tracking table for 3 minutes before the attacker moved on.

The detection gap — the space between "attack is happening" and "we know about it" — is where the sub-threshold tax lives. Closing it requires monitoring at a resolution that matches the attack tempo, which means per-second sampling with automatic baseline comparison, not 60-second polling against static thresholds.

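# You can see the gap yourself. Check your current connection state distribution:
ss -s

# Now count half-open connections in SYN_RECV (-H keeps the header line out of the count):
ss -H -n state syn-recv | wc -l

# And check the RX drop counter on your primary interface
# (substitute your own interface name for eth0):
ip -s link show eth0 | grep -A1 "RX:"

# The counter is cumulative since boot, so run the command twice a few seconds apart.
# If "dropped" is climbing and you have no current alert, that is the detection gap.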
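
To make "per-second sampling with automatic baseline comparison" concrete, here is a minimal sketch that polls the kernel's cumulative rx_dropped counter once a second and compares each one-second delta against an exponentially weighted moving average. It is an illustration only, not how Flowtriq or any other product works. The function names, the smoothing factor (0.1), the 5x multiplier, and the 10-drop floor are all assumptions you would need to tune.

```shell
# is_anomalous DELTA BASELINE FACTOR MIN_DROPS
# Exit status 0 when the one-second delta is at least MIN_DROPS and
# also exceeds FACTOR times the running baseline.
is_anomalous() {
  awk -v d="$1" -v b="$2" -v f="$3" -v m="$4" 'BEGIN {
    exit !((d + 0) >= (m + 0) && (d + 0) > (b + 0) * (f + 0))
  }'
}

# ewma BASELINE SAMPLE ALPHA
# Prints the updated exponentially weighted moving average.
ewma() {
  awk -v b="$1" -v s="$2" -v a="$3" 'BEGIN {
    printf "%.3f\n", a * s + (1 - a) * b
  }'
}

# watch_drops IFACE
# Samples /sys/class/net/IFACE/statistics/rx_dropped every second and prints
# a line whenever the per-second delta looks anomalous. Thresholds are assumed.
watch_drops() {
  iface="$1"; alpha=0.1; factor=5; min_drops=10; baseline=0
  counter="/sys/class/net/$iface/statistics/rx_dropped"
  prev=$(cat "$counter") || return 1
  while sleep 1; do
    cur=$(cat "$counter") || return 1
    delta=$((cur - prev))
    prev=$cur
    if is_anomalous "$delta" "$baseline" "$factor" "$min_drops"; then
      echo "$(date +%T) $iface rx_dropped +$delta/s (baseline $baseline/s)"
    fi
    baseline=$(ewma "$baseline" "$delta" "$alpha")
  done
}

# Usage (runs until interrupted): watch_drops eth0
```

One design caveat: this sketch folds anomalous samples into the baseline, so a long-running flood gradually becomes "normal." A more careful version would freeze or slow the baseline update while an anomaly is active.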

Indirect Costs: Reputation and Customer Support

Support Ticket Load

During network degradation, support ticket volume increases, but most tickets are vague. "The app is slow." "I keep getting timeouts." "Your service is unreliable." These tickets do not tell you there is a network problem. They get triaged as application bugs, investigated fruitlessly, and closed with a canned response. Each ticket costs an average of $12-25 to handle when you account for triage, investigation, and response time. A degradation event that generates 30 extra tickets costs $360-750 in support overhead, plus the goodwill lost when customers are told their issue is resolved while the real cause was never found.

Reputation Damage and Churn

Unlike outages, which have a clear start and end, degradation events have a long tail. A customer who experienced slow performance three times in a month does not necessarily churn immediately — but they do start evaluating alternatives. By the time you lose them, the connection to the network issue is invisible. Your churn attribution says "competitive pricing" or "missing features" when the real driver was three weeks of intermittent degradation that you were never alerted about.

Review sites amplify this. A single one-star review citing slow performance indexed on Google can suppress conversion from search traffic for months. The real cost of that review is not the one lost customer who wrote it — it is the percentage reduction in acquisition from every prospect who reads it.

The ROI Framing

Flowtriq costs $9.99 per node per month. For a two-node setup, that is $19.98/month, or roughly $240/year. Set that against the costs outlined above:

  • One hour of peak-time downtime on a mid-size e-commerce site: $400-2,000+
  • One month of sub-threshold degradation at 4% conversion loss on $50k MRR: $2,000
  • Two engineers investigating a mystery slowdown for three to six hours combined, at $150/hour: $450-900
  • SLA credits triggered by a single bad outage month: hundreds to thousands of dollars

The detection gap is not a technical problem — it is a business risk that has a specific dollar value. Monitoring that closes that gap at $9.99/node/month is not a cost; it is the cheapest insurance you will buy this year.

The attacks you detect are incidents. The attacks you miss are just "bad months." The only difference is instrumentation.

Protect your infrastructure with Flowtriq

Per-second DDoS detection, automatic attack classification, PCAP forensics, and instant multi-channel alerts. $9.99/node/month.

Start your free 7-day trial →