
The Visibility Gap in Cloud Networking

On bare metal, you own the NIC. You can run tcpdump -i eth0 -n -s 0 and see every packet crossing the wire. In a cloud VPC, you do not own anything below the hypervisor. You are renting a virtualized network adapter, and the physical underlay — the spine switches, the ECMP fabric, the transit routers — is completely invisible to you. This is a fundamental difference in how you approach network troubleshooting.

The cloud providers offer flow log products to partially fill this gap, but every one of them has significant limitations that engineers hit hard the first time they try to use them during a live incident. Understanding those limitations upfront is the difference between having a useful investigation tool and staring at data that is ten minutes stale while your load balancer is dropping connections.

AWS VPC Flow Logs: What They Capture and What They Miss

VPC Flow Logs capture a summary of IP traffic going to and from network interfaces in your VPC. Each flow record includes source IP, destination IP, source port, destination port, protocol, packet count, byte count, start/end time, and an ACCEPT or REJECT action. They do not capture packet payloads or DNS hostnames, and several traffic classes are excluded entirely, including queries to the Amazon DNS server at 169.254.169.253, instance metadata requests to 169.254.169.254, and DHCP.
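To make the record shape concrete, here is a sketch of pulling fields out of a single record with awk. The record itself is fabricated, but the 14-field layout matches the default version-2 format described above:

```shell
# Fabricated version-2 record in the default field order:
# version account-id interface-id srcaddr dstaddr srcport dstport
# protocol packets bytes start end action log-status
record='2 123456789012 eni-0abc1234def56789 203.0.113.47 10.0.1.25 54321 443 6 1432 98765 1736812800 1736812860 ACCEPT OK'

# Extract source address, destination port, and packet count
echo "$record" | awk '{print "src=" $4, "dstport=" $7, "packets=" $9}'
# -> src=203.0.113.47 dstport=443 packets=1432
```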

The critical limitation for anomaly investigation is aggregation. AWS aggregates flow records over a capture window, which defaults to 10 minutes, though you can reduce it to 1 minute by passing --max-aggregation-interval 60 when creating the flow log via the CLI:

# Delivering to CloudWatch Logs requires an IAM role the flow logs
# service can assume (the ARN below is a placeholder)
aws ec2 create-flow-logs \
  --resource-type NetworkInterface \
  --resource-ids eni-0abc1234def56789 \
  --traffic-type ALL \
  --log-destination-type cloud-watch-logs \
  --log-group-name /vpc/flowlogs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/flowlogs-role \
  --max-aggregation-interval 60

Even at 60-second aggregation, you are getting summaries — total bytes and packets over that window — not a per-second view. If a UDP flood runs for 45 seconds and stops, your flow log shows one record with large totals and you have no idea whether it was a burst or a steady stream. You also cannot see inter-packet timing, TTL values, or TCP flag distributions, which are essential for classifying attack types.
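A back-of-the-envelope sketch (numbers invented) shows why this matters: the only rate recoverable from an aggregated record is the average over its window.

```shell
# Hypothetical totals from one 60-second flow record
packets=3375000
window=60

# All you can recover is the average rate over the window
echo "average: $(( packets / window )) pps"        # -> average: 56250 pps

# But a 45-second burst producing the same totals ran much hotter,
# and the record cannot tell the two apart
echo "if burst: $(( packets / 45 )) pps for 45s"   # -> if burst: 75000 pps for 45s
```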

Querying VPC Flow Logs in Athena

If you are shipping flow logs to S3, Athena is the right way to query them at scale. Create a partitioned table to avoid scanning the entire log bucket on every query:

CREATE EXTERNAL TABLE vpc_flow_logs (
  version int, account_id string, interface_id string,
  srcaddr string, dstaddr string, srcport int, dstport int,
  protocol bigint, packets bigint, bytes bigint,
  -- "end" is a reserved word in Athena DDL, so the start/end fields are
  -- mapped positionally to starttime/endtime instead
  starttime bigint, endtime bigint, action string, log_status string
)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION 's3://your-flow-log-bucket/AWSLogs/123456789/vpcflowlogs/us-east-1/';

-- Flow log objects land under /yyyy/mm/dd/, which is not Hive-style, so
-- register each day explicitly (or configure partition projection):
ALTER TABLE vpc_flow_logs ADD PARTITION (year='2025', month='01', day='14')
  LOCATION 's3://your-flow-log-bucket/AWSLogs/123456789/vpcflowlogs/us-east-1/2025/01/14/';

-- Find top source IPs by packet count in a 1-hour window
SELECT srcaddr, SUM(packets) AS total_packets, SUM(bytes) AS total_bytes
FROM vpc_flow_logs
WHERE year='2025' AND month='01' AND day='14'
  AND starttime >= 1736812800 AND starttime < 1736816400
  AND action = 'ACCEPT'
GROUP BY srcaddr
ORDER BY total_packets DESC
LIMIT 50;

This query will identify source IPs generating disproportionate packet counts, which is your first signal for amplification or botnet traffic. For port-based analysis, group by dstport to see if traffic is concentrating on a single service.
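If the raw log text is already on disk (or pulled down from S3), the same rollup can be sketched with nothing but awk. The records below are fabricated for illustration:

```shell
# Three fabricated flow records in the default field order
# (srcaddr is $4, dstport is $7, packets is $9, action is $13)
cat > /tmp/sample.log <<'EOF'
2 123456789012 eni-0abc1234def56789 203.0.113.47 10.0.1.25 54321 443 6 9000 540000 1736812800 1736812860 ACCEPT OK
2 123456789012 eni-0abc1234def56789 198.51.100.8 10.0.1.25 44444 443 6 1200 80000 1736812800 1736812860 ACCEPT OK
2 123456789012 eni-0abc1234def56789 203.0.113.47 10.0.1.25 54322 443 6 6000 360000 1736812800 1736812860 ACCEPT OK
EOF

# Sum packets per source IP, largest first (swap $4 for $7 to group by port)
awk '$13 == "ACCEPT" { pkts[$4] += $9 }
     END { for (ip in pkts) print pkts[ip], ip }' /tmp/sample.log | sort -rn
```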

Azure NSG Flow Logs: Same Architecture, Same Limitations

Azure's equivalent is Network Security Group (NSG) Flow Logs, stored in Azure Storage and queryable via Traffic Analytics in Log Analytics. The data model is nearly identical to VPC Flow Logs: 5-tuple plus packet/byte counts plus an ALLOW or DENY decision. The same aggregation problem applies — the default capture interval is 1 minute for Version 2 logs, but this is still a summary.
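Each version-2 log entry carries its flows as comma-separated flowTuples strings. The tuple below is fabricated, and the field positions are assumptions based on the documented version-2 layout (timestamp, src/dst IP, src/dst port, protocol, direction, decision, flow state, then the four packet/byte counters):

```shell
# Fabricated version-2 NSG flowTuple (field positions are assumptions)
tuple='1736812800,203.0.113.47,10.0.1.4,54321,443,T,I,A,E,1432,98765,1200,456000'

# Pull out source IP, decision, and packets src->dst
echo "$tuple" | awk -F',' '{print "src=" $2, "decision=" $8, "packets=" $10}'
# -> src=203.0.113.47 decision=A packets=1432
```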

Enable NSG flow logs via the Azure CLI targeting the specific NSG protecting your VM subnet:

az network watcher flow-log create \
  --resource-group myResourceGroup \
  --name myFlowLog \
  --location eastus \
  --nsg myNSG \
  --storage-account myStorageAccount \
  --enabled true \
  --format JSON \
  --log-version 2 \
  --retention 7

Version 2 logs add throughput information (bytes and packets per interval), which Version 1 lacks. Always use Version 2. Once the data is flowing into Log Analytics via Traffic Analytics, use KQL to interrogate it:

AzureNetworkAnalytics_CL
| where TimeGenerated > ago(1h)
| where FlowType_s == "ExternalPublic"
| summarize TotalPackets=sum(PacketsSrcToDest_d), TotalBytes=sum(BytesSrcToDest_d)
    by SrcIP_s, DestPort_d
| order by TotalPackets desc
| take 50

Cloud flow logs won't tell you you're under attack in time

Flowtriq detects attacks like this in under 2 seconds, classifies them automatically, and alerts your team instantly. 7-day free trial.

Start Free Trial →

Host-Level Monitoring: What Actually Works During an Incident

Flow logs are retrospective. When you are in an active incident, you need real-time data from the host itself. The good news is that EC2 instances and Azure VMs behave like any other Linux host at the guest level — the kernel network stack, /proc/net/ counters, and packet capture tools all work exactly as they do on bare metal.

Live PPS and bandwidth from the kernel

The fastest way to see current traffic volume without installing anything is to poll /proc/net/dev directly. The interface counters there are updated by the kernel as packets are processed, so a one-second polling loop gives you a genuinely live, per-second view:

# Watch inbound PPS on eth0 every second
IFACE=eth0
prev=$(awk -v i="$IFACE" '$1 == i":" {print $3}' /proc/net/dev)
while true; do
  sleep 1
  curr=$(awk -v i="$IFACE" '$1 == i":" {print $3}' /proc/net/dev)
  echo "$(( curr - prev )) pps inbound"
  prev=$curr
done

# Same for bytes (column $2 = rx_bytes, $3 = rx_packets)
# Multiply byte delta by 8 for bits per second
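As a worked example of that conversion (the delta value is invented):

```shell
# Convert a one-second rx_bytes delta to megabits per second
delta_bytes=31250000   # hypothetical delta sampled over one second
echo "$(( delta_bytes * 8 / 1000000 )) Mbit/s"   # -> 250 Mbit/s
```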

Packet capture on cloud instances

Both EC2 and Azure VMs support standard packet capture via tcpdump and tshark. The main constraint is that cloud instances typically have a single NIC (eth0 or ens5 on AWS Nitro). For anomaly investigation, these are the commands you reach for first:

# List conversations by traffic volume (10-second window);
# tshark sorts the conversation table itself
tshark -i eth0 -a duration:10 -q -z "conv,ip" 2>/dev/null

# Identify protocol distribution in real time
tshark -i eth0 -a duration:5 \
  -q -z "io,phs" 2>/dev/null

# Capture suspicious traffic from a specific source for forensics
tcpdump -i eth0 -n -s 0 -w /tmp/incident-$(date +%s).pcap \
  'src host 203.0.113.47' &

# Detect SYN flood by counting TCP SYN packets per second
# (the filter goes inside the io,stat argument)
tshark -i eth0 -a duration:10 -q \
  -z "io,stat,1,tcp.flags.syn==1 && tcp.flags.ack==0" 2>/dev/null

On AWS Nitro instances (most current generation types), the NIC driver is ena and you can retrieve driver-level statistics including queue depths and hardware drops using ethtool -S eth0 | grep -i drop. Drops at the driver level before the kernel network stack indicate the NIC is overwhelmed — something VPC Flow Logs will never show you.
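A small sketch of summing those drop counters. The sample output below is fabricated, and ena counter names vary by driver version, so treat the names as placeholders:

```shell
# Fabricated `ethtool -S eth0` excerpt (counter names are placeholders)
stats='queue_0_rx_drops: 1523
queue_1_rx_drops: 88
queue_0_tx_cnt: 91234'

# Sum every counter whose name mentions "drop"
echo "$stats" | awk -F': ' '/drop/ { total += $2 } END { print total, "driver-level drops" }'
# -> 1611 driver-level drops
```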

Correlating Cloud-Level and Host-Level Data

The most complete picture of a network anomaly comes from combining both data sources. Cloud flow logs show you accepted and rejected traffic at the security group/NSG boundary — traffic your instance never even saw because it was dropped upstream. Host-level data shows you what actually reached the kernel and what it did with it. The difference between the two tells you exactly how much filtering is happening upstream of your instance.

During an incident, run this correlation process:

  1. Start a host-level packet capture immediately (tcpdump -i eth0 -w /tmp/incident.pcap -C 100 -W 5 rotates 100MB files, keeping 5). This captures what the instance is actually receiving.
  2. Pull recent flow log data from CloudWatch Logs Insights or Log Analytics to see the traffic that arrived at the VPC/subnet boundary, including what was blocked by security group rules.
  3. Compare source IP distribution. If you see 10,000 unique source IPs in flow logs but only 200 reaching the host, your security groups are doing useful work. If the counts match, you are getting no upstream filtering.
  4. Check for traffic that appears in host-level captures but not in flow logs. This is usually traffic that flow logs exclude by design (Amazon DNS queries, instance metadata requests, DHCP) rather than anything bypassing the ENI.
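Step 3's comparison can be sketched with sort and comm once you have one source-IP list per data source. The lists below are invented; in practice they would come from your Athena results and from running tcpdump -nn -r on the capture file:

```shell
# Hypothetical source-IP lists, one per data source
printf '203.0.113.47\n198.51.100.8\n192.0.2.99\n' | sort > /tmp/flowlog_ips
printf '203.0.113.47\n198.51.100.8\n' | sort > /tmp/host_ips

# IPs seen at the VPC boundary but never reaching the host:
# these are what your security groups filtered out
comm -23 /tmp/flowlog_ips /tmp/host_ips   # -> 192.0.2.99
```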

Why Cloud-Native DDoS Detection Falls Short

AWS Shield Standard (free) detects network and transport-layer attacks against ELBs, CloudFront, and Route 53. It does not protect individual EC2 instances unless they are behind those services. Azure DDoS Basic provides similar infrastructure-level protection. Both services focus on volumetric scrubbing at the edge and provide no visibility into what attacks look like, what was blocked, or what is reaching your instance.

The gap is host-level detection. If you run a game server, a VPN endpoint, or any UDP-based service directly on an EC2 instance with an Elastic IP, Shield Standard's scrubbing may not engage until the attack is already large enough to affect AWS's own infrastructure. You will have no alert, no classification, and no packet evidence.

Running Flowtriq's agent directly on the instance closes this gap. It monitors /proc/net/dev counters every second, establishes per-protocol baselines, and triggers automatic PCAP capture within 2 seconds of anomaly detection — giving you forensic data that cloud flow logs simply cannot provide. The combination of Flowtriq for host-level detection and VPC/NSG flow logs for upstream boundary visibility is the most complete approach available without moving to expensive Shield Advanced tiers.

Cost note: AWS Shield Advanced costs $3,000/month minimum with a 1-year commitment. Azure DDoS Network Protection is $2,944/month per protected VNet. Flowtriq runs at $9.99/node/month and gives you per-second host-level detection and PCAP forensics that neither service provides at any price tier.

Protect your infrastructure with Flowtriq

Per-second DDoS detection, automatic attack classification, PCAP forensics, and instant multi-channel alerts. $9.99/node/month.

Start your free 7-day trial →