Tailscale Peer Relays: Solving the NAT Traversal Nightmare


What Are Tailscale Peer Relays (DERP)?
Tailscale Peer Relays (DERP) are encrypted relay servers that forward WireGuard-encrypted packets between Tailscale peers over HTTPS when direct UDP connections fail due to symmetric NAT, carrier-grade NAT, or restrictive firewalls. DERP relays never decrypt traffic—they act as dumb pipes for ciphertext, preserving end-to-end encryption while guaranteeing connectivity as a fallback when NAT traversal hole-punching techniques are defeated.
"Isn't NAT traversal a solved problem?" I hear this regularly from engineers who have only worked with networks they fully control. The reality is messier. Tailscale's peer relay infrastructure, built on the DERP protocol (Designated Encrypted Relay for Packets), exists precisely because NAT traversal remains a persistent, silent source of broken peer-to-peer connectivity.
Table of Contents
- The "Hard" NAT Problem: Why Hole Punching Fails
- How Tailscale's DERP Protocol Actually Works
- When and Why You Need a Self-Hosted DERP Relay
- Deploying a Self-Hosted DERP Relay: Step by Step
- Troubleshooting and Performance Tuning
- DERP vs. the Alternatives: Where It Fits in the Ecosystem
- Direct When Possible, Relayed When Necessary
Even in well-architected networks, symmetric NAT, carrier-grade NAT, and restrictive corporate firewalls routinely defeat classical hole-punching techniques. A self-hosted DERP relay gives you a production-grade NAT traversal solution that guarantees connectivity while preserving end-to-end encryption.
Tailscale prioritizes direct WireGuard connections between peers. When those fail, DERP relays step in as an encrypted fallback, forwarding WireGuard-encrypted packets over HTTPS on port 443. The relay never sees plaintext. Not a theoretical nicety. A core architectural guarantee.
By the end of this article, you'll understand the NAT problem at a protocol level, know how DERP works architecturally, and have a complete, deployable self-hosted DERP relay configuration with verification steps to prove encryption is intact.
The "Hard" NAT Problem: Why Hole Punching Fails
A Quick Refresher on NAT Types
NAT devices are not all created equal, and the differences determine whether two peers can establish a direct connection. The classical taxonomy (introduced with STUN in RFC 3489; RFC 4787 later replaced it with a behavior-based classification, and RFC 5389 revised STUN itself) breaks NAT behavior into four categories:
- Full Cone NAT: Once an internal host maps to an external port, any external host can send packets to that mapping. Hole punching works trivially.
- Restricted Cone NAT: The external mapping exists, but only hosts the internal machine has previously sent to can respond. Still generally hole-punchable.
- Port-Restricted Cone NAT: Like restricted cone, but the external source port must also match. Harder, but STUN-based techniques typically succeed.
- Symmetric NAT: A new external port mapping is created for every unique destination address and port. This is the killer.
Here is why symmetric NAT defeats prediction-based hole punching:
Peer A (behind symmetric NAT)          NAT Device                 Peer B
────────────────────────────────────────────────────────────────────────
A sends to STUN server (1.2.3.4:3478)
  Internal: 10.0.0.5:50000  ──►  NAT maps to  ──►  203.0.113.1:61234
                                                   (for dest 1.2.3.4)

A wants to reach B (5.6.7.8:41000)
  Internal: 10.0.0.5:50000  ──►  NAT maps to  ──►  203.0.113.1:61235
                                                   (NEW port for new dest!)

Peer B sends to 203.0.113.1:61234 (the port STUN reported)
  ──►  NAT drops the packet. Mapping 61234 is bound to 1.2.3.4, not B.
The STUN server faithfully reports the external address and port, but because symmetric NAT allocates a different external port for each destination, Peer B can't use that information to reach Peer A. The predicted port is wrong every time.
Stack carrier-grade NAT (CGNAT) on top of symmetric NAT (common on mobile carriers and some ISPs) and you get two layers of unpredictable port remapping. No amount of clever port prediction resolves this reliably.
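The failure mode above can be reduced to a toy model. The sketch below simulates cone versus symmetric port allocation (the classes and port numbers are invented for illustration, not real NAT code) and checks whether the STUN-observed port matches the port the NAT actually uses toward Peer B:

```python
# Toy model of NAT port allocation, illustrating why STUN-based port
# prediction works for cone NATs but fails for symmetric NATs.
# Illustrative sketch only; not an implementation of any real NAT.

class ConeNAT:
    """One external port per internal socket, reused for every destination."""
    def __init__(self):
        self.next_port = 61234
        self.mappings = {}  # internal (ip, port) -> external port

    def external_port(self, internal, dest):
        if internal not in self.mappings:
            self.mappings[internal] = self.next_port
            self.next_port += 1
        return self.mappings[internal]

class SymmetricNAT:
    """A fresh external port for every (internal socket, destination) pair."""
    def __init__(self):
        self.next_port = 61234
        self.mappings = {}  # (internal, dest) -> external port

    def external_port(self, internal, dest):
        key = (internal, dest)
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        return self.mappings[key]

internal = ("10.0.0.5", 50000)
stun     = ("1.2.3.4", 3478)
peer_b   = ("5.6.7.8", 41000)

for nat in (ConeNAT(), SymmetricNAT()):
    stun_sees = nat.external_port(internal, stun)    # port STUN reports to A
    b_needs   = nat.external_port(internal, peer_b)  # port actually used toward B
    print(type(nat).__name__, "prediction works:", stun_sees == b_needs)
```

For the cone NAT the two ports match, so Peer B can reach the STUN-reported mapping; for the symmetric NAT they never do, which is exactly why hole punching fails.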
Where STUN and ICE Hit Their Limits
STUN (Session Traversal Utilities for NAT, RFC 5389) can discover a peer's public-facing address, but it can't force a NAT to maintain a stable mapping across different destinations. It was designed to work with cone NATs.
ICE (RFC 8445), the Interactive Connectivity Establishment framework used in WebRTC and other protocols, gathers multiple connection candidates: host (local), server-reflexive (STUN-discovered), and relay (TURN server). The relay candidate exists specifically as the last resort when all direct paths fail. ICE doesn't solve the NAT problem; it acknowledges it and routes around it.
In practice, the scenarios requiring relay are not edge cases. Mobile carriers routinely deploy symmetric NAT behind CGNAT. Corporate firewalls often block outbound UDP entirely. Double-NAT home setups (ISP router plus consumer router) compound the problem. Running tailscale netcheck on these networks reveals the operational reality: UDP may be partially or fully blocked, and DERP latency numbers appear where direct connection metrics should be.
How Tailscale's DERP Protocol Actually Works
DERP Is Not a VPN Tunnel: It Is an Encrypted Relay
The name spells it out: Designated Encrypted Relay for Packets. DERP is purpose-built for one job: forwarding WireGuard-encrypted packets between Tailscale peers when a direct UDP path is unavailable. The relay operates over HTTPS (TCP port 443), which makes it nearly impossible to block without also blocking all web traffic.
The key architectural distinction from generic relay solutions: DERP never terminates or inspects the WireGuard encryption. Packets arriving at a DERP relay are already encrypted with WireGuard's Noise protocol framework handshake between the two peers. The relay is a dumb pipe for ciphertext. Compare this to a naive TURN relay where the trust model depends on the relay operator not inspecting traffic. With DERP, inspection isn't possible because the relay lacks the WireGuard session keys.
DERP also differs from TURN in protocol design. TURN (RFC 5766) is a general-purpose relay for arbitrary UDP streams, negotiated through ICE. DERP is tightly integrated with Tailscale's control plane and WireGuard transport. It doesn't negotiate media streams or handle DTLS. This tight coupling makes it simpler and more reliable for its specific use case.
One important performance characteristic: because DERP runs over TCP (specifically HTTPS), it suffers from TCP head-of-line blocking. A single dropped packet in the TCP stream stalls all subsequent packets until retransmission completes. This is why DERP is explicitly designed as a fallback, not a primary transport for bulk data. (Note: while Tailscale's HTTP client stack may negotiate HTTP/2 where supported, the head-of-line blocking concern applies to the underlying TCP connection regardless.)
The Connection Upgrade Path
Tailscale's connection lifecycle follows a clear preference order. When two peers need to communicate:
- The coordination server provides each peer with the other's connection details.
- Both peers attempt direct WireGuard connections using NAT traversal techniques.
- If direct connectivity fails, traffic flows through the lowest-latency DERP relay.
- Tailscale continues probing for a direct path in the background. If network conditions change (a firewall rule gets updated, a device moves to a less restrictive network), the connection upgrades to direct automatically.
This means a relayed connection is never permanent unless the network environment is permanently hostile. I've watched connections on a laptop transition from DERP relay to direct as I moved from a corporate network (outbound UDP blocked) to a home network (full cone NAT), all without dropping a single SSH session.
DERP Region Selection and Failover
Tailscale clients measure latency to all available DERP regions periodically. When relay is needed, the client selects the region with the lowest round-trip time. If a DERP server becomes unreachable, failover to the next-best region happens automatically.
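The selection logic reduces to "pick the lowest measured round-trip time among reachable regions." The following sketch models that behavior (region codes and latencies are illustrative, and this is not Tailscale's actual implementation):

```python
# Sketch of lowest-latency DERP region selection with failover,
# mirroring the behavior described above.

def pick_derp_region(latencies_ms, unreachable=frozenset()):
    """Return the reachable region with the lowest measured RTT."""
    candidates = {r: ms for r, ms in latencies_ms.items() if r not in unreachable}
    if not candidates:
        raise RuntimeError("no reachable DERP region")
    return min(candidates, key=candidates.get)

measured = {"sf": 12.4, "lax": 18.7, "nyc": 65.2, "fra": 148.3}

print(pick_derp_region(measured))                      # normal case: "sf"
print(pick_derp_region(measured, unreachable={"sf"}))  # failover: "lax"
```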
Geographic proximity matters here. A lot. A relay hop through a DERP server 200ms away doubles the effective latency penalty compared to one 20ms away. This is one of the strongest arguments for self-hosting in specific geographic regions.
You can see this in action:
$ tailscale netcheck
Report:
* UDP: true
* IPv4: yes, 203.0.113.45:41641
* IPv6: no
* MappingVariesByDestIP: true
* PortMapping:
* Nearest DERP: San Francisco
* DERP latency:
- sf: 12.4ms (San Francisco)
- lax: 18.7ms (Los Angeles)
- nyc: 65.2ms (New York City)
- fra: 148.3ms (Frankfurt)
- sin: 189.1ms (Singapore)
The MappingVariesByDestIP: true line is the telltale sign of symmetric NAT. This client will need DERP for peers behind similarly restrictive NAT. The tailscale status command shows per-peer connection state:
$ tailscale status
100.64.0.1 workstation user@ linux -
100.64.0.2 nas user@ linux direct 192.168.1.50:41641
100.64.0.3 cloud-vm user@ linux relay "sf"
100.64.0.4 edge-device user@ linux relay "custom-region"
The "relay" indicator with a region name tells you exactly which DERP server is carrying that peer's traffic.
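For auditing a larger tailnet, the same check can be scripted. The helper below scans tailscale status output in the format shown above and lists the peers stuck on a relayed path; the parsing is a best-effort sketch, since exact status output can vary between Tailscale versions:

```python
# Best-effort parser for `tailscale status` output (format as shown
# above), reporting which peers are on a relayed path.

def relayed_peers(status_output):
    peers = []
    for line in status_output.splitlines():
        fields = line.split()
        if "relay" in fields:
            # the quoted DERP region code follows the "relay" keyword
            region = fields[fields.index("relay") + 1].strip('"')
            peers.append((fields[1], region))  # (hostname, DERP region)
    return peers

status = """\
100.64.0.1  workstation  user@  linux  -
100.64.0.2  nas          user@  linux  direct 192.168.1.50:41641
100.64.0.3  cloud-vm     user@  linux  relay "sf"
100.64.0.4  edge-device  user@  linux  relay "custom-region"
"""

print(relayed_peers(status))  # [('cloud-vm', 'sf'), ('edge-device', 'custom-region')]
```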
When and Why You Need a Self-Hosted DERP Relay
Limitations of Tailscale's Default DERP Infrastructure
Tailscale operates DERP servers across multiple regions globally, and for most users, the default infrastructure works fine. But specific scenarios push beyond what the defaults can optimally serve:
- Latency optimization: If your tailnet peers are concentrated in a geographic area far from existing DERP nodes, every relayed packet takes a longer round trip than necessary. Placing a relay server on-premises or in a nearby data center can cut relay latency dramatically.
- Data sovereignty and compliance: Some regulatory environments require that network traffic, even encrypted relay traffic, not traverse infrastructure outside a specific jurisdiction. Self-hosted DERP lets you control the relay path.
- Network topology constraints: Air-gapped-adjacent networks or egress-restricted environments that whitelist only specific domains may not allow connections to Tailscale's default DERP servers.
Use Cases That Demand Self-Hosting
When I deployed Tailscale across a fleet of IoT sensors in a Southeast Asian manufacturing facility, the nearest default DERP server added over 180ms of relay latency. Dropping a single lightweight DERP relay on a local cloud VM in the same region cut relay latency to under 15ms. A 12x improvement. That made remote debugging over SSH responsive enough to actually be practical.
Other scenarios where self-hosting is the right call:
- Enterprise deployments behind egress firewalls that only permit HTTPS to a whitelist of internal or approved domains. You can point your DERP relay at an internal hostname.
- Hybrid cloud architectures where relay traffic should stay within a provider's backbone rather than traversing the public internet.
- High-availability requirements where you need control over redundancy. The DERPMap configuration supports multiple nodes per region, so you can run two or three relay servers behind different failure domains.
A word of warning: if you set OmitDefaultRegions to true in your DERPMap, your self-hosted relays become the only relay option. You must provide sufficient capacity and geographic coverage for all your roaming clients, or you risk connectivity loss if your relays go down.
Deploying a Self-Hosted DERP Relay: Step by Step
Prerequisites and Infrastructure Requirements
The DERP server (derper) is remarkably lightweight. In my testing, a single relay serving around 50 concurrent peers consumed under 100MB of RAM and negligible CPU. A small VM or even a container with 1 vCPU and 512MB RAM handles most small-to-medium tailnets comfortably, though you should benchmark against your actual peer count and traffic patterns.
Infrastructure requirements:
- A server with a public IPv4 address (IPv6 optional but recommended)
- Port 443/TCP inbound for HTTPS (the DERP protocol itself)
- Port 3478/UDP inbound if you want to run the built-in STUN server (helps peers discover their external address)
- A valid TLS certificate for the relay's hostname (Let's Encrypt works perfectly)
- The derper binary (built from Tailscale's open source repository) or a Docker image wrapping it
Configuration Template: The Deployable Config
Here is a Docker Compose configuration for running a self-hosted DERP relay:
# docker-compose.yml - Self-hosted Tailscale DERP relay
# NOTE: If using --verify-clients, tailscaled must also be running
# on the host (or in a sidecar container) and joined to your tailnet.
services:
  derper:
    image: ghcr.io/tailscale/derper:latest
    container_name: tailscale-derper
    restart: unless-stopped
    ports:
      - "443:443"        # DERP over HTTPS
      - "80:80"          # Required if using HTTP-01 ACME challenge (optional)
      - "3478:3478/udp"  # STUN
    volumes:
      # Persistent storage for Let's Encrypt certificates
      - derper-certs:/app/certs
    command:
      - /app/derper
      - --hostname=derp.example.com
      - --certmode=letsencrypt
      - --certdir=/app/certs
      - --stun
      - --verify-clients
    # derper uses command-line flags, not environment variables, for configuration.

volumes:
  derper-certs:
Key flags explained:
- --hostname: The public DNS name of your relay. Must match the TLS certificate.
- --certmode=letsencrypt: Automatically provisions and renews TLS certificates via Let's Encrypt. Alternatives include manual (provide your own cert files). Note that Let's Encrypt via derper uses the TLS-ALPN-01 challenge on port 443 by default, so you don't need port 80 open for this mode.
- --stun: Enables the built-in STUN server on port 3478/UDP. This helps peers behind NAT discover their external address.
- --verify-clients: When enabled, the DERP server verifies that connecting clients are authenticated members of your tailnet. This prevents unauthorized use of your relay. Requires the DERP server itself to be joined to the tailnet (i.e., tailscaled must be running on the same machine).
For bare-metal deployments without Docker, here is a systemd unit file:
# /etc/systemd/system/derper.service
[Unit]
Description=Tailscale DERP Relay Server
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
ExecStart=/usr/local/bin/derper \
    --hostname=derp.example.com \
    --certmode=letsencrypt \
    --certdir=/var/lib/derper/certs \
    --stun \
    --verify-clients \
    -a=:443
Restart=always
RestartSec=5
# Run as a dedicated user for security
User=derper
Group=derper
# Allow binding to privileged port 443
AmbientCapabilities=CAP_NET_BIND_SERVICE
[Install]
WantedBy=multi-user.target
Install the derper binary by building from source:
go install tailscale.com/cmd/derper@latest
sudo mv "$(go env GOPATH)/bin/derper" /usr/local/bin/
Registering Your Relay with Your Tailnet
Once the DERP server is running, you need to tell your tailnet about it by publishing a custom DERPMap. In the Tailscale admin console, go to the Access Controls (ACL) configuration and add a derpMap block:
// Add this block inside your ACL policy file in the Tailscale admin console.
{
"derpMap": {
"OmitDefaultRegions": false,
"Regions": {
"900": {
"RegionID": 900,
"RegionCode": "myderp",
"RegionName": "My Custom DERP",
"Nodes": [
{
"Name": "900a",
"RegionID": 900,
"HostName": "derp.example.com",
"IPv4": "198.51.100.10",
"IPv6": "2001:db8::10",
"STUNPort": 3478,
"STUNOnly": false,
"DERPPort": 443
}
]
}
}
}
}
Field-by-field breakdown:
- RegionID: Use a value of 900 or higher for custom regions to avoid colliding with Tailscale's built-in region IDs.
- RegionCode/RegionName: Human-readable identifiers. These show up in tailscale netcheck and tailscale status output.
- Nodes array: Supports multiple entries per region for redundancy. Each node gets a unique Name.
- HostName: Must match the TLS certificate and the --hostname flag on the derper instance.
- IPv4/IPv6: The public IP addresses of the relay server. Specifying both enables dual-stack connectivity.
- OmitDefaultRegions: Set to false to keep Tailscale's default DERP servers as fallback. Set to true only if you're certain your custom relays provide sufficient coverage.
After saving the ACL configuration, verify the relay is reachable:
# Verify the DERP server is responding over HTTPS
$ curl -I https://derp.example.com
HTTP/2 200
content-type: text/html
# Check that your custom region appears in netcheck
$ tailscale netcheck
Report:
* UDP: true
* IPv4: yes
* Nearest DERP: My Custom DERP
* DERP latency:
- myderp: 4.2ms (My Custom DERP)
- sf: 68.1ms (San Francisco)
- nyc: 112.4ms (New York City)
The custom region appearing with the lowest latency confirms the relay is operational and preferred.
End-to-End Encryption Verification Steps
This is the part that matters most. You can verify that the DERP relay cannot see your traffic in plaintext:
# Step 1: Confirm a peer is using the relay path
$ tailscale ping -c 3 cloud-vm
pong from cloud-vm (100.64.0.3) via DERP(myderp) in 8.4ms
pong from cloud-vm (100.64.0.3) via DERP(myderp) in 7.9ms
pong from cloud-vm (100.64.0.3) via DERP(myderp) in 8.1ms
# Step 2: On the DERP relay server, capture traffic to observe that
# no application-layer plaintext is visible. DERP encapsulates
# WireGuard-encrypted packets inside its own TLS-wrapped TCP stream.
# You will see TLS application data, not cleartext payloads.
$ sudo tcpdump -i eth0 port 443 -A -c 20 2>/dev/null | strings | head -30
# Output: TLS handshake fragments and binary gibberish. No readable
# application data. The relay sees only encrypted WireGuard payloads
# wrapped inside the DERP/TLS transport.
# Step 3: Verify the WireGuard handshake is with the PEER, not the relay
$ sudo wg show
interface: tailscale0
public key: <your-public-key>
private key: (hidden)
listening port: 41641
peer: <cloud-vm-public-key>
endpoint: 127.3.3.40:0
allowed ips: 100.64.0.3/32
latest handshake: 12 seconds ago
transfer: 1.24 MiB received, 856.00 KiB sent
The WireGuard handshake shown by wg show is between your device and the peer device. The peer's public key belongs to the remote machine, not the DERP relay. The relay is a transport layer beneath this; it never participates in the WireGuard key exchange.
Worth noting: tcpdump on the DERP server captures the DERP protocol running inside TLS over TCP, not raw WireGuard UDP packets. The WireGuard ciphertext is nested inside the DERP framing, which is itself inside TLS. Two layers of encryption sit between the relay's network interface and the actual application data.
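The tcpdump | strings check above can also be made scriptable. Ciphertext contains almost no long runs of printable ASCII, while leaked application plaintext would. The sketch below applies that heuristic to illustrative stand-in buffers (not a real capture, and a heuristic rather than a proof):

```python
# Rough, scriptable version of "eyeball the tcpdump output": measure
# the longest run of printable ASCII bytes. Encrypted payloads yield
# only short accidental runs; readable plaintext yields long ones.

import os
import re

def longest_printable_run(data: bytes) -> int:
    """Length of the longest run of printable ASCII bytes in the buffer."""
    runs = re.findall(rb"[\x20-\x7e]+", data)
    return max((len(r) for r in runs), default=0)

ciphertext_like = os.urandom(4096)  # stands in for relayed WireGuard payload
plaintext_like  = b"GET /secret HTTP/1.1\r\nHost: internal.example.com\r\n" * 40

print(longest_printable_run(ciphertext_like))  # short accidental runs only
print(longest_printable_run(plaintext_like))   # full header lines leak through
```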
Troubleshooting and Performance Tuning
Common Deployment Pitfalls
TLS certificate failures are the most frequent issue. When using --certmode=letsencrypt, derper uses the TLS-ALPN-01 challenge by default, which requires port 443 to be reachable. If you're using an alternative ACME client or setup that relies on HTTP-01 validation, port 80 must be accessible. Let's Encrypt also enforces rate limits: if you've been testing repeatedly with failed attempts, you may hit the duplicate certificate or failed validation limits. Check derper logs for ACME errors.
Missing STUN port: If you enabled --stun but forgot to allow UDP 3478 inbound in your firewall or security group, the STUN functionality silently fails. Peers connecting through your network won't benefit from STUN-assisted NAT discovery. The DERP relay itself still works over TCP 443, but you lose the supplementary NAT traversal assistance.
The --verify-clients tradeoff: With this flag enabled, only authenticated tailnet members can use the relay. This is the right default for most deployments. Without it, anyone who discovers your relay's hostname could use it to relay their own traffic, eating your bandwidth. However, enabling client verification requires the DERP server to be a member of your tailnet, which means running tailscaled alongside derper on the same machine.
Monitoring Relay Health
Key metrics to track for a production DERP deployment:
- Active connections: How many peers are currently using the relay.
- Bandwidth throughput: Sustained relay traffic indicates peers that can't establish direct connections. Investigate whether those peers have a path to upgrade.
- Relay-to-direct upgrade rate: If most connections stay relayed indefinitely, the problem is likely persistent (symmetric NAT, blocked UDP) rather than transient.
The derper binary exposes a built-in debug endpoint (typically at /debug/) with runtime metrics including active connections and traffic counters. You can also use tailscale debug derp-map to inspect the active DERP map configuration as seen by a client:
$ tailscale debug derp-map
{
"Regions": {
"1": {
"RegionID": 1,
"RegionCode": "nyc",
"RegionName": "New York City",
"Nodes": [ ... ]
},
"900": {
"RegionID": 900,
"RegionCode": "myderp",
"RegionName": "My Custom DERP",
"Nodes": [
{
"Name": "900a",
"HostName": "derp.example.com",
"IPv4": "198.51.100.10",
"DERPPort": 443,
"STUNPort": 3478
}
]
}
}
}
On the DERP server itself, log output provides connection-level visibility:
2025/01/15 14:23:01 derper: client nodekey:abc123... connected
2025/01/15 14:23:01 derper: client nodekey:abc123... using home region 900
2025/01/15 14:23:05 derper: relay packet from nodekey:abc123... to nodekey:def456... (148 bytes)
2025/01/15 14:23:06 derper: relay packet from nodekey:def456... to nodekey:abc123... (92 bytes)
These logs confirm which peers are connecting and that relay activity is happening. The packet sizes visible in logs are the sizes of the encrypted WireGuard payloads, not plaintext content.
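Log lines in this shape are easy to aggregate into the per-peer metrics listed earlier. The summarizer below counts relayed packets and bytes per source node key; it assumes the illustrative log format shown above, which may differ between derper versions:

```python
# Minimal summarizer for derper relay log lines (format as shown
# above), counting packets and bytes per source node key.

import re
from collections import Counter

LINE = re.compile(
    r"relay packet from (nodekey:\S+) to (nodekey:\S+) \((\d+) bytes\)"
)

def summarize(log_text):
    packets, traffic = Counter(), Counter()
    for m in LINE.finditer(log_text):
        src, _dst, size = m.group(1), m.group(2), int(m.group(3))
        packets[src] += 1
        traffic[src] += size
    return packets, traffic

logs = """\
2025/01/15 14:23:05 derper: relay packet from nodekey:abc123... to nodekey:def456... (148 bytes)
2025/01/15 14:23:06 derper: relay packet from nodekey:def456... to nodekey:abc123... (92 bytes)
"""

packets, traffic = summarize(logs)
print(dict(packets))  # {'nodekey:abc123...': 1, 'nodekey:def456...': 1}
print(dict(traffic))  # {'nodekey:abc123...': 148, 'nodekey:def456...': 92}
```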
DERP vs. the Alternatives: Where It Fits in the Ecosystem
| Feature | DERP (Tailscale) | TURN (WebRTC) | Cloudflare Tunnel | ngrok | Plain WireGuard Relay |
|---|---|---|---|---|---|
| E2E encryption preserved | Yes (WireGuard) | Depends on SRTP/DTLS | TLS terminates at edge | TLS terminates at edge | Yes (WireGuard) |
| Self-hostable | Yes (derper) | Yes (coturn, etc.) | No (SaaS) | Limited (ngrok agent) | Yes |
| Protocol | HTTPS/TCP 443 | UDP (or TCP fallback) | HTTPS/QUIC | HTTPS | UDP (WireGuard) |
| Firewall friendliness | Excellent (443/TCP) | Moderate (UDP preferred) | Excellent | Excellent | Poor (custom UDP ports) |
| Primary use case | Tailscale P2P fallback | WebRTC media relay | Ingress/reverse proxy | Ingress/dev tunnels | Manual VPN relay |
| Integrated NAT traversal | Yes (with STUN + upgrade) | Yes (ICE framework) | N/A (ingress model) | N/A (ingress model) | Manual |
DERP occupies a specific niche: it's a fallback relay for WireGuard-based mesh networking, not a general-purpose tunnel or ingress product. Cloudflare Tunnel and ngrok solve a fundamentally different problem (exposing local services to the internet). TURN is the closest analog, acting as a relay of last resort in ICE-negotiated connections, but TURN is designed for relaying arbitrary UDP streams and doesn't inherently guarantee E2E encryption without the application layer (SRTP/DTLS) handling it. DERP gets E2E encryption for free because WireGuard encryption happens before packets ever reach the relay.
Direct When Possible, Relayed When Necessary
Tailscale's philosophy is simple: establish a direct WireGuard connection whenever the network allows it, and fall back to DERP relay when it doesn't. The relay preserves end-to-end encryption by design, meaning the fallback path is not a security compromise. Deploy the self-hosted configuration template above, run the verification steps, and confirm with your own packet captures that the relay sees nothing but ciphertext. That's the proof that matters.