How It Works: API Gateway & Traffic Control with Envoy
Updated: 2025-08-24
Summary
Envoy is a programmable edge proxy. At the edge it terminates TLS, routes by host and path, enforces auth through ext_authz, handles WebSocket upgrades, shapes traffic with retries and timeouts, and applies rate limiting and circuit breaking for backpressure.
Edge Responsibilities
- TLS termination (cert-manager issues the certificates; Envoy mounts them)
- Host/path routing and redirects
- Centralized auth (sessions/JWT via ext_authz to an auth service)
- WebSocket upgrades and idle timeouts
- Rate limiting (external service) and connection limits
- Header normalization, gzip/Brotli compression, HSTS
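The rate-limiting item above can be sketched as an HTTP filter that calls an external gRPC rate limit service. This is only a sketch under assumptions: the `edge` domain, the `ratelimit` cluster name, and its address and port are hypothetical and do not come from the original configuration.

```yaml
# Fragment 1: in http_filters, placed before the router filter
- name: envoy.filters.http.ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
    domain: edge                 # hypothetical domain; must match the rate limit service's config
    timeout: 0.05s
    failure_mode_deny: false     # fail open if the rate limit service is unreachable
    rate_limit_service:
      transport_api_version: V3
      grpc_service:
        envoy_grpc: { cluster_name: ratelimit }

# Fragment 2: on a virtual host or route, choose what to count
rate_limits:
- actions:
  - remote_address: {}           # one descriptor per client address

# Fragment 3: cluster for the rate limit service (gRPC needs HTTP/2)
- name: ratelimit
  connect_timeout: 0.25s
  type: STRICT_DNS
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http2_protocol_options: {}
  load_assignment:
    cluster_name: ratelimit
    endpoints:
    - lb_endpoints:
      - endpoint:
          address: { socket_address: { address: ratelimit.apps.svc.cluster.local, port_value: 8081 } }  # hypothetical
```

Failing open keeps a rate limit service outage from taking down the edge; whether that trade-off is acceptable depends on what the limits protect.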
Minimal Listener + Virtual Hosts
static_resources:
  listeners:
  - name: https
    address: { socket_address: { address: 0.0.0.0, port_value: 443 } }
    # Needed so filter_chain_match can see the SNI server name
    listener_filters:
    - name: envoy.filters.listener.tls_inspector
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
    filter_chains:
    - filter_chain_match: { server_names: ["api.example.com", "auth.example.com"] }
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_certificates:
            - certificate_chain: { filename: "/etc/envoy/tls/tls.crt" }
              private_key: { filename: "/etc/envoy/tls/tls.key" }
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_https   # required field; prefixes this listener's stats
          upgrade_configs: [{ upgrade_type: websocket }]
          normalize_path: true
          request_timeout: 15s
          route_config:
            virtual_hosts:
            - name: api
              domains: ["api.example.com"]
              routes:
              - match: { prefix: "/v1" }
                route: { cluster: api_v1 }
            - name: auth
              domains: ["auth.example.com"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: auth_svc }
          http_filters:
          # The router must be the last HTTP filter and needs a typed_config
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: api_v1
    connect_timeout: 0.5s
    type: STRICT_DNS
    load_assignment:
      cluster_name: api_v1
      endpoints:
      - lb_endpoints:
        - endpoint:
            address: { socket_address: { address: api.apps.svc.cluster.local, port_value: 8080 } }
  - name: auth_svc
    connect_timeout: 0.5s
    type: STRICT_DNS
    load_assignment:
      cluster_name: auth_svc
      endpoints:
      - lb_endpoints:
        - endpoint:
            address: { socket_address: { address: auth.apps.svc.cluster.local, port_value: 80 } }
Ext AuthZ (sessions/JWT) — allow/deny at the edge
# Add before the router filter; the router stays last in http_filters
http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    http_service:
      server_uri:
        uri: auth.apps.svc.cluster.local
        cluster: auth_svc
        timeout: 1s
      path_prefix: /oauth2/auth   # oauth2-proxy style
      # Request headers forwarded to the auth service
      authorization_request:
        allowed_headers:
          patterns: [{ exact: cookie }, { exact: authorization }, { exact: x-forwarded-host }]
      # Auth response headers copied onto the request sent upstream
      authorization_response:
        allowed_upstream_headers:
          patterns: [{ exact: x-auth-request-user }, { exact: x-auth-request-email }]
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
Retries, Timeouts, and Circuit Breakers
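# Per-route overrides (inside an entry of virtual_hosts[].routes[])
route:
  timeout: 2s
  retry_policy:
    retry_on: 5xx,reset,connect-failure
    num_retries: 2
    # Durations use seconds with an "s" suffix; "500ms" is not accepted.
    # 3 attempts x 0.5s = 1.5s, which fits inside the 2s route timeout (backoff excluded).
    per_try_timeout: 0.5s
# Connection limits (protect backends); set on the cluster
circuit_breakers:
  thresholds:
  - max_connections: 1024
    max_pending_requests: 512
    max_requests: 2048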
WebSocket Keep‑alives & Limits
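# On the HTTP connection manager
stream_idle_timeout: 0s   # disable the per-stream idle timeout (default 5m) so long WS streams survive
common_http_protocol_options:
  idle_timeout: 300s      # close downstream connections that have had no active streams for 5 minutes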
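Setting stream_idle_timeout to 0s on the connection manager disables the stream idle timeout for every route, not only WebSocket ones. A narrower alternative is a per-route override. The sketch below assumes a hypothetical /v1/ws path served by the api_v1 cluster; neither the path nor the one-hour value comes from the original setup.

```yaml
# Hypothetical WebSocket route. Routes are evaluated in order, so place it
# before the broader "/v1" prefix match in the api virtual host.
- match: { prefix: "/v1/ws" }
  route:
    cluster: api_v1
    timeout: 0s           # no overall route timeout for this route
    idle_timeout: 3600s   # per-route stream idle timeout; takes precedence over stream_idle_timeout
```

With this in place the connection-manager-wide stream_idle_timeout can stay at a finite value for ordinary HTTP routes.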
Cookie & Security Headers
# Example HSTS and frame options via response_headers_to_add
# (valid on route_config, a virtual host, or a single route)
response_headers_to_add:
- header: { key: "Strict-Transport-Security", value: "max-age=31536000; includeSubDomains" }
- header: { key: "X-Frame-Options", value: "DENY" }
- header: { key: "X-Content-Type-Options", value: "nosniff" }
Kubernetes Mount for TLS secrets (cert-manager)
# Deployment snippet (under spec.template.spec)
volumes:
- name: tls
  secret: { secretName: envoy-tls }
containers:
- name: envoy
  image: envoyproxy/envoy:v1.30.2
  volumeMounts: [{ name: tls, mountPath: /etc/envoy/tls, readOnly: true }]
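For context, the envoy-tls secret mounted above could be produced by a cert-manager Certificate along these lines. This is a sketch: the issuer name letsencrypt-prod and its ClusterIssuer kind are assumptions, and the DNS names are taken from the listener example.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: envoy-tls
spec:
  secretName: envoy-tls          # the Secret the Deployment mounts
  dnsNames:
  - api.example.com
  - auth.example.com
  issuerRef:
    name: letsencrypt-prod       # hypothetical issuer; use whatever issuer exists in the cluster
    kind: ClusterIssuer
```

cert-manager writes tls.crt and tls.key into the secret, which matches the file names the listener expects under /etc/envoy/tls.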
Observability
- Access log JSON with request_id, user, route, duration.
- Prometheus metrics: requests, 4xx/5xx, p95 latency, open connections.
- Tracing: propagate W3C Trace Context headers (traceparent, tracestate); sample a fraction of requests rather than all of them.
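A JSON access log with the fields listed above might look like the following on the HTTP connection manager. Treat it as a sketch: the field names are arbitrary, and the user field assumes ext_authz has already copied x-auth-request-user onto the request.

```yaml
# Under the HttpConnectionManager typed_config
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      json_format:
        request_id: "%REQ(X-REQUEST-ID)%"
        user: "%REQ(X-AUTH-REQUEST-USER)%"   # assumption: header set by the auth service
        route: "%ROUTE_NAME%"                # empty unless routes are given a name
        status: "%RESPONSE_CODE%"
        duration_ms: "%DURATION%"
```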
Security
- mTLS for upstreams if feasible; strictly validate host headers.
- Lock down admin interface; never expose it publicly.
- Regularly rotate TLS keys/certs; use SDS for zero‑downtime rotation.
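Two of the points above lend themselves to short config sketches: binding the admin interface to loopback, and loading the certificate through file-based SDS so rotated files are picked up without a restart. The secret name edge_cert, the admin port, and the /etc/envoy/sds.yaml path are assumptions for illustration.

```yaml
# Bootstrap: keep the admin interface on loopback only
admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }

# In common_tls_context, replacing the static tls_certificates entry
tls_certificate_sds_secret_configs:
- name: edge_cert                      # hypothetical secret name
  sds_config:
    resource_api_version: V3
    path_config_source:
      path: /etc/envoy/sds.yaml        # hypothetical path

# Contents of /etc/envoy/sds.yaml (a separate file)
resources:
- "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret
  name: edge_cert
  tls_certificate:
    certificate_chain: { filename: /etc/envoy/tls/tls.crt }
    private_key: { filename: /etc/envoy/tls/tls.key }
    # Reload when the mounted directory changes (Kubernetes swaps secret files via symlink)
    watched_directory: { path: /etc/envoy/tls }
```

It is worth verifying in a staging environment that a renewed secret is actually reloaded, since the behavior depends on how the volume is mounted.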
Pitfalls
- Global timeouts too long → slow failures cascade.
- Missing fall‑through 404 vhost → accidental default routing.
- Over‑broad ext_authz → auth service outage becomes total outage.
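The last two pitfalls can be addressed in the route configuration. The sketch below adds a catch-all virtual host that returns 404, and turns ext_authz off for the auth virtual host so the auth service is not asked to authorize requests to itself. The virtual host name fallthrough is an assumption.

```yaml
virtual_hosts:
- name: auth
  domains: ["auth.example.com"]
  # Skip ext_authz for this virtual host only
  typed_per_filter_config:
    envoy.filters.http.ext_authz:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthzPerRoute
      disabled: true
  routes:
  - match: { prefix: "/" }
    route: { cluster: auth_svc }
# Catch-all: any Host header not matched by a more specific domain gets a 404
- name: fallthrough
  domains: ["*"]
  routes:
  - match: { prefix: "/" }
    direct_response: { status: 404 }
```

Domain matching prefers exact names over the "*" wildcard, so the catch-all only sees requests no other virtual host claims.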