CrashLoopBackOff is Kubernetes telling you: "the container crashed, I restarted it, it crashed again, and I'm now slowing down retries to avoid thrashing the node." Kubernetes itself is fine — your container is the problem. Here's how to find and fix it systematically.
Step 1: Check the Restart Count and State
```shell
kubectl get pod <pod-name> -n <namespace>
# NAME               READY   STATUS             RESTARTS   AGE
# api-7d8f9c-xk2p9   0/1     CrashLoopBackOff   14         22m
```
A restart count of 14 in 22 minutes tells you the crash is fast and consistent, not intermittent. That points to a startup failure rather than a runtime bug.
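The backoff itself is exponential: roughly 10s, 20s, 40s, doubling up to a five-minute cap (exact timing and reset behavior vary by Kubernetes version). A quick illustrative sketch of why the restart rate slows down over time:

```python
def backoff_delays(n, base=10, cap=300):
    """Approximate CrashLoopBackOff delays: doubling from 10s, capped at 300s.

    Illustrative only; real kubelet behavior also resets the backoff
    after the container runs successfully for a while.
    """
    delay, out = base, []
    for _ in range(n):
        out.append(delay)
        delay = min(delay * 2, cap)
    return out

print(backoff_delays(8))  # [10, 20, 40, 80, 160, 300, 300, 300]
```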
Step 2: Read the Previous Container's Logs
```shell
kubectl logs <pod-name> -n <namespace> --previous
```
This is the most important command. The container is dead, so you need --previous. Common patterns to look for:
- Configuration error: `Error: config file not found`, `environment variable DB_HOST is required`
- Port conflict: `listen tcp :8080: bind: address already in use`
- Missing dependency: `dial tcp: connection refused` on startup
- Panic / fatal error: stack trace in Go, Python traceback, Java exception
If logs are empty, the container is exiting before writing anything — look at the exit code instead.
Step 3: Read the Exit Code
```shell
kubectl describe pod <pod-name> -n <namespace>
# Look for:
#   Last State:  Terminated
#     Reason:    OOMKilled (or Error)
#     Exit Code: 137
```
Common exit codes and their meaning:
- `0` — container exited cleanly (but shouldn't have — check your CMD)
- `1` — general application error
- `137` — OOMKilled (SIGKILL from kernel / cgroup)
- `139` — segmentation fault
- `143` — SIGTERM not handled (graceful shutdown timeout)
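Codes above 128 follow the Unix convention of exit code = 128 + signal number, which is how 137 decodes to SIGKILL. A small stdlib-only sketch to decode them (Python; assumes a POSIX platform):

```python
import signal

def decode_exit_code(code: int) -> str:
    """Map a container exit code to a description.

    Codes above 128 encode a fatal signal: code = 128 + signum.
    """
    if code == 0:
        return "clean exit"
    if code > 128:
        return signal.Signals(code - 128).name  # e.g. 137 -> 128 + 9 -> SIGKILL
    return "application error"

print(decode_exit_code(137))  # SIGKILL (what the OOM killer sends)
print(decode_exit_code(139))  # SIGSEGV
print(decode_exit_code(143))  # SIGTERM
```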
Fixing OOMKilled (Exit 137)
The container exceeded its resources.limits.memory. Find the peak usage:
```shell
kubectl top pod <pod-name> -n <namespace> --containers
```
Then raise the limit in your Deployment:
```yaml
# deployment.yaml
containers:
- name: api
  resources:
    requests:
      memory: "256Mi"
    limits:
      memory: "1Gi"  # was 512Mi
```

Keep requests lower than limits for memory. Setting the request equal to the limit (for every resource in every container) gives the pod the Guaranteed QoS class, making it less likely to be evicted under node pressure — but a single spike past the limit still kills the container.

Fixing Bad Configuration / Missing Secrets
If the app exits because it can't find a required environment variable or config file:
```shell
# Verify all referenced secrets exist
kubectl get secret <secret-name> -n <namespace>

# Check what env vars are actually injected
kubectl exec <pod-name> -- env | sort

# For configmap mounts, confirm the key exists
kubectl get configmap <name> -o yaml
```
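One failure mode worth checking explicitly: the Secret exists, but the key inside it doesn't match what the pod spec references. A minimal sketch of the shape to verify (the names `db-credentials` and `host` are illustrative, not from the original):

```yaml
env:
- name: DB_HOST
  valueFrom:
    secretKeyRef:
      name: db-credentials  # Secret must exist in the same namespace
      key: host             # must match a key in the Secret's data, or the pod won't start
```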
Fixing Liveness Probe Misconfiguration
A liveness probe that fires too early is a classic trap. If your app takes 20 seconds to initialize but the probe starts at 10 seconds, Kubernetes kills the container before it's ready:
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30  # was 10 — give the app time to boot
  periodSeconds: 10
  failureThreshold: 3
```

Fixing an Infinite Restart Loop from a Bad CMD
If a container's entrypoint exits immediately (exit 0), Kubernetes will restart it indefinitely. This happens with script containers that complete their task and exit cleanly. For one-shot jobs, use a Job object instead of a Deployment. For persistent processes, make sure the command doesn't return — e.g., use nginx -g "daemon off;" instead of just nginx.
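The one-shot case can be sketched as a Job (a minimal illustration; the name, image, and command are assumptions, not from the original):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                 # illustrative name
spec:
  backoffLimit: 3                  # retry failures up to 3 times, then mark the Job failed
  template:
    spec:
      restartPolicy: Never         # a clean exit (0) completes the Job instead of restarting
      containers:
      - name: migrate
        image: myapp:latest        # illustrative image
        command: ["./migrate", "up"]  # illustrative one-shot command
```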
Still Stuck?
If you've worked through all the above and the crash is still unclear, the root cause is often a subtle interaction — a startup race condition, a secret that exists but has a wrong key name, or a resource quota at namespace level blocking the pod. These take time to correlate manually.