CrashLoopBackOff is Kubernetes telling you: "the container crashed, I restarted it, it crashed again, and I'm now slowing down retries to avoid thrashing the node." Kubernetes itself is fine — your container is the problem. Here's how to find and fix it systematically.

Step 1: Check the Restart Count and State

kubectl get pod <pod-name> -n <namespace>

# NAME                     READY  STATUS             RESTARTS   AGE
# api-7d8f9c-xk2p9         0/1    CrashLoopBackOff   14         22m

A restart count of 14 in 22 minutes tells you the crash is fast and consistent, not intermittent. That points to a startup failure rather than a runtime bug.
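The "slowing down retries" part follows a fixed schedule: the kubelet's crash back-off starts at 10 seconds, doubles after each crash, and caps at 5 minutes (it resets once the container runs for 10 minutes without crashing). A quick illustrative sketch of that schedule:

```shell
# Illustrative loop only — prints the kubelet's documented back-off
# schedule: start at 10s, double per crash, cap at 300s (5 minutes).
delay=10
for restart in $(seq 1 8); do
  echo "restart $restart: wait ${delay}s before retrying"
  delay=$((delay * 2))
  [ "$delay" -gt 300 ] && delay=300
done
```

This is why a high restart count plateaus: past the sixth crash or so, you get one retry every five minutes.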

Step 2: Read the Previous Container's Logs

kubectl logs <pod-name> -n <namespace> --previous

This is the most important command. The crashing container has already been replaced, so --previous shows the logs of the last terminated instance; without it you only see the freshly restarted container. Add -c <container> if the pod runs more than one container. Common patterns to look for:

  • Configuration error: Error: config file not found, environment variable DB_HOST is required
  • Port conflict: listen tcp :8080: bind: address already in use
  • Missing dependency: dial tcp: connection refused on startup
  • Panic / fatal error: stack trace in Go, Python traceback, Java exception
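These patterns are greppable. As a toy illustration only (classify_crash and its patterns are hypothetical, not a kubectl feature), the categories above could be matched like this:

```shell
# Hypothetical helper: map a crash log line to one of the common
# failure categories above. Patterns are illustrative, not exhaustive.
classify_crash() {
  case "$1" in
    *"config file not found"*|*"is required"*) echo "configuration error" ;;
    *"address already in use"*)                echo "port conflict" ;;
    *"connection refused"*)                    echo "missing dependency" ;;
    *"panic:"*|*Traceback*|*Exception*)        echo "panic / fatal error" ;;
    *)                                         echo "unknown" ;;
  esac
}

classify_crash "listen tcp :8080: bind: address already in use"   # → port conflict
```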

If logs are empty, the container is exiting before writing anything — look at the exit code instead.

Step 3: Read the Exit Code

kubectl describe pod <pod-name> -n <namespace>

# Look for:
# Last State:     Terminated
#   Reason:       OOMKilled   (or Error)
#   Exit Code:    137

Common exit codes and their meaning:

  • 0 — container exited cleanly (but shouldn't have — check your CMD)
  • 1 — general application error
  • 137 — killed by SIGKILL (128 + 9), usually OOMKilled via the cgroup memory limit
  • 139 — segmentation fault (SIGSEGV, 128 + 11)
  • 143 — terminated by SIGTERM (128 + 15), often a graceful-shutdown timeout

Fixing OOMKilled (Exit 137)

The container exceeded its resources.limits.memory. Check its usage (note that kubectl top reports a point-in-time sample from metrics-server, not the peak, so sample it several times or consult your metrics history):

kubectl top pod <pod-name> -n <namespace> --containers

Then raise the limit in your Deployment:

# deployment.yaml
containers:
- name: api
  resources:
    requests:
      memory: "256Mi"
    limits:
      memory: "1Gi"   # was 512Mi

Here the request is deliberately lower than the limit, which puts the pod in the Burstable QoS class. Setting requests equal to limits (for every resource, in every container) yields the Guaranteed QoS class instead, making the pod less likely to be evicted under node pressure. But regardless of QoS class, the container is OOM-killed the moment usage crosses the limit, so size the limit from observed peak usage plus headroom rather than from the request.

Fixing Bad Configuration / Missing Secrets

If the app exits because it can't find a required environment variable or config file:

# Verify all referenced secrets exist
kubectl get secret <secret-name> -n <namespace>

# Check what env vars are actually injected (only works while a
# container instance is alive; for fast crashes, read the spec with
# kubectl get pod <pod-name> -o yaml instead)
kubectl exec <pod-name> -- env | sort

# For configmap mounts, confirm the key exists
kubectl get configmap <name> -o yaml
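On the application side, failing fast with a clear message makes this class of crash self-diagnosing in `kubectl logs --previous`. A bash sketch for an entrypoint (REQUIRED_VARS and the variable names are illustrative, not from any particular app):

```shell
# Illustrative entrypoint guard: complain loudly about every missing
# required variable before the app starts. Variable names are examples.
REQUIRED_VARS="DB_HOST DB_PORT API_KEY"
missing=0
for var in $REQUIRED_VARS; do
  if [ -z "${!var}" ]; then          # bash indirect expansion
    echo "environment variable $var is required" >&2
    missing=1
  fi
done
# In a real entrypoint: [ "$missing" -eq 1 ] && exit 1, then exec the app.
```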

Fixing Liveness Probe Misconfiguration

A liveness probe that fires too early is a classic trap. If your app takes longer to initialize than the probe budget allows (initialDelaySeconds plus failureThreshold × periodSeconds), Kubernetes kills the container before it ever reports healthy:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # was 10 — give the app time to boot
  periodSeconds: 10
  failureThreshold: 3
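If boot time varies a lot, a startupProbe is often cleaner than a long initialDelaySeconds: liveness checks are suspended until the startup probe first succeeds. A sketch, assuming the same /healthz endpoint as above:

```yaml
startupProbe:
  httpGet:
    path: /healthz          # assumes the app's existing health endpoint
    port: 8080
  failureThreshold: 30      # allow up to 30 × 10s = 300s to boot
  periodSeconds: 10
```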

Fixing an Infinite Restart Loop from a Bad CMD

If a container's entrypoint exits immediately (even with exit 0), a Deployment's default restartPolicy: Always makes Kubernetes restart it indefinitely. This happens with script containers that complete their task and exit cleanly. For one-shot jobs, use a Job object instead of a Deployment. For persistent processes, make sure the command doesn't return — e.g., use nginx -g "daemon off;" instead of just nginx.
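For the one-shot case, a minimal Job sketch (the name, image, and command are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                     # illustrative name
spec:
  backoffLimit: 3                      # retry a few times, then mark Failed
  template:
    spec:
      restartPolicy: Never             # exit 0 completes the Job instead of looping
      containers:
      - name: migrate
        image: myorg/migrator:latest   # illustrative image
        command: ["./migrate.sh"]      # illustrative one-shot task
```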

Still Stuck?

If you've worked through all the above and the crash is still unclear, the root cause is often a subtle interaction — a startup race condition, a secret that exists but has a wrong key name, or a resource quota at namespace level blocking the pod. These take time to correlate manually.