Sunday, 3 August 2025

Debugging a Pod in Kubernetes

Kubernetes Pod Debugging Guide

🔍 Step-by-Step: Debugging a Crashing or Problematic Pod in Kubernetes

✅ 1. Check Pod Status

kubectl get pods -n <namespace>
  • STATUS: CrashLoopBackOff, Error, ImagePullBackOff, Pending, etc.
  • RESTARTS: Helps understand how often it's failing.
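
In a busy namespace it can help to filter the list down to pods that look unhealthy. The following is a minimal sketch, not a standard tool: the function name unhealthy_pods is hypothetical, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: list pods that are not healthy or have restarted.
# Assumes the default `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE.
# Usage: unhealthy_pods <namespace>
unhealthy_pods() {
  kubectl get pods -n "$1" --no-headers |
    awk '($3 != "Running" && $3 != "Completed") || ($4 + 0) > 0 { print $1, $3, $4 }'
}
```

Each output line holds the pod name, its STATUS, and its restart count, which is usually enough to decide which pod to describe next.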

✅ 2. Describe the Pod

kubectl describe pod <pod-name> -n <namespace>
  • Check Events at the bottom: scheduling issues, volume mount errors, etc.
  • Check Container State (waiting, terminated, reason)

✅ 3. Get Pod Logs

kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> -c <container-name>
  • Use --previous if the pod restarted and you want logs from the prior container:
kubectl logs --previous <pod-name> -n <namespace>
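
When a pod is crash-looping, the previous container's logs are usually the useful ones, but --previous fails if the container has never restarted. A small wrapper can try the previous logs first and fall back to the current ones. This is a sketch under that assumption; crash_logs is a hypothetical name and the 100-line tail is arbitrary.

```shell
# Hypothetical helper: show the last lines from the crashed container,
# falling back to the current container if no previous one exists.
# Usage: crash_logs <pod-name> <namespace>
crash_logs() {
  kubectl logs "$1" -n "$2" --previous --tail=100 2>/dev/null ||
    kubectl logs "$1" -n "$2" --tail=100
}
```

For multi-container pods, add -c <container-name> to both commands.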

✅ 4. Common Pod Failure States

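  • CrashLoopBackOff: The container starts, exits, and is restarted repeatedly; the kubelet waits longer before each new restart
  • ImagePullBackOff / ErrImagePull: The image name or tag is wrong, or the registry requires credentials the pod does not have
  • OOMKilled: The container ran out of memory, usually by exceeding its memory limit; check resource limits
  • ContainerCreating: If the pod stays here, a volume cannot be mounted, the image is still being pulled, or networking on the node is not ready
  • Completed: All containers exited successfully (common for Jobs)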
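
The states above map fairly directly onto a next debugging step. The following sketch encodes that mapping as a shell case statement. The function name next_step and the suggestion texts are illustrative, not part of kubectl.

```shell
# Hypothetical helper: map a pod STATUS value to a next step from this guide.
# Usage: next_step <status>
next_step() {
  case "$1" in
    CrashLoopBackOff)              echo "read previous logs: kubectl logs --previous" ;;
    ImagePullBackOff|ErrImagePull) echo "check image name and pull credentials: kubectl describe pod" ;;
    OOMKilled)                     echo "check memory limits: kubectl describe pod" ;;
    Pending|ContainerCreating)     echo "check Events: kubectl describe pod" ;;
    Completed)                     echo "nothing to do: the pod exited successfully" ;;
    *)                             echo "start with: kubectl describe pod" ;;
  esac
}
```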

✅ 5. Exec Into the Pod (If Running)

kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
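  • Explore logs, config files, and environment variables manually. This needs a shell inside the image; minimal or distroless images may not include /bin/sh.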

✅ 6. Check Events at Namespace Level

kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp
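
The namespace event list can be noisy. One way to narrow it is a field selector that keeps only Warning events for a single pod. This is a sketch; pod_warnings is a hypothetical name, while involvedObject.name and type are field selectors that events support.

```shell
# Hypothetical helper: show only Warning events for one pod, sorted by creation time.
# Usage: pod_warnings <pod-name> <namespace>
pod_warnings() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.name=$1,type=Warning" \
    --sort-by=.metadata.creationTimestamp
}
```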

✅ 7. Look for Liveness/Readiness Probe Failures

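kubectl describe pod <pod-name> -n <namespace>
  • Look in Events for "Liveness probe failed" or "Readiness probe failed".
  • A failing liveness probe restarts the container; a failing readiness probe only removes the pod from Service endpoints.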
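
To see which kind of probe is failing and how often, the Unhealthy events for the pod can be counted. The sketch below assumes the kubelet's usual message prefixes ("Liveness probe failed", "Readiness probe failed") and treats an event without a count field as a single occurrence. The function name probe_failure_counts is hypothetical.

```shell
# Hypothetical helper: count liveness and readiness probe failures for one pod.
# Usage: probe_failure_counts <pod-name> <namespace>
probe_failure_counts() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.name=$1,reason=Unhealthy" \
    -o jsonpath='{range .items[*]}{.count}{"\t"}{.message}{"\n"}{end}' |
    awk -F'\t' '
      { n = ($1 == "" ? 1 : $1) }            # events without a count are counted once
      /Liveness probe failed/  { live  += n }
      /Readiness probe failed/ { ready += n }
      END { printf "liveness=%d readiness=%d\n", live, ready }'
}
```

A high liveness count alongside a rising RESTARTS value suggests the probe, not the application, may be what is killing the container.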

✅ 8. Resource Limits

  • Check whether a container is being OOMKilled (killed for using more memory than its limit allows)
kubectl describe pod <pod-name> -n <namespace>
  • Look under State or Last State for: Terminated, Reason: OOMKilled
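
The same information is available without reading the full describe output. As a sketch, the function below (hypothetical name oom_killed_containers) reads each container's last termination reason through a JSONPath query and prints the names of containers whose last termination was OOMKilled.

```shell
# Hypothetical helper: print containers in a pod whose last termination was OOMKilled.
# Usage: oom_killed_containers <pod-name> <namespace>
oom_killed_containers() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}' |
    awk -F'\t' '$2 == "OOMKilled" { print $1 }'
}
```

Empty output means no container in the pod was last terminated for running out of memory.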

✅ 9. Pod Stuck in Pending

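  • No node fits the pod: insufficient CPU or memory, a nodeSelector or affinity rule that matches no node, or a node taint without a matching toleration
kubectl describe pod <pod-name> -n <namespace>
  • Look for a FailedScheduling event; its message states why each node was rejected.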
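
To check every Pending pod in a namespace at once, the scheduler's message can be read from each pod's PodScheduled condition. This is a sketch; pending_reasons is a hypothetical name. Pods that are already scheduled but still starting carry no message and are filtered out.

```shell
# Hypothetical helper: list Pending pods together with the scheduler's message.
# Usage: pending_reasons <namespace>
pending_reasons() {
  kubectl get pods -n "$1" --field-selector status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="PodScheduled")].message}{"\n"}{end}' |
    awk -F'\t' '$2 != ""'
}
```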

✅ 10. Look at Node or DaemonSet Logs (for CNI/containerd issues)

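  • If the pod is never created or stays stuck in ContainerCreating, the cause may be the CNI plugin or the container runtime on the node
  • kubectl logs takes a pod name, not a node name. On EKS, read the aws-node (VPC CNI) pod that runs on the affected node:
kubectl logs <aws-node-pod-name> -n kube-system -c aws-node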
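
Because kubectl logs needs a pod name rather than a node name, the CNI pod running on a given node has to be looked up first. The sketch below assumes an EKS cluster where the aws-node DaemonSet pods carry the label k8s-app=aws-node. Both that label and the function name cni_logs_for_node are assumptions to verify against your cluster.

```shell
# Hypothetical helper: print recent aws-node (VPC CNI) logs for one node.
# Assumes the DaemonSet pods are labelled k8s-app=aws-node (check your cluster).
# Usage: cni_logs_for_node <node-name>
cni_logs_for_node() {
  cni_pod=$(kubectl get pods -n kube-system -l k8s-app=aws-node \
              --field-selector "spec.nodeName=$1" \
              -o jsonpath='{.items[0].metadata.name}') || return 1
  [ -n "$cni_pod" ] || { echo "no aws-node pod found on node $1" >&2; return 1; }
  kubectl logs "$cni_pod" -n kube-system -c aws-node --tail=100
}
```

The node name is the one shown in the NODE column of kubectl get pods -o wide.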

🛠️ Optional: Use Stern to Tail Pod Logs Across Containers

stern <pod-name> -n <namespace>

📌 TL;DR - Common Fixes for Crashing Pods

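  • CrashLoopBackOff: Application error or misconfigured command; read the previous container's logs
  • OOMKilled: Memory limit too low, or the application leaks memory; raise the limit or fix the leak
  • ImagePullBackOff: Wrong image name or tag, or missing imagePullSecrets for a private registry
  • Pending: No schedulable nodes or insufficient resources; check the FailedScheduling events
  • Probe failures: Health check misconfigured; check the probe path, port, and timing (initialDelaySeconds, timeoutSeconds)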
