Kubernetes Pod Debugging Guide
🔍 Step-by-Step: Debugging a Crashing or Problematic Pod in Kubernetes
✅ 1. Check Pod Status
kubectl get pods -n <namespace>
- STATUS: CrashLoopBackOff, Error, ImagePullBackOff, Pending, etc.
- RESTARTS: Shows how often the container is failing.
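In a busy namespace it can help to print only the pods that need attention. The sketch below is one way to do that, assuming the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE). The function names `filter_unhealthy` and `show_unhealthy` are made up for this example.

```shell
# Keep only rows whose STATUS column is neither Running nor Completed.
# Expects `kubectl get pods --no-headers` output on stdin and prints
# the pod name, status, and restart count.
filter_unhealthy() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3, $4 }'
}

# Convenience wrapper (hypothetical name); needs access to a cluster.
# $1 = namespace
show_unhealthy() {
  kubectl get pods -n "$1" --no-headers | filter_unhealthy
}
```

Call it as `show_unhealthy <namespace>`. A pod that reports Running but is not Ready (for example READY 0/1) is not caught by this filter, so the READY column still deserves a glance.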
✅ 2. Describe the Pod
kubectl describe pod <pod-name> -n <namespace>
- Check Events at the bottom: scheduling issues, volume mount errors, etc.
- Check Container State (waiting, terminated, reason)
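The container state can also be pulled out directly instead of scrolling through the describe output. This is a minimal sketch using kubectl's JSONPath output; `container_states` is a hypothetical helper name, and fields that do not apply to a container are simply printed as empty.

```shell
# Print one line per container: name, current waiting reason (if any),
# last termination reason, and last exit code (if any).
# $1 = pod name, $2 = namespace
container_states() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

Call it as `container_states <pod-name> <namespace>`. If the pod has not started any container yet, the output is likely to be empty, and the Events section is the better source.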
✅ 3. Get Pod Logs
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> -c <container-name>
- Use --previous if the pod restarted and you want logs from the prior container:
kubectl logs --previous <pod-name> -n <namespace>
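For a crash-looping pod, the previous and current logs are usually both worth reading. The sketch below collects them in one go; `recent_logs` is a hypothetical helper name and the default of 50 lines is an arbitrary choice.

```shell
# Print the last N lines (default 50) from every container in the pod,
# first from the previous instance, then from the current one.
# $1 = pod name, $2 = namespace, $3 = number of lines (optional)
recent_logs() {
  pod=$1; ns=$2; lines=${3:-50}
  echo "--- previous instance ---"
  # This call errors if a container has never restarted; ignore that case.
  kubectl logs "$pod" -n "$ns" --all-containers --prefix --previous --tail="$lines" --timestamps || true
  echo "--- current instance ---"
  kubectl logs "$pod" -n "$ns" --all-containers --prefix --tail="$lines" --timestamps
}
```

Call it as `recent_logs <pod-name> <namespace> 100`. The `--prefix` flag labels each line with its pod and container, which helps when the pod has sidecars.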
✅ 4. Common Pod Failure States
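- CrashLoopBackOff: The container keeps exiting shortly after it starts, and Kubernetes waits longer before each restart
- ImagePullBackOff / ErrImagePull: The image name or tag is wrong, or registry credentials are missing or invalid
- OOMKilled: The container exceeded its memory limit and was killed; check resource limits
- ContainerCreating: If the pod stays in this state, suspect volume mount or node issues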
- Completed: Pod exited successfully (common for Jobs)
✅ 5. Exec Into the Pod (If Running)
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
- Explore logs/configs/environment manually
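When an interactive shell is more than you need, the same checks can be run as one-off commands. This is only a sketch: `pod_snapshot` is a hypothetical helper name, and it assumes the image ships basic tools such as `env`, `ps`, and `df`, which minimal images may not.

```shell
# Run a few read-only checks inside a running pod without opening a shell.
# $1 = pod name, $2 = namespace
pod_snapshot() {
  pod=$1; ns=$2
  echo "--- environment ---"
  kubectl exec "$pod" -n "$ns" -- env
  echo "--- processes ---"
  kubectl exec "$pod" -n "$ns" -- ps
  echo "--- disk usage ---"
  kubectl exec "$pod" -n "$ns" -- df -h
}
```

Call it as `pod_snapshot <pod-name> <namespace>`. Add `-c <container-name>` to the exec calls for multi-container pods.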
✅ 6. Check Events at Namespace Level
kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp
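The full event list can be noisy. One possible refinement is to keep only Warning events and, optionally, only those for a single pod, using field selectors. `warning_events` is a hypothetical helper name.

```shell
# Show only Warning events, oldest first; optionally narrow to one pod.
# $1 = namespace, $2 = pod name (optional)
warning_events() {
  ns=$1; pod=${2:-}
  selector="type=Warning"
  if [ -n "$pod" ]; then
    selector="$selector,involvedObject.name=$pod"
  fi
  kubectl get events -n "$ns" --field-selector "$selector" --sort-by=.metadata.creationTimestamp
}
```

Call it as `warning_events <namespace>` or `warning_events <namespace> <pod-name>`.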
✅ 7. Look for Liveness/Readiness Probe Failures
kubectl describe pod <pod-name> -n <namespace>
- Check if probes are misconfigured and causing restarts.
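To see just the probe settings and probe-related events, the describe output can be filtered. The pattern below is a rough sketch based on how describe output typically labels probes and probe failures; `probe_lines` and `probe_report` are hypothetical helper names.

```shell
# Keep only probe definitions and probe-failure events from
# `kubectl describe pod` output read on stdin.
probe_lines() {
  grep -E 'Liveness:|Readiness:|Startup:|Unhealthy|probe'
}

# $1 = pod name, $2 = namespace
probe_report() {
  kubectl describe pod "$1" -n "$2" | probe_lines
}
```

Call it as `probe_report <pod-name> <namespace>`. Compare the configured delay, timeout, and period with how long the application really takes to start and respond.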
✅ 8. Resource Limits
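- Check if the container is being OOMKilled (killed for exceeding its memory limit)
kubectl describe pod <pod-name> -n <namespace>
- Look for (under State, or Last State if the container has already restarted):
State: Terminated
Reason: OOMKilled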
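Once OOMKilled is confirmed, the next question is how the configured memory compares with real usage. A minimal sketch follows; `memory_settings` and `memory_usage` are hypothetical helper names, and `kubectl top` only works if metrics-server is installed in the cluster.

```shell
# Print each container's memory request and limit.
# $1 = pod name, $2 = namespace
memory_settings() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}request={.resources.requests.memory}{"\t"}limit={.resources.limits.memory}{"\n"}{end}'
}

# Show live CPU and memory usage per container (requires metrics-server).
memory_usage() {
  kubectl top pod "$1" -n "$2" --containers
}
```

Call them as `memory_settings <pod-name> <namespace>` and `memory_usage <pod-name> <namespace>`. An empty limit means no memory limit is set on that container.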
✅ 9. Pod Stuck in Pending
- Usually no node has enough free resources, or no node matches the pod's nodeSelector or tolerations; the Events section shows the scheduler's reason
kubectl describe pod <pod-name> -n <namespace>
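For a Pending pod, two views tend to be useful: what the scheduler reported and what the nodes can offer. This sketch shows one way to get both; `scheduling_failures` and `node_capacity` are hypothetical helper names.

```shell
# Show the scheduler's FailedScheduling events for one pod.
# $1 = pod name, $2 = namespace
scheduling_failures() {
  kubectl get events -n "$2" --field-selector "involvedObject.name=$1,reason=FailedScheduling"
}

# List allocatable CPU and memory plus taint keys for every node, to compare
# against the pod's resource requests and tolerations.
node_capacity() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory,TAINTS:.spec.taints[*].key'
}
```

Call them as `scheduling_failures <pod-name> <namespace>` and `node_capacity`. Allocatable is the node's total schedulable capacity, not what is currently free, so treat it as an upper bound.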
✅ 10. Look at Node or DaemonSet Logs (for CNI/containerd issues)
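- If the pod is never created or stays stuck, the cause may be a CNI/networking issue
- kubectl logs takes a pod name, not a node name; on EKS, check the aws-node (VPC CNI) pod running on the affected node:
kubectl logs <aws-node-pod-name> -n kube-system -c aws-node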
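Because `kubectl logs` works on pods rather than nodes, the CNI pod for a given node has to be looked up first. The sketch below assumes an EKS-style cluster where the VPC CNI DaemonSet pods carry the label `k8s-app=aws-node`; other CNI plugins use different names and labels. `cni_pod_on_node` and `cni_logs` are hypothetical helper names.

```shell
# Print the name of the aws-node pod scheduled on a given node.
# $1 = node name
cni_pod_on_node() {
  kubectl get pods -n kube-system -l k8s-app=aws-node \
    --field-selector "spec.nodeName=$1" -o name
}

# Show the last 100 log lines from that pod's aws-node container.
cni_logs() {
  kubectl logs "$(cni_pod_on_node "$1")" -n kube-system -c aws-node --tail=100
}
```

Call it as `cni_logs <node-name>`, using the node shown in the stuck pod's describe output.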
🛠️ Optional: Use Stern to Tail Pod Logs Across Containers
stern <pod-name> -n <namespace>
📌 TL;DR - Common Fixes for Crashing Pods
- CrashLoopBackOff: Application error or misconfigured command
- OOMKilled: Memory limit too low for the workload; raise the limit or reduce memory usage
- ImagePullBackOff: Bad image or no access to private registry
- Pending: No schedulable nodes or resource constraints
- Probe failures: Health check misconfigured
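The mapping from pod status to first check can be written down as a small lookup. This is only an illustration of the order followed in this guide, not a complete decision tree; `first_check` is a hypothetical helper name.

```shell
# Map a pod STATUS value to the first step of this guide worth running.
# Statuses not listed fall through to a plain describe.
# $1 = STATUS value from `kubectl get pods`
first_check() {
  case "$1" in
    CrashLoopBackOff|Error)
      echo "Step 3: kubectl logs --previous <pod-name> -n <namespace>" ;;
    ImagePullBackOff|ErrImagePull)
      echo "Step 2: kubectl describe pod <pod-name> -n <namespace> (check image name and registry access in Events)" ;;
    OOMKilled)
      echo "Step 8: kubectl describe pod <pod-name> -n <namespace> (compare memory limit with usage)" ;;
    Pending)
      echo "Step 9: kubectl describe pod <pod-name> -n <namespace> (read the scheduling Events)" ;;
    *)
      echo "Step 2: kubectl describe pod <pod-name> -n <namespace>" ;;
  esac
}
```

For example, `first_check CrashLoopBackOff` prints the step 3 command for reading the previous container's logs.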