🔍 Step-by-Step: Troubleshoot EKS Node Not Joining the Cluster
✅ 1. Check Node Status in EC2
- Log into the AWS Console > EC2.
- Verify that the EC2 instance for the node is running and in the expected public/private subnet.
- Check the instance tags – ensure they include:
  kubernetes.io/cluster/<cluster-name> = owned or shared
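If you prefer the CLI, a quick sketch like the one below confirms the instance state, subnet, and tags; the instance ID is a placeholder you would replace with your node's ID.

```bash
# Show the instance state, subnet, and tags (replace the placeholder instance ID)
aws ec2 describe-instances \
  --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].{State:State.Name,Subnet:SubnetId,Tags:Tags}'
```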
✅ 2. Check Node IAM Role
- Go to the EC2 instance and check the IAM role attached to it.
- Confirm that the role has the following AWS managed policies:
  - AmazonEKSWorkerNodePolicy
  - AmazonEC2ContainerRegistryReadOnly
  - AmazonEKS_CNI_Policy
- Also ensure the role is listed in your aws-auth ConfigMap (see step 3).
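A quick sketch to confirm the attached managed policies from the CLI; the role name here is a placeholder for your node instance role.

```bash
# List managed policies attached to the node role (replace the role name)
aws iam list-attached-role-policies --role-name my-eks-node-role
```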
✅ 3. Check aws-auth ConfigMap in EKS
If the IAM role of the EC2 node is not mapped to Kubernetes, the node won't join.
kubectl get configmap aws-auth -n kube-system -o yaml
Look for:
```yaml
mapRoles: |
  - rolearn: arn:aws:iam::<account-id>:role/<your-node-instance-role>
    username: system:node:{{EC2PrivateDNSName}}
    groups:
      - system:bootstrappers
      - system:nodes
```
If missing, add it:
kubectl edit configmap aws-auth -n kube-system
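If eksctl is available, an alternative to hand-editing the ConfigMap is to create an identity mapping; this is a sketch with the same placeholder values as above.

```bash
# Map the node role into aws-auth (placeholders: cluster name, account ID, role name)
eksctl create iamidentitymapping \
  --cluster <cluster-name> \
  --arn arn:aws:iam::<account-id>:role/<your-node-instance-role> \
  --username 'system:node:{{EC2PrivateDNSName}}' \
  --group system:bootstrappers \
  --group system:nodes
```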
✅ 4. Check Logs on the Node (via SSH)
- SSH into the instance using the key pair.
- Check kubelet and bootstrap logs:
```bash
# Bootstrap logs
cat /var/log/cloud-init-output.log

# Kubelet logs
journalctl -u kubelet -xe
```
You may see common errors like:
- IAM role not authorized
- Incorrect cluster endpoint
- TLS certificate errors
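If SSH isn't open to the node, Session Manager is a common alternative, assuming the SSM agent is running and the node role allows it; the instance ID is a placeholder.

```bash
# Open a shell on the node without SSH (requires SSM agent + IAM permissions)
aws ssm start-session --target i-0123456789abcdef0
```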
✅ 5. Check Cluster Endpoint and Bootstrap Script
If you’re using a custom AMI or self-managed node group, ensure the bootstrap script is being run properly.
Look for this in the user-data:

```bash
#!/bin/bash
/etc/eks/bootstrap.sh <cluster-name>
```
Validate it's running:
cat /var/log/cloud-init-output.log
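For reference, a fuller self-managed user-data sketch for the Amazon Linux EKS-optimized AMI looks roughly like this; the cluster name, endpoint, CA, and labels are placeholders you would pull from your own cluster.

```bash
#!/bin/bash
set -o xtrace
# Placeholders: cluster name, API server endpoint, base64 CA, node labels
/etc/eks/bootstrap.sh my-eks-cluster \
  --apiserver-endpoint https://EXAMPLE1234567890.gr7.us-east-1.eks.amazonaws.com \
  --b64-cluster-ca <base64-encoded-ca> \
  --kubelet-extra-args '--node-labels=nodegroup=self-managed'
```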
✅ 6. Check Security Groups and Networking
- The node's security group allows outbound HTTPS (443) to EKS and S3 endpoints.
- The control plane security group allows traffic from the node's security group.
- If you're using private subnets, ensure NAT Gateway or interface endpoints (VPC endpoints) are properly set.
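To eyeball the egress rules from the CLI, something like this works; the security group ID is a placeholder.

```bash
# Show outbound rules on the node security group (replace the SG ID)
aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0 \
  --query 'SecurityGroups[].IpPermissionsEgress'
```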
✅ 7. Check Node Group Events (if using managed node group)
```bash
aws eks describe-nodegroup \
  --cluster-name <cluster> \
  --nodegroup-name <nodegroup>
```
Look under status and health.issues in the output, or use:
kubectl get nodes
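To pull just the health findings from the describe-nodegroup output, a --query filter is handy; this sketch reuses the same placeholders.

```bash
# Show only the health issues reported for the managed node group
aws eks describe-nodegroup \
  --cluster-name <cluster> \
  --nodegroup-name <nodegroup> \
  --query 'nodegroup.health.issues'
```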
✅ 8. Check if Node is Registered with Cluster
kubectl get nodes
- If the node is missing, it failed to register (likely a bootstrap or IAM issue).
- If the node is in NotReady state, there’s a runtime issue (e.g., containerd, kubelet, CNI).
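For a node that shows up but stays NotReady, describing it usually surfaces the failing condition or kubelet/CNI event; the node name below is a placeholder.

```bash
# Inspect conditions and recent events for a problem node (replace the node name)
kubectl describe node ip-10-0-1-23.ec2.internal
```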
🔧 Common Fixes
| Problem | Fix |
|---|---|
| IAM role not in aws-auth | Add the role with kubectl edit configmap aws-auth -n kube-system |
| Kubelet errors | Check /var/log/messages, journalctl -u kubelet |
| Networking issue | Update security groups, check subnet routing/NAT |
| Bootstrap script not running | Verify user-data and cloud-init logs |
| Missing policies | Attach AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, etc. |
🧪 Optional: Run a Quick Node Debug DaemonSet
kubectl apply -f https://raw.githubusercontent.com/Azure/aks-periscope/main/deployment/debug-daemonset.yaml
This runs a privileged pod on all nodes and gives more insight.
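On newer kubectl versions, kubectl debug can do something similar on a single node without deploying a DaemonSet; a sketch, where the node name and image are placeholders.

```bash
# Start an interactive privileged debug pod on one node
kubectl debug node/ip-10-0-1-23.ec2.internal -it --image=busybox
```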
✅ Required Tags for Discovery
These tags are required for:
- The EKS control plane to discover worker nodes
- The VPC CNI plugin to discover subnets
- The Cluster Autoscaler to manage scaling
Examples:
- For EC2 Instances (Nodes):
  - Key: kubernetes.io/cluster/my-eks-cluster
  - Value: owned
- For Subnets:
  - Key: kubernetes.io/cluster/my-eks-cluster
  - Value: shared
- For ELBs (provisioned by Kubernetes):
  - Key: kubernetes.io/cluster/my-eks-cluster
  - Value: owned
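These tags can also be applied from the CLI; a sketch with placeholder resource IDs.

```bash
# Tag a node instance and a subnet for cluster discovery (IDs are placeholders)
aws ec2 create-tags --resources i-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/my-eks-cluster,Value=owned
aws ec2 create-tags --resources subnet-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/my-eks-cluster,Value=shared
```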
🧪 Example in Practice
- A VPC shared across multiple EKS clusters.
- You want Cluster A to use subnet-1, and Cluster B to use the same subnet.
Then you tag subnet-1 as:

kubernetes.io/cluster/cluster-a = shared
kubernetes.io/cluster/cluster-b = shared
If instead subnet-1 is only used by cluster-a:

kubernetes.io/cluster/cluster-a = owned
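Applied with the CLI, the shared-subnet case might look like this; the subnet ID is a placeholder.

```bash
# Tag the shared subnet for both clusters (replace the subnet ID)
aws ec2 create-tags --resources subnet-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/cluster-a,Value=shared \
         Key=kubernetes.io/cluster/cluster-b,Value=shared
```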
✅ TL;DR
| Resource | Required? | Tag Example |
|---|---|---|
| EC2 Node | ✅ Yes | kubernetes.io/cluster/my-cluster = owned |
| Subnet | ✅ Yes | kubernetes.io/cluster/my-cluster = shared |
| Security Group | Optional | Sometimes required for Load Balancer discovery |
| ELB | ✅ Yes | kubernetes.io/cluster/my-cluster = owned |