Sunday, 3 August 2025

Troubleshoot EKS Node Not Joining

🔍 Step-by-Step: Troubleshoot EKS Node Not Joining the Cluster

✅ 1. Check Node Status in EC2

  • Log into AWS Console > EC2:
    • Verify that the EC2 instance for the node is running and in a public/private subnet as expected.
    • Check tags – ensure they include:
      kubernetes.io/cluster/<cluster-name> = owned or shared
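
The same check can be scripted with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is configured for the cluster's account and region. The function name check_node_instance is hypothetical, and the instance ID and cluster name are supplied by the caller.

```shell
# Sketch: report whether an instance is running and carries the cluster tag.
# check_node_instance is a hypothetical helper name, not an AWS command.
check_node_instance() {
  local instance_id="$1" cluster_name="$2" out state tag_value
  # With --output text, a missing tag value is printed as "None".
  out="$(aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query "Reservations[].Instances[].[State.Name, Tags[?Key=='kubernetes.io/cluster/${cluster_name}'] | [0].Value]" \
    --output text)" || return 1
  state="$(printf '%s\n' "$out" | awk 'NR == 1 { print $1 }')"
  tag_value="$(printf '%s\n' "$out" | awk 'NR == 1 { print $2 }')"

  if [ "$state" != "running" ]; then
    echo "instance state: ${state:-unknown}"
    return 1
  fi
  case "$tag_value" in
    owned|shared) echo "ok: running, cluster tag = $tag_value" ;;
    *) echo "missing or unexpected cluster tag: ${tag_value:-none}"; return 1 ;;
  esac
}

# Example use:
#   check_node_instance <instance-id> <cluster-name>
```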

✅ 2. Check Node IAM Role

  • Go to the EC2 instance > Check the IAM role attached.
  • Confirm that the role has the following AWS managed policies:
AmazonEKSWorkerNodePolicy
AmazonEC2ContainerRegistryReadOnly
AmazonEKS_CNI_Policy
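
To check the attached policies from the command line, something like the sketch below may help. It assumes you already know the role name behind the instance profile; missing_node_policies is a hypothetical helper name.

```shell
# Sketch: print each expected managed policy that is NOT attached to the role.
# missing_node_policies is a hypothetical helper name.
missing_node_policies() {
  local role_name="$1" attached policy
  attached="$(aws iam list-attached-role-policies \
    --role-name "$role_name" \
    --query 'AttachedPolicies[].PolicyName' \
    --output text)" || return 1
  for policy in AmazonEKSWorkerNodePolicy AmazonEC2ContainerRegistryReadOnly AmazonEKS_CNI_Policy; do
    # Text output separates names with tabs; split to one name per line.
    if ! printf '%s\n' "$attached" | tr '\t' '\n' | grep -qx "$policy"; then
      echo "$policy"
    fi
  done
}

# Example use:
#   missing_node_policies <your-node-instance-role>
```

Empty output means all three policies are attached. In some setups the CNI policy is attached to a separate role used by the aws-node service account instead, so its absence here is not always a problem.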
    

✅ Also ensure the role is listed in your aws-auth ConfigMap.

✅ 3. Check aws-auth ConfigMap in EKS

If the IAM role of the EC2 node is not mapped to Kubernetes, the node won't join.

kubectl get configmap aws-auth -n kube-system -o yaml

Look for:

mapRoles: |
  - rolearn: arn:aws:iam::<account-id>:role/<your-node-instance-role>
    username: system:node:{{EC2PrivateDNSName}}
    groups:
      - system:bootstrappers
      - system:nodes
    

If missing, add it:

kubectl edit configmap aws-auth -n kube-system
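
Before editing, a quick scripted check can confirm whether the role is already mapped. This is a sketch that assumes the ARN is written unquoted on its rolearn line; role_in_aws_auth is a hypothetical helper name.

```shell
# Sketch: succeed if the given role ARN appears on a rolearn line of mapRoles.
# role_in_aws_auth is a hypothetical helper name.
role_in_aws_auth() {
  local role_arn="$1"
  kubectl get configmap aws-auth -n kube-system -o jsonpath='{.data.mapRoles}' \
    | awk -v arn="$role_arn" '$0 ~ /rolearn:/ && $NF == arn { found = 1 } END { exit !found }'
}

# Example use:
#   role_in_aws_auth arn:aws:iam::<account-id>:role/<your-node-instance-role> \
#     && echo "mapped" || echo "not mapped"
```

The comparison is an exact match on the whole ARN. If the IAM role was created with a path, the ARN in aws-auth is expected without that path, so compare against the path-less form.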

✅ 4. Check Logs on the Node (via SSH)

  • SSH into the instance using the key pair.
  • Check kubelet and bootstrap logs:
# Bootstrap logs
cat /var/log/cloud-init-output.log

# Kubelet logs
journalctl -u kubelet -xe
    

You may see common errors like:

  • IAM role not authorized
  • Incorrect cluster endpoint
  • TLS certificate errors
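
To narrow a long log down to lines like these, a small filter can help. The sketch below is only illustrative: the pattern list is an assumption, not a complete catalogue of kubelet errors, and filter_join_errors is a hypothetical helper name.

```shell
# Sketch: keep only log lines that match a few common failure patterns.
# The pattern list is an illustrative assumption, not an exhaustive one.
filter_join_errors() {
  grep -Ei 'unauthorized|forbidden|x509|certificate|no such host|connection refused|i/o timeout'
}

# Example use on the node:
#   journalctl -u kubelet --no-pager --since "1 hour ago" | filter_join_errors
```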

✅ 5. Check Cluster Endpoint and Bootstrap Script

If you’re using a custom AMI or self-managed node group, ensure the bootstrap script is being run properly.

Look for this in user-data:

#!/bin/bash
/etc/eks/bootstrap.sh <cluster-name>
    

Validate it's running:

cat /var/log/cloud-init-output.log
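
If you cannot SSH into the node, the user-data can also be read through the AWS CLI. The following is a rough sketch, assuming plain (non-compressed, non-multipart) user-data that passes the cluster name as the first argument to bootstrap.sh; userdata_calls_bootstrap is a hypothetical helper name.

```shell
# Sketch: succeed if the instance's user-data calls bootstrap.sh with the cluster name.
# userdata_calls_bootstrap is a hypothetical helper name.
userdata_calls_bootstrap() {
  local instance_id="$1" cluster_name="$2"
  # The API returns user-data base64-encoded, so decode before searching.
  aws ec2 describe-instance-attribute \
    --instance-id "$instance_id" \
    --attribute userData \
    --query 'UserData.Value' \
    --output text \
    | base64 --decode \
    | grep -Eq "/etc/eks/bootstrap\.sh[[:space:]]+${cluster_name}([[:space:]]|\$)"
}

# Example use:
#   userdata_calls_bootstrap <instance-id> <cluster-name> || echo "bootstrap call not found"
```

A failed match does not always mean a broken node: user-data that quotes the cluster name, passes it through a variable, or uses a different bootstrap mechanism (for example nodeadm on Amazon Linux 2023 AMIs) will not match this simple pattern.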

✅ 6. Check Security Groups and Networking

  • Confirm that the node's security group allows outbound HTTPS (443) to the EKS API endpoint and to S3.
  • Confirm that the control plane security group allows inbound traffic from the node's security group.
  • If the nodes are in private subnets, confirm that a NAT gateway or the required VPC endpoints (interface endpoints for services such as ECR, EC2, and STS, plus an S3 endpoint) are in place.
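
A quick way to test outbound reachability from the node itself is to open a TCP connection to the endpoint. This sketch assumes bash and the coreutils timeout command are available on the node; can_reach is a hypothetical helper name.

```shell
# Sketch: succeed if a TCP connection to host:port opens within 5 seconds.
# can_reach is a hypothetical helper; it relies on bash's /dev/tcp feature.
can_reach() {
  local host="$1" port="${2:-443}"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example use on the node. The endpoint host can be read with:
#   aws eks describe-cluster --name <cluster-name> --query cluster.endpoint --output text
# then, without the https:// prefix:
#   can_reach <cluster-endpoint-host> 443 && echo "reachable" || echo "blocked or unresolvable"
```

This only proves that a TCP connection can be opened. It says nothing about TLS or authentication, which show up in the kubelet logs instead.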

✅ 7. Check Node Group Events (if using managed node group)

aws eks describe-nodegroup \
  --cluster-name <cluster> \
  --nodegroup-name <nodegroup>
    

Look under status and health.issues in the output. To confirm whether the nodes have registered, use:

kubectl get nodes
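
To pull just the health issues out of the describe-nodegroup output, a query along these lines may be convenient. It is a sketch; nodegroup_health is a hypothetical helper name.

```shell
# Sketch: print one "code<TAB>message" line per reported node group health issue.
# nodegroup_health is a hypothetical helper name.
nodegroup_health() {
  local cluster="$1" nodegroup="$2" issues
  issues="$(aws eks describe-nodegroup \
    --cluster-name "$cluster" \
    --nodegroup-name "$nodegroup" \
    --query 'nodegroup.health.issues[].[code, message]' \
    --output text)" || return 1
  case "$issues" in
    ''|None) echo "no health issues reported" ;;
    *) printf '%s\n' "$issues" ;;
  esac
}

# Example use:
#   nodegroup_health <cluster> <nodegroup>
```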

✅ 8. Check if Node is Registered with Cluster

kubectl get nodes
  • If the node is missing: It failed to register (likely bootstrap or IAM issue)
  • If the node is in NotReady state: There’s a runtime issue (e.g., containerd, kubelet, CNI)
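
For larger clusters it can help to list only the nodes that are not ready. A minimal sketch, assuming the default column layout of kubectl get nodes (name first, status second); not_ready_nodes is a hypothetical helper name.

```shell
# Sketch: list nodes whose STATUS column starts with NotReady.
# not_ready_nodes is a hypothetical helper name.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 ~ /^NotReady/ { print $1 }'
}

# Example use:
#   not_ready_nodes
# then inspect conditions and events for a node it prints:
#   kubectl describe node <node-name>
```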

🛠 Common Fixes

Problem | Fix
IAM role not in aws-auth | Add the role using kubectl edit configmap aws-auth -n kube-system
Kubelet errors | Check /var/log/messages and journalctl -u kubelet
Networking issue | Update security groups; check subnet routing/NAT
Bootstrap script not running | Verify user-data and cloud-init logs
Missing policies | Attach AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, etc.

🧪 Optional: Run a Quick Node Debug DaemonSet

kubectl apply -f https://raw.githubusercontent.com/Azure/aks-periscope/main/deployment/debug-daemonset.yaml

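This is meant to run a privileged pod on every node for deeper inspection. Note that the manifest lives in the Azure AKS Periscope repository, which targets AKS rather than EKS, so confirm that the URL resolves and review the manifest before applying it. A pod can only be scheduled on nodes that have already registered, so this helps with NotReady nodes, not with nodes that never joined.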

✅ Required Tags for Discovery

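The kubernetes.io/cluster/<cluster-name> tag is used for:

  • The Kubernetes AWS cloud provider to identify which EC2 instances and other resources belong to the cluster
  • Load balancer provisioning to discover which subnets the cluster may use
  • The Cluster Autoscaler only indirectly; its auto-discovery normally relies on separate k8s.io/cluster-autoscaler/<cluster-name> tags on the Auto Scaling group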

Examples:

  • For EC2 Instances (Nodes):
    Key: kubernetes.io/cluster/my-eks-cluster
    Value: owned
  • For Subnets:
    Key: kubernetes.io/cluster/my-eks-cluster
    Value: shared
  • For ELBs (provisioned by Kubernetes):
    Key: kubernetes.io/cluster/my-eks-cluster
    Value: owned

🧪 Example in Practice

  • A VPC shared across multiple EKS clusters.
  • You want Cluster A to use subnet-1, and Cluster B to use the same subnet.

Then you tag subnet-1 as:

kubernetes.io/cluster/cluster-a = shared
kubernetes.io/cluster/cluster-b = shared
    

If instead subnet-1 is only used by cluster-a:

kubernetes.io/cluster/cluster-a = owned
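
Tags like the ones above can be applied with the AWS CLI. The following is a small sketch; tag_for_cluster is a hypothetical helper name, and the resource ID and cluster name are placeholders supplied by the caller.

```shell
# Sketch: apply the cluster tag to a resource such as a subnet.
# tag_for_cluster is a hypothetical helper name; the value defaults to "shared".
tag_for_cluster() {
  local resource_id="$1" cluster_name="$2" value="${3:-shared}"
  aws ec2 create-tags \
    --resources "$resource_id" \
    --tags "Key=kubernetes.io/cluster/${cluster_name},Value=${value}"
}

# Example use for a subnet shared by two clusters:
#   tag_for_cluster <subnet-id> cluster-a shared
#   tag_for_cluster <subnet-id> cluster-b shared
```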

✅ TL;DR

Resource | Required? | Tag Example
EC2 Node | ✅ Yes | kubernetes.io/cluster/my-cluster = owned
Subnet | ✅ Yes | kubernetes.io/cluster/my-cluster = shared
Security Group | Optional* | Sometimes required for Load Balancer discovery
ELB | ✅ Yes | kubernetes.io/cluster/my-cluster = owned
