Saturday, 28 June 2025

Terraform Drift, Taint, and Refresh Management

 


1. Understanding Terraform Drift

What is Drift?

Drift occurs when real-world infrastructure (e.g., AWS resources) is changed manually or by external systems, resulting in a difference between the Terraform state and the actual environment.

Common Drift Scenarios:

a) Resource Modified Manually

  • Example: Instance type changed in AWS Console from t3.micro to t2.medium.

  • Terraform Plan Output:

    ~ aws_instance.web
        instance_type: "t2.medium" => "t3.micro"
    
  • Resolution:

    • Run terraform apply to revert to t3.micro

    • Or update .tf code to match new value and apply

b) Resource Deleted Manually

  • Example: S3 bucket deleted in AWS Console

  • Plan Output:

    + aws_s3_bucket.example
    
  • Resolution:

    • Terraform will recreate the bucket on terraform apply, or

    • If the deletion was intentional and the resource should no longer be managed, remove it from the state file:

      terraform state rm aws_s3_bucket.example
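
As a minimal sketch of the state-removal path above, the helper below checks that Terraform actually tracks the address before removing it. The function name forget_resource is hypothetical; terraform state list and terraform state rm are the only Terraform commands used. Note that state rm only makes Terraform forget the resource; it never touches real infrastructure.

```shell
# Sketch: remove an address from state only if Terraform tracks it.
# `forget_resource` is a hypothetical helper name, not a Terraform command.
forget_resource() {
  addr=$1
  # `terraform state list` prints one resource address per line;
  # grep -Fx matches the whole line literally (no regex, no prefixes).
  if terraform state list | grep -Fx -- "$addr" >/dev/null; then
    terraform state rm "$addr"
  else
    echo "not in state: $addr" >&2
    return 1
  fi
}

# Usage (from an initialized working directory):
#   forget_resource aws_s3_bucket.example
```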
      

c) Resource Added Manually

  • Example: Extra EC2 manually launched

  • Detection: Not shown in terraform plan

  • Resolution: Add a matching resource block to the configuration, then use terraform import to bring the resource under management:

    terraform import aws_instance.extra i-1234567890abcdef0
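
A rough sketch of the import workflow, assuming a resource block for the address already exists in the configuration (terraform import requires one). The helper name adopt_resource is hypothetical. It imports the resource and then prints what Terraform recorded, which helps when adjusting the resource block until terraform plan reports no changes.

```shell
# Sketch: import an existing resource, then print what Terraform recorded.
# `adopt_resource` is a hypothetical helper name.
adopt_resource() {
  addr=$1   # resource address, e.g. aws_instance.extra
  id=$2     # provider-side ID, e.g. the EC2 instance ID
  # Stop early if the import itself fails.
  terraform import "$addr" "$id" || return 1
  # Show the imported attributes as stored in state.
  terraform state show "$addr"
}

# Usage:
#   adopt_resource aws_instance.extra i-1234567890abcdef0
```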
    

How to Detect Drift

terraform plan

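Shows any mismatch between the desired configuration and the actual infrastructure. By default, plan refreshes resource state in memory before comparing, so drift appears without a separate step. To write the current values into the state file, use:

terraform refresh

This updates the state file with the real-time AWS values.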
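
For automated checks, terraform plan accepts a -detailed-exitcode flag: it exits 0 when the plan is empty, 1 on error, and 2 when changes are present. The sketch below wraps that in a hypothetical check_drift function. One caveat: exit code 2 also covers configuration changes that have not been applied yet, so it signals "plan is not empty" rather than drift specifically.

```shell
# Sketch: classify the result of `terraform plan -detailed-exitcode`.
# `check_drift` and its status strings are hypothetical names.
check_drift() {
  # -input=false keeps the command from prompting in automation.
  # Plan output goes to stderr so stdout carries only the status word.
  terraform plan -detailed-exitcode -input=false >&2
  case $? in
    0) echo "in-sync" ;;
    2) echo "changes-detected" ;;
    *) echo "plan-error"; return 1 ;;
  esac
}

# Usage (from an initialized working directory):
#   status=$(check_drift) && echo "plan status: $status"
```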


2. terraform refresh

What does it do?

  • Pulls latest real-world resource values

  • Updates .tfstate file accordingly

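  • Does not modify infrastructure

  • Is deprecated since Terraform v0.15.4 in favor of terraform apply -refresh-only, which shows the detected changes and asks for approval before writing them to state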

Use Case Example:

If someone manually changes a tag or attribute in AWS:

# main.tf
resource "aws_instance" "web" {
  ami           = "ami-abc123"
  instance_type = "t3.micro"
  tags = {
    Name = "my-instance"
  }
}

Then someone manually changes the tag in AWS Console to:

Name = "production-instance"

Now you run:

terraform refresh

Terraform updates the state file with Name = "production-instance", even though the code still says my-instance.

Then:

terraform plan

Will show:

~ tags.Name: "production-instance" => "my-instance"

Terraform will revert it on apply unless you update your code.

What it doesn't do:

  • Doesn't detect new unmanaged resources

  • Doesn't auto-fix drifted resources

  • Doesn’t change infrastructure
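
On Terraform v0.15.4 and later, the same state sync can be done in refresh-only mode, which lets you review the detected changes before they are written to state. A minimal sketch with two hypothetical wrapper functions; extra arguments are passed through unchanged:

```shell
# Sketch: review drift before writing it to state.
# `preview_refresh` and `accept_refresh` are hypothetical names.

# Show what a refresh would change in state; proposes no
# infrastructure changes.
preview_refresh() {
  terraform plan -refresh-only -input=false "$@"
}

# Write the refreshed values to state. Terraform prints the detected
# changes and asks for confirmation unless -auto-approve is passed.
accept_refresh() {
  terraform apply -refresh-only "$@"
}

# Usage:
#   preview_refresh
#   accept_refresh
```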


3. terraform taint

What is it?

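Manually marks a Terraform-managed resource for forced recreation on the next apply. Since Terraform v0.15.2 the taint command is deprecated; terraform apply -replace=<address> is the recommended alternative because the replacement appears in the plan before anything is changed.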

Syntax:

terraform taint <resource_type>.<resource_name>

Common Use Cases:

a) EC2 instance stuck or unhealthy

terraform taint aws_instance.web
terraform apply

Terraform will destroy and recreate the instance.

b) RDS misbehaving

terraform taint aws_db_instance.db
terraform apply

c) Testing Provisioning Logic

To re-trigger user_data scripts:

terraform taint aws_instance.devbox
terraform apply

Undo a Taint

terraform untaint aws_instance.web
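
On Terraform v0.15.2 and later, the -replace option achieves the same forced recreation without a separate taint step. The sketch below previews the replacement and then applies it; replace_resource is a hypothetical helper name, and the address is whatever resource you would otherwise taint.

```shell
# Sketch: force recreation of one resource via -replace.
# `replace_resource` is a hypothetical helper name.
replace_resource() {
  addr=$1
  # Preview first: the plan should list the resource as replaced.
  terraform plan -replace="$addr" -input=false || return 1
  # Apply prompts for confirmation before destroying and recreating.
  terraform apply -replace="$addr"
}

# Usage:
#   replace_resource aws_instance.web
```

Because nothing is recorded in state until the apply is confirmed, there is no equivalent of untaint to run if you change your mind.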

4. Comparing Drift vs. Taint

Feature               | Drift                     | Taint
Triggered By          | Manual/external changes   | Manual terraform taint
Detected By           | plan, refresh             | User command
terraform plan Output | Shows delta               | Shows tainted resource
terraform apply       | Fixes drift               | Recreates resource
Best For              | Syncing config with state | Forcing fresh resource builds

5. Best Practices

  • Use terraform plan regularly to catch drift

  • Run terraform plan -refresh-only (or terraform refresh on older versions) before critical changes

  • Avoid manual changes in AWS — enforce infra as code

  • Use taint only when you’re sure recreation is safe

  • Store state in remote backends with locking (e.g., S3 + DynamoDB)

  • Use version control + CI pipelines to standardize changes

