9 Essential Terraform Practices for Safe IaC

Terraform practices that prevent the 3am “who deleted production” incident are not the ones the official tutorials teach. The basics — write HCL, run plan, run apply — are easy. The hard part is module structure, state management, drift handling, and team coordination at scale. Teams running infrastructure-as-code well in 2026 are using a tight set of patterns that have emerged from a decade of Terraform pain. Here is what to actually do.


Remote State Is Mandatory


Local state files are for tutorials. Any real team needs remote state with locking: S3 with DynamoDB locking, Terraform Cloud, Spacelift, or one of the open-source backends. Without shared, locked state, two engineers running apply at the same time will work from diverging or half-written state, and that will corrupt your state and your day.

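S3 with a DynamoDB lock table, state encryption, and bucket versioning is the AWS-native pattern, and it costs close to nothing. Recent Terraform releases (1.10 and later) can also lock with an S3-native lock file via `use_lockfile`, which removes the need for the DynamoDB table. The official Terraform S3 backend documentation covers the configuration. Set this up before you write your first resource.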
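As a rough illustration of the S3 pattern described above, a backend block might look like the sketch below. The bucket, key, region, and table names are placeholders, not values from any real setup, and the bucket and lock table are assumed to exist already (with versioning enabled on the bucket).

```hcl
# Sketch only: bucket, key, region, and table names are hypothetical.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"               # versioned bucket, created beforehand
    key            = "production/network/terraform.tfstate"  # path of this state object in the bucket
    region         = "us-east-1"
    encrypt        = true                                    # server-side encryption for the state object
    dynamodb_table = "example-terraform-locks"               # lock table with a "LockID" string hash key
  }
}
```

Backend blocks cannot reference variables, so teams that share one configuration across several states often supply these values at `terraform init` time with `-backend-config` instead of hard-coding them.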

Modules for Reuse, Not Wrapping

Terraform modules should encapsulate meaningful patterns — a complete VPC with subnets and routing, a Postgres RDS instance with monitoring and backups, a Lambda function with its IAM role and CloudWatch logs.

Anti-pattern: thin wrappers around single resources that just rename the inputs. These add maintenance burden without abstracting anything. If your module is shorter than the resource it wraps, delete it. The Terraform Registry has hundreds of well-designed examples to study.
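To make the distinction concrete, here is a sketch of what calling a "meaningful" module could look like. The module path and every input name are hypothetical; the point is that a handful of inputs stand in for the instance, its backups, and its monitoring, rather than mirroring one resource's arguments one-to-one.

```hcl
# Hypothetical input: an SNS topic that receives database alarms.
variable "alarm_topic_arn" {
  type        = string
  description = "SNS topic ARN for database alarms"
}

# Hypothetical local module that bundles an RDS instance with
# its backup settings and CloudWatch alarms.
module "orders_db" {
  source = "./modules/postgres"

  name                  = "orders"
  instance_class        = "db.t4g.medium"
  backup_retention_days = 7
  alarm_topic_arn       = var.alarm_topic_arn
}
```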

Plan in CI, Apply With Approval

The pattern is: a PR opens, CI runs `terraform plan`, and the plan output is posted as a comment on the PR. Reviewers see exactly what will change. After merge, apply runs, with a manual approval gate for production.

Tools like Atlantis, Spacelift, env0, and Terraform Cloud automate this loop. Without it, you have engineers running apply locally with whatever credentials they have, and no audit trail. See our CI/CD pipeline setup guide for the broader pipeline patterns.

Separate State Per Environment

Use a separate state and backend location per environment (dev, staging, production), not CLI workspaces that share one configuration and backend, and not count-based environment switching inside a single state. The blast radius of a mistake should be one environment, not everything.

Use the same modules across environments with different variable values. Use a parent stack pattern (one Terraform configuration per environment that calls shared modules) for clarity. The HashiCorp recommended workflow is well-documented and worth following.
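One way to lay this out, sketched under assumed directory and module names, is a small root configuration per environment that owns its own state key and passes environment-specific values into shared modules:

```hcl
# environments/production/main.tf (hypothetical layout)
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"        # placeholder bucket name
    key     = "production/terraform.tfstate"   # staging and dev use their own keys
    region  = "us-east-1"
    encrypt = true
  }
}

# Shared module; only the variable values differ between environments.
# The module path and input names are illustrative assumptions.
module "network" {
  source = "../../modules/network"

  environment = "production"
  cidr_block  = "10.20.0.0/16"
}
```

A matching `environments/staging/main.tf` would call the same module with a different state key and CIDR range, so a bad apply in staging cannot touch production state.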

Drift Detection Catches Reality

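Terraform manages what it knows about. Manual changes in the cloud console (the inevitable production hotfix) create drift that the next apply will either silently revert or fail on. Run `terraform plan` on a schedule (daily or weekly) to detect drift early; with `-detailed-exitcode`, the command exits with code 2 when the plan contains changes, which makes it easy to alert on.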

Tools like driftctl and HashiCorp’s own drift detection in Terraform Cloud automate this. Surface drift in your team’s chat — surprises during the next intentional change are how production goes down. The HashiCorp drift detection blog post covers the patterns.

Wrap Up

Terraform practices that work focus on team coordination as much as code quality. That means remote state with locking, meaningful modules, plan-in-CI workflows, separate environments, and active drift detection. Most production Terraform incidents come from skipping these patterns rather than from bugs in the code itself. Combine them with observability practices so your infrastructure changes show up in the same dashboards as your application changes.

Frequently Asked Questions

Should I use Terraform or OpenTofu?

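OpenTofu is the open-source fork created after HashiCorp moved Terraform to the Business Source License (BSL) in 2023. The configuration language and CLI workflow remain largely compatible, though the two projects have started to diverge on newer features. Pick OpenTofu for license-sensitive contexts; pick Terraform if you are already heavy on the HashiCorp ecosystem (Vault, Consul, Cloud).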

How big should a state file be?

Aim for under 1000 resources per state. Larger states slow plan and apply, increase blast radius, and make refactoring painful. Split by team ownership or logical service boundaries.

Pulumi vs Terraform?

Pulumi for teams that strongly prefer programming languages over HCL. Terraform for the bigger ecosystem and longer track record. Both are good; the right answer depends on team preferences.

How do I handle secrets in Terraform?

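Never put plaintext secrets in HCL. Use AWS Secrets Manager or Vault references, or pass secrets as variables marked `sensitive = true`. Note that `sensitive` only redacts the value from plan and apply output; the value is still written to state. State files contain everything Terraform manages, so encrypt them and limit access.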
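A minimal sketch of the reference approach, assuming a secret that already exists in AWS Secrets Manager. The secret name, database identifier, and username are made up for illustration.

```hcl
# Look up an existing secret instead of writing its value in HCL.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "example/orders/db-password" # hypothetical secret name
}

resource "aws_db_instance" "orders" {
  identifier          = "orders"
  engine              = "postgres"
  instance_class      = "db.t4g.micro"
  allocated_storage   = 20
  username            = "orders_admin"
  password            = data.aws_secretsmanager_secret_version.db_password.secret_string
  skip_final_snapshot = true # example only; keep final snapshots in production
}
```

The password never appears in the repository, but it is still recorded in the state file, which is why state encryption and access control matter.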

Should I import existing infrastructure?

Yes, if you plan to manage it long-term. Use `terraform import` (or the `import` block available in Terraform 1.5+) to bring existing resources under management. Plan for several iterations of plan and edit before the configuration matches reality.
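For example, an `import` block for an existing S3 bucket could look like the sketch below. The bucket name and resource address are placeholders.

```hcl
# Tell Terraform to adopt an existing bucket on the next apply.
import {
  to = aws_s3_bucket.assets
  id = "example-assets-bucket" # hypothetical: the real bucket's name
}

# The configuration that should end up matching the real bucket.
resource "aws_s3_bucket" "assets" {
  bucket = "example-assets-bucket"
}
```

Running `terraform plan` then shows the import alongside any attribute differences, which is where the iteration happens. In 1.5+, `terraform plan -generate-config-out=generated.tf` can draft the resource block for you when you have only written the `import` block.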
