AI Code Review for Terraform PRs: CI Checklist and Automation

Your Terraform PR passed tflint and checkov — but the AI reviewer just flagged that you’re about to delete a production RDS instance and nobody noticed. That’s the exact scenario that pushed our team to build a structured AI Terraform PR review process into CI. Not as a replacement for human review. As a safety net that catches what humans miss when they’re tired, rushed, or reviewing their twelfth PR of the day.

This checklist is for teams running Terraform 1.5+ in GitHub Actions, GitLab CI, or Atlantis pipelines. Every item here came from either a real incident or a near-miss. Some of it will be obvious. A few items will surprise you.

Why This Checklist

A typical IaC pull request touches anywhere from 10 to 50 resources. Under time pressure — and reviewers are almost always under time pressure — human engineers miss roughly 30% of policy violations. I’ve seen it happen on our own team. A reviewer approves a PR with an unencrypted S3 bucket because the checkov annotation looks fine at a glance, but the skip reason references a deprecated ADR. Nobody catches it until a compliance scan runs two weeks later.

AI code review tools like CodeRabbit, custom GPT-4o scripts, and Atlantis LLM hooks analyze the full plan diff in seconds. They don’t get tired. They don’t skip the last file because standup starts in five minutes. But they also hallucinate. They confuse AWS provider v4 attributes with v5 changes. They miss secrets buried in .auto.tfvars files. That’s exactly why you need a checklist — not to trust the AI blindly, but to define what it must check and what it cannot be trusted to catch alone.

One more thing worth saying upfront: adding AI review as a blocking CI gate on day one is a mistake. Run it as an advisory check for two weeks first. Let the team calibrate signal versus noise. Then promote it to blocking. I learned this the hard way after we blocked three legitimate PRs because the model flagged a lifecycle { prevent_destroy = true } block as “suspicious” on a stateful resource it didn’t recognize.

The AI Code Review Checklist for Terraform PRs

Each item below is something you configure once and enforce on every PR. The first four are static analysis gates. Items five through eight are AI-specific prompt checks. Nine through twelve cover plan diff review. The final three address state hygiene.

1. Run tflint v0.50+ with --format=json. JSON is what downstream scripts can parse reliably; the default text output is not. Pin the version explicitly in your CI; don’t pull latest.
2. Wire tflint exit codes correctly. Exit code 2 means issues were found. Exit code 1 means the tool itself errored. Most pipelines treat both as the same failure. They are not. A tool error should page someone; a lint violation should block the PR.
3. Run checkov 3.x on changed files only. Running checkov on the entire repo on every PR is slow and noisy. Use git diff to scope it to changed .tf files.
4. Distinguish checkov policy failures from tool errors. checkov exits 1 when a check fails (unless --soft-fail is set); any other non-zero status, such as an argument-parsing error, typically means the tool itself broke. Wire them separately in your CI logic.
5. Prompt the AI to flag hardcoded credentials. Static tools miss credentials embedded in locals blocks or passed as default values in variable declarations. Include this explicitly in your system prompt.
6. Prompt for missing lifecycle blocks on stateful resources. RDS instances, ElastiCache clusters, and S3 buckets without prevent_destroy = true are a silent risk. The AI catches this consistently when you ask for it explicitly.
7. Prompt for untagged resources. Define your required tags (env, owner, cost-center) in the system prompt. The model will flag resources missing any of them.
8. Prompt for overly permissive IAM. Wildcard actions on wildcard resources. Effect: Allow on *. The AI is genuinely good at spotting these patterns across multiple policy documents simultaneously.
9. Generate a JSON plan and feed it to the AI. Run terraform plan -out=tfplan followed by terraform show -json tfplan. This gives the model the actual planned changes, not just HCL syntax, which is a fundamentally different and more accurate input.
10. Extract only changed resources before sending to the AI. A full JSON plan for 300 resources hits 80k–110k tokens. GPT-4o’s context window is 128k tokens, so you’ll overflow it on large repos. Filter to changed resources only using jq.
11. Ask the AI to produce a risk score. LOW, MEDIUM, HIGH, CRITICAL. This gives reviewers a fast signal before they read the full comment. It also makes the output machine-parseable if you want to auto-block CRITICAL findings later.
12. Flag destructive changes explicitly. Filter the plan JSON for deletes and replacements (any change whose actions include "delete") and send this subset to the AI with elevated attention in the prompt. A delete on a database should always surface as CRITICAL.
13. Verify remote backend configuration is present. No local .tfstate files committed. State locking enabled via DynamoDB. Backend config not hardcoded with account IDs.
14. Check for state lock contention risk. Running terraform plan without -lock=false in CI causes lock contention when multiple PRs run simultaneously. Skip the lock only for read-only plans, never for apply. This one has burned us twice.
15. Pin provider versions. Use version = "~> 5.0" in the required_providers block. AI models trained before mid-2023 may suggest deprecated constraint syntax, so validate suggestions against the official Terraform provider requirements docs.
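The destructive-change item is easiest to reason about with a concrete filter. Below is a minimal Python sketch, assuming the standard resource_changes layout of terraform show -json output; the function name destructive_changes and the sample plan fragment are hypothetical and only illustrate the idea. A replacement appears in the plan as a pair of actions that includes "delete", so checking for "delete" covers both plain deletes and replacements.

```python
import json

def destructive_changes(plan: dict) -> list:
    """Return resource changes whose planned actions include a delete.

    In Terraform's JSON plan a replacement is ["delete", "create"] or
    ["create", "delete"], so testing for "delete" covers both cases.
    """
    found = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions:
            found.append({
                "address": rc.get("address"),
                "type": rc.get("type"),
                "kind": "replace" if "create" in actions else "delete",
            })
    return found

# Hand-written plan fragment for illustration only (not real tool output)
sample_plan = {
    "resource_changes": [
        {"address": "aws_db_instance.main", "type": "aws_db_instance",
         "change": {"actions": ["delete"]}},
        {"address": "aws_instance.web", "type": "aws_instance",
         "change": {"actions": ["delete", "create"]}},
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
         "change": {"actions": ["update"]}},
    ]
}

if __name__ == "__main__":
    print(json.dumps(destructive_changes(sample_plan), indent=2))
```

The resulting list is small enough to put at the top of the prompt, ahead of the full diff, so the model sees the risky resources first.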

Here’s the GitHub Actions workflow that runs this entire checklist. It handles tflint, checkov, plan generation, resource filtering, and the OpenAI API call in a single job:

# .github/workflows/terraform-ai-review.yml
# Requires: OPENAI_API_KEY secret, AWS credentials for plan, terraform 1.5+

name: Terraform AI Code Review

on:
  pull_request:
    paths:
      - '**.tf'
      - '**.tfvars'

jobs:
  ai-review:
    runs-on: ubuntu-22.04
    permissions:
      pull-requests: write   # needed to post review comments
      contents: read

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history so the diff against the base branch works

      - name: Setup Terraform 1.7.x
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.5"
          terraform_wrapper: false   # wrapper breaks JSON output parsing

      - name: Setup tflint v0.50.3
        run: |
          curl -s https://raw.githubusercontent.com/terraform-linters/tflint/master/install_linux.sh | \
            TFLINT_VERSION=v0.50.3 bash

      - name: Run tflint (JSON output for downstream parsing)
        run: |
          # tflint exit codes: 0 = clean, 1 = tool error, 2 = issues found
          set +e   # capture the exit code instead of aborting the step
          tflint --format=json --recursive > tflint-results.json
          EXIT=$?
          set -e
          # issues (2) continue so the AI step can read them; tool errors fail the job
          if [ "$EXIT" -eq 1 ]; then echo "tflint tool error"; exit 1; fi

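      - name: Run checkov on changed files only
        run: |
          pip install checkov==3.2.0 --quiet
          # list changed .tf files (deleted files excluded) from git diff
          mapfile -t CHANGED < <(git diff --name-only --diff-filter=d \
            "origin/${{ github.base_ref }}...HEAD" | grep '\.tf$' || true)
          if [ "${#CHANGED[@]}" -eq 0 ]; then
            echo '{}' > checkov-results.json   # nothing to scan
            exit 0
          fi
          # pass each file with its own -f flag
          FILE_ARGS=()
          for f in "${CHANGED[@]}"; do FILE_ARGS+=(-f "$f"); done
          # "|| true" keeps the job advisory: findings go to the AI step
          checkov "${FILE_ARGS[@]}" \
            --output json \
            --compact \
            --skip-check CKV2_AWS_5 \
            > checkov-results.json || true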

      - name: Terraform Init + Plan (generate JSON plan)
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          # Init against the real backend: a per-PR state key would start from empty state,
          # so the plan would show only creates and hide deletes and replacements
          terraform init -input=false
          terraform plan -lock=false -input=false -out=tfplan
          terraform show -json tfplan > tfplan.json

      - name: Extract changed resources only (reduce token usage)
        run: |
          jq '[.resource_changes[] | select(.change.actions != ["no-op"]) |
            {address, actions: .change.actions, before: .change.before, after: .change.after}]' \
            tfplan.json > tfplan-diff.json
          # log token estimate: ~4 chars per token
          CHARS=$(wc -c < tfplan-diff.json)
          echo "Estimated tokens: $((CHARS / 4))"

      - name: Send to OpenAI for review + post PR comment
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO: ${{ github.repository }}
        run: |
          python3 .github/scripts/ai_review.py \
            --plan tfplan-diff.json \
            --tflint tflint-results.json \
            --checkov checkov-results.json \
            --pr "$PR_NUMBER" \
            --repo "$REPO"

And here’s the Python script that calls GPT-4o and posts the structured review comment back to the PR. It includes exponential backoff via tenacity to handle OpenAI rate limits. Tier-1 accounts cap at 500 RPM on GPT-4o (limits change, so check your account’s limits page), and with 20 simultaneous PRs sending large plans from a busy monorepo, the per-minute token limit is likely to trip even before the request limit does:

# .github/scripts/ai_review.py
# Posts structured AI review comment to GitHub PR
# Requires: openai>=1.14.0, requests, tenacity

import argparse, json, os, sys
import openai
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

SYSTEM_PROMPT = """You are a senior DevOps engineer reviewing Terraform infrastructure changes.
Analyze the provided plan diff and linter results. Return a structured review with:
1. RISK_LEVEL: LOW | MEDIUM | HIGH | CRITICAL
2. DESTRUCTIVE_CHANGES: list any resource deletions or replacements
3. SECURITY_FINDINGS: IAM over-permissions, open security groups, missing encryption
4. MISSING_TAGS: resources lacking required tags (env, owner, cost-center)
5. RECOMMENDATIONS: max 5 bullet points, actionable only
Do NOT hallucinate resource attributes. If unsure, say 'verify in provider docs'."""

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=30))
def call_openai(plan_content: str, lint_content: str) -> str:
    client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o",
        max_tokens=1500,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"PLAN DIFF:\n{plan_content}\n\nLINTER FINDINGS:\n{lint_content}"}
        ]
    )
    return response.choices[0].message.content

def post_pr_comment(repo: str, pr: str, body: str):
    url = f"https://api.github.com/repos/{repo}/issues/{pr}/comments"
    headers = {
        "Authorization": f"Bearer {os.environ['GH_TOKEN']}",
        "Accept": "application/vnd.github+json"
    }
    resp = requests.post(url, json={"body": f"## 🤖 AI Terraform Review\n\n{body}"}, headers=headers)
    resp.raise_for_status()

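def load_optional_json(path: str):
    """Load linter output; return None if the file is missing, empty, or invalid."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except (OSError, json.JSONDecodeError):
        return None

def main():
    parser = argparse.ArgumentParser()
    for flag in ("--plan", "--tflint", "--checkov", "--pr", "--repo"):
        parser.add_argument(flag, required=True)
    args = parser.parse_args()

    # The plan is mandatory, so let a missing or malformed file fail loudly
    with open(args.plan) as fh:
        plan = json.load(fh)
    # Include both linters; checkov results were previously accepted but never read
    linters = {
        "tflint": load_optional_json(args.tflint),
        "checkov": load_optional_json(args.checkov),
    }

    # Truncate if over ~80k tokens (~320k chars): keep first 300k chars.
    # Truncation can cut the JSON mid-object; the model still reads it as text.
    plan_str = json.dumps(plan)[:300_000]
    lint_str = json.dumps(linters)[:20_000]

    review = call_openai(plan_str, lint_str)
    post_pr_comment(args.repo, args.pr, review)
    print("AI review posted successfully")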

if __name__ == "__main__":
    main()
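The checklist notes that a risk score makes the output machine-parseable if you later want to auto-block CRITICAL findings. A minimal sketch of that parsing step follows, assuming the model echoes a line such as RISK_LEVEL: HIGH as the system prompt requests. The helper names parse_risk_level and should_block are hypothetical, and the model is not guaranteed to follow the format, so unparseable output is treated as advisory rather than blocking.

```python
import re
from typing import Optional

RISK_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Tolerates markdown decoration such as "**RISK_LEVEL:** HIGH"
_RISK_RE = re.compile(r"RISK_LEVEL\W*(LOW|MEDIUM|HIGH|CRITICAL)\b", re.IGNORECASE)

def parse_risk_level(review_text: str) -> Optional[str]:
    """Return the first risk level found in the review, or None if absent."""
    match = _RISK_RE.search(review_text)
    return match.group(1).upper() if match else None

def should_block(review_text: str, threshold: str = "CRITICAL") -> bool:
    """True when the parsed risk level is at or above the threshold."""
    level = parse_risk_level(review_text)
    if level is None:
        return False  # unparseable output stays advisory
    return RISK_ORDER.index(level) >= RISK_ORDER.index(threshold)

if __name__ == "__main__":
    sample = "RISK_LEVEL: CRITICAL\nDESTRUCTIVE_CHANGES: aws_db_instance.main"
    print(parse_risk_level(sample), should_block(sample))
```

In a workflow, the script could exit non-zero when should_block returns True, but only once the calibration period described later has shown the scores to be trustworthy.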

Commonly Missed Items

These are the gaps that bite teams six months after they think the setup is complete.

AI models hallucinate provider attributes. This is the biggest one. AWS provider v5.x introduced breaking changes: renamed attributes, removed arguments, new required fields. Models trained before mid-2023 don’t know about these. When the AI tells you to add acl = "private" to an S3 bucket resource, it’s giving outdated advice: that argument has been deprecated since AWS provider v4.0 in favor of the separate aws_s3_bucket_acl resource. The fix: always pin provider versions with ~> 5.0 in your required_providers block, and instruct the AI in the system prompt to validate suggestions against the actual plan output, not just HCL syntax. If the plan succeeds, the attributes are valid. If the AI contradicts the plan, trust the plan.

Watch out for: .tfvars and .auto.tfvars files being excluded from AI context. This is where secrets and environment-specific values live: database passwords passed as variables, account IDs, CIDR blocks that reveal internal network topology. These files are often left out of the AI context either by accident (the workflow only globs *.tf) or intentionally (to avoid sending secrets to the API). The problem is that the AI then reviews IAM policies and security group rules without knowing which variables feed them. The solution is to redact sensitive values before including the files in the prompt (for example with sops --decrypt | sed 's/=.*/=REDACTED/'), not to exclude the files entirely.
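A one-line sed handles flat name = value files but leaks the inner lines of multi-line lists and maps. Here is a slightly more careful sketch in Python, with loud caveats: the function redact_tfvars is hypothetical, bracket counting ignores brackets inside quoted strings, and heredoc values are not handled at all. Treat it as a starting point, not a guarantee that nothing sensitive reaches the API.

```python
import re

# Matches "name = value" and captures the left-hand side and the value
ASSIGNMENT = re.compile(r"^(\s*[A-Za-z_][\w-]*\s*=\s*)(.*)$")

def _bracket_delta(text: str) -> int:
    """Opening minus closing brackets (naive: ignores quoting)."""
    opens = sum(text.count(c) for c in "[{(")
    closes = sum(text.count(c) for c in "]})")
    return opens - closes

def redact_tfvars(text: str) -> str:
    """Keep variable names, replace every value with "REDACTED".

    Continuation lines of multi-line lists and maps are dropped entirely.
    """
    out, depth = [], 0
    for line in text.splitlines():
        if depth > 0:
            # Still inside a multi-line value: drop the line
            depth = max(depth + _bracket_delta(line), 0)
            continue
        match = ASSIGNMENT.match(line)
        if match:
            out.append(f'{match.group(1)}"REDACTED"')
            depth = max(_bracket_delta(match.group(2)), 0)
        else:
            out.append(line)  # comments and blank lines pass through
    return "\n".join(out)

if __name__ == "__main__":
    sample = 'db_password = "hunter2"\nallowed_cidrs = [\n  "10.0.0.0/16",\n]\nenv = "prod"'
    print(redact_tfvars(sample))
```

The model still sees which variables exist and where they are set, which is usually enough to reason about how IAM policies and security group rules get populated.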

Module version pinning is silently skipped by most AI tools. A Terraform module sourced as source = "terraform-aws-modules/vpc/aws" without a version constraint will pull whatever is latest at init time. This causes silent drift between environments. Don’t assume your static tools cover it: tflint’s Terraform ruleset includes a terraform_module_version rule for this, so confirm it is enabled in your .tflint.hcl. The AI won’t flag it either unless you explicitly include “check for unpinned module versions” in your system prompt. Add it. I’ve seen a VPC module minor version bump change subnet behavior in production because nobody pinned it.
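If you want a deterministic check alongside the prompt, a rough scan is easy to write. The sketch below is regex-based and deliberately simple: it is not a real HCL parser, the function name unpinned_modules is hypothetical, and a nested argument named version inside a module body would hide a genuinely unpinned module. Local paths and VCS sources are skipped because they pin through other means, such as ?ref= on a Git URL.

```python
import re

MODULE_HEADER = re.compile(r'module\s+"([^"]+)"\s*\{')

def _block_body(text: str, brace_index: int) -> str:
    """Return the text between the brace at brace_index and its matching close."""
    depth = 0
    for i in range(brace_index, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[brace_index + 1:i]
    return text[brace_index + 1:]  # unbalanced input: take the remainder

def unpinned_modules(hcl: str) -> list:
    """Names of registry-sourced modules that lack a version argument."""
    flagged = []
    for header in MODULE_HEADER.finditer(hcl):
        body = _block_body(hcl, header.end() - 1)
        source = re.search(r'^\s*source\s*=\s*"([^"]+)"', body, re.MULTILINE)
        if not source:
            continue
        src = source.group(1)
        is_local = src.startswith(("./", "../"))
        is_vcs = ("::" in src or "://" in src
                  or src.startswith(("github.com/", "bitbucket.org/", "git@")))
        has_version = re.search(r"^\s*version\s*=", body, re.MULTILINE)
        if not is_local and not is_vcs and not has_version:
            flagged.append(header.group(1))
    return flagged

if __name__ == "__main__":
    example = 'module "vpc" {\n  source = "terraform-aws-modules/vpc/aws"\n}\n'
    print(unpinned_modules(example))
```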

Watch out for: tfsec references in older pipelines. tfsec was merged into trivy as of v0.21. If your CI image still references the tfsec binary directly and it’s been upgraded to a newer image, you can get a silent no-op: when the step ends in || true or runs without set -e, the shell reports “command not found” (exit 127), the script carries on, and the job stays green. Check your CI logs for actual tfsec output, not just a green checkmark.

The checkov skip annotation is itself a finding. When engineers add #checkov:skip=CKV_AWS_20:reason to suppress a finding, some AI configurations will flag the skip annotation as a new security concern without reading the reason. This creates noise. Include in your system prompt: “Treat checkov skip annotations as accepted risks if a reason is provided. Do not re-flag them as findings.”
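The prompt instruction only helps if a reason is actually present, and that part can be checked mechanically before the AI ever runs. A small sketch, assuming the #checkov:skip=ID:reason comment format shown above; the function name skips_without_reason is hypothetical, and the check IDs in the example are placeholders.

```python
import re

# Captures the check ID and an optional reason after the second colon
SKIP = re.compile(r"#\s*checkov:skip=([A-Z0-9_]+)(?::\s*(.*))?")

def skips_without_reason(hcl: str) -> list:
    """Return (line_number, check_id) for skip annotations with no reason."""
    missing = []
    for lineno, line in enumerate(hcl.splitlines(), start=1):
        match = SKIP.search(line)
        if match and not (match.group(2) or "").strip():
            missing.append((lineno, match.group(1)))
    return missing

if __name__ == "__main__":
    example = (
        'resource "aws_s3_bucket" "site" {\n'
        "  #checkov:skip=CKV_AWS_20:Public by design for static hosting\n"
        "  #checkov:skip=CKV_AWS_18\n"
        "}\n"
    )
    print(skips_without_reason(example))
```

Skips that come back from this check are fair game for the AI to flag; skips with a reason can be passed along as accepted risks.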

Automation Ideas

Once the checklist is solid, the next step is removing every manual touch from the process.

GitHub Actions with CodeRabbit or a custom OpenAI script. The workflow above handles the custom script path. For CodeRabbit, the setup is simpler: install the GitHub App, add a .coderabbit.yaml config file, and it hooks into PR events automatically. At the time of writing, the CodeRabbit free tier covers 200 files per month and paid plans start at $12/user/month; check current pricing before you budget. For most teams under 10 engineers, the free tier is sufficient for Terraform-only reviews. For larger teams or monorepos, the custom OpenAI script gives you more control over the system prompt and cost per call.

Atlantis pre/post-plan hooks. Atlantis added pre_workflow_hooks in v0.19.0. You can wire checkov as a pre-workflow hook and the Python AI review script to run after the plan. Note that workflow hooks are defined in the server-side repo config, not in the atlantis.yaml at repo root. For the plan-aware part, a custom workflow with a run step after plan is the better fit: Atlantis exposes the plan file path to run steps as the PLANFILE environment variable. Convert it with terraform show -json, pipe it through jq to extract changed resources, then pass the result to the AI script. This approach keeps everything server-side and avoids GitHub Actions runner costs for plan generation.

Cost management for large plans. OpenAI API calls on large JSON plan files with 500+ resources cost between $0.08 and $0.40 per PR. That adds up fast on active repos. Two mitigations: first, use the jq filter to send only changed resources, not the full plan. Second, cache plan outputs by content hash — if the plan JSON hasn’t changed between two PR pushes (force-push with no IaC changes), skip the API call entirely. Implement this with a sha256sum check in your workflow before the OpenAI step.
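The content-hash idea can be sketched in a few lines of Python as an alternative to a shell sha256sum check. The assumptions here are mine: the function names are hypothetical, and the cache file has to survive between workflow runs (for example through your CI's cache mechanism), which this sketch does not set up. The digest is recorded only after a successful review so that a failed API call does not suppress the next attempt.

```python
import hashlib
import json
from pathlib import Path

def plan_digest(plan_diff) -> str:
    """SHA-256 of the plan diff, serialized canonically so key order is irrelevant."""
    canonical = json.dumps(plan_diff, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def needs_review(plan_diff, cache_file: Path) -> bool:
    """True unless the cached digest matches the current plan diff."""
    if not cache_file.exists():
        return True
    return cache_file.read_text().strip() != plan_digest(plan_diff)

def record_review(plan_diff, cache_file: Path) -> None:
    """Store the digest once the review has been posted successfully."""
    cache_file.write_text(plan_digest(plan_diff))

if __name__ == "__main__":
    diff = [{"address": "aws_s3_bucket.logs", "actions": ["update"]}]
    cache = Path(".ai-review-digest")  # hypothetical cache location
    if needs_review(diff, cache):
        print("plan changed: call the API, then record_review()")
    else:
        print("plan unchanged: skip the API call")
```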

Security note on API calls. Sending your infrastructure topology to a third-party LLM is a real concern for regulated environments. Your plan JSON contains resource names, CIDR blocks, account IDs, and IAM policy structures, and it can include sensitive attribute values in plain text. For HIPAA or PCI environments, use a self-hosted model (Ollama with CodeLlama 34B works reasonably well) or Azure OpenAI with data residency guarantees. Store OPENAI_API_KEY and any other provider keys as encrypted CI secrets, not repo variables. Repo variables are stored in plain text and are not masked in GitHub Actions logs. That’s not a theoretical risk; it’s a common misconfiguration.

Calibration period before blocking. Run AI review as a non-blocking advisory check for two weeks. Export the AI findings to a spreadsheet. Categorize each as true positive, false positive, or noise. Tune your system prompt to eliminate the false positive patterns. Then promote to blocking. Skipping the calibration period is how you end up with a team that ignores AI review comments because they’ve been burned by too many false alarms. The 45–90 second CI overhead is worth it — but only if the signal is trusted.