How to Fail CI Pipelines on AWS Config NON_COMPLIANT Resources

Your Terraform apply exits 0, your pipeline goes green, and eleven minutes later AWS Config quietly flags a public S3 bucket in production. That gap between a clean CI run and a compliance violation is exactly where incidents live. We hit this exact scenario on a Wednesday afternoon, and it took us longer to explain to the security team than it did to actually fix the bucket. This article shows how we closed that window permanently by wiring an AWS Config CI pipeline gate directly into our GitHub Actions workflow.

The Problem We Hit

It started with a routine infrastructure PR. A developer was adding a new S3 bucket for a data pipeline — nothing exotic, just a bucket with some lifecycle rules and a bucket policy. The PR went through code review. Terraform plan output looked clean. The security group changes were reviewed. The linter passed. Everything was green.

The PR merged. terraform apply completed in production at 14:32 UTC. At 14:43 UTC — eleven minutes later — a Slack alert fired from our AWS Config notification pipeline: S3_BUCKET_PUBLIC_READ_PROHIBITED: NON_COMPLIANT for the new bucket. The developer had set BlockPublicAcls: false inside a conditional block that only activated in the production workspace. The staging run never triggered it. Code review missed it because the conditional made it non-obvious.

Eleven minutes. That’s the damage window. In that window, the bucket existed in a non-compliant state, publicly accessible, with no automatic remediation configured. We locked it down manually within four minutes of the alert, but the window was already open.

Here’s the core tension: AWS Config is a detective control by default. It tells you what happened after the fact. It does not stop the deployment. It does not fail the pipeline. It sends you a notification while your non-compliant resource is already sitting in production doing whatever damage it’s capable of doing. Our CI pipeline had no idea AWS Config existed. The two systems were completely decoupled — and that decoupling was the real vulnerability.

We’d seen this pattern before with other teams, but assumed our review process was tight enough. It wasn’t. No review process is tight enough to catch every edge case in every environment. You need automated gates, not just human eyes.

Why It Happens

The architectural gap here is straightforward once you see it, but it’s invisible until you get burned.

AWS Config evaluates resources by listening to CloudTrail change events. When terraform apply calls the S3 API to create or modify a bucket, CloudTrail records the event, Config picks it up, and then runs the relevant rule evaluations. That entire chain — CloudTrail event delivery, Config ingestion, rule evaluation — introduces a latency of roughly 2 to 10 minutes post-deployment. The AWS::S3::Bucket change trigger fires after your IaC tool has already exited successfully.

Meanwhile, your CI pipeline — whether it’s GitHub Actions, GitLab CI, or CircleCI — has no native awareness of Config rule state. From the pipeline’s perspective, the infrastructure job is done the moment terraform apply returns exit code 0. There is no built-in “wait for compliance evaluation” step in any standard pipeline template. The pipeline moves on, posts a green checkmark, and the compliance evaluation happens in a completely separate async process that nobody is watching in real time.

The most common workaround I’ve seen is manually running aws configservice describe-compliance-by-config-rule after a deploy and eyeballing the output. That’s not a gate; it’s a manual audit step that gets skipped under deadline pressure. And there’s a critical gotcha here: describe-compliance-by-config-rule gives you a rule-level summary. A rule reports NON_COMPLIANT as soon as any one resource it evaluates fails, so with 50 compliant buckets and 1 non-compliant one the summary says NON_COMPLIANT without telling you which bucket is at fault. It cannot distinguish the resource you just deployed from a long-standing violation elsewhere in the account, and until Config has evaluated the new bucket the summary reflects only the older resources. You need get-compliance-details-by-config-rule to see per-resource state. More on that in the fix section.
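For comparison, here is a minimal sketch of the two calls side by side. The function names are ours and purely illustrative; the JMESPath filter is the same shape the gate script uses to pick one resource out of the rule's evaluation results.

```bash
#!/usr/bin/env bash
# Sketch: rule-level summary vs. per-resource detail.
# Function names are hypothetical helpers, not part of any AWS tooling.

# Build the JMESPath filter that selects one resource's verdict from the
# EvaluationResults list returned by get-compliance-details-by-config-rule.
resource_query() {
  local resource_id="$1"
  printf "EvaluationResults[?EvaluationResultIdentifier.EvaluationResultQualifier.ResourceId=='%s'].ComplianceType" "$resource_id"
}

# Rule-level verdict: one value for the whole rule, no resource names.
rule_summary() {
  aws configservice describe-compliance-by-config-rule \
    --config-rule-names "$1" \
    --query 'ComplianceByConfigRules[0].Compliance.ComplianceType' \
    --output text
}

# Per-resource verdict: empty output means no result is recorded yet.
resource_verdict() {
  aws configservice get-compliance-details-by-config-rule \
    --config-rule-name "$1" \
    --query "$(resource_query "$2")" \
    --output text
}
```

Calling rule_summary with a rule name prints a single verdict for the rule, while resource_verdict with the same rule name and a bucket name prints the verdict for that bucket only.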

There’s also a lesser-known feature most teams don’t use: AWS Config’s Proactive evaluation mode, which went GA in November 2022. Proactive mode lets you call StartResourceEvaluation with a resource configuration payload before deployment. The call returns an evaluation ID, and GetResourceEvaluationSummary then reports the compliance verdict for that ID once the evaluation finishes. It’s genuinely useful, but it requires you to construct the resource configuration JSON manually or extract it from your Terraform plan, which adds complexity. We use the polling approach as our primary gate and reserve proactive evaluation for a future iteration. I’ll mention the API in the prevention section.

One more thing worth calling out: some teams disable the AWS Config recorder to cut costs. Config charges $0.003 per configuration item recorded. Across a busy account that adds up, so teams turn the recorder off. If your recorder is off, or scoped to exclude certain resource types, your rules will never fire — and you’ll get INSUFFICIENT_DATA forever. Always verify with aws configservice describe-configuration-recorders that allSupported: true is set.
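A recorder check can run as a pre-flight step before the gate. The sketch below is an assumption about how you might wire it, with hypothetical function names; it relies on the CLI's text output rendering booleans as True or False.

```bash
#!/usr/bin/env bash
# Sketch: verify the Config recorder is running and records all supported types.

# Pure check on the two text values printed by the CLI ("True" or "False").
recorder_ready() {
  local all_supported="$1" recording="$2"
  [ "$all_supported" = "True" ] && [ "$recording" = "True" ]
}

# Query the first recorder in the current region and report problems on stderr.
check_recorder() {
  local all_supported recording
  all_supported=$(aws configservice describe-configuration-recorders \
    --query 'ConfigurationRecorders[0].recordingGroup.allSupported' \
    --output text)
  recording=$(aws configservice describe-configuration-recorder-status \
    --query 'ConfigurationRecordersStatus[0].recording' \
    --output text)
  if ! recorder_ready "$all_supported" "$recording"; then
    echo "Config recorder not ready (allSupported=$all_supported, recording=$recording)" >&2
    return 1
  fi
}
```

A recorder with an explicit resource type list will fail this check even if it covers the types you deploy, so treat a failure as a prompt to inspect the recording group rather than as proof of a gap.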

The Fix (with Code)

The solution is a post-deploy polling script that sits between terraform apply and the pipeline’s success state. It calls get-compliance-details-by-config-rule in a loop, waits for Config’s evaluation to complete, and exits non-zero if anything comes back NON_COMPLIANT. The pipeline only goes green if every targeted rule passes for the deployed resource.

This script — .github/scripts/config-gate.sh — handles three states: NON_COMPLIANT (hard fail, exit 1), INSUFFICIENT_DATA (keep polling, evaluation hasn’t fired yet), and COMPLIANT (gate passed, exit 0). It also handles timeout separately with exit 2, so you can distinguish a compliance failure from an infrastructure or API failure in your alerting.

Watch out for this: setting your poll timeout to 30 seconds is a trap. Teams do this, see INSUFFICIENT_DATA, assume compliant, and exit 0. Config evaluation latency is real — minimum reliable timeout is 180 seconds, and we default to 300 seconds in production. Don’t fight the latency; work with it.

#!/usr/bin/env bash
# .github/scripts/config-gate.sh
# Polls AWS Config compliance for specified rules after terraform apply.
# Usage: CONFIG_RULES="RULE1,RULE2" RESOURCE_ID="my-bucket" ./config-gate.sh
# Exit 0 = all rules compliant, Exit 1 = NON_COMPLIANT found,
# Exit 2 = timeout, missing input, or AWS API error

set -euo pipefail

# --- Configuration ---
RULES="${CONFIG_RULES:-}"                  # Comma-separated rule names from env
RESOURCE_ID="${RESOURCE_ID:-}"            # Specific resource to check (optional filter)
TIMEOUT="${CONFIG_GATE_TIMEOUT:-300}"     # Max wait in seconds (default 5 min)
POLL_INTERVAL=15                          # Seconds between polls
ELAPSED=0

if [[ -z "$RULES" ]]; then
  echo "ERROR: CONFIG_RULES environment variable is not set." >&2
  exit 2
fi

IFS=',' read -ra RULE_ARRAY <<< "$RULES"

echo "==> Starting AWS Config compliance gate"
echo "    Rules   : ${RULES}"
echo "    Timeout : ${TIMEOUT}s"
echo "    Resource: ${RESOURCE_ID:-(all resources)}"

# Prints exactly one word on stdout: NON_COMPLIANT, COMPLIANT, or INSUFFICIENT_DATA.
# Log lines go to stderr so the caller's command substitution captures only the status.
check_rule() {
  local rule_name="$1"
  local query result

  if [[ -n "$RESOURCE_ID" ]]; then
    # Scoped check: every verdict recorded for this one resource
    query="EvaluationResults[?EvaluationResultIdentifier.EvaluationResultQualifier.ResourceId=='${RESOURCE_ID}'].ComplianceType"
  else
    # Unscoped check: every verdict recorded under this rule
    query="EvaluationResults[].ComplianceType"
  fi

  # No --compliance-types filter here: COMPLIANT results must be visible too,
  # otherwise "passed" and "not evaluated yet" both come back as empty output.
  if ! result=$(aws configservice get-compliance-details-by-config-rule \
      --config-rule-name "$rule_name" \
      --query "$query" \
      --output text); then
    echo "ERROR: AWS API call failed for rule '$rule_name'." >&2
    return 2
  fi

  echo "    [$rule_name] Result: ${result:-EMPTY}" >&2

  # Caveat: a verdict recorded before this deploy also counts. For resources
  # that already existed, compare ResultRecordedTime with the deploy start.
  if [[ "$result" == *NON_COMPLIANT* ]]; then
    echo "NON_COMPLIANT"
  elif [[ "$result" == *COMPLIANT* || "$result" == *NOT_APPLICABLE* ]]; then
    echo "COMPLIANT"
  else
    # Empty output: Config has not recorded a result yet
    echo "INSUFFICIENT_DATA"
  fi
}

# --- Poll loop ---
while [[ $ELAPSED -lt $TIMEOUT ]]; do
  ALL_COMPLIANT=true

  for rule in "${RULE_ARRAY[@]}"; do
    status=$(check_rule "$rule") || exit 2   # API failure is not a compliance verdict

    if [[ "$status" == "NON_COMPLIANT" ]]; then
      echo "FAIL: Rule '$rule' returned NON_COMPLIANT for resource '${RESOURCE_ID:-(any)}'"
      exit 1  # Hard fail: do not continue polling
    fi

    if [[ "$status" == "INSUFFICIENT_DATA" ]]; then
      # No result yet means the evaluation hasn't fired; keep waiting
      echo "INFO: Rule '$rule' not yet evaluated (INSUFFICIENT_DATA). Waiting..."
      ALL_COMPLIANT=false
    fi
  done

  if [[ "$ALL_COMPLIANT" == "true" ]]; then
    echo "==> All Config rules COMPLIANT. Gate passed."
    exit 0
  fi

  sleep "$POLL_INTERVAL"
  ELAPSED=$((ELAPSED + POLL_INTERVAL))
  echo "    Elapsed: ${ELAPSED}s / ${TIMEOUT}s"
done

# Timeout reached without a definitive compliance result
echo "ERROR: Config gate timed out after ${TIMEOUT}s. Treating as gate failure." >&2
exit 2

Now wire the script into your GitHub Actions workflow as a step that runs directly after terraform apply. The workflow below uses OIDC authentication — if you’re still using static IAM access keys in CI, that’s a separate problem worth fixing. We covered the OIDC migration pattern in detail over at kuryzhev.cloud.

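For the gate itself, the IAM role assumed by the pipeline needs only read access to Config: config:GetComplianceDetailsByConfigRule for the polling script, plus config:DescribeComplianceByConfigRule if you also want the rule-level summary in your logs. Add config:StartResourceEvaluation only when you adopt proactive evaluation. These sit alongside whatever permissions Terraform needs to deploy. Do not attach AdministratorAccess to your CI role: if the pipeline is compromised, write access to Config (config:PutConfigRule, config:DeleteConfigRule) becomes a privilege escalation vector, because an attacker can delete or rewrite the very rules that gate the deployment. Least privilege is not optional here.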
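As a rough sketch of what the read-only part of that policy could look like, the helper below prints a policy document containing only the two read actions. The function name, the Sid, and the wildcard resource are our assumptions; tighten the resource to specific rule ARNs if your setup allows it.

```bash
#!/usr/bin/env bash
# Sketch: print a read-only IAM policy document for the Config gate.
# Names are illustrative; add config:StartResourceEvaluation only if you
# adopt proactive evaluation.
gate_policy_json() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ConfigGateReadOnly",
      "Effect": "Allow",
      "Action": [
        "config:GetComplianceDetailsByConfigRule",
        "config:DescribeComplianceByConfigRule"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}
```

You could write the output to a file with gate_policy_json > config-gate-policy.json and attach it with aws iam put-role-policy, passing your own role name, a policy name, and --policy-document file://config-gate-policy.json.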

# .github/workflows/deploy-infra.yml
# GitHub Actions workflow: Terraform deploy + AWS Config compliance gate
# Requires: AWS_ROLE_ARN secret with least-privilege Config read permissions

name: Deploy Infrastructure

on:
  push:
    branches: [main]
    paths: ['infra/**']

permissions:
  id-token: write   # Required for OIDC token exchange with AWS
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      AWS_REGION: us-east-1
      TF_WORKSPACE: production

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.5"
          terraform_wrapper: false   # the wrapper can add extra text to captured `terraform output` stdout

      - name: Terraform Apply
        id: tf_apply
        working-directory: infra/
        run: |
          terraform init
          terraform apply -auto-approve -var="env=production"
          # Capture the bucket name output for scoped Config evaluation
          echo "BUCKET_ID=$(terraform output -raw s3_bucket_id)" >> "$GITHUB_ENV"

      - name: AWS Config Compliance Gate
        # This step runs AFTER apply and blocks the pipeline on NON_COMPLIANT
        env:
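          # Use the rule names as deployed in this account. Every rule listed must
          # evaluate the scoped resource type (here, S3 buckets); a rule such as
          # ENCRYPTED_VOLUMES never returns a result for a bucket, so the gate
          # would wait until timeout.
          CONFIG_RULES: "S3_BUCKET_PUBLIC_READ_PROHIBITED,S3_BUCKET_SSL_REQUESTS_ONLY"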
          RESOURCE_ID: ${{ env.BUCKET_ID }}
          CONFIG_GATE_TIMEOUT: "300"
        run: |
          chmod +x .github/scripts/config-gate.sh
          .github/scripts/config-gate.sh
        # continue-on-error defaults to false — pipeline halts on exit 1 or exit 2

      - name: Notify on Gate Failure
        if: failure()
        run: |
          echo "Config gate failed. Review NON_COMPLIANT resources:"
          echo "https://console.aws.amazon.com/config/home#/rules"

Watch out for this second gotcha: make sure the rules you gate on are change-triggered, not periodic. Periodic rules run on a schedule (every 1, 3, 6, 12, or 24 hours), so a periodic-only rule is very unlikely to fire within your polling window. Only change-triggered rules respond to resource change events in near-real-time. Check each rule's trigger type in the AWS Console, and compare it against the AWS Config evaluation modes documentation, before you trust any rule as a CI gate.

You’ll also want AWS CLI version 2.13.0 or later for the --query filter syntax used in the polling script. Earlier versions handle the JMESPath filtering inconsistently and you’ll get unexpected empty results that the script interprets as INSUFFICIENT_DATA.
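If you want the pipeline to enforce that minimum rather than rely on the runner image, a small version check can run before the gate. This is a sketch with hypothetical helper names; it assumes the usual aws --version output format that starts with aws-cli/ followed by the version number, and it compares numeric major.minor.patch fields only.

```bash
#!/usr/bin/env bash
# Sketch: fail fast when the AWS CLI is older than a required version.

# Extract "2.15.30" from a string like "aws-cli/2.15.30 Python/3.11.8 ...".
parse_cli_version() {
  local raw="$1"
  raw="${raw#aws-cli/}"
  printf '%s\n' "${raw%% *}"
}

# Return 0 when version $1 >= version $2, comparing up to three numeric fields.
version_ge() {
  local have="$1" want="$2" h w i
  for i in 1 2 3; do
    h="${have%%.*}"; w="${want%%.*}"
    if [ "${h:-0}" -gt "${w:-0}" ]; then return 0; fi
    if [ "${h:-0}" -lt "${w:-0}" ]; then return 1; fi
    # Drop the field just compared; an exhausted string becomes empty.
    case "$have" in *.*) have="${have#*.}" ;; *) have="" ;; esac
    case "$want" in *.*) want="${want#*.}" ;; *) want="" ;; esac
  done
  return 0
}

# Read the installed CLI version and compare it with the required minimum.
require_cli_version() {
  local min="$1" current
  current=$(parse_cli_version "$(aws --version 2>&1)")
  if ! version_ge "$current" "$min"; then
    echo "AWS CLI $current is older than required $min" >&2
    return 1
  fi
}
```

Calling require_cli_version 2.13.0 as the first line of the gate step stops the job before any polling starts.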

Prevention Checklist

The script above solves the immediate problem. These checklist items make the solution durable across your organization — so it doesn’t quietly break six months from now when someone changes a rule configuration.

Tag Config rules with ci-gate: true. The polling script should only target rules that are explicitly designated as CI gates. Not every Config rule is appropriate for blocking deployments — some rules are informational, some have known exceptions, some apply to resource types your pipeline doesn’t touch. Scoping by tag prevents false positives from unrelated rules firing during the same evaluation window. Store the rule list in SSM Parameter Store at /cicd/config-gate/rules so it can be updated without a pipeline code change.

Store the timeout in SSM Parameter Store. The default 300-second timeout works for most environments, but accounts with high Config evaluation load may need longer windows. Put the timeout at /cicd/config-gate/timeout and pull it in the workflow rather than hardcoding it. This also gives your security team a single place to audit and adjust gate parameters.
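Here is one way the workflow could pull both settings, sketched under a few assumptions: the helper names are ours, the 180-second floor comes from the latency guidance earlier in this article, and the CI role would additionally need ssm:GetParameter on the two parameter paths.

```bash
#!/usr/bin/env bash
# Sketch: load gate settings from SSM Parameter Store with safe fallbacks.

MIN_TIMEOUT=180   # floor, so a bad parameter value cannot shrink the window

# Validate a timeout: fall back to the default when the value is empty or
# not a non-negative integer, then clamp anything below the floor.
resolve_timeout() {
  local value="$1" default="$2"
  case "$value" in
    ''|*[!0-9]*) value="$default" ;;
  esac
  if [ "$value" -lt "$MIN_TIMEOUT" ]; then
    value="$MIN_TIMEOUT"
  fi
  printf '%s\n' "$value"
}

# Read one parameter; print nothing if it is missing or the call fails.
fetch_param() {
  aws ssm get-parameter --name "$1" \
    --query 'Parameter.Value' --output text 2>/dev/null || true
}

# Export the variables that config-gate.sh reads from its environment.
load_gate_settings() {
  CONFIG_RULES=$(fetch_param /cicd/config-gate/rules)
  CONFIG_GATE_TIMEOUT=$(resolve_timeout "$(fetch_param /cicd/config-gate/timeout)" 300)
  export CONFIG_RULES CONFIG_GATE_TIMEOUT
}
```

Clamping instead of failing is a design choice: a mistyped parameter degrades to a safe window rather than blocking every deploy.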

Verify your Config recorder covers all relevant resource types. Run aws configservice describe-configuration-recorders and confirm allSupported: true is set, or that your explicit resource type list includes every type your pipelines deploy. A recorder scoped to exclude AWS::S3::Bucket will produce INSUFFICIENT_DATA forever and your gate will always time out.

Explore proactive evaluation for shift-left coverage. The StartResourceEvaluation API, part of AWS Config’s proactive evaluation mode, lets you submit a resource configuration payload before deployment. It returns an evaluation ID, and GetResourceEvaluationSummary then reports the compliance verdict for that ID. This is genuinely the right long-term direction: catch violations before terraform apply runs, not after. The Terraform AWS provider version 5.0 and later supports the evaluation_mode block on aws_config_config_rule resources for enabling proactive mode. See the AWS Config proactive evaluation documentation for the API shape.
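To give a feel for the API shape, here is a minimal sketch for an S3 bucket. The helper names are hypothetical and the property set is deliberately tiny. We assume the resource configuration follows the CloudFormation schema for AWS::S3::Bucket and that the rules you care about have proactive mode enabled; only a subset of managed rules supports it, so check the documentation before relying on this.

```bash
#!/usr/bin/env bash
# Sketch: proactive evaluation of an S3 bucket configuration before apply.

# Build the --resource-details JSON. ResourceConfiguration is a JSON *string*
# holding CloudFormation-schema properties, so its inner quotes are escaped.
resource_details_json() {
  local bucket="$1" block="$2"   # block: true or false
  local inner
  inner=$(printf '{"BucketName":"%s","PublicAccessBlockConfiguration":{"BlockPublicAcls":%s,"IgnorePublicAcls":%s,"BlockPublicPolicy":%s,"RestrictPublicBuckets":%s}}' \
    "$bucket" "$block" "$block" "$block" "$block" | sed 's/"/\\"/g')
  printf '{"ResourceId":"%s","ResourceType":"AWS::S3::Bucket","ResourceConfiguration":"%s","ResourceConfigurationSchemaType":"CFN_RESOURCE_SCHEMA"}\n' \
    "$bucket" "$inner"
}

# Submit the evaluation and print the evaluation ID it returns.
start_proactive_evaluation() {
  aws configservice start-resource-evaluation \
    --evaluation-mode PROACTIVE \
    --resource-details "$(resource_details_json "$1" "$2")" \
    --query 'ResourceEvaluationId' --output text
}

# Print the evaluation status and compliance verdict for an evaluation ID.
evaluation_summary() {
  aws configservice get-resource-evaluation-summary \
    --resource-evaluation-id "$1" \
    --query '[EvaluationStatus.Status, Compliance]' --output text
}
```

In a pipeline you would call start_proactive_evaluation with the planned bucket name and settings, then poll evaluation_summary with the returned ID until the status leaves IN_PROGRESS.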

Audit CI role permissions quarterly. The role that runs your Config gate needs read-only access: config:GetComplianceDetailsByConfigRule and config:DescribeComplianceByConfigRule. Review attached policies every quarter. IAM role scope creep is real — someone adds a permission “temporarily” and it stays forever.

Set up a separate alerting path for exit code 2 (timeout). Timeout failures are infrastructure problems, not compliance violations. Route them to a different Slack channel or PagerDuty policy so your security team isn’t chasing phantom violations when the real issue is a misconfigured recorder or an AWS API throttle event.
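The routing itself can be a few lines wrapped around the gate. In the sketch below the channel names are placeholders and the wrapper function is our own invention; only the exit code meanings come from the gate script.

```bash
#!/usr/bin/env bash
# Sketch: map the gate's exit code to an alert route.
# Channel names are placeholders for your own Slack or PagerDuty targets.

route_for_exit_code() {
  case "$1" in
    0) echo "none" ;;                    # gate passed, nothing to send
    1) echo "#security-compliance" ;;    # real NON_COMPLIANT finding
    2) echo "#platform-infra" ;;         # timeout, recorder, or API problem
    *) echo "#platform-infra" ;;         # unexpected failure, treat as infra
  esac
}

# Run the gate, print the chosen route, and preserve the gate's exit code
# so the pipeline still fails when the gate fails.
run_gate() {
  local code=0
  .github/scripts/config-gate.sh || code=$?
  echo "route=$(route_for_exit_code "$code")"
  return "$code"
}
```

A later workflow step can read the printed route and post to the matching channel, which keeps compliance findings and infrastructure noise apart.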

The AWS Config CI pipeline gate pattern won’t catch everything — no single control does. But closing the 2-to-10-minute window between a Terraform apply and a compliance evaluation is one of the highest-leverage changes you can make to your infrastructure security posture. We’ve been running this in production for several months now, and it’s caught three real violations before they had time to matter.
