AWS From Scratch with Terraform - Setting up your Root Account for IaC with Terraform Cloud and Github actions.
Following this article will get you set up with an AWS root account that can be managed through Terraform Cloud with OIDC and GitHub Actions. As a best practice, you should not keep long-lived access keys in your CI/CD pipelines when deploying to AWS. Instead, use OIDC (OpenID Connect) to deploy securely from Terraform Cloud or GitHub Actions.
TL;DR
Download all the terraform from the blog post here:
- https://github.com/sontek/aws-apply-before-merge
- https://github.com/sontek/aws-terraform-github-actions
How does OIDC work
OIDC enables us to request a short-lived access token directly from AWS. We just have to create a trust relationship that controls which workflows are able to request the access tokens.
- No need to duplicate AWS credentials as long-lived GitHub secrets.
- Since we are using a short-lived access token that is only valid for a single job there is no reason to worry about rotating secrets.
The following steps give an overview of how we can use Terraform Cloud's OIDC provider to integrate with AWS:
- In AWS, create an OIDC trust between a role and our terraform cloud workflow(s) that need access to the cloud.
- Every time a job runs, TFC's OIDC Provider auto-generates an OIDC token. This token contains multiple claims to establish a security-hardened and verifiable identity about the specific workflow that is trying to authenticate.
- Request this token from TFC's OIDC provider, and present it to AWS
- Once AWS successfully validates the claims presented in the token, it then provides a short-lived cloud access token that is available only for the duration of the job.
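To make the claim check in the steps above concrete, here is a minimal sketch of how a wildcard condition on the token's subject claim behaves. The subject format is the one used by the trust policy later in this post; the function name sub_matches and the example organization and workspace values are assumptions for illustration, and a shell glob only approximates the wildcard matching AWS performs.

```shell
# Hypothetical helper: approximate an IAM "StringLike" comparison with a
# shell glob. Returns 0 when the subject claim matches the pattern.
sub_matches() {
  # $1 = pattern from the trust policy, $2 = subject claim from the token
  case "$2" in
    $1) return 0 ;;
    *) return 1 ;;
  esac
}

# Example values (assumptions): organization "sontek", workspace "root".
pattern='organization:sontek:project:Default Project:workspace:root:run_phase:*'
plan_sub='organization:sontek:project:Default Project:workspace:root:run_phase:plan'
other_sub='organization:sontek:project:Default Project:workspace:other:run_phase:plan'

# A run in the trusted workspace matches, in any run phase.
sub_matches "$pattern" "$plan_sub" && echo "plan run: allowed"
# A run from a different workspace does not match.
sub_matches "$pattern" "$other_sub" || echo "other workspace: denied"
```

The point of the sketch is that the role is tied to specific workspaces: a token minted for any other workspace carries a different subject and is rejected.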
What does this post accomplish
- Set up a root AWS account that is managed through Terraform
- Set up OIDC authentication with Terraform Cloud so it can talk to AWS
- Set up GitHub Actions authentication with Terraform Cloud so we can run plan and apply through the CI/CD pipeline.
Setup AWS Access
It is very bad practice to use the root account for much of anything, but it is necessary for bootstrapping the account, so the first step is to get your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
To do this, click your account in the top right and choose Security Credentials.
Then choose Create Access key.
You need to set these environment variables in your shell so that your local shell has access to AWS. After you set them, you can verify you set them correctly by running:
❯ aws sts get-caller-identity
and you should get a result similar to:
{
"UserId": "777777777777",
"Account": "888888888888",
"Arn": "arn:aws:iam::888888888888:root"
}
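As a minimal sketch, the two variables could be exported like this before running the check above. The key values are the placeholder examples used in AWS documentation, not real credentials, and require_env is a hypothetical helper, not part of the AWS CLI.

```shell
# Placeholder values (assumptions): replace with the access key you created.
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

# Hypothetical guard: fail early if any named variable is unset or empty.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
}

require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
  && echo "credentials are set in this shell"
```

Exporting the variables only affects the current shell session, which suits a one-time bootstrap with root credentials.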
Bootstrap
Before you can manage any of your accounts through Terraform Cloud you'll need to bootstrap some core infrastructure like OIDC so Terraform Cloud can authenticate securely and manage AWS resources on your behalf.
I personally prefer doing this in two repositories:
- infra-bootstrap: This repository does the bare minimum to hook up Terraform Cloud with your AWS account and stores the state in git. It's the only infra that will not be controlled by your CI/CD pipeline.
- infra: The actual repository where all the rest of your AWS resources are managed. It will store state in Terraform Cloud and you can introduce a CI/CD pipeline for approving changes. Note: This repository will be generated with the terraform code.
After manually creating the git repository infra-bootstrap in your GitHub account, we will need 3 providers to bootstrap the account: aws, github, and tfe.
Variables
Create a 1-variables.tf where we can define the variables we'll need for creating these resources.
variable "tfc_aws_audience" {
type = string
default = "aws.workload.identity"
description = "The audience value to use in run identity tokens"
}
variable "tfc_hostname" {
type = string
default = "app.terraform.io"
description = "The hostname of the TFC or TFE instance you'd like to use with AWS"
}
variable "tfc_project_name" {
type = string
default = "Default Project"
description = "The project under which a workspace will be created"
}
variable "tfc_organization_name" {
type = string
description = "The name of your Terraform Cloud organization"
}
variable "tfc_organization_owner" {
type = string
description = "The owner of the TFC organization"
}
variable "tfc_workspaces" {
type = list(string)
description = "The list of TFC workspaces"
}
variable "github_organization" {
description = "The organization the repositories are owned by"
type = string
}
variable "github_repo_name" {
description = "The name of the git repository we'll create for managing infra"
type = string
}
variable "github_default_branch" {
description = "The default branch to utilize"
type = string
default = "main"
}
variable "aws_root_account_id" {
description = "The AWS root account we want to apply these changes to"
type = string
}
We will use these variables in the later modules, but they are mostly metadata about the Terraform and GitHub accounts you'll need to set up manually.
Providers
Create a file called 2-providers.tf and define the providers:
terraform {
required_providers {
tfe = {
source = "hashicorp/tfe"
version = "0.41.0"
}
aws = {
source = "hashicorp/aws"
version = "4.58.0"
}
github = {
source = "integrations/github"
version = "5.18.3"
}
}
}
provider "aws" {
region = "us-east-1"
# Root account, all other accounts should be provisioned
# via pull requests
allowed_account_ids = [var.aws_root_account_id]
}
provider "github" {
owner = var.github_organization
}
The key things there are that we define allowed_account_ids to prevent us from working against any account that isn't the root, and that we are using one of the variables we defined earlier.
Github
We will utilize terraform to create the second git repository where the rest of the infrastructure will go. Create a file called 3-github.tf:
resource "github_repository" "repo" {
name = var.github_repo_name
description = "Infrastructure Repository"
visibility = "private"
auto_init = true
has_issues = true
}
resource "github_branch_default" "default" {
repository = github_repository.repo.name
branch = var.github_default_branch
}
data "tfe_team" "owners" {
name = "owners"
organization = tfe_organization.organization.name
}
resource "tfe_team_token" "github_actions_token" {
team_id = data.tfe_team.owners.id
}
resource "github_actions_secret" "tfe_secret" {
repository = github_repository.repo.name
secret_name = "TFE_TOKEN"
plaintext_value = tfe_team_token.github_actions_token.token
}
output "repository_id" {
value = github_repository.repo.id
}
This will generate the new infra repository in your account, named after the github_repo_name variable.
For the terraform provider to have access to GitHub you need to create a new personal access token with full repo access and set it as an environment variable named GITHUB_TOKEN.
Terraform Cloud
Now we need to set up dynamic credentials so the Terraform Cloud agent is allowed to take actions on your behalf. To do this we'll set up an IAM role and an OIDC provider. Create a file called 4-tfc.tf:
resource "tfe_organization" "organization" {
name = var.tfc_organization_name
email = var.tfc_organization_owner
}
/* AWS will use this TLS certificate to verify that requests for dynamic
credentials come from Terraform Cloud.*/
data "tls_certificate" "tfc_certificate" {
url = "https://${var.tfc_hostname}"
}
/* Sets up an OIDC provider in AWS that trusts Terraform Cloud, identified
by the SHA1 fingerprint of its TLS certificate
*/
resource "aws_iam_openid_connect_provider" "tfc_provider" {
url = data.tls_certificate.tfc_certificate.url
client_id_list = [var.tfc_aws_audience]
thumbprint_list = [
data.tls_certificate.tfc_certificate.certificates[0].sha1_fingerprint
]
}
/* Policy to allow TFC to assume the AWS IAM role in our account */
data "aws_iam_policy_document" "assume_role" {
statement {
effect = "Allow"
principals {
type = "Federated"
identifiers = [aws_iam_openid_connect_provider.tfc_provider.arn]
}
condition {
test = "StringEquals"
variable = "${var.tfc_hostname}:aud"
values = [
"${one(aws_iam_openid_connect_provider.tfc_provider.client_id_list)}"
]
}
condition {
test = "StringLike"
variable = "${var.tfc_hostname}:sub"
values = [
for workspace in var.tfc_workspaces : "organization:${tfe_organization.organization.name}:project:${var.tfc_project_name}:workspace:${workspace}:run_phase:*"
]
}
actions = ["sts:AssumeRoleWithWebIdentity"]
}
}
resource "aws_iam_role" "tfc-agent" {
name = "tfc-agent"
assume_role_policy = data.aws_iam_policy_document.assume_role.json
}
/* Policy for what the TFC agent is allowed to do */
data "aws_iam_policy_document" "tfc-agent" {
version = "2012-10-17"
statement {
actions = ["*"]
effect = "Allow"
resources = ["*"]
}
}
resource "aws_iam_policy" "tfc-agent" {
name = "tfc-agent-access-policy"
description = "Access policy for the TFC agent"
policy = data.aws_iam_policy_document.tfc-agent.json
}
resource "aws_iam_role_policy_attachment" "tfc-access-attach" {
role = aws_iam_role.tfc-agent.name
policy_arn = aws_iam_policy.tfc-agent.arn
}
resource "tfe_workspace" "workspaces" {
count = length(var.tfc_workspaces)
name = var.tfc_workspaces[count.index]
organization = tfe_organization.organization.name
working_directory = var.tfc_workspaces[count.index]
}
/* These variables tell the agent to use dynamic credentials */
resource "tfe_variable" "tfc-auth" {
count = length(var.tfc_workspaces)
key = "TFC_AWS_PROVIDER_AUTH"
value = true
category = "env"
workspace_id = tfe_workspace.workspaces[count.index].id
description = "Enable dynamic auth on the TFC agents"
}
resource "tfe_variable" "tfc-role" {
count = length(var.tfc_workspaces)
key = "TFC_AWS_RUN_ROLE_ARN"
value = aws_iam_role.tfc-agent.arn
category = "env"
workspace_id = tfe_workspace.workspaces[count.index].id
description = "Tell TFC what Role to run as"
}
This module is dynamic because there is one piece that will require a manual oauth setup for github. So the first pass will apply without it and then later on we'll create it and run the apply again.
Applying the changes
Now we just need to define our settings for the module and we'll get our infrastructure applied. Create a file called settings.auto.tfvars and populate it with the content for your account. This is an example of what this should look like:
tfc_organization_name = "sontek"
tfc_organization_owner = "john@sontek.net"
# The workspaces you want to create and be able to manage with IaC
tfc_workspaces = [
"root"
]
# this can be your username
github_organization = "sontek"
github_repo_name = "sontek-infra"
aws_root_account_id = "888888888888"
Now run:
❯ terraform login
❯ terraform init
and you should see:
Terraform has been successfully initialized!
Now let's run our plan:
❯ terraform plan
You should see a result:
Plan: 10 to add, 0 to change, 0 to destroy.
Apply it to make those resources:
❯ terraform apply
At this point it:
- Created a terraform cloud organization
- Created a terraform cloud workspace
- Created a git repository
Verify TFC can talk to AWS
To verify that TFC can communicate with AWS through the dynamic credentials, let's clone the NEW repository we just generated and make some dummy resources. After you've cloned the repository, let's make a folder for the workspace root that we defined in bootstrap:
❯ mkdir root
❯ cd root
Now create a 1-providers.tf:
terraform {
cloud {
organization = "sontek"
workspaces {
name = "root"
}
}
required_providers {
aws = {
source = "hashicorp/aws"
version = "4.58.0"
}
tfe = {
source = "hashicorp/tfe"
version = "0.42.0"
}
}
}
provider "aws" {
region = "us-east-1"
default_tags {
tags = {
Owner = "john@sontek.net"
Env = "Root"
Service = "BusinessOperations"
}
}
}
NOTE: You should replace organization, workspaces.name, and tags.Owner with your own values.
Now create a small resource to prove everything is working. We'll use SQS for this. Create a file called 2-sqs.tf:
resource "aws_sqs_queue" "example-sqs" {
name = "example-sqs"
message_retention_seconds = 86400
receive_wait_time_seconds = 10
}
If you run the plan you should see the resource it wants to create:
❯ terraform init
❯ terraform plan
and you should see the run is executing in terraform cloud:
Running plan in Terraform Cloud. Output will stream here. Pressing Ctrl-C
will stop streaming the logs, but will not stop the plan running remotely.
You can click the link it provides to see the logs. Now let's apply this resource to see it all working:
❯ terraform apply
You should get a response like:
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
So Terraform Cloud has full access to create AWS resources! The final step is to get GitHub running the plan/apply on pull requests. Commit these files to your repository and we'll remove them in a pull request. Create a .gitignore file in the root:
.terraform*
and commit all the files:
❯ git add -A
❯ git commit -m "initial infra"
❯ git push origin HEAD
Github Actions
The two most popular workflows when using terraform are:
- Apply after Merge: This is the default for things like Terraform Cloud and most GitHub Actions.
- Apply before Merge: This is the default for things like Atlantis.
I don't like apply-after-merge. There are a lot of ways where a plan can succeed but an apply will fail, and you end up with broken configuration in main.
So in this article I'll show you how to implement apply-before-merge with github actions.
All of these changes will be in the infra repository that was generated from bootstrap. We are done with the bootstrap at this point.
First, let's set up the .github folder. The end result we want is:
.github/
└── workflows
    ├── on-apply-finished.yml
    ├── on-pull-request-labeled.yml
    └── on-pull-request.yml
So create the folders:
❯ mkdir -p .github/workflows
On Pull Request
The first flow we'll create is the terraform plan workflow, which should run whenever a pull request is opened. Create the file .github/workflows/on-pull-request.yml and put this content in it:
name: pr_build
on:
  pull_request:
    branches:
      - main
env:
  TERRAFORM_CLOUD_TOKENS: app.terraform.io=${{ secrets.TFE_TOKEN }}
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
jobs:
  terraform_validate:
    runs-on: ubuntu-22.04
    strategy:
      fail-fast: false
      matrix:
        folder:
          - root
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: terraform validate
        uses: dflook/terraform-validate@v1
        with:
          path: ${{ matrix.folder }}
          workspace: ${{ matrix.folder }}
  terraform_fmt:
    runs-on: ubuntu-22.04
    strategy:
      fail-fast: false
      matrix:
        folder:
          - root
    steps:
      - uses: actions/checkout@v3
      - name: terraform fmt
        uses: dflook/terraform-fmt-check@v1
        with:
          path: ${{ matrix.folder }}
          workspace: ${{ matrix.folder }}
  terraform_plan:
    runs-on: ubuntu-22.04
    permissions:
      contents: read
      pull-requests: write
    strategy:
      fail-fast: false
      matrix:
        folder:
          - root
    steps:
      - uses: actions/checkout@v3
      - name: terraform plan
        uses: dflook/terraform-plan@v1
        with:
          path: ${{ matrix.folder }}
          workspace: ${{ matrix.folder }}
This creates three jobs:
- terraform_validate: This validates the terraform via the terraform validate command to make sure that it is correct and doesn't have duplicate resources or anything like that.
- terraform_fmt: This verifies that the terraform is well formatted by running the terraform fmt command.
- terraform_plan: This runs the terraform plan and comments on the PR a diff of the changes for you to verify.
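If you want to catch failures before pushing, the three jobs can be roughly approximated locally. This is a sketch, not what the dflook actions run internally: check_commands is a hypothetical helper that only prints the Terraform CLI commands for a folder, and it assumes you have already run terraform login so the folder can reach its Terraform Cloud workspace.

```shell
# Hypothetical helper: print the local equivalents of the CI checks for one
# folder. Nothing is executed unless the output is piped to a shell.
check_commands() {
  folder="$1"
  # Roughly what the terraform_fmt job verifies.
  printf 'terraform fmt -check -recursive %s\n' "$folder"
  # validate and plan need an initialized working directory.
  printf 'terraform -chdir=%s init -input=false\n' "$folder"
  # Roughly what the terraform_validate job verifies.
  printf 'terraform -chdir=%s validate\n' "$folder"
  # Roughly what the terraform_plan job runs.
  printf 'terraform -chdir=%s plan -input=false\n' "$folder"
}

# Preview the commands for the "root" folder from the workflow matrix.
check_commands root

# To actually run them, stopping at the first failure:
# check_commands root | sh -e
```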
To verify this is working, let's delete root/2-sqs.tf, then push a branch and make a pull request to see the result so far:
❯ rm root/2-sqs.tf
❯ git add .github/ root/
❯ git checkout -b apply-before-merge
❯ git commit -m "Implemented on-pull-request"
❯ git push origin HEAD
After you make the pull request you should see 3 checks on it and a comment that shows the plan:
Apply on Label
So now that the plan is working we need some way to apply the changes. I've found the best way to do this is via a label rather than a comment because of the way GitHub Actions works. Its event-based triggers like on-comment aren't executed in the context of a pull request.
Since we will be using a label to signal a plan is ready to be applied, let's create a new file .github/workflows/on-pull-request-labeled.yml and provide this content:
name: pr_apply
on:
  pull_request:
    types: [ labeled ]
env:
  TERRAFORM_CLOUD_TOKENS: app.terraform.io=${{ secrets.TFE_TOKEN }}
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
jobs:
  terraform_apply:
    if: ${{ github.event.label.name == 'tfc-apply' }}
    runs-on: ubuntu-22.04
    permissions:
      contents: read
      pull-requests: write
    strategy:
      fail-fast: false
      matrix:
        folder:
          - root
    steps:
      - uses: actions/checkout@v3
      - uses: dflook/terraform-apply@v1
        with:
          path: ${{ matrix.folder }}
          workspace: ${{ matrix.folder }}
This will fire whenever a pull request is labeled with the tfc-apply label. You will need to create this label for the repository.
It will run the apply and update the previous plan comment to let you know the status.
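The label can be created in the GitHub UI, or from the command line if you use the GitHub CLI. The sketch below assumes gh is installed and authenticated for the infra repository; label_commands is a hypothetical helper that only prints the commands, and the color, description, and pull request number are arbitrary example values.

```shell
# Hypothetical helper: print the gh commands that create the tfc-apply label
# and add it to a pull request. Nothing is executed unless piped to a shell.
label_commands() {
  pr_number="$1"
  # One-time setup: create the label in the repository.
  printf 'gh label create tfc-apply --color 0E8A16 --description "Apply the Terraform plan"\n'
  # Per pull request: adding the label triggers the pr_apply workflow.
  printf 'gh pr edit %s --add-label tfc-apply\n' "$pr_number"
}

# Preview the commands for pull request number 1 (an example value).
label_commands 1

# To actually run them from inside the repository checkout:
# label_commands 1 | sh -e
```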
Merge on Apply
One thing you'll notice is that the pull request stayed open even after the infrastructure is applied and we don't want that. We want any changes that have made it into the environment to be merged into main automatically. To do this we'll create our final action.
Create a new file .github/workflows/on-apply-finished.yml with this content:
name: pr_merge
# Only trigger, when the build workflow succeeded
on:
  workflow_run:
    workflows: [pr_apply]
    types:
      - completed
jobs:
  merge:
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-22.04
    permissions:
      contents: write
      pull-requests: write
      checks: read
      statuses: read
      actions: read
    outputs:
      pullRequestNumber: ${{ steps.workflow-run-info.outputs.pullRequestNumber }}
    steps:
      - name: "Get information about the current run"
        uses: potiuk/get-workflow-origin@v1_5
        id: workflow-run-info
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          sourceRunId: ${{ github.event.workflow_run.id }}
      - name: merge a pull request after terraform apply
        uses: sudo-bot/action-pull-request-merge@v1.2.0
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          number: ${{ steps.workflow-run-info.outputs.pullRequestNumber }}
This will wait until the pr_apply job completes and as long as it was successful it'll merge the branch!
NOTE: As I mentioned earlier, the event-based actions do not run in the context of the pull request, which means you cannot test changes to them during the PR either. You must merge the on-apply-finished.yml file to main before it starts working.
Branch Protection
The final step in the process is to go to your GitHub settings and make these status checks required before merging. Branch protection is a feature that will prevent merging changes into a branch unless all required checks are passing.
Go to Settings -> Branches -> Branch Protection and add a branch protection rule:
You want to enable the following settings:
- Branch Name: main
- ✅ Require a pull request before merging
- ✅ Require status checks to pass before merging
Then, for the status checks that are required, select all of the ones we've created: