Segment receives billions of events from our customers daily and has grown into dozens of AWS accounts. Expanding into many more accounts was necessary to best align with our GDPR and security initiatives, but it comes at a large complexity cost. To continue scaling gracefully, we are investing in tooling that helps employees work with many accounts, and in centrally managing employee access to AWS with terraform and our identity provider.
Segment began in a single AWS account and last year finished our move to separate dev, stage, prod, and “ops” accounts. For the past few months we’ve been adding about one new AWS account every week or two, and we plan to continue this expansion into per-team and per-system accounts. Having many “micro-accounts” provides superior security isolation between systems, as well as reliability benefits from limiting the blast radius of AWS rate-limits.
When Segment had only a few accounts, employees would log in to the AWS “ops” account using their email, password, and 2FA token. Employees would then connect to the
ops-admin role in the dev, stage, and prod accounts using the sts:AssumeRole API.
Segment now has a few dozen AWS accounts and plans to continue adding more. To organize this expansion, we needed a mechanism to control our accounts, determine which accounts each employee has access to, and manage each employee’s permissions in each account.
We also hate using AWS API keys when we don’t absolutely have to, so we moved to a system in which no employee has any AWS keys. Instead, employees access AWS only through our identity provider. Today, zero employees have AWS keys, and there is no future need for an employee to have a personal AWS key. This is a massive security win!
Designing a scalable IAM architecture
Segment uses Okta as an identity provider, and we consulted Okta’s integration guide for managing multiple AWS accounts, but made a minor change to improve the employee experience. The integration guide recommends connecting the identity provider to each AWS account, but this breaks AWS’ built-in support for account switching and makes it more complicated to audit which teams have access to which roles.
Instead, employees use our identity provider to connect to our “ops” account and then use the AWS Security Token Service (STS)
sts:AssumeRole API to connect to each account they have access to. Through our identity provider, each team is assigned to a different role in our hub account, and each team role has access to different roles in each spoke account. This is the classic “hub-and-spoke” architecture.
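The fan-out this implies can be sketched in a few lines of Python; the team names, account IDs, and role names below are invented for illustration and are not Segment's actual mapping:

```python
# Hypothetical team -> spoke-role mapping; every name and account ID
# here is invented for illustration.
TEAM_SPOKE_ROLES = {
    "foundation": [
        ("111111111111", "Administrator"),
        ("222222222222", "Administrator"),
    ],
    "tooling": [("111111111111", "ReadOnly")],
}

def assume_role_targets(team: str) -> list[str]:
    """Return the sts:AssumeRole target ARNs a team's hub role may assume."""
    return [
        f"arn:aws:iam::{account_id}:role/{role_name}"
        for account_id, role_name in TEAM_SPOKE_ROLES[team]
    ]
```

Each team logs in to exactly one hub role; the hub role's policy then enumerates ARNs like these as its allowed assume-role targets.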
To make maintaining our hub-and-spoke architecture simple, we built a terraform module for creating a role in our spoke accounts, and a separate terraform module for creating a role in our hub account. Both modules simply create a role and attach a policy ARN to it, supplied as part of the module’s input.
The only difference between the modules is their trust relationship. The hub role module allows access from our identity provider, while the spoke module only allows access from the hub account. Below is the module we use for allowing access to a hub role from our identity provider.
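A minimal sketch of such a hub-role module, assuming the standard Terraform AWS provider; the variable names and the SAML condition shown are illustrative, not Segment’s actual module:

```hcl
variable "name" {}
variable "policy_arn" {}

# ARN of the identity provider registered in the hub account (illustrative)
variable "saml_provider_arn" {}

# Trust policy: only federated logins through the identity provider
# may assume this role.
data "aws_iam_policy_document" "trust" {
  statement {
    actions = ["sts:AssumeRoleWithSAML"]

    principals {
      type        = "Federated"
      identifiers = [var.saml_provider_arn]
    }

    condition {
      test     = "StringEquals"
      variable = "SAML:aud"
      values   = ["https://signin.aws.amazon.com/saml"]
    }
  }
}

resource "aws_iam_role" "hub" {
  name               = var.name
  assume_role_policy = data.aws_iam_policy_document.trust.json
}

resource "aws_iam_role_policy_attachment" "hub" {
  role       = aws_iam_role.hub.name
  policy_arn = var.policy_arn
}
```

The spoke-account variant would be identical except that its trust policy names the hub account as the principal instead of the SAML provider.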
To provide each team with granular access to only the resources that team needs, we create a role for each team in the hub account using our hub role terraform module. These roles mostly contain IAM policies allowing
sts:AssumeRole into other accounts, but it is also possible to grant granular access directly in a hub role.
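As a sketch, the assume-role portion of a hub role’s policy might look like the following; the account IDs and role names are made up:

```hcl
# Spoke-account roles this team may assume (illustrative ARNs)
data "aws_iam_policy_document" "assume_spoke_roles" {
  statement {
    actions = ["sts:AssumeRole"]

    resources = [
      "arn:aws:iam::111111111111:role/ReadOnly",
      "arn:aws:iam::222222222222:role/ReadOnly",
    ]
  }
}
```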
One concrete and simple example of a granular policy is the role for our Financial Planning and Analysis team, who keep a close watch on our AWS spend. Our FP&A team only has access to billing information and information about our reserved capacity.
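A billing-only policy along those lines might be sketched as follows; the specific actions chosen here are an assumption, not Segment’s actual policy:

```hcl
data "aws_iam_policy_document" "fpa" {
  statement {
    actions = [
      "aws-portal:ViewBilling",        # billing dashboards
      "aws-portal:ViewUsage",          # usage reports
      "ec2:DescribeReservedInstances", # reserved capacity
    ]
    resources = ["*"]
  }
}
```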
The FP&A team does not have access to our spoke accounts, though. One team that does need full access to much of our infrastructure and all of our accounts is our Foundation and Reliability team, who participate in our on-call rotation. We provide our foundation team with both a
ReadOnly role and an
Administrator role in all of our accounts.
After the per-team roles are created in the hub account, employees are assigned to Okta groups that represent their teams, and each group can then be assigned to its associated role in the hub account.
Okta allows each group to be assigned different IAM roles in the hub account; using the Okta UI we can assign the FP&A team to our “Amazon Web Services” app and restrict their access to the fpa role that we created for them in the hub account.
After building this, we needed tooling to give our employees an amazing engineering experience. Even though this system is far more secure, we wanted it to be just as usable and efficient as our setup had been with only a handful of AWS accounts.
Maintaining usability with aws-okta
One great thing about our old IAM setup was that each employee with AWS access could call AWS APIs from their local computer using
aws-vault. Each employee had their IAM user credentials securely stored in their laptop’s keychain. However, accessing AWS entirely through Okta was a massive breaking change for our old workflows.
Our tooling team took up the challenge and created aws-okta, a (near) drop-in replacement for the
aws-vault tool our engineering team used extensively. aws-okta is now open-source and available on GitHub.
The quality of
aws-okta is the principal reason that Segment engineers were able to have their AWS credentials revoked so smoothly. Employees can execute commands using the permissions and roles they are granted, exactly as they did when using aws-vault.
There is a lot of new complexity handled by aws-okta that aws-vault never had to deal with. While aws-vault uses IAM user credentials to run commands, aws-okta uses your Okta password (stored in your keychain) to authenticate with Okta, waits for a response to a push notification for 2FA, and finally provides AWS with a SAML assertion to retrieve temporary credentials.
To authenticate with Okta, aws-okta needs to know your Okta “application id”. We took the liberty of extending the
~/.aws/config ini file to add the necessary id.
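Based on aws-okta’s documented configuration format, the extended ~/.aws/config looks roughly like the following; the application id and role ARN are placeholders, not real values:

```ini
[okta]
; the application id is embedded in the Okta app's embed link path
aws_saml_url = home/amazon_aws/<application-id>/272

[profile ops]
role_arn = arn:aws:iam::<hub-account-id>:role/<team-role>
```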
When Segment had only a few AWS accounts and the
ops-admin role, Segment engineers all shared the same
~/.aws/config. Once each team had access to different accounts and systems, we needed a better way to manage each team’s
~/.aws/config, as well as a way to quickly update employees’ access when new accounts and roles are created.
We decided to integrate this solution closely with prior art Segment had built. Each team’s config is stored in a git repo containing our company dotfiles. Each team can initialize their AWS config using our internal tool
robo, which we use to share helpful commands between employees.
This was only possible because all Segment engineers already had an environment variable called
SEGMENT_TEAM, which denotes the team the engineer is part of. Running
robo aws.config will clone the dotfiles repo, save the old
~/.aws/config, and initialize the most recent config for the engineer’s team.
AWS sign-in bookmarks were the primary way engineers navigated our environment when we had fewer accounts. When we got rid of the
ops-admin role, those bookmarks stopped working. Additionally, AWS bookmarks only support up to five AssumeRole targets, and we now have many more than five accounts.
To support many more accounts, we mostly abandoned bookmarks and instead made sure
aws-okta works well for engineers who need to switch AWS accounts often. Our previous use of
aws-vault meant many of us were familiar with the
aws-vault login command, and we found that adding a login command to aws-okta helped engineers who switch accounts often.
After the engineer responds to the Duo push notification, aws-okta opens a browser and logs in to the specified role in only a couple of seconds. This is built on AWS’ custom federated login support, but feels more like magic when using it. It makes logging in a breeze.
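Under the hood, that browser login works through the AWS federation endpoint: the tool exchanges its temporary credentials for a sign-in token, then builds a console login URL. A stdlib-only sketch of the URL construction, assuming the documented federation endpoint (actually fetching the token would require an HTTP GET to the first URL, and the Issuer value is an arbitrary label):

```python
import json
from urllib.parse import urlencode

FEDERATION = "https://signin.aws.amazon.com/federation"

def signin_token_request_url(access_key: str, secret_key: str,
                             session_token: str) -> str:
    """URL that exchanges temporary credentials for a SigninToken (via HTTP GET)."""
    session = json.dumps({
        "sessionId": access_key,
        "sessionKey": secret_key,
        "sessionToken": session_token,
    })
    return FEDERATION + "?" + urlencode(
        {"Action": "getSigninToken", "Session": session})

def console_login_url(signin_token: str,
                      destination: str = "https://console.aws.amazon.com/") -> str:
    """Browser URL that logs the federated session in to the AWS console."""
    return FEDERATION + "?" + urlencode({
        "Action": "login",
        "Issuer": "aws-okta",  # arbitrary label shown when the session expires
        "Destination": destination,
        "SigninToken": signin_token,
    })
```

Opening the second URL in a browser is all it takes to land in the console as the assumed role.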
Beyond 100 accounts
We expect to be near 50 AWS accounts by the end of this year. The security of accounts that are completely closed off by default, and the reliability benefits of isolated per-account rate-limits, are compelling.
The system we have built is robust and usable enough to scale our AWS usage to many hundreds of accounts and many more engineering teams.
Deleting all employee AWS keys was extremely satisfying from a security perspective, and this alone is a compelling enough reason to integrate your identity provider with your AWS hub account.