At BSides Atlanta today I gave a talk on how to handle an incident in AWS. The talk and this post is intended to help those already familiar with the principles of Incident Response to understand what to do when the incident involves the AWS Control Plane. You can find the Slides here.
Because everything needs credentials. Creds are flying around everyone’s cloud applications. Long-term creds, short-term creds, it’s all a bunch of strings. And the strings are easy to find if you know where to look. The EC2 Instance Metadata Service hands them out on 169.254.169.254. You can find them in the environment variables of a Lambda Function. Fargate containers have an environment variable that tells you what internal URL to curl.
All these credentials could give access to view or modify your infrastructure. To properly respond to an incident, you need to know what the attacker changed. Luckily the API-driven nature of AWS helps you there.
The most important AWS service to have for incident response is AWS CloudTrail. CloudTrail records every authenticated API call to your AWS account. Every CloudTrail event includes the principal who performed the API call, the source IP address where the API call came from, when the API call was made, which Action was performed against which Service, and what resources were impacted by the call.

CloudTrail and IAM are linked, and understanding one helps with the other. Nearly Every API call to AWS is authenticated, and all API calls must be authorized. This is done via IAM. While IAM policy creation can be very complex, the core of a policy statement is the specific actions that can be performed. These are broken down by service and API action these then appear in CloudTrail as the eventSource and eventName.
| IAM Action | eventSource | eventName |
|---|---|---|
| ec2:DescribeInstances | ec2.amazonaws.com | DescribeInstances |
| iam:CreateUser | iam.amazonaws.com | CreateUser |
| s3:DeleteBucket | s3.amazonaws.com | DeleteBucket |
| cloudtrail:StopLogging | cloudtrail.amazonaws.com | StopLogging |
Not all authenticated events are logged in CloudTrail. Only the management events. If you want to track access to specific Data sources, like S3, DynamoDB, or Lamba Functions, you need to enable Data Events for CLoudTrail. Unfortunately, this is not free. DataEvents cost 10c per 100,000 events. This can add up, but the alternative is having to explain to legal counsel you have no idea if the data was exfiltrated.
The next Service you want to be enabled is GuardDuty. Amazon GuardDuty is AWS’s managed threat detection service It leverages three primary data feeds:
(Diagram courtesy of AWS - taken from my 2021 ChalkTalk on Adversarial Emulatrm ion)
I’m a strong proponent of having a dedicated Security Account, and creating a dedicated account for security tooling is one of AWS’s governance best practices. You should use this account for org-wide security tooling like GuardDuty and Macie. This account should also be trusted for a security-audit role deployed into all the AWS accounts at your organization. Ensure your SOC and IR teams can log into the Security Account (ReadOnly access) and the ability to AssumeRole to the security-audit role in all the other accounts.
If your organization is so inclined, it’s also good to have a dedicated Responder role that has all the visibility of the Security-Audit role, plus the ability to execute some or all of the containment actions documented in this post. Work with your central cloud engineering team to define what steps security can take to perform containment and what needs to be escalated to the cloud team or application owners.
Finally, don’t use this account for security workloads (like your SEIM or EDR). The more network things you put in there, the greater the chance the security account itself can be compromised.
All this data needs to be searchable. What also needs to be searchable is your on-prem and other telemetry. How do you correlate your CloudTrail sourceIPAddress with your VPN Logs or SSO provider?
Even if you don’t have a SEIM or SOAR platform - use something like Slack to bubble up alerts to your responders.
Inventory is critical to know what you have and to decorate the business context around your events. If you see anomalous behavior in an account, is that a prod account that no one should be accessing, or is it a sandbox where things get tested? Does the impacted resource have PII, PCI, or financials? Who do you need to contact about the event?
All of this is the critical business context that AWS cannot provide.
I’ll discuss this in future blog posts, but Steampipe is a valuable tool to pull account inventory information. I’ve built some scripts to pull Steampipe data into Splunk Lookup Tables.

This is how I recommend companies architect to get the core telemetry into their SEIM - specifically for me, Splunk.
Setting the security contacts for an account is often ignored, but super important. By default, AWS will send critical abuse and security notices to the root email address on the account. However, the security team is rarely on that, and AWS sends lots of other marketing spam to the root addresses.
Establish a cloudsecurity-alerts distribution list for your company and set that as the Security Contact in all your AWS Accounts. AWS now allows you to do that via the API, and I’ve written a fast-fix to help.


Ok, I have 3 billion blobs of JSON in S3, now what?

Link to CloudTrail Sightings
| Higher Fidelity Events | Common but Significant |
|---|---|
| CreateTrustAnchor | ConsoleLogin |
| CreateUser | GetFederationToken |
| CreateLoginProfile | StartSession |
| UpdateLoginProfile | GetAuthorizationToken |
| CreateAccessKey | CreateKeyPair |
| AttachUserPolicy | CreateRole |
| DeleteTrail | PutUserPolicy |
| PutEventSelectors | PutGroupPolicy |
| StopLogging | CreateGroup |
| LeaveOrganization | AttachRolePolicy |
| DeleteFlowLogs | PutRolePolicy |
| DeleteVpc | CreatePolicyVersion |
| GetPasswordData | UpdateAssumeRolePolicy |
| GetSecretValue | UpdateFunctionConfiguration |
| ModifyImageAttribute | ListSecrets |
| ModifySnapshotAttribute | |
| PutBucketPolicy | |
| PutBucketAcl |
Many of these were taken from Zack Allen’s excellent cloudtrail2sightings repo
Here are some of the detections I like to use and alert on. These lack a lot of proprietary filtering and cross-referencing, so you’ll need to tune these for your environment.
The first one is obvious: Look for any attempt to mess around with CloudTrail or GuardDuty.
index=cloudtrail
eventName=StopLogging OR DeleteTrail OR PutEventSelectors OR DeleteDetector
| iplocation sourceIPAddress
| table userIdentity.arn, sourceIPAddress,
City, Country
If someone is doing one of these, that’s probably bad. If they’re not doing it from a location where you normally operate, that’s worse.
Is iplocation required here? No, but it can help provide context or give me an idea of which event to jump on first.
Here is another one that can bring up interesting results. This query looks at all the error messages and where they came from. You might expect to see a lot from Ashburn or the AWS regions where applications are active. Fewer from Minsk or Samarkand.
index=cloudtrail errorMessage=*
| iplocation sourceIPAddress
| stats count by City, Country
| sort -City, Country
This will get a lot of hits. Developers & Engineers will make mistakes with their IAM Policies and may not notice the errors are occurring. Also, some folks may not have permission to see stuff, but if they’ve got access to the AWS Console, the Console is going to make the API call for them.
However, if you look at the long tail, you may find someone who has some access credentials and is just testing out what they do and hitting errors.
Another example of looking at the long-tail of sourceIPAddress is this one:
index=cloudtrail
eventName=GetCallerIdentity OR ListBuckets OR DescribeInstances
| iplocation sourceIPAddress
| table userIdentity.arn, sourceIPAddress,
City, Country
| sort -City, Country
The shortest AWS command is aws s3 ls, which translates to the ListBuckets eventName. The next most common recon command is aws sts get-caller-identity (the whoami of AWS). Interestingly, STS GetCallerIdentity is one of the few AWS API commands that does NOT require any authz to perform. You cannot DENY it via IAM.
DescribeInstances is another semi-common recon command.
Here we’re going to IP Lookup their location and again look at the long tail of location.
One last detection example. Here we see the creation of an IAM User. A super common way to establish persistence.
eventName="CreateUser"
| iplocation sourceIPAddress
| search Country!="United States"
| table userIdentity.arn, sourceIPAddress,
City, Country
IAM Users are somewhat common. They’re still the most common way to interact with on-prem systems or other cloud providers. However, if you see them created from strange places, that’s something to look into.
And all of these are just looking at GeoLocation data. Depending on your org and culture, you might be able to filter these queries using sourceIPAddress for even better fidelity.
… or … How your day will be ruined
You’re most likely to get notified of an incident from one of these:

Here is an abstraction of the investigation flow I use when looking at an event:

If the event relates to a resource, one that is probably acting maliciously, I want to know: who created it? Then I look at the identity that created it to see what else that identity did.
You need to recurse all the paths of Identity and sourceIPAddress until you have a full picture of what the attacker did.
you can tie various eventNames into the various MITRE TTPs
Zack Allen at DataDog has started a good repo to do that mapping.
We can start with a query on a suspect resource:
index="aws_cloudtrail" "i-086c8727e55bb6d68"
readOnly=false
| table eventName, eventSource, userIdentity.arn, sourceIPAddress
You’ll get a result like
{ [-]
awsRegion: us-east-1
eventCategory: Management
eventID: 7b4856eb-8c0c-4e66-969c
eventName: RunInstances
eventSource: ec2.amazonaws.com
eventTime: 2022-08-13T12:30:36Z
eventType: AwsApiCall
eventVersion: 1.08
managementEvent: true
readOnly: false
recipientAccountId: 755629548949
requestParameters: { [-]
blockDeviceMapping: { [+] }
disableApiStop: false
disableApiTermination: false
instanceType: t2.micro
instancesSet: { [-]
items: [ [-]
{ imageId: ami-0a25ed80a0ff1d536
} ] }
monitoring: { [+] }
subnetId: subnet-0dc76374a87f3d69c
}
responseElements: { [+] }
sourceIPAddress: 3.236.91.34
tlsDetails: { [+] }
userAgent:
aws-cli/2.7.20 Python/3.9.11 Linux/5.15.0-1015-aws exe/x86_64.ubuntu.22 prompt/off command/ec2.run-instances
userIdentity: { [-]
accessKeyId: ASIA273YH4WKQTS7ZTUB
accountId: 755629548949
arn: arn:aws:sts::755629548949:assumed-role/Developer/testing
principalId: AROA273YH4WKXW4OHPYYD:testing
sessionContext: { [+]
}
type: AssumedRole
}
}
From looking at the actual event, we know:
t2.microami-0a25ed80a0ff1d5363.236.91.34Next, you want to investigate that role with:
index="aws_cloudtrail" userIdentity.arn=arn:aws:sts::759429568549:assumed-role/Developer/*
| table eventName, eventSource, sourceIPAddress

Lateral movement can be either host-to-host as we conventionally think of it. Or it can be cloud-account to cloud-account. Additionally, there are ways to pivot vertically from Ground to Cloud and Cloud to Ground.
AWS AssumeRole calls into other AWS accounts can be an indicator of cloud-plane lateral movement
AuthorizeSecurityGroupIngress as I described at the top of this talk, could be someone using the cloud to work around network controls to facilitate lateral movement in your network.
Signs of Lateral or Vertical Movement:
Note: these actions are abbreviated using IAM Action syntax
AWS SSM is a server management service. If you see someone making SSM calls, that could be Cloud to Ground movement. Same with the EC2 Instance Connect service, which allows you to manage SSH keys on a machine. In both cases, some cloud-plane identity is manipulating things at the network level inside your instance.
Lateral Movement Query:
index=cloudtrail eventName=AssumeRole OR StartSession OR SendCommand OR SendSSHPublicKey | stats count by eventName, userIdentity.arn, sourceIPAddress

So, you’ve identified an incident. Now what? Panic?

Start with these:
If you publicly commit Access Keys to GitHub (Like I did to generate the image below), you’ll get this email, and AWS will automatically apply a Quarantine policy to the user. This policy doesn’t block all use of the user. Instead, it just prevents the user from spinning up a lot of Compute resources to mine bitcoin. Most things are still allowed. The purpose here is to protect AWS from runaway resource usage and having to give you massive credit.
This email and policy attachment is all automated. I committed a key at 15:14:39, and by 15:16:30, AWS had already locked it down. So under two minutes.

Here is the description of the policy:

It denies specific actions most likely to cause financial harm or allow an attacker to pivot to another account. That first action up there, cloudtrail:LookupEvents, is a way an attacker can recon to see if there are other roles in other accounts he or she could assume.
The Quarantine Policy isn’t enough. You want to disable the key ASAP! Note: you want to disable the key, don’t delete it. You may not know the service impact of disabling the key at the outset of the incident. You may need to re-enable it to recover from a service impact.
This is one of the powers your responder role should have.

For the temporary credentials that Lambda, EC2 Instances, and Containers use, you need to invalidate the compromised credentials. Via the AWS console, you can say, “Deny everything for all sessions created before X date-time”. Legit Apps should recover and ask for new keys. But, attackers may be able to use the same exploit again.

If you can’t identify the initial source of compromise, one option is to deny everything. They can no longer do any damage, But you can watch what they’re doing or trying to do. You’ve got two options for a deny policy, depending on impact and the general knowledge of the app.
This one will straight-up deny everything. Start here if you’re not sure and ok breaking prod.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Action": ["*"],
"Resource": ["*"]
}
]
}
This one is more nuanced and will deny all actions except what’s the array of THINGS YOU NEED.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"NotAction": ["THINGS YOU NEED"],
"Resource": ["*"]
}
]
}
You may need to talk to your developers to get a list of the things you need.
A twist on the previous containment option is to limit the use of the credentials to a handful of known and trusted IP addresses. This can reduce the service impact while also limiting your attacker’s ability to use the credentials.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Action": ["*"],
"Resource": ["*"],
"Condition": {
"Bool": {"aws:ViaAWSService": "false"},
"NotIpAddress": {
"aws:SourceIp": ["192.0.2.0/24","203.0.113.19/32"]
}
}
}
]
}
One key thing you’ll need to do as part of containment is to figure out how they got in in the first place. If it was leaked keys to GitHub, that’s easy. If attacker exfiltrated the credentials from an EC2 instance or container due to an application issue, that will be a bit harder to find. With EC2 Instance Roles, the Instance ID is part of the Role Session Name, which may help you.
But if you can’t find the initial vector, you’ll play wack-a-mole with containment, or you’ll have to be more impactful with the containment policies you apply. This is one reason that best-practice is to have one role per resource or app (the other is to maintain least-privilege).
What we’ve looked at so far is great to find out if your attacker modified anything, created cryptominers or bots, etc.
But how can you tell if they got your data? You may see a ListBuckets eventName, but that is not proof of any data exfiltration.
I mentioned the Data Events when talking about CloudTrail. They’re really voluminous, so I don’t feed them into a SEIM. Instead, I keep them in S3 because I only ever need them for occasional cost optimization research or in the event of an incident.
To query CloudTrail events in S3, you can use Athena. I’ll point you at the AWS Documentation for setting this up, they have an excellent explanation.
Here I’m pulling a report of all S3 Access in an account. I can use the WHERE clause if I need to limit by a specific IP Address or Identity.
I hope this helps with some of the things you may experience during an incident in AWS. One last thing to note, like your middle school math teacher, I made you do all this work, before teaching the short-cut method. AWS has a team that’s dedicated to helping support customers in the middle of an incident. It’s called the Customer Incident Response Team, and most importantly, it does NOT require you to have their top-tier 5-figure Support plan. It’s open to any AWS customer regardless of size and support plan.