The blog post buries the lead a little bit because it's talking about lots of pain points with the ec2 API and IAM. The important point to take away is that any process with network access running on your instance can contact the EC2 metadata service at http://169.254.169.254 and get the instance-specific IAM credentials.
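To make that concrete, the credentials are just two HTTP requests away (a sketch; the role name is a placeholder and the output is abridged):

    $ curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
    my-instance-role
    $ curl http://169.254.169.254/latest/meta-data/iam/security-credentials/my-instance-role
    {
      "Code" : "Success",
      "AccessKeyId" : "ASIA...",
      "SecretAccessKey" : "...",
      "Token" : "...",
      "Expiration" : "..."
    }

Anything on the box with an HTTP client and network access can do this; no local privileges required.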
Think about things like services that accept user submitted URLs, crawl them, and display results...
This is actually a vulnerability I've seen countless times. If a site accepts a URL which it fetches and returns to the user, submit a 169.254.169.254 metadata service URL. About 1 out of 5 times I've tried it, I'm able to get a response.
Fun fact: you can ignore EC2 instance roles and use the Amazon Cognito service for processes to obtain role-based short-term credentials.
I've described it previously as "Kerberos for the AWS Cloud" (which will make any self-respecting crypto nerd squirm) but hopefully it conveys the general idea. Yes it was designed for mobile & browser use, and yes the API isn't pretty, but it's there.
Those are the only two endpoints I know of. I encouraged them to require a special header for access to this API before releasing to the public but it looks like it was not included. A special header would help prevent your app from being able to access this from a user specified URL.
Thank you so much for posting these extra details. My product is hosted on Google Cloud and allows users to arbitrarily craft HTTP requests (including headers!)
I try to limit the possibility of abuse (restricting the protocol to http, https, //, data, or ftp) but didn't know about this metadata issue.
I updated my product to account for this issue too.
It looks like Google's metadata doesn't leak any important secrets (unless I had custom metadata, which I do not) but better safe than sorry!
It's borderline a privilege escalation vulnerability. The default behavior is that an unprivileged user on the system with network access can get access to an instance's credentials that can be used to perform administrative functions.
Like Colin pointed out in the blog post, this completely subverts the permissions model in modern operating systems.
No, it is not a vulnerability. Misunderstanding of this functionality can result in a developer introducing a vulnerability, but this is a well-described, well-defined feature.
Calling this a vulnerability is akin to calling the existence of `rm` a vulnerability because it can delete files.
If your service allows arbitrary url queries that a user can trigger then you should make sure that you only allow queries to publicly routable ip ranges anyway.
169.254.0.0/16 is the link-local range, which you should be filtering, along with publicly routable IP ranges that might be very upset if you access them, like .mil reserved ranges. Go as far as to only allow DNS names instead of arbitrary IPs, keeping in mind DNS names may resolve to non-publicly-routable ranges or ranges you may not wish to access. These are all standard dangers of making queries on a user's behalf.
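A minimal Python sketch of that kind of filtering (the function name is mine; treat the checks as a starting point, not an exhaustive blocklist):

    import ipaddress
    import socket

    def is_public_address(hostname):
        """Resolve the hostname and reject anything that isn't globally routable."""
        infos = socket.getaddrinfo(hostname, None)
        for info in infos:
            ip = info[4][0].split('%')[0]          # drop any IPv6 scope id
            addr = ipaddress.ip_address(ip)
            # is_global is False for link-local (169.254.0.0/16), RFC 1918,
            # loopback, and most other reserved ranges.
            if not addr.is_global or addr.is_multicast:
                return False
        return bool(infos)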
....and that you can imitate the metadata service to make life easier :) A plug for my friend's side project: https://github.com/otm/limes . It's a local metadata service. Very handy for making AWS libs work without having to configure them much. And great support for MFA.
What I've done for a previous company was to, as one of the very first things done within every EC2 instance, add an iptables owner match rule to only allow packets destined to 169.254.169.254 if they come from uid 0. Any information from that webservice that non-root users might need (for instance, the EC2 instance ID) is fetched on boot by a script running as root and left somewhere in the filesystem.
This won't help with IAM roles, since the credentials provided in the metadata expire. Of course, a small tweak to the iptables entry would help there as well.
Mind posting your entry for us iptables-impaired folks?
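Something along these lines should do it (an untested sketch; adjust the chain, and swap the UID if a non-root agent needs access):

    # Let root reach the metadata service; drop everyone else.
    iptables -A OUTPUT -d 169.254.169.254 -m owner --uid-owner 0 -j ACCEPT
    iptables -A OUTPUT -d 169.254.169.254 -j DROP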
Most of it is - your instance ID, network position, etc, will never change after boot (or 'start' if you stop it) - so caching it is just fine. There's very little that will change on a running instance except of course, the IAM credentials referred to in this article (as they expire within 90 minutes IIRC).
The irony is that the attempt to make the instance creds secure by rotation actually prevents protecting them in this fashion. A local file, readable only by root, w/ embedded keys is actually far more secure than the current implementation.
It's easy to implement that yourself. Generate a pair of IAM keys and drop them on the filesystem, or bake them into the AMI or application or whatever.
What they have reduces the duration of a vulnerability, so that if you know someone had access to your machine at some point in time, you can figure out from there how long their keys would have lasted and scope down the timeframe to start digging within CloudTrail.
Hopefully the operators using EC2 instance profiles understand and weigh the risks of using that feature. It's good to be cautious, but the feature is only dangerous if you don't take the time to understand it. Running a server on the Internet at all is "dangerous" in the same sense. And for this particular risk, it turns out there's a simple fix.
He _is_ right in his first criticism that the IAM access controls available for much of the AWS API are entirely inadequate. In the case of EC2 in particular, it's all or nothing--either your credentials can call the TerminateInstances API or they can't. I'm sure Amazon is working on improving things, but for now it's pretty terrible. But in practice it just means you have to take care in different ways than you would if his tag-based authz solution were implemented.
That said, while it's certainly frustrating to an implementor, it's not "dangerous" that limitations exist in these APIs. We're talking about decade-old APIs from the earliest days of AWS, and while things have been added, the core APIs are still the same. That's an amazing success story. But like any piece of software, there are issues that experienced users learn how to work around.
You can bet that the EC2 API code is hard and scary to deal with for its maintainers. Adding a huge new permissions scheme is likely nearly impossible without a total rewrite... I don't envy them their task.
It's impossible to limit access to any part of the instance metadata in any way w/o firewalling (which has its own issues), or even to expire access to any part of it. Since instance profiles have keys (even though automatically rotated), any process on the system, owned by any user, can access anything exposed via the instance role. This makes embedding IAM keys into your instance and protecting them with root-only permissions or ACLs MUCH MUCH safer... but AWS specifically states that instance profiles are preferred. In fact, for our Userify AWS instances (ssh key management), we are required to use instance roles and not allowed to offer the option. (This is why we do not offer S3 bucket storage on our AWS instances but we do on Pro and Enterprise self-hosted.)
The biggest issue with the IAM instance profiles is that they trade security for convenience.. and it's not a good trade.
For the most part EC2 instances should be single-purpose. Use tiny instances that do one job. Your IAM role describes the permissions that should be granted to that one job. It's absolutely true that you cannot isolate permissions at the process level, but by using single-job-type instances, you can easily isolate permissions on a per-job (in this model, per-instance) basis.
What? Why should EC2 instances be single-purpose? Amazon offers a wide variety of massive instance sizes with 160+gb of RAM and 30+ cores. It's extremely common to run software like mesos, kubernetes, docker, etc on these. Dedicating an instance per app is extremely cost-ineffective.
EC2 instances should be single-purpose (Or, if you want to mux containers onto the instance and retain per-container/job IAM role isolation, use ECS) if you're developing for AWS as a platform. I'm a huge fan of k8s, and have respect for Mesos, but these are largely alternatives to the model provided by EC2/ECS/IAM.
In a perfect world, any service would cleanly interoperate with any other service. Unfortunately we don't live in a perfect world. If you want to take advantage of `advanced` features in a given platform, you have to understand the drawbacks and limitations of those features, and what it means when they aren't available on another platform.
To me, the greatest tragedy in the way EC2 operates is that it looks/tastes/smells like a `server`, but it's far more akin to a process.
Well.. an EC2 instance running Linux is not a process or even a container, even if functionally it's easier to treat it like one.
It is a full virtual server with its own Linux kernel and operating system: it has to be updated, secured, and maintained just like any other Linux server. Most Linux distributions on an EC2 instance have dozens of processes already running out of the box.
I understand your point -- that ideally a single instance can be treated as a single functional point from the point of view of the application, and I agree, but not from a point of view of security. As you know, in any larger environment, there are likely many additional support applications running on that server: things like app server monitoring, file integrity, logging, management, security checks, remote data access or local databases, etc. Those must not all be treated with the same levels of security and access. (i.e., why would rsyslog or systemd need access to all objects in our S3 bucket or be able to delete instances or any of the other rights that might legitimately be granted to an instance via an IAM instance role?)
To treat security for all of these processes as if they're all part of the same app tosses out decades of operating system development and security principles and places your single function app, as well as that of your entire environment, at grave risk. I.e., there's a reason why a typical Linux distribution has about 50 accounts right out of the box and everything doesn't just run as root.
If you are developing or deploying microservices or containers and don't want to be burdened by the security requirements, then there are alternatives at AWS like ECS and Lambda that you should seriously consider.
Amusing examples, systemd/rsyslog, as both at least briefly execute as root, with rsyslog being relied upon to willingly drop its own privs (Not to mention being slowly replaced by systemd-journald, which runs as root), and systemd always running as root (ya know, since it's init, and all).
It really sounds like we have vastly different ideas about what kinds of processes belong in an EC2 instance, as well as the ideal life-cycle of an EC2 instance. I tend to adopt a strategy of relatively short-lived EC2 instances that get killed and replaced frequently. Persistence that depends on a single instance surviving is avoided at all costs, in favor of persistence distributed across a number of instances (or punted out to Dynamo/S3/RDS).
You're absolutely right that there is a reason why the typical Linux distro has 50 accounts out of the box -- it was built with traditional multi-user system security models in mind. I sure as hell appreciate it on workstations and traditional stateful hosts. That said, eschewing the traditional security model in favor of an alternative model does not make your environment inherently more or less safe -- there are going to be pros and cons to both approaches (in terms of both security and functionality).
I agree that it's important to do your research, but Amazon does us no favors here. I didn't know about this potential leakage until I needed to use the metadata system in AWS, and then I realized the potential for abuse. Honestly, this should probably be an opt-in option that's off by default.
The fact is that Amazon provides a commodity service, and most people don't expect an internal HTTP service that exposes potentially sensitive information to non-root users.
I actually disagree with the OP where he says they should use Xen Store for metadata. If I were Amazon, there is no way I would want to commit to using an option that is specific to one hypervisor technology. What if Amazon wants to switch to KVM?
If Amazon wants to switch to KVM, they're going to have so many other things they need to change that adjusting how instance metadata is exported will be the least of their problems.
> either your credentials can call the TerminateInstances API or they can't
Note that you can restrict the inputs to this API using IAM Policy in semantically meaningful ways. Three controls I'm familiar with that are useful for restricting inputs are the resource type for instances and the conditions for instance profile and resource tags [1]. The latter two are most flexible.
An instance profile restriction allows you to express a concept like, "This user may only terminate instances that are part of this specific instance profile"; in that way, the instance profile characterizes a collection of instances that can be affected by the policy. The resource tag condition can be used in a similar way. [2] is an example of a policy restricting terminations based on instance profile. The key fragment of it is:
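Roughly, it looks something like this (an illustrative reconstruction; the account ID, region, and profile name are placeholders):

    {
      "Effect": "Allow",
      "Action": "ec2:TerminateInstances",
      "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/*",
      "Condition": {
        "ArnEquals": {
          "ec2:InstanceProfile": "arn:aws:iam::123456789012:instance-profile/my-fleet"
        }
      }
    }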
A role with this policy condition can only affect instances that are part of the specified instance profile.
This allows you to create roles or users that have access to instances that are part of a certain instance profile only. If you wanted a group of instances to be able to manage (e.g. terminate) themselves, then the role on those instances could be access-restricted to the instance profile of those same instances. By assigning different fleets of instances different instance profiles, you can control which users or roles can access each fleet by restricting access to the fleet's Instance Profile. Similar restrictions are possible with resource tags on instances.
That said, though, I agree that there's room to improve the access control story. Managing instances through their full lifecycle sometimes involves accessing other resources like EBS volumes too, and it's not easy to construct a policy container that sandboxes access to just the right resources and actions while allowing the creation of new resources. Colin called out some of the gaps in his post. If you do not need to allow the creation of new resources then the problem is a bit easier. For example, you can avoid the need to create new EBS volumes directly by specifying EBS root volumes as part of instance creation using BlockDeviceMapping.
1) IAM instance roles have no security mechanisms to protect them from being read by any process on the instance, thus completely bypassing the Linux/UNIX/Windows permission systems. (The real reason for this is that instance metadata was a convenient semi-public information store for things like the instance ID, but it was extended to also provide secret material, which was, at best, an idiotic move.) As the author points out, Xen already provided a great filesystem alternative that could be mounted as another drive (or network drive) and managed with the regular OS filesystem permission system (reading an instance ID is just a matter of reading a "file")... for some reason, AWS didn't leverage this and instead just added the secret material to its local instance metadata webserver.
2) the API calls are not fine grained enough and/or there are big holes in their coverage -- so, for instance, if you want to use some other AWS services, you can end up exposing much more than you intended.
This is interesting! Can this be abused with AWS-hosted services that reach out to fetch URLs? For example, image hosts that allow specifying a URL to retrieve, or OAuth callbacks, etc.? Are there any tricks to be played if someone were to register a random domain and point it to 169.254.169.254 (or worse, flux between 169.254.169.254 and a public IP, in case there is blacklisting application code that resolves the hostname to check it but then passes the whole URL into a library that resolves again)?
Remember that even ELBs in AWS have IPs that change all the time, and this itself is actually a source of vulnerabilities in apps that don't respect DNS TTLs (as has been seen in the forums repeatedly -- apps stay connected to the previous IP instead of the new one). It's probably safer to retrieve and verify the IP for each request, and just cache whether the IP is 'safe'. (And just doing IP subnet calculations is non-trivial in most less-common languages.)
Also, request throttling and HTTP verb checking should be maintained, to prevent being turned into a proxy for other attacks.
Actually, any decision to accept an arbitrary URL should be carefully examined in light of how hard it is to do safely.
As long as you manually get the IP for every domain. I.e., if they ask for "blah.com" you have to get the IP, check it, then turn the request into "curl -H 'Host: blah.com' http://IP". (Otherwise, there's a race condition that allows the DNS server to resolve to a different IP address the 2nd time. See https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use )
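A rough Python sketch of that pin-the-IP approach (plain http only, since pinning the IP breaks TLS certificate checks; the function name and checks are mine):

    import ipaddress
    import socket
    from urllib.parse import urlsplit

    import requests

    def fetch_pinned(url):
        """Resolve the host once, validate the address, then connect to that
        exact IP while sending the original Host header, so a second DNS
        lookup can't swap in an internal address."""
        parts = urlsplit(url)
        if parts.scheme != "http":
            raise ValueError("only plain http handled in this sketch")
        host = parts.hostname
        ip = socket.gethostbyname(host)            # resolve exactly once
        if not ipaddress.ip_address(ip).is_global:
            raise ValueError("refusing to fetch non-public address " + ip)
        netloc = ip if parts.port is None else "%s:%d" % (ip, parts.port)
        pinned = parts._replace(netloc=netloc).geturl()
        # Redirects are disabled so a redirect target can't dodge the check.
        return requests.get(pinned, headers={"Host": host}, allow_redirects=False)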
Yes, one project I've worked on required a crawling bot, and you could crawl the metadata service... (until fixed). You don't even need a domain in most cases; I bet either the IP or the instance-data DNS name works in a bunch of places.
We redid all our policies to be extremely restrictive in response, if the instance did anything based on user input. Anything more administrative happened on a different machine.
For real-world, complicated applications, that's virtually always a game-over vulnerability no matter what cloud you're deployed on. Be extraordinarily careful with backend code that generates HTTP requests based on user inputs.
Pentesters have been using this trick to pivot from unexpected backend web proxies to (e.g.) management consoles, LOM servers, JBoss interfaces, &c for 15 years or so already.
This is an interesting attack that I must confess I hadn't thought of, but surely any service that accepts an arbitrary URL has a list of IP ranges to avoid. However, to harden a role in the event of instance role credentials leaking, you could use an IAM Condition [0].
There is actually an example of this in the IAM documentation [1], although the source VPC parameter doesn't work for all services, and I can't see a list of services that support this parameter. This would ensure that the requests actually came from instances within your VPC.
Er, the problem is not "people can hit an un-routable IP from outside your instance". The problem is that "if your instance allows an attacker to make a HTTP request, you might expose personal information". For example, web crawlers or other fetchers.
> almost as trivial for EC2 instances to expose XenStore as a filesystem to which standard UNIX permissions could be applied, providing IAM Role credentials with the full range of access control functionality which UNIX affords to files stored on disk.
Doesn't this become more complicated when you think about EC2 offering Windows instances? Even with straight UNIX file writing, what writes this? Where does it write this? Which user has read permissions?
Yeah, having the metadata available over an http interface is actually brilliant. Simple HTTP calls are easy to do from any network-capable OS or language.
That doesn't work for IAM Roles, because (a) AWS library code expects to get the keys out of the metadata store, and (b) IAM Role credentials are periodically rotated, so the keys you downloaded in advance would expire.
You're right it doesn't. IAM roles are intended to be granted to the entire server, not a subset of the server's users. Any compromise of the server would be considered a compromise of its role. Yeah, this is a bit crazy depending on what you're running on it. I was running multi-tenant IIS hosts and the apps had no business with the metadata or ec2 IAM roles in my case.
If you want roles to work for other users via the metadata store, you can intercept requests with a proxy and then grab temporary credentials via an STS AssumeRole call. This is how kube2iam works. Depending on your use case you'd have to write the proxy, automate the mappings and firewall rules, etc etc etc. PITA but probably doable.
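A heavily simplified sketch of that proxy idea (the role ARN and port are placeholders; real implementations like kube2iam also map each caller to its own role and handle iptables redirection, caching, and the rest of the metadata paths):

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    import boto3

    ROLE_ARN = "arn:aws:iam::123456789012:role/app-role"   # placeholder
    sts = boto3.client("sts")

    class MetadataProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            prefix = "/latest/meta-data/iam/security-credentials/"
            if self.path == prefix:
                body = "app-role"                            # advertise the role name
            elif self.path.startswith(prefix):
                creds = sts.assume_role(RoleArn=ROLE_ARN,
                                        RoleSessionName="metadata-proxy")["Credentials"]
                body = json.dumps({
                    "Code": "Success",
                    "Type": "AWS-HMAC",
                    "AccessKeyId": creds["AccessKeyId"],
                    "SecretAccessKey": creds["SecretAccessKey"],
                    "Token": creds["SessionToken"],
                    "Expiration": creds["Expiration"].isoformat(),
                })
            else:
                self.send_error(404)
                return
            data = body.encode()
            self.send_response(200)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    HTTPServer(("127.0.0.1", 8181), MetadataProxy).serve_forever()

Traffic to 169.254.169.254 then gets redirected to the proxy port with an iptables DNAT rule, so clients keep using the normal metadata URL.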
On a different note, I agree with just about everything you had to say about the PITA that is IAM. Properly scoping permissions is much harder than it needs to be. Not all resources support tags, and even then almost nothing outside of ec2 supports tag conditions in IAM. This leads to many naming schemas and wildcard resource conditions :( AWS should have created the concept of resource groups. This would have greatly simplified giving users permissions to subsets of an accounts resources. I chalk this up to AWS's VERY poor collaboration between service teams. Nothing that came out seemed coordinated. This appears to be getting better (shrug).
It's true that instance profile roles today supply credentials to the entire server. One benefit of virtualization is that it's reasonable to run small, single-purpose VMs. However if you do wish to restrict role credentials to certain processes, there are ways of doing it, such as using EC2 Container Service with task-level IAM Roles [1]:
> Credential Isolation: A container can only retrieve credentials for the IAM role that is defined in the task definition to which it belongs; a container never has access to credentials that are intended for another container that belongs to another task.
If you do firewall the instance metadata service and want to get credentials into individual processes, then you could do that using one of the credential providers in the AWS SDK. I haven't worked with every language SDK, but service clients in the SDK for Java take an AWSCredentialsProvider as input, and you can pick from a number of standard implementations [2] or define a custom one.
> An admin user could fetch these subsets of the metadata and leave a copy of them in the local filesystem.
So if you wanted to take this approach, an admin agent could periodically copy the role credentials as property files into the home directories of users that need them, and then applications could load them by configuring the SDK with ProfileCredentialsProvider (which can refresh credentials periodically). The admin agent could perhaps be a shell script run by cron that `curl`s from the instance metadata service and writes the output to designated files.
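For example, something like this as a root cron job (a sketch; the role name, target user, and jq dependency are assumptions):

    #!/bin/sh
    # Fetch the current role credentials from the metadata service and write
    # them as an AWS credentials profile readable only by one unprivileged user.
    ROLE=my-instance-role
    OUT=/home/appuser/.aws/credentials
    TMP=$(mktemp)
    curl -s "http://169.254.169.254/latest/meta-data/iam/security-credentials/${ROLE}" |
      jq -r '"[default]",
             "aws_access_key_id = \(.AccessKeyId)",
             "aws_secret_access_key = \(.SecretAccessKey)",
             "aws_session_token = \(.Token)"' > "$TMP"
    chown appuser:appuser "$TMP"
    chmod 600 "$TMP"
    mv "$TMP" "$OUT"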
> One benefit of virtualization is that it's reasonable to run small, single-purpose VMs.
Single-purpose doesn't mean single-user. Lots of services divide their code into "privileged" and "unprivileged" components in order to reduce the impact of a vulnerability in the code which does not require privileges. As far as I'm aware, there's no way to have an sshd process which is divided between two EC2 Containers...
Until AWS fixes this (which, as the article points out, may never happen), a chmod'ed 600 file (only readable by root) is actually much safer, even when STS auto-rotation is taken into account.
If users can issue arbitrary commands on an instance then that instance should have zero IAM roles and should delegate actions to services running on separate instances.
The instances hosting our users go a step further and null route Metadata service requests via iptables.
It isn't just about users, its also about malicious software you may accidentally install, if for example a library you use is compromised as has happened before with Ruby gems.
I just double-checked, and the most similar thing we expose is the tokens for each service account in the instance metadata. As pointed out in the article, any uid on the box can read that. But you can create instances with a zero-permission service account (the equivalent of nobody?) and just avoid it.
This does mean that everywhere else you'd have to have explicit service accounts and such, but that seems like a reasonable "workaround" until or unless we make metadata access more granular (I like the block device idea! Would you want entirely different paths for JSON versus "plain" though?)
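For example, with a reasonably recent gcloud you can skip the service account entirely (or attach a dedicated account that has no IAM roles granted); the instance name here is a placeholder:

    gcloud compute instances create my-vm --no-service-account --no-scopes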
Google Cloud does seem better here. The exception is GKE — Kubernetes nodes are associated with service accounts whose permissions, if abused by a malicious Docker container, could be disastrous for your entire cluster.
Considering the amount of unpatched Docker containers out there, that's a bit scary. It also effectively prevents GKE from being usable in any scenario where you want to schedule containers on behalf of third-party actors (think PaaS). (GKE also doesn't let you disable privileged Docker containers, but that's another story.)
On AWS you can run a metadata proxy to prevent pods from getting the credentials, but I don't know of a clean way to accomplish the same thing on GKE.
If you're sharing the same instance for multiple users, trying to achieve security among the users is almost impossible anyway. That's why physical separation/virtualization is one of the first things to focus on when talking about security.
Isolation is definitely important, but not all parts of the system running a single function need the same levels of access, and in fact it may be possible to target those components separately. Take a look at the wikipedia articles for 'defense in depth' or 'privilege separation' to see how important it is inside a system to treat each component isolated to itself as much as possible. (This is also why you don't want to rely on only a perimeter firewall for access control.)
IAM instance roles are still an improvement over how it was typically done in the past: hard-coding the same key in a configuration file and deploying it everywhere.
OK, dwaxe, I have to ask: Are you a robot? Because I uploaded this blog post, tweeted it, and then came straight over here to submit it and you still got here first.
Not that I mind, but getting your HN submission in within 30 seconds of my blog post going up is very impressive if you're not a robot.
Yes I am. This is my personal account, but I use it to automatically post to Hacker News. I was playing around with BigQuery one day and found the Hacker News dataset [1]. From my experience with the Reddit submissions dataset [2], I knew that I could compose this query,
SELECT
AVG(score) AS avg_score,
COUNT(*) AS num,
REGEXP_EXTRACT(url, r'//([^/]*)/') AS domain
FROM
[fh-bigquery:hackernews.full_201510]
WHERE
score IS NOT NULL
AND url <> ''
GROUP BY
domain
HAVING
num > 10
ORDER BY
avg_score DESC
which returns a list of domains with more than ten submissions sorted by average score. This turns out to be a list of some of the most successful tech blogs on the internet, as well as various YCombinator related materials. Out of the domains with over 100 submissions, daemonology.net has the 9th highest average score per submission. I manually visited all the domains with more than about 30 submissions, found the appropriate xml feeds, and saved them. I added a few websites like eff.org whose messages I think everyone should read anyways.
Then I jumped into python and started trying to figure out how to post to Hacker News. It was a little more complicated than I anticipated [3], but an open source HN app for Android helped me figure it out.
I set up a cron job on my $5 Digital Ocean droplet that runs the script every few minutes (pseudocode):
    If you can reach http://news.ycombinator.com:
        check all feeds for new entries
        post a new entry to HN
        sleep for an hour before posting another
[2] The only difference on Reddit is the subreddit system.
[3] After you send a POST request to the login screen, Hacker News gives you a URL with a unique "fnid" parameter, and you send another POST request to another URL with the appropriate "fnid".
We appreciate both the cleverness here and your detailed explanation. But could you please not do this anymore? It isn't malicious, but it's unhealthy for HN's ecosystem. For example, when an author submits his or her own work, that can add a lot of value to the community—but your bot pre-empts that, as it indeed did in this case.
There are many more reasons why this isn't a good thing for HN. For example, it's better for submissions from popular sites to be distributed across a wide range of accounts. That gives more users a chance to feel like they're making important contributions, and gives the community (and authors) a clearer sense of the audience.
There are lots of ways to write software to interact with HN, and lots of users with the ability to do it, so we really depend on the good will of the community only to do that when it serves the whole.
Wouldn't it be funny if you added an RSS entry that linked to an otherwise unpublished page with the URL path /i-am-a-bot-that-posts-to-hackernews and a matching title to go with it?
I once observed that everything that shows up here eventually shows up in a particular cluster of subreddits and vice versa. Usually with a lag of 24-48 hours.
I half-jokingly floated the idea that one could write a karma arbitrageur bot which cross posts between HN and those subreddits.
I was told that I would be banned. Not just the bot: me.
And to your question, dwaxe is not a bot (there are comments associated with the account too), and this has happened before (apparently a lightning fast submitter):
Or better: assuming the URI is known before posting (i.e., it's not randomly generated on submission and doesn't encode an absurdly precise submission time), just post it to HN a fraction of a second before it goes live. Nothing any script can do to beat you then.
We could administer it a test of some sort and evaluate whether the ensuing conversation is convincingly human. I think it'd be fitting to name this test in honor of some forefather of computer science.