Generic cross-browser cross-domain theft


Well, here's a nice little gem for the festive season. I like it for a few distinct reasons:

  1. It's one of those cases where if you look at web standards from the correct angle, you can see a security vulnerability specified.

  2. Accordingly, it affected all 5 major browsers. And likely the rest.

  3. You can still be a theft victim even with plugins and JavaScript disabled!

It's much less serious than it could be because there are restrictions on the format of cross-domain data which can be stolen, and the attacker needs to be able to exercise limited control of the target theft page.
The issue is best introduced with an example. The example chosen is deliberately a little bit involved and not too severe. This is to give the upcoming browser updates a chance to get deployed.

Example: Yahoo! Mail cross-domain subject line theft and e-mail deletion

(It's important to note there is no apparent failing of the web app in question here).

If you set up the above scenario as a test, you might see something like this in an alert box upon clicking the link:


The above text is stolen cross-domain, and the interesting pieces are highlighted in bold. The data includes the subjects, senders and "mid" value for all e-mails received between the two set-up e-mails we sent the victim.
Although leaking of subjects and senders is not ideal, it's the "mid" value that interests us most as an attacker. This would appear to be a secure / unguessable ID. Accordingly, it is reasonable for the mail application to rely on it as a distinct anti-XSRF token. This is indeed the case for the "delete" operation, implemented as a simple HTTP GET request. Interestingly, the "forward" operation seems to have an additional anti-XSRF token in the POST body, making the "mid" leak not nearly as serious as it could have been.

That's how this whole attack proceeds in its most powerful form: leak a small amount of text cross-domain, and then bingo! if the leaked text happens to include a global anti-XSRF token.

How does it work?

It works by abusing the standards relating to the loading of CSS style sheets. Approximately, the standards are:

By controlling a little bit of text in the victim domain, the attacker can inject what appears to be a valid CSS string. It does not matter what proceeds this CSS string: HTML, binary data, JSON, XML. The CSS parser will ruthlessly hunt down any CSS constructs within whatever blob is pulled from the victim's domain. To the CSS parser, the text in the above attack looks like this:

(some HTML junk; whatever){} body{background-image:url(' stuff...')}(some trailing HTML junk)

So, the background of the attacker's page will be styled with a background image loaded from an URL, the path of which contains stolen data! One lovely twist of using a CSS string which is an URL is that it will be automatically fetched even if JavaScript is turned off! The stolen data is then harvested by the attacker from their web server logs.
Fortunately, there are various barriers to exploiting this:

General areas that are more susceptible to this attack include:

How do we fix it?

It would be nice to be able to not send cookies for cross-domain CSS loads; however that would certainly break stuff and it's hard to measure what without actually causing the breakage.

It would be nice to be strict on the MIME type when loading CSS resources -- if not globally then at least for cross-domain loads. But this breaks high profile sites, *cough* and text/plain *cough*. (To be fair, it gets much worse with many sites even using text/html, application/octet-stream, it goes on).

A good balance is to require the alleged CSS to at least start with well-formed CSS, iff it is a cross-domain load and the MIME type is broken. This is the approach I used in my pending WebKit patch.

Note that fixing this issue also fixes my previous attack of using cross-domain CSS to reliably tell if someone is logged in or not: