ModSecurity Breach

ModSecurity Blog

XSS Defense HOWTO

We all agree that cross-site scripting is a serious problem, but what continues to amaze me is the lack of good documentation on the subject. It is easy to find instructions on how to execute attacks against applications vulnerable to XSS, but finding something adequate to cover defence is a real challenge. No wonder programmers keep making the same errors over and over again. I am sure that a single page describing the problems and the solutions is out there somewhere, but I have been unable to find it. All I am getting is page after page of half-truths and partial information, and even people saying that XSS is impossible to defend against.

Without any planning (so please forgive any omissions), I am now going to describe how to produce web applications that are safe against XSS and other injection attacks.

This is what you need to do:

  1. Identify all system components other than the application itself. In a typical web application you will have at least the following:
    1. Database
    2. Browser output, which further consists of:
      1. HTML
      2. JavaScript
      3. CSS
      4. Response headers (e.g. redirection, cookies, etc)
  2. Adopt one character encoding (use UTF-8 unless you have a good reason not to) and make sure all components are configured to use it:
    1. Databases typically need to be created with a character encoding in mind
    2. In the HTML pages you create, set the character encoding explicitly
  3. Then, for every component:
    1. Identify safe characters
    2. Identify how to make unsafe characters safe by converting them into something else
    3. Write a function that looks at characters one by one to determine if they are safe, and converts those that are not (whitelisting, not blacklisting!)
    4. Every such function must be aware of the character encoding used in the application
  4. Then, for every piece of code that sends data from one component into another, make sure you use the correct function to encode data to make it safe
  5. Check that every piece of data you receive is in the correct character encoding and that the format matches that of the type you are expecting (input validation). You must use whitelisting (as blacklisting does not work). This is especially important for user-supplied Internet addresses—see below for details. Before you do anything with the input data make sure to canonicalise it (as suggested by Jim Manico in one of the comments), which will reduce the possibility of evasion through the use of multiple representations of the same character.
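Step 3 can be sketched in a few lines of Python. This is only an illustration for the HTML component, not a complete encoder: the function name encode_for_html and the choice of ASCII letters and digits as the safe set are my assumptions, and a production encoder (such as the OWASP one mentioned in the notes) handles more cases. The function works on decoded text, so the bytes must already have been decoded using the application's single character encoding.

```python
import string

# Assumption for this sketch: only ASCII letters and digits are "safe" in HTML.
SAFE_HTML_CHARS = frozenset(string.ascii_letters + string.digits)


def encode_for_html(text: str) -> str:
    """Whitelist encoder: keep safe characters, convert every other
    character into a numeric character reference."""
    if not isinstance(text, str):
        # Raw bytes must be decoded first (e.g. raw.decode("utf-8")),
        # otherwise we would be guessing at the character encoding.
        raise TypeError("expected decoded text (str), not bytes")
    out = []
    for ch in text:  # look at characters one by one
        if ch in SAFE_HTML_CHARS:
            out.append(ch)
        else:
            out.append(f"&#x{ord(ch):X};")
    return "".join(out)


print(encode_for_html('<b onclick="x">'))
# prints: &#x3C;b&#x20;onclick&#x3D;&#x22;x&#x22;&#x3E;
```

Note that the function never asks "is this character dangerous?"; anything not known to be safe is converted, which is the point of whitelisting.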

The first four steps in the list are the actual XSS defence. The fifth is a matter of good practice and, in most cases, does not directly protect against XSS. In fact, there is only one case where it does: preventing attackers from executing JavaScript code in data pretending to be an Internet address (e.g. instead of http://www.example.com, which you use to create a link <a href="http://www.example.com">Example</a>, you get javascript:alert('xss')).
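A rough sketch of the fifth step, in Python, might look like the following. The names decode_utf8 and is_safe_link are hypothetical, and the decision to allow only http and https links is an assumption made for the example; your application may need a different whitelist.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only these schemes are acceptable in links.
ALLOWED_SCHEMES = frozenset({"http", "https"})


def decode_utf8(raw: bytes) -> str:
    """Check that input really is in the expected character encoding.
    Strict decoding raises UnicodeDecodeError on malformed input,
    including overlong forms such as b"\\xc0\\xbc" (a disguised "<")."""
    return raw.decode("utf-8")


def is_safe_link(url: str) -> bool:
    """Whitelist check: accept a URL only if it has an allowed scheme
    and a host part. Everything else is rejected."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


print(is_safe_link("http://www.example.com"))   # True
print(is_safe_link("javascript:alert('xss')"))  # False
```

Because the check is a whitelist, variations such as mixed case or leading whitespace in front of javascript: are rejected as well, without having to be anticipated one by one. A URL that passes the check must still be encoded for the HTML attribute it is written into.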

Notes:

  1. Google Doctype, which is a reference library for web developers, is by far the best resource on XSS, but it too fails when it comes to defence, advising people to use blacklisting instead of whitelisting.
  2. The OWASP Encoding project should be your starting point if you don't want to write all the encoding functions yourself.
  3. For the cases when you want to accept some HTML/JavaScript/CSS you will need to adopt a different approach: meet AntiSamy.

http://www.owasp.org/index.php/Cross_Site_Scripting#How_to_Protect_Yourself is a decent resource - nice and concise.

Be careful with #4 - you only want to encode data once. If you make the mistake of encoding data "for every piece of code that sends data from one component into another", then you risk encoding it multiple times, which is not pleasing for the user experience and could even break your app.

Also, you missed a note about turning off HTTP Trace.

I would also like to add that character encoding (canonicalization) should be done both as you render the HTML page and when you validate user input.

You also missed that it's key to protect the transport. A savvy attacker could intercept and modify HTTP traffic and get around all the other protections mentioned.

Jim,

That OWASP resource is both inadequate and inaccurate: 1) it only mentions encoding for HTML, but not for JavaScript, CSS, or any other component; 2) it suggests blacklisting instead of whitelisting; 3) the list of characters to escape does not include whitespace, which is an error (e.g. vital to prevent attacks via HTML tag attributes) and proves my point about blacklisting being fundamentally wrong; 4) does not discuss the importance of knowing the character encoding you are dealing with; and so on.
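To make the whitespace point concrete, here is a small Python experiment. It uses the standard library's html.escape as a stand-in for a typical blacklist-style encoder (it only converts a fixed list of characters) and compares it with a whitelist encoder. The helper names attribute_names and whitelist_encode are made up for this sketch, and Python's html.parser is used only as an approximation of how a browser splits a tag into attributes.

```python
import html
import string
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records the attribute names the parser sees in start tags."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.extend(name for name, _ in attrs)


def attribute_names(markup: str) -> list:
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


SAFE = frozenset(string.ascii_letters + string.digits)


def whitelist_encode(text: str) -> str:
    # Anything that is not a letter or digit becomes a character reference.
    return "".join(ch if ch in SAFE else f"&#x{ord(ch):X};" for ch in text)


payload = "x onmouseover=alert(1)"

# Unquoted attribute value, as often found in hand-written templates.
blacklisted = "<input value=" + html.escape(payload) + ">"
whitelisted = "<input value=" + whitelist_encode(payload) + ">"

print(attribute_names(blacklisted))
print(attribute_names(whitelisted))
```

The payload contains none of the characters html.escape knows about, so it passes through untouched and the space lets it start a second attribute (onmouseover). With the whitelist encoder the space is converted, and only the value attribute remains.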

Regarding encoding more than once: it commonly occurs when people are encoding on component input (e.g. as part of validation) and then again at output, which is a mistake. I don't see how that could happen provided encoding is only done when data is exiting on a component boundary, which is the correct approach.

I think the canonicalisation suggestion is very good, and I will add it to the list.

As for HTTP TRACE and transport protection: I agree that those are issues that need to be dealt with, but neither creates or contributes to XSS, which is the focus of my post.

1) My understanding is that transport protection would protect against man-in-the-middle XSS attacks that would be prevented if HTTPS was used. Although very difficult to mount, is it still plausible?

2) I agree, HTTP TRACE (Cross Site Tracing) is a different issue and does not belong on the OWASP page - although these issues are tangential.

3) "I don't see how that could happen provided encoding is only done when data is exiting on a component boundary, which is the correct approach."

Think large enterprise N-tiered applications with many component boundaries. The proper approach is encoding before data is presented to the user.

4) And you are right; the OWASP page is not complete. Consider hitting "edit" next time - it will get your message out to a larger community. I made a change this evening, at least, to correct the blacklist example.

> 1) My understanding is that transport protection would
> protect against man-in-the-middle XSS attacks that would
> be prevented if HTTPS was used. Although very difficult
> to mount, is it still plausible?

Yes, it is plausible, especially with unencrypted Wi-Fi, which is so prevalent today. But, although both MITM and XSS can be used with similar results, they are entirely separate attack vectors. XSS is an injection attack made possible by insecure programming. Channel protection (and thus MITM) is a deployment/configuration issue. With MITM you're in full control of the content stream so there's no need to perform XSS; you just rewrite content.


> > 3) "I don't see how that could happen provided encoding
> > is only done when data is exiting on a component
> > boundary, which is the correct approach."
>
> Think large enterprise N-tiered applications with
> many component boundaries.

I still don't see the problem. For every two components that are exchanging data there will be a communication protocol (or exchange format) of some sort. For as long as encoding is done once per boundary there will be no danger of double encoding: for every transfer the sending component will encode the data transferred and the receiving component will decode it. The danger lies in encoding for HTML when talking to the database, for example, which is never appropriate.
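A minimal Python sketch of what I mean, using an in-memory SQLite database as the database component and html.escape as a stand-in HTML encoder (both are choices made for the example, not recommendations). Parameter binding plays the role of the encoding step on the application-to-database boundary.

```python
import html
import sqlite3

comment = "Tom & Jerry <3"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")

# Boundary 1: application -> database. Parameter binding is the
# "encoding" for this boundary; the raw text is what gets stored.
conn.execute("INSERT INTO comments (body) VALUES (?)", (comment,))
stored = conn.execute("SELECT body FROM comments").fetchone()[0]

# Boundary 2: application -> HTML. Encode once, on the way out.
page = "<p>" + html.escape(stored) + "</p>"
print(page)  # <p>Tom &amp; Jerry &lt;3</p>

# The mistake: HTML-encoding on the way into the database, and then
# again on output, produces double encoding.
wrong_stored = html.escape(comment)
wrong_page = "<p>" + html.escape(wrong_stored) + "</p>"
print(wrong_page)  # <p>Tom &amp;amp; Jerry &amp;lt;3</p>
```

The database gets back exactly what was put in, so the HTML boundary always starts from raw data and encodes it exactly once.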


> The proper approach is encoding before data
> is presented to the user.

I prefer not to use the term user as it only works in some cases (e.g. writing HTML), but not in others (e.g. storing data in databases - where is the user?). Replace the word user with the word consumer and I will agree with you.

> For as long as encoding is done once per boundary there will be no danger of double encoding: for every transfer the sending component will encode the data transferred and the receiving component will decode it.

For example, one might want to *not* encode on purpose when dumping data into a log in order to capture the raw attack. But encoding data before presenting it to the consumer (not user, point well taken) is crucial.

> Channel protection (and thus MITM) is a deployment/configuration issue.

It's both a deployment/configuration issue and a programming issue. The server config person needs to set up certs and make sure HTTPS is set up properly. A good programmer will not only present HTTPS links to the user, but also force redirection of HTTP pages to HTTPS where appropriate. Also, the programmer may wish for some public pages to be HTTP while other pages after auth would be forced to HTTPS... Certainly, a programmer is a part of ensuring properly secured transport; it's not just a config issue.

Regardless, your point is well taken. There are very few resources on the net that provide a complete description of XSS defense.

Thanks for taking the time to have this conversation, it's good stuff. Also, thank you for your participation in ESAPI. We are lucky to have you.

To me, the simplest solution:

Whitelist validation with a user interface that makes sense for the application (this will solve well over 80% of the issues in the 80/20 rule).

Develop applications such that validation occurs at the data layer rather than the interface layer (or better yet, do both). This will probably catch another 10%. Stop trusting client side validation.

Encode data going to another grammar in the right manner such that it will not be interpreted by its parser as anything other than text.

I think it's useful to be a lot more specific about "components" and talk about context instead. For example, within the "HTML" component you might be writing data inline (not inside a markup tag), as an attribute value inside a tag (e.g. INPUT VALUE="userdata"), inside a STYLE tag, between SCRIPT tags, etc. Each of these contexts has different rules about what needs to be encoded, what can be written directly to the page, how to canonicalize, and what is "bad". Then there's also AJAX/AJAJ (JSON), which is another huge can of worms. If you want to have a definitive anti-XSS guide, I think you'll want to enumerate every context that will require encoding in your Step 1.
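As a rough illustration of per-context encoding, here is a Python sketch with one whitelist encoder per context. The context names, the function names and the safe set (ASCII letters and digits) are all assumptions for the example, and the list of contexts is nowhere near complete. Some contexts, such as URLs inside CSS or unquoted JavaScript, need validation in addition to encoding.

```python
import string

# Assumption for this sketch: the same small safe set in every context.
SAFE = frozenset(string.ascii_letters + string.digits)


def encode_for_html(text: str) -> str:
    """Inline HTML text and quoted attribute values: &#xHH; references."""
    return "".join(ch if ch in SAFE else f"&#x{ord(ch):X};" for ch in text)


def encode_for_js_string(text: str) -> str:
    """Inside a quoted JavaScript string literal: \\uXXXX escapes,
    one per UTF-16 code unit."""
    out = []
    for ch in text:
        if ch in SAFE:
            out.append(ch)
            continue
        units = ch.encode("utf-16-be", "surrogatepass")
        for i in range(0, len(units), 2):
            out.append("\\u%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


def encode_for_css_value(text: str) -> str:
    """Inside a CSS value: backslash, hex code point, terminating space."""
    return "".join(ch if ch in SAFE else f"\\{ord(ch):X} " for ch in text)


# One encoder per output context; the caller must name the context.
ENCODERS = {
    "html_text": encode_for_html,
    "html_attr": encode_for_html,
    "js_string": encode_for_js_string,
    "css_value": encode_for_css_value,
}


def encode(context: str, text: str) -> str:
    return ENCODERS[context](text)  # unknown context -> KeyError


print(encode("html_text", "</script>"))  # &#x3C;&#x2F;script&#x3E;
print(encode("js_string", "</script>"))  # \u003C\u002Fscript\u003E
print(encode("css_value", "a(b"))        # a\28 b
```

Making the context an explicit argument means there is no default encoder to fall back on, so a template author has to decide where the data is going before it can be written out.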

Regarding your third note, I think you'll have a hard time finding a better solution than HTML Purifier:

http://htmlpurifier.org/

It has been in development for years, and I'm doubtful that reinventing the wheel is worthwhile.
