Google Security Online Blog - Last Entries



Contributing To Open Source Software Security

2008-05-05T11:38:00.001-07:00



From operating systems to web browsers, open source software plays a critical role in the operation of the Internet. The security of open source software is therefore quite important, as it often interacts with personal information -- ranging from credit card numbers to medical records -- that needs to be kept safe. There has been a long-lived discussion on whether open source software is inherently more secure than closed source software. While popular opinion has begun to tilt in favor of openness, there are still arguments for both sides. Instead of diving into those treacherous waters (or giving weight to the idea of "inherent security"), I'd like to focus on the fruits of this extensive discussion. In particular, David A. Wheeler laid out a "bottom line" in his Secure Programming for Linux and Unix HOWTO which applies to both open and closed source software. It predicates real security in software on three actions:


  1. people need to actually review the code

  2. developers/reviewers need to know how to write secure code

  3. once found, security problems need to be fixed quickly, and their fixes distributed quickly


While distilling anything down to three steps makes it seem easy, this isn't necessarily the case. Given how important open source software is to Google, we've attempted to contribute to this bottom line. As Chris said before, our engineers are encouraged to contribute both software and time to open source efforts. We regularly submit the results of our automated and manual security analysis of open source software back to the community, including related software engineering time. In addition, our engineering teams frequently release software under open source licenses. This software was written either with security in mind, such as with security testing tools, or by engineers well-versed in the security challenges of their project.

These efforts leave one area completely unaddressed -- getting security problems fixed quickly, and then getting those fixes distributed quickly. It has been unclear how to best resolve this issue. There is no centralized security authority for open source projects, and operating system distribution publishers are the best bet for getting updates to the highest number of users. Even if users can get updates in this manner, how should a security researcher contact a particular project's author? If there's a potential, security-related issue, who can help evaluate the risk for a project? What resources are there for projects that have been compromised, but have no operational security background?

I'm proud to announce that Google has sponsored participation in oCERT, the open source computer emergency response team. oCERT is a volunteer workforce of security professionals from the open source community with the goal of providing security vulnerability mediation and incident response services to open source projects. It will strive to contact software authors with all security reports and aid in debugging and patching, especially in cases where the author, or the reporter, doesn't have a background in security. Reliable contacts for projects, publishers, and vendors will be maintained where possible and used for notification when issues arise and fixes are available for mediated issues. Additionally, oCERT will aid projects of any size with responses to security incidents, such as server compromises.

It is my hope that this initiative will not only aid in remediating security issues in a timely fashion, but also provide a means for additional security contributions to the open source community.




All Your iFrame Are Point to Us

2008-02-11T13:57:00.001-08:00



It has been over a year and a half since we started to identify web pages that infect vulnerable hosts via drive-by downloads, i.e. web pages that attempt to exploit their visitors by installing and running malware automatically. During that time we have investigated billions of URLs and found more than three million unique URLs on over 180,000 web sites automatically installing malware. During the course of our research, we have investigated not only the prevalence of drive-by downloads but also how users are being exposed to malware and how it is being distributed. Our research paper is currently under peer review, but we are making a technical report [PDF] available now. Although our technical report contains a lot more detail, we present some high-level findings here:

Search Results Containing a URL Labeled as Harmful


The above graph shows the percentage of daily queries that contain at least one search result labeled as harmful. In the past few months, more than 1% of all search results contained at least one result that we believe to point to malicious content and the trend seems to be increasing.

Browsing Habits

Good computer hygiene, such as running automatic updates for the operating system and third-party applications, as well as installing anti-virus products goes a long way in protecting your home computer. However, we have been wondering if users' browsing habits impact the likelihood of encountering malicious web pages. To study this aspect, we took a sample of ~7 million URLs and mapped them to DMOZ categories. Although we found that adult web pages may increase the risk of exploitation, each DMOZ category was affected.

Malicious Content Injection

To understand if malicious content on a web server is due to poor web server security, we analyzed the version numbers reported by web servers on which we found malicious pages. Specifically, we looked at the Apache and the PHP versions exported as part of a server's response. We found that over 38% of both Apache and PHP versions were outdated increasing the risk of remote content injection to these servers.

Our "Ghost In the Browser [PDF]" paper highlighted third-party content as one potential vector of malicious content. Today, a lot of third-party content is due to advertising. To assess the extent to which advertising contributes to drive-by downloads, we analyze the distribution chain of malware, i.e. all the intermediary URLs a browser downloads before reaching a malware payload. We inspected each distribution chain for membership in about 2,000 known advertising networks. If any URL in the distribution chain corresponds to a known advertising network, we count the whole page as being infectious due to Ads. In our analysis, we found that on average 2% of malicious web sites were delivering malware via advertising. The underlying problem is that advertising space is often syndicated to other parties who are not known to the web site owner. Although non-syndicated advertising networks such as Google Adwords are not affected, any advertising networks practicing syndication needs to carefully study this problem. Our technical report [PDF] contains more detail including an analysis based on the popularity of web sites.

Structural Properties of Malware Distribution


Finally, we also investigated the structural properties of malware distribution sites. Some malware distribution sites had as many as 21,000 regular web sites pointing to them. We also found that the majority of malware was hosted on web servers located in China. Interestingly, Chinese malware distribution sites are mostly pointed to by Chinese web servers.

We hope that an analysis such as this will help us to better understand the malware problem in the future and allow us to protect users all over the Internet from malicious web sites as best as we can. One thing is clear - we have a lot of work ahead of us.




Help us fill in the gaps!

2007-11-29T14:28:00.000-08:00



We've been targeting malware for over a year and a half, and these efforts are paying off. We are now able to display warnings in search results when a site is known to be malicious, which can help you avoid drive-by downloads and other computer compromises. We are already distributing this data through the Safe Browsing API, and we are working on bringing this protection to more users by integrating with more Google products. While these are great steps, we need your help going forward!

Currently, we know of hundreds of thousands of websites that attempt to infect people's computers with malware. Unfortunately, we also know that there are more malware sites out there. This is where we need your help in filling in the gaps. If you come across a site that is hosting malware, we now have an easy way for you to let us know about it. If you come across a site that is hosting malware, please fill out this short form. Help us keep the internet safe, and report sites that distribute malware.




Auditing open source software

2007-10-08T16:13:00.000-07:00



Google encourages its employees to contribute back to the open source community, and there is no exception in Google's Security Team. Let's look at some interesting open source vulnerabilities that were located and fixed by members of Google's Security team. It is interesting to classify and aggregate the code flaws leading to the vulnerabilities, to see if any particular type of flaw is more prevalent.

  1. JDK. In May 2007, I released details on an interesting bug in the ICC profile parser in Sun's JDK. The bug is particularly interesting because it could be exploited by an evil image. Most previous JDK bugs involve a user having to run a whole evil applet. The key parts of code which demonstrate the bug are as follows:

    TagOffset = SpGetUInt32 (&Ptr);
    if (ProfileSize < TagOffset)
      return SpStatBadProfileDir;
    ...
    TagSize = SpGetUInt32 (&Ptr);
    if (ProfileSize < TagOffset + TagSize)
      return SpStatBadProfileDir;
    ...
    Ptr = (KpInt32_t *) malloc ((unsigned int)numBytes+HEADER);

    Both TagSize and TagOffset are untrusted unsigned 32-bit values pulled out of images being parsed. They are added together, causing a classic integer overflow condition and the bypass of the size check. A subsequent additional integer overflow in the allocation of a buffer leads to a heap-based buffer overflow.

  2. gunzip. In September 2006, my colleague Tavis Ormandy reported some interesting vulnerabilities in the gunzip decompressor. They were triggered when an evil compressed archive is decompressed. A lot of programs will automatically pass compressed data through gunzip, making it an interesting attack. The key parts of the code which demonstrate one of the bugs are as follows:

    ush count[17], weight[17], start[18], *p;
    ...
    for (i = 0; i < (unsigned)nchar; i++) count[bitlen[i]]++;

    Here, the stack-based array "count" is indexed by values in the "bitlen" array. These values are under the control of data in the incoming untrusted compressed data, and were not checked for being within the bounds of the "count" array. This led to corruption of data on the stack.


  3. libtiff. In August 2006, Tavis reported a range of security vulnerabilities in the libtiff image parsing library. A lot of image manipulation programs and services will be using libtiff if they handle TIFF format files. So, an evil TIFF file could compromise a lot of desktops or even servers. The key parts of the code which demonstrate one of the bugs are as follows:

    if (sp->cinfo.d.image_width != segment_width ||
        sp->cinfo.d.image_height != segment_height) {
      TIFFWarningExt(tif->tif_clientdata, module,
        "Improper JPEG strip/tile size, expected %dx%d, got %dx%d",

    Here, a TIFF file containing a JPEG image is being processed. In this case, both the TIFF header and the embedded JPEG image contain their own copies of the width and height of the image in pixels. This check above notices when these values differ, issues a warning, and continues. The destination buffer for the pixels is allocated based on the TIFF header values, and it is filled based on the JPEG values. This leads to a buffer overflow if a malicious image file contains a JPEG with larger dimensions than those in the TIFF header. Presumably the intent here was to support broken files where the embedded JPEG had smaller dimensions than those in the TIFF header. However, the consequences of larger dimensions that those in the TIFF header had not been considered.

We can draw some interesting conclusions from these bugs. The specific vulnerabilities are integer overflows, out-of-bounds array accesses and buffer overflows. However, the general theme is using an integer from an untrusted source without adequately sanity checking it. Integer abuse issues are still very common in code, particular code which is decoding untrusted binary data or protocols. We recommend being careful using any such code until it has been vetted for security (by extensive code auditing, fuzz testing, or preferably both). It is also important to watch for security updates for any decoding software you use, and keep patching up to date.




Information flow tracing and software testing

2007-09-17T09:32:00.000-07:00



Security testing of applications is regularly performed using fuzz testing. As previously discussed on this blog, Srinath's Lemon uses a form of smart fuzzing. Lemon is aware of classes of web application threats and the input families which trigger them, but not all fuzz testing frameworks have to be this complicated. Fuzz testing originally relied on purely random data, ignorant of specific threats and known dangerous input. Today, this approach is often overlooked in favor of more complicated techniques. Early sanity checks in applications looking for something as a simple as a version number may render testing with completely random input ineffective. However, the newer, more complicated fuzz testers require a considerable initial investment in the form of complete input format specifications or the selection of a large corpus of initial input samples.

At WOOT'07,I presented a paper on Flayer, a tool we developed internally to augment our security testing efforts. In particular, it allows for a fuzz testing technique that compromises between the original idea and the most complicated. Flayer makes it possible to remove input sanity checks at execution time. With the small investment of identifying these checks, Flayer allows for completely random testing to be performed with much higher efficacy. Already, we've uncovered multiple vulnerabilities in Internet-critical software using this approach.

The way that Flayer allows for sanity checks to be identified is perhaps the more interesting point. Flayer uses a dynamic analysis framework to analyze the target application at execution time. Flayer marks, or taints, input to the program and traces that data throughout its lifespan. Considerable research has been done in the past regarding information flow tracing using dynamic analysis. Primarily, this work has been aimed at malware and exploit detection and defense. However, none of the resulting software has been made publicly available.

While Flayer is still in its early stages, it is available for download under the GNU Public License. External contributions and feedback are encouraged!




Automating web application security testing

2007-07-16T11:40:00.000-07:00



Cross-site scripting (aka XSS) is the term used to describe a class of security vulnerabilities in web applications. An attacker can inject malicious scripts to perform unauthorized actions in the context of the victim's web session. Any web application that serves documents that include data from untrusted sources could be vulnerable to XSS if the untrusted data is not appropriately sanitized. A web application that is vulnerable to XSS can be exploited in two major ways:

    Stored XSS - Commonly exploited in a web application where one user enters information that's viewed by another user. An attacker can inject malicious scripts that are executed in the context of the victim's session. The exploit is triggered when a victim visits the website at some point in the future, such as through improperly sanitized blog comments and guestbook entries, which facilitates stored XSS.

    Reflected XSS - An application that echoes improperly sanitized user input received as query parameters is vulnerable to reflected XSS. With a vulnerable application, an attacker can craft a malicious URL and send it to the victim via email or any other mode of communication. When the victim visits the tampered link, the page is loaded along with the injected script that is executed in the context of the victim's session.

The general principle behind preventing XSS is the proper sanitization (via, for instance, escaping or filtering) of all untrusted data that is output by a web application. If untrusted data is output within an HTML document, the appropriate sanitization depends on the specific context in which the data is inserted into the HTML document. The context could be in the regular HTML body, tag attributes, URL attributes, URL query string attributes, style attributes, inside JavaScript, HTTP response headers, etc.

The following are some (by no means complete) examples of XSS vulnerabilities. Let's assume there is a web application that accepts user input as the 'q' parameter. Untrusted data coming from the attacker is marked in red.


  • Injection in regular HTML body - angled brackets not filtered or escaped

    <b>Your query '<script>evil_script()</script>' returned xxx results</b>

  • Injection inside tag attributes - double quote not filtered or escaped

    <form ...
      <input name="q" value="blah"><script>evil_script()</script>">
    </form>

  • Injection inside URL attributes - non-http(s) URL

    <img src="javascript:evil_script()">...</img>

  • In JavaScript context - single quote not filtered or escaped

    <script>
      var msg = 'blah'; evil_script(); //';
      // do something with msg variable
    </script>


In the cases where XSS arises from meta characters being inserted from untrusted sources into an HTML document, the issue can be avoided either by filtering/disallowing the meta characters, or by escaping them appropriately for the given HTML context. For example, the HTML meta characters <, >, &, " and ' must be replaced with their corresponding HTML entity references &lt;, &gt;, &amp;, &quot; and &#39 respectively. In a JavaScript-literal context, inserting a backslash in front of \, ', " and converting the carriage returns, line-feeds and tabs into \r, \n and \t respectively should avoid untrusted meta characters being interpreted as code.

How about an automated tool for finding XSS problems in web applications? Our security team has been developing a black box fuzzing tool called Lemon (deriving from the commonly-recognized name for a defective product). Fuzz testing (also referred to as fault-injection testing) is an automated testing approach based on supplying inputs that are designed to trigger and expose flaws in the application. Our vulnerability testing tool enumerates a web application's URLs and corresponding input parameters. It then iteratively supplies fault strings designed to expose XSS and other vulnerabilities to each input, and analyzes the resulting responses for evidence of such vulnerabilities. Although it started out as an experimental tool, it has proved to be quite effective in finding XSS problems. Besides XSS, it finds other security problems such as response splitting attacks, cookie poisoning problems, stacktrace leaks, encoding issues and charset bugs. Since the tool is homegrown it is easy to integrate into our automated test environment and to extend based on specific needs. We are constantly in the process of adding new attack vectors to improve the tool against known security problems.

Update:
I wanted to respond to a few questions that seem to be common among readers. I've listed them below. Thanks for the feedback. Please keep the questions and comments coming.

Q. Does Google plan to market it at some point?
A. Lemon is highly customized for Google apps and we have no plans of releasing it in near future.

Q. Did Google's security team check out any commercially available fuzzers? Is the ability to keep improving the fuzzer the main draw of a homegrown tool?
A. We did evaluate commercially available fuzzers but felt that our specialized needs could be served best by developing our own tools.




The reason behind the "We're sorry..." message

2007-07-09T11:54:00.000-07:00




Some of you might have seen this message while searching on Google, and wondered what the reason behind it might be. Instead of search results, Google displays the "We're sorry" message when we detect anomalous queries from your network. As a regular user, it is possible to answer a CAPTCHA - a reverse Turing test meant to establish that we are talking to a human user - and to continue searching. However, automated processes such as worms would have a much harder time solving the CAPTCHA. Several things can trigger the sorry message. Often it's due to infected computers or DSL routers that proxy search traffic through your network - this may be at home or even at a workplace where one or more computers might be infected. Overly aggressive SEO ranking tools may trigger this message, too. In other cases, we have seen self-propagating worms that use Google search to identify vulnerable web servers on the Internet and then exploit them. The exploited systems in turn then search Google for more vulnerable web servers and so on.  This can lead to a noticeable increase in search queries and sorry is one of our mechanisms to deal with this.

At ACM WORM 2006, we published a paper on Search Worms [PDF] that takes a much closer look at this phenomenon. Santy, one of the search worms we analyzed, looks for remote-execution vulnerabilities in the popular phpBB2 web application. In addition to exhibiting worm like propagation patterns, Santy also installs a botnet client as a payload that connects the compromised web server to an IRC channel. Adversaries can then remotely control the compromised web servers and use them for DDoS attacks, spam or phishing. Over time, the adversaries have realized that even though a botnet consisting of web servers provides a lot of aggregate bandwidth, they can increase leverage by changing the content on the compromised web servers to infect visitors and in turn join the computers of compromised visitors into much larger botnets. This fundamental change from remote attack to client based download of malware formed the basis of the research presented in our first post. In retrospect, it is interesting to see how two seemingly unrelated problems are tightly connected.






 
top