From operating systems to web browsers, open source software plays a critical role in the operation of the Internet. The security of open source software is therefore quite important, as it often interacts with personal information -- ranging from credit card numbers to medical records -- that needs to be kept safe. There has been a long-lived discussion on whether open source software is inherently more secure than closed source software. While popular opinion has begun to tilt in favor of openness, there are still arguments for both sides. Instead of diving into those treacherous waters (or giving weight to the idea of "inherent security"), I'd like to focus on the fruits of this extensive discussion. In particular, David A. Wheeler laid out a "bottom line" in his Secure Programming for Linux and Unix HOWTO which applies to both open and closed source software. It predicates real security in software on three actions:
It has been over a year and a half since we started to identify web pages that infect vulnerable hosts via drive-by downloads, i.e. web pages that attempt to exploit their visitors by installing and running malware automatically. During that time we have investigated billions of URLs and found more than three million unique URLs on over 180,000 web sites automatically installing malware. During the course of our research, we have investigated not only the prevalence of drive-by downloads but also how users are being exposed to malware and how it is being distributed. Our research paper is currently under peer review, but we are making a technical report [PDF] available now. Although our technical report contains a lot more detail, we present some high-level findings here:
Search Results Containing a URL Labeled as Harmful
The above graph shows the percentage of daily queries that contain at least one search result labeled as harmful. In the past few months, more than 1% of all search results contained at least one result that we believe to point to malicious content and the trend seems to be increasing.
Browsing Habits
Good computer hygiene, such as running automatic updates for the operating system and third-party applications, as well as installing anti-virus products goes a long way in protecting your home computer. However, we have been wondering if users' browsing habits impact the likelihood of encountering malicious web pages. To study this aspect, we took a sample of ~7 million URLs and mapped them to DMOZ categories. Although we found that adult web pages may increase the risk of exploitation, each DMOZ category was affected.
Malicious Content Injection
To understand if malicious content on a web server is due to poor web server security, we analyzed the version numbers reported by web servers on which we found malicious pages. Specifically, we looked at the Apache and the PHP versions exported as part of a server's response. We found that over 38% of both Apache and PHP versions were outdated increasing the risk of remote content injection to these servers.
Our "Ghost In the Browser [PDF]" paper highlighted third-party content as one potential vector of malicious content. Today, a lot of third-party content is due to advertising. To assess the extent to which advertising contributes to drive-by downloads, we analyze the distribution chain of malware, i.e. all the intermediary URLs a browser downloads before reaching a malware payload. We inspected each distribution chain for membership in about 2,000 known advertising networks. If any URL in the distribution chain corresponds to a known advertising network, we count the whole page as being infectious due to Ads. In our analysis, we found that on average 2% of malicious web sites were delivering malware via advertising. The underlying problem is that advertising space is often syndicated to other parties who are not known to the web site owner. Although non-syndicated advertising networks such as Google Adwords are not affected, any advertising networks practicing syndication needs to carefully study this problem. Our technical report [PDF] contains more detail including an analysis based on the popularity of web sites.
Structural Properties of Malware Distribution
Finally, we also investigated the structural properties of malware distribution sites. Some malware distribution sites had as many as 21,000 regular web sites pointing to them. We also found that the majority of malware was hosted on web servers located in China. Interestingly, Chinese malware distribution sites are mostly pointed to by Chinese web servers.
We hope that an analysis such as this will help us to better understand the malware problem in the future and allow us to protect users all over the Internet from malicious web sites as best as we can. One thing is clear - we have a lot of work ahead of us.
Google encourages its employees to contribute back to the open source community, and there is no exception in Google's Security Team. Let's look at some interesting open source vulnerabilities that were located and fixed by members of Google's Security team. It is interesting to classify and aggregate the code flaws leading to the vulnerabilities, to see if any particular type of flaw is more prevalent.
TagOffset = SpGetUInt32 (&Ptr);
if (ProfileSize < TagOffset)
return SpStatBadProfileDir;
...
TagSize = SpGetUInt32 (&Ptr);
if (ProfileSize < TagOffset + TagSize)
return SpStatBadProfileDir;
...
Ptr = (KpInt32_t *) malloc ((unsigned int)numBytes+HEADER);
ush count[17], weight[17], start[18], *p;
...
for (i = 0; i < (unsigned)nchar; i++) count[bitlen[i]]++;
if (sp->cinfo.d.image_width != segment_width ||
sp->cinfo.d.image_height != segment_height) {
TIFFWarningExt(tif->tif_clientdata, module,
"Improper JPEG strip/tile size, expected %dx%d, got %dx%d",
Security testing of applications is regularly performed using fuzz testing. As previously discussed on this blog, Srinath's Lemon uses a form of smart fuzzing. Lemon is aware of classes of web application threats and the input families which trigger them, but not all fuzz testing frameworks have to be this complicated. Fuzz testing originally relied on purely random data, ignorant of specific threats and known dangerous input. Today, this approach is often overlooked in favor of more complicated techniques. Early sanity checks in applications looking for something as a simple as a version number may render testing with completely random input ineffective. However, the newer, more complicated fuzz testers require a considerable initial investment in the form of complete input format specifications or the selection of a large corpus of initial input samples.
At WOOT'07,I presented a paper on Flayer, a tool we developed internally to augment our security testing efforts. In particular, it allows for a fuzz testing technique that compromises between the original idea and the most complicated. Flayer makes it possible to remove input sanity checks at execution time. With the small investment of identifying these checks, Flayer allows for completely random testing to be performed with much higher efficacy. Already, we've uncovered multiple vulnerabilities in Internet-critical software using this approach.
The way that Flayer allows for sanity checks to be identified is perhaps the more interesting point. Flayer uses a dynamic analysis framework to analyze the target application at execution time. Flayer marks, or taints, input to the program and traces that data throughout its lifespan. Considerable research has been done in the past regarding information flow tracing using dynamic analysis. Primarily, this work has been aimed at malware and exploit detection and defense. However, none of the resulting software has been made publicly available.
While Flayer is still in its early stages, it is available for download under the GNU Public License. External contributions and feedback are encouraged!
Cross-site scripting (aka XSS) is the term used to describe a class of security vulnerabilities in web applications. An attacker can inject malicious scripts to perform unauthorized actions in the context of the victim's web session. Any web application that serves documents that include data from untrusted sources could be vulnerable to XSS if the untrusted data is not appropriately sanitized. A web application that is vulnerable to XSS can be exploited in two major ways:
Stored XSS - Commonly exploited in a web application where one user enters information that's viewed by another user. An attacker can inject malicious scripts that are executed in the context of the victim's session. The exploit is triggered when a victim visits the website at some point in the future, such as through improperly sanitized blog comments and guestbook entries, which facilitates stored XSS.
Reflected XSS - An application that echoes improperly sanitized user input received as query parameters is vulnerable to reflected XSS. With a vulnerable application, an attacker can craft a malicious URL and send it to the victim via email or any other mode of communication. When the victim visits the tampered link, the page is loaded along with the injected script that is executed in the context of the victim's session.
The general principle behind preventing XSS is the proper sanitization (via, for instance, escaping or filtering) of all untrusted data that is output by a web application. If untrusted data is output within an HTML document, the appropriate sanitization depends on the specific context in which the data is inserted into the HTML document. The context could be in the regular HTML body, tag attributes, URL attributes, URL query string attributes, style attributes, inside JavaScript, HTTP response headers, etc.
The following are some (by no means complete) examples of XSS vulnerabilities. Let's assume there is a web application that accepts user input as the 'q' parameter. Untrusted data coming from the attacker is marked in red.

Some of you might have seen this message while searching on Google, and wondered what the reason behind it might be. Instead of search results, Google displays the "We're sorry" message when we detect anomalous queries from your network. As a regular user, it is possible to answer a CAPTCHA - a reverse Turing test meant to establish that we are talking to a human user - and to continue searching. However, automated processes such as worms would have a much harder time solving the CAPTCHA. Several things can trigger the sorry message. Often it's due to infected computers or DSL routers that proxy search traffic through your network - this may be at home or even at a workplace where one or more computers might be infected. Overly aggressive SEO ranking tools may trigger this message, too. In other cases, we have seen self-propagating worms that use Google search to identify vulnerable web servers on the Internet and then exploit them. The exploited systems in turn then search Google for more vulnerable web servers and so on. This can lead to a noticeable increase in search queries and sorry is one of our mechanisms to deal with this.
At ACM WORM 2006, we published a paper on Search Worms [PDF] that takes a much closer look at this phenomenon. Santy, one of the search worms we analyzed, looks for remote-execution vulnerabilities in the popular phpBB2 web application. In addition to exhibiting worm like propagation patterns, Santy also installs a botnet client as a payload that connects the compromised web server to an IRC channel. Adversaries can then remotely control the compromised web servers and use them for DDoS attacks, spam or phishing. Over time, the adversaries have realized that even though a botnet consisting of web servers provides a lot of aggregate bandwidth, they can increase leverage by changing the content on the compromised web servers to infect visitors and in turn join the computers of compromised visitors into much larger botnets. This fundamental change from remote attack to client based download of malware formed the basis of the research presented in our first post. In retrospect, it is interesting to see how two seemingly unrelated problems are tightly connected.