2008-05-09T14:02:00.000-07:00
Written by T.V. Raman, Research Scientist
As a follow-up to my previous posts on accessibility, here are some design recommendations for creating web content that remains usable by the widest possible audience while helping ensure that the content gets indexed and crawled.
Avoid spurious XMLHttpRequests
Pages that enable users to look up information often use XMLHttpRequests to populate the page with additional information after the page has loaded. When using this pattern, ensure that your initial page has useful information on it -- otherwise Googlebot as well as those users who have disabled scripting in their browser may believe that your site contains only the message "loading..."
CSS sprites and navigation links
Having meaningful text to go with navigational links is equally important for Googlebot as well as users who cannot perceive the meaning of an image. While designing the look and feel of navigational links on your site, you may have chosen to go with images that function as links, e.g., by placing <img> tags within <a> elements. That design enables you to place the descriptive text as an alt attribute on the <img> tag.
But what if you've switched to using CSS sprites to optimize page loading? It's still possible to include that all-important descriptive text when applying CSS sprites; for a possible solution, see how the Google logo and the various nav-links at the bottom of the Google Results page are coded. In brief, we placed the descriptive text right under the CSS-sprited image.
[Figure: Google search results with CSS enabled]

Use unobtrusive JavaScript
We've talked about the concept of progressive enhancement when creating a rich, interactive site. As you add features, also use unobtrusive JavaScript techniques for creating JavaScript-powered web pages that degrade gracefully. These techniques ensure that your content remains accessible by the widest possible user base without the need to sacrifice the more interactive features of Web 2.0 applications.
Make printer-friendly versions easily available
Web sites with highly interactive visual designs often provide all of the content for a given story as a printer-friendly version. Generated from the same content as the interactive version, these are an excellent source of high-quality content for both Googlebot and visually impaired users who are unable to experience all of the interactive features of a web site. But all too often, these printer-friendly versions remain hidden behind scripted links of the form:
<a href="#" rel="nofollow" onclick="javascript:print(...)">Print</a>
Creating actual URLs for these printer-friendly versions and linking to them via plain HTML anchors will vastly improve the quality of content that gets crawled.
<a href="http://example.com/page1-printer-friendly.html" rel="nofollow" target="_blank">Print</a>
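To find scripted links like the first form on an existing page, a small audit script can help. The following is a minimal sketch using Python's standard html.parser module; the names LinkAudit and audit_links are illustrative assumptions, not part of any Google tool, and the rule for what counts as a "script-only" link is a simplification.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Sort <a href> values into script-only links and plain URLs."""

    def __init__(self):
        super().__init__()
        self.script_only = []  # href is empty, "#", or a javascript: pseudo-URL
        self.plain = []        # href looks like a real URL

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" not in attributes:
            return  # a named anchor, not a hyperlink
        href = (attributes["href"] or "").strip()
        if href in ("", "#") or href.lower().startswith("javascript:"):
            self.script_only.append(href)
        else:
            self.plain.append(href)


def audit_links(html):
    """Return (script_only, plain) lists of href values found in html."""
    audit = LinkAudit()
    audit.feed(html)
    audit.close()
    return audit.script_only, audit.plain
```

Any href reported in the first list is a candidate for replacement with a real URL that a crawler or a user without scripting can follow.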
If you're especially worried about duplicate content from the interactive and printer-friendly version, then you may want to pick a preferred version of the content and submit a Sitemap containing the preferred URL as well as try to internally link to this version. This can help Google disambiguate if we see pieces of the article show up on different URLs.
Create URLs for your useful content
As a webmaster, you have the power to mint URLs for all of the useful content that you are publishing. Exercising this power is what makes the web spin. Creating URLs for every valuable nugget you publish, and linking to them via plain old HTML hyperlinks will ensure that:
- Googlebot learns about that content,
- users can find that content,
- and users can bookmark it for returning later.
Failure to do this often forces your users to have to remember complex click trails to reach that nugget of information they know they previously viewed on your site.
2008-05-06T12:04:00.000-07:00
Rajat Mukherjee, Group Product Manager, Search
If you're a webmaster or site owner, you realize the importance of providing high quality search on your site so that users easily find the right information.
We just announced today that AdSense for Search is now powered by Custom Search. Custom Search (a Google-powered search box that you can install on your website in minutes) helps your users quickly find what they're looking for. As a webmaster, Custom Search gives you advanced customization options to improve the accuracy of your site's search results. You can also choose to monetize your traffic with ads tuned to the topic of your site. If you don't want ads, you can use Custom Search Business Edition.
Now, we're also looking to index more of your site's content for inclusion in your Custom Search Engine (CSE) used for search on your site. We figure out what sites and URLs are included in your CSE, and -- if you've provided Sitemaps for the relevant sites -- we use that information to create a more comprehensive experience for your site's visitors. You don't have to do anything specific, besides submitting a Sitemap (via Webmaster Tools) for your site if you haven't already done so. Note that this change will not result in more pages indexed on Google.com and your search rankings on Google.com won't change. However, you will be able to get much better results coverage in your CSE.
Custom Search is built on top of the Google index. This means that all pages that are available on Google.com are also available to your search engine. We're now maintaining a CSE-specific index in addition to the Google.com index for enhancing the performance of search on your site. If you submit a Sitemap, it's likely that we will crawl those pages and include them in the additional index we build.
In order for us to index these additional pages, our crawlers must be able to crawl them. Your Sitemap will also help us identify the URLs that are important. Please ensure you are not blocking us from crawling any pages you want indexed. Improved index coverage is not instantaneous, as it takes some time for the pages to be crawled and indexed.
So what are you waiting for? Submit your Sitemap!
2008-04-23T12:11:00.000-07:00
Posted by Ríona MacNamara, Webmaster Tools Team
The Set Geographic Target tool in Webmaster Tools lets you associate your site with a specific region. We've heard a lot of questions from webmasters about how to use the tool, and here Webmaster Trends Analyst Susan Moskwa explains how it works and when to use it.

Want to know more about setting a geographic target for your site? Check out our Help Center. And if you like this video, you can see more on our Webmaster Tools playlist on YouTube.
2008-04-23T10:20:00.000-07:00
By John Mueller, Webmaster Trends Analyst, Google Zürich
When we originally launched Sitemaps, we included support for the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) 2.0 protocol, an interoperability framework based on metadata harvesting. In the meantime, however, we've found that the information we gain from our support of OAI-PMH is disproportional to the amount of resources required to support it. Fewer than 200 sites are using OAI-PMH for Google Sitemaps at the moment.
In order to move forward with even better coverage of your websites, we have decided to support only the standard XML Sitemap format by May 2008. We are in the process of notifying sites using OAI-PMH to alert them of the change.
If you have been using OAI-PMH as a Google Sitemap feed, we would love to see you adopt the industry standard XML Sitemap format. This format is supported by all of the major search engines and helps to make sure that everyone is able to find your new and updated content as soon as you make it available.
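For readers who have not produced an XML Sitemap before, here is a minimal sketch of generating one with Python's standard library. The function name build_sitemap and the example URLs are assumptions for illustration; only the required urlset, url, and loc elements of the sitemaps.org 0.9 schema are emitted, and optional elements such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return an XML Sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for location in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL text.
        ET.SubElement(url, "loc").text = location
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        "http://www.example.com/",
        "http://www.example.com/product-catalog.html",
    ]))
```

The resulting file would then be saved on the site and submitted through Webmaster Tools.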
If you have any questions regarding the move to XML Sitemap files, feel free to post in our Google discussion group for Sitemaps.
2008-04-16T13:55:00.000-07:00
Posted by Ríona MacNamara, Webmaster Tools Team
Planning on moving your site to a new domain? Lots of webmasters find this a scary process. How do you do it without hurting your site's performance in Google search results?
Your aim is to make the transition invisible and seamless to the user, and to make sure that Google knows that your new pages should get the same quality signals as the pages on your old site. When you're moving your site, pesky 404 (File Not Found) errors can harm the user experience and negatively impact your site's performance in Google search results.
Let's cover moving your site to a new domain (for instance, changing from www.example.com to www.example.org). This is different from moving to a new IP address; read this post for more information on that.
Here are the main points:
- Test the move process by moving the contents of one directory or subdomain first. Then use a 301 Redirect to permanently redirect those pages on your old site to your new site. This tells Google and other search engines that your site has permanently moved.
- Once this is complete, check to see that the pages on your new site are appearing in Google's search results. When you're satisfied that the move is working correctly, you can move your entire site.
- Don't do a blanket redirect directing all traffic from your old site to your new home page. This will avoid 404 errors, but it's not a good user experience. A page-to-page redirect (where each page on the old site gets redirected to the corresponding page on the new site) is more work, but gives your users a consistent and transparent experience. If there won't be a 1:1 match between pages on your old and new site, try to make sure that every page on your old site is at least redirected to a new page with similar content.
- If you're changing your domain because of site rebranding or redesign, you might want to think about doing this in two phases: first, move your site; and second, launch your redesign. This manages the amount of change your users see at any stage in the process, and can make the process seem smoother. Keeping the variables to a minimum also makes it easier to troubleshoot unexpected behavior.
- Check both external and internal links to pages on your site. Ideally, you should contact the webmaster of each site that links to yours and ask them to update the links to point to the page on your new domain. If this isn't practical, make sure that all pages with incoming links are redirected to your new site. You should also check internal links within your old site, and update them to point to your new domain. Once your content is in place on your new server, use a link checker like Xenu to make sure you don't have broken legacy links on your site. This is especially important if your original content included absolute links (like www.example.com/cooking/recipes/chocolatecake.html) instead of relative links (like .../recipes/chocolatecake.html).
- To prevent confusion, it's best to make sure you retain control of your old site domain for at least 180 days.
- Finally, keep both your new and old site verified in Webmaster Tools, and review crawl errors regularly to make sure that the 301s from the old site are working properly, and that the new site isn't showing unwanted 404 errors.
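The page-to-page redirect described in the list above can be sketched as a small mapping function. This is only an illustration of how the target of each 301 might be computed, assuming the new site keeps the same paths as the old one; the names OLD_HOST, NEW_HOST, and redirect_target are hypothetical, and the redirect itself would still be configured in your web server.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "www.example.com"  # domain being retired (example value)
NEW_HOST = "www.example.org"  # domain being moved to (example value)


def redirect_target(old_url):
    """Return the new-domain URL for a page on the old domain.

    The path, query string, and fragment are preserved so that each old
    page maps to its counterpart rather than to the new home page.
    Returns None for URLs that are not on the old domain.
    """
    parts = urlsplit(old_url)
    if parts.netloc.lower() != OLD_HOST:
        return None
    return urlunsplit(
        (parts.scheme, NEW_HOST, parts.path, parts.query, parts.fragment)
    )
```

Where old and new pages do not match 1:1, a lookup table from old paths to the closest new pages would replace the simple host swap.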
We'll admit it, moving is never easy - but these steps should help ensure that none of your good web reputation falls off the truck in the process.
2008-04-14T10:47:00.000-07:00
Written by T.V. Raman, Research Scientist
[Photo: Hubbell and I enjoying the day at our home in California.]
Please feel free to view my earlier post about accessibility for webmasters, as well as additional articles I've written for the Official Google blog.
One of the most frequently asked questions about Accessible Search is: What can I do to make my site rank well on Accessible Search? At the same time, webmasters often ask a similar but broader question: What can I do to rank high on Google Search?
Well, I'm pleased to tell you that you can kill two birds with one stone: critical site features such as site navigation can be created to work for all users, including our own Googlebot. Below are a few tips for you to consider.
Ensure that all critical content is reachable
To access content, it needs to be reachable. Users and web crawlers reach content by navigating through hyperlinks, so as a critical first step, ensure that all content on your site is reachable via plain HTML hyperlinks, and avoid hiding critical portions of your site behind technologies such as JavaScript or Flash.
Plain hyperlinks are hyperlinks created via an HTML anchor element <a>. Next, ensure that the targets of all hyperlinks (i.e., <a> elements) are real URLs, rather than using an empty hyperlink while deferring hyperlink behavior to an onclick handler.
In short, avoid hyperlinks of the form:
<a href="#" rel="nofollow" onclick="javascript:void(...)">Product Catalog</a>
Instead, prefer simpler links, such as:
<a href="http://www.example.com/product-catalog.html" rel="nofollow">Product Catalog</a>
Ensure that content is readable
To be useful, content needs to be readable by everyone. Ensure that all important content on your site is present within the text of HTML documents. Content needs to be available without needing to evaluate scripts on a page. Content hidden behind Flash animations or text generated within the browser by executable JavaScript remains opaque to the Googlebot, as well as to most blind users.
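One rough way to check what a page offers when scripts are not evaluated is to extract the text that is present in the HTML itself. The sketch below uses Python's standard html.parser module; the names VisibleText and text_without_scripts are illustrative assumptions, and this is a simplification, not a description of how Googlebot or a screen reader processes pages.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than zero while inside script/style
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def text_without_scripts(html):
    """Return the text present in the HTML without running any scripts."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

If the result for an important page is empty or only a placeholder, the content most likely depends on script execution and should also be placed in the HTML.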
Ensure that content is available in reading order
Having discovered and arrived at your readable content, a user needs to be able to follow the content you've put together in its logical reading order. If you are using a complex, multi-column layout for most of the content on your site, you might wish to step back and analyze how you are achieving the desired effect. For example, using deeply-nested HTML tables makes it difficult to link together related pieces of text in a logical manner.
The same effect can often be achieved using CSS and logically organized <div> elements in HTML. As an added bonus, you will find that your site renders much faster as a result.
Supplement all visual content--don't be afraid of redundancy!
Making information accessible to all does not mean that you need to 'dumb down' your site to simple text. Making your content maximally redundant is critical in ensuring that your content is maximally useful to everyone. Here are a few simple tips:
- Ensure that content communicated via images is available when those images are missing. This goes further than adding appropriate alt attributes to relevant images. Ensure that the text surrounding the image does an adequate job of setting the context for why the image is being used, as well as detailing the conclusions you expect a person seeing the image to draw. In short, if you want to make sure everyone knows it's a picture of a bridge, wrap that text around the image.
- Add relevant summaries and captions to tables so that the reader can gain a high-level appreciation for the information being conveyed before delving into the details contained within.
- Accompany visual animations such as data displays with a detailed textual summary.
Following these simple tips greatly increases the quality of your landing pages for everyone. As a positive side-effect, you'll most likely discover that your site gets better indexed!
2008-04-11T10:50:00.000-07:00
Written by Jayant Madhavan and Alon Halevy, Crawling and Indexing Team
Google is constantly trying new ideas to improve our coverage of the web. We already do some pretty smart things like scanning JavaScript and Flash to discover links to new web pages, and today, we would like to talk about another new technology we've started experimenting with recently.
In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.
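To make the idea of generating URLs from form inputs concrete, here is a purely illustrative sketch. It is not Google's implementation: the function name candidate_urls, the limit parameter, and the sample form fields are all assumptions, and it simply enumerates a small number of value combinations for a GET form and encodes each one as a query string.

```python
from itertools import islice, product
from urllib.parse import urlencode


def candidate_urls(action, fields, limit=5):
    """Build at most `limit` GET URLs for a form.

    action -- the form's action URL
    fields -- dict mapping each input name to a list of candidate values
              (for example the options of a select menu, or a few words
              chosen for a text box)
    """
    names = sorted(fields)  # fixed order keeps the output deterministic
    combinations = product(*(fields[name] for name in names))
    return [
        action + "?" + urlencode(list(zip(names, combination)))
        for combination in islice(combinations, limit)
    ]
```

For a form with a text box q and a select menu category, calling candidate_urls("http://www.example.com/search", {"q": ["cake", "bread"], "category": ["recipes"]}) yields one URL per combination, up to the limit.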
Needless to say, this experiment follows good Internet citizenry practices. Only a small number of particularly useful sites receive this treatment, and our crawl agent, the ever-friendly Googlebot, always adheres to robots.txt, nofollow, and noindex directives. That means that if a search form is forbidden in robots.txt, we won't crawl any of the URLs that a form would generate. Similarly, we only retrieve GET forms and avoid forms that require any kind of user information. For example, we omit any forms that have a password input or that use terms commonly associated with personal information such as logins, userids, contacts, etc. We are also mindful of the impact we can have on web sites and limit ourselves to a very small number of fetches for a given site.
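Webmasters who want to confirm that a search form's URLs are excluded can test their robots.txt rules locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the sample rules, the helper name is_allowed, and the example URLs are assumptions for illustration, and the standard-library parser may not match Googlebot's own matching behavior in every detail.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that forbids crawling of search result URLs.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/search?q=cake",
        "http://www.example.com/recipes/chocolatecake.html",
    ):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", url))
```

With the sample rules, URLs under /search are reported as disallowed while other pages remain crawlable.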
The web pages we discover in our enhanced crawl do not come at the expense of regular web pages that are already part of the crawl, so this change doesn't reduce PageRank for your other pages. As such it should only increase the exposure of your site in Google. This change also does not affect the crawling, ranking, or selection of other web pages in any significant way.
This experiment is part of Google's broader effort to increase its coverage of the web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines. The terms Deep Web, Hidden Web, or Invisible Web have been used collectively to refer to such content that has so far been invisible to search engine users. By crawling using HTML forms (and abiding by robots.txt), we are able to lead search engine users to documents that would otherwise not be easily found in search engines, and provide webmasters and users alike with a better and more comprehensive search experience.