If you run a large database-driven website, you may find that the major search engines index only a small portion of your content and leave the less popular pages behind. I had this problem with one of my websites, HomeMusician.net, which has over 15,000 pages: Google had indexed about 1,000 of them and never reached some of the less popular sections of the site.

Introducing Google Sitemaps

In June 2005, Google introduced Sitemaps 0.84 so web developers could publish lists of links from across their sites. About a year and a half later, Google, MSN and Yahoo announced joint support for the Sitemaps Protocol, and the schema version was changed to 0.90. In April 2007, Ask.com and IBM also announced support for Sitemaps, and Google, Yahoo and MSN announced auto-discovery of Sitemaps through robots.txt.

The Sitemaps Protocol allows a webmaster to inform search engines about URLs on a website that are available for crawling. Basically, a Sitemap is an XML file that lists the URLs for a site. In addition to listing URLs for a site, the webmaster can include additional information about each URL such as: when it was last updated, how often it changes, and how important it is in relation to other URLs in the site.

Here’s a sample Sitemap file:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.thewebmasterscafe.net/</loc>
    <lastmod>2007-09-03T13:15:53+00:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>http://www.thewebmasterscafe.net/news/submit-link.html</loc>
    <lastmod>2007-09-03T09:15:53+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.1</priority>
  </url>
  <url>
    <loc>http://www.thewebmasterscafe.net/publicize/google-adwords-for-dummies-pt1.html</loc>
    <lastmod>2007-09-02T21:46:05+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.1</priority>
  </url>
</urlset>

Save this file at the root of your website, either as a plain .xml file or compressed with gzip. Your Sitemap should then be accessible at http://www.yourdomain.com/sitemap.xml or http://www.yourdomain.com/sitemap.xml.gz.
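To make the per-URL fields a bit more concrete, here is a minimal Python sketch that parses a Sitemap and flags entries that look wrong. The function name check_sitemap is my own, and the rules it applies (an absolute <loc>, a <changefreq> from the protocol's list of values, a <priority> between 0.0 and 1.0) are only a small subset of what a real validator checks, so treat it as a quick sanity check rather than a full validation.

```python
import xml.etree.ElementTree as ET

# <changefreq> values allowed by the Sitemaps protocol.
CHANGEFREQS = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def _local(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def check_sitemap(xml_text):
    """Return a list of problems found in a Sitemap (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if _local(root.tag) != "urlset":
        return ["root element is not <urlset>"]

    problems = []
    for i, url in enumerate(root, start=1):
        # Collect the child elements of this <url> entry by local name.
        fields = {_local(child.tag): (child.text or "").strip() for child in url}

        if not fields.get("loc", "").startswith(("http://", "https://")):
            problems.append(f"entry {i}: missing or non-absolute <loc>")

        if "changefreq" in fields and fields["changefreq"] not in CHANGEFREQS:
            problems.append(f"entry {i}: unknown <changefreq> {fields['changefreq']!r}")

        if "priority" in fields:
            try:
                in_range = 0.0 <= float(fields["priority"]) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append(f"entry {i}: <priority> must be between 0.0 and 1.0")
    return problems
```

Calling check_sitemap() on the contents of your sitemap.xml returns an empty list when none of these checks fail. It works on the local tag names only, so it does not care whether the file uses the 0.84 or the 0.90 namespace.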

Generating Your Sitemap

Some web developers write their own Sitemap generator, but there are utilities available for this task. You can get Google’s Sitemap Generator from Google Webmaster Tools. The script is written in Python, so you should be able to run it on most platforms. I liked using this script because it can submit the Sitemap automatically upon generation. Be careful when using this feature, because submitting your Sitemap too often can be considered abuse. The “--testing” switch prevents the script from submitting the Sitemap automatically.

If you are generating content from a WordPress blog, you can use Google Sitemap Generator for WordPress (it is not a Google product). Just make sure you read the readme.txt file carefully. I’ve installed version 3.0b9 on a WordPress 2.2 blog and it worked fine.

There are plenty of other tools available; a quick search on Google will turn them up.
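If you would rather roll your own generator, the core of it is small. Here is a rough Python sketch, not the Google script: the function names build_sitemap and write_sitemap and the shape of the page records (dicts with a "loc" key and optional "lastmod", "changefreq" and "priority" keys) are assumptions of mine, and in a real site the records would come from your database. It uses the sitemaps.org 0.9 namespace and needs Python 3.8 or later for the xml_declaration argument.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace of the 0.90 protocol as published on sitemaps.org.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build Sitemap XML (as bytes) from a list of page dicts (hypothetical shape)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # Emit the fields in the usual order, skipping the optional ones that are absent.
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    # ElementTree escapes characters such as '&' in URLs for us.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


def write_sitemap(pages, path):
    """Write the Sitemap to 'path', gzip-compressed if the name ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wb") as fh:
        fh.write(build_sitemap(pages))
```

For example, write_sitemap(pages, "sitemap.xml.gz") would produce the compressed variant, which you would then place at the root of your site.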

Submitting Your Sitemap

Before you go ahead and submit your Sitemap, you might want to validate its content. Many validation tools are available, such as Free Google Sitemap Validator and Submitter. Once you have validated your Sitemap, you can submit it directly to the search engines.

Alternatively, you can add a “Sitemap” directive to your robots.txt file, giving the full URL of your Sitemap:

Sitemap: http://www.yourdomain.com/sitemap.xml
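If your robots.txt is generated or maintained by a script, the directive can be handled programmatically. The Python sketch below is only an illustration: the helper names sitemap_urls and add_sitemap_line are mine, and it assumes the directive carries the full Sitemap URL, which is what the protocol documentation on sitemaps.org asks for.

```python
def sitemap_urls(robots_txt):
    """Return the URLs named by 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def add_sitemap_line(robots_txt, sitemap_url):
    """Append a Sitemap directive unless that URL is already listed."""
    if sitemap_url in sitemap_urls(robots_txt):
        return robots_txt
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + f"Sitemap: {sitemap_url}\n"
```

Because add_sitemap_line() checks the existing lines first, running it again on the same file leaves the file unchanged.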

Where can I find more information on the Sitemaps Protocol?

More technical information can be found over at http://www.sitemaps.org/


Posted by Stephane on Monday, September 3rd, 2007