Working with sitemaps for Google and Bing

Use tools such as sitemaps to help search engines index the right sections of your sites

There’s a war going on out there, one between the folk who run the big search engines and the folk who try to get sites as high up the rankings as possible. It’s one that the search engines are always going to win, as they run the robots that spider the web and they manage the indexes that feed results to our search queries. Each time the SEO gurus figure out a wrinkle in the algorithms at Google and Bing, the next big update changes the rules to ensure that the most relevant results get delivered to end users. So if SEO doesn’t get your site to the top of that results page, how can you get the eyeballs you need? The answer’s simple: follow the rules, and take advantage of the analytics tools that Google and Bing provide. They offer powerful insight into the way users search for your site.

There’s also the problem of how to get into the indexes in the first place, along with ensuring that the right pages get indexed. That’s where the sitemap standard comes in, a way of generating a guide to your site that search engine robots can find and use to index your content – and not just your HTML, but your video, images, news, and location. Throughout this tutorial we’ve included snippets of the code for each step, but the full code can be found within the project files.

You can download the cover CD project files by clicking here.

This article originally appeared in Web Designer issue 180.

01 Sitemaps

Sitemaps used to be graphical tools for helping users get around sites. Now they’re much more complex creatures, XML documents designed to inform the robots that feed search engines just how to index a site – and to give them additional information about the content we’re delivering. A basic sitemap looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mysite.com/</loc>
    <lastmod>2011-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
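
Writing sitemaps by hand gets tedious beyond a handful of pages. The following is a minimal sketch that generates the basic sitemap above from a list of dicts. It assumes Python 3 and only its standard-library ElementTree module. The build_sitemap name and the dict layout are hypothetical and do not come from any sitemap tool. Using an XML library also means characters such as & in URLs are escaped for you.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialise the sitemap namespace as the default (unprefixed) namespace.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(entries):
    """Build a <urlset> document from a list of dicts (hypothetical layout).

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional and are only written when present.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in entry:
                child = ET.SubElement(url, f"{{{SITEMAP_NS}}}{field}")
                child.text = str(entry[field])  # ElementTree escapes &, < and >
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([{
        "loc": "http://www.mysite.com/",
        "lastmod": "2011-01-01",
        "changefreq": "monthly",
        "priority": "0.5",
    }]))
```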

02 Defining the URL (1)

Each URL you want to include in a sitemap is defined with its own set of XML tags. The first, <loc>, is the location of the page or site section to be indexed, and it’s required. It’s a good idea to include the <lastmod> field too, as this shows when a page was last changed. You’ll need to use the W3C Datetime format for this. That means you can give either a full timestamp with a time zone offset or just the date (YYYY-MM-DD).

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mysite.com/page1.html</loc>
    <lastmod>2011-01-01T19:20:30+01:00</lastmod>
  </url>
  <url>
    <loc>http://www.mysite.com/page2.html</loc>
    <lastmod>2011-01-02</lastmod>
  </url>
  <url>
    <loc>http://www.mysite.com/page3.html</loc>
  </url>
</urlset>
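
Your CMS may store modification times as Python date or datetime objects. If so, a small helper can turn them into <lastmod> values. This is a sketch, and w3c_datetime is a hypothetical name. It rejects datetimes without a time zone, because the W3C Datetime format requires an offset whenever a time is included.

```python
from datetime import date, datetime, timedelta, timezone


def w3c_datetime(value):
    """Format a date or datetime as a W3C Datetime string for <lastmod>."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        if value.utcoffset() is None:
            raise ValueError("a time needs a time zone offset in W3C Datetime")
        # Drop microseconds to keep the YYYY-MM-DDThh:mm:ss+hh:mm form.
        return value.replace(microsecond=0).isoformat()
    if isinstance(value, date):
        return value.isoformat()  # YYYY-MM-DD
    raise TypeError("expected a date or datetime")


print(w3c_datetime(date(2011, 1, 2)))  # 2011-01-02
print(w3c_datetime(datetime(2011, 1, 1, 19, 20, 30,
                            tzinfo=timezone(timedelta(hours=1)))))
# 2011-01-01T19:20:30+01:00
```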

03 Defining the URL (2)

Search engines like Google have a lot of processing power, but even Google doesn’t have enough to index every change to the web. Sitemaps can help search engines prioritise their crawlers. All you need to do is use the <changefreq> tag to show how often a page is likely to change. Crawlers treat it as a hint rather than a command. If a page changes every time it’s accessed, as a dynamic page might, use “always”.

Valid values for <changefreq>:
always
hourly
daily
weekly
monthly
yearly
never
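
If you generate sitemaps from a database, it’s worth checking <changefreq> values against the seven allowed ones before publishing. The following sketch does that.

The second function, suggest_changefreq, is a purely hypothetical heuristic of our own and not part of the sitemap standard. It maps the typical gap between edits to a value. It never suggests “always” or “never”, because an edit interval alone can’t tell you either of those.

```python
from datetime import timedelta

VALID_CHANGEFREQ = ("always", "hourly", "daily", "weekly", "monthly", "yearly", "never")


def check_changefreq(value):
    """Return value in the lower-case form the schema expects, or raise ValueError."""
    normalised = value.strip().lower()
    if normalised not in VALID_CHANGEFREQ:
        raise ValueError(f"invalid <changefreq> value: {value!r}")
    return normalised


def suggest_changefreq(typical_gap):
    """Hypothetical heuristic: map the typical time between edits to a value."""
    if typical_gap <= timedelta(hours=1):
        return "hourly"
    if typical_gap <= timedelta(days=1):
        return "daily"
    if typical_gap <= timedelta(weeks=1):
        return "weekly"
    if typical_gap <= timedelta(days=31):
        return "monthly"
    return "yearly"
```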

04 Defining the URL (3)

One of the most important features of a sitemap is the ability to show search engine indexers which pages on your site you think are most important. However, it doesn’t affect how the search engine ranks your site against others. You’ll need to use the <priority> tag, which takes values from 0.0 to 1.0; the default priority for every page is 0.5. Priority only compares pages within your own site, so there’s no point giving all your pages high priorities: it won’t help any of them stand out.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mysite.com/page1.html</loc>
    <lastmod>2011-01-01T19:20:30+01:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.mysite.com/page2.html</loc>
    <lastmod>2011-01-02</lastmod>
  </url>
</urlset>
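
Because priority only matters relative to your other pages, one option is to derive it from site structure. The following sketch does that, with two functions:

- format_priority enforces the 0.0 to 1.0 range.
- priority_by_depth is a hypothetical heuristic of our own. It gives the home page 1.0 and takes 0.2 off for each path level, never going below 0.1. Treat the numbers as an assumption to tune, not a recommendation from Google or Bing.

```python
from urllib.parse import urlsplit


def format_priority(value):
    """Format a priority for <priority>, enforcing the 0.0-1.0 range."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0, got {value}")
    return f"{value:.1f}"  # one decimal place, e.g. "0.8"


def priority_by_depth(url):
    """Hypothetical heuristic: 1.0 for the home page, 0.2 less per path level, minimum 0.1."""
    depth = len([part for part in urlsplit(url).path.split("/") if part])
    return format_priority(max(0.1, 1.0 - 0.2 * depth))


print(priority_by_depth("http://www.mysite.com/"))            # 1.0
print(priority_by_depth("http://www.mysite.com/page1.html"))  # 0.8
```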

05 Using sitemap indexes

Your site may have more than one sitemap. Perhaps you have handcrafted one for the high-level pages in your site, and are automatically generating another for dynamic content from a CMS or a blog. If so, you will want to use a sitemap index file to bring them all together. A sitemap index is another XML document, one that lists the locations of sitemaps on the same site. You’ll also need one if a single sitemap would exceed the protocol’s limit of 50,000 URLs.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.mysite.com/sitemap1.xml</loc>
    <lastmod>2011-01-01T19:20:30+01:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.mysite.com/sitemap2.xml</loc>
    <lastmod>2011-01-01</lastmod>
  </sitemap>
</sitemapindex>
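
A large site has to split its URLs across several sitemap files, because each file is limited to 50,000 URLs. A sitemap index then lists those files. This is a sketch under stated assumptions: split_urls and build_sitemap_index are hypothetical helpers, and the sitemap1.xml, sitemap2.xml naming simply follows the example above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def split_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks small enough for one sitemap each."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locs, lastmod=None):
    """Build a <sitemapindex> that lists the given sitemap URLs."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_locs:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Usage: 120,000 page URLs need three sitemap files plus one index.
pages = [f"http://www.mysite.com/page{i}.html" for i in range(120_000)]
chunks = split_urls(pages)
index_xml = build_sitemap_index(
    [f"http://www.mysite.com/sitemap{n}.xml" for n in range(1, len(chunks) + 1)],
    lastmod="2011-01-01",
)
```

Each chunk would then be written out as its own sitemap, for example with a generator like the one sketched for the basic sitemap.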

06 Adding video to a sitemap

While the sitemap standard is relatively straightforward, it only describes web pages. Web content isn’t just HTML any more: it’s video and audio and, well, just about anything. That’s why Google has come up with a set of extensions to the sitemap format. Its video extensions add a new namespace and new tags to a sitemap’s XML.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.mysite.com/videos/videopage.html</loc>
    <video:video>
      <video:thumbnail_loc>http://www.mysite.com/img/thumb.jpg</video:thumbnail_loc>
      <video:title>A Title Goes Here</video:title>
      <video:description>Along with a description</video:description>
      <video:content_loc>http://www.mysite.com/video.flv</video:content_loc>
      <video:player_loc allow_embed="yes" autoplay="ap=1">
        http://www.mysite.com/videoplayer.
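
Generating the video entry programmatically avoids mismatched opening and closing tags, and ElementTree declares the video namespace on the root element for you. In this sketch, add_video_page is a hypothetical helper whose keyword arguments become video: child tags in the order given. The attributes on <video:player_loc> are left out to keep it short.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"
# Register prefixes so the output uses xmlns="..." and the video: prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("video", VIDEO_NS)


def add_video_page(urlset, page_loc, **video_fields):
    """Append a <url> with one <video:video> block; keyword names become video: tags."""
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_loc
    video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
    for tag, text in video_fields.items():
        ET.SubElement(video, f"{{{VIDEO_NS}}}{tag}").text = text
    return video


urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
add_video_page(
    urlset,
    "http://www.mysite.com/videos/videopage.html",
    thumbnail_loc="http://www.mysite.com/img/thumb.jpg",
    title="A Title Goes Here",
    description="Along with a description",
    content_loc="http://www.mysite.com/video.flv",
)
print(ET.tostring(urlset, encoding="unicode"))
```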

07 Additional video information

Google’s search engine is a hungry beast, and thrives on metadata. It’s no surprise, then, to find that its video extensions to the sitemap format have a lot of optional tags. Among many others, they include tags indicating whether a video is family friendly, showing the price of a video that’s for sale, and linking to a gallery of videos.

<video:duration>60</video:duration>
<video:expiration_date>2011-12-31</video:expiration_date>
<video:rating>2.0</video:rating>
<video:view_count>9999</video:view_count>
<video:publication_date>2010-12-31</video:publication_date>
<video:tag>example</video:tag>
<video:tag>sample</video:tag>
<video:category>tutorials</video:category>
<video:family_friendly>yes</video:family_friendly>
<video:restriction relationship="allow">IE GB US CA</video:restriction>
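
Several of these optional tags only accept a limited range of values, so a quick pre-publication check can catch typos. This sketch assumes the constraints as we understand Google’s documentation (check the current video sitemap reference before relying on them):

- duration of 1 to 28,800 seconds
- rating from 0.0 to 5.0
- “yes” or “no” for family_friendly
- “allow” or “deny” for the restriction relationship
- space-separated two-letter country codes

The check_video_metadata function is hypothetical.

```python
import re


def check_video_metadata(duration, rating, family_friendly, countries, relationship):
    """Return a list of problems found in a few optional video fields (empty if none)."""
    problems = []
    if not 1 <= duration <= 28800:  # assumed limit: up to eight hours
        problems.append("duration must be between 1 and 28800 seconds")
    if not 0.0 <= rating <= 5.0:
        problems.append("rating must be between 0.0 and 5.0")
    if family_friendly not in ("yes", "no"):
        problems.append("family_friendly must be 'yes' or 'no'")
    if relationship not in ("allow", "deny"):
        problems.append("restriction relationship must be 'allow' or 'deny'")
    # <video:restriction> holds a space-separated list of two-letter country codes.
    for code in countries.split():
        if not re.fullmatch(r"[A-Z]{2}", code):
            problems.append(f"not a two-letter country code: {code!r}")
    return problems


print(check_video_metadata(60, 2.0, "yes", "IE GB US CA", "allow"))  # []
```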

08 Mobile sitemaps

With smartphones now accounting for a significant percentage of web traffic, mobile versions of sites have become more and more important, along with mobile search services. If you’ve got a mobile site, it’s a good idea to also have a mobile sitemap listing all your mobile-optimised pages. Just add the <mobile:mobile/> tag to each mobile URL.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0">
  <url>
    <loc>http://mobile.mysite.com/page1.html</loc>
    <mobile:mobile/>
  </url>
</urlset>
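
The mobile marker is an empty element, which is easy to produce with ElementTree: an element with no text and no children is written as an empty tag. This is a sketch, and build_mobile_sitemap is a hypothetical helper. ElementTree writes the tag as <mobile:mobile />; the extra space is insignificant in XML.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MOBILE_NS = "http://www.google.com/schemas/sitemap-mobile/1.0"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("mobile", MOBILE_NS)


def build_mobile_sitemap(locs):
    """Build a urlset in which every <url> carries the empty <mobile:mobile/> marker."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in locs:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{MOBILE_NS}}}mobile")  # no text, so it serialises as an empty tag
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_mobile_sitemap(["http://mobile.mysite.com/page1.html"]))
```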

09 News sitemaps

If yours is a news site, you want (and need) to be indexed by Google News. It’s a major traffic driver, and can make or break a site that’s trying to ramp up the hits. That’s why there’s a specialised sitemap format for the Google News crawlers. It won’t get you up the rankings (that’s down to the quality of your content, as always), but it will mean that your news is indexed as quickly as possible. News sitemaps should only list articles published in the last two days.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.mysite.com/news/article1.html</loc>
    <news:news>
      <news:publication>
        <news:name>Web Designer</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:access>Registration</news:access>
      <news:genres>Blog</news:genres>
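
Because a news sitemap should only list the last two days of articles, whatever generates it needs to filter by publication time. This is a minimal sketch. It assumes, hypothetically, that each article is a dict with a "loc" URL and a timezone-aware "published" datetime. It also drops anything dated in the future.

```python
from datetime import datetime, timedelta, timezone

NEWS_WINDOW = timedelta(days=2)  # news sitemaps should only list the last two days


def recent_articles(articles, now=None):
    """Keep articles published in the last two days, newest first."""
    now = now or datetime.now(timezone.utc)
    recent = [a for a in articles
              if timedelta(0) <= now - a["published"] <= NEWS_WINDOW]
    return sorted(recent, key=lambda a: a["published"], reverse=True)
```

Regenerating the news sitemap on a schedule (or whenever a story is published) keeps older articles dropping out automatically.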

10 Fine-tuning a news sitemap

Google’s thought hard about what needs to go into a news sitemap. There are fields for the language of the publication, for whether a site is subscription-based (or just needs user registration), and even for whether you’re delivering user-generated content. Keywords and other metadata are supported, to help Google index your news stories. Here’s the start of a sitemap for a subscription site, with no genres.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.mysite.com/news/article1.html</loc>
    <news:news>
      <news:publication>
        <news:name>Web Designer Special Bulletin</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:access>Subscription</news:access>

11 Genres in news sitemaps

Classifying news and news-like stories can be difficult. How do you tell opinion, press releases and satire apart from straight reporting? Sites like The Onion or The Daily Mash use familiar news formats, making it hard for software to identify what’s really news. That’s where the genres tag in a news sitemap comes in, helping identify the different types of non-news content that might be found on a news site.

Use these values in <news:genres> where necessary:
PressRelease: for pages using content directly from press releases
Satire: for pages containing humorous and satirical content
Blog: for articles published in blog format
OpEd: for opinion content from the OpEd section of a news site
Opinion: for any other opinion-based content
UserGenerated: for content from users that’s been through an editorial process
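
An article can carry more than one genre. As we understand Google’s format, <news:genres> takes a comma-separated list, so the values need validating and joining. This sketch uses a hypothetical format_genres helper that checks names against the six values above and drops duplicates. It returns None when there are no genres, meaning the tag should simply be left out.

```python
GENRES = ("PressRelease", "Satire", "Blog", "OpEd", "Opinion", "UserGenerated")


def format_genres(genres):
    """Validate genre names and join them for <news:genres>; None means omit the tag."""
    unique = list(dict.fromkeys(genres))  # drop duplicates, keep first-seen order
    unknown = [g for g in unique if g not in GENRES]
    if unknown:
        raise ValueError(f"unknown genre(s): {', '.join(unknown)}")
    return ", ".join(unique) if unique else None


print(format_genres(["Blog", "Opinion", "Blog"]))  # Blog, Opinion
```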

12 Sitemaps for code

If you’re trying to track down a code snippet to help solve a problem with a page you’re working on, it’s always annoying to find nothing but adverts and irrelevant content turning up in your search results. If you’re sharing code with the world, you might want to try using a code search sitemap to add useful metadata to search results.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:codesearch="http://www.google.com/codesearch/schemas/sitemap/1.0">
  <url>
    <loc>http://www.mysite.com/codesamples/snippet1.js</loc>
    <codesearch:codesearch>
      <codesearch:filetype>javascript</codesearch:filetype>
      <codesearch:license>mozilla</codesearch:license>
    </codesearch:codesearch>
  </url>
</urlset>

13 Mapping software packages

Software doesn’t usually come as a single file. If you’re installing a WordPress theme or compiling an Apache module, you’re going to be looking for a bundle of code files. A sitemap only points crawlers at the original archive, so it helps to include a package map listing the files in an archive or a download folder. Call the file packagemap.xml, or it won’t be indexed.

<?xml version="1.0" encoding="UTF-8"?>
<fileset>
  <file>
    <path>source/widget.js</path>
    <type>javascript</type>
    <license>GPL</license>
  </file>
  <file>
    <path>messages/widget.html</path>
    <type>html</type>
    <license>GPL</license>
  </file>
</fileset>
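
A package map can be generated from the list of files in an archive. This is a sketch under assumptions:

- build_packagemap is a hypothetical helper.
- The extension-to-type table only contains the two types used in the example above, since we don’t want to guess at other type names.
- Files with an unknown extension get no <type> element at all. That is our own choice, not something the format is documented here to require.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Assumed mapping from file extension to <type>; only the types from the example.
TYPE_BY_SUFFIX = {".js": "javascript", ".html": "html"}


def build_packagemap(paths, license_name):
    """Build a packagemap.xml <fileset> for files inside an archive."""
    fileset = ET.Element("fileset")
    for path in paths:
        entry = ET.SubElement(fileset, "file")
        ET.SubElement(entry, "path").text = path
        file_type = TYPE_BY_SUFFIX.get(PurePosixPath(path).suffix.lower())
        if file_type is not None:  # leave <type> out rather than guess
            ET.SubElement(entry, "type").text = file_type
        ET.SubElement(entry, "license").text = license_name
    body = ET.tostring(fileset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_packagemap(["source/widget.js", "messages/widget.html"], "GPL"))
```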

14 Sitemapping images

Google’s image search is a powerful tool, but it doesn’t index every image on the web, especially images that only appear on pages generated dynamically by JavaScript. If those are product images you want indexed, you can use Google’s image sitemap extensions to add information about the images associated with a page.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.mysite.com/page1.html</loc>
    <image:image>
      <image:loc>http://www.mysite.com/image1.png</image:loc>
    </image:image>
    <image:image>
      <image:loc>http://www.mysite.com/image2.jpg</image:loc>
    </image:image>
  </url>
</urlset>

15 Adding image metadata

There’s more to an image sitemap than a list of image URLs. Google’s always after as much information as possible, and the current generation of its image search tools can help track down specific images quickly. That means adding metadata to each image entry, including captions, geolocation information and links to Creative Commons or other licences.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.mysite.com/page1.html</loc>
    <image:image>
      <image:loc>http://www.mysite.com/image1.png</image:loc>
      <image:caption>A caption goes here</image:caption>
      <image:geo_location>Bournemouth, England</image:geo_location>
      <image:title>My Image</image:title>
    </image:image>
  </url>
</urlset>
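
The two image steps above can be combined in one generator. Each page gets one <image:image> block per image, and metadata is written only where you have it. This is a sketch, and add_image_page and its dict keys are hypothetical. Google has revised which image tags it reads over the years, so check its current image sitemap documentation before relying on the metadata tags.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def add_image_page(urlset, page_loc, images):
    """Append a <url> with one <image:image> per image dict.

    Each dict needs "loc"; "caption", "geo_location" and "title" are optional.
    """
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_loc
    for image in images:
        block = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(block, f"{{{IMAGE_NS}}}loc").text = image["loc"]
        for key in ("caption", "geo_location", "title"):
            if key in image:
                ET.SubElement(block, f"{{{IMAGE_NS}}}{key}").text = image[key]
    return url


urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
add_image_page(urlset, "http://www.mysite.com/page1.html", [
    {"loc": "http://www.mysite.com/image1.png", "caption": "A caption goes here",
     "geo_location": "Bournemouth, England", "title": "My Image"},
    {"loc": "http://www.mysite.com/image2.jpg"},
])
print(ET.tostring(urlset, encoding="unicode"))
```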

16 Geotagging your sitemaps (1)

We’re living in an increasingly local world, where local search and hyper-local content are important tools for delivering information to end users. If you’re publishing information with a significant local impact, it’s a good idea to add geolocation information to your sitemaps, so people using Google’s local search tools can find your content.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:geo="http://www.google.com/geo/schemas/sitemap/1.0">
  <url>
    <loc>http://www.mysite.com/download?format=kml</loc>
    <geo:geo>
      <geo:format>kml</geo:format>
    </geo:geo>
  </url>
</urlset>

17 Geotagging your sitemaps (2)

Google’s geo sitemap extensions support three geolocation formats: plain KML files and zipped KMZ archives (both based on KML, the format used by Google Earth), plus the open GeoRSS format. Our previous example pointed to a KML file; here we replace it with a GeoRSS link.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:geo="http://www.google.com/geo/schemas/sitemap/1.0">
  <url>
    <loc>http://www.mysite.com/download?format=georss</loc>
    <geo:geo>
      <geo:format>georss</geo:format>
    </geo:geo>
  </url>
</urlset>
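
Since each geo entry just names the format of the linked file, a generator can refuse anything other than the three formats described above. This sketch takes the format names kml, kmz and georss from those examples, and add_geo_url is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
GEO_NS = "http://www.google.com/geo/schemas/sitemap/1.0"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("geo", GEO_NS)

GEO_FORMATS = ("kml", "kmz", "georss")


def add_geo_url(urlset, loc, geo_format):
    """Append a <url> whose <geo:format> names the format of the linked geo file."""
    if geo_format not in GEO_FORMATS:
        raise ValueError(f"unsupported geo format: {geo_format!r}")
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    geo = ET.SubElement(url, f"{{{GEO_NS}}}geo")
    ET.SubElement(geo, f"{{{GEO_NS}}}format").text = geo_format
    return url


urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
add_geo_url(urlset, "http://www.mysite.com/download?format=kml", "kml")
add_geo_url(urlset, "http://www.mysite.com/download?format=georss", "georss")
print(ET.tostring(urlset, encoding="unicode"))
```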

18 Extending sitemaps yourself

Google’s sitemap extensions make it easier for Google to extract the metadata it needs to build a better search index. But what happens if you decide that you want to use a sitemap to add your own information about a site? No problem whatsoever: as long as you have defined your own XML schema, you can add as much information to your sitemap as you want!

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
        http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:myextension="http://www.mysite.com/schemas/mysitemap_schema">
  <url>
    <loc>http://www.mysite.com/page1.html</loc>
    <myextension:tag1>My tag content</myextension:tag1>
  </url>
</urlset>
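
Custom tags are only useful if something reads them. A namespace-aware parser can pick out your own extension data, while ordinary sitemap consumers simply skip the unfamiliar namespace. This is a sketch: the namespace URI, the sample document and the read_extension helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MY_NS = "http://www.mysite.com/schemas/mysitemap_schema"  # hypothetical custom namespace

SAMPLE = f"""<urlset xmlns="{SITEMAP_NS}" xmlns:myextension="{MY_NS}">
  <url>
    <loc>http://www.mysite.com/page1.html</loc>
    <myextension:tag1>My tag content</myextension:tag1>
  </url>
  <url>
    <loc>http://www.mysite.com/page2.html</loc>
  </url>
</urlset>"""


def read_extension(xml_text, tag):
    """Map each <loc> to the text of a custom extension tag, skipping URLs without it."""
    ns = {"sm": SITEMAP_NS, "my": MY_NS}
    root = ET.fromstring(xml_text)
    values = {}
    for url in root.findall("sm:url", ns):
        value = url.findtext(f"my:{tag}", namespaces=ns)
        if value is not None:
            values[url.findtext("sm:loc", namespaces=ns)] = value
    return values


print(read_extension(SAMPLE, "tag1"))
# {'http://www.mysite.com/page1.html': 'My tag content'}
```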