
XML Sitemap protocol

The specification at sitemaps.org that defines the XML format for sitemaps. First published by Google in 2005, jointly adopted by Google, Yahoo, and Microsoft in 2006, and unchanged in its core form since 2008.

Also known as: sitemap protocol, sitemaps.org spec, Sitemap 0.9

The XML Sitemap protocol is the specification at sitemaps.org defining the XML format that search engines accept for sitemaps. It's the foundation underneath every sitemap-related feature in this glossary.

History

  • June 2005: Google publishes Sitemaps 0.84.
  • November 2006: Google, Yahoo, and Microsoft (then MSN) jointly adopt Sitemaps 0.90 and publish it as an open standard at sitemaps.org.
  • April 2008: Sitemaps 0.9 is finalised. The core spec has not changed since.
  • 2009 onward: Extensions added — image, video, news, hreflang — using XML namespaces. The core protocol stays the same.

The protocol is intentionally simple, and that simplicity is the reason it survived.

What it specifies

  1. Two file types:
    • <urlset> — a list of URLs, the actual sitemap.
    • <sitemapindex> — a list of sitemap files, used when you exceed per-file limits.
  2. The <url> element with four child elements:
    • <loc> (required) — the URL.
    • <lastmod> (optional) — date of last modification.
    • <changefreq> (optional) — change-frequency hint. Google ignores this.
    • <priority> (optional) — relative importance hint. Google ignores this.
  3. Hard limits: 50,000 URLs per file and 50 MB (52,428,800 bytes) uncompressed per file. A sitemap index may list up to 50,000 child sitemaps; Google Search Console additionally caps submissions at 500 sitemap index files per site.
  4. Character encoding: UTF-8. Other encodings are rejected by most parsers.
  5. URL escaping: Sitemap URLs must be properly XML-escaped (& → &amp;, etc.).
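
The points above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name build_urlset and the example.com URLs are made up for the example, and only <loc> and <lastmod> are emitted.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# The protocol lists five characters to entity-escape; escape() handles
# &, < and > by default, so the two quote characters are added here.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def build_urlset(entries):
    """Return a <urlset> document as UTF-8 bytes.

    entries: iterable of (loc, lastmod) pairs. lastmod is a W3C Datetime
    string such as "2024-01-15", or None to omit the element.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for loc, lastmod in entries:
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(loc, EXTRA_ENTITIES)}</loc>")
        if lastmod is not None:
            lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return ("\n".join(lines) + "\n").encode("utf-8")


xml_bytes = build_urlset([
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/search?q=a&page=2", None),  # & must become &amp;
])
print(xml_bytes.decode("utf-8"))
```

The second entry shows why escaping matters: a raw & in a query string would make the file malformed XML.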

What it does NOT specify

  • How often a sitemap should be regenerated. That's between you and your crawlers.
  • How crawlers should treat sitemap data. Search engines build their own scheduling on top.
  • Authentication or signing. Sitemaps are public by design.
  • Rate limits. Up to each crawler.
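  • Compression beyond gzip. The protocol page allows gzip-compressed files (.xml.gz), with the 50 MB limit applying to the uncompressed data, but it defines no other compression formats.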
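
Because .xml.gz files are common in practice, a consumer usually needs to accept both plain and gzip-compressed sitemaps. A rough sketch, assuming the raw bytes have already been fetched (the helper name read_sitemap_bytes is hypothetical):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"              # first two bytes of any gzip stream
MAX_UNCOMPRESSED = 50 * 1024 * 1024   # 50 MB limit applies after decompression


def read_sitemap_bytes(raw):
    """Return the uncompressed sitemap bytes, whether or not raw is gzipped."""
    data = gzip.decompress(raw) if raw[:2] == GZIP_MAGIC else raw
    if len(data) > MAX_UNCOMPRESSED:
        raise ValueError("sitemap exceeds 50 MB uncompressed")
    return data


plain = b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
print(read_sitemap_bytes(gzip.compress(plain)) == read_sitemap_bytes(plain))
```

Detecting gzip by its magic bytes rather than by file extension is a design choice made for this sketch; servers do not always name or label compressed files consistently.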

Extensions (separate specs)

These are not part of sitemaps.org's protocol but are widely supported:

Extension             Namespace prefix   Search engines
Image sitemap         image:             Google
Video sitemap         video:             Google
News sitemap          news:              Google News
Hreflang in sitemaps  xhtml:             Google, Yandex
Mobile sitemap        mobile:            Largely defunct
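
As an illustration of how a namespaced extension sits alongside the core elements, the sketch below adds hreflang alternates to a <url> entry using Python's standard library. The helper url_with_alternates and the example.com URLs are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialise the core namespace as the default and the extension as xhtml:.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def url_with_alternates(loc, alternates):
    """Build a <url> element with one xhtml:link per language alternate."""
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    for lang, href in alternates.items():
        ET.SubElement(
            url,
            f"{{{XHTML_NS}}}link",
            {"rel": "alternate", "hreflang": lang, "href": href},
        )
    return url


urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
urlset.append(url_with_alternates(
    "https://example.com/en/",
    {"en": "https://example.com/en/", "de": "https://example.com/de/"},
))
xml_text = ET.tostring(urlset, encoding="unicode")
print(xml_text)
```

The core <url> and <loc> elements are unchanged; the extension only adds elements from a second namespace, which a parser that does not know the extension can skip.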

Validating

A valid sitemap can be checked by:

  1. Loading it in any browser (browsers render XML and surface parser errors). This checks well-formedness only, not conformance to the sitemap schema.
  2. Submitting in Google Search Console — GSC's report tab shows validation errors.
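  3. Using xmllint against a local copy of the schema (xmllint generally cannot fetch https:// URLs itself):
    curl -sSO https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
    xmllint --noout --schema sitemap.xsd sitemap.xml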
    
  4. Using SitemapHost's free sitemap audit tool.
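
A complementary check for the hard limits can be written with the Python standard library alone. The sketch below is not a schema validator; it only tests size, encoding, well-formedness, URL count, and the presence of <loc>. The function name check_sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024


def check_sitemap(data):
    """Return a list of problems found in an uncompressed <urlset> (bytes)."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file exceeds 50 MB uncompressed")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return problems + [f"unexpected root element: {root.tag}"]
    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the 50,000 limit")
    for i, url in enumerate(urls, start=1):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> #{i} has no <loc>")
    return problems


good = (b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>https://example.com/</loc></url></urlset>")
print(check_sitemap(good))
```

An empty list means none of these particular checks failed; it does not prove the file is schema-valid.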

Why the spec is the way it is

Three reasons recur in the W3C and sitemaps.org discussions:

  1. Simplicity over expressiveness. A protocol that anyone can implement in 20 lines of code wins adoption over one that requires a parser library.
  2. Conservative extension model. Rather than evolving the core, the original authors added namespaced extensions. The core <url> stays portable.
  3. Trust nothing. <changefreq> and <priority> were the spec's bet on self-reported hints; the bet didn't pay off. The architecture survived because crawlers were able to ignore them without breaking anything.
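
The first and third points can be seen from the consumer side. The reader below is a sketch of how a crawler-like client might take only <loc> and <lastmod> and skip everything else, including <changefreq>, <priority>, and namespaced extensions. It is an illustration under those assumptions, not a description of how any search engine actually parses sitemaps; read_urls and SAMPLE are made-up names.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
    <image:image>
      <image:loc>https://example.com/logo.png</image:loc>
    </image:image>
  </url>
</urlset>
"""


def read_urls(xml_text):
    """Return (loc, lastmod) pairs; every other child element is ignored."""
    root = ET.fromstring(xml_text)
    result = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        if loc:
            lastmod = url.findtext("sm:lastmod", namespaces=NS)
            result.append((loc.strip(), lastmod))
    return result


print(read_urls(SAMPLE))
```

Because the lookups are namespace-qualified, the image:loc element is never mistaken for the core <loc>, and the unused hints cost the reader nothing.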

What SitemapHost does

SitemapHost emits standards-compliant XML by default: <urlset> for small sites, automatic <sitemapindex> when you cross the 50K boundary, optional <lastmod>, UTF-8 encoded, proper XML escaping. We deliberately do not emit <changefreq> or <priority> unless explicitly requested, because both are inert in modern search engines and just add bytes.
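
The switch from a single <urlset> to a <sitemapindex> can be sketched as follows. This is not SitemapHost's implementation; it is a generic illustration, and the function plan_files, the sitemap-N.xml file naming, and the base_url parameter are all assumptions made for the example.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def _urlset(urls):
    """Render one <urlset> document containing only <loc> entries."""
    body = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return f'{XML_DECL}<urlset xmlns="{SITEMAP_NS}">\n{body}</urlset>\n'


def plan_files(urls, base_url, max_per_file=MAX_URLS_PER_FILE):
    """Return {filename: xml_text}.

    A single sitemap.xml if the URLs fit in one file; otherwise numbered
    child sitemaps plus a sitemap.xml index that points at them.
    """
    parts = [urls[i:i + max_per_file] for i in range(0, len(urls), max_per_file)]
    if len(parts) <= 1:
        return {"sitemap.xml": _urlset(urls)}
    files = {f"sitemap-{n}.xml": _urlset(part) for n, part in enumerate(parts, 1)}
    entries = "".join(
        f"  <sitemap><loc>{escape(base_url + name)}</loc></sitemap>\n"
        for name in files
    )
    files["sitemap.xml"] = (
        f'{XML_DECL}<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'
    )
    return files


# Small limit used here only to keep the demonstration short.
demo = plan_files(
    [f"https://example.com/p/{i}" for i in range(5)],
    "https://example.com/",
    max_per_file=2,
)
print(sorted(demo))
```

With the default limit, the index only appears once the list holds more than 50,000 URLs; a production generator would also need to watch the 50 MB size limit, which this sketch does not do.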
