Sitemap: (robots.txt directive)

A line in robots.txt that declares the location of your sitemap. It is supported by all major search engines and is the recommended way to advertise a sitemap to crawlers you don't notify directly.

Also known as: robots.txt sitemap, sitemap discovery

The Sitemap: directive in robots.txt tells crawlers where to find your XML sitemap. It's the simplest and most universal sitemap discovery mechanism — supported by Google, Bing, Yandex, DuckDuckGo, Baidu, and effectively every other major search engine.

Syntax

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml

Key rules:

  • The URL must be absolute (full https:// prefix). Relative URLs are rejected.
  • Multiple Sitemap: lines are allowed. Search engines fetch each.
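  • Field-name matching varies by implementation: Google treats it as case-insensitive, but some parsers are stricter. Always write it with a capital S: Sitemap:. The URL itself is case-sensitive.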
  • It's not tied to a specific User-agent: block — it applies globally.
  • It can be on any line in the file, but convention is to put it at the bottom after rule blocks.
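
The rules above can be sketched as a small extractor. This is a minimal illustration rather than any crawler's actual parser: the function name extract_sitemaps is hypothetical, and matching the field name case-insensitively is an assumption that mirrors the more lenient implementations.

```python
from urllib.parse import urlsplit


def extract_sitemaps(robots_txt: str) -> list[str]:
    """Return the absolute sitemap URLs declared in a robots.txt body."""
    found = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, so the "https:" in the value survives.
        field, sep, value = line.partition(":")
        # Assumption: lenient, case-insensitive field-name match.
        if not sep or field.strip().lower() != "sitemap":
            continue
        url = value.strip()
        parts = urlsplit(url)
        # Keep only absolute http(s) URLs; relative values are skipped.
        if parts.scheme in ("http", "https") and parts.netloc:
            found.append(url)
    return found


EXAMPLE = """\
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
Sitemap: /relative.xml
"""

print(extract_sitemaps(EXAMPLE))
# ['https://example.com/sitemap.xml', 'https://example.com/sitemap-news.xml']
```

The extractor never looks at User-agent: lines, which reflects the rule that the directive applies globally wherever it appears in the file.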

Why use it

You should declare your sitemap in robots.txt even if you also submit it via:

  • Google Search Console: yes, declare it in robots.txt too. Bing and Yandex have their own webmaster tools, but many other crawlers (AI training crawlers, archiving bots) have no submission console at all. For them, the robots.txt directive is the only way to find your sitemap.
  • IndexNow: IndexNow notifies engines about new URLs; the sitemap is still the canonical inventory and is fetched separately.
  • HTML <link rel="sitemap">: supported by very few crawlers. Use the robots.txt form instead.

Cross-domain sitemap declarations

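If your sitemap lives on a different host than the robots.txt declaring it (robots.txt on example.com, sitemap at sitemap.example.com), crawlers treat it as a cross-host submission and want proof that the same owner controls both hosts. Under the sitemaps.org protocol, the Sitemap: line in example.com's robots.txt is itself that proof. For sitemaps submitted through Search Console, Google instead requires both hosts to be verified in the same account. Bing follows a similar rule.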

This is exactly the scenario SitemapHost addresses: customers serve sitemap.theirsite.com via CNAME, and as long as both theirsite.com and sitemap.theirsite.com are in the same GSC property (or both verified), Google fetches the sitemap.
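
To tell whether a given declaration falls into the cross-host case, it is enough to compare the two hostnames. The sketch below is illustrative only: is_cross_host is a hypothetical helper, and it compares full hostnames, so a subdomain counts as a different host.

```python
from urllib.parse import urlsplit


def is_cross_host(robots_url: str, sitemap_url: str) -> bool:
    """True when the sitemap is served from a different host than robots.txt."""
    # urlsplit(...).hostname is lower-cased, so the comparison ignores case.
    robots_host = urlsplit(robots_url).hostname
    sitemap_host = urlsplit(sitemap_url).hostname
    return robots_host != sitemap_host


print(is_cross_host("https://example.com/robots.txt",
                    "https://sitemap.example.com/sitemap.xml"))  # True
print(is_cross_host("https://example.com/robots.txt",
                    "https://example.com/sitemap.xml"))  # False
```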

Common mistakes

  • Relative URLs. Sitemap: /sitemap.xml is invalid. Must be https://example.com/sitemap.xml.
  • Comma-separating multiple sitemaps. Not allowed. Use one Sitemap: line per sitemap.
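  • Putting Sitemap inside a User-agent block. It's allowed by the spec but confusing, and some parsers misinterpret it. Keep it at the top or bottom of the file, unindented.
  • Pointing to a sitemap on http:// when the site is https://. Under the sitemaps protocol, a sitemap is expected to list URLs on the same scheme and host it is served from, so an http:// sitemap for an https:// site may be ignored or cost the crawler an extra redirect. Declare the https:// URL.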
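
Most of these mistakes can be caught mechanically before deploying a robots.txt. The following is a rough sketch under stated assumptions: lint_sitemap_lines and its site_scheme parameter are hypothetical, the comma check is a heuristic (a comma can legitimately appear in a URL), and it flags indentation only, not whether a line sits inside a User-agent block.

```python
from urllib.parse import urlsplit


def lint_sitemap_lines(robots_txt: str, site_scheme: str = "https") -> list[str]:
    """Return human-readable problems found in the Sitemap: lines."""
    problems = []
    for number, raw_line in enumerate(robots_txt.splitlines(), start=1):
        # Ignore trailing comments, then split on the first colon only.
        line = raw_line.split("#", 1)[0]
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "sitemap":
            continue
        value = value.strip()
        if raw_line[:1].isspace():
            problems.append(f"line {number}: indented Sitemap line")
        if "," in value:
            # Heuristic: treat a comma as an attempt to list several sitemaps.
            problems.append(f"line {number}: comma-separated sitemaps")
            continue
        parts = urlsplit(value)
        if not parts.scheme or not parts.netloc:
            problems.append(f"line {number}: relative URL")
        elif parts.scheme != site_scheme:
            problems.append(
                f"line {number}: scheme is {parts.scheme}, site uses {site_scheme}"
            )
    return problems


SAMPLE = """\
User-agent: *
Disallow: /private/
  Sitemap: /sitemap.xml
Sitemap: https://example.com/a.xml, https://example.com/b.xml
Sitemap: http://example.com/sitemap.xml
Sitemap: https://example.com/sitemap.xml
"""

for problem in lint_sitemap_lines(SAMPLE):
    print(problem)
```

Only the last line of the sample passes cleanly; the others each trigger at least one of the checks.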

Standard

The robots.txt format was standardised in September 2022 as RFC 9309. The Sitemap directive is not part of the core RFC, which mentions it only as an example of an additional record that parsers may interpret. It is defined by the sitemaps.org protocol and is universally implemented.

What SitemapHost does

When you use SitemapHost via a CNAME, we automatically serve a robots.txt at https://sitemap.yourdomain.com/robots.txt with a Sitemap: directive pointing to your root sitemap. You should also add the same Sitemap: line to your main site's robots.txt, so that crawlers starting from your main domain can discover it.
