Sitemap: (robots.txt directive)
A line in robots.txt declaring the location of your sitemap. Supported by all major search engines, it is the recommended way to advertise a sitemap to crawlers you don't notify directly.
Also known as: robots.txt sitemap, sitemap discovery
The Sitemap: directive in robots.txt tells crawlers where to find your XML sitemap. It's the simplest and most universal sitemap discovery mechanism — supported by Google, Bing, Yandex, DuckDuckGo, Baidu, and effectively every other major search engine.
Syntax
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
Key rules:
- The URL must be absolute (full https:// prefix). Relative URLs are rejected.
- Multiple Sitemap: lines are allowed. Search engines fetch each.
- Some parsers match the directive name case-sensitively. Always write it with a capital S: Sitemap:.
- It's not tied to a specific User-agent: block; it applies globally.
- It can be on any line in the file, but the convention is to put it at the bottom, after the rule blocks.
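The rules above can be checked against a real parser. The following is a minimal sketch that uses Python's standard-library urllib.robotparser (its site_maps() method needs Python 3.8 or later) to read back every declared sitemap. The ROBOTS_TXT sample and the declared_sitemaps helper are illustrative names, not part of any crawler.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt body mirroring the syntax example (illustrative only).
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
"""


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return every sitemap URL declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    for url in declared_sitemaps(ROBOTS_TXT):
        print(url)
```

Both URLs come back even though the lines follow a User-agent: block, which reflects the directive's global scope. Note that this particular parser matches the directive name case-insensitively; other parsers may be stricter.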
Why use it
You should declare your sitemap in robots.txt even if you also submit it via:
- Google Search Console: yes, declare it in robots.txt too. Bing and Yandex have their own separate webmaster tools, and many other crawlers (AI training crawlers, archiving bots, smaller search engines) offer no submission interface at all. For those, the robots.txt directive is the only way to find your sitemap.
- IndexNow: IndexNow notifies engines about new URLs; the sitemap is still the canonical inventory and is fetched separately.
- HTML <link rel="sitemap">: supported by very few crawlers. Use the robots.txt form instead.
Cross-domain sitemap declarations
If your sitemap lives on a different host than the robots.txt declaring it (robots.txt on example.com, sitemap at sitemap.example.com), Google will only crawl the cross-domain sitemap if both domains are verified in the same Search Console account. Bing follows a similar rule.
This is exactly the scenario SitemapHost addresses: customers serve sitemap.theirsite.com via CNAME, and as long as both theirsite.com and sitemap.theirsite.com are in the same GSC property (or both verified), Google fetches the sitemap.
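To spot declarations that fall under this cross-host rule, it is enough to compare the host of the robots.txt URL with the host of the sitemap URL. A minimal sketch, with is_cross_host as a hypothetical helper name:

```python
from urllib.parse import urlsplit


def is_cross_host(robots_url: str, sitemap_url: str) -> bool:
    """True when the sitemap is served from a different host than robots.txt."""
    # urlsplit().hostname is already lower-cased and excludes any port.
    robots_host = urlsplit(robots_url).hostname or ""
    sitemap_host = urlsplit(sitemap_url).hostname or ""
    return robots_host != sitemap_host


if __name__ == "__main__":
    print(is_cross_host("https://example.com/robots.txt",
                        "https://sitemap.example.com/sitemap.xml"))
```

The check only detects the cross-host case; whether both hosts are verified in the same Search Console account still has to be confirmed there.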
Common mistakes
- Relative URLs. Sitemap: /sitemap.xml is invalid. It must be https://example.com/sitemap.xml.
- Comma-separating multiple sitemaps. Not allowed. Use one Sitemap: line per sitemap.
- Putting Sitemap inside a User-agent block. It's allowed by spec but confusing, and some parsers misinterpret it. Keep it at the top or bottom, unindented.
- Pointing to a sitemap on http:// when the site is https://. The two schemes count as different locations under the sitemap protocol, so crawlers may reject the sitemap or trust it less.
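Several of the mistakes listed above can be caught mechanically before a robots.txt is deployed. The sketch below is one possible lint pass, not an official validator: lint_sitemap_lines and its site_scheme parameter are hypothetical names, and treating any comma as a separator is a simplifying assumption (commas are technically legal inside URLs).

```python
from urllib.parse import urlsplit


def lint_sitemap_lines(robots_txt: str, site_scheme: str = "https") -> list[str]:
    """Return human-readable warnings for the Sitemap: lines in robots_txt."""
    warnings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "sitemap":
            continue  # not a Sitemap: line
        value = value.strip()
        if "," in value:
            # Assumption: a comma means several URLs were packed on one line.
            warnings.append(f"line {number}: comma-separated sitemaps; use one line each")
            continue
        parts = urlsplit(value)
        if not parts.scheme or not parts.netloc:
            warnings.append(f"line {number}: relative URL {value!r}; use an absolute URL")
        elif parts.scheme != site_scheme:
            warnings.append(f"line {number}: {parts.scheme}:// sitemap on a {site_scheme}:// site")
    return warnings


if __name__ == "__main__":
    sample = "Sitemap: /sitemap.xml\nSitemap: http://example.com/sitemap.xml\n"
    for warning in lint_sitemap_lines(sample):
        print(warning)
```

A clean file yields an empty list, so the function can be dropped into a deployment check that fails when any warning is returned.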
Standard
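The robots.txt format was standardised in 2022 as RFC 9309. The Sitemap directive is not part of the core RFC, which mentions sitemaps only as an example of additional records a parser may choose to support. The directive itself comes from the sitemaps.org protocol and is implemented by all major crawlers.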
What SitemapHost does
When you use SitemapHost via a CNAME, we automatically serve a robots.txt at https://sitemap.yourdomain.com/robots.txt with a Sitemap: directive pointing to your root sitemap. You should also add the same Sitemap: line to your main site's robots.txt so that crawlers starting from your main domain can discover it.
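Adding the line to the main site's robots.txt can be scripted. The following is a minimal sketch of an idempotent helper; ensure_sitemap_line is a hypothetical name, and the sitemap URL in the usage example is a placeholder rather than a path SitemapHost is known to use.

```python
def ensure_sitemap_line(robots_txt: str, sitemap_url: str) -> str:
    """Return robots_txt with a Sitemap: line for sitemap_url, added only if missing."""
    for raw in robots_txt.splitlines():
        key, sep, value = raw.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # already declared, leave the file untouched
    body = robots_txt.rstrip("\n")
    # Follow the convention of placing the directive at the bottom of the file.
    separator = "\n\n" if body else ""
    return f"{body}{separator}Sitemap: {sitemap_url}\n"


if __name__ == "__main__":
    current = "User-agent: *\nAllow: /\n"
    # Placeholder URL for illustration only.
    print(ensure_sitemap_line(current, "https://sitemap.yourdomain.com/sitemap.xml"))
```

Running it twice produces the same output as running it once, so it is safe to include in a repeated deploy step.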