Most of the conversations in SEO revolve around content, backlinks, and keyword strategy. Sitemaps rarely make the agenda. And yet, a missing or broken sitemap is one of the most common technical issues found in site audits, and one of the most immediately fixable.
Without a sitemap, search engines are guessing which pages on your site exist and which ones matter. With a properly built one, you are handing Google a structured roadmap to every URL worth crawling and indexing. Learning how to create a sitemap correctly is not complicated, but there are enough ways to get it wrong that it is worth doing carefully.
In 2026, sitemaps matter more than ever. Search engines and AI crawlers are increasingly selective about what they crawl, particularly as the web grows and crawl budgets become a more significant constraint. Clear, clean signals about which pages are important are no longer just good practice. They are a competitive advantage.
This guide walks through everything: what a sitemap is, the different types available, how to create one depending on your platform, what to include and exclude, how to submit it to Google, and the best practices that keep it working properly long after launch. At 2POINT, sitemap creation and maintenance is built into every technical SEO engagement we run, because even the best content can’t rank if Google can’t find it.
Key Takeaways
- A sitemap is a file that tells search engines which pages on your site are important and should be crawled and indexed, removing the guesswork from discovery.
- XML sitemaps are the standard format for SEO. HTML sitemaps serve users. Both have a role, but XML is what Google reads.
- Most CMS platforms including WordPress, Shopify, and Webflow generate sitemaps automatically, but auto-generated sitemaps still need auditing because they frequently include URLs that should not be there.
- A clean sitemap includes only indexable, canonical, 200-status URLs with accurate lastmod dates. It excludes noindexed, redirected, and error pages without exception.
- Submit your sitemap through Google Search Console and reference it in your robots.txt file to ensure every crawler that visits your site can find it immediately.
What Is a Sitemap?
Sitemap Definition and Purpose
A sitemap is a structured file that lists the URLs on a website to help search engines discover, crawl, and understand its content. Think of it as a table of contents for your website, one that search engines consult before they start crawling to understand what exists and where it lives.
Without a sitemap, search engines rely entirely on internal links to discover pages. If your internal linking is thorough, that might be sufficient. But most sites have gaps: new pages that haven’t been linked yet, deep pages that are hard to reach through navigation, or content that lives several clicks from the homepage. A sitemap solves all of that by providing a direct list of what you want indexed.
Sitemaps are particularly valuable for new sites that haven’t yet accumulated the internal link equity needed for reliable crawling, large sites where internal linking can’t realistically surface every page, sites with content that updates frequently, and any site where poor navigation makes it difficult for crawlers to reach important pages.
XML Sitemaps vs. HTML Sitemaps
XML sitemaps are machine-readable files designed specifically for search engines. They follow the sitemaps.org protocol and can contain URL entries alongside optional metadata tags like lastmod, changefreq, and priority. These files are not meant to be read by humans. They are meant to be parsed by crawlers.
HTML sitemaps are user-facing pages that list links to important sections and pages of a website, helping visitors navigate a large or complex site. They serve a UX function and can contribute marginally to internal linking, but they are not what Google is looking for when it comes to sitemap SEO.
For SEO purposes, the XML sitemap is what matters. It is worth noting that Google has confirmed it ignores both the changefreq and priority tags in XML sitemaps. So while older guides may tell you to obsess over these values, your effort is better directed at clean URLs and accurate lastmod dates. AI crawlers may also treat lastmod as a freshness signal when deciding what to pull, so if you intend to appear in AI search, keep these dates accurate and current.
Types of XML Sitemaps
There are several types of XML sitemaps, each serving a specific purpose.
- A standard XML sitemap lists regular web page URLs and is the most common type.
- An image sitemap includes image URLs to help Google discover images that might not be surfaced through standard crawling, which is particularly useful for image-heavy sites and e-commerce.
- A video sitemap provides metadata about video content including title, description, thumbnail, and duration, helping Google index video properly.
- A news sitemap follows the Google News format and is designed specifically for news publishers with time-sensitive content.
- A sitemap index file is a master file that references multiple individual sitemaps rather than listing URLs directly, and it is essential for any large site that would otherwise exceed the single-file limits.
Step 1: Check If You Already Have a Sitemap
Before doing any creation work, check whether your platform has already taken care of it. Many businesses spend time building a sitemap that already exists.
The quickest check is to visit yourdomain.com/sitemap.xml or yourdomain.com/sitemap_index.xml directly in your browser. If a sitemap exists, it will either render as XML or prompt a download. WordPress with Yoast SEO, RankMath, or All in One SEO generates sitemaps automatically. Shopify generates one at /sitemap.xml out of the box. Wix, Squarespace, and Webflow all handle sitemap generation natively with no configuration required.
If you find that a sitemap already exists, don’t assume it’s in good shape. Auto-generated sitemaps are notoriously noisy. They frequently include tag archives, author pages, filtered URLs, paginated pages, and other URLs that add crawl weight without contributing indexation value. Finding the sitemap is step one. Auditing its quality is step two.
Step 2: Choose How to Create Your Sitemap
CMS Plugins and Built-In Tools
For WordPress sites, the three main options are Yoast SEO, RankMath, and All in One SEO. All three generate XML sitemaps and provide configuration options for which post types, taxonomies, and individual pages to include or exclude. Yoast and RankMath both offer per-page controls, which makes it straightforward to exclude specific pages from the sitemap without touching code.
Shopify generates a sitemap automatically that covers products, collections, blog posts, and standard pages. The main limitation is that Shopify’s sitemap has minimal configurability, so if you need to exclude specific URLs, it requires workarounds or custom app solutions.
Webflow auto-generates a sitemap that updates whenever pages are published or unpublished. This keeps the sitemap reasonably current without manual maintenance, though it still benefits from periodic auditing.
CMS-generated sitemaps are the easiest and most maintainable path for most sites. The key is configuring them properly from the start rather than accepting defaults.
Sitemap Generator Tools
When more control is needed, dedicated sitemap generator tools give you greater precision over what gets included. Screaming Frog SEO Spider crawls your site and generates a clean XML sitemap file with full control over inclusions and exclusions. The free version handles up to 500 URLs, which is sufficient for small sites. The paid version handles enterprise-level crawls and integrates directly with Google Search Console. Sitebulb is a visual site auditor that generates sitemaps with quality filtering built in, making it easier to identify and exclude problematic URLs before the sitemap is finalized. Keep in mind that enterprise SEO differs from local SEO, mainly in the number of pages a site has, so the right tool depends on the scale you are working at.
For very small sites that don’t warrant a full crawl tool investment, online generators like XML-Sitemaps.com offer a quick solution, though customization options are limited and they require manual re-generation every time content changes.
Manual Creation
For small custom-built sites with fewer than 50 pages, a sitemap can be written manually in XML format. The structure is straightforward: an XML declaration at the top, a urlset element that references the sitemaps.org namespace, and individual url elements containing at minimum a loc tag with the full URL and optionally a lastmod tag with the date of last meaningful update. Manual creation is only practical at very small scale. For anything larger or more frequently updated, it becomes unmanageable quickly.
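As a rough illustration of that structure, the sketch below builds a two-URL sitemap with Python's standard library. The URLs and dates are placeholders, and build_sitemap is a hypothetical helper name, not part of any plugin or tool mentioned in this guide.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return sitemap XML (bytes) for an iterable of (loc, lastmod) pairs.

    lastmod may be None, in which case the optional tag is omitted.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    # xml_declaration requires Python 3.8 or newer
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

# Placeholder pages for illustration only
manual_sitemap = build_sitemap([
    ("https://yourdomain.com/", "2026-01-15"),
    ("https://yourdomain.com/services/", None),
])
print(manual_sitemap.decode("utf-8"))
```

The printed output is the same XML you would otherwise type by hand: a declaration, a urlset element with the sitemaps.org namespace, and one url element per page.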
Dynamic Sitemaps via Code
For custom-built sites without a CMS, generating the sitemap programmatically is the right approach. Server-side scripts in PHP, Python, Node.js, or Ruby can query the site’s database or content API and output a properly formatted XML file on demand. This keeps the sitemap automatically current as content is added, updated, or removed, without any manual steps. For large sites where the URL set changes daily, dynamic generation is not optional. It is the only maintainable solution.
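A minimal sketch of that idea in Python is shown here, assuming a hypothetical pages table with url, updated_at, and indexable columns. Your own schema, framework, and route for serving the file will differ, and an in-memory SQLite database stands in for the real content store.

```python
import sqlite3
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def sitemap_from_db(conn):
    """Build sitemap XML from the rows currently marked indexable."""
    rows = conn.execute(
        "SELECT url, updated_at FROM pages WHERE indexable = 1 ORDER BY url"
    )
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, updated_at in rows:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = updated_at
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

# In-memory stand-in for a real content database (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, updated_at TEXT, indexable INTEGER)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [
        ("https://yourdomain.com/blog/post-a/", "2026-02-01", 1),
        ("https://yourdomain.com/blog/draft/", "2026-02-03", 0),
    ],
)
dynamic_sitemap = sitemap_from_db(conn)
```

Because the XML is rebuilt from the database on each call, a newly published page shows up the next time the function runs, with no manual edit to the file.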
Step 3: Decide What to Include and Exclude
What to Include
The sitemap should contain all pages you genuinely want Google to index. That means pages that return HTTP 200 status codes, use canonical tags pointing to themselves rather than to another URL, and represent meaningful content that serves your users and your SEO goals. Service pages, product pages, blog posts, category pages, and location pages belong in the sitemap. Only canonical URLs should be listed, never duplicates or parameter variations of the same content.
Lastmod dates should reflect when the page content was meaningfully updated. This signal helps Google prioritize crawling recently refreshed content, which is particularly useful if you are running a content update program alongside your SEO strategy.
What to Exclude
The exclude list is arguably more important than the include list. Pages with noindex meta tags should never appear in the sitemap. If a page is noindexed, including it in the sitemap sends conflicting signals. You’re telling Google with the tag that you don’t want the page indexed, and telling Google with the sitemap that you do. The result is confusion and wasted crawl budget.
Redirected URLs, whether 301 or 302, should be excluded because they force Googlebot to follow additional hops rather than landing directly on the final destination. Error pages returning 404 or 410 status codes have no place in a sitemap. Duplicate pages, non-canonical URL variations, admin pages, staging URLs, internal search results, and thin tag or archive pages should all be removed. If a URL is blocked by robots.txt, it should also be excluded from the sitemap since including a blocked URL creates a direct contradiction between the two files.
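The include and exclude rules above can be sketched as a single filter. This example assumes you already have crawl data for each page; the Page fields and the sample records are hypothetical, and a real audit would pull them from a crawler export.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    status: int               # HTTP status code returned for the URL
    canonical: str            # URL the page's canonical tag points to
    noindex: bool = False
    blocked_by_robots: bool = False

def belongs_in_sitemap(page: Page) -> bool:
    """True only for indexable, self-canonical, 200-status pages."""
    return (
        page.status == 200
        and not page.noindex
        and not page.blocked_by_robots
        and page.canonical == page.url
    )

# Hypothetical crawl results
pages = [
    Page("https://yourdomain.com/services/", 200, "https://yourdomain.com/services/"),
    Page("https://yourdomain.com/old-page/", 301, "https://yourdomain.com/old-page/"),
    Page("https://yourdomain.com/thank-you/", 200, "https://yourdomain.com/thank-you/", noindex=True),
    Page("https://yourdomain.com/shoes/?color=red", 200, "https://yourdomain.com/shoes/"),
]
clean_urls = [p.url for p in pages if belongs_in_sitemap(p)]
```

In this sample only the services page survives: the redirect, the noindexed page, and the parameter variation are all filtered out.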
Step 4: Structure Your Sitemap for Large Sites
Every individual sitemap file has limits: 50,000 URLs or 50MB uncompressed, whichever threshold is hit first. For large sites, a sitemap index file solves this by acting as a master file that references multiple individual sitemaps rather than listing URLs directly.
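As a sketch of how that split might work, the example below cuts a URL list into files of at most 50,000 entries and builds an index that points at them. The function names and the sitemap-N.xml file naming are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000

def split_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into chunks no larger than the per-file limit."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap_index(sitemap_locs):
    """Return sitemap index XML (bytes) referencing each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = loc
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

# 120,001 placeholder URLs need three files
all_urls = [f"https://yourdomain.com/product/{i}/" for i in range(120_001)]
chunks = split_urls(all_urls)
index_xml = build_sitemap_index(
    [f"https://yourdomain.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)]
)
```

Each chunk would then be written out as its own sitemap file, and only the index URL needs to be submitted.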
Beyond handling the size constraint, segmenting sitemaps by content type has real operational value. Separate sitemap files for products, blog posts, category pages, and location pages make it much easier to monitor indexation rates per content type in Google Search Console. If your product pages are being indexed at a different rate than your blog posts, segmentation lets you see that immediately and investigate the cause.
For large XML sitemap files, GZIP compression significantly reduces file size and speeds up retrieval for crawlers. Most servers can be configured to serve sitemaps in compressed format without any changes to the file itself.
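If you prefer to pre-compress the file rather than rely on server configuration, a short sketch with Python's gzip module looks like this. The size check reflects the point that the 50MB limit applies to the uncompressed file; compress_sitemap is a hypothetical helper name.

```python
import gzip

MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # 50MB limit applies before compression

def compress_sitemap(xml_bytes, max_bytes=MAX_UNCOMPRESSED_BYTES):
    """Gzip a sitemap, refusing files that exceed the uncompressed limit."""
    if len(xml_bytes) > max_bytes:
        raise ValueError("sitemap exceeds the uncompressed size limit; split it")
    return gzip.compress(xml_bytes)

# Repetitive placeholder content, similar in shape to real sitemap entries
sample = b"<url><loc>https://yourdomain.com/page/</loc></url>" * 1000
compressed = compress_sitemap(sample)
print(len(sample), "bytes before,", len(compressed), "bytes after")
```

The result would typically be saved as sitemap.xml.gz and referenced under that name wherever the sitemap URL is listed.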
Step 5: Submit Your Sitemap
Submit via Google Search Console
Log into Google Search Console and select the property matching your site. In the left navigation, go to the Indexing section and click Sitemaps. In the Add a new sitemap field, enter the path to your sitemap or sitemap index file and click Submit. Google will begin processing the file and show you the number of submitted versus indexed URLs over time.
Monitor this report actively after submission. The gap between submitted and indexed URLs is one of the most useful diagnostics in SEO. A small gap is normal. A large or growing gap signals quality issues that warrant investigation.
Submit via Bing Webmaster Tools
Bing Webmaster Tools offers the same sitemap submission process and should not be ignored. Bing powers a meaningful share of search volume, particularly in certain demographics and regions, and its IndexNow protocol also benefits from an active Webmaster Tools account. The submission process mirrors Google Search Console: navigate to Sitemaps, enter the URL, and submit.
Reference in robots.txt
Adding a Sitemap directive to your robots.txt file ensures that any crawler visiting your site automatically discovers the sitemap without requiring manual submission. The line reads simply: Sitemap: https://yourdomain.com/sitemap.xml. This is a low-effort addition with genuine upside, particularly for crawlers and AI indexers that may not rely on webmaster tools for sitemap discovery.
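To confirm the directive is being read the way you expect, you can parse your robots.txt with Python's standard urllib.robotparser module. The robots.txt content below is a made-up example, and site_maps() requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (placeholder rules)
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when no Sitemap line is present
declared = parser.site_maps() or []

# A sitemap that robots.txt blocks cannot be fetched by compliant crawlers
blocked = [url for url in declared if not parser.can_fetch("*", url)]

print("Declared sitemaps:", declared)
print("Blocked sitemaps:", blocked)
```

The second check guards against the self-defeating case where the sitemap is declared but its own path is disallowed.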
Sitemap Best Practices for 2026
Keep It Clean and Current
A sitemap is not a set-it-and-forget-it file. Pages get noindexed, deleted, redirected, and added constantly. If the sitemap isn’t updated to reflect those changes, it accumulates noise that wastes crawl budget and sends confusing signals. For CMS-based sites, configure automatic updates so the sitemap stays in sync with content changes without manual intervention. For custom setups, schedule regular regeneration or implement event-triggered updates.
Do not use the changefreq and priority tags to try to influence Google’s crawl behavior. Google has confirmed it ignores them, so any effort spent calibrating those values is wasted.
Use Accurate lastmod Dates
The lastmod tag is valuable, but only if it is accurate. Update this date when page content has been meaningfully changed: a substantial revision, new sections added, significant data refreshed. Do not update it for cosmetic changes like CSS tweaks or template modifications. And absolutely do not set all lastmod dates to today’s date as a tactic to make content look fresh. Google will notice the pattern, distrust the signal, and stop using it for crawl prioritization entirely, which defeats the purpose.
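One way to keep lastmod honest on a custom build is to change it only when the page's main content changes. The sketch below hashes the body text and ignores whitespace-only differences. This is a rough approximation: a hash cannot judge whether a change is meaningful, and the function name and stored fields are assumptions.

```python
import hashlib
from datetime import date

def next_lastmod(previous_hash, previous_lastmod, body_text, today):
    """Return (hash, lastmod), bumping lastmod only if the body text changed.

    body_text should be the page's main content, not the full rendered
    template, so that CSS or layout edits do not alter the hash.
    """
    normalized = " ".join(body_text.split())  # collapse whitespace differences
    new_hash = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    if new_hash == previous_hash:
        return previous_hash, previous_lastmod
    return new_hash, today.isoformat()

# First publish, then a whitespace-only edit, then a real content change
h1, d1 = next_lastmod(None, None, "Pricing starts at $10.", date(2026, 1, 5))
h2, d2 = next_lastmod(h1, d1, "Pricing  starts at $10.\n", date(2026, 2, 1))
h3, d3 = next_lastmod(h2, d2, "Pricing starts at $12.", date(2026, 3, 1))
```

In this run the date stays at the first publish date after the whitespace edit and only moves when the price in the text changes.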
Validate Your Sitemap
After submission, check Google Search Console’s sitemap report for errors and warnings. Validate the XML syntax using an online validator or Screaming Frog’s sitemap audit feature. Cross-reference the sitemap against a full site crawl to identify pages that should be included but aren’t, or pages that are included but shouldn’t be. This cross-check is something most teams skip and where a lot of sitemap debt accumulates.
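The cross-check itself is a set comparison. Here is a minimal sketch, assuming you have the sitemap XML and a list of indexable URLs exported from a crawl; both inputs below are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_bytes):
    """Extract the set of loc values from a standard XML sitemap."""
    root = ET.fromstring(xml_bytes)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)}

def cross_check(in_sitemap, crawled_indexable):
    """Compare sitemap URLs against indexable URLs found by a crawl."""
    return {
        "missing_from_sitemap": crawled_indexable - in_sitemap,
        "not_found_in_crawl": in_sitemap - crawled_indexable,
    }

# Placeholder sitemap and crawl export
SITEMAP_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourdomain.com/</loc></url>
  <url><loc>https://yourdomain.com/old-service/</loc></url>
</urlset>"""
crawled = {"https://yourdomain.com/", "https://yourdomain.com/new-service/"}

report = cross_check(sitemap_urls(SITEMAP_XML), crawled)
```

URLs in the first bucket are candidates to add. URLs in the second are either orphaned pages or entries that should be removed, and each deserves a manual look.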
Monitor Indexation Rates
The comparison between submitted URLs and indexed URLs in Google Search Console is one of the most actionable metrics in technical SEO. Track this ratio over time. A growing gap between submitted and indexed pages suggests content quality issues, crawl budget problems, or sitemap hygiene failures. Pages stuck in “Discovered - currently not indexed” status for extended periods deserve specific investigation, as they usually indicate that Google has found the URL but has not yet considered it worth the crawl investment.
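If you record the submitted and indexed counts per sitemap, tracking the ratio is simple arithmetic. The figures below are invented for illustration, and the 80 percent alert threshold is an arbitrary assumption rather than a Google guideline.

```python
def indexation_rate(submitted: int, indexed: int) -> float:
    """Share of submitted URLs that are indexed, between 0 and 1."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive count")
    return indexed / submitted

# Made-up (submitted, indexed) counts per segmented sitemap
segments = {"products": (1200, 1140), "blog": (300, 180)}
ALERT_BELOW = 0.8  # arbitrary threshold for this example

rates = {name: indexation_rate(s, i) for name, (s, i) in segments.items()}
flagged = [name for name, rate in rates.items() if rate < ALERT_BELOW]

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} indexed")
```

With these sample numbers the blog sitemap falls below the threshold and would be the segment to investigate first.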
Common Sitemap Mistakes to Avoid
Including noindexed URLs in the sitemap is the single most common sitemap error, and it creates a direct conflict that confuses Google about your intent. Including redirected URLs forces Googlebot to follow unnecessary hops and dilutes the efficiency of your crawl budget. Never updating the sitemap after launch leaves it stale, gradually filling with broken, removed, or redirected URLs that erode the file’s reliability.
Using one massive uncompressed sitemap file for a large site hits size limits and makes monitoring impossible. Setting all lastmod dates to the current date destroys the signal entirely. And blocking the sitemap in robots.txt while expecting Google to read it is a self-defeating configuration that prevents the file from serving any purpose at all.
Many of these mistakes can be avoided with the right tooling. CMS plugins and crawl tools like the ones covered earlier can automate most of the sitemap process.
How 2POINT Handles Sitemaps

At 2POINT, sitemap review and optimization is included in every technical SEO audit we conduct. We look at the current sitemap structure, identify stale or incorrect URLs, check for conflicting signals between the sitemap and the site’s noindex and robots.txt configurations, and compare submitted URLs against indexed URLs to understand where crawl budget is being wasted.
For custom-built sites, we implement dynamic sitemap generation that stays automatically synchronized with content changes, eliminating the maintenance gap that typically develops between a static sitemap and an actively updated site. For CMS-based sites, we configure existing plugin settings to produce cleaner output and set up monitoring so sitemap health is tracked continuously rather than reviewed on an ad hoc basis.
Ongoing sitemap maintenance is part of every retainer engagement because a sitemap that was clean at launch will drift over time. Keeping it accurate is a low-effort, high-impact component of a healthy technical SEO foundation.
Give Search Engines a Clear Map to Your Best Content
A sitemap is a simple tool with meaningful consequences when it is done right, and equally meaningful consequences when it is done wrong or neglected. Getting it right means faster indexing, better crawl efficiency, and more pages competing for rankings. Getting it wrong means wasted crawl budget, missed indexation, and a file that actively creates confusion rather than clarity.
The principles hold regardless of whether you use a CMS plugin, a generator tool, or custom-built code: include only what should be indexed, keep the file current as your site changes, and monitor the results consistently through Search Console.
If you want to know whether your sitemap is working properly and what it might be costing you, 2POINT offers a free technical SEO audit that includes a full sitemap review. We will identify what is missing, what should not be there, and exactly what needs to change to optimize it for maximum crawl efficiency. Get in touch with us about your SEO.
Frequently Asked Questions
What is a sitemap and why do I need one?
A sitemap is a structured XML file that lists the URLs on your website that you want search engines to discover and index. You need one because search engines discover pages primarily through links, and a sitemap fills in any gaps where your internal linking may not reliably surface every important page. It is particularly critical for new sites, large sites, and sites that update content frequently.
How do I create a sitemap for my website?
The method depends on your platform. WordPress users should install Yoast SEO or RankMath, both of which generate and manage XML sitemaps automatically. Shopify, Webflow, Wix, and Squarespace all generate sitemaps natively with no plugin required. For custom-built sites, you can use Screaming Frog to generate a sitemap from a crawl, or build a dynamic sitemap generator in your server-side language of choice. After creation, submit the sitemap through Google Search Console and reference it in your robots.txt file.
Do I need a sitemap if my site is small?
Yes, even small sites benefit from a sitemap. While Google can typically discover all pages on a small site through internal links, a sitemap provides confirmation that important pages exist and should be indexed. It also gives you a monitoring tool in Google Search Console to track indexation. For a five-page site, a sitemap takes minutes to create and provides ongoing visibility into how Google is treating your content.
How often should I update my sitemap?
Ideally, your sitemap updates automatically whenever content is published, updated, or removed. CMS plugins like Yoast SEO handle this natively. For sites without automatic updates, review and regenerate the sitemap whenever significant content changes occur, which in practice means at minimum once per month for active sites. Stale sitemaps that no longer reflect the current site structure gradually lose their usefulness and introduce conflicting signals.
What is the difference between an XML sitemap and an HTML sitemap?
An XML sitemap is a machine-readable file designed for search engines to parse. It tells Google and other crawlers which pages exist and when they were last updated. An HTML sitemap is a user-facing page that lists links to sections and pages of the site, helping visitors navigate. For SEO purposes, the XML sitemap is what matters because it is what search engines read. An HTML sitemap can improve user experience and contribute to internal linking, but it does not serve the same crawl facilitation function as an XML sitemap.
Can a bad sitemap hurt my SEO?
Yes. A sitemap that includes noindexed pages sends conflicting signals that Google must resolve. In practice Google generally honors the noindex directive and flags the URL as a sitemap issue in Search Console, so the entry achieves nothing except noise. A sitemap full of redirected, deleted, or low-quality URLs wastes crawl budget that could be directed toward your valuable content. Inaccurate lastmod dates cause Google to stop trusting the signal entirely. A bad sitemap does not trigger a manual penalty, but it can meaningfully reduce crawl efficiency, slow indexation, and dilute the authority signals you are trying to send to search engines.