The Moz Q&A Forum

    Should all pages on a site be included in either your sitemap or robots.txt?

    Intermediate & Advanced SEO
    • RossFruin

      I don't have a specific scenario in mind; I'm just curious, because I fairly often come across sites that have, for example, 20,000 pages but only 1,000 in their sitemap. If they consider only 1,000 of their URLs worth including in the sitemap and indexing, should the others be excluded using robots.txt or a page-level exclusion? Is there any point in having pages that are included in neither, leaving it up to Google to decide?
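To make the three buckets in the question concrete, here is a minimal Python sketch that sorts a list of known URLs into "in the sitemap", "blocked by robots.txt", and "in neither". It is only an illustration under assumptions: the `classify_urls` helper, the sample URLs, and the sample robots.txt rules are hypothetical, and it uses the standard library's `urllib.robotparser`, which does not necessarily match how Googlebot interprets every rule. Keep in mind that a robots.txt Disallow controls crawling, which is not the same thing as a page-level noindex.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by the sitemaps.org XML format.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def classify_urls(all_urls, sitemap_xml, robots_txt, user_agent="*"):
    """Sort URLs into three buckets: sitemap, blocked, neither.

    Hypothetical helper for illustration only.
    """
    # Collect every <loc> entry listed in the sitemap.
    root = ET.fromstring(sitemap_xml)
    sitemap_urls = {(loc.text or "").strip() for loc in root.iter(SITEMAP_NS + "loc")}

    # Parse robots.txt from text instead of fetching it over the network.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    buckets = {"sitemap": [], "blocked": [], "neither": []}
    for url in all_urls:
        if url in sitemap_urls:
            buckets["sitemap"].append(url)
        elif not robots.can_fetch(user_agent, url):
            buckets["blocked"].append(url)
        else:
            # Left for the search engine to discover and judge on its own.
            buckets["neither"].append(url)
    return buckets


# Made-up sample data (assumption, not from any real site).
sample_robots = "User-agent: *\nDisallow: /cart/\n"
sample_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/articles/crawl-budget</loc></url>"
    "</urlset>"
)
sample_urls = [
    "https://example.com/",
    "https://example.com/articles/crawl-budget",
    "https://example.com/cart/checkout",
    "https://example.com/old-landing-page",
]

for bucket, urls in classify_urls(sample_urls, sample_sitemap, sample_robots).items():
    print(bucket, len(urls))
```

The "neither" bucket is the one the question is about: those URLs are crawlable and not explicitly recommended, so it is worth reviewing each one and deciding whether it deserves a place in the sitemap or a page-level exclusion.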

      • CleverPhD

        You want to have as many pages in the index as possible, as long as they are high-quality pages with original content. If you publish quality original articles on a regular basis, you want all of those pages indexed. Yes, from a practical perspective you may only be able to focus on tweaking the SEO on a portion of them, but if you have good SEO processes in place as you produce those pages, they will rank long term for a broad range of terms and bring traffic.

        If you have 20,000 pages because you have an online catalog with 345 different ways to sort the same set of results, or if you have keyword search URLs, printer-friendly versions, or shopping cart pages, you do not want those indexed. These pages are typically low-quality or thin-content pages and/or duplicates, and they do you no favors. You would want to use the noindex meta tag or a canonical tag where appropriate. The reality is that, out of the 20,000 pages, probably only a subset are the "originals", so you don't want to waste Google's time crawling the rest.

        A good concept to look up here is Crawl Budget or Crawl Optimization:

        http://searchengineland.com/how-i-think-crawl-budget-works-sort-of-59768

        http://www.blindfiveyearold.com/crawl-optimization
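As a rough illustration of the "noindex meta tag or canonical where appropriate" advice above, the Python sketch below picks a head tag for a given URL: sorted variants of a catalog page point back to the unsorted original with a canonical, while search, print, and cart pages get a noindex. Everything specific here is an assumption for the example: the parameter names in `SORT_PARAMS`, the path prefixes in `NOINDEX_PREFIXES`, and the helper names `canonical_url` and `head_tag` are hypothetical and would need to match your own site's URL scheme.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that only re-sort or re-display the same results.
SORT_PARAMS = {"sort", "order", "view"}
# Hypothetical path prefixes for thin pages that should stay out of the index.
NOINDEX_PREFIXES = ("/search", "/print", "/cart")


def canonical_url(url):
    """Return the URL with sort-only parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SORT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def head_tag(url):
    """Suggest a <head> tag for the page, or None if it is an 'original'."""
    if urlsplit(url).path.startswith(NOINDEX_PREFIXES):
        return '<meta name="robots" content="noindex">'
    canonical = canonical_url(url)
    if canonical != url:
        return f'<link rel="canonical" href="{canonical}">'
    return None


for example in (
    "https://example.com/shoes?sort=price&color=red",
    "https://example.com/search?q=red+shoes",
    "https://example.com/shoes",
):
    print(example, "->", head_tag(example))
```

The split between the two tags is a design choice: a canonical suits pages that are genuinely the same content in a different order, while noindex suits pages with no "original" worth ranking at all.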

        • Ron_McCabe

          I think you are looking at the number of pages indexed, which is generally higher than the number of pages on your website. There is a point to marking up any pages that you do not want indexed with noindex (a nofollow on its own does not keep a page out of the index), as well as properly marking up the web pages that you specifically do want indexed. It is really important that you eliminate duplicate pages. A common source of these duplicates is improper tags on the blog. Make sure that your tags are set up in a logical hierarchy, like your sitemap. This will assist the search engines when they re-index your pages.

          Hope this helps,

          Ron

          • RossFruin
            @Ron_McCabe

            Thank you. Just curious, how would the number of pages indexed be higher than the number of actual pages?

            • CleverPhD
              @RossFruin

              I think Ron's point was that if you have a bunch of duplicates, the dups are not "real" pages, if you are only counting "real" pages. Therefore, if Google indexes both your "real" pages and the duplicate versions of them, you can have more pages indexed than actually exist. The issue then is that you have duplicate versions of the same page in Google's index, so which one will rank for a given key term? You could be competing against yourself. That is why it is so important that you deal with crawl issues.
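The arithmetic behind "more pages indexed than real pages" can be sketched in a few lines of Python. This is a simplified illustration, not a description of how Google counts anything: the `group_duplicates` helper and the sample URLs are hypothetical, and it assumes a duplicate is any URL that differs from the "real" page only by its query string or a trailing slash.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def group_duplicates(indexed_urls):
    """Group indexed URLs under the 'real' page they duplicate.

    Assumption: query strings and trailing slashes never change the content.
    """
    groups = defaultdict(list)
    for url in indexed_urls:
        parts = urlsplit(url)
        path = parts.path.rstrip("/") or "/"
        real_page = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
        groups[real_page].append(url)
    return dict(groups)


# Made-up list of URLs a search engine might have indexed.
indexed = [
    "https://example.com/widgets",
    "https://example.com/widgets?sort=price",
    "https://example.com/widgets/?print=1",
    "https://example.com/about",
]

groups = group_duplicates(indexed)
print(len(indexed), "indexed URLs,", len(groups), "real pages")

# Any group with more than one member is a page competing against itself.
for real_page, versions in groups.items():
    if len(versions) > 1:
        print(real_page, "has", len(versions), "indexed versions")
```

With this sample data there are four indexed URLs but only two real pages, and the three versions of the widgets page are the ones that could end up competing with each other for the same key term.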

              • Ron_McCabe

                CleverPhD,

                You are correct.  I have found that these little housekeeping issues like eliminating duplicate content really do make a big difference.

                Ron

                • CleverPhD
                  @Ron_McCabe

                  You bet - Cheers!

                  • RossFruin

                    Thanks guys!

                    © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.