The Moz Q&A Forum


    Total Indexed 1.5M vs 83k submitted by sitemap. What?

    Intermediate & Advanced SEO
    • seoninjaz

      We recently took a good look at the sitemap for one of our content sites and tried to cut out a lot of the crap that had gotten in there, such as .php, .xml, and .htm versions of each page. We also cut out images so we could put them in a separate image sitemap.

      The resulting sitemap lists 83,000+ URLs for Google to crawl (it was generated partly with the Yoast WordPress plugin).

      In Webmaster Tools, the Index Status section shows that this site has a total of 1.5 million pages indexed.

      With our sitemap coming back with 83k URLs and Google indexing 1.5 million pages, is this a sign of a CMS gone rogue? Is it an indication that we could be pumping out error pages, empty templates, or junk pages that we're cramming into Google's bot?

      I would love to hear what you guys think. Is this normal? Is this something to be concerned about? Should our total index more closely match our sitemap page count?
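For anyone wanting to reproduce the sitemap side of this comparison, here is a minimal sketch that counts the `<loc>` entries in a standard sitemap. The sample XML and the example.com URLs are placeholders, not the real site, and a sitemap index file would need one extra loop over its child sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Placeholder sitemap content (hypothetical URLs, for illustration only).
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/blog/first-post/</loc></url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return every <loc> value found under <url> in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


urls = sitemap_urls(SAMPLE_SITEMAP)
print(len(urls), "URLs submitted in this sitemap")
```

Comparing that count with the Index Status total gives the size of the gap: 1.5 million indexed against 83,000 submitted is roughly 18 indexed URLs for every sitemap URL.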

      • Chris.Menke

        Rob,

        Your sitemap is only a hint to Google about the URLs on your domain. It does not limit Google to crawling or indexing only the URLs listed in it, nor is it a directive telling Google to remove URLs it has already crawled from the index. As stated in GWT, use **robots.txt** to specify how search engines should crawl your site, or request **removal** of URLs from Google's search results with the URL removal tool in Google Webmaster Tools, under the "Google Index" link.
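As a rough illustration of the robots.txt side, the sketch below uses Python's standard `urllib.robotparser` to test which URLs a set of rules would block. The rules and paths are hypothetical, not taken from the site in question, and the standard-library parser only does simple prefix matching, so it may not match Google's handling of wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; /search/ and /tag/ are placeholder paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /tag/
"""


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if the given rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for url in (
    "https://www.example.com/tag/seo/",
    "https://www.example.com/blog/first-post/",
):
    print(url, "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```

Keep in mind that a Disallow rule only stops crawling. URLs that are already indexed may stay in the index, which is why the removal tool is mentioned separately.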

        • Carson-Ward

          If you have 1.5 million pages and you think your sitemap is comprehensive at 83,000 then yes, your CMS is needlessly generating pages. It's usually not a big deal from a ranking standpoint, but it can make other important issues hard to detect. I would clean it up, but that's a business call you'll have to make.

          The first step is diagnosing where the URLs are coming from. What you do next will depend on that, but I will give you the best advice I can without knowing what types of extraneous URLs you have and how Google is treating them:

          First, I'd start with WMT > Crawl > URL Parameters. Quite often your CMS will generate parameterized URLs, and Google usually knows how to handle them. If there are a lot of URL parameters, search Google for those URLs and see if the pages are exactly the same as other pages. If they are, make sure you have canonical tags in place to point them to the main version. There's more you can do with parameters, but it'll depend on what you find, so I won't go into more detail. As a general rule, though, a CMS should not generate a page unless it is uniquely useful as a differentiated landing page or a page for people to link to.

          Also check for parameters in your analytics program. They could actually be messing up your pageview data, depending on how you report. There's a post on fixing that in GA here:
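If you have a list of URLs exported from a crawl or from your logs, a minimal sketch like the one below can show which parameters account for most of the extra URLs and what each URL looks like with its query string stripped. The function names and the sample URLs are assumptions for illustration, and the stripped URL is only a candidate canonical; whether it really is the main version still needs checking by hand.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def parameter_counts(urls):
    """Count how many URLs carry each query parameter name."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # A set, so a parameter repeated within one URL is counted once.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


def strip_query(url):
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Placeholder URLs, not taken from the site being discussed.
sample = [
    "https://www.example.com/post/?replytocom=5",
    "https://www.example.com/post/?replytocom=9&utm_source=feed",
    "https://www.example.com/post/",
]
for name, count in parameter_counts(sample).most_common():
    print(name, count)
print(strip_query(sample[1]))
```

Parameters that appear on thousands of URLs but never change the page content are the first ones to look at in the URL Parameters report.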

          http://blog.crazyegg.com/2013/03/29/remove-url-parameters-from-google-analytics-reports/

          Next I'd look at the "Advanced" tab in WMT > Google Index > Index Status. Are there a lot of URLs removed? If so, check on these pages and see why they're removed and why they exist.

          I would also run a crawl with Xenu and Screaming Frog to make sure crawlers are finding a reasonable number of pages and that they're not getting stuck in crawl loops (crawling variations of a page endlessly). These kinds of issues can prevent new pages from being indexed on time because Google is wasting time (your crawl budget) running in circles.
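Once you have a crawl export, a crude heuristic like the sketch below can flag URLs that look like loop output: paths that repeat the same segment several times or that run unusually deep. This is my own rough filter, not something Xenu or Screaming Frog does for you, and the thresholds are arbitrary assumptions you would tune for your site.

```python
from collections import Counter
from urllib.parse import urlsplit


def looks_like_crawl_loop(url, max_repeats=2, max_depth=8):
    """Flag URLs whose path repeats a segment too often or is very deep.

    max_repeats and max_depth are arbitrary starting points, not standards.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    most_common = Counter(segments).most_common(1)
    return bool(most_common) and most_common[0][1] > max_repeats


# Placeholder URLs for illustration.
for url in (
    "https://www.example.com/blog/2014/some-post/",
    "https://www.example.com/shop/page/shop/page/shop/page/",
):
    print(url, looks_like_crawl_loop(url))
```

Anything it flags is only a suspect. Open a few of the URLs to confirm they really are endless variations of the same page.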

          • MickEdwards

            As well as the parameters already mentioned, you may have heaps of duplicated categories, tags, etc. What I would also do is start searching Google with something like site:www.example.com/directory/ or possibly site:www.example.com/category/directory/directory/ so that you narrow the results down tightly, then switch to 100 results per page and manually look for clues.
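To decide which directories are worth a site: search, it can help to bucket a URL list (from a crawl or server logs) by leading path segment and see where the bulk sits. A minimal sketch, assuming a hypothetical `count_by_directory` helper and placeholder URLs:

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_directory(urls, depth=1):
    """Count URLs per leading path prefix, `depth` segments deep."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        prefix = "/" + "/".join(segments[:depth])
        counts[prefix] += 1
    return counts


# Placeholder URLs, not from the site being discussed.
sample_urls = [
    "https://www.example.com/tag/seo/",
    "https://www.example.com/tag/sitemaps/",
    "https://www.example.com/category/news/page/2/",
    "https://www.example.com/",
]
for prefix, count in count_by_directory(sample_urls).most_common():
    print(prefix, count)
```

The prefixes with the largest counts are the ones to plug into site:www.example.com/prefix/ first. Raising `depth` narrows things further, in the same way as the longer site: queries above.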
