The Moz Q&A Forum

    Total Indexed 1.5M vs 83k submitted by sitemap. What?

    Intermediate & Advanced SEO
    • seoninjaz

      We recently took a good look at the sitemap of one of our content sites and tried to cut out a lot of crap that had gotten in there, such as .php, .xml, and .htm versions of each page. We also moved images out into a separate image sitemap.

      The sitemap lists 83,000+ URLs for Google to crawl (it was generated partly with the Yoast WordPress plugin).
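As a rough way to check what a sitemap actually contains, here is a minimal Python sketch that counts the `<loc>` entries and drops the .php, .xml, and .htm variants mentioned above. It assumes the standard sitemaps.org 0.9 namespace and a single `<urlset>` file (not a sitemap index); the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Standard sitemap namespace (assumes a plain <urlset>, not a sitemap index file).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Extensions treated as duplicate variants of the real page (per the question above).
UNWANTED_EXTENSIONS = (".php", ".xml", ".htm")


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value listed in a sitemap <urlset>."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def drop_extension_variants(urls: list[str]) -> list[str]:
    """Keep only URLs whose path does not end in one of the unwanted extensions."""
    return [u for u in urls if not urlsplit(u).path.lower().endswith(UNWANTED_EXTENSIONS)]


# Hypothetical sample sitemap with one real page and two extension variants.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/widgets/</loc></url>
  <url><loc>https://www.example.com/widgets.php</loc></url>
  <url><loc>https://www.example.com/widgets.htm</loc></url>
</urlset>"""

if __name__ == "__main__":
    urls = sitemap_urls(SAMPLE)
    print(len(urls))                      # 3
    print(drop_extension_variants(urls))  # ['https://www.example.com/widgets/']
```

Running the same count against the live sitemap gives the number to compare with the Index Status total.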

      In Webmaster Tools, the Index Status section shows that this site has 1.5 million pages indexed in total.

      With our sitemap listing 83k URLs and Google indexing 1.5 million pages, is this a sign of a CMS gone rogue? Could we be pumping out error pages, empty templates, or other junk pages and cramming them into Googlebot?

      I would love to hear what you guys think. Is this normal? Is this something to be concerned about? Should our total index more closely match our sitemap page count?

      • Chris.Menke

        Rob,

        Your sitemap is only a hint to Google about the URLs on your domain. It does not limit Google to crawling or indexing only the URLs listed in it, nor is it a directive telling Google to remove URLs it has already crawled from the index. As stated in GWT, use **robots.txt** to specify how search engines should crawl your site, or request **removal** of URLs from Google's search results with the URL removal tool in Google Webmaster Tools, under the "Google Index" link.
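To check what a given robots.txt actually blocks, Python's standard `urllib.robotparser` module can evaluate the rules against URLs. The rules below (blocking hypothetical /tag/ and /search/ sections) are only an illustration. Keep in mind that `robotparser` does simple prefix matching and does not implement Google's `*` and `$` wildcard extensions, and that robots.txt controls crawling: a URL blocked from crawling can still remain in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of tag archives and internal search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /search/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("https://www.example.com/tag/seo/",
                "https://www.example.com/search/widgets/",
                "https://www.example.com/my-post/"):
        # can_fetch(user_agent, url) -> True if the rules allow crawling the URL
        print(url, rp.can_fetch("Googlebot", url))
```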

        • Carson-Ward

          If you have 1.5 million pages indexed and you think your sitemap is comprehensive at 83,000, then yes, your CMS is needlessly generating pages. It's usually not a big deal from a ranking standpoint, but it can make other important issues hard to detect. I would clean it up, but that's a business call you'll have to make.

          The first step is diagnosing where the URLs are coming from. What you do next depends on what you find, but here is the best advice I can give without knowing what types of extraneous URLs you have and how Google is treating them:

          First, I'd start with WMT > Crawl > URL Parameters. Quite often your CMS will generate parameterized URLs, and Google usually knows how to handle them. If there are a lot of URL parameters, look up those URLs and check whether they're exactly the same as other pages. If they are, make sure you have canonical tags in place pointing them to the main version. There's more you can do with parameters, but it depends on what you find, so I won't go into more detail. As a general rule, though, a CMS should not generate a page unless it is uniquely useful as a differentiated landing page or as a page for people to link to.

          Also check for parameters in your analytics program. Depending on how you report, they could be skewing your pageview data. There's a post on fixing that in GA here:
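A minimal sketch of the "are these parameter URLs the same page?" check: strip parameters assumed not to change the content, group the variants under the remaining URL, and build a canonical tag pointing at it. The `IGNORED_PARAMS` set is hypothetical; which parameters are safe to ignore depends on your CMS, so compare the actual pages before adding canonicals.

```python
import html
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: parameters assumed NOT to change page content on this site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "replytocom", "sessionid"}


def canonical_url(url: str) -> str:
    """Drop ignored parameters (and any fragment), sort the rest for a stable URL."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                  if k not in IGNORED_PARAMS)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def duplicate_groups(urls: list[str]) -> dict[str, list[str]]:
    """Map each canonical URL to its variants, keeping only groups with duplicates."""
    groups = defaultdict(set)
    for url in urls:
        groups[canonical_url(url)].add(url)
    return {canon: sorted(v) for canon, v in groups.items() if len(v) > 1}


def canonical_tag(url: str) -> str:
    """HTML <link> element for the page's <head>; the URL is attribute-escaped."""
    return f'<link rel="canonical" href="{html.escape(url, quote=True)}">'


if __name__ == "__main__":
    crawled = ["https://www.example.com/post/",
               "https://www.example.com/post/?replytocom=12",
               "https://www.example.com/post/?utm_source=news",
               "https://www.example.com/other/"]
    for canon, variants in duplicate_groups(crawled).items():
        print(canon, variants)
        print(canonical_tag(canon))
```

Keeping content-changing parameters such as a page number is the reason this uses an ignore list rather than stripping every query string.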

          http://blog.crazyegg.com/2013/03/29/remove-url-parameters-from-google-analytics-reports/
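When query strings split one page into many report rows, the consolidation comes down to summing pageviews by path. A minimal sketch over hypothetical (page, pageviews) rows exported from an analytics tool; it drops the entire query string, which is only correct if no parameter on your site changes the page content.

```python
from collections import Counter
from urllib.parse import urlsplit


def pageviews_by_path(rows):
    """Sum pageviews per URL path, ignoring query strings and fragments."""
    totals = Counter()
    for page, views in rows:
        totals[urlsplit(page).path] += views
    return totals


if __name__ == "__main__":
    # Hypothetical export rows: (page as reported, pageviews).
    rows = [("/post/", 100), ("/post/?utm_source=news", 40),
            ("/post/?replytocom=12", 5), ("/about/", 20)]
    print(pageviews_by_path(rows))  # /post/ totals 145, /about/ totals 20
```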

          Next I'd look at the "Advanced" tab in WMT > Google Index > Index Status. Are a lot of URLs listed as removed? If so, check those pages to see why they were removed and why they exist.

          I would also run a crawl with Xenu and Screaming Frog to make sure the crawlers are finding a reasonable number of pages and aren't getting stuck in crawl loops (endlessly crawling variations of a page). Issues like these can keep new pages from being indexed promptly, because Google wastes time (your crawl budget) running in circles.
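The crawl-loop problem can be illustrated with a small breadth-first crawl over a simulated site. This is a toy sketch, not how Xenu or Screaming Frog work internally: `get_links` stands in for fetching a page and extracting its links, the buggy relative link that appends "blog/" to every blog URL is a hypothetical loop, and "the same path segment repeated more than twice" is just one heuristic for spotting endless variations. The `max_pages` cap keeps the crawl bounded either way.

```python
from collections import deque
from typing import Callable
from urllib.parse import urlsplit


def looks_like_loop(url: str, max_repeats: int = 2) -> bool:
    """Heuristic: flag URLs whose path repeats any segment more than max_repeats times."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return any(segments.count(s) > max_repeats for s in set(segments))


def crawl(start: str, get_links: Callable[[str], list[str]], max_pages: int = 1000):
    """Breadth-first crawl that records suspected loop URLs and caps total pages."""
    seen = {start}
    queue = deque([start])
    suspected_loops = []
    while queue:
        url = queue.popleft()
        if looks_like_loop(url):
            suspected_loops.append(url)
            continue  # record it, but don't follow its links
        for link in get_links(url):
            if link not in seen and len(seen) < max_pages:
                seen.add(link)
                queue.append(link)
    return seen, suspected_loops


def hypothetical_site(url: str) -> list[str]:
    """Simulated link graph: every /blog/ page has a buggy relative link to 'blog/'."""
    if url == "https://www.example.com/":
        return ["https://www.example.com/blog/", "https://www.example.com/about/"]
    if "/blog/" in url:
        return [url + "blog/"]
    return []


if __name__ == "__main__":
    pages, loops = crawl("https://www.example.com/", hypothetical_site)
    print(len(pages))  # 5
    print(loops)       # ['https://www.example.com/blog/blog/blog/']
```

Without the loop check, this crawl would keep generating /blog/blog/blog/... URLs until it hit `max_pages`, which is the "running in circles" described above.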

          • MickEdwards

            As well as the parameters mentioned, you may have heaps of duplicate category pages, tag pages, etc. I would also start searching Google with queries like site:www.example.com/directory/ or site:www.example.com/category/directory/directory/ to narrow the results down tightly, then switch to 100 results per page and manually look for clues.
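The same narrowing-down can be done offline on a list of URLs (for example, a crawler export) by counting URLs per directory to see which section is bloating the index. A minimal sketch; the URLs are hypothetical, and treating the last path segment as the page rather than a directory is an assumption that fits WordPress-style permalinks.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_directory(urls, depth: int = 1) -> Counter:
    """Count URLs per directory prefix, using the first `depth` directory segments."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        directories = segments[:-1]  # assume the last segment is the page itself
        key = ("/" + "/".join(directories[:depth]) + "/") if directories else "/"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    urls = ["https://www.example.com/my-post/",
            "https://www.example.com/tag/seo/",
            "https://www.example.com/tag/wordpress/",
            "https://www.example.com/tag/seo/page/2/",
            "https://www.example.com/category/news/"]
    print(count_by_directory(urls).most_common())
    print(count_by_directory(urls, depth=2).most_common())
```

Raising `depth` plays the role of the longer site:www.example.com/category/directory/directory/ query: it splits a bloated section into its subsections.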

    Related questions:
    • Should I submit an additional sitemap to speed up indexing (Colemckeon)
    • URL indexed but not submitted in sitemap, however the URL is in the sitemap (NickSamuel)
    • Sitemap Indexed vs. Submitted (LoganRay)
    • Sitemap Indexation (GPainter)
    • Xml sitemap Issue... Xml sitemap generator facilitating only few pages for indexing (Paddy_Moogan)
    • Is 1:1 301 redirect required on indexed URL when restructing URL even if the new URL is canonicalized? (EricaMcGillivray)
    • Webmaster Tools: Total Indexed VS Ever Crawled (SEOAndy)
    • Sitemaps / Google Indexing / Submitted (Copstead)