The Moz Q&A Forum

    Googlebot found an extremely high number of URLs on your site

    Intermediate & Advanced SEO
    • BenFox

      I keep getting the "Googlebot found an extremely high number of URLs on your site" message in Google Webmaster Tools (GWMT) for one of the sites that I manage.

      The message reads as follows:

      Googlebot encountered problems while crawling your site.

      Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.

      I understand the nature of the message: the site uses faceted navigation and genuinely generates a lot of duplicate pages. However, to stop this from becoming an issue, we do the following:

      • No-index a large number of pages using the on-page meta robots tag.
      • Use a canonical tag where it is appropriate.

      But we still get the error, and many of the example pages that Google lists as affected by the issue are actually pages with the no-index tag.

      So my question is how do I address this problem?

      I'm thinking that, as it's a crawling issue, the solution might involve the no-follow meta tag.

      Any suggestions appreciated.

      • donford

        Hi Ben,

        You are attempting to fix your SEO issue by using NOINDEX and CANONICAL, but you are not fixing the main issue, which is that the URLs are still there.

        NOINDEX will not stop Google from recognizing the link, nor will NOFOLLOW. Google actually uses every link's information in one form or another, regardless of the tag attributes.

        Here is a direct quote from Matt Cutts about NOINDEX:

        "Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue)..."

        REF: http://www.mattcutts.com/blog/google-noindex-behavior/

        The first solution I would look at is working on the architecture of the site to see if there is a way to stop the crazy number of URLs being generated and/or consolidate them to a single point. The next step would be to see if there is any commonality between these extra URLs, and whether a 301 redirect could be used to consolidate them.

        I think what you were really after was a way to fix this with a tag or patch, but I think the best way to fix this is to replace the engine that is driving these URLs. In that case, you're going to have to be a bit more specific about what kind of site you're running (Joomla, WordPress, osCommerce, etc.) to get a more specific answer.
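
        As a rough illustration of the consolidation idea above, here is a minimal Python sketch that maps faceted URL variants onto a single target that a 301 redirect could point to. The parameter names in IGNORED_PARAMS and the example.com URLs are hypothetical; which parameters are safe to drop depends entirely on the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change presentation but not content.
# Replace with the parameters your own platform actually generates.
IGNORED_PARAMS = {"sort", "view", "sessionid"}


def consolidate_url(url):
    """Return one consolidated form for a faceted URL variant."""
    parts = urlsplit(url)
    # Keep only parameters that are not in the ignore list.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    # Sort so that ?a=1&b=2 and ?b=2&a=1 collapse to the same URL.
    kept.sort()
    # The fragment is dropped; it is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def redirect_target(url):
    """Return the URL to 301 to, or None if the URL is already consolidated."""
    target = consolidate_url(url)
    return None if target == url else target
```

        Calling redirect_target on each requested URL gives either a redirect destination or None, so variants that differ only in parameter order or in ignored parameters end up at one address.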

        Hope it helps.

        • BenFox @donford

          I was afraid that this might be the case.

          Thanks for the help.

          • Dr-Pete

            Although I generally find NOINDEX works better than Google claims, I think @donford is essentially right - you still need to solve some of the architecture issues, or Google will attempt to re-crawl.

            It's a complex problem, and sometimes a combination of NOINDEX, canonical, 301s, 404s, rel=prev/next, etc. all come into play. You don't usually need a "perfect" solution, but one tool rarely fits all situations these days.

            Google has suggested that you try parameter handling in GWT. NOINDEX won't prevent crawling (just indexation), but GWT parameters help save crawler bandwidth. I've had mixed results on large sites, honestly, but it may be worth a try.
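
            Before configuring parameter handling, it can help to see which parameters are responsible for most of the URL variants. The following is a minimal Python sketch, assuming you have a list of crawled URLs (for example from server logs or the examples in the GWT message); the function name parameter_spread and the example.com URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def parameter_spread(urls):
    """Count the distinct values seen for each query parameter.

    Returns (parameter, distinct_value_count) pairs, largest count first,
    with ties broken alphabetically by parameter name.
    """
    values = defaultdict(set)
    for url in urls:
        query = urlsplit(url).query
        for key, value in parse_qsl(query, keep_blank_values=True):
            values[key].add(value)
    return sorted(
        ((key, len(seen)) for key, seen in values.items()),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    sample = [
        "http://example.com/shoes?colour=red&sort=price",
        "http://example.com/shoes?colour=blue&sort=price",
        "http://example.com/shoes?colour=green&page=2",
    ]
    for name, count in parameter_spread(sample):
        print(name, count)
```

            Parameters with many distinct values are the likeliest sources of crawl bloat, so they are reasonable first candidates for parameter handling, canonicals, or redirects.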

            • Myntra

              I feel we are missing some information here.

              For example, on our site we have added a canonical tag to the pages that have query parameters. We have also configured these parameters with a representative URL under URL Parameters in Google Webmaster Tools. Even after this, we received the message "Googlebot found an extremely high number of URLs on your site".

              The surprising thing is that these parameters have existed on the site for a long time, and the total URL count is falling. Despite this, Google has been sending us this message since Feb 2014. It seems there has been an algorithmic change that requires some additional conditions, not covered in this thread, to be taken care of. I'm not sure what they are.
