The Moz Q&A Forum

    Robots.txt: Link Juice vs. Crawl Budget vs. Content 'Depth'

    Intermediate & Advanced SEO
    • kurus

      I run a quality vertical search engine. About 6 months ago we had a problem with our sitemaps, which resulted in most of our pages getting tossed out of Google's index. As part of the response, we put a bunch of robots.txt restrictions in place on our search results to prevent Google from crawling through pagination links and other parameter-based variants of our results (sort order, etc.). The idea was to 'preserve crawl budget' in order to speed the rate at which Google could get our millions of pages back into the index by focusing attention/resources on the right pages.
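
      For illustration, the kind of restriction described above looks something like this in robots.txt (the /search/ path and the page/sort parameter names are hypothetical stand-ins, not our actual setup):

          User-agent: *
          # Hypothetical: block paginated and sorted variants of search result pages
          Disallow: /search/*page=
          Disallow: /search/*sort=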

      The pages are back in the index now (and have been for a while), and the restrictions have stayed in place since that time. But, in doing a little SEOMoz reading this morning, I came to wonder whether that approach may now be harming us...

      http://www.seomoz.org/blog/restricting-robot-access-for-improved-seo
      http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions

      Specifically, I'm concerned that a) we're blocking the flow of link juice, and b) by preventing Google from crawling the full depth of our search results (i.e. pages >1), we may be wrongly making our site look 'thin'. With respect to b), we've been hit by Panda and have been implementing plenty of changes to improve engagement, eliminate inadvertently low-quality pages, etc., but we have yet to find 'the fix'...

      Thoughts?

      Kurus

      • baptisteplace

        I would not dig too deeply into the crawl budget + pagination problem - Google recognizes pagination and will increase the crawl budget when necessary. On the 'thin' view of your site, I think you're right, and I would immediately allow pages > 1 to be crawled and indexed.

        Be aware this may or may not have a big impact on your site; it depends on your navigation system (you may have a lot of paginated subsets).

        What do site: queries show? Do you have all your items submitted in your sitemaps and indexed (see WMT)?
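
        For example, a quick indexation check (example.com and the page parameter are placeholders for your own domain and pagination scheme):

            site:example.com              <- roughly how many of your pages are indexed
            site:example.com inurl:page   <- are deeper paginated results indexed at all?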

        On a side note, if you were impacted by Panda, I would strongly suggest removing / disallowing the empty pages on your site. This will free up more crawl budget for interesting content.
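
        For instance, if empty result pages follow a recognizable URL pattern (the results=0 parameter below is purely hypothetical), they could be disallowed:

            # Hypothetical pattern for zero-result pages
            Disallow: /search/*results=0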

        • kurus @baptisteplace

          Baptiste,

          Thanks for the feedback. Can you clarify what you mean by the following?

          "On a side note, if you were impacted by Panda, I would strongly suggest to remove / disallow the empty pages on your site. This will give you more crawl budget for interesting content."

          • baptisteplace @kurus

            Got disconnected by seomoz as I posted, so here is the short answer:

            You were affected by Panda, so you may have pages with almost no content. These pages may be the ones using up crawl budget, much more than the paginated results. Worry about these low-value pages and let Google handle the paginated results 😉

            • rishil

              I always advise people NOT to use robots.txt to block off pages - it isn't the best way to handle things. In your case, there are two options you can consider:

              1. For variant pages (multiple parameter versions of the same page), use rel=canonical to consolidate strength in the original page and keep the variants out of the index (see the first snippet after this list).

              2. A controversial one, and many may disagree, but it depends on the situation: allow crawling of the page but don't allow indexing - 'noindex, follow' - which would still pass any juice but won't put pages you don't want in the SERPs into the index. I normally do this for search result pages that get indexed... (see the second snippet below).
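
              As minimal sketches of both options (the example.com URL is a placeholder):

                  <!-- Option 1: on each parameter variant, point at the original page -->
                  <link rel="canonical" href="https://www.example.com/search/widgets" />

                  <!-- Option 2: let crawlers follow links but keep the page out of the index -->
                  <meta name="robots" content="noindex, follow" />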
