The Moz Q&A Forum


    Blocking poor quality content areas with robots.txt

    Intermediate & Advanced SEO
Eric_edvisors:

I found an interesting discussion on Search Engine Roundtable where Barry Schwartz and others were discussing using robots.txt to block low-quality content areas affected by Panda.

      http://www.seroundtable.com/google-farmer-advice-13090.html

      The article is a bit dated. I was wondering what current opinions are on this.

We have some dynamically generated content pages that we tried to improve after Panda. Resources have been limited and, alas, they are still there. Until we can officially remove them, I thought it might be a good idea to just block the entire directory. I would also remove them from my sitemaps and resubmit. There are links coming in, but I could redirect the important ones (I was going to do that anyway). Thoughts?

Mark_Ginsberg:

Blocking a page or folder in robots.txt doesn't remove the page from the search engine's index; it just prevents the engine from recrawling it. For pages, folders, or sites that were never crawled in the first place, robots.txt can keep them from being crawled and read. But for pages already in the index, a robots.txt block on its own will not be enough to remove them.
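To see what a Disallow rule does (and doesn't do), here is a minimal sketch using Python's standard-library robots.txt parser. The `/dynamic/` directory is a hypothetical example path, not one from this thread: the rule stops compliant crawlers from fetching those URLs, but it says nothing about removing already-indexed pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking a low-quality dynamic directory.
rules = [
    "User-agent: *",
    "Disallow: /dynamic/",
]

parser = RobotFileParser()
parser.parse(rules)

# Compliant crawlers will not fetch anything under /dynamic/ ...
print(parser.can_fetch("Googlebot", "https://example.com/dynamic/page?id=7"))  # False
# ... but the rest of the site remains crawlable.
print(parser.can_fetch("Googlebot", "https://example.com/about/"))  # True
```

Note that `can_fetch` only models crawl permission; whether a URL stays in the index is a separate question, which is exactly the distinction made above.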

        To remove this low quality content, you can do one of two things:

1. Add a meta robots noindex tag to the content you want to remove - this tells the engine to drop the page from the index; in effect, the page is dead to them
2. After blocking the folder via robots.txt, go into Webmaster Tools and use the URL removal tool on the folder or domain.

I usually recommend option 1, because it works across multiple engines, doesn't require a separate webmaster tools account for each engine, is easier to manage, and gives you much finer control over exactly which pages get removed.
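The tag described in option 1 is a one-line addition to each page's `<head>`:

```html
<!-- In the <head> of each low-quality page you want dropped from the index -->
<meta name="robots" content="noindex">
```

One caveat worth making explicit: for the noindex to take effect, the page must remain crawlable. If you block the same pages in robots.txt at the same time, the engine can never fetch them to see the tag, and they can linger in the index.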

But you are on the right track with the sitemaps - don't include links to the noindexed pages in your sitemap.

        Good luck,

        Mark

Eric_edvisors @Mark_Ginsberg:

          Hey Mark - Thank you, this is really helpful.

This is really great advice for deindexing the pages while they still exist.

One more question, though. Once we actually remove them - once the directory no longer exists - there's no point in keeping the robots.txt disallow, right? At that point, if they're still in the index, only the URL removal tool will be useful.

          I read these: https://support.google.com/webmasters/answer/59819?hl=en

While the webmaster guidelines say you need to use robots.txt, I don't see how that's a requirement for pages that no longer exist. Google shouldn't be able to crawl pages once they're gone. Also, if the directory is blocked in robots.txt but there are a few redirects within it, the redirects would not work. I also don't think adding a line to robots.txt every time we remove something is good practice. Thoughts?
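The redirect concern raised here can be made concrete with a hypothetical robots.txt (the `/dynamic/` path is illustrative, not from the thread):

```
User-agent: *
Disallow: /dynamic/
```

With this rule in place, a 301 at, say, /dynamic/old-page pointing to a replacement page would never be fetched by compliant crawlers, so the redirect would not be followed and its link signals would not be consolidated. Any URLs you intend to redirect need to stay outside the blocked directory, or the block needs to be lifted first.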

KaneJamison @Eric_edvisors:

If the page no longer exists and you remove the robots.txt rule for that directory, it shouldn't make much difference. Google could start reporting it as a 404, since it knows the files used to exist and there's no longer a rule telling it to ignore the directory. I don't see any harm in leaving the rule there, but I also don't see many issues arising from removing it.
