The Moz Q&A Forum

    How do I use the Robots.txt "disallow" command properly for folders I don't want indexed?

    Technical SEO Issues
    • SpringMountain

      Today's sitemap webinar made me think about the disallow feature. It seems like the opposite of sitemaps, but it also seems both are ignored in varying ways by the engines.

      I don't need help with the syntax, I've got that part. I just can't seem to find a contemporary answer about what should be blocked using the robots.txt file.

      For example, I have folders containing site comps for clients that I really don't want showing up in the SERPs. Is it better to not have these folders on the domain at all?

      There are also security issues I've heard of that make sense: anyone can simply look at a site's robots file to see what it is hiding. It is easier to hunt for files when you know the directory they are in. Should I concern myself with this?

      Another example is a folder I have for my XML sitemap generator. I imagine Google isn't going to try to index this or count it as content, so do I need to add folders like this to the disallow list?

      • saibose

        You can use the following syntax:

        User-agent: *
        Disallow: /foldername/subfoldername

        You can also list your sitemaps in the robots.txt file. They are defined as:

        Sitemap: http://www.yourdomain.com/sitemap.xml

        If you have multiple sitemaps, add one Sitemap line for each.
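
To sanity-check rules like these before uploading them, a minimal sketch using Python's standard `urllib.robotparser` might look like the following. The second sitemap URL and the page paths are made up for illustration, and the standard-library parser is only an approximation of how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, mirroring the syntax in the reply above.
# The second Sitemap line is an assumed example of listing multiple sitemaps.
ROBOTS_TXT = """\
User-agent: *
Disallow: /foldername/subfoldername

Sitemap: http://www.yourdomain.com/sitemap.xml
Sitemap: http://www.yourdomain.com/sitemap-images.xml
"""


def load_rules(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A page inside the disallowed folder (hypothetical file name).
blocked = rules.can_fetch(
    "*", "http://www.yourdomain.com/foldername/subfoldername/comp.html"
)
# Disallow is a prefix match, so without a trailing slash a sibling folder
# whose name starts the same way is caught as well.
sibling = rules.can_fetch(
    "*", "http://www.yourdomain.com/foldername/subfoldername-old/page.html"
)
# A page outside the disallowed folder.
allowed = rules.can_fetch("*", "http://www.yourdomain.com/index.html")
# site_maps() needs Python 3.8 or newer.
sitemaps = rules.site_maps()

print(blocked, sibling, allowed, sitemaps)
```

If the sibling match is not what you want, ending the path with a slash (`Disallow: /foldername/subfoldername/`) limits the rule to that folder.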

        • KrisRoadruck

          You may also want to think about slapping a robots noindex on the individual pages as well.
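
As a rough way to confirm that a page actually carries the tag, here is a small sketch using Python's standard `html.parser`. The page markup is invented for illustration. One thing to keep in mind: a crawler has to be able to fetch a page to see its noindex, so a page that is also disallowed in robots.txt may never have the tag read.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when written without a value.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the markup contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical markup for a client comp page and a public page.
COMP_PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>Client comp</body></html>"
)
PUBLIC_PAGE = "<html><head><title>Home</title></head><body>Hello</body></html>"

print(has_noindex(COMP_PAGE), has_noindex(PUBLIC_PAGE))
```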

          • KeriMorgret

            Hi Jay,

            There's actually a recent similar discussion at http://www.seomoz.org/q/what-reasons-exist-to-use-noindex-robots-txt regarding deciding what to block via robots.

            For site comps for clients, you could also password-protect those to help hide them, or use a separate domain that you have entirely excluded in robots.txt. I've also seen services like Basecamp used for posting comps. It all depends on how much you want to hide the comps.

            You do want your sitemap itself to be crawled, but I'm presuming this is in the root directory so that shouldn't be a problem. Folders like your sitemap generator and other purely-framework folders can certainly be disallowed. Blocking the files that list the version of your website (if you're using a CMS) can help prevent people from searching for opportunities to hack that version and finding your site.

            Also, just do a site:domain.com search on your domain, see what's indexed, see what content from there you don't want indexed, and use that as a starting point.
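
Once you have that starting list, a sketch like the one below could write the rules and confirm the sitemap in the root is still crawlable. The helper `build_robots_txt`, the folder names and the domain are all hypothetical, and the check uses Python's standard `urllib.robotparser`, which may not match every engine's behaviour.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_folders, sitemap_url):
    """Return robots.txt text that blocks each folder for all crawlers."""
    lines = ["User-agent: *"]
    for folder in disallowed_folders:
        # Normalise to a leading and trailing slash so only that folder matches.
        lines.append("Disallow: /" + folder.strip("/") + "/")
    lines.append("")
    lines.append("Sitemap: " + sitemap_url)
    return "\n".join(lines) + "\n"


# Hypothetical folder names and domain, for illustration only.
robots_txt = build_robots_txt(
    ["sitemap-generator", "includes"],
    "http://www.yourdomain.com/sitemap.xml",
)

checker = RobotFileParser()
checker.parse(robots_txt.splitlines())

# The sitemap lives in the root, so it should stay crawlable.
sitemap_crawlable = checker.can_fetch(
    "*", "http://www.yourdomain.com/sitemap.xml"
)
# The generator folder itself should be blocked.
generator_crawlable = checker.can_fetch(
    "*", "http://www.yourdomain.com/sitemap-generator/index.php"
)

print(robots_txt)
print(sitemap_crawlable, generator_crawlable)
```

Remember that robots.txt is publicly readable, so every folder listed here is visible to anyone who opens the file. For anything sensitive, password protection is the safer route.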

            Are you running on a content management system, or a custom site? For a CMS, here are example robots.txt files for several popular CMSs: http://www.stayonsearch.com/robots-txt-guide

            • portalseo

              Hi,

              Using:

              User-agent: *
              Disallow: /folder/subfolder

              is fine. However, if you have information on your website that you definitely want crawled, make sure it is in your sitemap and use:

              User-agent: *
              Allow: /folder/subfolder
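
Allow is most useful when it carves an exception out of a broader Disallow. The sketch below assumes a hypothetical setup where all of `/folder/` is blocked except `/folder/subfolder`, and checks it with Python's standard `urllib.robotparser`. That parser applies the first rule that matches, while Google documents using the most specific (longest) matching rule, so the Allow line is placed first here to get the same answer either way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /folder/ but keep /folder/subfolder crawlable.
# Allow comes first because the standard-library parser stops at the
# first rule whose path prefix matches the URL.
ROBOTS_TXT = """\
User-agent: *
Allow: /folder/subfolder
Disallow: /folder/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Inside the allowed subfolder (hypothetical page name).
subfolder_ok = parser.can_fetch(
    "*", "http://www.yourdomain.com/folder/subfolder/page.html"
)
# Elsewhere inside the blocked folder.
folder_ok = parser.can_fetch(
    "*", "http://www.yourdomain.com/folder/private.html"
)
# Outside the folder entirely, no rule applies.
root_ok = parser.can_fetch("*", "http://www.yourdomain.com/about.html")

print(subfolder_ok, folder_ok, root_ok)
```

Note that an Allow line on its own does not make a crawler visit a page. Anything not disallowed is crawlable by default.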

              Adding a nofollow attribute to all of your pages won't be practical: if a spam crawler ignores robots.txt, it will ignore your nofollow attribute too. If anything new comes up with robots.txt, check the robots.txt files of large websites, as they tend to keep up with new trends, e.g.

              www.google.com/robots.txt

              Hope this helps:)
