The Moz Q&A Forum

    Robots.txt: how to exclude sub-directories correctly?

    Intermediate & Advanced SEO
    • fablau

      Hello here,

      I am trying to figure out the correct way to tell search engines to crawl this:

      http://www.mysite.com/directory/

      But not this:

      http://www.mysite.com/directory/sub-directory/

      or this:

      http://www.mysite.com/directory/sub-directory2/sub-directory/...

      But since I have thousands of sub-directories in almost infinite combinations, I can't list definitions like the following in any manageable way:

      disallow: /directory/sub-directory/

      disallow: /directory/sub-directory2/

      disallow: /directory/sub-directory/sub-directory/

      disallow: /directory/sub-directory2/subdirectory/

      etc...

      I would end up having thousands of definitions to disallow all the possible sub-directory combinations.

      So, is the following a correct, better, and shorter way to define what I want above?

      allow: /directory/$

      disallow: /directory/*

      Would the above work?

      Any thoughts are very welcome! Thank you in advance.

      Best,

      Fab.

      • MickEdwards

        As long as you don't have directories somewhere under /directory/* that you want indexed, I think that will work. There is no allow, so you don't need the first line, just:

        disallow: /directory/*

        You can test it out here: https://support.google.com/webmasters/answer/156449?rd=1

        • fablau, replying to @MickEdwards

          Thank you Michael,

          Google and other SEs actually recognize the "allow:" command:

          https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

          The fact is: if I don't specify that, how can I be sure that the following single command:

          disallow: /directory/*

          doesn't prevent search engines from spidering the /directory/ index, which I do want crawled?

          • MickEdwards, replying to @fablau

            I've always stuck to Disallow and followed this advice:

            "This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:"

            http://www.robotstxt.org/robotstxt.html

            This line from https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt seems contradictory:

            | /* | equivalent to / | equivalent to / | Equivalent to "/" -- the trailing wildcard is ignored. |

            I think this post will be very useful for you: http://moz.com/community/q/allow-or-disallow-first-in-robots-txt
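
The table row quoted above says a trailing wildcard is ignored. A minimal Python sketch of that idea follows, assuming Google-style pattern semantics ('*' matches any run of characters, a trailing '$' anchors the end of the path, anything else is a prefix match). The helper names `pattern_to_regex` and `matches` are hypothetical, not part of any library or of the thread.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (assumed semantics).

    '*' matches any run of characters, a trailing '$' anchors the end of
    the URL path, and everything else is treated as a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """Return True if the URL path is covered by the pattern."""
    return pattern_to_regex(pattern).match(path) is not None


paths = [
    "/directory/",
    "/directory/sub-directory/",
    "/directory/a/b/page.html",
    "/other/",
]
for path in paths:
    # With and without the trailing '*', the answer is the same for every path.
    assert matches("/directory/*", path) == matches("/directory/", path)
```

Under these assumptions, "disallow: /directory/*" and "disallow: /directory/" cover exactly the same URLs, including the /directory/ index itself, which is why the question of an explicit allow line matters.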

            • fablau

              Thank you Michael, it is my understanding then that my idea of doing this:

              allow: /directory/$

              disallow: /directory/*

              Should work just fine. I will test it within Google Webmaster Tools, and let you know if any problems arise.

              In the meantime, if anyone else has more ideas about all this and can confirm it, that would be great!

              Thank you again.

              • fablau

                Yes, everything looks good, Webmaster Tools gave me the expected results with the following directives:

                allow: /directory/$

                disallow: /directory/*

                Which allows this URL:

                http://www.mysite.com/directory/

                But doesn't allow the following one:

                http://www.mysite.com/directory/sub-directory2/...

                This page also gives an update similar to mine:

                https://support.google.com/webmasters/answer/156449?hl=en

                I think I am good! Thanks 🙂
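
The result reported above can be reproduced offline with a rough Python sketch. It assumes the precedence described in Google's robots.txt documentation: the matching rule with the longest pattern wins, and a tie goes to the less restrictive allow rule. The names `is_allowed` and `RULES` are hypothetical, and the sketch ignores user-agent groups and other details a real parser handles.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path pattern)


def _matches(pattern: str, path: str) -> bool:
    """Assumed semantics: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match("^" + body + ("$" if anchored else ""), path) is not None


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Apply the longest matching rule; on equal length, allow wins."""
    best_len = -1
    allowed = True  # a path that no rule matches is crawlable
    for directive, pattern in rules:
        if not pattern or not _matches(pattern, path):
            continue  # an empty pattern restricts nothing
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len = len(pattern)
            allowed = is_allow
    return allowed


# The two directives from the thread.
RULES = [("allow", "/directory/$"), ("disallow", "/directory/*")]

assert is_allowed(RULES, "/directory/")                     # index stays crawlable
assert not is_allowed(RULES, "/directory/sub-directory2/")  # sub-directories are blocked
assert is_allowed(RULES, "/other/")                         # unrelated paths are untouched
```

Both patterns are 12 characters long, so for /directory/ the outcome rests on the tie-break in favor of allow. Crawlers that do not implement wildcards or this tie-break may behave differently, so testing in each search engine's own tool remains the safer check.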

                • sjunaidali, replying to @MickEdwards

                  I am using WordPress with the Enfold theme (ThemeForest).

                  I want some files to be accessible to Google, but they should not be indexed.

                  Here is an example: http://prntscr.com/h8918o

                  I have currently blocked some JS directories/files using robots.txt (check screenshot)

                  But due to this I am not able to pass Mobile Friendly Test on Google: http://prntscr.com/h8925z (check screenshot)

                  Is it possible to allow access but use a tag like noindex in the robots.txt file? Or is there any other way out?

                  • MickEdwards, replying to @sjunaidali

                    Install the Yoast WordPress SEO plugin and use it to restrict what is indexed and what is included in the sitemap.

                    • sjunaidali, replying to @MickEdwards

                      But Google is still free to index a link/page even if it is not included in the XML sitemap.

                      • MickEdwards, replying to @sjunaidali

                        I mentioned both. You add a meta robots noindex tag to the page and remove the page from the sitemap.
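
One way to confirm that a page actually carries the meta robots noindex tag is to parse its HTML. This is a small Python sketch using only the standard library; the class `RobotsMetaParser` and the function `has_noindex` are hypothetical names, and the sample markup is made up for illustration. Note that a crawler can only see this tag if robots.txt lets it fetch the page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for item in attr_map.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a meta robots tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Made-up sample page for illustration.
page = (
    "<html><head>"
    '<meta name="robots" content="noindex, follow">'
    "</head><body>Hello</body></html>"
)
assert has_noindex(page)
```

The check only covers HTML pages. It does not apply to resources such as JavaScript files, which cannot carry a meta tag.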
