The Moz Q&A Forum

    Allow or Disallow First in Robots.txt

    Technical SEO Issues
    • irvingw
      irvingw last edited by

      If I want to override a Disallow directive in robots.txt with an Allow directive, should the Allow directive come before or after the Disallow directive?

      Example:

      Allow: /models/ford/*/*/page*

      Disallow: /models/*/*/*/page

      • NakulGoyal
        NakulGoyal last edited by

        I don't think it matters, but I think I would disallow first, because by default everything is an Allow.

        • irvingw
          irvingw @NakulGoyal last edited by

          Thanks. I want to make sure I get this right, in a syntax universally understood by all engines. I have seen webmasters all over the place on this one, with some saying that crawlers use a first-matching rule and others saying that crawlers use a last-matching rule. I am almost tempted to include the Allow directive twice, before and after, to cover all bases.

          • zigojacko
            zigojacko last edited by

            The Allow directives need to come before the Disallow directives for the same directory/file paths. (I have never personally tested this, although it makes logical sense to tell a robot it may access one particular path within a directory before it sees that the directory as a whole is blocked from crawling.)

            For example:

            Allow: /profiles

            Disallow: /s2/profiles/me

            Allow: /s2/profiles

            Allow: /s2/photos

            Allow: /s2/static

            Disallow: /s2

            This follows how Google has formatted its own robots.txt.
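A minimal sketch of why the ordering above can matter, using Python's standard-library `urllib.robotparser`. That parser applies rules in file order with plain prefix matching (no wildcard support), so it behaves like a "first match wins" crawler. Treating it as representative of any particular search engine is an assumption, and the `ExampleBot` user agent and the `/s2/profiles/jane` path are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: list[str], path: str) -> bool:
    """Parse an in-memory robots.txt and check one path for a generic crawler."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", *rules])
    # "ExampleBot" is a hypothetical user agent; it falls under the "*" group.
    return parser.can_fetch("ExampleBot", path)


# The same two rules from the example above, in both orders.
allow_first = ["Allow: /s2/profiles", "Disallow: /s2"]
disallow_first = ["Disallow: /s2", "Allow: /s2/profiles"]

print(allowed(allow_first, "/s2/profiles/jane"))     # True: the Allow line is hit first
print(allowed(disallow_first, "/s2/profiles/jane"))  # False: the Disallow line is hit first
```

With a parser like this one, putting the narrower Allow line first is what keeps `/s2/profiles` crawlable.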

            • NakulGoyal
              NakulGoyal @irvingw last edited by

              I understand your concern. I am basing my answer on the fact that if you don't have a robots.txt at all, Google will still crawl you, which means it's an allow by default. So all that matters, in my opinion, is the disallow. But because you need an allow carved out of the wildcard disallow, you could put that allow first and the disallow next.

              Honestly, I don't think it matters. If you think about the way a bot works, it's not as if it reads one line of robots.txt, goes crawling, and then comes back to read the next line, and so on. Does that make sense? It reads all the lines in the robots.txt and then follows the directives. But to be sure, you can try either scenario and see for yourself. I am sure the results would be the same either way.
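The "allow by default" point can be checked in a small sketch, again with Python's `urllib.robotparser` standing in for a crawler (an assumption, as are the `ExampleBot` name and the sample paths). The parser reads the whole file before any URL is checked, and anything no rule matches is treated as crawlable.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines: list[str], path: str) -> bool:
    parser = RobotFileParser()
    parser.parse(robots_lines)  # the whole file is read here, before any URL is checked
    return parser.can_fetch("ExampleBot", path)  # hypothetical user agent


only_disallow = ["User-agent: *", "Disallow: /private"]

print(can_crawl([], "/any/page"))                  # True: an empty file blocks nothing
print(can_crawl(only_disallow, "/any/page"))       # True: no rule matches, so it is allowed
print(can_crawl(only_disallow, "/private/page"))   # False: the Disallow prefix matches
```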

              • Cyrus-Shepard
                Cyrus-Shepard last edited by

                Interesting question - I've had this discussion a couple of times with different SEOs. Here's my best understanding: There are actually 2 different answers - one if you are talking about Google, and one for every other search engine.

                For most search engines, the "Allow" should come first. This is because the first matching pattern always wins, for the reasons Geoff stated.

                But Google is different. They state:

                "At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule. The order of precedence for rules with wildcards is undefined."

                Robots.txt Specifications - Webmasters — Google Developers

                So for Google, order is not important, only the specificity of the rule based on the length of the entry. But the order of precedence for rules with wildcards is undefined.

                This last part is important, because your directives contain wildcards. If I'm reading this right, that applies to your particular directives:

                Allow: /models/ford/*/*/page*

                Disallow: /models/*/*/*/page

                So if it's "undefined", which directive will Google follow, if order isn't important? Fortunately, there's a simple way to find out. Google Webmaster Tools allows you to test any robots.txt file. I created a dummy file based on your rules. In this case, your directives worked perfectly no matter what order I put them in.

                | http://cyrusshepard.com/models/ford/test/test/pages | Allowed by line 2: Allow: /models/ford/*/*/page* | Allowed by line 2: Allow: /models/ford/*/*/page* |
                | http://cyrusshepard.com/models/chevy/test/test/pages | Blocked by line 3: Disallow: /models/*/*/*/page | Blocked by line 3: Disallow: /models/*/*/*/page |

                So, to summarize:

                1. Always put Allow directives first, as most search engines follow the "first rule counts" rule.
                2. Google doesn't care about order, but rather the specificity based on the length of the entry.
                3. The order of precedence for rules with wildcards is undefined.
                4. When in doubt, check your robots.txt file in Google Webmaster Tools.

                Hope this helps. (Sorry for the very long answer, which basically says you were right all along 🙂)
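The two precedence rules in the summary can be compared side by side in a rough sketch. It is not any search engine's actual implementation. It assumes the asker's paths used `*` between the slashes, translates `*` and a trailing `$` into a regular expression, and breaks equal-length ties in favour of Allow (also an assumption). The helper names `to_regex`, `first_match`, and `longest_match` are made up for this example.

```python
import re

Rules = list[tuple[str, str]]  # (directive, path pattern) pairs in file order


def to_regex(pattern: str) -> re.Pattern:
    """'*' matches any run of characters; a trailing '$' anchors the end of the URL."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def first_match(rules: Rules, path: str) -> bool:
    """'First rule counts': the first pattern that matches decides."""
    for directive, pattern in rules:
        if to_regex(pattern).match(path):
            return directive == "allow"
    return True  # nothing matched, so crawling is allowed


def longest_match(rules: Rules, path: str) -> bool:
    """Most specific rule wins, measured by pattern length; order is ignored."""
    hits = [(len(pattern), directive == "allow")
            for directive, pattern in rules if to_regex(pattern).match(path)]
    return max(hits)[1] if hits else True  # on a length tie, Allow sorts higher


allow_first = [("allow", "/models/ford/*/*/page*"), ("disallow", "/models/*/*/*/page")]
disallow_first = list(reversed(allow_first))
FORD = "/models/ford/test/test/pages"
CHEVY = "/models/chevy/test/test/pages"

print(longest_match(allow_first, FORD), longest_match(disallow_first, FORD))  # True True
print(first_match(allow_first, FORD), first_match(disallow_first, FORD))      # True False
print(longest_match(allow_first, CHEVY), first_match(allow_first, CHEVY))     # False False
```

Under the length-based rule the order of the two lines makes no difference, while under the first-match rule the Ford URL is only crawlable when the Allow line comes first.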

                • irvingw
                  irvingw @Cyrus-Shepard last edited by

                  I really appreciate all the effort you put in to ensure your method was correct. Many thanks.

                  • fablau
                    fablau last edited by

                    What about something like:

                    allow: /directory/$

                    disallow: /directory/*

                    Where I want this to be indexed:

                    http://www.mysite.com/directory/

                    But not this:

                    http://www.mysite.com/directory/sub-directory/

                    Ideas?

                    • KeriMorgret
                      KeriMorgret @fablau last edited by

                      Just a quick note, this question is actually from spring of 2012.

                      • Cyrus-Shepard
                        Cyrus-Shepard @fablau last edited by

                        Can't say with 100% confidence, but it sounds like it might work. You could always upload it to a server and use a robots.txt checker to validate, although validator tools sometimes handle edge cases like this slightly differently from real crawlers, which can make their results moot.
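A quick sketch of why this counts as an edge case, assuming `*` means "any run of characters" and a trailing `$` means "end of URL". The `matches` helper is hypothetical and is not taken from any crawler. Both patterns are 12 characters long, so for `/directory/` a length-based rule sees a tie, and the outcome depends on how a given crawler breaks ties.

```python
import re


def matches(pattern: str, path: str) -> bool:
    """Hypothetical matcher: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(map(re.escape, body.split("*"))) + ("$" if anchored else "")
    return re.match(regex, path) is not None


rules = [("allow", "/directory/$"), ("disallow", "/directory/*")]

for path in ["/directory/", "/directory/sub-directory/"]:
    # Collect (pattern length, directive) for every rule that matches this path.
    hits = [(len(pattern), directive) for directive, pattern in rules if matches(pattern, path)]
    print(path, hits)

# /directory/ [(12, 'allow'), (12, 'disallow')]
# /directory/sub-directory/ [(12, 'disallow')]
```

The sub-directory is matched only by the Disallow line, so it is blocked under either precedence rule. The bare directory is matched by both lines at equal length, which is the part worth confirming in a tester.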

                        • fablau
                          fablau @Cyrus-Shepard last edited by

                          Thank you Cyrus. Yes, I have tried your suggested robots.txt checker and, although it validates the file, it shows me a couple of warnings about the "unusual" use of wildcards. It is my understanding that I would probably need to discuss all this with Google folks directly.

                          Thank you for your answer... and yes, Keri, I know this is an old thread, but it's still useful today!

                          Thanks 🙂

                          • Net66SEO
                            Net66SEO last edited by

                            I caught this a bit late, and it's probably too late to add something, but my two pence is to test it in Webmaster Tools via Crawl -> robots.txt Tester. If you've not used this before, simply add the URL you want to test and Google highlights the directive that allows or disallows it.
