The Moz Q&A Forum

    Robots.txt wildcards - the devs had a disagreement - which is correct?

    Intermediate & Advanced SEO
• McTaggart:

Hi – the lead website developer assumed that this wildcard: Disallow: /shirts/?* would block URLs including a ? within this directory, and within all subdirectories of this directory that include a "?"

The second developer suggested that this wildcard would only block URLs featuring a ? that comes immediately after /shirts/ - for example /shirts?minprice=10&maxprice=20 - but argued that this robots.txt directive would not block URLs featuring a ? in subdirectories - e.g. /shirts/blue?mprice=100&maxp=20

      So which of the developers is correct?

Beyond that, I assumed that the ? should have a * on each side of it – i.e. /*?* – to work as intended above. Am I correct in assuming that?

• LoganRay:

        Hi Luke,

The second developer is correct... well, more correct than the first. Your example of /shirts?minprice=10&maxprice=20 would not be blocked by this directive, since there's no slash after shirts.

For future reference, you can test how directives function in Google Search Console. Under the 'Crawl' menu, there's a robots.txt tester in which you can manually edit the directives (your edits don't apply to the live file) and enter test URLs to see which directive, if any, would prevent crawling.

You are correct in your assumption that a * on either side of the ? would prevent crawling of both /shirts/blue?mprice=100&maxp=20 and /shirts/?minprice=10&maxprice=20.
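The prefix matching being described here can be sketched in a few lines. This is a simplification assuming Google's documented wildcard behaviour (* matches any run of characters, and a rule matches any URL path it is a prefix of); the rule and URLs are the ones from this thread:

```python
import re

def rule_matches(rule: str, url_path: str) -> bool:
    """Check a robots.txt path rule against a URL path, Google-style:
    '*' matches any run of characters, and rules match as prefixes."""
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.match(pattern, url_path) is not None

rule = "/shirts/?*"
print(rule_matches(rule, "/shirts?minprice=10&maxprice=20"))   # False: no slash after shirts
print(rule_matches(rule, "/shirts/?minprice=10&maxprice=20"))  # True: ? comes right after /shirts/
print(rule_matches(rule, "/shirts/blue?mprice=100&maxp=20"))   # False: ? sits in a subdirectory
```

This reproduces the second developer's reading: only the URL whose query string starts immediately after /shirts/ is blocked.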

• McTaggart @LoganRay:

Thanks Logan - the lead website developer assumed that this wildcard: Disallow: /shirts/?* would block URLs including a ? within this directory, and within all subdirectories of this directory that include a "?"

If I amended the URL to /shirts/?minprice=10&maxprice=20, would the directive work as intended there?

And would that directive work as intended further down the directory structure of the URLs? E.g.
/shirts/golden/?minprice=10&maxprice=20

• McTaggart:

I suppose the nub of the disagreement is this: would Disallow: /shirts/?* block /shirts/?minprice=10&maxprice=20 and also block URLs further down the directory structure - e.g. /shirts/mens/navyblue/?minprice=10&maxprice=20?

• LoganRay @McTaggart:

Disallow: /shirts/?* will only block URLs whose path ends at /shirts/ before the parameter string begins. If you want to also block /shirts/golden/?minprice=10&maxprice=20 you'll have to add the asterisk before the ? as well - i.e. Disallow: /shirts/*?*

What's the end goal here? Preventing bots from crawling any parameterized URL?
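Under the same prefix-matching model (a sketch assuming Google's documented wildcard behaviour, where * matches any run of characters, including none), you can see why the extra asterisk matters:

```python
import re

def rule_matches(rule: str, url_path: str) -> bool:
    # '*' matches any run of characters (including none); rules match as prefixes.
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.match(pattern, url_path) is not None

for url in ["/shirts/?minprice=10&maxprice=20",
            "/shirts/golden/?minprice=10&maxprice=20"]:
    print(rule_matches("/shirts/?*", url), rule_matches("/shirts/*?*", url))
# /shirts/?* matches only the first URL; /shirts/*?* matches both,
# because the added * absorbs the /golden/ subdirectory.
```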

• McTaggart @LoganRay:

Thanks Logan - much appreciated - the aim is to prevent bots from crawling any parameterized URL, but only in the products section - not all of them - see below.

I noticed the shirt URLs can produce many pages of results - e.g. if you look for a type of shirt you can get up to 20 pages of results - and the resulting URLs also feature a ?

So you end up with - for example - /shirts/?resultspage=01 and then /shirts/?resultspage=02, or /shirts/navy/?resultspage=01 and /shirts/navy/?resultspage=02 - and so on - and it would be good for those to stay indexable. So I wonder: how can I override the disallow-parameters instruction in robots.txt for specific paths, or even individual pages?

• LoganRay @McTaggart:

                  Ok, gotcha. Add the following directives:

Disallow: /shirts/*?*

                  This prevents crawling of the following:

• /shirts/golden/?minprice=10&maxprice=20
                  • /shirts/?minprice=10&maxprice=20

                  Allow: /*?resultspage=

                  Allows crawling of the following:

                  • /shirts/navy/?resultspage=02
                  • /shirts/?resultspage=01
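The reason this pair of directives works is precedence: when Allow and Disallow rules both match, Google picks the matching rule with the longest pattern, and ties go to Allow. A minimal sketch of that resolution, taking the disallow pattern as /shirts/*?* (with the asterisks discussed above):

```python
import re

def rule_matches(rule: str, url_path: str) -> bool:
    # '*' matches any run of characters; rules match as prefixes.
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.match(pattern, url_path) is not None

def is_allowed(url_path: str, rules) -> bool:
    """rules: list of ('allow'|'disallow', pattern) pairs. The matching
    rule with the longest pattern wins; ties go to Allow."""
    hits = [(len(pat), kind == "allow")
            for kind, pat in rules if rule_matches(pat, url_path)]
    if not hits:
        return True            # no rule matches -> crawling permitted
    hits.sort(reverse=True)    # longest pattern first; Allow beats Disallow on ties
    return hits[0][1]

rules = [("disallow", "/shirts/*?*"), ("allow", "/*?resultspage=")]
print(is_allowed("/shirts/?minprice=10&maxprice=20", rules))  # False: only the Disallow matches
print(is_allowed("/shirts/navy/?resultspage=02", rules))      # True: the longer Allow rule wins
```

So the results pages stay crawlable because the 15-character Allow pattern outranks the 11-character Disallow pattern whenever both match.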
• McTaggart @LoganRay:

Thanks Logan - much appreciated, as ever - that really helps 🙂 - if I were to add another * to Allow: /*?resultspage= - so Allow: /*?*resultspage= - what would happen then?

                    © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.