The Moz Q&A Forum

    If I disallow an unfriendly URL via robots.txt, will its friendly counterpart still be indexed?

    Intermediate & Advanced SEO
    • mrwestern

      Our not-so-lovely CMS loves to render pages regardless of the URL structure, just as long as the page name itself is correct. For example, it will render the following as the same page:

      example.com/123.html

      example.com/dumb/123.html

      example.com/really/dumb/duplicative/URL/123.html

      To help combat this, we are creating mod_rewrite rules with friendly URLs, so all of the above would simply render as example.com/123.

      I understand robots.txt respects the wildcard (*), so I was considering adding this to our robots.txt:

      Disallow: */123.html

      If I move forward, will this block all of the potential permutations of the directories preceding 123.html yet not block our friendly example.com/123?
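To make the question concrete, here is a rough Python sketch of how a wildcard Disallow rule could be evaluated. It assumes the commonly documented semantics (the rule is matched from the start of the URL path, * matches any run of characters including none, and a trailing $ anchors the end). The function name rule_matches is made up for illustration, and real crawlers may differ in details.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumed semantics (not taken from any specific crawler's code):
    '*' matches any run of characters, including an empty one,
    a trailing '$' anchors the end of the path, and otherwise the
    pattern only has to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path only (prefix match).
    return re.match(regex, path) is not None

RULE = "*/123.html"
for path in [
    "/dumb/123.html",
    "/really/dumb/duplicative/URL/123.html",
    "/123.html",
    "/123",
]:
    print(path, "blocked" if rule_matches(RULE, path) else "allowed")
```

Under these assumptions the friendly /123 path is not matched by the rule, while the .html permutations are. Because * may match nothing in this sketch, the root-level /123.html is matched too.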

      Oh, and yes, we do use the canonical tag religiously - we're just mucking with the robots.txt as an added safety net.

      • irvingw

        That Disallow rule will block all files named 123.html in any folder deeper than the root.

        This, together with the canonical tag (absolute, not relative), will probably cover you, but it is really recommended to add a robots noindex meta tag to these duplicate pages as well. Bots arriving from an external link that points to one of those pages could still get it indexed. Also, the canonical is a suggestion, not a rule.

        • mrwestern (in reply to @irvingw)

          Thanks. However, the meta tag won't work in this case because it's technically one page with an infinite number of names via the URL (remember, the CMS only depends on the 123.html and ignores the directories preceding it). If I applied the NOINDEX within the meta, then the version I do want to get indexed would not be indexed.

          The question was really around "will the internal rewrite of /123.html to just /123 be impacted if we disallow */123.html" - and since the rewrite happens before the bot sees it, I presume the answer is "no, it will not be impacted: 123.html will be blocked yet /123 will still be indexed."

          Now, after I posted the question I realized this is a case where I should use a "greedy" 301 redirect via .htaccess rather than try to block permutations of the URL via robots.txt. So I decided not to go the robots.txt route and instead do a 301 redirect via regex:

          */123.html to /123 (that's obviously not perfect regex, but you see my point)
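For anyone who wants to sanity-check that kind of greedy pattern before putting it in a rewrite rule, here is a minimal Python sketch of the matching logic only. The regex and the helper name redirect_target are illustrative assumptions, not the actual rule used on the site, and Apache's own path handling (for example how the leading slash is treated in .htaccess context) would still need to be checked separately.

```python
import re
from typing import Optional

# Hypothetical pattern: a path ending in <name>.html at any directory depth.
# The optional greedy group (?:.*/)? swallows every preceding directory.
UNFRIENDLY = re.compile(r"^/(?:.*/)?([^/]+)\.html$")

def redirect_target(path: str) -> Optional[str]:
    """Return the friendly path to 301 to, or None if no redirect applies."""
    match = UNFRIENDLY.match(path)
    if match is None:
        return None
    # Keep only the page name, dropping the directories and the extension.
    return "/" + match.group(1)

for path in [
    "/123.html",
    "/dumb/123.html",
    "/really/dumb/duplicative/URL/123.html",
    "/123",
]:
    print(path, "->", redirect_target(path))
```

In this sketch all three .html variants map to /123, and the friendly /123 itself is left alone, so it would not redirect in a loop.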

          • Cyrus-Shepard (in reply to @mrwestern)

            Yeah, if you could solve this via .htaccess that would be great, especially if you have link equity flowing into any of those URLs.

            I'd go one step further than Irving and highly recommend canonical tags on those URLs. Since, as you said, it's all one page with infinite URL possibilities, the canonical should be easy to implement.

            Best of luck!
