The Moz Q&A Forum

    Robots.txt

    Intermediate & Advanced SEO
    • TomKing

      Hi all,

      Happy New Year!

      I want to block certain pages on our site because my Moz Crawl Report flags them as duplicate content. That isn't strictly true; it has more to do with the problems faced when using a CMS...

      Here are some examples of the pages I want to block; underneath each is what I believe to be the correct robots.txt entry:

      http://www.XYZ.com/forum/index.php?app=core&module=search&do=viewNewContent&search_app=members&search_app_filters[forums][searchInKey]=&period=today&userMode=&followedItemsOnly=

      Disallow: /forum/index.php?app=core&module=search

      http://www.XYZ.com/forum/index.php?app=core&module=reports&rcom=gallery&imageId=980&ctyp=image

      Disallow: /forum/index.php?app=core&module=reports

      http://www.XYZ.com/forum/index.php?app=forums&module=post&section=post&do=reply_post&f=146&t=741&qpid=13308

      Disallow: /forum/index.php?app=forums&module=post

      http://www.XYZ.com/forum/gallery/sizes/182-promenade/small/

      http://www.XYZ.com/forum/gallery/sizes/182-promenade/large/

      Disallow: /forum/gallery/sizes/
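
      As a rough way to sanity-check the four rules above offline, here is a minimal Python sketch. It assumes plain prefix matching against the path plus query string, which is how rules without wildcards are generally interpreted. `path_and_query` and `is_blocked` are hypothetical helper names, and the last two URLs in the example list are made up for illustration.

```python
from urllib.parse import urlsplit

# Rules proposed in the post above (plain prefixes, no wildcards).
DISALLOW = [
    "/forum/index.php?app=core&module=search",
    "/forum/index.php?app=core&module=reports",
    "/forum/index.php?app=forums&module=post",
    "/forum/gallery/sizes/",
]


def path_and_query(url: str) -> str:
    """Return the part of the URL that robots.txt rules are compared against."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def is_blocked(url: str, rules=DISALLOW) -> bool:
    """True if any Disallow prefix matches the start of the path plus query."""
    target = path_and_query(url)
    return any(target.startswith(rule) for rule in rules)


if __name__ == "__main__":
    examples = [
        "http://www.XYZ.com/forum/index.php?app=core&module=reports&rcom=gallery&imageId=980&ctyp=image",
        "http://www.XYZ.com/forum/gallery/sizes/182-promenade/small/",
        # Hypothetical: same parameters in a different order, so the prefix no longer matches.
        "http://www.XYZ.com/forum/index.php?module=search&app=core",
        # Hypothetical: a URL that should stay crawlable.
        "http://www.XYZ.com/forum/index.php?app=forums&module=forums",
    ]
    for url in examples:
        print(is_blocked(url), url)
```

      One thing this makes visible: because matching is by prefix, a URL carrying the same parameters in a different order would not be caught by these rules.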

      Any help / advice would be much appreciated.

      Many thanks

      Andy

      • DirkC

        You can quite easily check whether these rules work using Google Webmaster Tools (Crawl section > robots.txt Tester).
        In the test tool you can enter the rules and check whether they block Googlebot from crawling these pages. I tried a few of the examples you gave and they seem to work.

        Apart from updating your robots.txt (which seems quite a radical solution), you could also consider implementing canonical URLs for these duplicate URLs.

        Another alternative is to configure URL parameters in Google Webmaster Tools (also in the Crawl section), where you can indicate which parameters should be ignored.

        • TomKing

          Thanks DC1611, I will look into the other options but I have hundreds (and I mean hundreds) of examples that I would need to investigate...

          Andy

          • Travis_Bailey

            You may be better off just doing a pattern match if your CMS generates a lot of junk URLs. You could save yourself a lot of time and heartache with the following:

            User-agent: *
            Disallow: /*?

            That will block every URL with a ? in it. So yeah, use with caution - as always.
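
            To see what a wildcard rule like that would catch before it goes live, here is a minimal Python sketch. It assumes the common wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end of the URL, everything else is literal, and rules match from the start of the path). `rule_to_regex` and `matches` are hypothetical helper names, and `/products?page=2` is a made-up URL.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else, including '?', is treated literally.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def matches(rule: str, path_and_query: str) -> bool:
    """True if the rule matches from the start of the path plus query string."""
    return rule_to_regex(rule).match(path_and_query) is not None


if __name__ == "__main__":
    rule = "/*?"
    targets = [
        "/forum/index.php?app=core&module=search",    # contains '?', so blocked
        "/forum/gallery/sizes/182-promenade/small/",  # no '?', so not blocked
        "/products?page=2",                           # hypothetical URL, also blocked
    ]
    for target in targets:
        print(matches(rule, target), target)
```

            The third target is the reason for the caution: any page that relies on a query string gets blocked too, wanted or not.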

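            If you're quite certain you want to block access to the image sizes subdirectory you may use:

            User-agent: *
            Disallow: /*/sizes/

            Rules match from the start of the path, so the leading /* is needed to reach /forum/gallery/sizes/.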

            More on all of that fun from Google and SEO Book.

            Robots.txt is almost as unforgiving as .htaccess, especially once you start pattern matching. Make sure to test everything thoroughly before you push to a live environment. For serious. You have been warned. 😉

            Google WMT and Bing WMT also provide parameter handling tools, where you tell Bing and/or Google which parameter(s) you want their bots to ignore. So if you wanted to handle it that way, it looks like ignoring the app= parameter should do the trick for most of your expressed concerns.
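
            With hundreds of flagged URLs, it may help to count which parameter names actually show up before deciding which ones to ignore. A minimal Python sketch, assuming the flagged URLs have been exported into a list; `count_parameters` is a hypothetical helper name and the two URLs are taken from the first post in this thread.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_parameters(urls):
    """Count how many URLs each query parameter name appears in."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # A set, so a name repeated within one URL is counted once for that URL.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


if __name__ == "__main__":
    flagged = [
        "http://www.XYZ.com/forum/index.php?app=core&module=search&do=viewNewContent&search_app=members&search_app_filters[forums][searchInKey]=&period=today&userMode=&followedItemsOnly=",
        "http://www.XYZ.com/forum/index.php?app=core&module=reports&rcom=gallery&imageId=980&ctyp=image",
    ]
    for name, n in count_parameters(flagged).most_common():
        print(n, name)
```

            Parameters that appear in nearly every flagged URL (app and module here) are the likeliest candidates for the parameter handling tools.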

            Good luck! explosions in the distance XD
