The Moz Q&A Forum


    Robots.txt & Duplicate Content

    Intermediate & Advanced SEO
    • Careerbags
      Careerbags last edited by

      In reviewing my crawl results, I found 5666 pages of duplicate content. I believe this is because many of the indexed pages are just different ways to get to the same content. There is one primary culprit: a series of URLs related to CatalogSearch, for example http://www.careerbags.com/catalogsearch/result/index/?q=Mobile

      I have 10074 of those links indexed according to my Moz crawl. Of those, 5349 are tagged as duplicate content and the other 4725 are not.

      Here are some additional sample links:

      http://www.careerbags.com/catalogsearch/result/index/?dir=desc&order=relevance&p=2&q=Amy
      http://www.careerbags.com/catalogsearch/result/index/?color=28&q=bellemonde
      http://www.careerbags.com/catalogsearch/result/index/?cat=9&color=241&dir=asc&order=relevance&q=baggallini

      All of these links are just different ways of searching through our product catalog. My question is: should we disallow /catalogsearch via the robots.txt file? Are these links doing more harm than good?
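To make the pattern in the sample links concrete, here is a minimal Python sketch that groups the four URLs quoted above by path and lists the query parameter names seen on each path. The function name `summarize` is hypothetical and only illustrates the idea; it is not part of any Moz or Magento tooling.

```python
from urllib.parse import urlsplit, parse_qs

# The four sample URLs quoted in the question.
SAMPLE_URLS = [
    "http://www.careerbags.com/catalogsearch/result/index/?q=Mobile",
    "http://www.careerbags.com/catalogsearch/result/index/?dir=desc&order=relevance&p=2&q=Amy",
    "http://www.careerbags.com/catalogsearch/result/index/?color=28&q=bellemonde",
    "http://www.careerbags.com/catalogsearch/result/index/?cat=9&color=241&dir=asc&order=relevance&q=baggallini",
]


def summarize(urls):
    """Map each distinct path to the sorted query parameter names seen on it."""
    summary = {}
    for url in urls:
        parts = urlsplit(url)
        names = summary.setdefault(parts.path, set())
        names.update(parse_qs(parts.query).keys())
    return {path: sorted(names) for path, names in summary.items()}


if __name__ == "__main__":
    for path, names in summarize(SAMPLE_URLS).items():
        print(path, names)
```

All four samples collapse to the single path /catalogsearch/result/index/, which is why one rule on that path prefix can cover every variant.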

      • WesleySmits
        WesleySmits last edited by

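        You could add a canonical tag that points to the default page. This way Google will know that it should index only that page.
        The code for this would be:

        <link rel="canonical" href="http://www.careerbags.com/catalogsearch/result/index/" />

        This should be placed in the <head> section of your HTML code.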

        Some more resources on the subject:

        • Complete guide to rel canonical
        • Learn about canonicalization and rel canonical
        • MickEdwards
          MickEdwards last edited by

          To back up the detail Wesley gave you, you can also configure URL parameters in Google Webmaster Tools:

          https://support.google.com/webmasters/answer/1235687?hl=en

          • DonnaDuncan
            DonnaDuncan last edited by

            Hi Jeremy.

            Yours is a common problem. The best way to deal with it is, as Wesley mentions, by putting canonical tags on all the duplicate pages - the one you want indexed and to show up in search results AND all the others that you can arrive at via catalog search or any other means of navigation.

            Michael's suggestion will prevent the duplicate pages from getting indexed by Google. Unfortunately you lose any link equity going that route, so I'd suggest starting with canonical tags first.

            • Careerbags
              Careerbags last edited by

              I'm not sure this is the right approach. The catalog search is based on the search box on the website. The query parameter can be anything the customer enters. Are you suggesting that the backend code be modified to always return the canonical tag pointing to http://www.careerbags.com/catalogsearch/result/index/ in every result?

              And why that page? That URL just redirects to the home page, because no query parameter is provided for the search.

              In terms of losing link equity, how much equity do these pages have if they are duplicate content?

              • MickEdwards
                MickEdwards @Careerbags last edited by

                There are two distinct possible issues here:

                1. Search results are creating duplicate content

                2. Search results are creating lots of thin content

                You want to give the user every possibility of finding your products, but you don't want those search results indexed, because you should already have your source product pages indexed and aiming to rank well. If not, see the last paragraph.

                I slightly misread your post and took the URLs to be purely filtered. You should add Disallow: /catalogsearch to your robots.txt, and if any of those URLs are already indexed you can remove the directory in Webmaster Tools > Google Index > Remove URLs > Reason: Remove Directory. This is from Google: http://www.mattcutts.com/blog/search-results-in-search-results/
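Before deploying a rule like this, it can help to check which URLs it would block. The sketch below uses Python's standard `urllib.robotparser` with a robots.txt body that is an assumption (a single `User-agent: *` group containing the suggested Disallow line). Python's parser does simple prefix matching, so treat the result as an approximation of how any particular search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: one group for all crawlers with the suggested rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalogsearch
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser, url, agent="*"):
    """Return True if the given user agent may fetch the URL under these rules."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "http://www.careerbags.com/catalogsearch/result/index/?q=Mobile",
        "http://www.careerbags.com/laptop-bags/women-s/rolling-laptop-bags.html",
    ):
        print(url, is_crawlable(parser, url))
```

The search URL is reported as blocked and the product page as crawlable, which matches the intent of the rule.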

                If your site has any other parameters not in that directory you can add them in Webmaster Tools > Crawl > URL Parameters > Let Googlebot Decide. Google will understand they are not the main URLs and treat them accordingly.

                As a side issue, it would be a good idea to analyse your site search results in Analytics. You might find a trend, such as a term that is searched for often or a returned result that is not a good match for the query, that points to new, more targeted content you could create.
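If the search URLs are available as a list (for example from a crawl export), one rough way to spot such trends is to count the `q` values. This is a minimal sketch; the function name `top_search_terms` and the idea of feeding it crawl URLs are assumptions, not a feature of Analytics.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def top_search_terms(urls, limit=10):
    """Count the values of the q parameter, ignoring case and outer whitespace."""
    counts = Counter()
    for url in urls:
        query = parse_qs(urlsplit(url).query)
        for term in query.get("q", []):
            counts[term.strip().lower()] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    urls = [
        "http://www.careerbags.com/catalogsearch/result/?q=rolling+laptop+bags",
        "http://www.careerbags.com/catalogsearch/result/index/?q=Rolling+Laptop+Bags",
        "http://www.careerbags.com/catalogsearch/result/index/?q=Amy",
    ]
    for term, count in top_search_terms(urls):
        print(count, term)
```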

                • simon_realbuzz
                  simon_realbuzz last edited by

                  Webmaster guidelines specifically request that you prevent crawling of search results pages using a robots.txt file. The relevant section reads: "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."

                  • WesleySmits
                    WesleySmits @Careerbags last edited by

                    In this case you could add the meta robots tag on the search result pages like this:

                    <meta name="robots" content="noindex, follow">

                    Search results can indeed spawn an unlimited number of different URLs. The resulting duplication can be avoided by making sure these pages are kept out of the index while their links are still followed.
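On the backend, the rule could be as simple as emitting the tag for every URL under the search path. The sketch below is an assumption about how that might look in a template helper; `robots_meta_for` and `SEARCH_PREFIX` are hypothetical names, not part of the site's actual platform.

```python
from urllib.parse import urlsplit

# Assumed path prefix shared by all catalog search result pages.
SEARCH_PREFIX = "/catalogsearch/"


def robots_meta_for(url):
    """Return the robots meta tag for search result pages, or "" for other pages."""
    path = urlsplit(url).path
    if path.startswith(SEARCH_PREFIX):
        return '<meta name="robots" content="noindex, follow">'
    return ""


if __name__ == "__main__":
    print(robots_meta_for("http://www.careerbags.com/catalogsearch/result/index/?q=Amy"))
```

Keying on the path rather than the query string means the tag is emitted no matter what the customer types into the search box.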

                    • DonnaDuncan
                      DonnaDuncan @simon_realbuzz last edited by

                      Simon, Wesley, Michael...

                      These customer-facing search result pages are the ones often bookmarked and shared by site visitors. How worried does one need to be about losing link equity? I realize every site is going to be different and social shares don't have link equity, at least for now, but this could add up over time. The rel canonical will enable capture of link equity whereas the robots noindex will not.

                      Am I over thinking this?

                      • WesleySmits
                        WesleySmits @DonnaDuncan last edited by

                        I do agree that a rel="canonical" is a good option for the problem at hand.
                        As Jeremy has stated, however, the URL we are referring to in the href attribute redirects to the home page: http://www.careerbags.com/catalogsearch/result/index/

                        I did not test this in my original answer. I assumed this URL would show a list of all products, not filtered by a search query. Since that is not the case, and this page in fact does not exist, it's hard to point at a URL to be the canonical.

                        Therefore I changed my answer to include the robots meta tag. This would indeed remove the search pages from the search index. I do think this is a positive thing though.

                        Look at the following URL: http://www.careerbags.com/catalogsearch/result/?q=rolling+laptop+bags

                        Not really the type of URL I would click on in the search results. The following URL, however, is something I would want to click on: http://www.careerbags.com/laptop-bags/women-s/rolling-laptop-bags.html

                        Search result pages are too varied to be included in the index, in my opinion.

                        Hope you agree with this. If not, I would like to hear your thoughts. 🙂

                        • Stewart_SEO
                          Stewart_SEO @MickEdwards last edited by

                          Where is the evidence that these settings work? I have never seen them work. Google totally ignores the URL Parameters tool in GWT.

                          • DonnaDuncan
                            DonnaDuncan @WesleySmits last edited by

                            I agree entirely that "Search result pages are too varied to be included in the index".

                            That said, my understanding is that if you canonical a page, it doesn't get indexed. So we wouldn't have to worry about the appearance / user-friendliness of the URL. But (again, in my opinion) we should still worry about link equity being passed, and that won't happen if you noindex.

                            This gets complicated fast. I like your solution b/c it's a lot cleaner and easier to implement. Still not convinced it's the "best" way to go though.

                            • WesleySmits
                              WesleySmits @DonnaDuncan last edited by

                              OK, the question concerning rel="canonical" is which URL becomes the canonical version. Since there is no page on the website that would be appropriate (as far as I've seen), I recommended the meta robots tag.

                              I do agree that rel="canonical" is the preferred option, but in this situation I can't see a way to implement it properly. Which page would you highlight as the canonical?

                              • DonnaDuncan
                                DonnaDuncan @WesleySmits last edited by

                                For product pages, I would canonical the page with the most descriptive URL.

                                For category pages, I agree with you, I would noindex them.

                                I think I just answered my own question!!
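For the product-page half of that conclusion, a small helper could render the canonical tag once the preferred (most descriptive) URL has been chosen. This is only a sketch: `canonical_link` is a hypothetical name, and choosing the preferred URL is left to the site owner.

```python
from html import escape


def canonical_link(preferred_url):
    """Build a canonical link tag, escaping the URL for use in an HTML attribute."""
    return '<link rel="canonical" href="{}">'.format(escape(preferred_url, quote=True))


if __name__ == "__main__":
    print(canonical_link(
        "http://www.careerbags.com/laptop-bags/women-s/rolling-laptop-bags.html"
    ))
```

Every duplicate route to the same product would emit the same tag, and the tag belongs in the page's head section.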
