The Moz Q&A Forum

    Could you use a robots.txt file to disallow a duplicate content page from being crawled?

    Intermediate & Advanced SEO
    • gregelwell
      gregelwell last edited by

      A website has duplicate content pages to make it easier for users to find the information from a couple of spots in the site navigation. The site owner would like to keep it this way without hurting SEO.

      I've thought of using the robots.txt file to disallow search engines from crawling one of the pages. Do you think this is a workable/acceptable solution?

      • KyleChamp
        KyleChamp last edited by

        The best way would be to use the rel canonical tag.

        On the page you would like to rank, put in the rel canonical tag.

        This lets Google know that this is the original page.

        Check out this link posted by Rand about the Rel canonical tag [http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps](http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps)

        • anthonytjm
          anthonytjm last edited by

          Well, the answer would be yes and no. A robots.txt file would stop the bots from crawling the page, but links from other pages to that blocked page could still get its URL indexed. As posted in Google Webmaster Tools here:

          "You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one).

          While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results."
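To make the crawl-blocking part concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents and the example.com URLs are hypothetical, not taken from the site in question. It only models whether a compliant crawler may fetch a URL; as the quote above says, a blocked URL can still end up indexed if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one of two duplicate pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /services/duplicate-page.html
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs for the duplicate and the original page.
can_crawl_duplicate = parser.can_fetch(
    "Googlebot", "https://www.example.com/services/duplicate-page.html"
)
can_crawl_original = parser.can_fetch(
    "Googlebot", "https://www.example.com/services/original-page.html"
)

print(can_crawl_duplicate)  # False: crawling is disallowed
print(can_crawl_original)   # True: no rule matches this path
```

Note that this says nothing about indexing, which is why robots.txt alone may not solve a duplicate content problem.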

          I think the best way to avoid any conflict is applying the rel="canonical" tag to each duplicate page that you don't want indexed.

          You can find more info on rel canonical here

          Hope this helps out some.

          • Adam.Whittles
            Adam.Whittles last edited by

            I'm not sure I understand why the site owner seems to think that the duplicate content is necessary?

            If I were in your situation, I would try to convince the client to remove the duplicate content from their site rather than trying to find a way around it.

            If the information is difficult to find then this may be due to a problem with the site architecture. If the site does not flow well enough for visitors to find the information they need, then perhaps a site redesign is necessary.

            • gregelwell
              gregelwell @anthonytjm last edited by

              Anthony, thanks for your response. Kyle also felt that using the rel canonical tag was the best thing to do. However, he seemed to think you'd put it on the original page, the one you want to rank. You're suggesting putting it on the duplicate page. Should it be added to both while specifying which page is the 'original'?

              Thanks!

              Greg

              • gregelwell
                gregelwell @KyleChamp last edited by

                Thanks, Kyle. Anthony had a similar view on using the rel canonical tag. I'm just curious whether it should be added to the original page, the duplicate page, or both.

                Thanks,

                Greg

                • anthonytjm
                  anthonytjm @gregelwell last edited by

                  Per Google Webmaster Tools:

                  "If Google knows that these pages have the same content, we may index only one version for our search results. Our algorithms select the page we think best answers the user's query. Now, however, users can specify a canonical page to search engines by adding a <link> element with the attribute rel="canonical" to the <head> section of the non-canonical version of the page. Adding this link and attribute lets site owners identify sets of identical content and suggest to Google: 'Of all these pages with identical content, this page is the most useful. Please prioritize it in search results.'"
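As a rough illustration of what that element looks like, the sketch below builds the tag that would go in the `<head>` of the non-canonical (duplicate) page. The helper name `canonical_link_tag` and the example.com URL are assumptions made for this example only.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return a <link rel="canonical"> element pointing at the preferred page.

    Hypothetical helper: the result is meant to be placed inside the <head>
    of the duplicate page. The URL is escaped so that characters such as '&'
    stay valid inside the href attribute.
    """
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


# Hypothetical URL of the page that should rank.
print(canonical_link_tag("https://www.example.com/services/original-page.html"))
# <link rel="canonical" href="https://www.example.com/services/original-page.html" />
```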

                  • gregelwell
                    gregelwell @gregelwell last edited by

                    Next time I'll read the reference links better 🙂

                    Thank you!

                    • Dr-Pete
                      Dr-Pete @anthonytjm last edited by

                      Generally agree, although I'd just add that Robots.txt also isn't so great at removing content that's already been indexed (it's better at prevention). So, I find that it's not just not ideal - it sometimes doesn't even work in these cases.

                      Rel-canonical is generally a good bet, and it should go on the duplicate (you can actually put it on both, although it's not necessary).
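One way to sanity-check that setup is to parse each page and confirm that the duplicate's canonical link points at the original. This is a minimal sketch with Python's standard `html.parser`; the sample HTML, the URL, and the `CanonicalFinder` and `find_canonical` names are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr_map.get("href")


def find_canonical(html_text: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Hypothetical pages: the duplicate declares the original as canonical.
ORIGINAL_URL = "https://www.example.com/services/original-page.html"
DUPLICATE_HTML = f"""<html><head>
<title>Duplicate page</title>
<link rel="canonical" href="{ORIGINAL_URL}">
</head><body>Same content</body></html>"""
ORIGINAL_HTML = (
    "<html><head><title>Original page</title></head>"
    "<body>Same content</body></html>"
)

print(find_canonical(DUPLICATE_HTML) == ORIGINAL_URL)  # True
print(find_canonical(ORIGINAL_HTML))  # None
```

In this sketch the original page carries no tag at all, which matches the point that a tag on the original is optional.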

                      • gregelwell
                        gregelwell @Dr-Pete last edited by

                        Peter, Thanks for the clarification.

                        • KyleChamp
                          KyleChamp @gregelwell last edited by

                          Yeah, sorry for the confusion. I put the tag on all the pages (original and duplicate). I sent you a PM with another good article on the rel canonical tag.

                          © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.