The Moz Q&A Forum

    Do you bother cleaning duplicate content from Google's Index?

    Intermediate & Advanced SEO
    • FashionLux

      Hi,

      I'm in the process of instructing developers to stop producing duplicate content. However, a lot of duplicate content is already in Google's Index, and I'm wondering if I should bother getting it removed. I'd appreciate it if you could let me know what you'd do.

      For example one 'type' of page is being crawled thousands of times, but it only has 7 instances in the index which don't rank for anything. For this example I'm thinking of just stopping Google from accessing that page 'type'.

      Do you think this is right?

      Do you normally meta NoIndex,follow the page, wait for the pages to be removed from Google's Index, and then stop the duplicate content from being crawled?

      Or do you just stop the pages from being crawled and let Google sort out its own Index in its own time?

      Thanks

      FashionLux

      • Highland

        Your options are

        1. De-index the duplicate pages yourself and save yourself the crawl budget
        2. 301 the duplicates to the pages you want to keep (preferred)
        3. Canonical the duplicate pages, which lets you pick which page remains in the index. The duplicate pages will still be crawled, however.
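As a rough illustration of how the three options differ at the response level, here is a minimal Python sketch. The function name `handle_duplicate`, the strategy labels, and the example URL are assumptions made for illustration; nothing here comes from the site being discussed, and a real implementation would live inside whatever framework the site uses.

```python
from html import escape


def handle_duplicate(strategy: str, preferred_url: str):
    """Return (status, headers, head_html) for a request to a duplicate page.

    strategy is a hypothetical label for one of the three options:
    "noindex", "redirect", or "canonical".
    """
    if strategy == "noindex":
        # Option 1: serve the page, but ask search engines not to index it.
        return 200, {}, '<meta name="robots" content="noindex, follow">'
    if strategy == "redirect":
        # Option 2: permanent redirect to the page you want to keep.
        return 301, {"Location": preferred_url}, ""
    if strategy == "canonical":
        # Option 3: serve the page and point to the preferred version.
        href = escape(preferred_url, quote=True)
        return 200, {}, f'<link rel="canonical" href="{href}">'
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for name in ("noindex", "redirect", "canonical"):
        print(name, handle_duplicate(name, "https://mysite.com/page2"))
```

Only the redirect changes the status code. The other two options still serve the duplicate page, so it has to remain crawlable for the tag to be seen.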
        • Dr-Pete

          I DO NOT believe in letting Google sort it out - they don't do it well, and, since Panda (and really even before), they basically penalize sites for their inability to sort out duplicates. I think it's very important to manage your index.

          Unfortunately, how to do that can be very complex and depends a lot on the situation. Highland's covered the big ones, but the details can get messy. I wrote a mega-post about it:

          http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world

          Without giving URLs, can you give us a sense of what kind of duplicates they are (or maybe some generic URL examples)?

          • FashionLux

            Hi Highland/Dr Pete,

            My apologies, I wasn't very clear. Stopping our site from generating further duplicate content isn't an issue at all: I'm going to instruct our developers to stop generating dupe content by doing things like no longer passing variables in the URLs (mysite.com/page2?previouspage=page1).
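To make the `previouspage` example concrete, here is a hedged Python sketch of computing the clean URL that such a variant could redirect to or declare as canonical. The helper name `canonical_url`, the `NAVIGATION_PARAMS` set, and the `https://` scheme are assumptions for illustration; only the `previouspage` parameter itself comes from the post.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that only record navigation state
# and never change the page content.
NAVIGATION_PARAMS = {"previouspage"}


def canonical_url(url: str) -> str:
    """Drop navigation-only query parameters and any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NAVIGATION_PARAMS
    ]
    # The fragment is discarded because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(canonical_url("https://mysite.com/page2?previouspage=page1"))
```

Parameters that do change the content (a hypothetical `color=red`, say) pass through untouched, so only the navigation-state variants collapse onto one URL.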

            However, the problem is that in a lot of instances duplicate URLs work, and they need to work. For example, if a user types in the URL but gets one character wrong ('1q' rather than '1'), then from a usability perspective it's the correct thing to serve the content they wanted. You don't want to make the user stop, figure out what they did wrong and redo it, not when you can make it work seamlessly.

            My question is: once my site is no longer generating unnecessary duplicate content, what should I do about the duplicate pages that have already made their way into the Index? You have both answered the question very well, thank you.

            I can manually set up 301 redirects for all of the duplicate pages that I find in the index, and once they disappear from the index I can probably remove those 301s. I was thinking of going down the noindex meta tag route, which is harder to develop.

            Thanks guys

            FashionLux

            • Dr-Pete

              One tricky point: you don't necessarily want to fix the duplicate URLs before you 301-redirect and clear out the index. This is counter-intuitive and throws many people off. If you cut the crawl paths to the bad URLs, then Google will never crawl them and process the 301-redirects (since those exist on the page level). The same is true for canonical tags. Clear out the duplicates first, THEN clean up the paths. I know it sounds weird, but it's important.
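The ordering above can be sketched as a simple bookkeeping step. This is only an illustration: `cleanup_plan` and both input collections are hypothetical, and the list of still-indexed URLs is assumed to be gathered by hand (for example from index checks), not fetched from any API.

```python
def cleanup_plan(duplicates, still_indexed):
    """Split duplicate URLs into two groups.

    keep_crawlable: still in the index, so the crawl path must stay open
                    until the 301 or canonical has been processed.
    safe_to_cut:    no longer indexed, so internal links can be removed.
    """
    indexed = set(still_indexed)
    keep_crawlable = sorted(url for url in duplicates if url in indexed)
    safe_to_cut = sorted(url for url in duplicates if url not in indexed)
    return keep_crawlable, safe_to_cut


if __name__ == "__main__":
    duplicates = ["/page2?previouspage=page1", "/page3?previouspage=page2"]
    still_indexed = ["/page2?previouspage=page1"]
    print(cleanup_plan(duplicates, still_indexed))
```

The point of the split is that a crawl path is cut only after the corresponding duplicate has dropped out of the index.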

              For malformed URLs and usability, you could still dynamically 301-redirect. In most cases, those bad URLs shouldn't get indexed, because they have no crawl path in your site. Someone would have to link to them. Google will never mistype, in other words.
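A dynamic 301 for the '1q' versus '1' case might look roughly like the Python sketch below. The `/product/<id>` URL pattern and the function name `redirect_target` are assumptions, since the thread gives no real URLs; the idea is only to strip stray trailing characters from a numeric ID and redirect to the clean path instead of serving the same content at both URLs.

```python
import re
from typing import Optional

# Hypothetical URL scheme: /product/<numeric id>, possibly followed by
# mistyped non-digit characters such as the "q" in "/product/1q".
ID_WITH_JUNK = re.compile(r"^(?P<prefix>/product/)(?P<id>\d+)(?P<junk>[^/\d]+)$")


def redirect_target(path: str) -> Optional[str]:
    """Return the clean path to 301 to, or None if no redirect applies."""
    match = ID_WITH_JUNK.match(path)
    if match:
        return match["prefix"] + match["id"]
    return None


if __name__ == "__main__":
    for path in ("/product/1q", "/product/1", "/about"):
        print(path, "->", redirect_target(path))
```

A deliberately narrow pattern is used here instead of fuzzy matching, so a mistyped URL is never redirected to an unrelated page.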
