The Moz Q&A Forum


    Removing duplicated content using only the NOINDEX in large scale (80% of the website).

    White Hat / Black Hat SEO
    • Lukas_TheCurious
      Lukas_TheCurious last edited by

      Hi everyone,

      I am taking care of a large "news" website (500k pages) that took a massive hit from Panda because of duplicated content (70% was syndicated content). I recommended removing all syndicated content and focusing the website on original, high-quality content.

      However, this was implemented only partially. All syndicated content is set to NOINDEX (they think it is good for users to see standard news plus original HQ content). Of course, it didn't help at all. No change after months. If I were Google, I would definitely penalize a website that has 80% of its content set to NOINDEX because it is duplicated. I would consider this site to be "cheating" and not worthy of the user.

      What do you think about this "theory"? What would you do?

      Thank you for your help!

      • DmitriiK
        DmitriiK last edited by

        Hi there.

        NOINDEX !== no crawling, and it surely doesn't equal NOFOLLOW. What you should probably be looking at is canonical links.

        My understanding (and I could be completely wrong) is that when you get hit by Panda for duplicate content and then try to recover, Google checks your website for the same duplicate content. It's still crawlable, all the links are still "followable", it's still scraped content, and you aren't telling crawlers that you took it from somewhere else (by canonicalizing). It's just not displayed in the SERPs. And yes, having 80% of the content set to noindex probably doesn't help either.

        So, I think what you need to do is either remove that duplicate content altogether, or use canonical links to the originals, or (bad idea, but it would work) block all those URLs in robots.txt (at least that way those pages become uncrawlable). All of these are still disreputable techniques, though, kind of like polishing dirt.
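To make the difference between these options concrete, here is a minimal sketch rather than a definitive implementation. The domain `news.example.com`, the `/syndicated/` folder, and the helper names are all hypothetical. It builds the two page-level tags and uses Python's standard `urllib.robotparser` to check what a robots.txt rule blocks.

```python
from html import escape
from urllib.robotparser import RobotFileParser


def noindex_tag() -> str:
    """Meta tag asking search engines not to index a page (it can still be crawled)."""
    return '<meta name="robots" content="noindex">'


def canonical_tag(original_url: str) -> str:
    """Link tag pointing a syndicated copy at the original article."""
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


# Hypothetical robots.txt that blocks a folder of syndicated stories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /syndicated/
"""


def is_crawlable(path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a generic crawler may fetch the path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", "https://news.example.com" + path)
```

One design point worth keeping in mind: a crawler that is blocked by robots.txt never fetches the page, so it cannot see a noindex or canonical tag placed on it. The two approaches do not combine well on the same URL.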

        Hope this makes sense.

        • EGOL
          EGOL last edited by

          If there are 500,000 pages of "news" then a lot of that content is "history" instead of "news".   Visitors are probably not consuming it.  People are probably not searching for it.  And actively visited pages on the site are probably not linking to it.

          So, I would use analytics to determine if these "history" pages are being viewed, are pulling in much traffic, have very many links, and I would delete and redirect them if they are not important to the site any longer.   This decision is best made at the page level.

          For "unique content" pages that appear only on my site, I would assess them at regular intervals to determine which ones are pulling in traffic and which ones are not.  Some sites place news in folders according to their publication dates and that facilitates inspecting old content for its continued value.   These pages can then be abandoned and redirected once their content is stale and not being consumed. Again, this can best be done at the page level.

          I used to manage a news section and every few months we would assess, delete and redirect, to keep the weight of the site as low as possible for maximum competitiveness.
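The page-level audit described above could be sketched roughly as follows. The field names, the thresholds, and the `/news/` redirect target are assumptions for illustration, not values from this thread; real numbers would come from your own analytics and link data.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    pageviews: int       # views in the review window, from analytics
    search_visits: int   # visits arriving from search engines
    inbound_links: int   # links from other websites


def audit_page(page, min_pageviews=50, min_search_visits=10, min_inbound_links=1):
    """Classify one page; the thresholds are hypothetical and should be tuned."""
    still_valuable = (
        page.pageviews >= min_pageviews
        or page.search_visits >= min_search_visits
        or page.inbound_links >= min_inbound_links
    )
    return "keep" if still_valuable else "delete_and_redirect"


def build_redirects(pages, target="/news/"):
    """Map every page that failed the audit to a single redirect target."""
    return {p.url: target for p in pages if audit_page(p) == "delete_and_redirect"}
```

A page is kept if it passes any one of the checks, which errs on the side of not deleting something that still earns links or traffic.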

          • DmitriiK
            DmitriiK @EGOL last edited by

            Good point! News gotta be new 🙂

            • CleverPhD
              CleverPhD last edited by

              Couple of things here.

              1. If a second Panda update has not occurred since the changes were made, then you may not yet have received credit for noindexing the content. I don't think this is "cheating": the noindex just tells Google to take 350K of the site's pages out of its index. The noindex is one of the best ways to get your content out of Google's index.

              2. If you have not spent time improving the non-syndicated content, then you are missing the more important part, which is improving the quality of the content that you have.

              A side point to consider here is your crawl budget. I am assuming that the site still links internally to these 350K pages, so users and bots will go to them and have to process them. This is mostly a waste of time. As all of these pages are out of Google's index thanks to the noindex tag, why not take out all internal links to those pages (i.e. from sitemaps, paginated index pages, menus, internal content) so that users and Google can focus on the quality content that is left over? I would then also 404/410 all those low-quality pages, as they are now out of Google's index and not linked internally. Why maintain the content?
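As a rough sketch of the cleanup suggested here, the snippet below drops retired URLs from an XML sitemap and picks the HTTP status to serve for them. The function names and the `retired_urls` set are hypothetical; how you actually return a 410 depends on your web server or CMS.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def prune_sitemap(sitemap_xml: str, retired_urls: set) -> str:
    """Return the sitemap XML with every <url> entry for a retired URL removed."""
    ET.register_namespace("", SITEMAP_NS)  # keep the default namespace on output
    root = ET.fromstring(sitemap_xml)
    for url_el in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url_el.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and loc.text and loc.text.strip() in retired_urls:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")


def status_for(url: str, retired_urls: set) -> int:
    """410 Gone for retired pages, 200 OK for everything else."""
    return 410 if url in retired_urls else 200
```

The same `retired_urls` set can drive both steps, so the sitemap and the server responses stay consistent with each other.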

              • CleverPhD
                CleverPhD @CleverPhD last edited by

                Just seeing the other responses.  Agree with what EGOL mentions.  A content audit would be even better to see if there was any value at all on those pages (GA traffic, links, etc).  Odds are though that there was not any and you already killed all of it with the noindex tag in place.

                • Lukas_TheCurious
                  Lukas_TheCurious @DmitriiK last edited by

                  Hi Dmitrii,

                  Thank you very much for your opinion. The idea of canonical links is very interesting. We may try that in the "first" phase. But I still don't see the point of paying for content that is not accessible from search engines.

                  Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERPs?

                  • DmitriiK
                    DmitriiK @Lukas_TheCurious last edited by

                    But I still don't see the point of paying for content that is not accessible from search engines.

                    • "Paying"?

                    Is my understanding right that if I set a canonical for these duplicates, Google has no reason to show these pages in the SERPs?

                    • Correct.
                    • Lukas_TheCurious
                      Lukas_TheCurious @EGOL last edited by

                      EGOL, your insights are much appreciated :-)!

                      I agree with you. Makes total sense.

                      So you didn't experience any problems removing outdated content (or "content with no traffic value") from your website? Did you set a 410 for those pages and remove all internal links to them, and was Google OK with that?

                      Redirecting useless content - do you mean setting a 301 to the most relevant page that is bringing in traffic?

                      Thank you sir 🙂

                      • Lukas_TheCurious
                        Lukas_TheCurious @DmitriiK last edited by

                        Yeah, paying ... we actually pay for this content (earlier management decisions :-))

                        • DmitriiK
                          DmitriiK @Lukas_TheCurious last edited by

                          Yikes! Will you guys still pay for it if it's removed? If so, then combining the other replies with my thoughts, I'd delete it, since it's old and no longer timely.

                          • EGOL
                            EGOL @Lukas_TheCurious last edited by

                            We deleted thousands of pages every few months.

                            Before deleting anything we identified valuable pages that continued to receive traffic from other websites or from search.  These were often updated and kept on the site.  Everything else was 301 redirected to the "news homepage" of the site.  This was not a news site, it was a very active news section on an industry portal site.

                            Did you set a 410 for those pages and remove all internal links to them, and was Google OK with that?

                            Our goal was to avoid internal links to pages that were going to be deleted.  Our internal "story recommendation" widgets would stop showing links to pages after a certain length of time.  Our periodic purges were done after that length of time.

                            We never used hard coded links in stories to pages that were subject to being abandoned.  Instead we simply linked to category pages where something relevant would always be found.

                            Develop a strategy for internal linking that will reduce site maintenance and focus all internal links to pages that are permanently maintained.
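The widget behaviour and the category-linking habit described above might look something like this sketch. The 90-day cutoff, the story dictionary keys, and the `/news/<category>/` URL pattern are assumptions made for illustration only.

```python
from datetime import date, timedelta


def split_by_age(stories, today, max_age_days=90):
    """Separate stories the widget may still recommend from those due for purge review."""
    cutoff = today - timedelta(days=max_age_days)
    recommend = [s for s in stories if s["published"] >= cutoff]
    purge_review = [s for s in stories if s["published"] < cutoff]
    return recommend, purge_review


def link_target(story):
    """Link to the permanent category page rather than to the story itself."""
    return f"/news/{story['category']}/"
```

Because the widget stops linking to a story before the story becomes a purge candidate, a periodic purge leaves no internal links pointing at deleted pages.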

                            • Lukas_TheCurious
                              Lukas_TheCurious @EGOL last edited by

                              Yeah, this strategy will definitely be part of the guidelines for the editors.

                              One last question: do you know of any good resources I can use for inspiration?

                              Thank you so much.

                              • Lukas_TheCurious
                                Lukas_TheCurious @CleverPhD last edited by

                                1. It has been almost a year now since the massive hit. After that, there were also some smaller hits 🙂

                                2. We are putting effort into improvements. That is quite frustrating for me, because I believe that our effort is being undone by old duplicated content (which makes up 80% of the website :-))

                                Yeah, we will need to take care of the link mess...
                                Thank you!
