The Moz Q&A Forum


    Locating Duplicate Pages

    On-Page / Site Optimization
    • ChrisHolgate

      Hi,

      Our website consists of approximately 15,000 pages; however, according to our Google Webmaster Tools account, Google has around 26,000 of our pages in its index.

      I have run half a dozen sitemap generators, and they all discover only the 15,000 pages that we know about. I have also gone through the site thoroughly looking for any sections where we might be inadvertently generating duplicate pages, without success.

      It has been over six months since we made any structural changes (at which point we 301-redirected the old URLs to their new locations), so I'd like to think that most of these old pages have been removed from the Google index. However, the number of pages in the index doesn't appear to be going down by any discernible amount week on week.

      I'm sure it's nothing to worry about, but for my own peace of mind I'd like to confirm that the additional 11,000 pages are just old results that will eventually disappear from the index, and that we're not generating any duplicate content.

      Unfortunately, there doesn't appear to be a way to download a list of the 26,000 pages that Google has indexed so that I can compare it against our sitemap. I know about site:domain.com, but it only returns the first 1,000 results, all of which check out fine.

      Does anybody know of any methods or tools we could use to identify these 11,000 extra pages in the Google index, so we can confirm that they're just old pages which haven’t fallen out of the index yet and that they’re not going to cause us a problem?

      Thanks guys!
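There is no full export of Google's index to compare against, but if you can gather the indexed URLs you do see (for example, by combining several site: queries) into a plain-text file, the comparison itself is a simple set difference. A minimal Python sketch, assuming a standard sitemaps.org XML sitemap and a hypothetical one-URL-per-line export; the function names are illustrative and not part of any Google tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org XML sitemaps
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Return every <loc> URL listed in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def indexed_urls(text: str) -> set[str]:
    """Parse a hypothetical one-URL-per-line list of indexed pages."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def extra_indexed_pages(sitemap_xml: str, indexed_text: str) -> list[str]:
    """URLs seen in the index list but missing from the sitemap."""
    return sorted(indexed_urls(indexed_text) - sitemap_urls(sitemap_xml))
```

Anything this reports is a candidate to check by hand: an old URL that should already 301, a parameter variant, or a genuinely duplicated page.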

      • Travis-W

        Be sure to try different combinations of 'site:www.domain.com' and 'site:domain.com'. They can return different results.

        It sounds to me like you probably have an internal search engine that generates results pages based on the search term, and each distinct results page is a piece of duplicate content.

        • ahead4

          Does your website force 'www.'?

          yourdomain.com and www.yourdomain.com are separate hostnames, so each can have different pages spidered.
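To see whether both hostnames have ended up in the index, one option is to map every indexed URL onto the www. host and look for collisions. A minimal Python sketch, assuming the www. version is the preferred host; force_www and host_duplicates are hypothetical helper names.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def force_www(url: str) -> str:
    """Rewrite a URL onto the www. host (assumed to be the preferred host)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lower-cased; any user:pass@ is dropped
    if not host.startswith("www."):
        host = "www." + host
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def host_duplicates(urls):
    """Group URLs that differ only by the presence of the www. prefix."""
    groups = defaultdict(set)
    for url in urls:
        groups[force_www(url)].add(url)
    # Keep only keys reached from more than one distinct indexed URL
    return {key: found for key, found in groups.items() if len(found) > 1}
```

The same force_www mapping is what a "force www." 301 redirect would send each non-www URL to.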

          • Asher

            It could be that your tags and categories are treated as individual pages, each with its own permalink, e.g. http://www.example.com/keyword, http://www.example.com/tag/keyword and http://www.example.com/category/keyword. Another approach would be to check the sitemaps you have in Webmaster Tools and compare them with each other. Just a suggestion.
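A quick way to test this theory against a list of indexed URLs is to strip the taxonomy prefix and see which paths collapse onto the same slug. A minimal Python sketch, assuming the hypothetical /tag/ and /category/ prefixes from the example URLs above; it compares paths only and ignores the host.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical taxonomy prefixes, taken from the example URLs above
TAXONOMY_PREFIXES = ("tag", "category")


def slug_key(url: str) -> str:
    """Reduce /tag/keyword and /category/keyword to /keyword (path only)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > 1 and segments[0] in TAXONOMY_PREFIXES:
        segments = segments[1:]
    return "/" + "/".join(segments)


def slug_collisions(urls):
    """Return slugs reachable through more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[slug_key(url)].append(url)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}
```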

            • MikeRoberts

              Have you checked for instances where a URL with a parameter is being seen as a separate version of the same page? One of the sites I work for had an issue a few months back where every instance of a product page was being flagged as duplicate content because of an oversight. We had one of our coders add a clause to the page so that whenever it loaded with a parameter such as ?color=72, it set the canonical URL to the same page without the parameter. This reduced our duplicate content warnings quickly and effectively.
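The exact code on Mike's site isn't described. A minimal Python sketch of the same idea, assuming every query parameter on a product page is presentational and can simply be dropped; canonical_url and canonical_link_tag are hypothetical names.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment, e.g. /product?color=72 -> /product."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'
```

Stripping every parameter is deliberately blunt; a page whose parameters change the content (pagination, for instance) would need a list of parameters to keep instead.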

              • TommyTan

                Hi Chris,

                Google Webmaster Tools has a feature that helps identify duplicate HTML, and maybe you can use it to see whether the 11,000 pages are duplicates. If they are, I would assume they have duplicate title tags etc., which the tool may discover.
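If you crawl your own pages into a URL-to-title mapping, a rough local version of the duplicate-title check is just a grouping. A minimal Python sketch, assuming such a mapping already exists; this is not how the Webmaster Tools report itself works, and the whitespace and case normalization is a hypothetical choice.

```python
from collections import defaultdict


def duplicate_titles(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each title shared by two or more URLs to the sorted list of those URLs."""
    by_title = defaultdict(list)
    for url, title in pages.items():
        # Normalize whitespace and case so trivial differences don't hide duplicates
        key = " ".join(title.split()).lower()
        by_title[key].append(url)
    return {title: sorted(urls) for title, urls in by_title.items() if len(urls) > 1}
```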

                • ChrisHolgate
                  Replying to @ahead4

                  Thanks for your reply. Our website does indeed force www. when someone attempts to navigate to us without the www. prefix.

                  • ChrisHolgate
                    Replying to @Travis-W

                    I'm not certain that's the case, as our search engine requires a user to type in a query before it returns any results. I don't know if it helps, but the URL is http://bit.ly/4Cogchww if you fancy taking a look 🙂

                    • ChrisHolgate
                      Replying to @MikeRoberts

                      We did have this issue at the beginning of the year, when we used ?dispmode=grid and ?dispmode=list parameters to change how our results were displayed. This has since been rectified: we removed the option completely, and any URL containing dispmode now forces a 301 to the correct master page. There are still a few hundred dispmode URLs in the Google index, but 99% of them have now fallen out.

                      I have checked and double checked and we don't seem to have any issues like this at present.
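The site itself is PHP and its redirect code isn't shown. The rule Chris describes can be sketched in Python as a function that returns the 301 target for any URL carrying dispmode and None otherwise; preserving the other query parameters is an assumption, and dispmode_redirect is a hypothetical name.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dispmode_redirect(url: str) -> Optional[str]:
    """Return the 301 target if the URL carries a dispmode parameter, else None."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in params if key != "dispmode"]
    if len(kept) == len(params):
        return None  # no dispmode present: serve the page normally
    # Rebuild the master URL without dispmode (other parameters assumed worth keeping)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

A permanent (301) redirect rather than a canonical tag is what lets the old dispmode URLs drop out of the index over time, as the thread reports.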

                      • ChrisHolgate
                        Replying to @Asher

                        Many thanks for your response. Our site is an eCommerce site that doesn't use tags as such, and our categories are all accounted for in the 15,000-page figure.

                        • ChrisHolgate
                          Replying to @TommyTan

                          Many thanks for your input on this. I have actually looked at this in the HTML Improvements section of GWMT; however, it shows only a few dozen duplicated titles/descriptions, and these are simply due to product categories being almost identical (for example, HP Deskjet 500 and HP Deskjet 500+).

                          • Travis-W
                            Replying to @ChrisHolgate

                            I can't get that link to work.

                            What I said before still applies to typed-in queries (that is what I was assuming when I said it).

                            For example, a user enters the words "snakes and dogs" and clicks Search. The new URL is "www.yoursite.com/search?q=snakes+and+dogs". All of these search results pages need noindex meta tags, or Google may flag them as duplicate content because, for example, this page and the results page for "dogs and snakes" are almost the same.

                            Does that make sense?
                            Google's Webmaster Guidelines also recommend keeping search results pages like these out of Google's index.
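A minimal Python sketch of Travis's suggestion, assuming the search results live at the hypothetical /search path from his example; a real site would emit this tag from its page template, in whatever language the site uses.

```python
from urllib.parse import urlsplit

# Hypothetical path of the internal search results page, from the example above
SEARCH_PATH = "/search"


def robots_meta(url: str) -> str:
    """Return a robots meta tag: noindex for search results, index otherwise."""
    if urlsplit(url).path == SEARCH_PATH:
        # Keep the page out of the index but still let crawlers follow its links
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'
```

Because the decision depends only on the path, "snakes and dogs" and "dogs and snakes" both get noindex, however many query variants users type.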

                            • ChrisHolgate
                              Replying to @ChrisHolgate

                              Sorry, I'm not sure what happened to that bit.ly address. The actual address of the website is www.refreshcartridges.co.uk.

                              Ah, I see what you mean about the search results now. Hopefully this shouldn't be an issue, though: for security reasons (our web guy said something about injections), the URL returned is always http://www.refreshcartridges.co.uk/advanced_search_result.php, regardless of what is searched for.

                              Thanks again!

                              • Travis-W
                                Replying to @ChrisHolgate

                                Hmm, I'm not too knowledgeable about PHP pages. Sorry!

                                • ChrisHolgate
                                  Replying to @ChrisHolgate

                                  It's cool. Sorry, the point I was making is that whatever you search for, the page returned is http://www.refreshcartridges.co.uk/advanced_search_result.php (with nothing after the .php), so the search results page couldn't spawn multiple pages that Google could index.

                                  © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.