The Moz Q&A Forum

    Duplicate pages in Google index despite canonical tag and URL Parameter in GWMT

    Technical SEO Issues
    • Tinhat

      Good morning Moz...

      This is a weird one. It seems to be a "bug" with Google, honest...

      We migrated our site www.three-clearance.co.uk to a Drupal platform over the New Year. The old site used URL-based tracking for heat-map purposes, so, for instance,

      www.three-clearance.co.uk/apple-phones.html

      ...could also be reached via

      www.three-clearance.co.uk/apple-phones.html?ref=menu or

      www.three-clearance.co.uk/apple-phones.html?ref=sidebar and so on.

      GWMT had been told about the ref parameter, and the canonical tag was used to indicate our preferred URL. As expected, we encountered no duplicate content issues and everything was good.
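
A minimal sketch of how the ?ref= variants are meant to collapse onto one canonical URL. It is written in Python purely for illustration (the new site runs Drupal, so the real tag would come from its templates or a module). The canonical tag itself is a <link rel="canonical"> element in the page head. The sketch assumes `ref` is the only tracking-only parameter and that pages are served over http://; `canonical_url` and `canonical_link_tag` are hypothetical names.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "ref" is the only parameter that exists purely for tracking
# and never changes what the page shows.
TRACKING_PARAMS = {"ref"}


def canonical_url(url: str) -> str:
    """Return the URL with tracking parameters (and any fragment) removed,
    keeping every other query parameter in its original order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element; escape() turns "&" into "&amp;"
    so the href is a valid HTML attribute value."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'


if __name__ == "__main__":
    for u in ("http://www.three-clearance.co.uk/apple-phones.html?ref=menu",
              "http://www.three-clearance.co.uk/apple-phones.html?ref=sidebar"):
        print(canonical_link_tag(u))
```

Both tracking variants produce the same tag, which is the signal the canonical is supposed to give Google.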

      This is the chain of events:

      1. Site migrated to the new platform following best practice, as far as I can tell.

      2. The only known issue was that the verification for both Google Analytics (meta tag) and GWMT (HTML file) didn't transfer as expected, so between the relaunch on 22nd Dec and the fix on 2nd Jan we have no GA data, and presumably there was a period when GWMT became unverified.

      3. URL structure and URIs were maintained 100% (which may now be a problem).

      4. Yesterday I discovered 200-ish 'duplicate meta titles' and 'duplicate meta descriptions' in GWMT. Uh oh, thought I. I expanded the report and the duplicates are in fact ?ref= versions of the same root URL. Double uh oh, thought I.

      5. Ran, not walked, to Google and did some Google-fu:

      http://is.gd/yJ3U24 (9 versions of the same page, in the index, the only variation being the ?ref= URI)

      Checked Bing, and it has indexed each root URL once, as it should.
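
To see which indexed URLs are really the same page, a rough sketch like the following groups a list of URLs by their ref-stripped form and keeps only the groups with more than one variant, which is roughly what the duplicate-title report surfaced. The input list is an assumption (for instance, URLs copied by hand from the site: search results); `strip_tracking` and `duplicate_groups` are hypothetical helper names, and `ref` is assumed to be the only tracking parameter.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "ref" is the only tracking-only query parameter.
TRACKING_PARAMS = {"ref"}


def strip_tracking(url: str) -> str:
    """Drop tracking parameters (and any fragment), keeping all others."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def duplicate_groups(indexed_urls):
    """Map each canonical URL to the indexed variants that resolve to it,
    returning only groups where more than one variant is indexed."""
    groups = defaultdict(list)
    for url in indexed_urls:
        groups[strip_tracking(url)].append(url)
    return {canon: urls for canon, urls in groups.items() if len(urls) > 1}
```

Each key in the result is a page Google is holding more than once; the list beside it shows which ?ref= variants still need to drop out.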

      Situation now:

      1. The site no longer uses the ?ref= parameter, although of course some external backlinks still use it. This was intentional and happened when we migrated.

      2. I 'reset' the URL parameter in GWMT yesterday, given that there's no "delete" option. The "URLs monitored" count went from 900 to 0, but today it is at over 1,000 (another wtf moment).

      I also resubmitted the XML sitemap and fetched 5 'hub' pages as Google, including the homepage and HTML site-map page.

      3. The ?ref= URLs in the index have the disadvantage of actually working: we kept the URL structure, and the web server simply ignores the unrecognised argument and serves the page. So I assume Google thinks the pages still exist and won't drop them from the index, but will instead apply a dupe content penalty. Or maybe call us a spam farm. Who knows.

      Options that occurred to me (other than maybe making our canonical tags bold or locating a Google bug submission form 😄) include:

      A) Blocking the ?ref= URLs in robots.txt, but to me this says "you can't see these pages", not "these pages don't exist", so it isn't the right signal

      B) Hand-removing the URLs from the index with one page removal request per indexed URL

      C) Applying a 301 redirect to each indexed URL (hello, Bing dirty-sitemap penalty)

      D) Posting on SEOMoz, because I genuinely can't understand this.
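
Option C could look something like the following sketch: Python WSGI middleware that answers any request carrying `ref` with a 301 to the same path minus that parameter. It only illustrates the logic and would not drop into a Drupal site, where the equivalent would normally live in the web server's rewrite rules or a Drupal module. `StripTrackingRedirect` is a hypothetical name, `ref` is assumed to be the only parameter to strip, and any other query parameters are preserved.

```python
from urllib.parse import parse_qsl, urlencode
from wsgiref.util import request_uri

# Assumption: "ref" is the only tracking-only parameter to strip.
TRACKING_PARAMS = {"ref"}


class StripTrackingRedirect:
    """WSGI middleware: 301-redirect any request carrying a tracking
    parameter to the same URL without it; pass everything else through."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        pairs = parse_qsl(environ.get("QUERY_STRING", ""), keep_blank_values=True)
        kept = [(k, v) for k, v in pairs if k not in TRACKING_PARAMS]
        if len(kept) == len(pairs):
            # No tracking parameter present: serve the page as normal.
            return self.app(environ, start_response)
        # Rebuild the absolute URL without its query string, then re-attach
        # only the parameters that actually affect the page.
        location = request_uri(environ, include_query=False)
        if kept:
            location += "?" + urlencode(kept)
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]
```

Wrapping an existing WSGI app as `StripTrackingRedirect(app)` leaves requests without `ref` untouched, so only the tracking variants change behaviour, and crawlers following old backlinks land on the clean URL.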

      Even if the gap in verification caused GWMT to forget that we had set ?ref= as a URL parameter, the parameter was no longer in use, because the verification only went missing when we relaunched the site without this tracking. Google seems to be ignoring both our canonical tags and the GWMT URL parameter setting; I have no idea why and can't think of the best way to correct the situation.

      Do you? 🙂

      Edited to add: As of this morning the "edit/reset" buttons have disappeared from the GWMT URL Parameters page, along with the option to add a new one. There are no messages explaining why, and of course the Google help page doesn't mention disappearing buttons (it doesn't even explain what 'reset' does, or why there's no 'remove' option).

      • Vizergy

        They aren't in your XML sitemap, are they? You probably generated a new one when you moved the site over... that could possibly be overriding the parameter settings... maybe... weird...

        • Tinhat
          Tinhat @Vizergy

          Nope, nice clean sitemap that GWMT says provides the right number of URLs, with no 404s and no ?ref= links.
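
A quick way to double-check that claim is to parse the sitemap and list any <loc> whose query string contains `ref`. The sketch assumes a standard sitemaps.org <urlset> file already loaded into a string; `urls_with_param` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Namespace used by sitemaps.org <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_with_param(sitemap_xml, param="ref"):
    """Return every <loc> URL in the sitemap whose query string has `param`."""
    root = ET.fromstring(sitemap_xml)
    hits = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # keep_blank_values=True also catches bare "?ref" or "?ref=".
        if param in parse_qs(urlsplit(url).query, keep_blank_values=True):
            hits.append(url)
    return hits
```

An empty result is what a clean sitemap should give; anything returned is a ?ref= URL being handed to Google directly.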

          It's like Google has always indexed these links separately but for some reason has decided to show them only now that they no longer exist.

          • Tinhat

            Monday morning, still the same: still no reset/add parameter buttons in GWMT, and still no understanding of why Google is being so stubborn about this.

            Sample: http://www.google.co.uk/search?q=site:three-clearance.co.uk+"Beats+Audio,+the+Sensation+XL"&num=30&hl=en&client=safari&tbo=d&rls=en&filter=0&biw=1920&bih=915

            3 identical pages in the index, with Google ignoring both the GWMT URL parameter and the canonical meta tag.

            Sigh.

            • Dr-Pete

              GWT numbers sometimes ignore parameter handling, oddly, and can be hard to read. I'm only seeing about 40 indexed pages with "ref" in the URL, which hardly seems disastrous.

              One note: once the pages get indexed, for whatever reason, de-indexing can take weeks, even if you do everything correctly. Don't change tactics every couple of days, or you're only going to make this worse long-term. I think canonicals are fine for this, and they should be effective. It just may take Google some time to re-crawl and dislodge the pages.

              You may actually want to create an XML sitemap (for Google only) that contains just the "ref=" pages Google has indexed. This can nudge them to re-crawl and honor the canonical. Otherwise, the pages could sit there forever.

              You could 301-redirect; it would be perfectly valid in this case, since those URLs have no value to visitors. I wouldn't worry about the Bing sitemaps: just don't include the "ref=" URLs in the Bing maps, and you'll be fine.
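
The Google-only sitemap suggested above can be produced with Python's standard library, as in this minimal sketch. The list of indexed ?ref= URLs is an assumption you would have to supply (for example, gathered from a site: search); `build_sitemap` is a hypothetical name, and only the required <loc> element is written for each entry.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> bytes:
    """Return a UTF-8 sitemap document with one <url><loc> entry per URL."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes "&" in query strings as "&amp;", as XML requires.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    # default_namespace writes xmlns="..." on <urlset> instead of an ns0: prefix.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True,
                       default_namespace=SITEMAP_NS)


if __name__ == "__main__":
    # Hypothetical input: the ?ref= URLs Google still shows as indexed.
    indexed_ref_urls = [
        "http://www.three-clearance.co.uk/apple-phones.html?ref=menu",
        "http://www.three-clearance.co.uk/apple-phones.html?ref=sidebar",
    ]
    print(build_sitemap(indexed_ref_urls).decode("utf-8"))
```

Keeping this file separate from the main sitemap means it can be submitted only in GWMT and left out of whatever is given to Bing, in line with the advice above.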

              © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.