Removing Parameterized URLs from Google Index
-
We have duplicate eCommerce websites, and we are in the process of implementing cross-domain canonicals. (We can't 301 - both sites are major brands). So far, this is working well - rankings are improving dramatically in most cases.
However, what we are seeing in some cases is that Google has indexed a parameterized page for the site being canonicaled (this is the site that is getting the canonical tag - the "from" page). When this happens, both sites are being ranked, and the parameterized page appears to be blocking the canonical.
The question is, how do I remove canonicaled pages from Google's index? If Google doesn't crawl the page in question, it never sees the canonical tag, and we still have duplicate content. Example:
A. www.domain2.com/productname.cfm%3FclickSource%3DXSELL_PR is ranked at #35, and
B. www.domain1.com/productname.cfm is ranked at #12.
(Yes, I know that upper case is bad. We fixed that too.)
Page A has the canonical tag, but page B's rank didn't improve. I know that there are no guarantees that it will improve, but I am seeing a pattern.
Page A appears to be preventing Google from passing link juice via the canonical. If Google doesn't crawl Page A, it can't see the rel=canonical tag. We likely have thousands of pages like this. Any ideas? Does it make sense to block the "clickSource" parameter in Google Webmaster Tools (GWT)? That kind of scares me.
-
Duplicate content is an issue because Google will eventually drop one of the duplicates from its index. If both page A and page B are canonicalized to page B, Google will eventually drop page A in favor of page B.
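To make "canonicalized to page B" concrete for the parameterized URL in the question, here is a minimal Python sketch of how the canonical target for any variant could be computed. The helper name `canonical_url`, the `TRACKING_PARAMS` set, and the default `https` scheme are my assumptions for illustration, not something taken from either site's actual setup.

```python
from urllib.parse import parse_qsl, unquote, urlencode, urlsplit, urlunsplit

# Assumption: parameters that only track clicks and never change page content.
TRACKING_PARAMS = {"clicksource"}
# Host of the site that should rank (the "to" side of the canonical).
CANONICAL_HOST = "www.domain1.com"


def canonical_url(raw_url, default_scheme="https"):
    """Return the URL a rel=canonical tag on this variant should point to."""
    # The indexed URL in the question has its "?" and "=" percent-encoded,
    # so decode it first to recover the real query string.
    decoded = unquote(raw_url)
    if "://" not in decoded:
        decoded = f"{default_scheme}://{decoded}"
    parts = urlsplit(decoded)
    # Keep every parameter except the tracking ones (compared case-insensitively).
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, urlencode(kept), ""))


print(canonical_url("www.domain2.com/productname.cfm%3FclickSource%3DXSELL_PR"))
# prints https://www.domain1.com/productname.cfm
```

Every parameterized variant of page A should end up with the same canonical target as the clean URL. If a variant produces a different target, Google gets conflicting signals.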
If you are concerned that page A is not being crawled, you can use Fetch as Google to ask Google to recrawl it. Either way, if A is not being crawled and B is, B will be chosen over A in the results pages.
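Before asking for a recrawl, it may be worth confirming that the parameterized version of page A actually serves the canonical tag. Below is a small sketch using only Python's standard library. It parses HTML you have already downloaded, and the `CanonicalFinder` class and `find_canonical` function are hypothetical names of my own.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link matters; ignore everything else.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, in any letter case.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


sample = '<html><head><link rel="canonical" href="http://www.domain1.com/productname.cfm"></head></html>'
print(find_canonical(sample))
# prints http://www.domain1.com/productname.cfm
```

If this returns None for the parameterized URL but not for the clean one, the tag is probably being dropped when the parameter is present, and that would explain the pattern you describe.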