Rel=canonical on pre-migration website
-
I have an e-commerce client that is migrating platforms. The structure of their existing website has led to what I believe is mass duplicate content. They have something north of 150,000 indexed URLs. However, 143,000+ of these have query strings, and their content is identical to pages without any query string. Even so, the site does pretty well from an organic standpoint compared to many of its direct competitors.
Here are my questions:
(1) I am assuming that I should go into WMT (Google/Bing) and tell both search engines to ignore query strings.
(2) In a review of backlinks, it does appear that there is a mishmash of good incoming links to both the clean and the dirty URLs. Should I add a rel=canonical via a script to all the pages with query strings before we make our migration and allow the search engines some time to process?
(3) I'm assuming I can continue to watch the indexation of the URLs, but should I also tell search engines to remove the dirty URLs?
(4) Should I do Fetch in WMT? And if so, in what sequence should I do 1-4?
How long should I wait between doing the above and undertaking the migration?
-
Hi there
Any query string URLs should be canonicalized to their static URL. You can also tell Google how to handle these URLs in Search Console. I wouldn't tell Google to ignore them, however.
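As a rough illustration of what "canonicalized to their static URL" means, here is a minimal Python sketch. It assumes every query parameter on the old site is non-content (sorting, tracking, session IDs); if some parameters do change what the page shows, they would need to be kept. The function names and the example path are hypothetical, not anything from the client's platform.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


# Hypothetical dirty URL on the old site.
print(canonical_link_tag("https://www.realsite.com/widgets/blue?sort=price&sessionid=123"))
```

Every query string variant of a page ends up pointing at the same clean URL, which is the signal the search engines use to consolidate the duplicates.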
Do not tell Google to remove the dirty URLs - if you have a canonical tag and review Search Console, you will be fine - that's what these tags and resources are for.
Your test site should be noindexed and blocked by robots.txt so it's not being picked up by crawlers. You should make sure your pages are canonicalized to the proper URLs well before migration. Also, make sure you review Google's migration resources.
Hope this helps! Good luck!
-
Thanks -
I'm not terribly worried about the test site, as we use a password-protected and IP-blocked development domain that is completely different from the root domain. It's not even a subdomain, e.g. www.realsite.com and www.testdomain.com.
My dev team is trying to get me to wait and just do a massive 301 redirect at launch, pointing the old site's query string URLs at the new pages (many:1), instead of adding the canonical tags now. The new site won't create the query string issue.
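To make the many:1 idea concrete, this is roughly how I picture the redirect lookup working. It is only a sketch: the path mapping, the function name, and the assumption that the new site is served over https on the same host are all mine, not something the dev team has specified.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from old clean paths to new-platform paths.
OLD_TO_NEW_PATH = {
    "/widgets/blue": "/products/blue-widget",
    "/widgets/red": "/products/red-widget",
}


def redirect_target(old_url, new_host="www.realsite.com"):
    """Return the 301 target for an old URL, ignoring its query string.

    Returns None when the old path has no mapping, so unmapped URLs can be
    reviewed by hand instead of being redirected blindly.
    """
    old_path = urlsplit(old_url).path
    new_path = OLD_TO_NEW_PATH.get(old_path)
    if new_path is None:
        return None
    return f"https://{new_host}{new_path}"


# Several dirty variants collapse onto one new page (many:1).
for url in (
    "http://www.realsite.com/widgets/blue",
    "http://www.realsite.com/widgets/blue?sort=price",
    "http://www.realsite.com/widgets/blue?sessionid=123",
):
    print(url, "->", redirect_target(url))
```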
The challenge I see is that the 150,000+ indexed URLs really should be around 7,000, so the organic value of the real 7,000 pages (other than possibly the root domain) is probably getting punished, even though the site is doing decently well.
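For anyone wanting to check the same thing on their own site, the count can be reproduced from a crawl or index export with a small script along these lines. This is a sketch that assumes the export is a plain list of URLs and that the query string never changes the page content; the sample URLs are made up.

```python
from collections import Counter
from urllib.parse import urlsplit


def clean_url(url):
    """Strip the query string and fragment from a URL."""
    return urlsplit(url)._replace(query="", fragment="").geturl()


def summarize(indexed_urls):
    """Return (total URLs, distinct clean URLs, most duplicated clean URLs)."""
    variants = Counter(clean_url(u) for u in indexed_urls)
    return len(indexed_urls), len(variants), variants.most_common(3)


# Made-up sample standing in for the real export.
sample = [
    "https://www.realsite.com/widgets/blue",
    "https://www.realsite.com/widgets/blue?sort=price",
    "https://www.realsite.com/widgets/blue?sessionid=123",
    "https://www.realsite.com/widgets/red?sort=name",
]
total, unique, worst = summarize(sample)
print(f"{total} indexed URLs collapse to {unique} clean URLs")
print(worst)
```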