Can I robots.txt an entire site to get rid of duplicate content?
-
I am in the process of implementing Zendesk and will have two separate Zendesk sites with the same content to serve two separate user groups for the same product (B2B and B2C). Zendesk does not give me the option to change canonicals or meta tags. If I block one of the Zendesk sites with robots.txt, will that cover me for duplicate content with Google? Is that a good option? Is there a better option?
I will also have to change some of the canonicals on my site (mysite.com) to point to the Zendesk canonicals (zendesk.mysite.com) to avoid duplicate content. Will I lose ranking by pointing the established page canonicals on my site to the new subdomain (the only option offered through Zendesk)?
Thank you.
-
Hi there!
Just for clarification, I'm really not sure what you mean by "robots.txt-ing" the site. Do you mean, should you use robots.txt to block crawlers from accessing the entire site? That would be fine, if you're not concerned about that site never ranking, ever.
-
Thank you. I do mean use robots.txt to block crawlers.
-
Hi,
"I do mean use robots.txt to block crawlers."
What you need to do is noindex the site in question first and then, after a period of time, disallow it via robots.txt.
The reason you do it this way is that pages from this site are already indexed in Google, and these need to be removed first. You can either add the noindex meta tag and wait for Google to crawl the site and act on all of the noindex directives or, to speed things up, noindex the pages and then request their removal with Webmaster Tools.
If you skip the noindex step and go straight to robots.txt, you are just blocking Google from ever crawling the site, so it never gets to see a noindex. You will probably find that the pages remain in the index, which you don't want as this is duplicate content.
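To illustrate why the order matters, here is a rough Python sketch (standard library only) that checks whether a crawler would actually get to see a page's noindex tag. The function names, the sample robots.txt strings, and the article path under zendesk.mysite.com are made up for the example, and real crawlers may interpret rules a little differently from Python's parser, so treat it as an approximation rather than a description of how Google works.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """True if the page carries a robots meta tag containing noindex."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def crawler_can_fetch(robots_txt, url, user_agent="Googlebot"):
    """True if the given robots.txt text lets the crawler request the URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def noindex_will_be_seen(robots_txt, url, html):
    """A noindex only helps if the crawler is allowed to fetch the page."""
    return crawler_can_fetch(robots_txt, url) and has_noindex(html)


# Hypothetical inputs for illustration only.
OPEN_ROBOTS = "User-agent: *\nDisallow:\n"       # nothing blocked
BLOCKING_ROBOTS = "User-agent: *\nDisallow: /\n"  # whole site blocked
PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'
URL = "https://zendesk.mysite.com/hc/some-article"

print(noindex_will_be_seen(OPEN_ROBOTS, URL, PAGE))
print(noindex_will_be_seen(BLOCKING_ROBOTS, URL, PAGE))
```

With the open robots.txt the noindex is reachable. With the blocking one it is not, which is the situation where already indexed pages tend to stay in the index.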
-Andy
-
What if the site is not live yet?
-
Just disallow it in robots.txt. No need to do anything else.
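For reference, a site-wide disallow is only two lines. The sketch below holds them in a Python string and uses the standard library's robots.txt parser to sanity-check that a couple of sample URLs are blocked. The helper name and the sample URLs are assumptions for the example, and the parser is only an approximation of how a real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Contents of a robots.txt that asks every crawler to stay out of the whole site.
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def is_fully_blocked(robots_txt, sample_urls, user_agent="Googlebot"):
    """True if none of the sample URLs may be fetched under these rules."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not any(rules.can_fetch(user_agent, url) for url in sample_urls)


# Hypothetical URLs on the subdomain mentioned in the question.
samples = ["https://zendesk.mysite.com/", "https://zendesk.mysite.com/hc/en-us"]
print(is_fully_blocked(DISALLOW_ALL, samples))
```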
-Andy