Rel canonical and duplicate subdomains
-
Hi,
I'm working with a site that has multiple subdomains of entirely duplicate content. The production-level site that visitors see is (a made-up illustrative example):
Then, there are subdomains that different developers use to work on their own changes to the production site before those changes are pushed to production:
Google ends up indexing these duplicate subdomains, which is of course not good.
If we add a canonical tag to the head section of the production pages (and therefore to all of the duplicate subdomains), will that cause some kind of problem? Is it okay for a page's canonical tag to point to that same page?
To complete the example...
In this example, where our production site is 123abc456.edu, the canonical tag on all pages (the production pages and therefore the duplicate subdomains) would be:
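A minimal sketch of what that self-referencing tag might look like, assuming the production site is served at 123abc456.edu (the exact URL for each page is illustrative):

```html
<!-- In the <head> of every page, on production and on each dev
     subdomain alike, always pointing at the production URL for
     that page (the homepage is shown here as an example). -->
<link rel="canonical" href="https://123abc456.edu/">
```

Because the dev subdomains are exact copies of production, the same tag on Moe.123abc456.edu would point back to the production version of each page.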
Is that going to fix this without causing some new problem from a canonical tag pointing to the page it's on?
Thanks!
-
Assuming that you do not need the development environments indexed in Google, why not simply block all crawlers on those subdomains?
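For example, a minimal robots.txt served at the root of each development subdomain (using the Moe subdomain mentioned later in this thread) might look like:

```text
# robots.txt at Moe.123abc456.edu/robots.txt
# Blocks all compliant crawlers from the entire dev subdomain.
User-agent: *
Disallow: /
```

Note this only stops future crawling; it does not by itself remove pages that are already in the index.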
-
Hi Bob,
Thanks for the suggestion/question. I'm considering that, but wouldn't adding a robots "do not crawl" rule to pages that are already indexed be a little like closing the barn door after the horse has left? Do you think it would de-index the already-crawled subdomains? Thanks!
-
This should be exactly what you need: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1663427
-
Well, Bob, it looks like you're right! I guess it will for sure see all the pages in
as the ones to remove and not
Also, how do we keep that robots.txt from getting pushed to production when the developer working on that branch completes his work and pushes it up?
I must confess, it still feels a little like bomb disposal.
-
Is the subdomain data stored on the server as directories?
So for example, is the Moe.123abc456.edu data stored in a folder like 123abc456.edu/Moe?
If so, you can simply have one robots.txt on your root domain blocking those directories:
User-agent: *
Disallow: /Moe/
-
Hi Bob,
That's an excellent question that I'll have to look into and confirm. More later. Thanks!