Rel=canonical - Identical .com and .us Version of Site
-
We have a .us and a .com version of our site, and we direct customers to one or the other based on their location relative to our servers. This is not changing for the foreseeable future.
We had restricted Google from crawling the .us version of the site and all was fine until I started to see the https version of the .us appearing in the SERPs for certain keywords we keep an eye on.
The .com still exists and is sometimes directly above or under the .us. It is occasionally a different page on the site with similar content to the query, or sometimes it just returns the exact same page for both the .com and the .us results. This has me worried about duplicate content issues.
The question(s): Should I just stop the https version of the .us from being crawled/indexed and leave it at that, or should I work to get a rel=canonical set up from every .us page to its .com equivalent (making the .com the canonical version)? Are there any major pitfalls I should be aware of with a rel=canonical across the entire domain? Both the .us and .com are identical, and these newly crawled/indexed .us pages sometimes rank pretty nicely.
Side question: Have any ecommerce guys noticed that Googlebot has started to crawl/index and serve up https versions of your URLs in the SERPs, even if the only way to get to those versions of the pages is to either type the https:// into the URL yourself or go through a sign-in or checkout page? Is Google, in the wake of their "https everywhere" push and potentially making it a ranking signal, forcing a check for the https version of any given URL and choosing to index that?
I just can't figure out how it is even finding those URLs to index if it isn't seeing http://www.example.com and then adding the https:// itself and checking...
Help/insight on either point would be appreciated.
-
I would set sitewide canonicals on both versions pointing to the .com site, so every .us page and every .com page declares its .com URL as canonical. I wouldn't block any pages, since people might still stumble onto the .us version and link to it.
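As a rough illustration of the sitewide canonical idea, here is a minimal Python sketch that maps any requested URL (on either domain, http or https) to a canonical tag pointing at the .com. The host www.example.com, the choice of http as the canonical scheme, and the function names are all assumptions for the example, not anything known about the actual site.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: the canonical host and scheme of the .com site.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.example.com"


def canonical_url(request_url: str) -> str:
    """Return the .com equivalent of a URL requested on either domain."""
    parts = urlsplit(request_url)
    # Keep the path and query string, drop any fragment,
    # and force the assumed canonical scheme and host.
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, "")
    )


def canonical_link_tag(request_url: str) -> str:
    """Build the <link> element to place in the page's <head>."""
    href = escape(canonical_url(request_url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_link_tag("https://www.example.us/widgets?color=red"))
```

The same function would run on both sites, so the .com pages end up with self-referencing canonicals and the .us pages point across to the .com.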
I'm not positive that Google auto-checks the https versions of websites without any direction, but it is plausible. I know a common way that Google finds https URLs is by reaching the "My Account" or "My Cart" page, which is https. Any relative URLs on that page then resolve to https instead of http, so Google re-crawls all of those. Maybe that's what is happening on your end?
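To show why one https page can expose a whole https copy of the site, here is a small Python sketch of how relative links resolve against the page they appear on. The URLs and the resolve_links helper are made up for the example; it only demonstrates standard URL resolution, not how Googlebot is actually implemented.

```python
from urllib.parse import urljoin


def resolve_links(page_url: str, hrefs: list[str]) -> list[str]:
    """Resolve each href the way a crawler or browser would, relative to the page URL."""
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical https account page containing two relative links and one absolute link.
links = resolve_links(
    "https://www.example.us/account",
    ["/products/widget", "cart", "http://www.example.us/about"],
)

for link in links:
    print(link)
```

The two relative links inherit https from the account page, while the absolute http link is left alone. If the navigation uses relative URLs, every page linked from the https page becomes discoverable at an https address.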
-
Rel=canonical is great for consolidating duplicate URLs, and hreflang annotations help search engines serve the correct language or regional URL to searchers, but I'm not sure how either would work for two sites both purposed for the US (.us and .com).
What's the thought behind having two sites - is the .us site intended for Google US searches and .com the default for anything outside of the US? Are there language variations? What are the different "locations" you're referring to?