Why do I get duplicate pages? The website references the capitalized version of the URL vs. the lowercase version of www.agi-automation.com/Pneumatic-grippers.htm.
-
Can I use the rel=canonical tag for this?
-
The problem is that search engines view these URLs as two separate pages, so both pages get indexed and you run into duplication issues.
Yes, using rel=canonical is a good way to handle this. I would suggest using the lowercase version as your canonical page, so you would place the rel=canonical link tag on both pages, pointing to the lowercase URL.
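A minimal sketch of the tag, assuming the lowercase URL is the canonical target:

```html
<!-- Goes inside <head> on both casings of the page;
     points search engines at the lowercase version -->
<link rel="canonical" href="http://www.agi-automation.com/pneumatic-grippers.htm" />
```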
The other option is to create a 301 redirect from the caps version to the lowercase version. This would ensure that anyone arriving at the page (including search engine bots) would end up being directed to the lowercase version.
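For example, on an Apache server (an assumption; the equivalent on IIS is a URL Rewrite rule instead), a one-line mod_alias directive in .htaccess would handle the specific page:

```apache
# Hypothetical .htaccess rule: 301-redirect the capitalized URL
# to the lowercase version. Apache matches the path case-sensitively,
# so this only fires on the capitalized request and won't loop.
Redirect 301 /Pneumatic-grippers.htm http://www.agi-automation.com/pneumatic-grippers.htm
```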
-
I'd vote for doing the rewrite to the lowercase version. This gives you a couple of added benefits:
-
If people copy and paste the URL from their browser then link to it, you're getting all the links going to the same place.
-
Your analytics based on your URLs will be more accurate. Instead of seeing:
urla.htm 70 visits
urlb.htm 60 visits
urlB.htm 30 visits
You'll see:
urlb.htm 90 visits
urla.htm 70 visits
-
Excellent points, Keri. I hadn't thought about either of those issues. Using a redirect is definitely the best way to go.

-
Hi Keri and Tim,
Thanks for your responses. This is what the IT team has found. Let me know your thoughts:
On the physical computer that hosts the website, the page exists as one file. The casing of the file name is irrelevant to the host machine; it wouldn't allow two files of the same name in the same directory.
To reinforce this point, you can access said file by camel-casing the URI in any fashion (e.g., http://www.agi-automation.com/Linear-EscapeMents.htm). This does not bring up a different file each time; the server merely processes the URI as case-less and pulls the file by its name.
What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by link crawling instead of reading files. It is the crawler that assumes the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing.
So the indexer thinks these are two different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actual duplicated page.
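The crawler's side of this can be sketched briefly. Per RFC 3986, URL paths are case-sensitive, so a crawler that keys its index on the raw URL sees two entries even when the server serves one file for both casings. The `normalize` helper below is hypothetical, showing the kind of lowercasing step an indexer's "ignore casing" setting would apply:

```python
from urllib.parse import urlsplit

# Two casings of the same page. A crawler keying on the raw URL
# treats these as distinct index entries.
a = "http://www.agi-automation.com/Pneumatic-grippers.htm"
b = "http://www.agi-automation.com/pneumatic-grippers.htm"
print(a == b)  # False: two separate entries in the crawler's index

def normalize(url: str) -> str:
    """Hypothetical normalization: lowercase the path so differently
    cased URLs collapse to one key. Only safe when the host really
    serves paths case-insensitively, as this host does."""
    parts = urlsplit(url)
    return parts._replace(path=parts.path.lower()).geturl()

print(normalize(a) == normalize(b))  # True: one entry after normalizing
```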
-
I'm not a pro when it comes to technical server setups, so maybe Keri can jump in with better knowledge.
It seems to me like you have everything set up on your server correctly. And it looks like Google currently has only one version indexed of the original page in question.
Your site navigation menu points to the capitalized version of the URL, but somewhere on your site there must be a link that points to the lowercase version, which would explain how SEOmoz found the duplication when crawling your site. And if SEOmoz can find it, so can Google.
I still think you should use the rel=canonical attribute just to be safe. Again, I'm not that great at technical stuff. Sorry I couldn't be of more help here.
Tim