Truncate page URLs
-
We have some pages (for example a contact us form) for which the URL is modified by the CMS depending on the referring page (this helps to put the form submission in context for the sales reps who get the contact submission).
The SEOmoz crawler treats each URL as a new page, so the counts in the diagnostics are inflated because the same page is listed multiple times (e.g. under "too many links").
Is there a setting to change what the crawler considers to be the same page?
Here are two URLs for the same page that the reports treat as separate pages:
http://www.spirent.com/About-Us/Contact_us.aspx?referurl=0F528F4D703D8BB3523738D6373AA8AD
http://www.spirent.com/About-Us/Contact_us.aspx?referurl=10ACDA6055244E369395223437FDCF30
The page is actually: http://www.spirent.com/About-Us/Contact_us.aspx
Thanks
Ken
-
As you can see here, this is an issue because Google is indexing many variations of the same page. That means either something is linking to those variations, or your site is set up so that even a crawler following links to your contact page has the query parameter added to the URLs.
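To resolve this, add the following to your robots.txt file, inside the relevant User-agent group (e.g. User-agent: *). The path has to start with a slash, and the * wildcard is honoured by Google and the other major search engines:
Disallow: /*?referurl=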
This stops compliant crawlers from fetching any URL that carries that query parameter, so only the original pages should appear in search engines and they should no longer be flagged as duplicate content. Variations that are already indexed may take a while to drop out.
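If you want to check which of your URLs a rule like this would catch before deploying it, here is a rough Python sketch. It assumes Google-style matching (the pattern is compared against the path plus query string from the start, and * matches any run of characters), and it uses the wildcard form /*?referurl= as the rule. The function names are made up for illustration and are not part of any SEOmoz or Google tool. It also shows how stripping the parameter collapses both example URLs to the one real page.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regex.

    Assumption: Google-style semantics, where '*' matches any run of
    characters and a trailing '$' anchors the end of the URL.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url, rule):
    """Return True if the URL's path plus query matches the rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, like robots.txt prefix matching.
    return rule_to_regex(rule).match(target) is not None


def strip_param(url, name):
    """Return the URL with the named query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != name.lower()
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    rule = "/*?referurl="  # only matches when referurl is the first parameter
    urls = [
        "http://www.spirent.com/About-Us/Contact_us.aspx?referurl=0F528F4D703D8BB3523738D6373AA8AD",
        "http://www.spirent.com/About-Us/Contact_us.aspx?referurl=10ACDA6055244E369395223437FDCF30",
        "http://www.spirent.com/About-Us/Contact_us.aspx",
    ]
    for url in urls:
        print(is_disallowed(url, rule), strip_param(url, "referurl"))
```

Note that Python's built-in urllib.robotparser does not handle the * wildcard, which is why the matching is done by hand here. The same strip_param idea can be used to de-duplicate an exported crawl report if the crawler itself cannot be told to ignore the parameter.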
Hopefully someone from SEOmoz can confirm whether their crawler has an option to obey robots.txt directives so that these URLs are not listed, as I'm not sure.