RegEx help needed for robots.txt potential conflict
-
I've created a robots.txt file for a new Magento install, based on an existing robots.txt example that was posted on the Magento help forums, but there is one part I can't decipher. It seems that I am both allowing and disallowing access to the same expression for pagination. My robots.txt file (and a lot of other Magento robots.txt files, it seems) includes both:
Allow: /*?p=
and
Disallow: /?p=&
I've searched for help on RegEx and I can't see what "&" does. It seems to me that I'm allowing crawler access to all pagination URLs, but then possibly disallowing access to all pagination URLs that include anything other than just the page number?
I've looked at several resources and there is practically no reference to what "&" does...
Can anyone shed any light on this, to ensure I am allowing suitable access to a shop?
Thanks in advance for any assistance
-
Hey James
It looks to me like you are just disallowing access to any URLs that have more than the initial p= variable. So, you are reducing the impact of potential duplication through searches and the like.
Good
?p=1
Bad
?p=1&q=search string
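I am no Magento expert, but this looks like a simple attempt to reduce the myriad duplication that can happen with search, sort and filter pages inside a complex CMS like Magento.
The SEOMoz crawler tool should give you some good insight. To be sure, try removing the 'Disallow: /?p=&' line and see if you get a bucketload of duplicate content warnings.
Ultimately, the thing to remember here is that the & is part of the URL and not part of the pattern syntax. robots.txt rules are not full regular expressions: the major crawlers only treat * (any run of characters) and a trailing $ (end of URL) as special, so & is matched literally as the character that separates query string parameters.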
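To make that concrete, here is a rough Python sketch of how wildcard-aware crawlers such as Googlebot are documented to evaluate these rules: `*` matches any run of characters, everything else (including `?`, `=` and `&`) is literal, and when both an Allow and a Disallow match, the longer pattern wins. The function names `rule_to_regex` and `is_allowed` and the example URLs are made up for illustration. The `/*?p=*&` form of the Disallow rule is also an assumption on my part: it is the variant I have seen in other Magento robots.txt files, and the asterisks may simply have been lost when the rule was posted.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Only '*' (any run of characters) and a trailing '$' (end of URL)
    are special; '?', '=' and '&' are matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(url_path, rules):
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        # re.match anchors at the start of the path, as robots.txt rules do.
        if rule_to_regex(pattern).match(url_path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# Assumed wildcard form of the two rules (hypothetical reconstruction).
rules = [("Allow", "/*?p="), ("Disallow", "/*?p=*&")]

print(is_allowed("/shoes.html?p=2", rules))          # True
print(is_allowed("/shoes.html?p=2&dir=asc", rules))  # False
```

Note that with the rule exactly as printed in the question, `/?p=&`, there is no wildcard at all, so it would only match a URL that literally begins with `/?p=&`; something like `/?p=1&q=search` would not be blocked by it.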
Hope that helps!
Marcus