Getting thousands of errors in Crawl Diagnostics after changing website structure. Is there a problem with the crawler?
-
I've been having similar problems with a site of mine. Apparently, Rogerbot does not follow the same rules as Google's spiders and, as a result, Roger should be taken as a rough guide instead of a definitive answer.
I know this isn't a particularly helpful answer. If anyone could chime in with the specifics of what Roger does and does not crawl, and how that relates to Google's spider crawls, I'd appreciate it as much as you.
-
Hey Tiberiu,
Thanks for posting your question. I took a look at several of the duplicate content/title errors and found them all to be valid. I am unfortunately not able to check all of them (597) but I checked about a dozen. By and large the common issue was that you're using self-referential canonical tags.
Here's a pretty solid explanation of how we deal with duplicates and canonicals. Assuming A, B, C, and D are all duplicates:
If A references B as the canonical, then A and B are not considered duplicates.
If A and B both reference C as the canonical, A and B are not considered duplicates of each other.
If A references C as the canonical, A and B are considered duplicates.
If A references C as the canonical and B references D, then A and B are considered duplicates.
As for those 4XX errors, I checked a few and they are definitely valid as well. Here's an example:
http://www.247airporttransfer.co.uk/author/ is a 404 that is reported in your Crawl Diagnostic. Checking the CSV shows me that that page is linked to from: http://www.247airporttransfer.co.uk/airports/gatwick/airport-transfer-to-and-from-gatwick/attachment/gatwick-airport-transfer2/ which gives a 200 OK HTTP response.
I hope that makes sense. To touch on GWT, I can't really speak for what they will or won't show. I can tell you that our crawler is an analytical crawler and is going to report each error it finds.
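If it helps, the four duplicate/canonical rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated here, not the crawler's actual code: the function name `reported_as_duplicates` and the `canonical` mapping (page URL to the URL in its canonical tag, or missing if it has none) are made up for the example, and it assumes the two pages already have duplicate content.

```python
def reported_as_duplicates(a, b, canonical):
    """Return True if duplicate pages a and b would still be flagged.

    `canonical` is a hypothetical dict mapping a page to the URL named in
    its canonical tag; pages without a canonical tag are simply absent.
    """
    ca = canonical.get(a)
    cb = canonical.get(b)

    # Rule 1: one page names the other as its canonical.
    if ca == b or cb == a:
        return False

    # Rule 2: both pages name the same third page as canonical.
    if ca is not None and ca == cb:
        return False

    # Rules 3 and 4: only one page has a canonical, or the two
    # canonicals point at different pages.
    return True


# Self-referential canonicals: each page names itself, so the two
# canonicals differ and the pair is still flagged.
self_referential = {"A": "A", "B": "B"}
print(reported_as_duplicates("A", "B", self_referential))
```

Under these assumptions, a pair of duplicate pages that each carry a self-referential canonical falls under the last rule, which matches the pattern seen in the errors checked above.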
Cheers,
Joel. -
Hi Joel,
I just wanted to thank you for your response to the question. It helped me.
-
Hi Joel,
I would also like to thank you for the answer. I'm not really experienced in this area, so I just wanted to ask: how could I disable the self-referential canonical tags to solve the issue?
Hope you can help!
Thank you!
Tiberiu