404s in Google Search Console and JavaScript
-
At the end of April, we made the switch from HTTP to HTTPS, and I was prepared for a surge in crawl errors while Google sorted out our site. However, I wasn't prepared for the surge in impossibly incorrect URLs and partial URLs that I've seen since then.
I have learned that as Googlebot grows up, it's now attempting to read more JavaScript and will occasionally try to parse out and "read" a URL in a string of JavaScript code where no URL is actually present. So I've "marked as fixed" hundreds of bits like
/TRo39,
category/cig
etc., etc. But GSC is also returning hundreds of otherwise correct URLs with a .html extension, when our CMS generates URLs with a .uts extension, like this:
https://www.thompsoncigar.com/thumbnail/CIGARS/90-RATED-CIGARS/FULL-CIGARS/9012/c/9007/pc/8335.html
when it should be:
https://www.thompsoncigar.com/thumbnail/CIGARS/90-RATED-CIGARS/FULL-CIGARS/9012/c/9007/pc/8335.uts
Worst of all, when I look at them in GSC and check the "linked from" tab, it shows they are linked from themselves, so I can't backtrack and find a common source of the error.
Is anyone else experiencing this? Got any suggestions on how to stop it from happening in the future? Last month it was 50 URLs, this month 150, so I can't keep creating redirects and hoping it goes away.
Thanks for any and all suggestions!
Liz Micik -
Google Search Console 404s can be very irritating.
What does your robots.txt file currently say?
You can remove all instances of .html in your .htaccess file to help solve this issue. More on that here: https://alexcican.com/post/how-to-remove-php-html-htm-extensions-with-htaccess/
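To illustrate the .htaccess approach for this specific case: rather than stripping extensions entirely, you could 301-redirect the phantom .html URLs to their .uts equivalents. This is only a sketch, assuming an Apache server with mod_rewrite enabled; the `thumbnail/` path prefix is taken from the example URL above and may need adjusting for your site's other URL patterns.

```apache
# Hypothetical rule: permanently redirect any /thumbnail/... URL
# ending in .html to the same path with the .uts extension.
RewriteEngine On
RewriteRule ^(thumbnail/.+)\.html$ /$1.uts [R=301,L]
```

A single pattern-based rule like this scales better than creating one redirect per bad URL as they appear in GSC.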
-
Hi Liz,
What I would also do is go with a solution around your robots.txt, to make sure crawlers respect it and don't go hunting on a hunch for new URLs embedded somewhere in your code. Usually it's not something you should worry about too much; it's just the crawler doing a good job of trying to find more content/URLs on your site.
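As a concrete sketch of the robots.txt idea: since the real URLs use the .uts extension, you could disallow the .html variants outright. This assumes all legitimate pages on the site end in .uts (or have no extension) and uses the `*` and `$` wildcards, which Googlebot supports.

```
# Hypothetical robots.txt rule blocking the phantom .html URLs.
# Assumes no legitimate page on the site ends in .html.
User-agent: Googlebot
Disallow: /*.html$
```

Note that robots.txt only stops crawling; for URLs already reported in GSC, a 301 redirect or 410 response is still the cleaner fix.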
Martijn.