GSC is reporting a lot of chopped URLs
-
In the last two weeks I've started seeing a lot of odd 404 errors in GSC for my site. On investigation, the URLs belong to fairly new articles, but they are chopped at various points, anywhere from one character to about 10 characters missing from the end of the URL. (A similar, older issue: GSC reports duplicate content on odd subdomains we've never used, such as 'smtp' and 'ww1', or even random ones like 'bobo'.)
GSC doesn't show any 'Linked from' sources for these URLs, and I know for sure the links aren't on the site itself. They're definitely not errors in the CMS.
The site is long established (started 1997-1998) and we've been subject to a lot of negative SEO. I recently had to disavow about 1,000 .ru domains linking to us, some of them with over a million links each.
Could these chopped links be a new tactic of negative SEO? How do I find these seemingly intentionally broken links to us?
-
Thanks for the question. It isn't uncommon to see strange 404 errors in Search Console with little or inaccurate information. Google is working hard to improve this, but I wouldn't treat everything you see there as set in stone.
This doesn't sound like a negative SEO tactic. I would just mark them all as fixed and see if they appear again in about a week. If they do, I'd make sure those URLs actually return a 4xx status code and not worry too much about it.
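For that status check, here's a minimal sketch using only Python's standard library. It sends a HEAD request and reports the status code the server returns. The function name status_of is my own; some servers treat HEAD differently from GET, so switch method to "GET" if the results look off.

```python
import urllib.error
import urllib.request


def status_of(url, timeout=10):
    """Return the HTTP status code for url, using a HEAD request.

    urlopen follows redirects, so this is the status of the final URL.
    Network failures (DNS errors, timeouts) propagate as exceptions.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; .code is the status.
        return err.code
```

Run it over the URLs exported from the GSC 404 report; anything outside the 400-499 range is worth a second look.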
Some ideas of where you could look further:
- Logs logs logs. This will be the ultimate truth: you will be able to see whether or not Googlebot is actually hitting those URLs.
- It could be something odd happening with one of your plugins that generates those URLs (particularly on WordPress).
- Perhaps you have a filtering system set up that generates these URLs?
- If you have a search function on the site, sometimes weird URLs can be generated through that.
- Do the URLs come up when you crawl the site at all?
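If you go the log route, here's a rough Python sketch, assuming your server writes the common Apache/Nginx "combined" log format (adjust the regex if yours differs). It pulls out 404s whose user agent claims to be Googlebot, then checks whether each one is a truncated copy of one of your real article paths. The function names are hypothetical, and user agents can be spoofed, so Google recommends confirming genuine Googlebot hits with a reverse DNS lookup.

```python
import re
from collections import Counter

# Apache/Nginx "combined" log format (an assumption; adjust to your server), e.g.
# 66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /some-article HTTP/1.1" 404 512 "-" "...Googlebot/2.1..."
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 paths requested by clients whose user agent *claims* to be Googlebot."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        # Skip unparseable lines, non-404 responses and non-Googlebot agents.
        if m and m["status"] == "404" and "Googlebot" in m["agent"]:
            hits[m["path"]] += 1
    return hits


def truncated_matches(bad_path, known_paths):
    """Return known paths that start with bad_path, i.e. bad_path looks like a chopped copy.

    Very short fragments will match many paths, so review the results by eye.
    """
    return [p for p in known_paths if p != bad_path and p.startswith(bad_path)]
```

Feed googlebot_404s() the lines of your access log (e.g. open("access.log")) and pass each resulting path to truncated_matches() along with a list of article paths exported from your CMS. If Googlebot never requests the chopped URLs, the GSC entries are likely just reporting noise.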
Just a few ideas!