Website blog was hacked. What's the best practice to remove bad URLs?
-
Hello
So our site was hacked, which created a few thousand spam URLs on our domain. We fixed the issue, and all the spam URLs now return 404. Google's index still shows a couple of thousand bad URLs.
My questions are:
What's the fastest way to remove the URLs from Google's index? I created a sitemap of the bad URLs and submitted it to Google. I'm hoping Google will crawl them because they're in the sitemap and then remove them from the index, since they return 404.
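The sitemap approach can be automated. Below is a minimal sketch that builds a sitemap.xml from a list of dead spam URLs; the file name and the example URLs are hypothetical placeholders, not from the original post.

```python
# Sketch: build a sitemap listing the dead spam URLs so Google re-crawls
# them sooner and sees the 404/410, then drops them from the index.
from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Return a minimal sitemap.xml string for the given URLs."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

if __name__ == "__main__":
    # Hypothetical spam URLs -- replace with the list exported from a crawl.
    bad = [
        "https://example.com/?spam=cheap-watches",
        "https://example.com/blog/fake-page-123",
    ]
    with open("bad-urls-sitemap.xml", "w") as f:
        f.write(build_sitemap(bad))
```

Submit the resulting file in Search Console like any other sitemap; it can be removed once the URLs have dropped out of the index.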
Are there any tools to get a full list of what's in Google's index? (Search Console downloads are limited to 1,000 URLs.) A Moz site crawl gives a larger list, but it includes URLs that aren't in Google's index. I'm looking for a tool that can download the results of a site: search.
Is there any way to remove the URLs from the index in bulk? Removing them one by one will take forever.
Any help or insight would be greatly appreciated.
-
Technically, 404 just means "not found" and doesn't tell Google whether the page is gone for good, so it may keep re-crawling those URLs for a while. Consider returning status 410 ("Gone") instead of 404, which signals permanent removal. You could also supplement it with a meta noindex; if you can't use the HTML implementation, send the noindex directive through the HTTP header using X-Robots-Tag:
https://developers.google.com/search/reference/robots_meta_tag (scroll down a little to find the relevant part)
E.g.:

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noindex
(…)

...something like that. (Google's doc example shows 200 OK; in your case the status line would be 410 Gone.)
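The server-side routing for this can be sketched as below. This is a minimal illustration, not the poster's actual setup; the spam-path prefixes are hypothetical, and in a real deployment the same logic would live in your web framework or server config.

```python
# Sketch: spam URLs get a 410 Gone plus an X-Robots-Tag: noindex header;
# everything else is served normally. SPAM_PREFIXES is a hypothetical
# example -- adapt it to your own spam URL patterns.
SPAM_PREFIXES = ("/cheap-", "/replica-", "/?spam=")

def respond(path):
    """Return (status_code, extra_headers) for a requested path."""
    if path.startswith(SPAM_PREFIXES):
        # 410 tells crawlers the page is permanently removed;
        # the header additionally tells them not to index it.
        return 410, {"X-Robots-Tag": "noindex"}
    return 200, {}

# respond("/cheap-watches")  -> (410, {"X-Robots-Tag": "noindex"})
# respond("/blog/real-post") -> (200, {})
```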
You can't really use Search Console to remove URLs from Google at scale. The Remove URLs tool only removes URLs one at a time, and only temporarily; the URLs pop back into the index again after a while. The best thing you can do is give Google some harsher directives (the 410s and noindex) and hope it listens; in a month or two most of those should be gone.
Don't block the URLs in robots.txt: if Google can't crawl them, it will never see the 410s or the noindex directives.
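Before waiting on Google, it's worth verifying that every cleaned-up URL really returns 404 or 410. A small audit sketch, with the status fetcher injectable so the logic can be exercised without network access (the function names are mine, not from the thread):

```python
# Sketch: check that cleaned-up spam URLs return 404/410 and flag any
# that are still live (i.e. the hack cleanup missed them).
import urllib.error
import urllib.request

def fetch_status(url):
    """Return the HTTP status code for a URL."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code

def audit(urls, get_status=fetch_status):
    """Split URLs into properly-gone (404/410) and still-live groups."""
    gone, still_live = [], []
    for u in urls:
        (gone if get_status(u) in (404, 410) else still_live).append(u)
    return gone, still_live
```

Run `audit()` over the exported list of spam URLs; anything in the still-live group needs another look before Google can be expected to drop it.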