Spam pages being redirected to 404s but still indexed
-
Client had a website that was hacked about a year ago. Hackers went in and added a bunch of spam landing pages for various products. This was before the site had installed an SSL certificate.
After the hack, the site was purged of the hacked pages and an SSL certificate was implemented. Part of that process involved setting up a rewrite that redirects http pages to the https versions.
The trouble is that the spam pages are still being indexed by Google, even months later. If I do a site: search, I still see all of those spam pages come up before most of the key "real" landing pages. The thing is, the listings on the SERP point to the http versions, so they redirect to the https version before serving a 404.
Is there any way I can fix this without removing the rewrite rule?
-
I'd recommend putting all of the URLs you want deindexed into a sitemap, setting the lastmod date to something recent, and submitting it so Google recrawls them.
If possible, return a 410 (Gone) status for those pages as well, rather than a 404.
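A minimal sketch of building that kind of sitemap in Python, in case it helps. The function name build_removal_sitemap and the example.com URLs are made up for illustration; swap in the real spam URLs as they appear in the site: search.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_removal_sitemap(urls, lastmod=None):
    """Return sitemap XML (as a string) listing `urls` with one shared lastmod date.

    `lastmod` is an ISO date string (YYYY-MM-DD); it defaults to today.
    """
    lastmod = lastmod or date.today().isoformat()
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped on output
        ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical spam URLs, for illustration only.
    spam_urls = [
        "http://example.com/cheap-pills",
        "http://example.com/replica-watches?ref=1&page=2",
    ]
    print(build_removal_sitemap(spam_urls))
```

Save the output as something like spam-removal.xml and submit it in Search Console. Once the pages have dropped out of the index, the file can be deleted.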
-
You could also redirect those pages with a 301 directly to the 404 page. Or you could block those pages in robots.txt if you don't need them anymore.
-
In addition to the above, you can request removal from Google's index in Search Console:
https://support.google.com/webmasters/answer/1663419?hl=en
As noted, the removal is temporary (90 days), but if you've removed the pages and any links to them, then they won't reappear.
What I would do is check that your sitemap is up to date and that there aren't any legacy sitemaps still referencing the pages. Also run a crawl of your site to make sure no links to these pages remain.
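For the sitemap check, something along these lines might do as a rough sketch. It assumes you have a list of the spam paths and the sitemap XML already loaded as a string; find_listed_spam and the sample URLs are hypothetical names for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_listed_spam(sitemap_xml, spam_paths):
    """Return every <loc> URL in the sitemap whose path is a known spam path.

    Matching on the path alone catches both the http and https versions.
    """
    spam = set(spam_paths)
    root = ET.fromstring(sitemap_xml)
    hits = []
    for loc in root.findall(".//sm:loc", NS):
        url = (loc.text or "").strip()
        if urlsplit(url).path in spam:
            hits.append(url)
    return hits


if __name__ == "__main__":
    # Hypothetical legacy sitemap content, for illustration only.
    legacy = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://example.com/cheap-pills</loc></url>"
        "<url><loc>https://example.com/about</loc></url>"
        "</urlset>"
    )
    for url in find_listed_spam(legacy, ["/cheap-pills"]):
        print("still listed:", url)
```

Run it against each sitemap file you find on the server, including old ones that are no longer submitted. Anything it reports should be removed from that file.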