Clarification regarding robots.txt protocol
-
Hi,
I have a website with more than 1,000 URLs, and all of them are already indexed in Google. I am now going to stop all the services offered on my website, and I have removed all the landing pages, so only the home page is available. I need to remove all the indexed URLs from Google. I have already used the robots.txt protocol to remove URLs, but I guess it is not a good method when it means adding a large number of URLs (nearly 1,000) to robots.txt. So I just want to know: is there any other method for removing indexed URLs?
Please advise.
-
If the goal is to get the URLs out of the search engine index, then there are a few solutions that can work for you:
- The one you mentioned: adding the URLs to robots.txt. I think adding 1,000+ URLs to the robots.txt file is a bad idea, but it depends on whether it makes sense for your business.
- Adding a meta noindex tag to the pages (if the pages still physically exist).
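For pages that still exist, the noindex tag goes inside the page's `<head>`. Here is a minimal Python sketch, assuming the pages are plain HTML strings you can rewrite before serving them; `add_noindex` is a hypothetical helper, not part of any CMS or library, and it does not check for a robots meta tag that is already present.

```python
import re

# Standard robots meta tag; "noindex" asks search engines to drop the page.
NOINDEX_TAG = '<meta name="robots" content="noindex">'


def add_noindex(html: str) -> str:
    """Insert the robots noindex meta tag right after the opening <head> tag.

    Hypothetical helper for illustration only. The \\b stops <header> from
    matching, and attributes such as <head lang="en"> are allowed.
    """
    match = re.search(r"<head\b[^>]*>", html, re.IGNORECASE)
    if match is None:
        raise ValueError("page has no <head> element")
    end = match.end()
    return html[:end] + NOINDEX_TAG + html[end:]


if __name__ == "__main__":
    page = "<html><head><title>Old service</title></head><body>...</body></html>"
    print(add_noindex(page))
```

Note that the tag only helps if crawlers can still fetch the page and read it.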
Also, to remove them from the index quickly, you can update the robots.txt file and then go to GWC and use the Remove URLs feature.
Just a thought!
-
There are a few ways to do this.
First, I would use the Google Removal Tool to remove those URLs. More information here: https://support.google.com/webmasters/answer/1663419?hl=en
Then, using the robots.txt file is fine, but you need to make sure that you're listing the correct URLs or URL paths there.
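One way to check that the listed paths really cover the removed pages is Python's standard `urllib.robotparser`. The sketch below assumes the removed pages live under two folders, `/services/` and `/landing/` (hypothetical names, as is `example.com`), so one `Disallow` line per folder replaces hundreds of per-URL lines. `not_covered` is a hypothetical helper that lists URLs the rules would still let the crawler fetch. The stdlib parser does plain prefix matching, so results for wildcard rules may differ from Google's own handling.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one Disallow rule per folder covers every URL under it.
# The folder names /services/ and /landing/ are assumptions for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /services/
Disallow: /landing/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def not_covered(robots_txt: str, urls: List[str], agent: str = "Googlebot") -> List[str]:
    """Return the URLs that the robots.txt rules would still let `agent` crawl."""
    rules = load_rules(robots_txt)
    return [url for url in urls if rules.can_fetch(agent, url)]


if __name__ == "__main__":
    removed = [
        "https://example.com/services/seo",
        "https://example.com/landing/offer-1",
        "https://example.com/old-page",  # not under either folder, so not blocked
    ]
    print(not_covered(ROBOTS_TXT, removed))  # ['https://example.com/old-page']
    # The home page should stay crawlable, so it is expected in this result.
    print(not_covered(ROBOTS_TXT, ["https://example.com/"]))
```

Any removed URL that shows up in the first result still needs its own rule (or a folder rule).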
I would make sure that your server returns a "410 Gone" status code in the response header, not a 404 error. The 410 Gone will get those URLs removed faster.
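A minimal WSGI sketch of the 410 idea, using only the Python standard library. It assumes only the home page (`/`) is still live and every other path has been removed permanently; `LIVE_PATHS` and the response bodies are placeholders. On a real site this would more likely be configured in the web server or CMS.

```python
# Hypothetical: only the home page is still live; every other path was removed.
LIVE_PATHS = {"/"}


def app(environ, start_response):
    """WSGI app: 200 for live pages, 410 Gone (not 404) for removed ones."""
    path = environ.get("PATH_INFO") or "/"
    if path in LIVE_PATHS:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [b"<html><head><title>Home</title></head><body>Home</body></html>"]
    # 410 tells crawlers the page is gone on purpose, rather than just missing.
    start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"410 Gone: this page has been permanently removed."]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would serve it on port 8000.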
-
If the pages are already indexed and you want them to be completely removed, you need to allow the crawlers in robots.txt and noindex the individual pages.
So if you just block the site with robots.txt (and I recommend blocking via folders or URL variables, not individual pages) while the pages are indexed, they will continue to appear in search results, but with a description saying that the page is being blocked by robots.txt. They will still rank and appear because of the cached data.
If you add the noindex tags to your pages instead, the next time crawlers visit the pages they will see the new tag and remove the page from the search index (meaning it won't show up at all). However, make sure your robots.txt isn't blocking the crawlers from seeing this updated code.
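That last point can be checked mechanically: a noindex tag only works if robots.txt lets the crawler fetch the page. A rough Python sketch, assuming you already have the robots.txt text and each page's HTML (fetching is left out); `noindex_will_be_seen` and `RobotsMetaFinder` are hypothetical names, and "Googlebot" is used as the user-agent token.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives.extend(part.strip().lower() for part in content.split(","))


def noindex_will_be_seen(robots_txt: str, url: str, html: str, agent: str = "Googlebot") -> bool:
    """True only if `agent` may crawl `url` AND the page carries a noindex directive."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        return False  # blocked: the crawler never gets to read the noindex tag
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

If this returns False for a page you want removed, either the robots.txt rule or the missing tag is the reason.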