Best blocking solution for Google
-
Posting this for Dave Sottimano.
Here's the scenario: You've got a set of URLs indexed by Google, and you want them out quickly. Once you've managed to remove them, you want to block Googlebot from crawling them again, for whatever reason.
Below is a sample of the URLs you want blocked, but you only want to block /beerbottles/ and anything past it:
- www.example.com/beers/brandofbeer/beerbottles/1
- www.example.com/beers/brandofbeer/beerbottles/2
- www.example.com/beers/brandofbeer/beerbottles/3
- etc.
To remove the pages from the index, should you:
- Add the meta noindex,follow tag to each URL you want de-indexed
- Use GWT to help remove the pages
- Wait for Google to crawl again
If that's successful, to block Googlebot from crawling again, should you:
- Add this line to robots.txt: DISALLOW */beerbottles/
- Or add this line: DISALLOW: /beerbottles/
"To add the * or not to add the *, that is the question"
Thanks!
Dave
-
Hi Goodnewscowboy,
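To block a whole folder that sits at the root of the site, you don't need the wildcard (*). Keep in mind that robots.txt rules match from the start of the URL path, so for the nested URLs in your sample the rule needs either the full path (/beers/brandofbeer/beerbottles/) or a wildcard in front of /beerbottles/.
I also advise you to follow these steps to request removal in Webmaster Tools: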
- Verify your ownership of the site in Webmaster Tools.
- On the Webmaster Tools home page, click the site you want.
- On the Dashboard, click Site configuration in the left-hand navigation.
- Click Crawler access, and then click Remove URL.
- Click New removal request.
- Type the URL of the page you want removed, and then click Continue. Note that the URL is case-sensitive—you will need to submit the URL using exactly the same characters and the same capitalization that the site uses.
- Select Remove page from cache only.
- Select the checkbox to confirm that you have completed the requirements listed in this article, and then click Submit Request.
Cheers
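On the wildcard question, here is a rough Python sketch of how a Disallow pattern can be tested against the sample paths. It assumes Google-style matching (rules match from the start of the path, `*` matches any run of characters, a trailing `$` anchors the end). The helper names `rule_to_regex` and `is_blocked` are made up for illustration, and how a real crawler treats a pattern that starts with `*` rather than `/` is an assumption here, so treat this as a way to reason about the rules rather than a replacement for testing in Webmaster Tools.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    Assumption: Google-style matching, where '*' matches any run of
    characters and a trailing '$' anchors the end of the path.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored_end else ""))


def is_blocked(path: str, disallow_pattern: str) -> bool:
    """Return True if the Disallow pattern matches the path from its start."""
    return rule_to_regex(disallow_pattern).match(path) is not None


sample_path = "/beers/brandofbeer/beerbottles/1"
for pattern in ("/beerbottles/", "*/beerbottles/", "/*/beerbottles/",
                "/beers/brandofbeer/beerbottles/"):
    print(f"Disallow: {pattern:35} blocks {sample_path}? "
          f"{is_blocked(sample_path, pattern)}")
```

Under these assumptions the plain `/beerbottles/` rule only covers a folder at the site root, while the wildcard and full-path variants cover the nested URLs from the question.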
-
I believe you can also confirm the block in Webmaster Tools.
-
I would put noindex,follow on those pages and wait a little until they disappear from Google's index. Of course, if you have only a few pages, I would do it manually in GWT. If you have a rather big site with a good crawl rate, this should be done in a few days.
Once you no longer see them, you may use DISALLOW */beerbottles/, but this could be annoying later. I would recommend using the meta robots tag instead, as it gives you more control. It will also allow PageRank to flow into the beerbottles pages!
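Before waiting on a recrawl, it may be worth checking that the pages really do carry the noindex,follow directive. Note that Googlebot can only see the tag if it is still allowed to crawl the page, which is why the robots.txt block comes afterwards. Below is a minimal sketch using only Python's standard library; the class and function names (`RobotsMetaParser`, `robots_directives`) and the sample HTML are hypothetical, and fetching the live page is left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source for one of the /beerbottles/ URLs.
sample_html = """<html><head>
<meta name="robots" content="noindex,follow">
</head><body>Beer bottles</body></html>"""

found = robots_directives(sample_html)
print("noindex present:", "noindex" in found)
print("follow present:", "follow" in found)
```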
-
Following up here -- did this answer Dave's question?