What is the best way to stop a page being indexed?
-
What is the best way to stop a page being indexed? Is it to implement robots.txt at a site level with a Robots.txt file in the main directory or at a page level with the tag?
-
Hi,
While the page-level robots meta tag is the best way to stop the page from being indexed, a domain-level robots.txt can save some bandwidth of the search engines. With robots.txt blocking in place, Google will not crawl the page from within the website but can pickup the URLs mentioned some where else on a third-party website. In cases like these, the page-level robots meta tag comes to the rescue. So, it would be best if the pages are blocked using robots.txt file as well as the page-level meta robots tag. Hope that helps.
Good luck friend.
Best regards,
Devanur Rafi
-
Why not both? Some cases one method is preferred over another, or in fact necessary. As with non html documents such as pdf, you may have to use the robots.txt to keep it from being indexed or header tags as well. I'll also give you another option, and that is to password protect a directory.
-
Thanks for the answers guys... Can I ask in the event that the Robots.txt file is implemented at the domain level but the mark up on the page is <meta name="robots" content="index, follow"> which one take wins?
-
"noindex" takes precedents over "index" so basicly if it says "noindex" anywhere google will follow that.
-
Thanks that's good to know.
-
To prevent all robots from indexing a page on your site, place the following meta tag into the section of your page:
To allow other robots to index the page on your site, preventing only a specific search engine bot, for example here Google's robots from indexing the page:
When Google see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. Other search engines, however, may interpret this directive differently. As a result, a link to the page can still appear in their search results.
Note that because Google have to crawl your page in order to see the noindex meta tag, there's a small chance that Googlebot won't see and respect the noindex meta tag. If your page is still appearing in results, it's probably because Google haven't crawled your site since you added the tag. (Also, if you've used your robots.txt file to block this page, Google won't be able to see the tag either.)
If the content is currently in Google's index, it will remove it after the next time it crawl it. To expedite removal, use the Remove URLs tool in Google Webmaster Tools.
-
Thanks that's good to know!
