Have you ever seen a page get indexed even though its website is blocked by robots.txt?

vtmoz

Hi all,

We use the robots.txt file and meta robots tags to stop bots from crawling a website or individual pages. Usually robots.txt is applied to the whole website, with the expectation that none of its pages will be indexed. But there is a caveat: Google can still index a page even when the site is blocked in robots.txt, because the crawler may find a link to that page somewhere else on the internet, as stated here in the last paragraph. I wonder whether this really happens, i.e. whether some pages have actually been indexed this way.

And if we use meta robots tags at the page level, do we also need to block those pages in the robots.txt file? Can we use both techniques at the same time?

Thanks

GastonRiera

Hi vtmoz,

The most reliable way to prevent a page from being indexed is a meta robots tag with the _noindex_ parameter.
Adding robots.txt rules on top of that helps save server resources, and it also stops Google from crawling any new page that does not have the meta robots tag yet.
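As a rough illustration of what the noindex tag looks like to a crawler, here is a minimal Python sketch (standard library only) that parses a page's HTML and reports whether a `<meta name="robots">` tag contains `noindex`. The class and function names are hypothetical, and real crawlers also honor other signals, such as the `X-Robots-Tag` HTTP header, which this sketch ignores.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attributes without a value arrive as None; normalize them to "".
        attr_map = {name: (value or "") for name, value in attrs}
        # "robots" applies to all crawlers; "googlebot" targets Google only.
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def has_noindex(html: str) -> bool:
    """True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(has_noindex(page))  # True
```

Note that a crawler can only run a check like this after it has downloaded the page.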

And yes, it's very common to see pages indexed even when the robots.txt file blocks the entire website.
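The reason is that robots.txt only answers the question "may this crawler fetch this URL?"; it says nothing about indexing. Python's standard-library `urllib.robotparser` makes this visible. In the sketch below, `example.com` and the page path are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks the entire site for every crawler.
robots_txt = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/some-page.html"
# Only the fetch is refused; nothing here controls whether the URL is indexed.
print(parser.can_fetch("Googlebot", url))  # False
```

Because the fetch is refused, Google never downloads the page and never sees any noindex tag on it, so if other sites link to the URL it can still show up in results, often without a description.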

If what you want is to remove those pages from the index, follow these steps:

1. In robots.txt, allow the whole website (or at least those specific pages/sections) to be crawled, since Google has to crawl a page to see its noindex tag.
2. Add the robots meta tag with the "noindex,follow" parameters.
3. Wait several weeks; 6 to 8 weeks is a fairly good timeframe. Or just follow up on those pages.
4. Once you have the results (all the desired pages de-indexed), block those pages again in robots.txt.
5. DO NOT remove the meta robots tag.
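The steps above rely on one point: the noindex tag only works while the crawler is allowed to fetch the page. The following Python sketch (standard library only) checks both conditions together. The URL, the `/old-section/` and `/admin/` paths, and the function names are hypothetical placeholders.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _MetaRobots(HTMLParser):
    """Collects the directives of <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = {k: (v or "") for k, v in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.update(
                t.strip().lower() for t in attrs.get("content", "").split(",")
            )


def noindex_is_visible(robots_txt: str, url: str, html: str,
                       agent: str = "Googlebot") -> bool:
    """True only if the crawler may fetch the URL AND the page carries noindex."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch(agent, url):
        return False  # blocked: the crawler never downloads the HTML
    meta = _MetaRobots()
    meta.feed(html)
    return "noindex" in meta.directives


url = "https://example.com/old-section/page.html"
html = '<head><meta name="robots" content="noindex,follow"></head>'

# While cleaning up: the section is crawlable and carries noindex.
during_cleanup = "User-agent: *\nDisallow: /admin/\n"
# After de-indexing: the section is blocked again in robots.txt.
after_cleanup = during_cleanup + "Disallow: /old-section/\n"

print(noindex_is_visible(during_cleanup, url, html))  # True
print(noindex_is_visible(after_cleanup, url, html))   # False
```

The second call returning False is exactly why the order matters: if the section were blocked before its pages dropped out of the index, the crawler could never read the noindex tag.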

Hope it helps.
Best of luck.
GR.
