Does Google respect User-agent rules in robots.txt?
-
We want to use an inline linking tool (LinkSmart) to cross link between a few key content types on our online news site.
LinkSmart uses a bot to establish the linking.
The issue: There are millions of pages on our site that we don't want LinkSmart to spider and process for cross linking.
LinkSmart suggested setting a noindex tag on the pages we don't want them to process, and that we target the rule to their specific user agent.
I have concerns. We don't want to inadvertently block search engine access to those millions of pages. I've seen Googlebot ignore nofollow rules set at the page level. Does it ever do the opposite and obey rules that are directed at a different user agent?
Can you quantify the level of risk in setting user-agent-specific nofollow tags on pages we want search engines to crawl, but that we want LinkSmart to ignore?
-
Hi,
I would advise blocking the directories those files sit in via robots.txt, rather than adding noindex tags to individual pages.
Bear in mind, though, that a generic noindex tag or a blanket robots.txt block applies to every crawler: it would keep those pages out of Google and the other search engines as well as away from the LinkSmart software you are referring to. Any rule you add needs to be targeted at LinkSmart's user agent only.
So yes, there is some risk involved; you have to proceed carefully in this area.
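For illustration, a robots.txt along these lines blocks only LinkSmart's crawler from a given directory while leaving it open to everyone else. The bot name "LinkSmartBot" and the /archive/ path are placeholders; check LinkSmart's documentation for the actual user-agent string their bot sends.

```text
# Applies only to LinkSmart's bot (placeholder name):
User-agent: LinkSmartBot
Disallow: /archive/

# All other crawlers (Googlebot, Bingbot, ...) remain unrestricted:
User-agent: *
Disallow:
```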
Kind Regards,
James.
-
Does Google respect User-agent rules in robots.txt?
Yes
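Crawlers that follow the robots.txt convention apply only the group matching their own user agent, falling back to the `*` group otherwise. A quick way to sanity-check how a per-user-agent robots.txt will be interpreted is Python's standard-library parser (the bot name "LinkSmartBot" and the URLs are placeholders for illustration):

```python
from urllib import robotparser

# A robots.txt that blocks only the (hypothetical) LinkSmartBot
# from /archive/ and leaves every other crawler unrestricted.
robots_txt = """\
User-agent: LinkSmartBot
Disallow: /archive/

User-agent: *
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# The targeted bot is blocked from the archive...
print(rp.can_fetch("LinkSmartBot", "https://example.com/archive/story.html"))  # → False
# ...while Googlebot falls through to the permissive * group.
print(rp.can_fetch("Googlebot", "https://example.com/archive/story.html"))     # → True
```

This only tells you what a compliant parser does with the file; whether a given bot actually honors it is up to that bot's operator.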
I've seen googlebot ignore nofollow rules set at the page level.
Google honors nofollow rules set at the page level. The issue is that other links, on your site or elsewhere on the web, may point to the same pages, and Google will find and follow those links.
Robots.txt is the absolute last means to use for blocking pages. You should not block a page with robots.txt unless you have exhausted all other options. A more appropriate method of keeping a page out of the index is the noindex tag; if you use the tag appropriately, Google will honor it. Keep in mind that Google can only see a noindex tag on pages it is allowed to crawl, so don't combine the tag with a robots.txt block on the same URLs.
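Robots meta tags can also be scoped to a single crawler by using that crawler's name instead of the generic "robots". Google documents this for its own name, "googlebot". Whether LinkSmart's bot honors a name-scoped tag (and what name it responds to) is an assumption you would need to confirm with LinkSmart; "linksmart" below is a placeholder.

```html
<!-- Generic form: applies to ALL compliant bots, including Googlebot.
     This would deindex the page everywhere, which is what you want to avoid. -->
<!-- <meta name="robots" content="noindex"> -->

<!-- Google-specific form (documented by Google): applies to Googlebot only. -->
<!-- <meta name="googlebot" content="noindex"> -->

<!-- Hypothetical LinkSmart-specific form: only useful if their bot
     supports name-scoped robots meta tags under this name. -->
<meta name="linksmart" content="noindex, nofollow">
```

If LinkSmart's bot does not support a name-scoped meta tag, a user-agent-targeted robots.txt group is the safer way to keep only their crawler out.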