When to block bots using robots.txt vs the "noindex" meta tag?
-
This post is deleted!
-
Hey Doug,
Your question sounds interesting but it's a bit hard to understand what you're asking. Think you could rewrite it from scratch below? Be sure to mention any redirects you're employing. No need for the link again. Maybe that will shed more light on your issue.
-
I have 4 specific questions related to this issue:
1. Can Google bots and other bots index pages that can only be accessed after the user logs in? If not, what do the bots do when they encounter a login requirement?
2. Is it good or bad to tell bots to stay away from the /Login folder with "Disallow: /Login" added to robots.txt?
3. I added the "noindex" robots meta tag to /Login/Control.asp (see link above). Will this prevent Google from reporting an "Indexed, though blocked by robots.txt" warning?
4. Do redirect chains related to login requirements negatively impact my site rankings and, if so, what can I do about this?
-
1. No. It is not crawled and it is not indexed, unless Googlebot finds the URL elsewhere or the page was crawled before it was made inaccessible behind the login.
2. The purpose of the robots.txt file is to tell bots not to access resources on a site, so I don't see how it would be bad in your case.
3. If you block a page via robots.txt (or if it is redirected), Google will never reach the noindex meta tag on the page itself. You have to let Googlebot crawl the page first and see the noindex tag, and only then block it via robots.txt. https://support.google.com/webmasters/answer/93710?hl=en
4. Still not clear on the redirect chain issue.
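To make answer 2 concrete, a minimal robots.txt for the setup you described might look like this (the /Login path is taken from your question; adjust it to your actual folder):

```
# Applies to all crawlers. Note: Disallow only blocks crawling;
# a disallowed URL can still end up indexed if it is linked from elsewhere.
User-agent: *
Disallow: /Login
```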
-
You might be able to first remove the /Login URL and its variants from Google (and noindex it), and once it has dropped out, add the Disallow so the URL is no longer crawled and no crawl budget is wasted on the probably low-content-value login page.
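If you go that route, the order matters. A sketch of step 1, assuming /Login/Control.asp is the page in question: while the URL is still crawlable (no Disallow yet), serve the noindex tag so Googlebot can actually see it:

```
<!-- Step 1: keep the URL crawlable for now and let Googlebot see this tag -->
<meta name="robots" content="noindex">
```

Only after Search Console confirms the URL has left the index would you add "Disallow: /Login" to robots.txt (step 2). Adding the Disallow first hides the noindex tag from Googlebot, which is exactly how the "Indexed, though blocked by robots.txt" warning arises.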