Robots.txt
I have a reference page that lists 150 links to blog articles, and I use it in a training area of my website. I now get warnings from Moz that the page has too many links, so I decided to disallow it in robots.txt. Below is what appears in the file:
Robots.txt file for http://www.boxtheorygold.com
User-agent: *
Disallow: /blog-links/
My understanding is that this simply tells Google to bypass the page and not crawl it. However, in Webmaster Tools I used the Fetch tool to check a couple of my blog articles. One returned the expected result; the other returned "access denied" due to robots.txt. Both blog article links are listed on the /blog-links/ reference page.
Question: Why does Google refuse to crawl the one article (using the Fetch tool) when it is not referenced at all in the robots.txt file? Why is access denied? Should I have used a noindex on this page instead of robots.txt? I am afraid that robots.txt may be blocking many of my blog articles. Please advise.
Thanks,
Ron
User-agent: *
Disallow: /blog-links/
This rule prevents compliant spiders from crawling (and therefore indexing the content of) any URL whose path begins with /blog-links/, i.e. content located within that specific subfolder. If your articles are not located within that folder, they should not be blocked. Maybe check for meta noindex tags on the actual articles? You should also keep an eye on the "Blocked URLs" page in GWT to see whether pages that shouldn't be blocked are being blocked.
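If you want to test which URLs that rule actually blocks, a quick check is possible with Python's standard-library `urllib.robotparser`, plus a small `html.parser`-based check for a robots meta noindex tag. This is a minimal sketch, not a reproduction of Google's crawler: `is_blocked_by_robots`, `has_meta_noindex` and the article URLs are hypothetical placeholders, and the leading `#` on the first robots.txt line is an assumption. Note also that `urllib.robotparser` does plain prefix matching and applies the first matching rule, whereas Google also supports `*`/`$` wildcards and picks the most specific rule, so treat the result as an approximation.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# robots.txt rules copied from the question; the "#" on the first line is an
# assumption (comment lines are ignored by the parser either way).
ROBOTS_TXT = """\
# Robots.txt file for http://www.boxtheorygold.com
User-agent: *
Disallow: /blog-links/
"""


def is_blocked_by_robots(url, robots_txt=ROBOTS_TXT, user_agent="Googlebot"):
    """Hypothetical helper: True if robots_txt forbids user_agent from fetching url.

    urllib.robotparser does simple prefix matching, so any path that starts
    with /blog-links/ is blocked and everything else is allowed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class RobotsMetaFinder(HTMLParser):
    """Collects the content attribute of <meta name="robots"|"googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attributes.get("content", "").lower())


def has_meta_noindex(html):
    """Hypothetical helper: True if a robots meta tag on the page contains noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in directive for directive in finder.directives)


if __name__ == "__main__":
    # Placeholder URLs: substitute the real article URLs you tested with Fetch.
    for url in ("http://www.boxtheorygold.com/blog-links/",
                "http://www.boxtheorygold.com/blog/some-article/"):
        print(url, "blocked" if is_blocked_by_robots(url) else "allowed")
```

Running `is_blocked_by_robots` on the exact URL that Fetch reported as denied shows whether its path starts with /blog-links/. If it returns False, the Disallow rule shown above is not what is blocking that URL, and `has_meta_noindex` can then be run on the article's HTML to check the answer's noindex suggestion.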