Robots.txt - Googlebot - Allow... what's it for?
-
Hello - I just came across this in robots.txt for the first time, and was wondering why it is used? Why would you have to proactively tell Googlebot to crawl JS/CSS and why would you want it to? Any help would be much appreciated - thanks, Luke
User-Agent: Googlebot
Allow: /.js
Allow: /.css
-
Hi Luke
As you have correctly assumed, that particular robots command would be pointless.
Googlebot does honor Allow directives (not every crawler does), but an Allow rule is only useful as an exception to a Disallow rule.
So, for example, if you had a rule that blocked pages within a sub-directory, with:
Disallow: /example/*
You could then add an allow rule so that a specific page within that directory can still be crawled, like:
Allow: /example/page.html
A couple of things to point out here. "At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule." (Google Source). In this example, because the more specific rule is the allow rule, that will prevail. It is also best practice to put your "allow" rules at the top of the robots.txt file, since some other crawlers apply the first matching rule rather than the most specific one.
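To make the "most specific rule wins" idea concrete, here is a rough Python sketch of that precedence check. It is only an illustration under stated assumptions, not Google's actual matcher: the function name is_allowed and the (directive, pattern) rule format are made up for this example, only a trailing * wildcard is handled, and the tie-break in favor of allow reflects my reading of Google's documentation.

```python
def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) tuples such as
    ("disallow", "/example/*"). Hypothetical format, for illustration only.
    """
    best = None  # (directive, prefix) of the longest matching rule so far
    for directive, pattern in rules:
        # Simplification: a trailing "*" is treated as a plain prefix match.
        prefix = pattern.rstrip("*")
        if not path.startswith(prefix):
            continue
        longer = best is None or len(prefix) > len(best[1])
        # Assumption: on equal length, the less restrictive rule (allow) wins.
        tie_allow = (
            best is not None
            and len(prefix) == len(best[1])
            and directive == "allow"
        )
        if longer or tie_allow:
            best = (directive, prefix)
    # No matching rule means the path is crawlable by default.
    return best is None or best[0] == "allow"


rules = [
    ("disallow", "/example/*"),
    ("allow", "/example/page.html"),
]

print(is_allowed("/example/page.html", rules))   # True: the longer allow rule wins
print(is_allowed("/example/other.html", rules))  # False: only the disallow rule matches
print(is_allowed("/styles/site.css", rules))     # True: nothing matches, so crawlable by default
```

The last line is the point of the original question: with no disallow rule covering CSS or JS, those files are crawlable without any allow rule at all.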
But in your example, if they have allow rules for JS and CSS files without having disallow rules for those directories/paths etc - it's a waste of space. Google will attempt to crawl anything it can by default - unless you disallow access.
TL;DR - You don't need to proactively tell Google to crawl CSS and JS - it will by default.
Hope this helps.
-
Just as a follow-up to Tom's great post. If you want to test a robots.txt setup, especially one that uses a wildcard or combines an allow with a disallow, Google Search Console has a robots.txt Tester under the Crawl section. It shows the most recent copy of your robots.txt file that Google has. You can modify that version and then enter a URL at the bottom to see whether it is allowed or blocked. It is pretty handy, especially if you have a big robots.txt file. Note that this tool does not change how Google crawls your site or your robots.txt file; it is just for testing. Once you find the configuration that works, you still need to update the robots.txt on your server.
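If you want a quick sanity check on your own machine before uploading anything, Python's standard library includes urllib.robotparser. The sketch below is only a rough local check and not a substitute for Google's tester: as far as I know this parser applies rules in file order and does not interpret * wildcards in paths the way Googlebot does, so the allow rule is listed first and the disallow uses a plain /example/ prefix. The check helper and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: Allow comes first, and the Disallow is a plain prefix,
# so the result does not depend on wildcard or rule-order handling.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /example/page.html
Disallow: /example/
"""


def check(paths, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return {path: True/False} for whether `agent` may fetch each path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


results = check([
    "/example/page.html",   # exception to the disallow rule
    "/example/other.html",  # blocked by the disallow rule
    "/styles/site.css",     # no rule applies
])

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Treat the output as a first pass only, and confirm anything involving wildcards in a tool that follows Google's own matching rules.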
-
Thanks Tom - that's very useful - appreciated
- and thanks also Clever PhD re: the robots.txt tester info - Luke