Load-balanced site
-
Our client's ecommerce site loads from 3 different servers using load balancing.
abc.com: IP: 222.222.222
abc.com: IP: 111.111.111
For testing purposes, 111.111.111 also points to beta.abc.com.
Now Google is crawling beta.abc.com.
If we block beta.abc.com using robots.txt, it will block Googlebot from abc.com too, since beta.abc.com is really abc.com.
I know it's confusing, but I've been trying to figure it out. Of course I can ask my dev to remove beta.abc.com, make it separate code, and block it using .htaccess.
-
Maybe I'm not understanding, but if the server can differentiate between beta.abc.com and abc.com, subdomains can have different robots.txt files. You should be able to serve a file at http://beta.abc.com/robots.txt that disallows everything:

User-agent: *
Disallow: /

and a different robots.txt file at http://abc.com/robots.txt. If you do so, it shouldn't block Googlebot's access to abc.com.
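Since both hostnames are served by the same codebase, one way to serve different robots.txt files per host is a rewrite rule. A minimal sketch, assuming Apache with mod_rewrite enabled (the filename robots-beta.txt is a hypothetical example):

```apache
# Hypothetical .htaccess sketch: when the request arrives on the beta
# hostname, internally serve robots-beta.txt (which disallows everything)
# instead of the normal robots.txt, leaving abc.com's robots.txt untouched.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^beta\.abc\.com$ [NC]
RewriteRule ^robots\.txt$ /robots-beta.txt [L]
```

robots-beta.txt would then contain the disallow-everything rules shown above, while the live robots.txt stays as-is.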
-
Our solution for this is to use HTTP authentication, so our dev sites require a simple password to access.
Here is an example: http://dev.zeta-commerce.com/
This keeps the bots out and avoids the risk of a blocking robots.txt file being accidentally released to the live site.
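The setup above can be sketched in an .htaccess file. This assumes Apache with Basic auth modules available; the realm name and .htpasswd path are placeholder examples:

```apache
# Hypothetical sketch: password-protect the entire dev site with HTTP
# Basic authentication so neither crawlers nor the public can reach it.
AuthType Basic
AuthName "Development site"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

The .htpasswd file would be created with the htpasswd utility. Because this lives only in the dev site's configuration, there is no robots.txt difference to accidentally deploy to production.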