Using meta tags to block access to your site
-
In general, you want to use meta tags, not your robots.txt file, to control which pages of your site are indexed. Meta tags are a superior option for many reasons, a few of which are:
-
a page blocked in robots.txt won't be seen at all when a crawler visits your site directly; however, a crawler may follow a link from another site, reach the page without ever viewing the robots.txt file, and the page can still be indexed.
-
at times a robots.txt file is mistakenly deleted or moved, often during software updates. If your site is crawled without your robots.txt file in place, all of the pages can be indexed, which causes quite a headache. Once you put the robots.txt file back, crawlers can no longer see those pages, so even if you add a "noindex" tag to a page, it won't be seen.
-
any PageRank (PR) flowing to a page blocked with robots.txt is lost. If you noindex the page instead, PR can still flow through the non-indexed page to other pages.
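For reference, the noindex directive is a standard robots meta tag placed in the page's `<head>`:

```html
<!-- Tells compliant crawlers not to index this page,
     while still letting them crawl it and follow its links -->
<meta name="robots" content="noindex, follow">
```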
In summary, it is always preferable to use meta tags. So when do you use robots.txt? When you have no other choice. Some CMS and shopping cart software is not designed with SEO in mind, so your choice is either to pay for an expensive modification or to use robots.txt. You should contact your software developer and ask which pages should be blocked via robots.txt. They should have a pretty good idea.
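When robots.txt is the only option, the file itself is simple. A sketch blocking a couple of directories (the paths here are made up for illustration; your software vendor can supply the real ones):

```
# robots.txt — must sit at the root of the domain
User-agent: *
Disallow: /checkout/
Disallow: /cart/
```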
-
-
I would advise blocking these pages/sub-folders via the robots.txt file which sits on your server.
Using meta tags to block these pages will not be as strong.
Kind Regards,
James.
-
You can designate in Webmaster Tools how to handle parameter URLs. So if you have a URL like ?product=id12345, you can ask Google and Bing not to index those URLs. It would be best, though, to take care of the problem at the source.
I know you are saying those URLs don't exist on your site. I have experienced these issues in the past: if Google is adding those pages, it is finding those links somewhere in the code on your site. Again, I would recommend contacting the software vendor. I noticed you use Magento, and there are a couple of SEOs who post on SEOmoz who use Magento software. Perhaps one of them can answer your question as it pertains specifically to that software.
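Fixing it "at the source" usually means emitting one canonical URL with the stray parameter stripped out. A minimal sketch in Python, using only the standard library (the parameter name `product` comes from the example above; the domain and paths are hypothetical):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def canonical_url(url, drop_params=("product",)):
    """Return the URL with the given query parameters removed,
    suitable for use in a rel="canonical" link element."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in drop_params]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(canonical_url("http://example.com/shop?product=id12345&page=2"))
# -> http://example.com/shop?page=2
```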
-
Only someone who is very familiar with Magento can answer that question.
-
a page blocked in robots.txt wont be seen at all when a crawler visits your site directly
- Partly true: not all robots respect the robots.txt file, but Google and Bing do, even if they follow a link from elsewhere.
at times a robots.txt file is mistakenly deleted or moved
- the same can be said about meta tags.
That being said, the part about "link juice" is a very good and valid argument, and reason enough not to block using robots.txt unless you want to block an entire directory of images, like the images you use in the design.
In short, robots.txt is best used to block non-HTML elements like Flash, image files, PDF files, etc., whereas meta tags are best used to block HTML pages.
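If you do go the robots.txt route, you can sanity-check the rules before deploying them. A small sketch using Python's standard `urllib.robotparser` (the rules are fed in directly here; normally you would point `set_url` at your live file, and the paths and domain are made up):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /images/design/
Disallow: /downloads/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Non-HTML assets under the blocked paths are disallowed...
print(rp.can_fetch("*", "http://example.com/images/design/bg.png"))  # False
# ...while ordinary pages remain crawlable.
print(rp.can_fetch("*", "http://example.com/products.html"))         # True
```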
-
Log into Google WMT > Site configuration > URL parameters