Robot.txt help
-
Ok then you should be all set if your tests on GWMT did not indicate any errors.
-
I'm wondering why you want to block crawling of these URLs - I think what you're going for is to not index them, yes? If you block them from being crawled, they'll remain in the index. I would suggest considering robots meta noindex tags - unless you can describe in a little more detail what the issue is?
-Dan
-
Hi Dan,
The issue is my blog had tagging switched on, it cause canonicalization mayhem.
I switched it off, but the tags still appears in Google Webmaster Tools (GWMT). I Remove URL via GWMT but they are still appearing. This has also caused me to plummet down the SERPs! I am hoping this is why my SERPs had dropped anyway! I am now trying to get to a point where google just sees my blog posts and not the ?Tag or ?Author or any other parameter that is going to cause me canoncilization pain. In the meantime I am sat waiting for google to bring me back up the SERPs when things settle down but it has been 2 weeks now so maybe something else is up?
-
Hi There
Where are they appearing in WMT? In crawl errors?
You can also control crawling of parameters within webmaster tools - but I am still not quite sure if you are trying to remove these from the index or just prevent crawling (and if preventing crawling, for what reason?) or both?
-Dan
-
Hi Dan,
I am getting duplicate content errors in WMT like
This is because tag=ABC and page=1 are both different ways to get to www.mysite.com/Blog/Post/My-Blog-Post.aspx
To fix this I have remove the URL's www.mysite.com/Blog/?tag=ABC and www.mysite.com/Blog/?Page=1from GWMT and by setting robot.txt up like
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/PostI hope to solve the duplicate content issue to stop it happening again.
Since doing this my SERP's have dropped massively. Is what I have done wrong or bad? How would I fix?
Hope this makes sense thanks for you help on this its appreciated.
Andrew
-
Andrew
I doubt that one thing made your rankings drop so much. Also, what type of CMS are you on? Duplicate content like that should be controlled through indexation for the most part, but I am not recognizing that type of URL structure as any particular CMS?
Are just the title tags duplicate or the entire page content? Essentially, I would either change the content of the pages so they are not duplicate, or if that doesn't make sense I would just "noindex" them.
-Dan
-
Hi
The blog is www.dotnetblogengine.com
The content is only on the blog once it is just it can be accessed lots of different ways
-
Hi There... that address does not seem to work for me. Should it be .net? http://www.dotnetblogengine.net/
-Dan
-
Hi Dan, Yes sorry that's the one!
-
Ahh. I see. You just need to "noindex" the pages you don't want in the index. As far as how to do that with blogengine, I am not sure, as I have never used it before.
But I think a bigger issue is like the giant box areas at the top of every page. They are pushing your content way down. That's definitely hurting UX and making the site a little confusing. I'd suggest improving that as well

-Dan
-
Thanks Dan, but what grey areas, what url are you looking at?
-
These: http://screencast.com/t/p120RbUhCT
They appear on every page I looked at, and take up the entire area "above the fold" and the content is "below the fold"
-Dan