Google indexes dynamic webpages after blocking in robots.txt...
-
Atul,
We have experienced the same issue with our shopping cart paginating the product results. Just because robots.txt specifically disallows crawling of certain pages doesn't mean they won't be indexed. After all, when you think about it, you are providing a link to each of the pages you told the spider to disallow. Let's not forget that Google and the other robots are controlled by their respective companies, and their job is to gather content, even if we don't like or want it.
So the best answer we have found is to embrace it! You can redirect the link juice back to the original URL with the rel="canonical" tag. It's fairly easy to create a statement in your template file that tests the current URL and builds a dynamic URL that points the link juice back to the base URL. Here is an example:
<link rel="canonical" href="http://www.yourdomain.com" />
Actually, Google says you can point that link juice to any other domain as well (a cross-domain canonical).
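To make the idea concrete, here is a sketch of what the template's output might look like on a paginated URL. The domain and the "page" parameter are placeholders, not a specific implementation:

```html
<!-- Hypothetical paginated URL being served: http://www.yourdomain.com/products?page=3 -->
<!-- The template drops the pagination parameter and points back to the base URL: -->
<link rel="canonical" href="http://www.yourdomain.com/products" />
```

Every page in the series emits the same canonical target, so the link juice from all paginated variants is consolidated onto the base URL.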
-
Hi!
Could it be that the pages were already crawled by Google before you added the directives to robots.txt? Perhaps you could remove the disallow and add rel="canonical", as Allen suggests. That way you allow Google to reindex the pages and pick up the changes.
Hope this helps

-
-
Unfortunately, robots.txt is a poor choice for content that may already have been indexed, including dynamic content. It's good for blocking specific pages and folders (especially before Google has crawled them), but it tends to be unreliable in situations like this.
Pagination is a tricky topic, and the "best" solution varies a lot with the situation, but the basic options are:
(1) Use rel="prev" and rel="next", which helps Google handle the paginated series properly, but still allows it to rank.
(2) Use META NOINDEX, FOLLOW on pages 2+ of search results (this was probably the most popular method before rel=prev/next).
(3) Use rel=canonical to point all paginated results to a "View All" page. This page should be available to users and not be too large. It's a decent option if you have a few dozen results, but not 100s or 1000s.
(4) Use Google Webmaster Tools parameter handling on the "page=" parameter. It seems to work, but since it's Google-specific, it's not the go-to option for most SEOs.
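For options 1–3, the fix is a tag or two in the `<head>` of the paginated pages. A rough sketch, assuming a hypothetical paginated series at /products?page=N on an example domain:

```html
<!-- Option 1: rel="prev"/"next" as emitted on page 2 of the series -->
<link rel="prev" href="http://www.example.com/products?page=1" />
<link rel="next" href="http://www.example.com/products?page=3" />

<!-- Option 2: META NOINDEX, FOLLOW on pages 2+ of search results -->
<meta name="robots" content="noindex, follow" />

<!-- Option 3: every paginated page canonicals to a "View All" page -->
<link rel="canonical" href="http://www.example.com/products/view-all" />
```

These are alternatives, not a stack: pick the one that fits your situation rather than combining them (a noindex on a page that also canonicals elsewhere sends Google mixed signals).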