How do I deal with duplicate content in a post-Panda world?
-
I want to fix duplicate content issues on my eCommerce website.
I read a very valuable blog post on SEOmoz about duplicate content in a post-Panda world and applied all of its strategies to my site.
Here is one example to illustrate:
http://www.vistastores.com/outdoor-umbrellas
Non-WWW version:
http://vistastores.com/outdoor-umbrellas redirects to the home page.
For HTTPS pages:
https://www.vistastores.com/outdoor-umbrellas
I have created a robots.txt file for all HTTPS pages, as follows:
https://www.vistastores.com/robots.txt
And I have set rel=canonical to the HTTP page, as follows:
http://www.vistastores.com/outdoor-umbrellas
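For reference, the tag in the HTTPS page's head would look something like this (illustrative; the live page is authoritative):

```html
<link rel="canonical" href="http://www.vistastores.com/outdoor-umbrellas">
```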
Narrow by search:
My website has a narrow-by-search feature, which generates pages with the same meta info, as follows:
http://www.vistastores.com/outdoor-umbrellas?cat=7
http://www.vistastores.com/outdoor-umbrellas?manufacturer=Bond+MFG
http://www.vistastores.com/outdoor-umbrellas?finish_search=Aluminum
I have blocked all of the dynamic pages generated by narrow-by-search in robots.txt:
http://www.vistastores.com/robots.txt
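For reference, rules along these lines would block the faceted URLs shown above (illustrative only; the live robots.txt at the link above is authoritative — note the * wildcard is supported by Google but not by every crawler):

```
User-agent: *
Disallow: /*?cat=
Disallow: /*?manufacturer=
Disallow: /*?finish_search=
```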
And I have set rel=canonical to the base URL on each dynamic page.
Order by pages:
http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name
I have blocked all of these pages with robots.txt and set rel=canonical to the base URL.
For pagination pages:
http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name&p=2
I have blocked all of these pages with robots.txt and set rel=next and rel=prev on all paginated pages.
I have also set rel=canonical to the base URL.
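For example, page 2 of that series might carry tags like these (illustrative; this assumes a page 3 exists):

```html
<link rel="prev" href="http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name">
<link rel="next" href="http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name&p=3">
<link rel="canonical" href="http://www.vistastores.com/outdoor-umbrellas">
```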
I have applied all of these SEO suggestions to my website, but Google is still crawling and indexing 21K+ pages. My website has only 9K product pages.
Google search result:
Over the last 7 days, my site's impressions and CTR have dropped by 75%.
I want to recover and perform as well as before.
I have explained my question at length because I want to recover my traffic as soon as possible.
-
Not a complete answer, but instead of rel=canonical-ing your dynamic pages you may just want to block them in robots.txt, something like:

Disallow: /*?

This will prevent Google from crawling any version of the page that includes a ? in the URL. Canonical is a suggestion, whereas robots.txt is more of a command.
As you can see from this query, Google has indexed 132 versions of that single page rather than following your rel=canonical suggestion.
To further enforce this, you may be able to use a bit of PHP code to detect whether the URL is dynamic and serve a robots noindex, noarchive on only the dynamic renderings of the page.
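A minimal sketch of that idea, assuming Apache/PHP where every narrow-by-search, order-by, and pagination variant carries a query string (the helper name here is my own invention; adapt it to your templates):

```php
<?php
// Hypothetical helper: pick a robots directive based on whether the
// request URL carries a query string. Dynamic renderings get blocked;
// the clean base URL stays indexable.
function robots_directive($queryString)
{
    return $queryString === '' ? 'index, follow' : 'noindex, noarchive';
}

// Somewhere in the page <head>, something like:
// echo '<meta name="robots" content="'
//     . robots_directive($_SERVER['QUERY_STRING']) . '">';
```

This keeps a single template for all variants while signaling to Google that only the query-string-free version should be indexed.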
I also believe there are some parameter-filtering tools for this right within Webmaster Tools. Worth a peek if your site is registered.
Additionally, where you are redirecting non-www subpages to the home page, you may instead want to redirect each to its www equivalent.
This can be done in .htaccess like this:

# Redirect non-www to www
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^yourdomain\.com [NC]
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [L,R=301]
This will likely provide both a better user experience and a better solution in Google's eyes.
I'm sure some other folks will come in with other great suggestions for you as well.
