Crawling/indexing of near duplicate product pages
-
Hi,
Hope someone can help me out here. This is the current situation:
We sell stones/gravel/sand/pebbles etc. for gardens. I will take one type of pebble and the corresponding pages/URLs to illustrate my question --> black beach pebbles.
- We have a 'top' product page for black beach pebbles on which you can find the different quantities (ranging from 20 kg to 1600 kg).
- There is not any search volume related to the different quantities
- The 'top' page does not link to the pages for the different quantities
- The content on the pages for the different quantities is not exactly the same (different price + slightly different content). But a lot of the content is the same.
Current situation:
- Most pages for the different quantities (about 95%) do not have internal links.
- But the sitemap does contain all of these pages.
- Because the sitemap contains all these URLs, Google frequently crawls them (I checked the log files) and has indexed them.
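For anyone who wants to repeat the log file check, here is a minimal sketch. It assumes the server writes the common "combined" access log format and that matching "Googlebot" in the user agent is good enough for a first look (user agents can be spoofed). The sample lines and paths are made up for illustration.

```python
import re
from collections import Counter

# Combined log format: "METHOD path protocol" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count requests per path whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits

# Hypothetical log lines; real ones would be read from the access log file.
sample = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /black-beach-pebbles/20kg HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [11/Oct/2023:08:12:01 +0000] "GET /black-beach-pebbles/20kg HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [11/Oct/2023:09:30:44 +0000] "GET /black-beach-pebbles HTTP/1.1" 200 8342 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0"',
]

hits = googlebot_hits(sample)
for path, count in hits.most_common():
    print(count, path)
```

Sorting by count makes it easy to see whether the quantity URLs take up a large share of Googlebot's requests.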
Problems:
- Google spends its time crawling irrelevant pages --> our entire website is not that big, so these quantity URLs roughly double the total number of URLs.
- Having URLs in the sitemap that do not have an internal link is a problem on its own.
- All these pages are indexed so all sorts of gravel/pebbles have near duplicates.
My solution:
- Removing these URLs from the sitemap --> that will probably stop Google from regularly crawling these pages
- Putting a canonical on the quantity pages pointing to the top product page --> that will hopefully remove the irrelevant (no search volume) near duplicates from the index
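To make the two steps concrete, here is a rough sketch. The URLs and the `CANONICAL_FOR` mapping are hypothetical; in practice the mapping would come from the shop's product data.

```python
from html import escape

# Hypothetical mapping: quantity URL -> top product page it should point to.
CANONICAL_FOR = {
    "https://example.com/black-beach-pebbles/20kg": "https://example.com/black-beach-pebbles",
    "https://example.com/black-beach-pebbles/1600kg": "https://example.com/black-beach-pebbles",
}

def sitemap_urls(all_urls):
    """Keep only URLs that are their own canonical (drops the quantity pages)."""
    return [url for url in all_urls if url not in CANONICAL_FOR]

def canonical_tag(url):
    """Build the <link rel="canonical"> element for the page's <head>."""
    target = CANONICAL_FOR.get(url, url)  # top pages point to themselves
    return f'<link rel="canonical" href="{escape(target, quote=True)}">'

all_urls = [
    "https://example.com/black-beach-pebbles",
    "https://example.com/black-beach-pebbles/20kg",
    "https://example.com/black-beach-pebbles/1600kg",
]

print(sitemap_urls(all_urls))
print(canonical_tag("https://example.com/black-beach-pebbles/20kg"))
```

Driving both the sitemap and the canonical tag from the same mapping keeps them consistent: a URL that canonicalises elsewhere never ends up in the sitemap.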
My questions:
- To be able to see the canonical, Google will need to crawl these pages. Will Google still do that after removing them from the sitemap?
- Do you agree that these pages are near duplicates and that it is best to remove them from the index?
- A few of these quantity pages (a few percent of them) do have internal links because of a sale campaign. So there will be some (not many) internal links pointing to non-canonical pages. Would that be a problem?
Thanks a lot in advance for your help!
Best!
-
Hello there,
To answer your questions,
1. Google will still crawl your pages even if they are not in the sitemap, unless you disallow them in your robots.txt.
2. If the content is similar and the main difference is the quantity, couldn't you consolidate them into one single page that lists all the quantities your company sells, and then 301 redirect the other pages to the consolidated one?
3. It doesn't seem like that will cause any problems or hurt your SEO performance, but you could always change these links to point to the canonical URL.
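On point 1, keep in mind that a page blocked in robots.txt cannot be crawled, so a canonical placed on it would not be seen. A quick way to check that the quantity URLs are not blocked is sketched below with Python's standard `urllib.robotparser`. The robots.txt content and URLs are made-up examples, and this parser only approximates Google's own matching rules (it does not handle wildcards, for instance).

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]

# Hypothetical robots.txt and URLs for illustration.
robots_txt = """\
User-agent: *
Disallow: /checkout/
"""

urls = [
    "https://example.com/black-beach-pebbles/20kg",
    "https://example.com/checkout/cart",
]

blocked = blocked_urls(robots_txt, urls)
print(blocked)
```

If a quantity URL shows up in the blocked list, the canonical on that page will have no effect until the disallow rule is removed.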
Hope this helps,
Joseph Yap
-
Hi Joseph, thanks for your reply, really helpful! A 301 is not really an option, because these quantity URLs are sometimes used for promotions and need to be reachable. Therefore I guess canonicals are the second-best solution.
We will implement the solution I described and see what will happen. Thanks again!