Duplicate URLs
-
A campaign I ran reported that my client's site has 47,000+ duplicate pages and titles. I was wondering how I could possibly set up that many 301 redirects, but a Moz help engineer said the problem has a lot to do with session IDs. See this set of duplicate URLs:
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring (clearly the main URL for the page)
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac00a2e0ad53eb90cb0b0304d178fc1
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac3039d0ad4af2720b3ccd2238547ab
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac071ed0ad4af292684b0746931158f
To a crawler, that looks like 4 different pages, when they are clearly just 4 different URLs for the same page. Would any of you with site-architecture experience have insight into how to address this issue?
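One way to gauge how many real pages sit behind the 47,000 reported duplicates is to strip the session parameter and group the crawled URLs by what remains. This Python sketch assumes PIPELINE_SESSION_ID (taken from the URLs above) is the only session-style parameter; `strip_session_params` and `group_duplicates` are hypothetical helpers, not part of any Moz tool.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: PIPELINE_SESSION_ID is the only session parameter on the site.
SESSION_PARAMS = {"PIPELINE_SESSION_ID"}


def strip_session_params(url):
    """Return the URL with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def group_duplicates(urls):
    """Map each session-free URL to the crawled URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_session_params(url)].append(url)
    return dict(groups)


base = "http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring"
crawled = [
    base,
    base + "?PIPELINE_SESSION_ID=0ac00a2e0ad53eb90cb0b0304d178fc1",
    base + "?PIPELINE_SESSION_ID=0ac3039d0ad4af2720b3ccd2238547ab",
    base + "?PIPELINE_SESSION_ID=0ac071ed0ad4af292684b0746931158f",
]

for page, variants in group_duplicates(crawled).items():
    print(page, len(variants))
# http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring 4
```

With the four URLs above, everything collapses onto the plain /ll/c/engineered-hardwood-flooring URL: one page, not four.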
Thanks
Alan
-
A quick fix would be to disallow crawling of every URL that contains ?PIPELINE. That will stop Google from crawling them. You can do this by adding the following to your robots.txt file...
User-agent: *
Disallow: /*?PIPELINE
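To sanity-check which URLs a wildcard rule like that would block, here is a rough Python sketch of Google-style pattern matching: `*` matches any run of characters, a trailing `$` anchors the end, and the rule is matched from the start of the path plus query string. It's a simplification (it ignores Allow rules, longest-match precedence and percent-encoding), and the function names are made up for illustration. Python's own `urllib.robotparser` isn't used because, as far as I know, it only does plain prefix matching and doesn't understand `*`.

```python
import re
from urllib.parse import urlsplit


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal. Matching starts at the beginning of the path.
    """
    anchor_end = pattern.endswith("$")
    if anchor_end:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchor_end else ""))


def is_disallowed(url, disallow_pattern):
    """True if the URL's path + query string matches the Disallow pattern."""
    parts = urlsplit(url)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    return robots_pattern_to_regex(disallow_pattern).match(target) is not None


main = "http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring"
session = main + "?PIPELINE_SESSION_ID=0ac00a2e0ad53eb90cb0b0304d178fc1"

print(is_disallowed(session, "/*?PIPELINE"))  # True: session URL is blocked
print(is_disallowed(main, "/*?PIPELINE"))     # False: main page stays crawlable
print(is_disallowed(session, "/?PIPELINE"))   # False: without '*' only "/?PIPELINE..." matches
```

The last line shows why the `/*` matters: a plain `/?PIPELINE` rule would only catch session IDs on the home page, not on /ll/c/... URLs.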
However, you want to get to the root cause, which is most likely whatever part of the system is generating these session-ID URLs. Ideally, that should be fixed.
-Andy
-
Hm, have you looked into rel canonical?
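For the canonical route, each page would emit a `<link rel="canonical">` tag pointing at its session-free URL, so every PIPELINE_SESSION_ID variant points back at the main page. A minimal Python sketch, assuming the page template can call a helper like the hypothetical `canonical_link_tag` below; how that would be wired into the site's actual platform is unknown. Other query parameters are kept on the assumption that they might genuinely change the content.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: PIPELINE_SESSION_ID is the only parameter to drop from canonicals.
SESSION_PARAMS = {"PIPELINE_SESSION_ID"}


def canonical_link_tag(request_url):
    """Build a rel=canonical tag pointing at the request URL minus session IDs."""
    parts = urlsplit(request_url)
    # Keep other query parameters, in case they change what the page shows.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in SESSION_PARAMS]
    canonical = urlunsplit(parts._replace(query=urlencode(kept), fragment=""))
    # Escape for use inside an HTML attribute ('&' becomes '&amp;').
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


print(canonical_link_tag(
    "http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring"
    "?PIPELINE_SESSION_ID=0ac3039d0ad4af2720b3ccd2238547ab"
))
# <link rel="canonical" href="http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring">
```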
If those are all standalone pages, you will have to redirect them if they are no longer active or if the original page can replace them.
Andy is correct: those pages likely weren't 'created' intentionally.
You should look at what is causing this issue and start there. If not, you are going to be redirecting till the cows come home.
If you decide to go the 301 route, you may want to take a step back and look at the folder structure of the entire domain. /ll/ is a folder but not a page, and neither is /ll/c/.
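A rough way to take that step back is to list every folder implied by the crawled URLs and flag the ones that never show up as pages themselves, like /ll/ and /ll/c/ here. This Python sketch works from a plain list of crawled URLs (for example, an export from the crawl); the helper names are hypothetical.

```python
from urllib.parse import urlsplit


def folder_prefixes(path):
    """Yield each parent folder of a path: '/ll/c/page' -> '/ll/', '/ll/c/'."""
    segments = [s for s in path.split("/") if s]
    for i in range(1, len(segments)):
        yield "/" + "/".join(segments[:i]) + "/"


def folders_without_pages(urls):
    """Return folders implied by the crawled URLs that were never crawled as pages."""
    paths = {urlsplit(u).path for u in urls}
    folders = {f for p in paths for f in folder_prefixes(p)}
    # Treat '/ll/c' and '/ll/c/' as the same page when checking.
    return sorted(f for f in folders if f not in paths and f.rstrip("/") not in paths)


crawled = [
    "http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring",
    "http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring"
    "?PIPELINE_SESSION_ID=0ac00a2e0ad53eb90cb0b0304d178fc1",
]
print(folders_without_pages(crawled))  # ['/ll/', '/ll/c/']
```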
Good Luck, Alan!