Duplicate pages coming from links from the login page - what should we do about them?
-
This is a follow-on to an earlier question about abnormal crawl issues, which was well answered by Dirk Ceuppens. We are seeing that the Duplicate Page issues come from links to the login page, whose URLs record where the visitor was redirected from.
For example, if a visitor who is not logged in tries to wish-list an item, they are redirected to the login page with the item code and intended action in the URL, so they can continue on to the desired page once logged in.
The Moz crawler is flagging these pages as duplicate content, since they are all identical apart from that piece of information in the URL. Should we be blocking these duplicates? Are they a risk to us? What should we be doing?
Many thanks,
Sarah
-
Honestly, I wouldn't be too worried about it. Google seems smart enough these days to understand what's going on there, though canonicalization would be wise - just point the canonical tag on the login page to itself.
By doing this, assuming your URLs look something like domain.com/login?product=-product-name, all variations should effectively be consolidated into the /login page.
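For reference, a self-referencing canonical would look something like the snippet below. This is just a sketch - it assumes the login page lives at domain.com/login; adjust the URL to match your site:

```html
<!-- Placed in the <head> of the login page. Every URL variation
     (e.g. /login?product=blue-widget) serves this same tag, so
     crawlers treat the bare /login URL as the canonical version. -->
<link rel="canonical" href="https://domain.com/login" />
```

Note that rel=canonical is a hint rather than a directive, but in a clear-cut case like identical login pages it is normally respected.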
If you really wanted to, you could block these with robots.txt as well, but I honestly wouldn't bother.
-
Thanks for this Chris.
One other thing: how do I stop these from showing up in my Moz crawl, which is reporting 16.9k crawl issues? And how do I then work out which other crawl issues are mixed in with this huge report?
-
Hi Sarah,
I somehow missed the notification on this one!
To be honest, I don't have an answer for you on this one. It might be worth getting in touch with the Moz team, or posting another question tagged "Product Support" - they seem to be pretty good at answering those queries too.

-
Hi Sarah,
Somehow I answered this and must have forgotten to post the reply! Argh, it was a long one, too. Let me try to summarize what I'd do:
- If possible, noindex any page that doesn't display content while not logged in. Wait for those pages to drop out of the index, and monitor for errors.
- If that's not possible, skip straight to blocking the pages behind the login wall with robots.txt. For example, under a User-agent: * group, to block anything in the login folder:
Disallow: /login
Or to block any URL with a login variable:
Disallow: /*?login
This should prevent bots from crawling URLs where you have no content to show them. Use it carefully, though: robots.txt blocks crawling, not indexing, so a blocked URL can still appear in search results if other pages link to it.
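If you want to sanity-check a rule before deploying it, Python's standard-library robots.txt parser can simulate how a crawler reads it. A quick sketch - the example.com domain and product parameter are placeholders, and note that the stdlib parser does simple prefix matching per the original robots.txt spec, so it won't evaluate the * wildcard in the second rule above:

```python
from urllib import robotparser

# Simulate a crawler reading the proposed robots.txt rules.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /login",  # prefix match: blocks /login and /login?product=...
])

# Redirect-style login URLs are blocked...
print(rp.can_fetch("*", "https://example.com/login?product=blue-widget"))  # False
# ...but normal product pages stay crawlable.
print(rp.can_fetch("*", "https://example.com/products/blue-widget"))       # True
```

For the wildcard rule, test against a crawler that supports Google's pattern extensions (e.g. Search Console's robots.txt report) rather than the stdlib parser.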
I do apologize for the delay. If you have additional questions, please feel free to PM me. I'd be happy to do a quick consult online or over the phone - I feel bad that I never actually posted my answer, and I can give you more specific ideas if we look at the site. If this answers your question, that's fine too.
Good luck!