How can I find duplicate pages from a Moz Crawl?
-
We have many duplicate pages showing up in the Moz Crawl, and we're trying to fix them, but it's very difficult because I can't see a way to isolate the code where each duplicate originates. For instance, http://experiencemission.org/immersion/ is one of our main pages, and the crawl shows one duplicate of it: http://experiencemission.org/immersion. It appears that one of our staff manually edited the source code of one of our pages but left off the trailing slash. This would be an easy fix, but the page is linked internally on our website 2,423 times, so finding the one incorrect link is extremely difficult. Many other pages have the same basic problem. We know we have duplicates, but we can't isolate where they come from.
So my question is this: when viewing the Moz Crawl data, is there any way to see where a specific duplicate page link is located on our website?
Thanks for any and all help!
-
Hey there! You've got a couple of different ways to track this information down. The first is to go into your campaign, open the Site Crawl, and click the Duplicate Page Content link towards the bottom. Right below the graph you'll see a button that says Download CSV. Open the file and go to column AM, where you'll see the referring URL! Another option is to jump into Open Site Explorer and check the internal inbound links. Hope this helps, and let us know if you need anything else!
-
Thanks for taking the time to respond. Open Site Explorer is helpful for issues with a manageable number of internal links. However, for the example above and a few others like it on our website, it isn't much help, because isolating the link would still require us to open each page individually and view its source code. Most of our errors are minor, such as an omitted trailing slash or incorrect capitalization. These errors are flagged as duplicate content in our Moz crawl, but the links still work because they redirect to the correct page, so they can't be isolated in Open Site Explorer. Unfortunately, the CSV is no help at all, because it only shows the page being linked to, not the page the link actually comes from.
Are we just out of luck, or is there another option?
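One possible workaround, sketched here under assumptions rather than as a Moz feature: scan the HTML of your own pages and report every link whose target matches one of your correct (canonical) URLs only after ignoring the trailing slash and letter case. The function names (`find_variant_links`, `normalize`) are hypothetical, and the sketch assumes your site treats URL paths as case-insensitive; it uses only Python's standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def normalize(url):
    """Hypothetical normalization: lowercase scheme, host, and path, strip a
    trailing slash, and drop the fragment, so slash/case variants compare equal.
    Assumes paths on this site are case-insensitive."""
    parts = urlsplit(url)
    path = parts.path.lower().rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def find_variant_links(page_url, html, canonical_urls):
    """Return (page_url, href) pairs for links that point at a canonical URL
    only after normalization, i.e. links differing by trailing slash or case."""
    canonical_by_key = {normalize(u): u for u in canonical_urls}
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        # Resolve relative links against the page they appear on.
        absolute = urljoin(page_url, href)
        # Ignore #fragments when checking for an exact canonical match.
        exact = urlunsplit(urlsplit(absolute)._replace(fragment=""))
        canonical = canonical_by_key.get(normalize(absolute))
        if canonical is not None and exact != canonical:
            found.append((page_url, href))
    return found


if __name__ == "__main__":
    sample = '<a href="/immersion">bad</a> <a href="/immersion/">good</a>'
    print(find_variant_links("http://experiencemission.org/about/", sample,
                             ["http://experiencemission.org/immersion/"]))
```

To run this across the whole site, you would fetch each page's HTML (for example with `urllib.request.urlopen`, from a list of URLs taken from your sitemap or crawl export, which is an assumption about what you have available) and call `find_variant_links` on each one. The output lists the page that contains the bad link alongside the exact `href` to correct, which is the piece of information the CSV doesn't give you.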