Crawl diagnostic errors due to query strings
-
I'm seeing a large number of duplicate page titles, duplicate content, missing meta descriptions, etc. in my Crawl Diagnostics Report due to URLs' query strings. These pages already have canonical tags, but I know canonical tags aren't considered in MOZ's crawl diagnostic reports and therefore won't reduce the number of reported errors. Is there any way to configure MOZ not to treat query string variants as unique URLs? It's difficult to find a legitimate error among hundreds of these non-errors.
-
Hi there
Check out Google's duplicate content resources - they provide help on how to categorize your parameters and URL strings.
You can also handle this via your robots.txt. Make sure that you have a canonical tag on that page as well.
Hope this helps! Good luck!
-
Hi Patrick,
Thanks for the quick reply as always. As far as Google is concerned, these pages are set up correctly with canonical tags and URL strings - MOZ actually reports far more duplicate content than Webmaster tools.
My issue is just with the number of errors reported in MOZ. You mentioned that I can handle this via the robots.txt file - Is there a way to only disallow Rogerbot from crawling URLs with query strings, or URLs that contain a certain phrase such as "item_id=" or "cat_id="?
-
Hi there
My bad! Yeah - you could just do this in your robots.txt:
User-agent: Rogerbot
Disallow: *?
(Check out this resource on how to block specific query strings.)
Hope this helps! Good luck!
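To block only the specific parameters mentioned above (item_id= and cat_id=) rather than every query string, rules like the following should work, assuming Rogerbot honors the * wildcard (worth double-checking against Moz's Rogerbot documentation before relying on it):

```
User-agent: Rogerbot
Disallow: /*item_id=
Disallow: /*cat_id=
```

Each Disallow line matches any URL whose path or query string contains that parameter name, so pages without those parameters still get crawled.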
-
Is your traffic lower than expected?
I was having an issue like this where Moz was showing a lot more duplicate content than Webmaster Tools was (Webmaster Tools actually showed none), but I was still being penalized. I realized this when I added an exclusion to my robots.txt to block any query strings on my site. After I did this, my rankings shot through the roof.
Not saying that this is happening to you but I just like to err on the side of caution.
-
This is very interesting! Strange that Webmaster tools wouldn't display duplicate content, but Google would still penalize you. I'd like to try this on my site, but am a little wary because I think some pages rank with the query string version of the URL, despite a canonical being specified.
-
Hi there!
Our tool has a 90% threshold for duplicate content, which means it will flag any pair of pages that share at least 90% of the same code. This includes all the source code on the page, not just the visible text. You can run your own tests using this tool: http://www.webconfs.com/similar-page-checker.php. In the case of http://www.optp.com/SMARTROLLER?cat_id=205#.VZreQhNVhBc and http://www.optp.com/SMARTROLLER?cat_id=54#.VZrdJhNVhBc, these pages are 100% similar, which is why they're being flagged.
I hope this helps! If you need any more help with your crawl, feel free to contact our Help Team at help@moz.com.
Thanks!
Kevin
Help Team
-
Hi Kevin,
I understand how MOZ's duplicate content system works. It would just be nice if it could take canonical URLs into consideration for Crawl Diagnostics Reports or give you the option of not counting URLs appended with parameters as unique pages.
Patrick was able to help me figure out that I can do the latter via the robots.txt feature by using a wildcard: Disallow: *?.
-
I'm glad to hear you got this figured out - thank you Patrick for your help!

Kevin
Help Team