How does the SEOmoz crawler define duplicate content?
-
Here's a response I got from SEOmoz support when I asked a similar question:
"Our crawler counts duplicate content as about 90% similarity in the bulk code, rather than just the actual words on each page and the meta description. Since the formats of all of these pages are very similar, the crawler will consider them to be duplicate content based on the overall source code."
Google's crawler doesn't treat duplicate content quite the same way: it can separate the written content from other parts of the page, such as navigation and code.
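To make the difference concrete, here is a rough sketch of why two template-heavy pages can look like duplicates when you compare the whole source but not when you compare only the visible text. It is purely illustrative. SEOmoz's actual method isn't described beyond the support reply above. The similarity measure here is Python's `difflib.SequenceMatcher` ratio, used as a stand-in. The "content extractor" just drops `script`, `style`, `nav` and `footer` elements. The template and the post texts are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Hypothetical shared template: everything except {text} and {img}
# is identical on every post, as on a typical job-update page.
TEMPLATE = """<html><head><title>Job update</title>
<style>body {{ font-family: sans-serif; margin: 0 auto; max-width: 960px; }}</style>
<script>window.dataLayer = window.dataLayer || [];</script></head>
<body><nav><a href="/">Home</a> <a href="/services">Services</a>
<a href="/gallery">Gallery</a> <a href="/contact">Contact</a></nav>
<div class="post"><p>{text}</p><img src="/photos/{img}" alt="job photo"></div>
<footer>Call us today for a free estimate.</footer></body></html>"""


class VisibleTextExtractor(HTMLParser):
    """Crude content extractor: keeps text nodes, drops template-like elements."""

    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def similarity(a, b):
    # Ratio in [0, 1]; autojunk=False stops the heuristic from discarding
    # frequent characters in long strings as "junk".
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


page_a = TEMPLATE.format(text="Framing the new deck on Elm Street today.", img="deck.jpg")
page_b = TEMPLATE.format(text="Replacing the kitchen cabinets for a client in Oak Park.", img="kitchen.jpg")

raw_score = similarity(page_a, page_b)
text_score = similarity(visible_text(page_a), visible_text(page_b))

print(f"whole-source similarity: {raw_score:.2f}")
print(f"visible-text similarity: {text_score:.2f}")
```

The whole-source score comes out much higher than the visible-text score, because the short post text is a small fraction of each page's code. With a longer, more realistic template, the whole-source score would be pushed even closer to 1.0.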
The Penguin update targeted spam. Are you aware that there has also been a recent update to the Panda algorithm? Panda targeted low-quality pages. Since you mention that some of your pages "only contains a small amount of content placed within a template page", it could be Panda that hit you. It might be wise to expand that content and make it high quality. For example, a 400-word post with images and diagrams, optimised with alt text etc., would usually do better than a 100-word post with nothing else. Check the dates of the traffic drop in your analytics against the dates of each update.
You'll find pretty much everything you need to know here: http://searchengineland.com/penguin-update-recovery-tips-advice-119650
-
Thanks Alex! It appears to be Penguin, as the rankings dropped on April 24th. It's quite frustrating, because even though these pages have little content, they really aren't meant to have more. They are basically quick posts made on the job site, showing what the contractor is currently working on along with a couple of pictures. It would be impossible to get the contractor to write a 400-word post each time.
Anyway, we'll keep working on it, thanks for your response.
-
How about combining related posts onto the same page, so each page has more content?
If it is Penguin, though, doesn't that mean Google's algorithm thinks you've violated its Webmaster Guidelines somehow?