Can a PDF be seen as duplicate content? If so, how can it be prevented?

ATMOSMarketing56

What about this instance:

(A) I made an "ultimate guide to X" and posted it on my site as individual HTML pages for each chapter

(B) I made a PDF version with the exact same content that people can download directly from the site

(C) I uploaded the PDF to sites like Scribd.com to help distribute it further, and build links with the links that are embedded in the PDF.

Would those all be duplicate content? Is (C) recommended or not?

Dr-Pete

Potentially, but I'm honestly not sure how Scribd's pages are indexed. Don't you need to log in or something to actually see the content on Scribd?

ATMOSMarketing56

I think you can set it to public or private (logged-in only) and even put a price tag on it if you want. So yes, setting it to private would help eliminate the duplicate content issue, but it would also hide the links that I'm using for link building.

I would imagine that since this guide links back to our original site, it would be no different than if someone copied the content from our site and linked back to us, thus crediting us as the original source. Especially if we make sure it is indexed through GWMT before submitting it to other platforms. Any good resources that delve into that?

Dr-Pete

Unfortunately, there's no great way to have it both ways. If you want these pages to get indexed for the links, then they're potential duplicates. If Google filters them out, the links probably won't count. Worst case, it could cause Panda-scale problems. Honestly, I suspect the link value is minimal and outweighed by the risk, but it depends quite a bit on the scope of what you're doing and the general link profile of the site.

ASriv

Hi all

I've been discussing the topic of making content available as both blog posts and PDF downloads today.

Given that there is a lot of uncertainty and complexity around this issue of potential duplication, my plan is to house all the PDFs in a folder that we block with robots.txt.

Anyone agree / disagree with this approach?

EGOL

I assigned rel=canonical to my PDFs using .htaccess.

Then, if anyone links to the PDFs, the link value gets passed to the webpage.
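
For readers wondering what this might look like: a PDF has no HTML head to hold a canonical tag, so the usual route is an HTTP `Link` header sent with the file. The fragment below is only a minimal sketch of that idea, not EGOL's actual setup. It assumes an Apache server with mod_headers enabled, and the file name `ultimate-guide.pdf` and the URL `https://www.example.com/ultimate-guide/` are placeholders.

```apache
# .htaccess in the directory that holds the PDF.
# Assumes mod_headers is available; file name and URL are hypothetical.
<IfModule mod_headers.c>
  <Files "ultimate-guide.pdf">
    # Send the canonical hint as an HTTP header with the PDF response,
    # pointing at the HTML page that carries the same content.
    Header set Link "<https://www.example.com/ultimate-guide/>; rel=\"canonical\""
  </Files>
</IfModule>
```

Written this way, each PDF needs its own `<Files>` block. A single pattern-based rule would only be workable if PDF file names map predictably onto page URLs, which depends on how the site is organized.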

ASriv

Thanks EGOL

That would be ideal.

For a site that has multiple authors, where it is impractical to get a developer involved every time a web page / blog post and its PDF are created, is there a single line of code that could be used to accomplish this in .htaccess?

If so, would you be able to show me an example please?

EGOL

I would like to give that to you but it is on a site that I don't share in forums. Sorry.

ASriv

Sure, I understand - thanks EGOL

ilonka65

It looks like Google is not crawling tabs anymore, so if your PDFs are tabbed within pages, it might not be an issue: https://www.seroundtable.com/google-hidden-tab-content-seo-19489.html
