The Moz Q&A Forum

    Can PDF be seen as duplicate content? If so, how to prevent it?

    Intermediate & Advanced SEO
    • EGOL
      EGOL last edited by

      I am really interested in hearing what others have to say about this.

      I know that PDFs can be very valuable content. They can be optimized, they rank in the SERPs, they accumulate PageRank, and they can pass link value. So, to me, it would be a mistake to block them from the index...

      However, I see your point about dupe content... they could also be thin content. Will Panda whack you for thin and duplicate content in your PDFs?

      How can rel=canonical be used... and what about rel=author?

      Anybody know anything about this?

      • Dr-Pete
        Dr-Pete last edited by

        I think it's possible, but I've only seen it in cases that are a bit hard to disentangle. For example, I've seen a PDF outrank a duplicate piece of regular content when the regular content had other issues (including massive duplication with other, regular content). My gut feeling is that it's unusual.

        If you're concerned about it, you can canonicalize PDFs with the header-level canonical directive. It's a bit more technically complex than the standard HTML canonical tag:

        http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html
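For readers unfamiliar with the header-level canonical: instead of a tag in the HTML, the server sends an HTTP `Link` header with `rel="canonical"` alongside the PDF. The following is a minimal Python sketch of what that header value looks like and how it could be read back. The function names and the example.com URLs are hypothetical, and the parser is deliberately simplified rather than a full implementation of the `Link` header grammar.

```python
import re


def canonical_link_header(canonical_url: str) -> str:
    """Build the value of an HTTP Link header that declares a canonical URL."""
    return f'<{canonical_url}>; rel="canonical"'


def parse_canonical(link_header: str):
    """Return the canonical target from a Link header value, or None.

    Simplified parsing: a Link header may carry several comma-separated
    links, so each <url>; params group is checked in turn.
    """
    for match in re.finditer(r'<([^>]*)>\s*((?:;\s*[^,;]+)*)', link_header):
        url, params = match.group(1), match.group(2)
        if re.search(r'rel\s*=\s*"?canonical"?', params, re.IGNORECASE):
            return url
    return None


# Hypothetical example: a PDF pointing at its HTML equivalent.
header_value = canonical_link_header("http://www.example.com/white-paper.html")
print("Link:", header_value)
print("Canonical target:", parse_canonical(header_value))
```

The header would be attached to the response for the PDF itself, so the crawler has to be able to fetch the PDF in order to see it.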

        I'm going to mark this as "Discussion", just in case anyone else has seen real-world examples.

        • Gestisoft-Qc
          Gestisoft-Qc @Dr-Pete last edited by

          To make sure I understand what I'm reading:

          • PDFs don't usually rank as well as regular pages (although it is possible)
          • It is possible to configure a canonical tag on a PDF

          My concern isn't that our PDFs may outrank the original content, but rather that we might get slammed by Google for publishing them.

          Am I right in thinking that a canonical tag prevents the PDF from accumulating link juice? If so, I would prefer not to use it, unless leaving it off leads to Google slamming us.

          Has anyone experienced Google retribution for publishing PDFs that come from a third party?

          @EGOL: Can you expand a bit on your Author suggestion?

          Thanks all!

          • EGOL
            EGOL @Gestisoft-Qc last edited by

            @EGOL: Can you expand a bit on your Author suggestion?

            I was wondering if there is a way to do rel=author for a PDF document. I don't know how to do it and don't know if it is possible.

            • Dr-Pete
              Dr-Pete @Gestisoft-Qc last edited by

              Oh, sorry - so these PDFs aren't duplicates with your own web/HTML content so much as duplicates with the same PDFs on other websites?

              That's more like a syndication situation. It is possible that, if enough people post these PDFs, you could run into trouble, but I've never seen that. More likely, your versions just wouldn't rank. Theoretically, you could use the header-level canonical tag cross-domain, but I've honestly never seen that tested.

              If you're talking about a handful of PDFs, they're a small percentage of your overall indexed content, and that content is unique, I wouldn't worry too much. If you're talking about 100s of PDFs on a 50-page website, then I'd control it. Unfortunately, at that point, you'd probably have to put the PDFs in a folder and outright block it. You'd remove the risk, but you'd stop ranking on those PDFs as well.

              • EGOL
                EGOL @Gestisoft-Qc last edited by

                Thanks for all of your input, Dr. Pete. The example that you use is almost exactly what I have: hundreds of PDFs on a fifty-page site. These PDFs rank well in the SERPs, accumulate PageRank, and pass traffic and link value back to the main site through links embedded within the PDF. They also have natural links from other domains. I don't want to block them or nofollow them, but your suggestion of using the header directive sounds pretty good.

                • Dr-Pete
                  Dr-Pete @Gestisoft-Qc last edited by

                  If they duplicate your main content, I think the header-level canonical may be a good way to go. For the syndication scenario, it's tough, because then you're knocking those PDFs out of the rankings, potentially, in favor of someone else's content.

                  Honestly, I've seen very few people deal with canonicalization for PDFs, and even those cases were small or obvious (like a page with the exact same content being outranked by the duplicate PDF). It's kind of uncharted territory.

                  • EGOL
                    EGOL @Gestisoft-Qc last edited by

                    Thanks! I am going to look into this. I'll let you know if I learn anything.

                    • ATMOSMarketing56
                      ATMOSMarketing56 last edited by

                      What about this instance:

                      (A) I made an "ultimate guide to X" and posted it on my site as individual HTML pages for each chapter

                      (B) I made a PDF version with the exact same content that people can download directly from the site

                      (C) I uploaded the PDF to sites like Scribd.com to help distribute it further, and build links with the links that are embedded in the PDF.

                      Would those all be dup content? Is (C) recommended or not?

                      • Dr-Pete
                        Dr-Pete last edited by

                        Potentially, but I'm honestly not sure how Scribd's pages are indexed. Don't you need to log in or something to actually see the content on Scribd?

                        • ATMOSMarketing56
                          ATMOSMarketing56 last edited by

                          I think you can set it to public or private (logged-in only) and even put a price tag on it if you want. So yes, setting it to private would help eliminate the dup content issue, but it would also hide the links that I'm using to link-build.

                          I would imagine that, since this guide would link back to our original site, it would be no different than if someone were to copy the content from our site and link back to us with it, thus crediting us as the original source. That should hold especially if we make sure it is indexed through GWMT before submitting it to other platforms. Are there any good resources that delve into that?

                          • Dr-Pete
                            Dr-Pete @ATMOSMarketing56 last edited by

                            Unfortunately, there's no great way to have it both ways. If you want these pages to get indexed for the links, then they're potential duplicates. If Google filters them out, the links probably won't count. Worst case, it could cause Panda-scale problems. Honestly, I suspect the link value is minimal and outweighed by the risk, but it depends quite a bit on the scope of what you're doing and the general link profile of the site.

                            • ASriv
                              ASriv last edited by

                              Hi all

                              Today I've been discussing the topic of making content available as both blog posts and PDF downloads.

                              Given that there is a lot of uncertainty and complexity around this issue of potential duplication, my plan is to house all the PDFs in a folder that we block with robots.txt.
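A robots.txt rule of this kind can be checked before it is deployed. The sketch below uses Python's standard `urllib.robotparser` to confirm that a rule blocks a PDF folder while leaving the blog posts crawlable. The `/pdfs/` folder name, the example.com URLs, and the `is_blocked` helper are assumptions for illustration, not details from this thread.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: every PDF download lives under /pdfs/ (hypothetical folder).
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


pdf_url = "https://www.example.com/pdfs/guide.pdf"
post_url = "https://www.example.com/blog/guide/"
print(pdf_url, "blocked:", is_blocked(ROBOTS_TXT, pdf_url))
print(post_url, "blocked:", is_blocked(ROBOTS_TXT, post_url))
```

One trade-off to keep in mind: a crawler that is not allowed to fetch a PDF also cannot see any HTTP headers sent with it, so blocking the folder and using a header-level canonical on the same files do not combine.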

                              Anyone agree / disagree with this approach?

                              • EGOL
                                EGOL last edited by

                                I assigned rel=canonical to my PDFs using .htaccess.

                                Then, if anyone links to the PDFs, the link value gets passed to the webpage.
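The actual rules are not shown in this thread, so the following is only a sketch of one way it might be done. It is a small Python script that generates .htaccess blocks from a mapping of PDF filenames to their canonical web pages, using Apache's `Header` directive inside `<Files>` sections. It assumes Apache with mod_headers enabled; the filenames, URLs, and the `htaccess_canonical_rules` function are hypothetical.

```python
def htaccess_canonical_rules(pdf_to_page: dict) -> str:
    """Generate .htaccess blocks that send a rel=canonical Link header per PDF.

    Assumes Apache with mod_headers enabled. Each <Files> block matches by
    filename, so the mapping keys are bare filenames, not full paths.
    """
    blocks = []
    for pdf_name, page_url in sorted(pdf_to_page.items()):
        blocks.append(
            f'<Files "{pdf_name}">\n'
            f'  Header set Link "<{page_url}>; rel=\\"canonical\\""\n'
            f'</Files>'
        )
    return "\n\n".join(blocks) + "\n"


# Hypothetical mapping, maintained as new posts and PDFs are published.
mapping = {
    "ultimate-guide.pdf": "https://www.example.com/ultimate-guide/",
    "white-paper.pdf": "https://www.example.com/white-paper/",
}
print(htaccess_canonical_rules(mapping))
```

Keeping the mapping in one place and regenerating the rules is one way to avoid hand-editing .htaccess every time a new PDF is added. The output should be tested on a staging server before it goes live, since a malformed .htaccess file can take a site down.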

                                • ASriv
                                  ASriv last edited by

                                  Thanks EGOL

                                  That would be ideal.

                                  Our site has multiple authors, and it is impractical to get a developer involved every time a web page or blog post and its PDF are created. Is there a single line of code that could be used to accomplish this in .htaccess?

                                  If so, would you be able to show me an example please?

                                  • EGOL
                                    EGOL @ASriv last edited by

                                    I would like to give that to you but it is on a site that I don't share in forums.  Sorry.

                                    • ASriv
                                      ASriv last edited by

                                      Sure, I understand - thanks EGOL

                                      • ilonka65
                                        ilonka65 last edited by

                                        It looks like Google is not crawling tabs anymore, so if your PDFs are tabbed within pages, it might not be an issue: https://www.seroundtable.com/google-hidden-tab-content-seo-19489.html

                                        © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.