The Moz Q&A Forum

    PDF on financial site that duplicates ~50% of site content

    Intermediate & Advanced SEO
    • 540SEO
      540SEO last edited by

      I have a financial advisor client who has a downloadable PDF on his site that contains about 9 pages of good info. Problem is much of the content can also be found on individual pages of his site.

      Is it best to noindex/follow the pdf? It would be great to let the few pages of original content be crawlable, but I'm concerned about the duplicate content aspect.

      Thanks --

      • danatanseo
        danatanseo last edited by

        As long as you have rel=canonical tags properly in place, you don't need to worry about the PDF causing duplicate content problems. That way, any original content should be picked up and any duplicate can be attributed to your existing Web pages. Hope that's helpful!

        Dana

        • 540SEO
          540SEO @danatanseo last edited by

          Not sure which page I would mark as being canonical, since the pdf contains content from several different pages on the site. I don't think it's possible to assign different rel=canonical tags to separate portions of a pdf, is it?

          • danatanseo
            danatanseo @540SEO last edited by

            Hi Keith,

            I'm sorry, I should have clarified. The rel=canonical tags would be on your Web pages, not the PDF (they are irrelevant in a PDF document). Then Google will attribute your Web page as the original source of the content and will understand that the PDF just contains bits of content from those pages. In this instance I would include a rel=canonical tag on every page of your site, just to cover your bases. Hope that helps!

            Dana

            • 540SEO
              540SEO @540SEO last edited by

              I thought the idea was to put rel=canonical on the duplicated page, to signal that "hey, this page may look like duplicate content, but please refer to this canonical URL"?

              It looks like there is a PDF option for rel=canonical, so I guess the question is which page on the site to make canonical:

              http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394

              Indicate the canonical version of a URL by responding with the Link rel="canonical" HTTP header. Adding rel="canonical" to the head section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with the Link rel="canonical" HTTP header, like this (note that to use this option, you'll need to be able to configure your server):

              Link: <http://www.example.com/downloads/white-paper.pdf>; rel="canonical"

              • Valarlf
                Valarlf last edited by

                I think the right way here is to send rel=canonical in the HTTP header that is served with the PDF: http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html

                • Valarlf
                  Valarlf @540SEO last edited by

                  If you are using Apache, you can put it in your .htaccess file in this form:

                  <FilesMatch "my-file.pdf">
                  Header set Link '<http://misite/my-file.html>; rel="canonical"'
                  </FilesMatch>

                  • 540SEO
                    540SEO @540SEO last edited by

                    Thanks. Anybody want to weigh in on where to rel=canonical to? Home page?

                    • Valarlf
                      Valarlf @540SEO last edited by

                      Personally, I think it would be better not to index it, but if necessary, the site root (the home page) seems like a good option.

                      • dmccarthy
                        dmccarthy @540SEO last edited by

                        You could set the header to noindex rather than rel=canonical
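
                        A minimal sketch of that noindex alternative, assuming Apache with mod_headers enabled and the same hypothetical file name (my-file.pdf) used earlier in the thread. The X-Robots-Tag response header is the HTTP counterpart of a robots meta tag, so it can carry noindex for a file type that has no HTML head section. The "noindex, follow" value mirrors the noindex/follow idea from the original question.

                        ```apache
                        # .htaccess sketch; assumes mod_headers is enabled.
                        # "my-file.pdf" is a placeholder file name, not the client's real file.
                        <FilesMatch "my-file\.pdf$">
                          # Ask search engines not to index the PDF itself
                          Header set X-Robots-Tag "noindex, follow"
                        </FilesMatch>
                        ```

                        One way to confirm the header is being served is to inspect the response headers for the PDF URL in the browser's developer tools.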

                        • EGOL
                          EGOL @540SEO last edited by

                          This is what we have done with PDFs: assign rel="canonical" in .htaccess.

                          We did this with a few hundred files and it took Google a LONG time to find and credit them.
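
                          For a large batch of files, one possible approach is a single pattern-based rule instead of one block per PDF. This is only a sketch and not necessarily how it was done here. It assumes that every PDF has an HTML page at the same path with an .html extension, that the .htaccess file sits at the document root, and that mod_rewrite and mod_headers are both enabled. The host www.example.com and the variable name CANONICAL_PAGE are placeholders.

                          ```apache
                          # .htaccess sketch; assumes mod_rewrite and mod_headers are enabled.
                          RewriteEngine On

                          # Capture the path of any requested PDF (without the extension)
                          # into a hypothetical environment variable; "-" leaves the URL unchanged.
                          RewriteRule ^(.+)\.pdf$ - [E=CANONICAL_PAGE:$1]

                          # Only when that variable is set, point the canonical at the
                          # same-named HTML page. %{NAME}e expands an environment variable.
                          Header set Link '<http://www.example.com/%{CANONICAL_PAGE}e.html>; rel="canonical"' env=CANONICAL_PAGE
                          ```

                          If the PDFs do not map one-to-one onto HTML pages, as with a PDF that combines content from several pages, a per-file block with an explicitly chosen target is likely the safer choice.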
