PDFs and webpages
-
If a website provides PDF versions of its pages as a download option, should the PDFs be noindexed, in your opinion?
We have to offer PDF versions of the webpages because our customers want them; they are a group who will download and print the PDFs. I thought of leaving the PDFs alone as they sit on a subdomain, but the more I think about it, the more I think I should noindex them. My reasons:
- They sit on a subdomain, so if users have linked to them, my main domain isn't getting the rank juice
- Duplication issues, they might be affecting the rank of the existing webpages
- I can't track the PDFs as they are on a subdomain, although I can see event clicks to them from the main site
On the flip side:
- I could lose the traffic the PDFs bring in when a user opens one from an organic search, along with clicks on any links inside the PDFs
What are your experiences?
-
Google now classes subdomains pretty much as part of your main domain: http://www.youtube.com/watch?v=_MswMYk05tk - so you will be getting some of that rank juice.
I'd think that the major search engines wouldn't have a problem knowing that an HTML version of a page is preferred over a PDF. However, you can use canonical HTTP headers to make sure there are no problems with duplicate content: http://moz.com/blog/how-to-advanced-relcanonical-http-headers
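For illustration, here is a minimal Python sketch of what such a header value could look like. It assumes each PDF lives at the same path as its HTML page with ".pdf" appended, and the hostnames (pdf.example.com, www.example.com) and the function name are made up for the example; your own URL mapping may well differ.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_link_header(pdf_url: str, main_host: str) -> str:
    """Build a Link header value that points a PDF at its HTML equivalent.

    Assumption (hypothetical URL scheme): the HTML page has the same path
    as the PDF, minus the ".pdf" suffix, and is served from main_host.
    """
    parts = urlsplit(pdf_url)
    path = parts.path
    if path.lower().endswith(".pdf"):
        path = path[: -len(".pdf")]
    # Drop any query string and fragment; keep the original scheme.
    html_url = urlunsplit((parts.scheme, main_host, path, "", ""))
    return f'<{html_url}>; rel="canonical"'


if __name__ == "__main__":
    value = canonical_link_header(
        "http://pdf.example.com/guides/widgets.pdf", "www.example.com"
    )
    # The server would send this alongside the PDF as:  Link: <value>
    print("Link:", value)
```

The server on the subdomain would then send that value in a `Link` response header whenever it serves the PDF; how you attach the header depends on your web server or framework.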
If you use Google Analytics you will be able to track the subdomain. You can do it as part of your existing profile or by setting up a separate one: https://developers.google.com/analytics/devguides/collection/gajs/gaTrackingSite (ensure this is the version of Analytics you have installed).
There's a short guide here on getting more data about PDFs through Google Analytics: http://moz.com/ugc/how-to-track-pdf-traffic-links-in-google-analytics-open-site-explorer
-
Thanks Alex,
I do have canonical tags on the webpages to ensure they are seen as the main one. I'll look into tracking subdomains.
-
Cool. It's advisable to add canonical HTTP headers to the PDFs too, if you can.
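Once the headers are in place, a quick check along these lines might help confirm the PDFs really are sending them. This is only a rough sketch: the function names are made up, the parsing is deliberately simple rather than a full Link header parser, and the URL you pass to `fetch_link_header` would be one of your own PDFs.

```python
import re
import urllib.request
from typing import Optional


def canonical_target(link_header: str) -> Optional[str]:
    """Return the URL marked rel="canonical" in a Link header value, or None."""
    # Each link-value looks like: <url>; param=value; ...
    for url, params in re.findall(r"<([^>]+)>([^<]*)", link_header):
        rel = re.search(r'rel\s*=\s*"?([^";,]*)"?', params, re.IGNORECASE)
        if rel and "canonical" in rel.group(1).lower().split():
            return url
    return None


def fetch_link_header(url: str) -> str:
    """Send a HEAD request and return the Link header ("" if absent)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Link", "")


if __name__ == "__main__":
    sample = '<http://www.example.com/guides/widgets>; rel="canonical"'
    print(canonical_target(sample))
```

If `canonical_target(fetch_link_header(...))` comes back as `None` for a PDF, the header is either missing or not marked as canonical.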