Resolving Duplicate PDF and HTML Content for Google

Recently, a client began receiving a significant portion of their traffic on a specific PDF on their domain. Because Google was sending visitors straight to the PDF, all interaction with the site and its marketing was being lost.

The fix is simple if applied correctly, but it is worth noting that it took about 5 weeks for the change to take hold.

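You’ll need to tell Google which content is preferred by using canonical URLs. A PDF has no HTML head to hold a canonical link tag, so the canonical URL is sent as an HTTP Link header instead (the same method could apply to similar web pages).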

In .htaccess:

<Files "nameOfThePDF.pdf">
 Header add Link "<http://domain.com/url-to-related-html-content/>; rel=\"canonical\""
</Files>

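Note that there is no directory path to the PDF and the file name is in quotes. The <Files> directive matches on the file name only, so the .htaccess file should sit in the directory that contains the PDF (or in a directory above it). The Header directive also requires Apache's mod_headers module to be enabled.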
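
Since the change can take weeks to show up in search results, it may help to confirm right away that the server is actually sending the header. The following is a minimal sketch in Python, not part of the original fix: the function names are made up for illustration, the Link header parsing is simplified, and the URL in the closing comment is a hypothetical location for the PDF.

```python
import re
import urllib.request


def canonical_from_link_header(value):
    """Return the URL marked rel="canonical" in a Link header value, or None."""
    # A Link header looks like: <URL>; rel="canonical"
    # Several links may be comma-separated in one header value.
    for match in re.finditer(r'<([^>]*)>([^<]*)', value):
        url, params = match.group(1), match.group(2)
        if re.search(r'\brel\s*=\s*"?[^";,]*\bcanonical\b', params, re.IGNORECASE):
            return url
    return None


def fetch_canonical(pdf_url):
    """Send a HEAD request to pdf_url and return its canonical URL, or None."""
    request = urllib.request.Request(pdf_url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        # get_all returns every Link header, or None when there are none.
        for value in response.headers.get_all("Link") or []:
            url = canonical_from_link_header(value)
            if url:
                return url
    return None


# Example call (hypothetical URL, assumes the PDF sits in the web root):
# print(fetch_canonical("http://domain.com/nameOfThePDF.pdf"))
```

If fetch_canonical returns None for the PDF, the likely causes are that mod_headers is not enabled or that the .htaccess file is not in a directory that covers the PDF.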

More info:
http://moz.com/blog/how-to-advanced-relcanonical-http-headers
