PDF Seen As Duplicate Content?
Posted 28 May 2008 - 06:22 AM
- Will the PDF content be seen as duplicate content if Google has already indexed almost the same PDF on other sites? (I'm sure Google has indexed it elsewhere, since the PDF is on many sites.)
- If the PDF counts as duplicate content, what would be a good strategy to avoid penalties? I thought it might be an idea to place the PDF file(s) in a separate subdirectory and then use a robots.txt file indicating that nothing in this directory should be indexed. Would this be the best approach?
Thanks a lot!
Posted 28 May 2008 - 06:45 AM
Yes, because it is duplicate content!
Google doesn't really have a duplicate content penalty. Instead it's a filter. Yahoo! can be another matter if there is massive duplication, but I don't get the sense you're saying the entire site is being duplicated.
This is exactly the way to handle it: exclude the PDF files via robots.txt. Whether you exclude a subdirectory or exclude every file with a .pdf extension doesn't matter, as long as the files are excluded.
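A minimal sketch of the two exclusion options, assuming a hypothetical `/pdf/` directory for the files. The directory rule is a plain path prefix. The extension rule `/*.pdf$` relies on the `*` and trailing `$` pattern extensions that Google documents for Googlebot; these were not part of the original robots.txt convention, so the sketch matches them with a small hand-written helper rather than assuming any particular parser supports them. The checker is deliberately simplified (it ignores User-agent groups and Allow lines), so treat it as a way to sanity-check patterns, not as a model of how a real crawler decides.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt covering both options:
# /pdf/ is an assumed directory name; /*.pdf$ targets the .pdf extension.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
Disallow: /*.pdf$
"""


def rule_matches(pattern: str, path: str) -> bool:
    """Match one Disallow pattern against a URL path, honouring the
    Google-documented '*' (any characters) and trailing '$' (end of URL)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts, turn each '*' into '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without '$' a rule is a prefix match, so allow anything after it.
    return re.fullmatch(body + ("" if anchored else ".*"), path) is not None


def is_disallowed(robots_txt: str, path: str) -> bool:
    """Simplified check: True if any non-empty Disallow rule matches.
    Ignores User-agent groups and Allow lines, unlike a real crawler."""
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        value = value.split("#", 1)[0].strip()  # drop trailing comments
        if field.strip().lower() == "disallow" and value:
            if rule_matches(value, path):
                return True
    return False


if __name__ == "__main__":
    for path in ["/pdf/spec-sheet.pdf", "/downloads/datasheet.pdf", "/products/widget.html"]:
        print(path, "blocked" if is_disallowed(ROBOTS_TXT, path) else "crawlable")

    # Python's standard parser handles the plain directory (prefix) rule.
    rp = RobotFileParser()
    rp.parse(["User-agent: *", "Disallow: /pdf/"])
    print(rp.can_fetch("Googlebot", "http://example.com/pdf/spec-sheet.pdf"))
```

Either rule alone is enough for the case discussed here; the directory rule only works if all the PDFs actually live under that directory, while the extension rule catches them wherever they sit.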
Posted 28 May 2008 - 05:46 PM
All Data and Spec pages are duplicated as PDFs, with a download link to each PDF from its HTML page. It's been like that for some years now. No special measures are taken, and the HTML pages rank high in the SERPs. Interestingly, though, I get a lot of researchers coming from Google via advanced searches for PDFs.
Posted 30 May 2008 - 02:20 AM