
Pdf Seen As Duplicate Content?


3 replies to this topic

#1 seobarry

    HR 3

  • Active Members
  • 53 posts

Posted 28 May 2008 - 06:22 AM

I was wondering whether the contents of a PDF file count as duplicate content too. I want to add a PDF information document to my website, but it is almost identical to the version hosted on many other sites. Two questions:

- Will the PDF content be seen as duplicate content if Google has indexed almost the same PDF on other sites? (I'm sure Google has indexed it elsewhere, since the PDF is on many sites.)

- If the PDF counts as duplicate content, what would be a good strategy to avoid penalties? I thought I might place the PDF file(s) in a separate subdirectory and then use a robots.txt file indicating that any content in this directory should not be indexed. Would this be the best approach?

Thanks a lot!

#2 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 28 May 2008 - 06:45 AM

QUOTE
Will the PDF content be seen as duplicate content if Google has indexed almost the same PDF on other sites? (I'm sure Google has indexed it elsewhere, since the PDF is on many sites.)


Yes, because it is duplicate content!

QUOTE
If the PDF counts as duplicate content, what would be a good strategy to avoid penalties?


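Google doesn't really have a duplicate content penalty. Instead, it uses a filter: when several pages carry essentially the same content, it generally shows only one of them in the results. Yahoo! can be another matter if there is massive duplication, but I don't get the sense you're saying your entire site is being duplicated.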

QUOTE
I thought I might place the PDF file(s) in a separate subdirectory and then use a robots.txt file indicating that any content in this directory should not be indexed. Would this be the best approach?


This is exactly the way to handle it: exclude the PDF files via robots.txt. Whether you exclude a subdirectory or simply exclude all files with a .pdf extension doesn't matter, as long as they're excluded.
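As a rough sketch of the two options above, here is a robots.txt containing both a directory rule and an extension rule, plus a small Python checker that tests which paths they would block. The `/pdfs/` directory name and the `is_blocked` helper are hypothetical. The `*` and `$` wildcards are search-engine extensions to the original robots.txt convention, so not every crawler supports them, and Python's standard `urllib.robotparser` only does plain prefix matching; that is why the helper does its own matching. It ignores `User-agent` grouping and `Allow` lines, so treat it as a simplification, not a full robots.txt parser.

```python
import re

# Hypothetical robots.txt showing both options: a directory rule and a
# .pdf extension rule. "/pdfs/" is an assumed directory name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
Disallow: /*.pdf$
"""


def disallow_patterns(robots_txt: str) -> list[str]:
    """Collect non-empty Disallow values (ignores User-agent groups and Allow lines)."""
    patterns = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard path pattern: '*' matches any run of characters,
    a trailing '$' anchors the end; otherwise it is a prefix match."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """True if any Disallow pattern matches the URL path (re.match anchors the start)."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns(robots_txt))


if __name__ == "__main__":
    for path in ["/pdfs/info-sheet.pdf", "/downloads/info-sheet.pdf",
                 "/downloads/info-sheet.pdf?v=2", "/info-sheet.html"]:
        print(path, "blocked" if is_blocked(path) else "crawlable")
```

Note the third path: because `$` anchors the end of the URL, `/*.pdf$` does not match a PDF URL with a query string, whereas the directory rule catches everything under `/pdfs/` regardless of how the URL ends.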

#3 piskie

    HR 7

  • Active Members
  • 1,092 posts
  • Location:Cornwall

Posted 28 May 2008 - 05:46 PM

I have a client's site that exists for information dissemination: data specifications, etc.
All data and spec pages are duplicated as PDFs, with a download link to each PDF from its HTML page. It's been like that for some years now. No special measures are taken, and the HTML pages rank high in the SERPs. Interestingly, though, I get a lot of researchers arriving from Google after using advanced search to look for PDFs.

#4 MaKa

    HR 6

  • Active Members
  • 856 posts
  • Location:Llantwit Major, Wales, UK

Posted 30 May 2008 - 02:20 AM

Good point, Piskie. I was thinking about the PDF duplicate content "issue" the other day and couldn't find a convincing reason to block them. What does it matter whether people find your PDF or your HTML page in the results? Both have their advantages and disadvantages.



