High Rankings Search Engine Optimization Forum

> Pdf Seen As Duplicate Content?
seobarry
post May 28 2008, 06:22 AM
Post #1


I was wondering whether the contents of a PDF file count as duplicate content too. I want to add a PDF information document to my website, but it is almost completely the same as the version hosted on many other sites. Two questions:

- Will the PDF content be seen as duplicate content if Google has indexed almost the same PDF on some other sites? (I'm sure Google did index the content on other sites, since the PDF is on many sites.)

- If the PDF counts as duplicate content, what would be a good strategy to avoid penalties? I thought it might be an idea to place the PDF file(s) in a separate subdirectory and then use a robots.txt file indicating that any content in this directory should not be indexed. Would this be the best approach?

thanks a lot!
Randy
post May 28 2008, 06:45 AM
Post #2


QUOTE
Will the PDF content be seen as duplicate content if Google has indexed almost the same PDF on some other sites? (I'm sure Google did index the content on other sites, since the PDF is on many sites.)


Yes, because it is duplicate content!

QUOTE
If the PDF counts as duplicate content, what would be a good strategy to avoid penalties?


Google doesn't really have a duplicate content penalty. Instead it's a filter. Yahoo! can be another matter if there is massive duplication, but I don't get the sense you're saying the entire site is being duplicated.

QUOTE
I thought it might be an idea to place the PDF file(s) in a separate subdirectory and then use a robots.txt file indicating that any content in this directory should not be indexed. Would this be the best approach?


This is exactly the way to handle it: exclude the PDF files via robots.txt. Whether you do this via a subdirectory exclusion or simply exclude all files that carry a .pdf extension matters not, as long as they're being excluded.
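
For anyone wanting to try the subdirectory exclusion described above, here is a minimal sketch. It assumes the PDFs live in a hypothetical /pdf/ directory on www.example.com; the directory name, the domain, and the is_crawlable helper are illustrative and not taken from this thread. The robots.txt rules are held in a string and checked with Python's standard urllib.robotparser module. Excluding by extension would instead need a pattern such as "Disallow: /*.pdf$", which relies on the wildcard support Googlebot offers; Python's standard parser does not evaluate wildcards, so that variant is not checked here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /pdf/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The PDF inside the excluded directory is blocked ...
    print(is_crawlable(ROBOTS_TXT, "http://www.example.com/pdf/info.pdf"))
    # ... while an ordinary HTML page elsewhere on the site is not.
    print(is_crawlable(ROBOTS_TXT, "http://www.example.com/info.html"))
```

Running the script should print False for the PDF and True for the HTML page, which matches the intent of the subdirectory approach: only the contents of the one directory are kept away from crawlers.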
piskie
post May 28 2008, 05:46 PM
Post #3


I have a client's site that is all about information dissemination: data specifications, etc.
All data and spec pages are duplicated as PDFs, with a download link to them from the HTML page. It's been like that for some years now. No special measures are taken, and the HTML pages show high in the SERPs. Interestingly, though, I get a lot of researchers coming from Google who use advanced search to look for PDFs.
MaKa
post May 30 2008, 02:20 AM
Post #4


Good point, Piskie. I was thinking about the PDF duplicate content "issue" the other day and couldn't find a convincing reason to block them. What does it matter whether people find your PDF or your HTML page in the results? They both have their advantages and disadvantages.
This forum is sponsored by High Rankings, a Boston SEO Agency