May 28 2008, 06:22 AM
Post #1
HR 2 · Group: Active Members · Posts: 32 · Joined: 30-April 08 · Member No.: 20,756
I was wondering whether the contents of a PDF file count as duplicate content too. I want to add a PDF information document to my website, but it is almost exactly the same as the version hosted on many other sites. Two questions:

- Will the PDF content be seen as duplicate content if Google has indexed almost the same PDF on other sites? (I'm sure Google has indexed it elsewhere, since the PDF is on many sites.)
- If the PDF counts as duplicate content, what would be a good strategy to avoid penalties? I thought I might place the PDF file(s) in a separate subdirectory and then use a robots.txt file indicating that nothing in this directory should be indexed. Would this be the best approach?

Thanks a lot!
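A minimal sketch of the subdirectory idea, assuming the PDFs live under a hypothetical `/pdfs/` directory on a placeholder domain (`example.com`). It feeds the robots.txt rules to Python's standard-library `urllib.robotparser` and checks which URLs a crawler that honours the file (identified here as `Googlebot`) would be allowed to fetch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping all compliant crawlers out of /pdfs/.
# The directory name and example.com are placeholders, not from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("https://www.example.com/pdfs/info.pdf",
                "https://www.example.com/info.html"):
        verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
        print(url, "->", verdict)
```

Running it reports the PDF under `/pdfs/` as blocked and the HTML page as allowed. The stdlib parser only does plain prefix matching, which is all a directory rule needs.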
May 28 2008, 06:45 AM
Post #2
Convert Me! · Group: Admin · Posts: 17,377 · Joined: 17-August 03 · Member No.: 551
QUOTE: will the PDF content be seen as duplicate content if Google has indexed almost the same PDF on other sites (I'm sure Google has indexed it elsewhere, since the PDF is on many sites)

Yes, because it is duplicate content!

QUOTE: if the PDF counts as duplicate content, what would be a good strategy to avoid penalties?

Google doesn't really have a duplicate content penalty; instead, it's a filter. Yahoo! can be another matter if there is massive duplication, but I don't get the sense you're saying the entire site is being duplicated.

QUOTE: I thought it might be an idea to place the PDF file(s) in a separate subdirectory and then use a robots.txt file indicating that any content in this directory should not be indexed. Would this be the best approach?

This is exactly the way to handle it: exclude the PDF files via robots.txt. It doesn't matter whether you do this with a subdirectory exclusion or simply exclude all files with a .pdf extension, as long as they're excluded.
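For the extension-based alternative, the usual robots.txt form is `Disallow: /*.pdf$`: Google treats `*` as "any sequence of characters" and a trailing `$` as "end of URL", and RFC 9309 standardises the same two wildcards. Python's `urllib.robotparser` does plain prefix matching and does not interpret these wildcards, so the sketch below uses two small hypothetical helpers (`robots_pattern_to_regex` and `is_disallowed`, not part of any library) to show how such a pattern matches URL paths. It is deliberately simplified: it checks Disallow patterns only and skips the Allow and longest-match precedence rules a real crawler applies.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: translate a robots.txt path pattern into a regex.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the URL; everything else is matched literally, as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: list[str]) -> bool:
    """Hypothetical helper: True if any Disallow pattern matches the path.

    re.match anchors at the start of the string, giving prefix semantics.
    Allow lines and longest-match precedence are not modelled.
    """
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


if __name__ == "__main__":
    # Mirrors a robots.txt containing:
    #   User-agent: *
    #   Disallow: /*.pdf$
    rules = ["/*.pdf$"]
    for path in ("/docs/spec.pdf", "/docs/spec.html", "/docs/spec.pdf?dl=1"):
        print(path, "->", "blocked" if is_disallowed(path, rules) else "allowed")
```

The last case shows a side effect of the `$` anchor: a PDF URL carrying a query string is not matched, so `Disallow: /*.pdf` (without `$`) is the broader choice if such URLs exist on the site.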
May 28 2008, 05:46 PM
Post #3
HR 6 · Group: Active Members · Posts: 798 · Joined: 16-September 03 · From: Cornwall · Member No.: 824
I have a client's site that is for information dissemination: data specifications etc.
All data and spec pages are duplicated as PDFs, with a download link to them from the HTML page. It's been like that for some years now. No special measures are taken, and the HTML pages rank high in the SERPs. Interestingly, though, I get a lot of researchers arriving from Google after using advanced search to look for PDFs.
May 30 2008, 02:20 AM
Post #4
HR 6 · Group: Active Members · Posts: 848 · Joined: 21-November 05 · From: Ogmore-by-Sea, Wales, UK · Member No.: 9,487
Good point, Piskie. I was thinking about the PDF duplicate content "issue" the other day and couldn't find a convincing reason to block them. What does it matter whether people find your PDF or your HTML page in the results? They both have their advantages and disadvantages.
|
|
|
|