
Pdf And Html Content Virtually The Same -




4 replies to this topic

#1 ttw

ttw

    HR 5

  • Active Members
  • 379 posts
  • Location:San Mateo, California

Posted 25 January 2013 - 04:31 PM

Hi all:

 

I have a client that wants to take some of their better PDF content and, in addition to making it available as a PDF, also show virtually the same content (70% the same) on an HTML page.

 

I can't do a canonical on these PDF/HTML pages - should I worry about this?

 

Thanks

 

Rosemary



#2 SelfMade

SelfMade

    HR 5

  • Active Members
  • 295 posts

Posted 25 January 2013 - 05:53 PM

I don't even believe dupe content is so bad.

 

If that was the case, half the G serps would be deindexed.

 

I once ranked a site with NOTHING but affiliate tools content (articles, blog posts, etc.) that hundreds if not thousands of other affiliates used as well. I literally cut and pasted it into my affiliate site, changed NOTHING, and it ranked.

 

That site got slapped in the last update.

 

But it was strong for ages.



#3 qwerty

qwerty

    HR 10

  • Moderator
  • 8,628 posts
  • Location:Somerville, MA

Posted 25 January 2013 - 07:02 PM   Best Answer

You actually can tell Google that the PDF file isn't canonical. Have a look at this Webmaster Central post on supporting rel-canonical HTTP headers.

 

I can't tell you how to set the server to return that in the header response when the PDF is requested, however. Maybe it's done by editing .htaccess, or it's somewhere in the PDF file's properties. Someone here is bound to know the trick...
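For reference, the header in question is an HTTP `Link` header whose value points at the HTML page and carries `rel="canonical"`. Below is a minimal Python sketch of building that header for a PDF response. The paths, the `example.com` URLs, and the helper names are all hypothetical, and the thread never says what server the client runs, so treat this as an illustration of the header format rather than the actual setup. On Apache, a `Header` directive from mod_headers inside a `<Files>` block in .htaccess is one commonly described way to send the same header, but that assumes an Apache server with mod_headers enabled.

```python
# Hypothetical mapping from PDF paths to the HTML pages that should be
# treated as canonical. Neither the paths nor the URLs come from the thread.
CANONICAL_HTML = {
    "/downloads/white-paper.pdf": "https://www.example.com/white-paper.html",
}


def canonical_link_header(html_url):
    """Return the (name, value) pair for a rel="canonical" HTTP header."""
    return ("Link", f'<{html_url}>; rel="canonical"')


def extra_headers_for(path):
    """Headers to add to the response for `path`; empty for unmapped paths."""
    html_url = CANONICAL_HTML.get(path)
    return [canonical_link_header(html_url)] if html_url else []


if __name__ == "__main__":
    # Show the header line that would accompany the PDF response.
    for name, value in extra_headers_for("/downloads/white-paper.pdf"):
        print(f"{name}: {value}")
```

Whatever serves the PDF would need to attach this header to the PDF response itself, since a PDF has no `<head>` section to hold a link element.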



#4 piskie

piskie

    HR 7

  • Active Members
  • 1,098 posts
  • Location:Cornwall

Posted 25 January 2013 - 08:03 PM

I have a client who presents hundreds of HTML documents containing Product Data, User Guidelines, Health and Safety Info, etc. For the last 10 years, these have been 100% replicated as PDFs.

No duplication avoidance action whatsoever has ever been taken.

 

Both versions sometimes appear consecutively in the SERPs, although the HTML pages predominate, with the PDF version usually returned about 10 places lower.

 

Always been that way for more than a decade now.



#5 Jill

Jill

    Recovering SEO

  • Admin
  • 33,006 posts

Posted 26 January 2013 - 09:01 AM

I wouldn't worry about it. They are different types of content. However, if you didn't want the PDFs indexed, just put them in one folder and block that folder in robots.txt.
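As a rough illustration of the robots.txt approach, here is a small Python sketch that uses the standard library's `urllib.robotparser` to check a rule set before it goes live. The `/pdfs/` folder name and the `example.com` URLs are assumptions made for the example, not details from the thread, and the check only reflects how this parser reads the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of a single PDF folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Compare a PDF inside the blocked folder with an HTML page outside it.
    for url in (
        "https://www.example.com/pdfs/guide.pdf",
        "https://www.example.com/guide.html",
    ):
        print(url, is_crawlable(ROBOTS_TXT, url))
```

Keeping all the PDFs in one folder means a single `Disallow` line covers them, which is the point of the suggestion above.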





