PDFs Don't Always Show Up Depending On Phrases Searched


8 replies to this topic

#1 ttw

    HR 5

  • Active Members
  • 357 posts
  • Location:San Mateo, California

Posted 19 February 2008 - 01:20 PM

I was checking to make sure that all my client's PDFs were appearing in Google's index and noticed something odd. Luckily, all the PDFs appear when I do a "site:yoursite.com filetype:pdf" search.

What's odd is that sometimes - depending on the search phrase that I use to bring up the PDF - the PDF may or may not appear in the SERPs.

When I grab a text snippet from the first page of the PDF, the PDF appears in the SERPs. When I grab a snippet from other pages in the PDF, the PDF file doesn't appear in the SERPs.

These PDFs are short (about 8 pages), and the snippets are all straight text, not graphics. The text snippets also appear when I select "View in HTML" in the SERPs.
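The page-by-page snippet test described above can be scripted so every page gets checked the same way. A minimal sketch in Python, assuming you have already extracted each page's plain text with a PDF tool of your choice (extraction isn't shown). The function name `snippet_queries` and the domain `example.com` are hypothetical placeholders, not part of any search engine's API:

```python
import re


def snippet_queries(pages, site, words_per_snippet=8):
    """Build one exact-phrase query per PDF page (hypothetical helper).

    pages: list of plain-text strings, one per page (extraction not shown).
    site:  domain to restrict results to; "example.com" below is a placeholder.
    """
    queries = []
    for number, text in enumerate(pages, start=1):
        # Drop double quotes so they can't break the exact-phrase syntax.
        words = re.findall(r"\S+", text.replace('"', ""))
        if not words:
            continue  # blank or image-only page: nothing to search for
        snippet = " ".join(words[:words_per_snippet])
        queries.append((number, f'"{snippet}" site:{site}'))
    return queries


pages = [
    "Widget overview and pricing for the 2008 models",
    "Installation steps for the wall bracket",
]
for page, query in snippet_queries(pages, "example.com", words_per_snippet=4):
    print(page, query)
```

Each printed line pairs a page number with a query to paste into the search box, which makes it easier to spot whether misses cluster on particular pages.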

Why would this be happening? If someone searched for a keyword that appears on pages 2-8 but not on page 1, wouldn't that mean they'd have no chance of finding the PDF?

I haven't noticed this problem before.

Rosemary

#2 Jill

    High Rankings Advisor

  • Admin
  • 32,326 posts

Posted 20 February 2008 - 03:17 AM

That's an interesting observation, Rosemary. I've not personally looked into the indexing of pdfs that deeply, but are you saying that this is a new phenomenon?

#3 ttw

    HR 5

  • Active Members
  • 357 posts
  • Location:San Mateo, California

Posted 20 February 2008 - 08:23 AM

QUOTE(Jill @ Feb 20 2008, 12:17 AM)
That's an interesting observation, Rosemary. I've not personally looked into the indexing of pdfs that deeply, but are you saying that this is a new phenomenon?


Jill: I've never seen this before, and I often check for PDFs by searching for a snippet. At first I couldn't believe my client's older PDFs weren't indexed, so I double-checked with a site: search. And sure'nuf, they were listed; I just couldn't find them with snippets taken from the latter part of the PDF.

#4 Randy

    Convert Me!

  • Moderator
  • 17,540 posts

Posted 20 February 2008 - 11:33 AM

I've not looked at this at all, but just to throw it out there...

Since PDFs are usually considerably larger than plain text files, HTML files and even .doc files, the first thing I'd look at is the file size of those PDFs. It's at least possible that Google et al. only go so far into a file because of its size.
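One way to test this idea is to model an indexer that stops after a fixed number of bytes and see which pages a given cutoff would hide. A sketch in Python; it assumes, without any confirmation from Google, that such a limit applies to the extracted text rather than the raw file, measured in UTF-8 bytes. The function `indexed_fraction_per_page` and the `cutoff_bytes` value are hypothetical, and the page texts would come from whatever PDF extraction tool you use:

```python
def indexed_fraction_per_page(pages, cutoff_bytes):
    """Return (page_number, fraction_indexed) under a hypothetical byte cutoff.

    Models an indexer that reads extracted text from the start of the
    document and stops after cutoff_bytes (an unconfirmed assumption).
    """
    offset = 0
    result = []
    for number, text in enumerate(pages, start=1):
        size = len(text.encode("utf-8"))
        # Bytes of this page that fall before the cutoff (never negative).
        indexed = max(0, min(size, cutoff_bytes - offset))
        fraction = indexed / size if size else 1.0
        result.append((number, fraction))
        offset += size
    return result


print(indexed_fraction_per_page(["aaaa", "bbbb", "cccc"], cutoff_bytes=6))
# → [(1, 1.0), (2, 0.5), (3, 0.0)]
```

If snippets from pages this model marks as fully indexed still fail to show up, a simple size cutoff can't be the whole explanation.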

#5 ttw

    HR 5

  • Active Members
  • 357 posts
  • Location:San Mateo, California

Posted 20 February 2008 - 02:03 PM

QUOTE(Randy @ Feb 20 2008, 08:33 AM)
Since PDFs are usually considerably larger than plain text files, HTML files and even .doc files, the first thing I'd look at is the file size of those PDFs. It's at least possible that Google et al. only go so far into a file because of its size.


Nice try :) But that's not it. I took one PDF (file size 195 KB) and searched for various strings of text from its 4 pages. Sometimes Google would show no results; sometimes it would bring up the PDF. I couldn't find any pattern, even when I varied the length of the text I was searching for.

Then I went over to Yahoo and MSN. Here's what's really weird: MSN brought up a PDF, but it was from the client's staging server, not the live site.

I then tried one of our own PDFs (much larger file size) and saw the same thing.

Anybody else seeing this same issue?

Rosemary



#6 TimB

    HR 1

  • Members
  • 1 posts

Posted 21 February 2008 - 11:12 AM

Perhaps this is related to the way the webserver handles the serving of the PDF.

For example, we see that what has been counted as a download in the server logfiles may actually be only part of the PDF. Perhaps the indexing by an SE is also dependent on whether the server decides the download is complete? In my experience the issue is independent of filesize.

That would at least explain why the filetype: search is successful but that some search terms do not find the PDF.
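A quick way to check this theory is to look in the access logs for crawler requests for PDFs that were served only in part. A rough Python sketch, assuming Apache's "combined" log format and a `full_sizes` mapping of URL paths to on-disk file sizes that you would build yourself (both the mapping and the function name are hypothetical). A 206 status (partial content), or a 200 response that sent fewer bytes than the file holds, gets flagged. Matching "Googlebot" in the user-agent string is also an assumption, and that string can be spoofed:

```python
import re

# Matches the request, status, byte-count and user-agent fields of an
# Apache "combined" log line; other log formats need a different pattern.
LOG_LINE = re.compile(
    r'"GET (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r' "[^"]*" "(?P<agent>[^"]*)"'
)


def partial_pdf_transfers(lines, full_sizes, agent_substring="Googlebot"):
    """Yield (path, status, bytes_sent) for PDF requests that look incomplete."""
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or agent_substring not in m.group("agent"):
            continue
        path = m.group("path").split("?", 1)[0]  # ignore any query string
        if not path.lower().endswith(".pdf"):
            continue
        status = int(m.group("status"))
        sent = 0 if m.group("bytes") == "-" else int(m.group("bytes"))
        expected = full_sizes.get(path)
        short_200 = status == 200 and expected is not None and sent < expected
        if status == 206 or short_200:
            yield path, status, sent
```

If the live site's PDFs show many truncated transfers and the staging copies don't, that would support the theory; if not, it points elsewhere.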

Tim

#7 ttw

    HR 5

  • Active Members
  • 357 posts
  • Location:San Mateo, California

Posted 23 February 2008 - 07:39 AM

QUOTE(TimB @ Feb 21 2008, 08:12 AM)
For example, we see that what has been counted as a download in the server logfiles may actually be only part of the PDF. Perhaps the indexing by an SE is also dependent on whether the server decides the download is complete? In my experience the issue is independent of filesize.
Tim


Tim: I'm not sure that explains the issue I'm seeing, because the staging site is on the same server as the live site, and a search for the exact phrase brought up the PDF on the staging server but not on the live site (at least in MSN).


Rosemary

#8 Archie

    HR 2

  • Members
  • 14 posts

Posted 25 February 2008 - 04:34 AM

QUOTE(ttw @ Feb 19 2008, 06:20 PM)
When I grab a text snipet from the first page of the PDF - the PDF will appear in the SERPs. When I grap a snipet from other pages in the PDF - the PDF file doesn't appear in the SERPs.


I've been doing a little research and haven't come up with anything conclusive yet, but perhaps the following may lead to something useful:

According to Duff Johnson in this article, www.acrobatusers.com/articles/2006/02/pdf_for_google/pdf_for_google.php,
Google does not index every word in a PDF.

Regarding more text appearing when choosing to view the item as HTML, is it not the case that Google is converting the pdf to the html on the fly so one would expect to see the entire document content?

Another article I've found so far is below, but again I haven't had a chance to read it thoroughly:

www.timnash.co.uk/02/2008/pdf-seo/

Hope this helps!

Edited by chrishirst, 25 February 2008 - 05:33 AM.
delinked URIs


#9 ttwblb

    HR 4

  • Active Members
  • 132 posts

Posted 03 March 2008 - 02:01 PM

QUOTE(Archie @ Feb 25 2008, 01:34 AM)
According to Duff Johnson in this article, www.acrobatusers.com/articles/2006/02/pdf_for_google/pdf_for_google.php,
Google does not index every word in a PDF.


The article at www.acrobatusers.com/articles/2006/02/pdf_for_google/pdf_for_google.php says Google has a size limit for indexing text in PDFs. No one knows the exact figure, but it is estimated to be somewhere between 100 KB and 500 KB.

It does not surprise me that Google does not automatically index every line of text in a PDF file. PDF files can be very large, potentially hundreds of pages for things like user manuals, and Google would be foolish not to put some kind of limit on what it indexes. Web pages are usually not over 1,000 words, but Google has an index limit on HTML files as well.

Rosemary has done some excellent sleuthing on this subject. But then, I'm biased. <G>



