PDFs Don't Always Show Up Depending on Phrases Searched
Posted 19 February 2008 - 01:20 PM
What's odd is that, depending on the search phrase I use to bring up the PDF, the PDF may or may not appear in the SERPs.
When I grab a text snippet from the first page of the PDF, the PDF appears in the SERPs. When I grab a snippet from other pages in the PDF, the PDF doesn't appear in the SERPs.
These PDFs are short (about 8 pages), and the snippets are all straight text, not graphics. The text snippets also appear when I select "View in HTML" in the SERPs.
Why would this be happening? If someone searched for a keyword that appears on pages 2-8 but not on page 1, wouldn't they have no chance of finding the PDF?
I haven't noticed this problem before.
Posted 20 February 2008 - 03:17 AM
Posted 20 February 2008 - 08:23 AM
Jill: I've never seen this before, and I often check for PDFs by searching on a snippet. At first I couldn't believe my client's older PDFs weren't indexed, so I double-checked with a site: search. And sure'nuf, they were listed; I just couldn't find them with snippets taken from the latter part of the PDF.
Posted 20 February 2008 - 11:33 AM
Since PDFs are usually considerably larger than plain text files, HTML files and even .doc files, the first thing I'd look at is the file size of those PDFs. It's at least possible that Google et al. only go so far into a file because of its size.
Posted 20 February 2008 - 02:03 PM
Nice try, but that's not it. I took one PDF (file size 195 KB) and searched for various strings of text from its 4 pages. Sometimes Google showed no results; sometimes it brought up the PDF. I couldn't find any pattern, even when I varied the length of the text I searched for.
Then I went over to Yahoo and MSN. Here's what's really weird: MSN brought up a PDF, but it was from the client's staging server, not the live site.
I then tried one of our own PDFs (much larger file size) and saw the same thing.
Anybody else seeing this same issue?
Posted 21 February 2008 - 11:12 AM
For example, we see that what has been counted as a download in the server log files may actually be only part of the PDF. Perhaps indexing by a search engine also depends on whether the server considers the download complete? In my experience the issue is independent of file size.
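One way to check this against your own logs is to tally how PDF requests were answered. This is a rough sketch that assumes Apache-style Common Log Format lines and a hand-built mapping of PDF paths to their real file sizes (`file_sizes`, hypothetical input); `pdf_download_summary` is an illustrative name, not an existing tool. A `206 Partial Content` status generally means a byte-range request (Acrobat's page-at-a-time byte serving can produce these), which is one way a logged "download" can cover only part of the PDF.

```python
import re
from collections import Counter

# Assumption: Common Log Format, i.e. host ident user [time] "request" status bytes.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def pdf_download_summary(lines, file_sizes):
    """Classify each PDF request as full, partial (206), short 200, or other.

    file_sizes: hypothetical dict mapping request path -> PDF size in bytes.
    """
    summary = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not a CLF line; skip it
        path = m.group("path").split("?", 1)[0]  # drop any query string
        if not path.lower().endswith(".pdf"):
            continue  # only PDF requests are of interest
        status = m.group("status")
        sent = 0 if m.group("size") == "-" else int(m.group("size"))
        expected = file_sizes.get(path)
        if status == "206":
            summary["partial_206"] += 1  # byte-range request: part of the file
        elif status == "200" and expected is not None and sent < expected:
            summary["short_200"] += 1  # 200 OK but fewer bytes than the file holds
        elif status == "200":
            summary["full"] += 1
        else:
            summary["other"] += 1
    return summary
```

Comparing the `partial_206` and `short_200` counts for a crawler's requests against `full` would at least show whether the search engine ever fetched the whole file.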
That would at least explain why the filetype: search is successful but that some search terms do not find the PDF.
Posted 23 February 2008 - 07:39 AM
Tim: I'm not sure that explains what I'm seeing, because the staging site is on the same server as the live site, and a search for the exact phrase brought up the PDF on the staging server but not on the live site (at least in MSN).
Posted 25 February 2008 - 04:34 AM
I've been doing a little research and haven't yet come up with anything conclusive, but perhaps the following may lead to something useful:
According to Duff Johnson in this article, www.acrobatusers.com/articles/2006/02/pdf_for_google/pdf_for_google.php, Google does not index every word in a PDF.
Regarding more text appearing when you choose to view the item as HTML: isn't Google converting the PDF to HTML on the fly, so one would expect to see the entire document's content?
Another article I've found so far is below, but again I haven't had a chance to read it thoroughly.
Hope this helps!
Edited by chrishirst, 25 February 2008 - 05:33 AM.
Posted 03 March 2008 - 02:01 PM
Google does not index every word in a PDF.
The article at www.acrobatusers.com/articles/2006/02/pdf_for_google/pdf_for_google.php says Google has a size limit for indexing text in PDFs; no one knows the exact figure, but it is estimated to be somewhere between 100 KB and 500 KB.
It does not surprise me that Google doesn't automatically index every line of text in a PDF file. PDF files can be very large, potentially hundreds of pages for things like user manuals, and Google would be foolish not to put some kind of limit on what it indexes. Web pages are usually under 1,000 words, yet Google has an index limit on HTML files as well.
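If you want to test the size-limit theory against your own PDFs, a rough approach is to extract the text (for example with a tool such as pdftotext) and see where each search phrase falls relative to the article's 100 KB and 500 KB estimates. This is only a sketch under assumptions: it treats the limit as applying to the extracted text rather than the raw PDF file, counts 1 KB as 1,024 bytes, and `classify_phrases` is a hypothetical helper, not anything Google publishes.

```python
def utf8_offset(text, phrase):
    """Byte offset (UTF-8) where phrase first occurs in text, or None if absent."""
    idx = text.find(phrase)
    if idx == -1:
        return None
    return len(text[:idx].encode("utf-8"))


def classify_phrases(text, phrases, low_kb=100, high_kb=500):
    """Label each phrase by where it sits relative to an assumed index cutoff.

    Assumption: the cutoff (estimated at 100-500 KB) applies to extracted text.
    """
    low, high = low_kb * 1024, high_kb * 1024
    result = {}
    for phrase in phrases:
        start = utf8_offset(text, phrase)
        if start is None:
            result[phrase] = "not in extracted text"
        elif start + len(phrase.encode("utf-8")) <= low:
            # Phrase ends before even the low estimate, so a cutoff shouldn't hide it.
            result[phrase] = "inside even the low estimate"
        elif start < high:
            result[phrase] = "between the estimates"
        else:
            result[phrase] = "beyond the high estimate"
    return result
```

Phrases that fail to show up in the SERPs yet land "inside even the low estimate" would argue against a simple size cutoff.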
Rosemary has done some excellent sleuthing on this subject. But then, I'm biased. <G>