How To Check Pdf For Indexability?
Posted 19 October 2009 - 12:24 PM
1) It seems to me that if I can highlight text character by character, that's a good thing.
2) But shouldn't I also be able to grep for a word ("market", for example), or search for it in a text editor such as vi, and find it if the word really is on the page and can be highlighted?
Is there a standard techy way to check pdf documents for indexability?
BTW - I searched the forum for pdf first, but was surprised to get zero results.
Posted 19 October 2009 - 04:00 PM
However, trying to grep, vi or nano search for words within the PDFs isn't going to get you very far. The issue is that PDFs are effectively binary files in which the text is encoded into PDF format. So the text that shows up when you view one of these files with Acrobat Reader is simply not going to be there for grep, vi or nano to see.
Try opening one of those PDF files with vi and you'll see what I mean. Other than a few header lines where human-readable text is available, everything else is going to look like gobbledygook.
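To make that concrete, here is a rough Python sketch rather than a definitive tool. It assumes the page text sits in a Flate-compressed (zlib) content stream, which is common but not universal. The helper names and the tiny hand-built `fake_pdf` are made up for illustration, and real PDFs add further wrinkles (font encodings, other filters) that this ignores.

```python
import re
import zlib

def raw_contains(pdf_bytes: bytes, word: bytes) -> bool:
    """Roughly what grep does: look for the word in the raw file bytes."""
    return word in pdf_bytes

def inflated_streams(pdf_bytes: bytes):
    """Yield the decompressed body of every stream that zlib can inflate."""
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        try:
            # decompressobj tolerates the line break left before "endstream"
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib data (an image, another filter, etc.)

def streams_contain(pdf_bytes: bytes, word: bytes) -> bool:
    """Search for the word inside the inflated streams instead of the raw bytes."""
    return any(word in body for body in inflated_streams(pdf_bytes))

# Hypothetical stand-in for a real file: one compressed content stream
# that draws the text "market research" on the page.
content = b"BT /F1 12 Tf 72 720 Td (market research) Tj ET"
compressed = zlib.compress(content)
fake_pdf = (
    b"%PDF-1.4\n1 0 obj\n<< /Length " + str(len(compressed)).encode()
    + b" /Filter /FlateDecode >>\nstream\n" + compressed
    + b"\nendstream\nendobj\n%%EOF\n"
)

print("raw search finds it:     ", raw_contains(fake_pdf, b"market"))
print("inflated search finds it:", streams_contain(fake_pdf, b"market"))
```

If the inflated search finds the word while the raw search does not, the text is there but hidden from grep by the compression. If neither finds it, the text may be an image or may use an encoding this sketch does not handle.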
Posted 19 October 2009 - 05:30 PM
We know that PDFs may be a combo of text and imagery. Is the vertical text that is highlightable, letter by letter, likely to be the only indexable text, since the angled, curved stuff was probably created in an image editing utility and is graphic in nature?
What do you mean by 65 character titles? Relevant to what?
Posted 20 October 2009 - 08:58 AM
Posted 10 November 2009 - 10:08 AM
That being said, I assume that if you set up both a web page and a pdf with similar/identical content that the web page would win, hands down.
So the question here is: "How much findability value is there in adding your stock PDF product info pages to your website, as opposed to spending the time/$$ to convert them to text/HTML?"
Posted 10 November 2009 - 01:23 PM
FTR, there is a property in PDF files that seems to basically function the same as a <title> tag in an HTML page. It's been quite a while since I last tested this, so I would need to go back and re-review all of the various files and make sure things haven't changed. (The test covered PDF security settings, Read Only vs. Editable text in the PDF, and the various PDF Properties.)
When I last checked the parts that got indexed and ended up being searchable in Google were:
- Text content in the PDF. It didn't appear to matter if it was Read Only text or was set up in a form so as to be editable.
- The Title in the PDF file properties. In Acrobat Pro those are accessible via File > Document Properties
None of the other things available in the PDF Properties area held any sway with Google, so presumably were not indexed. Those include the Author, Subject and yes the Keywords properties.
If you have the paid Pro version of Acrobat available to you, or something else that will allow you to see the Properties, I still have the test page up on my personal site that links to the 16 different PDF versions I tested at the time, 3+ years after the fact.
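If you don't have Acrobat Pro handy, a quick-and-dirty way to peek at the Title property is to pull it out of the file with a few lines of Python. Treat this as a sketch under assumptions: it only works when the document info dictionary is stored uncompressed and the title is a plain parenthesised string, with no hex strings, UTF-16 titles or nested unescaped parentheses. `pdf_title` and the sample bytes are invented for illustration.

```python
import re

def pdf_title(pdf_bytes: bytes):
    """Return the first /Title (...) literal found in the raw bytes, or None."""
    match = re.search(rb"/Title\s*\(((?:\\.|[^\\()])*)\)", pdf_bytes, re.DOTALL)
    if match is None:
        return None
    # Undo the simplest escapes: \( \) and \\
    raw = re.sub(rb"\\([()\\])", rb"\1", match.group(1))
    return raw.decode("latin-1")

# Hypothetical info dictionary, as it might appear in an uncompressed PDF
sample = b"5 0 obj\n<< /Title (Widget Spec \\(rev 2\\)) /Author (Someone) >>\nendobj\n"

print(pdf_title(sample))
print(pdf_title(b"5 0 obj\n<< /Author (Someone) >>\nendobj\n"))
```

For a real file you would read it with `open(path, "rb").read()` and pass the bytes in. A `None` result only means this simple pattern didn't match, not that the PDF has no title.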
Posted 11 November 2009 - 05:29 AM
Posted 11 November 2009 - 09:07 AM
I don't recall publishing a table of what did and didn't get indexed with each version. I might have, but if I did, that's lost to my fading memory. Of course, since the test PDFs are still up there, it should be fairly simple to look at each test PDF again and see what's being done with it years after the fact. Just gotta find a few hours to actually look at the specifics.