How To Optimize Pdfs
Posted 26 January 2005 - 08:06 PM
Thanks! I hope the input is valuable, and look forward to contributing where I can here. Love your board and there's lots of good stuff here.
Have an awesome evening,
Mark Alan Effinger (aka ME)
Posted 12 July 2006 - 02:53 PM
Just wanted to say thanks for the tips and sharing knowledge!
Posted 12 July 2006 - 03:07 PM
I'm just amazed you were able to find it!
Posted 12 July 2006 - 03:24 PM
I just noticed that we've since moved the directory for the demo PDF I provided, and hadn't updated this forum post.
Here's the new directory, along with the PDF that I've actually upgraded with 2 additional links:
The text at the top of the page that says "Internet marketing" now has a link (an important keyword for this industry, even if it's pretty competitive), and...
The "How much does it cost" headline near the bottom now has a link as well.
I linked to the About Us page and the Articles pages because they provide the most specific information in regards to this specific site, not the home page (which is built for people who are ready to get going on an Affiliate program).
Here's the link to the Affiliate Marketing PDF: richcontent.com/prweb/share_results_overview.pdf. Now, go forth and perform your amazing Acrobatics, and have an awesome week!
Edit to remove live link per Forum Rules (http://www.highrankings.com/forum/index.php?act=boardrules)
Edited by torka, 12 July 2006 - 03:51 PM.
Posted 14 July 2006 - 02:19 PM
A third party will be providing PDF documents for use on a client website. The third party requires that the PDF files be saved as Version 5 (core audience compatibility requirement) and with security set at 40-bit encryption: "Encrypt all document contents" (company policy). With those requirements set, the PDF software reports the following warning message: "All contents of the document will be encrypted and search engines will not be able to access the document's metadata".
With that said, we have confirmed that the search engines are still able to index and use the PDF title with these settings in place. We're making the assumption that the title may not be considered part of the metadata and is exempt from the warning message (any thoughts?). The third party has not completed any other metadata fields except for author. Subject, keyword, description, etc. have all been left blank.
Does it make sense to complete all of the metadata of a Version 5 PDF with 40-bit encryption enabled or will it be inaccessible and not worth the effort as reported by the warning message?
Any insight would be greatly appreciated. Thanks.
Posted 14 July 2006 - 03:00 PM
ME: My experience is it's not worth the effort.
Frankly, it seems counterproductive.
Here's an idea: You could also place the protected documents on PRWebPhotowire.com, and add the metadata as a wrapper around the PDF. In the description of the attachment on PRWebPhotowire, place a link back to the page where the PDF resides (or to the actual PDF). Use PRWeb's link format, link.com [anchor text].
Now you have rich meta, social tagging and a high-traffic, high visibility platform to push your otherwise invisible content into the webspace.
Make sense? I'm not exactly sure that you're looking to get the message out, but it would seem that if you can't work with the PDF's internal meta, you can certainly express it using an external wrapper.
I hope that's useful info.
Edited by Randy, 14 July 2006 - 03:56 PM.
Posted 14 July 2006 - 04:01 PM
It's an interesting question and one I don't know the answer to.
Since my previous test was long ago and far away, and since I was working with Live documents and didn't want to mess with them too much, I've decided to run another little test. This time I'll do it strictly with test documents and test several things: Editable and Read Only text fields, each of the normal meta data fields and also some various security settings.
I'm off to construct the test pdf's and upload 'em to my personal site. Somebody remind me to check back in a week or so to see if they've been spidered yet by the various engines.
<edit to add>
FTR, I won't be able to test with Acrobat Version 5. The farthest I can go back is 6. But I'm reasonably sure if the engines can read the meta information from Ver. 6 they should be able to read Ver. 5 as well. To my knowledge Adobe hasn't changed much if anything in the meta information construct since Version 4.
Posted 14 July 2006 - 04:18 PM
And, many thanks Randy for the welcome and for running some tests. I'll be very curious to see the results.
Posted 21 July 2006 - 08:10 AM
Posted 21 July 2006 - 09:14 AM
I also just realized that I don't have a series of test pages that have a Master password set, but no User-level password. I've got some with a User password only and some with both Master and User, but I forgot the Master only. I'll whip up some of those and get them uploaded this morning.
Very early results are:
It appears that MSN is still not indexing or doing anything with the META information. They're pulling stuff right from the page if the text is in a certain format. This doesn't surprise me much since it's something I've seen before. I need to devise a test to sort out exactly what kind of text they'll index in a pdf file, because it's something different than creating a text field via Acrobat. I think I'll try making some test files in Word, then converting those to PDF format to see what happens.
Google is only grabbing the Title info from the META fields. They're skipping Subject, Author and Keywords entirely by the look of things. The Security levels I've set seem to make no difference as long as a password isn't involved. Whether it's standard 40-bit or 128-bit encryption, they're still getting to the data. On the text side of things Google is pulling and indexing both Read Only and Editable text fields.
The details in the Security settings (eg No Print, No Copy, etc) appear to make no difference.
The only files that Google has had an issue with so far are those that require a User-level password. Apparently this setting blocks them from even being able to grab the META Title information. Still too early to say for sure since they've only fully indexed one password protected file, but it's showing up in the SERPs with a title of "Untitled" and no snippet. So setting a User password looks like it may lead to indexing issues, as one might expect. Though I sort of hoped they'd at least be able to grab the META info.
More to follow as I know it. Since there are already 16 test files involved and we'll probably end up with 25-30 I'll probably do a complete write up on my personal site for those interested in the minutiae and just post a summary with a link here.
Edited by Randy, 21 July 2006 - 12:17 PM.
Posted 21 July 2006 - 10:05 AM
Posted 06 August 2006 - 08:44 AM
General findings first.
1. Setting a password of any type looks to be a very bad thing to do if you want anything at all to be indexed. Most of the password protected files either never showed up in any search engine index, or were picked up showing no information other than the URL address. At this point all of my password protected test files have been dropped from the indexes, save one that is still in Google's index.
When you password protect a pdf file not even the Meta information will be picked up.
This same treatment has been seen with user passwords, master passwords or both.
2. Other than Password Protection, none of the other security settings one may choose appears to make any difference. Nor does the encryption level. I tested both Acrobat's standard 40-bit and 128-bit encryption with all sorts of different restrictions, ranging from nothing being allowed to only printing and/or copying being disallowed. These individual settings appear to make no difference. All that matters is whether the file is password protected.
3. MSN is having issues indexing anything in these Acrobat-created pdfs. They're including the files in their index, but the listings only show the URL address. No meta information is picked up or searchable. Nor is any text field, whether the text field is set to be Read Only or Editable.
Basically, MSN is spidering and picking up the files. They know they're there, but they are seemingly unable to do anything with them. Bad news there. We'll have to see if they pick up the Word created versions. My hunch is they will.
4. a. Google always uses the Meta Title value as the snippet title, if you provide one. Much like the <title> tag in html.
b. Google also is able to index the contents of Acrobat text fields. It doesn't matter if they're set to be Read Only or may be edited. Both are searchable.
c. Google ignores all other Meta fields where search is concerned. Only the Meta Title is used. The Meta Keywords are not indexed, which I mention specifically because of something you'll see below.
5. a. Yahoo is very similar in indexing Acrobat-created text fields. Both Read Only and Editable fields are indexed, appear in the SERP snippet and may be searched.
b. Yahoo apparently doesn't do anything with the Meta Title. You can't search for content in the Meta Title, nor does it appear as the title in the SERP snippet. Surprising one that!
Instead, Yahoo! is making a line of Editable text the SERP Title by default. Skipping completely over the Meta information and even a Read Only text field that appeared first in the test pdf documents. Strange reaction indeed!
c. The only Meta information Yahoo! appears to pay attention to is the Meta Keywords in the test pdf's. They don't show up in the snippet or anything like that, but you can find the document by what's in the Meta Keywords field.
As a summary of the first test findings for Acrobat-created pdfs:
- Don't use passwords if you want a pdf file to be indexable and searchable. This applies across all of the big three engines.
- Don't worry much about any other security settings you may choose. They appear to have no effect one way or the other.
- Text fields are picked up by both Google and Yahoo, and the content is searchable.
- MSN has some serious issues with Acrobat-created pdfs. They've been unable to index anything in the test files.
- Google will pick up the Meta Title field and use its content as the SERP Title if one exists. This is the only Meta field Google seems to index.
- Yahoo is a strange bird! It's using an Editable Text field as the SERP Title, to the exclusion of everything else. They don't appear to index the Meta Title information at all. They do however index the Meta Keywords field, and this content is searchable.
- In the test files that had anything indexed Google picked up the Meta Title and all forms of text fields. Yahoo picked up the Meta Keywords and all forms of text fields.
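For anyone who wants the summary above in a form they can script against, here is a minimal sketch that restates it as a lookup table. The names FINDINGS and searchable_parts are made up for illustration, and the table only reflects what these test files showed for Acrobat-created pdfs. It is not a model of how any engine actually works, and results for other files may differ.

```python
# Summary of the test findings, restated as data.
# Assumption: the keys and function name are hypothetical labels for this
# sketch; the True/False values come only from the summary list above.
FINDINGS = {
    "google": {"meta_title": True, "meta_keywords": False, "text_fields": True},
    "yahoo": {"meta_title": False, "meta_keywords": True, "text_fields": True},
    "msn": {"meta_title": False, "meta_keywords": False, "text_fields": False},
}


def searchable_parts(engine, password_protected=False):
    """Return the parts of a test pdf that were found to be searchable."""
    # In these tests, a password of any type blocked everything.
    if password_protected:
        return []
    parts = FINDINGS[engine.lower()]
    return sorted(name for name, indexed in parts.items() if indexed)


if __name__ == "__main__":
    for name in FINDINGS:
        print(name, searchable_parts(name))
```

Calling searchable_parts("yahoo") returns the Meta Keywords and text fields entries, while any engine with password_protected=True returns an empty list.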
Edited by Randy, 06 August 2006 - 08:51 AM.
Posted 07 August 2006 - 06:43 PM
This is very valuable information and confirms similar results I've seen with some of my client's PDFs. The two differences between your test files and those of my client are version (6 vs 5) and file origin (Acrobat vs Freehand).
My client is exporting as version 5 and using password security, 40-bit encryption. Settings do not allow the various changing or extracting options.
The "document properties" security window warns that search engines "cannot access the document's metadata" due to these settings, yet the files are being indexed in Google using the meta title. Yahoo! and MSN results are similar to yours and are using different text snippets from different parts of the file. My client did not complete the meta keywords field so I don't have a comparison for a Yahoo! search.
It will be interesting to see the results of your files originating from Word. As a note, my client's files are heavily formatted and use table layouts.
Posted 08 August 2006 - 06:05 AM
On my site I have pdf files which are password protected, yet they are still listed in all the search engines. Google also lists the title and an appropriate description taken from the content of the PDF files.
When I checked File > Document Settings, it showed pdf version 1.2 (Acrobat 3.x).
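If you want to double-check the version and whether a file is really encrypted without opening Acrobat, a rough sketch like the one below can help. It relies on two things that are part of the PDF format: the file starts with a %PDF-x.y header, and an encrypted file carries an /Encrypt entry in its trailer. The function names are made up for this example, and the byte scan is only a heuristic (it does not parse the file, so it can be fooled by unusual files).

```python
import re

# PDF versions and the Acrobat releases that introduced them.
ACROBAT_BY_PDF_VERSION = {
    "1.2": "Acrobat 3.x",
    "1.3": "Acrobat 4.x",
    "1.4": "Acrobat 5.x",
    "1.5": "Acrobat 6.x",
}


def pdf_header_version(data):
    """Return the version from the %PDF-x.y header, or None if not found."""
    # The header is expected near the very start of the file.
    match = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    return match.group(1).decode("ascii") if match else None


def looks_encrypted(data):
    """Heuristic: True if the raw bytes contain an /Encrypt key."""
    # Assumption: a plain byte scan is good enough for a quick check.
    return re.search(rb"/Encrypt(?![A-Za-z])", data) is not None


def describe_pdf(data):
    """Summarize version, matching Acrobat release and encryption flag."""
    version = pdf_header_version(data)
    return {
        "version": version,
        "acrobat": ACROBAT_BY_PDF_VERSION.get(version, "unknown"),
        "encrypted": looks_encrypted(data),
    }
```

To use it on a real file, read the bytes with open(path, "rb").read() and pass them to describe_pdf. Note that this only tells you encryption is present, not whether a User or Master password was set.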