First - Thanks Dan! This is great stuff.
As Dan says, this changes the way we should be looking at keyword density.
I think a very important bit of information is this:
"Thus, terms which appear in too many documents (e.g., stopwords, very frequent terms) receive a low weight, while uncommon terms which appear in few documents and receive a high weight."
Heavy.
Now, my site sells t-shirts and I have "t-shirt" on every page in every document more often than the word "the". We're probably killing ourselves for that search term but it is definitely good for the real visitors.
Hmmm. Can I offer hundreds of variations of "100% Cotton Superman Knit Garment with a Tubular Body and Short Sleeves resembling the letter 'T'" instead of just "100% Cotton Superman T-Shirts"? I'd reserve the word "T-Shirts" for only a few popular items. What does this do to the visitor experience?
I'm not going to drop the word's use altogether, but we will definitely be reducing its use.
Obviously, this is much more important for keyword phrases.
-----------------------------------
OK. This is a great formula, but it must be for competitive intelligence, right? If I run this formula on my site I will get a Term Weight, but it is not terribly useful unless I have a target Term Weight. I assume this target Term Weight is one that comes from running the formula on a competitor's site that comes up #1 - Right?
Being able to run this formula on a competitor's site that is ranking well would be very valuable knowledge. So I have a few questions.
1. Can you tell me what Dr. Garcia means by "documents in a database"? Since we are talking about HTML pages, would this be documents within a directory?
2. How can I tell how many documents someone else has in their "database"?
3. How can I search only those documents to get the number of occurrences of a keyword phrase in each document?
- Or... is there a universal good Term Weight? That would save us all a lot of time.
At first, this looks complicated, but on second look, it's not so tough as long as you have a calculator that handles logs.
However, I have still not come up with a solid number.
So, Term Weight should equal:
w = tf * log(D / df)
Or...
Term Weight = number of times a term occurs in a document multiplied by: the Log of: Number of documents in the database divided by Number of documents containing the term.
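For anyone who would rather not do this on a calculator, here is a rough Python sketch of the formula as I read it. The function names, the sample pages, and the choice of base-10 log are my own assumptions, not something taken from Dr. Garcia's article, and the "database" here is just a list of page texts.

```python
import math

def term_weight(tf, total_docs, docs_with_term):
    """Term Weight = tf * log(total_docs / docs_with_term).

    Base-10 log is an assumption; a different base only rescales the numbers.
    """
    if docs_with_term == 0:
        return 0.0  # the term appears nowhere, so there is nothing to weight
    return tf * math.log10(total_docs / docs_with_term)

def weights_for_phrase(pages, phrase):
    """Return one Term Weight per page for a phrase (simple substring count)."""
    phrase = phrase.lower()
    counts = [page.lower().count(phrase) for page in pages]
    docs_with_term = sum(1 for c in counts if c > 0)
    return [term_weight(c, len(pages), docs_with_term) for c in counts]

# Made-up pages, for illustration only.
pages = [
    "100% cotton superman t-shirt, a classic t-shirt",
    "batman t-shirt in black",
    "superman hoodie with cape",
    "plain t-shirt",
]

print(weights_for_phrase(pages, "t-shirt"))
print(weights_for_phrase(pages, "superman"))
```

Note what happens when a term is on every page: the ratio is 1, the log of 1 is 0, and the weight is 0 no matter how many times the term is repeated. That is exactly the "t-shirt on every page" situation described above.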
With answers to these questions, our copywriting could be very effective.
Dan, anyone? I'm interested in finding out if I am on base or not.
Here is the original link Dan submitted:
http://www.miislita.com/term-vector/term-vector-1.html
Thanks.











