High Rankings Search Engine Optimization Forum | High Rankings Advisor Search Marketing Newsletter

Term Weight Formula: New development on keyword density?
T Bill
post Feb 5 2005, 02:47 PM
Post #1


HR 4

Group: Active Members
Posts: 107
Joined: 7-June 04
From: Mountainair, New Mexico USA
Member No.: 3,849



If nobody minds, I'd like to go back to a post Dan Thies made about Dr. Garcia's formula. It was in a different thread, and that thread had changed course, so I've posted this as a new one.

First - Thanks Dan! This is great stuff.

As Dan says, this changes the way we should be looking at keyword density.

I think a very important bit of information is this:

"Thus, terms which appear in too many documents (e.g., stopwords, very frequent terms) receive a low weight, while uncommon terms which appear in few documents and receive a high weight."

Heavy.

Now, my site sells t-shirts, and "t-shirt" appears on every page, more often than the word "the". We're probably killing ourselves for that search term, but it is definitely good for real visitors.
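To make the quoted rule concrete, here is a minimal sketch that treats a handful of hypothetical pages as the entire "database" and computes the inverse document frequency, log(D / df), for a term. It assumes base-10 logarithms and plain whitespace tokenization; the page texts and the `idf` helper are made up for illustration. A term that appears in every document gets log(D / D) = log(1) = 0, which is the "low weight" the quote describes.

```python
import math

def idf(term, docs):
    """Inverse document frequency, log10(D / df).

    D  = number of documents in the collection (the "database")
    df = number of those documents that contain `term`
    """
    D = len(docs)
    term = term.lower()
    df = sum(1 for doc in docs if term in doc.lower().split())
    if df == 0:
        return 0.0  # term never appears, so there is nothing to weight
    return math.log10(D / df)

# Hypothetical four-page site used as the whole collection.
pages = [
    "superman t-shirt in 100% cotton",
    "batman t-shirt with long sleeves",
    "t-shirt sizing chart and the care guide",
    "the wonder woman t-shirt",
]

print(idf("t-shirt", pages))   # on every page: log10(4/4) = 0.0
print(idf("superman", pages))  # on one page:   log10(4/1), about 0.602
```

Both D and df depend entirely on which collection you count over, so the same term can score zero within one site and well above zero across a much larger index.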

Hmmm. Could I offer hundreds of variations like "100% Cotton Superman Knit Garment with a Tubular Body and Short Sleeves Resembling the Letter T" instead of just "100% Cotton Superman T-Shirts"? I'd reserve the word "T-Shirts" for only a few popular items. What does this do to the visitor experience?

I'm not going to drop the word altogether, but we will definitely be reducing its use.

Obviously, this is much more important for keyword phrases.

-----------------------------------

OK. This is a great formula, but it must be for competitive intelligence, right? If I run this formula on my site, I will get a Term Weight, but it isn't terribly useful unless I have a target Term Weight. I assume this target Term Weight comes from running the formula on the competitor's site that comes up #1. Right?

Running this formula on a competitor's site that is ranking well would give very valuable knowledge. So I have a few questions.

1. Can you tell me what Dr. Garcia means by "documents in a database"? Since we are talking about HTML pages, would this be the documents within a directory?

2. How can I tell how many documents someone else has in their "database"?

3. How can I search only those documents to get the number of occurrences of a keyword phrase in each document?

- Or... is there a universally good Term Weight? That would save us all a lot of time.
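For question 3, counting a phrase within a page whose text you already have (your own, or a saved copy of a competitor's) is straightforward. The sketch below uses a hypothetical helper, `count_phrase`, and makes no claim about how any search engine tokenizes text: it lowercases everything and counts whole-word, non-overlapping matches with Python's `re` module, so "t-shirt" does not match inside "t-shirts".

```python
import re

def count_phrase(text, phrase):
    """Count whole-word, case-insensitive, non-overlapping occurrences of
    `phrase` in `text`. Any run of whitespace between the phrase's words
    in the text still counts as a match."""
    words = phrase.lower().split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    # \b anchors the match to word boundaries at both ends of the phrase.
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return len(re.findall(pattern, text.lower()))

page = "Superman T-Shirts: our Superman t-shirts are 100% cotton. Browse all t-shirts."
print(count_phrase(page, "superman t-shirts"))  # 2
print(count_phrase(page, "t-shirts"))           # 3
print(count_phrase(page, "t-shirt"))            # 0 (no whole-word match)
```

Because `\b` only sits next to letters, digits, or underscores, a phrase that begins or ends with punctuation (such as "100%") will not match under this boundary rule.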

At first glance this looks complicated, but on second look it's not so tough, as long as you have a calculator that handles logs.

However, I have still not come up with a solid number.

So, Term Weight should equal

$$w_i = tf_i \times \log\left(\frac{D}{df_i}\right)$$

Or...

Term Weight = (number of times the term occurs in the document) × log(number of documents in the database ÷ number of documents containing the term).
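The words-only version of the formula translates directly into code. This is a minimal sketch that assumes base-10 logs (the base only rescales every weight by the same constant) and assumes you already have the three counts. `term_weight` is a hypothetical name, and the counts in the example are made up.

```python
import math

def term_weight(tf, D, df):
    """Term weight = tf * log10(D / df).

    tf -- number of times the term occurs in the document
    D  -- number of documents in the database
    df -- number of documents containing the term
    """
    if tf < 0 or D <= 0 or not (0 < df <= D):
        raise ValueError("need tf >= 0 and 0 < df <= D")
    return tf * math.log10(D / df)

# Hypothetical counts: the term occurs 5 times in a page, in a database of
# 1,000,000 documents, 1,000 of which contain the term.
print(term_weight(5, 1_000_000, 1_000))  # 5 * log10(1000) = 15.0
```

Note how the second factor behaves: if every document contains the term (df = D), the weight is zero no matter how large tf is.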

With answers to these questions, our copywriting could be very effective.

Dan, anyone? I'm interested in finding out if I am on base or not.

Here is the original link Dan submitted:
http://www.miislita.com/term-vector/term-vector-1.html

Thanks.
T Bill
post Feb 6 2005, 06:15 AM
Post #2


HR 4

Group: Active Members
Posts: 107
Joined: 7-June 04
From: Mountainair, New Mexico USA
Member No.: 3,849



There must also be a weight applied to a term that is found on every page. If it is not a general term, wouldn't that term be what the site is about? How do you sell t-shirts without having that word a number of times on every page of the site?

Am I missing something? There must be another formula as well. A site has a subject; it is about something, and whatever that something is will be a theme throughout the site. Variations on that theme should make specific pages unique within the site, and I can see those unique pages getting a higher weight for that term. But wouldn't the site also need to rank highly simply because its theme appears on so many pages? It seems to me that this is why Google will show two results from the same site for one search.

So, there is the weight of a term throughout a site, and then there is the weight of unique terms within the site. It would also seem logical to look at a term that is found throughout a site and give the page, or pages, that use a variation of the original term much more weight in searches for that variation.

So, there must be two formulas, at least.

Any ideas?
Jill
post Feb 6 2005, 10:56 AM
Post #3


High Rankings Advisor

Group: Admin
Posts: 29,201
Joined: 21-July 03
From: Ashland, MA
Member No.: 2



No ideas here. You might as well be speaking Greek as far as I'm concerned, since none of what you said has anything to do with the work I do. :)
DanThies
post Feb 6 2005, 04:24 PM
Post #4


Keyword Super Freak

Group: Moderator
Posts: 861
Joined: 23-July 03
From: Texas, y'all
Member No.: 14



I hate to point someone at another forum, but Dr. Garcia doesn't post here. He does post at the Search Engine Watch forums, as Orion. Just about everything you're asking here has already been discussed there.

You may be reading too much into this, though. The main point I was trying to get across is that "keyword density" is not part of the algorithm that retrieves search results.

I'll address a couple questions to clarify...

The number of documents in a database refers to how many documents exist in the search engine's database. Google has 8 billion documents in their database, although they may not actually have them all indexed.

The term weight refers to how important that specific term is in retrieving results for a specific query. Different queries involve different terms, so the weights differ from query to query, and there is no magic formula for writing a perfect web page.

When Jill says it has nothing to do with the work she does, she's right in one sense, because she just uses natural copywriting. On the other hand, what this stuff really means is that natural copywriting is the logical approach... so in that sense it has everything to do with the work she does.

Wow. I'm having another Zen moment; my conscious mind is breaking. What is the sound of two hands clapping?
sonnyyu
post Feb 6 2005, 05:24 PM
Post #5


HR 4

Group: Active Members
Posts: 157
Joined: 7-April 04
Member No.: 3,158



Here we go.
Jill
post Feb 6 2005, 05:35 PM
Post #6


High Rankings Advisor

Group: Admin
Posts: 29,201
Joined: 21-July 03
From: Ashland, MA
Member No.: 2



My point was that you don't really need to know that stuff to do your job. In fact, knowing it may end up doing you more harm than good, because you would be working toward the formulas instead of just creating great web pages naturally!

Jill
copywriter
post Feb 7 2005, 07:07 PM
Post #7


HR 7

Group: Moderator
Posts: 1,736
Joined: 23-July 03
From: South Carolina, USA
Member No.: 12



QUOTE
there is no magic formula for writing a perfect web page.


Agreed.

TBill, it's not as hard as you're making it. It also isn't the same for every page/industry/keyphrase.

Read this if you haven't already. It may help.
randfish
post Feb 7 2005, 07:40 PM
Post #8


Daily SEO Show Anchor

Group: Active Members
Posts: 229
Joined: 21-July 04
From: Seattle, WA
Member No.: 4,442



I have to disagree somewhat. I believe it's not only important, but critical to understand these elements and what they mean in order to be a great SEO. TF * IDF is a concept that refers to the term frequency times the inverse document frequency.

Basically, keyword density is not measured by IR systems (including search engines). Instead, search engines measure the frequency of occurrence of a particular term or phrase, normalized against the total number of terms in the document, then compare that to the number of documents in the database (index) and the number of those that contain the term. It does sound complex, but the equations are simple and very informative:

There are two I use consistently:

Classic Normalized Term Weight uses the following equation:

$$W_i = \frac{tf_{di}}{\max tf_{di}} \cdot \log\left(\frac{D}{df_i}\right)$$

Where:
tfdi = term (or phrase of a given length) frequency in document
max tfdi = maximum frequency of any phrase with the same number of words in the document
D = number of documents in the database (when using Google, I estimate at 8.1 billion)
dfi = number of documents containing the term/phrase (# of results for a search in quotes)
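Here is a minimal sketch of the classic normalized weight as defined above. It assumes lowercasing, whitespace tokenization, and base-10 logs, and takes the maximum frequency over every phrase with the same number of words as the target, per the definition of max tfdi. The function name is hypothetical. D and dfi have to be supplied from outside (for example, the 8.1 billion estimate above and a quoted-search result count); the dfi of 100,000 in the example is invented.

```python
import math
from collections import Counter

def classic_normalized_weight(doc_text, phrase, D, df):
    """W = (tf of phrase / max tf of any same-length phrase) * log10(D / df)."""
    words = doc_text.lower().split()
    target = " ".join(phrase.lower().split())
    n = len(target.split())
    if n == 0:
        raise ValueError("phrase must contain at least one word")
    if not (0 < df <= D):
        raise ValueError("need 0 < df <= D")
    # Count every phrase of n consecutive words in the document.
    counts = Counter(" ".join(words[k:k + n]) for k in range(len(words) - n + 1))
    if not counts:
        return 0.0  # document is shorter than the phrase
    return counts[target] / max(counts.values()) * math.log10(D / df)

doc = "diet fanta is new diet fanta tastes great"
# "diet fanta" is the most frequent two-word phrase, so tf / max tf = 1.0.
print(round(classic_normalized_weight(doc, "diet fanta", 8_100_000_000, 100_000), 3))  # 4.908
```

Dividing by the most frequent same-length phrase, rather than by page length, means repeating the target phrase stops helping once it is already the page's top phrase.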

A second equation, Glasgow Weight, can also be useful (I generally use both when analyzing my own site vs. the competition):

$$W_{ij} = \frac{\log(freq_{ij} + 1)}{\log(length_j)} \cdot \log\left(\frac{N}{n_i}\right)$$

Where:
freqij = frequency of term i (a word or phrase of a given length) in document j
lengthj = number of unique terms (word or phrase of the same length) in document j
N = number of documents in database (again, I use 8.1 billion for Google)
ni = number of documents containing the term (results of a search in quotes)
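The Glasgow weight is just as short. This sketch takes the four counts directly and assumes base-10 logs; `glasgow_weight` is a hypothetical name and the example counts are invented. The base cancels in the log(freqij + 1) / log(lengthj) ratio, so only the log(N/ni) factor depends on which base you pick.

```python
import math

def glasgow_weight(freq, unique_terms, N, n):
    """Glasgow weight: log(freq + 1) / log(unique_terms) * log(N / n).

    freq         -- occurrences of term i in document j
    unique_terms -- number of unique terms (of the same length) in document j
    N            -- number of documents in the database
    n            -- number of documents containing the term
    """
    if unique_terms < 2:
        raise ValueError("need at least 2 unique terms so log(unique_terms) > 0")
    if freq < 0 or not (0 < n <= N):
        raise ValueError("need freq >= 0 and 0 < n <= N")
    return math.log10(freq + 1) / math.log10(unique_terms) * math.log10(N / n)

# Hypothetical counts: 9 occurrences in a page with 100 unique terms.
print(round(glasgow_weight(9, 100, 8_100_000_000, 100_000), 3))  # 2.454
```

The log(freq + 1) term grows slowly, so going from 9 to 99 occurrences only doubles that part of the weight.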

What these are telling us is that keyword stuffing and long pages with many keywords are not the answer to page optimization. Given the normalization process and the simplicity of these equations, it's pretty easy to check your competition (I'd recommend analyzing the top 20 pages) and then your own page.
tempy
post Feb 8 2005, 06:34 AM
Post #9


HR 5

Group: Active Members
Posts: 344
Joined: 1-November 04
Member No.: 5,554



I also don't understand those formulae. But as a graduate of linguistics (amongst other things - but let's not go there) I also have to question the term 'natural copywriting'. There really is no such thing.

I am fully willing to accept that the forms of copywriting Jill and the others here use work wonders and are probably the best consistent method of achieving SEO results. But that copywriting is still aimed at achieving a successful ranking, even if the ranking is an alleged by-product of producing a well-written, readable web site, and whether we are conscious of it or not. Writing for the web has always been different from writing for print. Neither is 'natural'; both have evolved in ways that are far from natural. In fact, the written English word has often been specifically engineered for different circumstances and communication media. Google is one such medium.
copywriter
post Feb 8 2005, 04:21 PM
Post #10


HR 7

Group: Moderator
Posts: 1,736
Joined: 23-July 03
From: South Carolina, USA
Member No.: 12



Randfish, is what you're saying that using either of these formulas, you can take the exact same steps toward optimizing a site that wants to rank highly for the phrase "stapler parts" and one that wants to rank highly for "website design" and give them both grand results? I'm not being sarcastic here, I am truly curious.

QUOTE
have to question the term 'natural copywriting'. There really is no such thing.


Sure there is. The phrase "natural copywriting" was created (by whom, I don't know) to represent SEO copywriting that isn't forced. It flows... it's difficult or impossible to find the keyphrases within the copy... it seems unencumbered. "Natural copywriting" is a concocted term just like "search engine copywriting." Until Jill invented it, there was no such thing :)
sonnyyu
post Feb 8 2005, 05:20 PM
Post #11


HR 4

Group: Active Members
Posts: 157
Joined: 7-April 04
Member No.: 3,158



The bad news is that Google supports 103 languages now, so people who only do natural copywriting in English are covering just 0.97% of the natural copywriting under the sun.
randfish
post Feb 8 2005, 05:29 PM
Post #12


Daily SEO Show Anchor

Group: Active Members
Posts: 229
Joined: 21-July 04
From: Seattle, WA
Member No.: 4,442



QUOTE(copywriter @ Feb 8 2005, 05:21 PM)
Randfish, is what you're saying that using either of these formulas, you can take the exact same steps toward optimizing a site that wants to rank highly for the phrase "stapler parts" and one that wants to rank highly for "website design" and  give them both grand results?
copywriter,
This is not a 'magic formula' for writing web pages. This is the application of scientific methodology to search engine optimization. What I'm suggesting is that rather than just using 'natural copywriting', which I believe is great and should be the first step, you also use these formulas.

Let me lay out a scenario. You want to write about a new Diet Fanta Soda. You want to rank well for the phrase - diet fanta. My recommendation would be to follow the steps below:

1. Write a page of great copy that is usable (broken up into smaller paragraphs, uses bullet points, etc.) and readable and convincing.
2. Analyze the top 20 sites ranking for the term 'diet fanta' at Google (I would use Yahoo! & MSN as well).
3. Conduct an analysis of the term weight - using both formulas above - of the phrase 'diet fanta' for each ranking page.
4. Compare the results and make sure that your own page's term weight for 'diet fanta' is similar to the top results (unlike with KW density, I usually go a little higher than the competition).
5. Edit your copy to comply with this formula.
6. Start this analysis for related words, etc.
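Steps 3 and 4 can be sketched roughly as follows, using only the classic normalized weight for brevity (the Glasgow weight would slot in the same way). This illustration rests on several assumptions: page texts are plain strings you have already collected, tokenization is whitespace-only, logs are base 10, and `phrase_weight` and `compare_to_competitors` are hypothetical names. The code also makes one thing visible: for a single query, D and dfi are the same for every page, so the log(D/dfi) factor scales all pages equally and only the normalized frequency differs between them.

```python
import math
from collections import Counter

def phrase_weight(text, phrase, D, df):
    """Classic normalized weight of `phrase` in `text`."""
    words = text.lower().split()
    target = tuple(phrase.lower().split())
    n = len(target)
    if n == 0:
        raise ValueError("phrase must contain at least one word")
    # Frequencies of every n-word phrase in the page.
    grams = Counter(tuple(words[k:k + n]) for k in range(len(words) - n + 1))
    if not grams:
        return 0.0
    return grams[target] / max(grams.values()) * math.log10(D / df)

def compare_to_competitors(my_text, competitor_texts, phrase, D, df):
    """Weight each ranking page (step 3) and report where yours falls (step 4)."""
    if not competitor_texts:
        raise ValueError("need at least one competitor page")
    theirs = [phrase_weight(t, phrase, D, df) for t in competitor_texts]
    return {
        "mine": phrase_weight(my_text, phrase, D, df),
        "min": min(theirs),
        "max": max(theirs),
        "mean": sum(theirs) / len(theirs),
    }

# Hypothetical page texts and counts, for illustration only.
competitors = ["diet fanta and diet coke and diet pepsi", "diet fanta"]
print(compare_to_competitors("diet fanta review", competitors, "diet fanta", 1000, 10))
# {'mine': 2.0, 'min': 1.0, 'max': 2.0, 'mean': 1.5}
```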

Search engines will always rely on math and equations for ranking. There is no reason why a professional optimization specialist should not be armed with the same knowledge and apply it.
DanThies
post Feb 8 2005, 06:49 PM
Post #13


Keyword Super Freak

Group: Moderator
Posts: 861
Joined: 23-July 03
From: Texas, y'all
Member No.: 14



QUOTE(randfish @ Feb 8 2005, 04:29 PM)
copywriter,
This is not a 'magic formula' for writing web pages. This is the application of scientific methodology to search engine optimization. What I'm suggesting is that rather than just using 'natural copywriting', which I believe is great and should be the first step, you also use these formulas.

...

Search engines will always rely on math and equations for ranking. There is no reason why a professional optimization specialist should not be armed with the same knowledge and apply it.


Yes, search engines run on math, but there are a lot of assumptions loaded in there. You aren't really armed with the same knowledge, are you?

For one thing, when you graph out the top 20 pages for density, term weight, or whatever, it's going to be a scattergraph. There's too much happening off the page. Search engines aren't using averages - that's the mistake everyone makes in thinking about keyword density.

I suppose you could add a "-inanchor:diet -inanchor:fanta" to try to eliminate some of the off page factors from the set of pages, but you're still taking a shot in the dark - may as well just use the top ranked page as a template and replace their words with your own.

If you don't know how different types of hits (heading, bold, hyperlink, etc.) are weighted in the algorithm then you don't know what you've really got there anyway. Since this part of the algorithm may very well be query-dependent, you may as well rely on averages as any other guess.

Show me good natural copy that doesn't fall within the range on density, term weight, or whatever vs. other ranked pages.
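The point here is that ranked pages form a scatter, so the more useful check is whether a value falls inside the observed range rather than how close it is to the average. Below is a minimal sketch of that check. It assumes you already have one weight (or density) value per ranked page; the function name and sample values are hypothetical.

```python
import statistics

def where_does_it_fall(my_value, ranked_values):
    """Report whether `my_value` lies within the [min, max] range observed
    across ranked pages, along with the spread of those values."""
    if not ranked_values:
        raise ValueError("need at least one ranked value")
    lo, hi = min(ranked_values), max(ranked_values)
    return {
        "in_range": lo <= my_value <= hi,
        "min": lo,
        "max": hi,
        "mean": statistics.mean(ranked_values),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(ranked_values) if len(ranked_values) > 1 else 0.0,
    }

# Hypothetical per-page values for one query.
print(where_does_it_fall(2.0, [0.5, 1.0, 3.0, 0.8])["in_range"])  # True
```

A value of 2.0 sits well above the mean of 1.325 in this made-up set, yet it is still inside the range the ranked pages cover. A large standard deviation relative to the mean is a sign that matching the average says little.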
Jill
post Feb 8 2005, 07:11 PM
Post #14


High Rankings Advisor

Group: Admin
Posts: 29,201
Joined: 21-July 03
From: Ashland, MA
Member No.: 2



Aww Randfish...you take all the fun out of things!

No wonder we're all getting so bored with SEO!
randfish
post Feb 8 2005, 07:23 PM
Post #15


Daily SEO Show Anchor

Group: Active Members
Posts: 229
Joined: 21-July 04
From: Seattle, WA
Member No.: 4,442



Dan,

If I hear you right, you're suggesting that the use of term weight, or any other calculation of a statistical formula for that matter, is not useful unless you know precisely how the SEs are weighting them in the overall algo (and which specific pieces get +/-).

I'd have to disagree. I think that using formulas (even approximated, incomplete ones) like term weight and watching where the data falls, especially if done over a large number of queries, can result in better optimization techniques.

Don't get me wrong, I know that this is a guess. I know that the math being used here is not 100% accurate. I even know that the major SEs probably do not use the two formulas above exactly. But, I also know that this type of analysis can be valuable. It's one more tool in the toolbox and I would recommend taking advantage of it, given my current level of knowledge.

Maybe, Dan, you can tell us why, at your level of knowledge, you don't use these (and possibly even hint at what you do use :) ).

This forum is sponsored by High Rankings, a Boston SEO Agency